Re: ExpressionScriptCompilationException[Field [field name] used in expression does not exist in mappings];

2015-05-18 Thread Alex Schokking
It looks like the same question (with more context/information) was asked 
here: 
http://stackoverflow.com/questions/28986964/expressions-with-dynamicly-generated-schemas-throw-exceptions-when-some-indices
 
but it doesn't have any answers yet either.

Does anyone here happen to know the best-practice way of addressing indices 
that are missing the mapping in question? I'd really hate to have to go 
through and hand-update them all to add the mapping :(
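
(For later readers: "hand-updating" here means a put-mapping call per old 
index. A minimal sketch of that workaround, assuming the two fields can simply 
be added as numeric types; the type name "logs" is a guess and depends on your 
logstash config:

# Sketch only: add the missing numeric fields to one older index's mapping so
# the expression script can compile against it. Adjust index and type names.
curl -XPUT 'localhost:9200/logstash-2015.02.21/_mapping/logs' -d '{
  "properties": {
    "ads_found":    { "type": "long" },
    "pages_parsed": { "type": "long" }
  }
}'

This would need to be repeated, or scripted, for each pre-existing daily 
index.)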






Re: Perma-Unallocated primary shards after a node has left the cluster

2015-05-01 Thread Alex Schokking
Probably super evident, but the output in my original message was actually 
from _cat/allocation?v, not _cat/recovery; sorry about that.





Perma-Unallocated primary shards after a node has left the cluster

2015-04-29 Thread Alex Schokking
Hi guys, I would really appreciate some help understanding what's going 
down with shard allocation in this case: 

Elasticsearch version: 1.4.4

We had 3 nodes with 1 shard and 1 replica per index (so net 2 copies of 
everything). 1 node went down and the cluster went red. It started to 
reallocate shards as expected and there were originally ~50 unallocated 
shards with 15 primary and the rest replicas. 

It's been a few hours now and there are still 15 outstanding shards that 
are all primary that don't seem to be getting re-allocated. I thought this 
would be a pretty standard scenario so I was really hoping I wouldn't need 
to manually walk through and re-allocate the primary shards, but I'm not 
sure what else to try at this point to get back to green. Any pointers 
would be really appreciated. Here are some of the relevant-seeming bits 
folks asked about on IRC:

In the ES logs, for the unallocated index names, there are lines along the 
lines of:

[2015-04-29 22:08:22,803][DEBUG][action.admin.indices.stats] [Agent Axis] [webaccesslogs-2015.04.24][0], node[-r2iQnH4R-mcUy4NicCB5g], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@6a564a91]
org.elasticsearch.transport.SendRequestTransportException: [Jean-Paul Beaubier][inet[/10.155.165.126:9300]][indices:monitor/stats[s]]

(Jean-Paul Beaubier is the node that went down.)

_cat/recovery
shards disk.used disk.avail disk.total disk.percent host              ip             node
   420    21.2gb       77gb     98.3gb           21 ip-10-234-164-148 10.234.164.148 Agent Axis
   420      41gb     57.2gb     98.3gb           41 ip-10-218-145-237 10.218.145.237 Ebon Seeker
    15                                                                               UNASSIGNED

I'm trying to understand why it's stuck in this state, given there is (as far 
as I can tell) no other info in the logs about why the shards can't be 
allocated. Shouldn't the replicas just be promoted in place to new 
primaries and then new replicas created on the other node?

Thanks and regards -- Alex 
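
(A sketch of the manual reroute mentioned above, in case it's useful. This is 
the 1.x allocate command; the index, shard and node values below are 
illustrative, and allow_primary accepts possible data loss for that shard:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    { "allocate": {
        "index": "webaccesslogs-2015.04.24",
        "shard": 0,
        "node": "Agent Axis",
        "allow_primary": true
    } }
  ]
}'

Repeat per stuck primary, or drive it from a script over /_cat/shards.)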



Re: Copying fields to a geopoint type ?

2015-04-01 Thread Alex Schokking
Were you ever able to figure out a solution to this? I'm in a similar boat.

On Thursday, September 11, 2014 at 2:14:29 AM UTC-7, Kushal Zamkade wrote:

 Hello,

 I have created a location field using the code below:

 if [latitude] and [longitude] {
   mutate {
     rename => [ "latitude", "[location][lat]", "longitude", "[location][lon]" ]
   }
 }

 But when I check the location field type, it is not created as geo_point.

 When I try to search on a geo_point I get the error below:
 QueryParsingException[[logstash-2014.09.11] failed to find geo_point field [location1]];

 Can you help me resolve this?



 On Thursday, April 10, 2014 2:42:22 AM UTC+5:30, Pascal VINCENT wrote:

 Hi,

 I have included logstash in my stack and started to play with it. I'm 
 sure it can do the trick I was looking for, and much more.
 Thank you ... 

 [waiting for your blog post :)] 

 Pascal. 


 On Mon, Apr 7, 2014 at 9:38 AM, Alexander Reelsen a...@spinscale.de 
 wrote:

 Hey,

 I don't know about your stack, but maybe logstash would be a good addition 
 there. It is more flexible than the csv river and features a CSV input as 
 well. You can easily change the structure of the data you want to index. 
 This is how the logstash config would look:

 if [latitude] and [longitude] {
   mutate {
     rename => [ "latitude", "[location][lat]", "longitude", "[location][lon]" ]
   }
 }

 I am currently working on a blog post on how to utilize elasticsearch, 
 logstash and kibana on CSV-based data, and hope to release it soonish on the 
 .org blog - it covers exactly this. Stay tuned! :-)


 --Alex



 On Thu, Apr 3, 2014 at 12:21 AM, Pascal VINCENT pasvi...@gmail.com 
 wrote:

 Hi,

 I'm new to elasticsearch. My usecase is to load a csv file containing 
 some agencies with geo location; each line looks like:

 id;label;address;zipcode;city;region;latitude;longitude;(and some other fields)

 I'm using the csv river plugin to index the file.

 My mapping is:

 {
   "office": {
     "properties": {
       (first fields omitted...)
       "latitude": {
         "type": "double"
       },
       "longitude": {
         "type": "double"
       },
       "location": {
         "type": "geo_point",
         "lat_lon": true
       }
     }
   }
 }

 I'd like to index the location .lon and .lat values from the latitude 
 and longitude fields. I tried the copy_to function with no success:

   "latitude": {
     "type": "double",
     "copy_to": "location.lat"
   },
   "longitude": {
     "type": "double",
     "copy_to": "location.lon"
   },

 Is there any way to feed the location property from the latitude and 
 longitude fields at indexing time?

 My point is that I don't want to modify the input csv file to adapt it 
 to the GeoJSON format (i.e. concatenate lat and lon into one field in the 
 csv file).

 Thank you for any hints.

 Pascal.









Re: Copying fields to a geopoint type ?

2015-04-01 Thread Alex Schokking
Woah crazy, never would've thought of that, thanks a lot for following up!

On Wed, Apr 1, 2015 at 12:31 PM, Pascal VINCENT pasvinc...@gmail.com
wrote:

 I finally came up with:

 if [latitude] and [longitude] {
   mutate {
     add_field => [ "[location]", "%{longitude}" ]
     add_field => [ "[location]", "%{latitude}" ]
   }
   mutate {
     convert => [ "[location]", "float" ]
   }
 }
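
(For context: this works because an array-format geo_point is [lon, lat], 
which is why longitude is appended first; the location field also needs a 
geo_point mapping for ES to treat the array that way. A minimal sketch of 
such a mapping via an index template; the template name and pattern are made 
up:

curl -XPUT 'localhost:9200/_template/logstash_geo' -d '{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "properties": {
        "location": { "type": "geo_point" }
      }
    }
  }
}'
)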






-- 
-- Alex



Cluster issue - raiseTimeoutFailure

2015-03-20 Thread Alex
Hi,

 I have a Java application that is indexing data in an Elasticsearch 
cluster (3 nodes). ES is well configured and working OK (indexing the 
data received from Java). 
 Cluster configuration for each node, from 
/etc/elasticsearch/elasticsearch.yml:

 ES_MAX_MEM: 2g
ES_MIN_MEM: 2g
bootstrap:
  mlockall: true
cluster:
  name: clusterName
discovery:
  zen:
ping:
  multicast:
enabled: false
  unicast:
hosts:
 - elasticsearch-test-2-node-1
 - elasticsearch-test-2-node-2
 - elasticsearch-test-2-node-3
http:
  max_initial_line_length: 48k
index:
  number_of_replicas: 2
  number_of_shards: 6
node:
  name: elasticsearch-test-2-node-3
threadpool:
  index:
type: fixed
size: 6
queue_size: 1500
  search:
type: fixed
size: 6
queue_size: 1200

  When I connect to the ES cluster (from Java), I specify all the nodes: 
node1, node2, node3. 

 The issue appears when I stop the two data nodes one by one (stopping 
elasticsearch). In this case the cluster health is yellow and I can see the 
remaining master node (using the head plugin). The master now has all the 
primary shards. The replicas are unassigned. But the Java application is no 
longer indexing any data. The following exception appears on the Java side:

org.elasticsearch.action.UnavailableShardsException: [indexName][2] [3] shardIt, [1] active : Timeout waiting for [1m], request: index {[indexName][typeName][Id], source[{ . }]}
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.raiseTimeoutFailure(TransportShardReplicationOperationAction.java:548) ~[elasticsearch-1.1.0.jar:na]
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$3.onTimeout(TransportShardReplicationOperationAction.java:538) ~[elasticsearch-1.1.0.jar:na]
        at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:491) ~[elasticsearch-1.1.0.jar:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_51]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_51]


Shouldn't indexing work properly in this case, even with only the master?
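
(A likely explanation, for what it's worth: with number_of_replicas: 2, index 
operations in 1.x default to quorum write consistency, which for 3 copies 
means 2 must be active; a single surviving node can't satisfy that, hence the 
one-minute timeout. A hedged sketch of the per-request override; the index, 
type and field names are illustrative:

curl -XPUT 'localhost:9200/indexName/typeName/1?consistency=one' -d '{
  "field": "value"
}'

The same can be set cluster-wide with action.write_consistency: one.)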

 If I also kill the master, the next (logical) exception appears:

org.elasticsearch.client.transport.NoNodeAvailableException: No node available
        at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:263) ~[elasticsearch-1.1.0.jar:na]
        at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:231) ~[elasticsearch-1.1.0.jar:na]
        at org.elasticsearch.client.transport.support.InternalTransportClient.execute(InternalTransportClient.java:106) ~[elasticsearch-1.1.0.jar:na]
        at org.elasticsearch.client.support.AbstractClient.update(AbstractClient.java:107) ~[elasticsearch-1.1.0.jar:na]



ElasticSearch across multiple data center architecture design options

2015-03-12 Thread Alex
Hi all,

We are planning to use ELK for our log analysis. We have multiple data 
centers. Since a cluster spanning data centers is not recommended, we are 
going to have one ES cluster per data center. Here are the three design 
options we have:

1. Use snapshot & restore to replicate data across clusters.
2. Use a tribe node to achieve cross-cluster queries.
3. Ship and index logs to each cluster.

Here are our questions; any comments will be appreciated:
1. How complex is snapshot & restore? Does anyone have experience using it for this purpose?
2. Would the performance of a single tribe node be a concern or a bottleneck? Is it possible to have multiple tribe nodes for scale-up or load balancing?
3. Is it possible to customize Kibana so that it goes to a different cluster to query data depending on the query?

Thank you!
Abigail
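
(On question 1, the mechanics themselves are small; a minimal sketch, 
assuming a repository reachable from both clusters, with made-up repository, 
path and snapshot names:

# Register the same repository on both clusters.
curl -XPUT 'localhost:9200/_snapshot/dc_backup' -d '{
  "type": "fs",
  "settings": { "location": "/mnt/shared/es_backup" }
}'

# On the source cluster: take a snapshot.
curl -XPUT 'localhost:9200/_snapshot/dc_backup/snap_1?wait_for_completion=true'

# On the destination cluster: restore it.
curl -XPOST 'localhost:9200/_snapshot/dc_backup/snap_1/_restore'

The operational complexity is mostly in scheduling, retention, and making the 
repository visible to both sides.)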



ExpressionScriptCompilationException[Field [field name] used in expression does not exist in mappings];

2015-03-06 Thread Alex Schokking
Hi there,
We're just getting started with ELK and are using:
Elasticsearch 1.4.4
Kibana 4.0
on
Ubuntu 14.04

We needed to create a scripted field to calculate the ratio between two 
numeric fields. These fields are not on all events and only started 
appearing a day ago (so older indexes don't have them at all). 
name: ads_per_page
script: doc['ads_found'].value / max(1, doc['pages_parsed'].value)

It seemed to be working great at first, but now Kibana has been resurfacing 
these elasticsearch errors constantly and I can't seem to find any 
information about them online (too new?).
This repeats for every shard as far as I can tell (there are about 2 weeks 
of indexes there). Any suggestions would be appreciated.
Shard Failures

The following shard failures occurred:

- Index: logstash-2015.02.21  Shard: 0  Reason:
  SearchParseException[[logstash-2015.02.21][0]: query[ConstantScore(BooleanFilter(+cache(@timestamp:[1425571563220 TO 1425657963220])))],from[-1],size[500],sort[custom:@timestamp: org.elasticsearch.index.fielddata.fieldcomparator.LongValuesComparatorSource@4c1942d3!]: Parse Failure [Failed to parse source [{size:500,sort:{@timestamp:desc},query:{filtered:{query:{query_string:{analyze_wildcard:true,query:*}},filter:{bool:{must:[{range:{@timestamp:{gte:1425571563220,lte:1425657963220}}}],must_not:[],highlight:{pre_tags:[@kibana-highlighted-field@],post_tags:[@/kibana-highlighted-field@],fields:{*:{}}},aggs:{2:{date_histogram:{field:@timestamp,interval:30m,pre_zone:-08:00,pre_zone_adjust_large_interval:true,min_doc_count:0,extended_bounds:{min:1425571563220,max:1425657963220,fields:[*,_source],script_fields:{ads_per_page:{script:doc['ads_found'].value / max(1, doc['pages_parsed'].value),lang:expression}},fielddata_fields:[@timestamp]}]]];
  nested: ExpressionScriptCompilationException[Field [ads_found] used in expression does not exist in mappings];



Re: Using function_score error

2015-02-20 Thread alex
The error is in your groovy script, as indicated 
by GroovyScriptExecutionException. All the other info is just making it 
more difficult to help you.

script: _score  doc['reviews'].value

Your script doesn't use any operator. It's likely that you just want to 
multiply:  _score  * doc['reviews'].value.
In groovy, function call arguments do not need to be enclosed in brackets. 
E.g. println 'hello' is equivalent to println('hello'). By omitting the 
operator, your script is trying to call _score (which is some 
UpdatableFloat), with the document field as an argument.

Cheers!
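
(A corrected version of the query from the post below, sketch only:

GET /business/_search
{
  "query": {
    "function_score": {
      "query": { "match": { "name": "sheraton" } },
      "script_score": {
        "script": "_score * doc['reviews'].value",
        "lang": "groovy"
      }
    }
  }
}
)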

Op maandag 3 november 2014 19:46:07 UTC+1 schreef Manuel Sciuto:

 I have an error

 My mapping:

 {
   "mappings": {
     "comida": {
       "dynamic": true,
       "numeric_detection": true,
       "properties": {
         "id": { "type": "integer" },
         "reviews": { "type": "integer" },
         "name": { "analyzer": "myAnalyzerDestinos", "type": "string" }
       }
     },
     "actividades": {
       "dynamic": true,
       "numeric_detection": true,
       "properties": {
         "id": { "type": "integer" },
         "reviews": { "type": "integer" },
         "name": { "analyzer": "myAnalyzerDestinos", "type": "string" }
       }
     },
     "alojamiento": {
       "dynamic": true,
       "numeric_detection": true,
       "properties": {
         "id": { "type": "integer" },
         "reviews": { "type": "integer" },
         "name": { "analyzer": "myAnalyzerDestinos", "type": "string" }
       }
     },
     "transporte__servicios": {
       "dynamic": true,
       "numeric_detection": true,
       "properties": {
         "id": { "type": "integer" },
         "reviews": { "type": "integer" },
         "name": { "analyzer": "myAnalyzerDestinos", "type": "string" }
       }
     }
   }
 }



 My query:

 GET /business/_search
 {
   "query": {
     "function_score": {
       "query": { "match": { "name": "sheraton" } },
       "script_score": {
         "script": "_score  doc['reviews'].value",
         "lang": "groovy"
       }
     }
   }
 }

 Response:

 {
   "error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[pGQYzpifRMumKUcblgTp2Q][business][0]: QueryPhaseExecutionException[[business][0]: query[function score (name:she name:sher name:shera name:sherat name:sherato name:sheraton,function=script[_score  doc['reviews'].value], params [null])],from[0],size[10]: Query Failed [Failed to execute main query]]; nested: GroovyScriptExecutionException[MissingMethodException[No signature of method: org.elasticsearch.script.groovy.GroovyScriptEngineService$GroovyScript$UpdateableFloat.call() is applicable for argument types: (java.lang.Long) values: [11]\nPossible solutions: wait(long), wait(), abs(), any(), wait(long, int), and(java.lang.Number)]]; }{[pGQYzpifRMumKUcblgTp2Q][business][1]: QueryPhaseExecutionException[[business][1]: query[function score (name:she name:sher name:shera name:sherat name:sherato name:sheraton,function=script[_score  doc['reviews'].value], params [null])],from[0],size[10]: Query Failed [Failed to execute main query]]; nested: GroovyScriptExecutionException[MissingMethodException[No signature of method: org.elasticsearch.script.groovy.GroovyScriptEngineService$GroovyScript$UpdateableFloat.call() is applicable for argument types: (java.lang.Long) values: [16]\nPossible solutions: wait(long), wait(), abs(), any(), wait(long, int), and(java.lang.Number)]]; }]",
   "status": 500
 }

 Why?



 El sábado, 1 de noviembre de 2014 13:02:13 UTC-3, Ryan Ernst escribió:

 The root cause of the error is here:
 ScriptException[dynamic scripting for [mvel] disabled]; 

 I would guess you are running on ES 1.2 or 1.3? Dynamic scripting was 
 disabled by default in 1.2, and for non sandboxed languages in 1.3.  In 
 1.4, the default script language was changed to Groovy, which is sandboxed, 
 and thus can be safely compiled dynamically.

 See this blog for more details:
 http://www.elasticsearch.org/blog/scripting-security/

 If running in 1.3, you can simply change the language of the script:
 GET /searchtube/_search
 {
   "query": {
     "function_score": {
       "query": { "match": { "_all": "severed" } },
       "script_score": {
         "script": "_score * log(doc['likes'].value + doc['views'].value + 1)",
         "lang": "groovy"
       }
     }
   }
 }

 Although you could also use the expr lang (expressions) for this simple 
 script, which will be much faster!

 On Wednesday, October 29, 2014 

Re: JsonObject to SortBuilder object

2015-01-20 Thread Alex Thurston
Here's a simpler way to ask the question:

I've got this:

GeoDistanceSortBuilder sorter = new GeoDistanceSortBuilder("values.geo_location");
sorter.point(0.0, 0.0);
sorter.order(SortOrder.DESC);
sorter.unit(DistanceUnit.KILOMETERS);
sorter.geoDistance(GeoDistance.PLANE);
sorter.toString();


Which produces the string

"_geo_distance" : {
   "values.geo_location" : [ 0.0, 0.0 ],
   "unit" : "km",
   "distance_type" : "plane",
   "reverse" : true
}


I would like to do the opposite.  I have the above string, and I want to 
turn it into a GeoDistanceSortBuilder without having to manually parse it.
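
(One workaround, sketched and unverified: skip the SortBuilder API entirely 
and splice the raw sort JSON into the request body via the 1.x 
setExtraSource(String) overload on SearchRequestBuilder. The index name and 
the surrounding class are made up:

import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;

public class RawSortExample {
    // sortJson must be one complete sort element, e.g. {"first_name": "asc"}
    public static void searchWithRawSort(Client client, String sortJson) {
        SearchRequestBuilder srb = client.prepareSearch("my_index")
                .setQuery(QueryBuilders.matchAllQuery())
                // extraSource is merged into the final search source at execution time
                .setExtraSource("{\"sort\": [" + sortJson + "]}");
        srb.get();
    }
}
)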





JsonObject to SortBuilder object

2015-01-19 Thread Alex Thurston
I would like to turn an arbitrary JsonObject (which presumably follows the 
Search/Sort DSL into a SortBuilder which can then be passed to the 
SearchRequestBuilder::addSort.

I've gotten this to work by simply parsing the JsonObject myself and 
making the appropriate calls on the SortBuilder, but that means that I 
have to implement the parsing for every variation of the DSL.

If I've got a Java JsonObject that looks like:

{
  "first_name": "asc"
}

OR

{
  "first_name": {
    "order": "asc"
  }
}

OR

{
  "_geo_distance": {
    "my_position": {
      "order": "asc"
    }
  }
}

All of these are valid JSON for the sort, so I would imagine there's a way to 
call:

JsonObject sort_json = ...; // example from above
SortBuilder sort = new SortBuilder();
sort.setSort(sort_json);

I'm almost certain I'm missing something but can't for the life of me 
figure out how to do it.

Thanks in advance.



Re: Getting SQLFeatureNotSupportedException while connecting Hbase phoenix via river-jdbc

2014-12-11 Thread Alex Kamil
Rimita, this was fixed in phoenix 3.1.0, pls follow these instructions:
http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html

On Thu, Dec 11, 2014 at 3:53 AM, cto@TCS rimita.mit...@gmail.com wrote:

 Thank you so much


 On Thursday, December 11, 2014 12:49:49 PM UTC+5:30, cto@TCS wrote:

 Hi,

 I have an HBase database and I use Phoenix as an RDBMS skin over it.
 Now I am trying to retrieve that data via ElasticSearch using the
 river-jdbc plugin.

 I am using the following:-
 1). elasticsearch-1.4.0
 2). elasticsearch-river-jdbc-1.4.0.3.Beta1
 3). phoenix-3.0.0-incubating-client
 4). HBase 0.94.1

 But I keep getting the following exception when I try to create a river.

 [2014-12-11 12:34:48,957][INFO ][org.apache.zookeeper.ZooKeeper] Session: 0x14a380290fe001d closed
 [2014-12-11 12:34:48,957][INFO ][org.apache.zookeeper.ClientCnxn] EventThread shut down
 [2014-12-11 12:34:49,022][ERROR][river.jdbc.SimpleRiverSource] while opening read connection: jdbc:phoenix:localhost:2181 null
 java.sql.SQLFeatureNotSupportedException
         at org.apache.phoenix.jdbc.PhoenixConnection.setReadOnly(PhoenixConnection.java:587)
         at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.getConnectionForReading(SimpleRiverSource.java:226)
         at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.execute(SimpleRiverSource.java:376)
         at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.fetch(SimpleRiverSource.java:320)
         at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.fetch(SimpleRiverFlow.java:209)
         at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.execute(SimpleRiverFlow.java:139)
         at org.xbib.elasticsearch.plugin.jdbc.RiverPipeline.request(RiverPipeline.java:88)
         at org.xbib.elasticsearch.plugin.jdbc.RiverPipeline.call(RiverPipeline.java:66)
         at org.xbib.elasticsearch.plugin.jdbc.RiverPipeline.call(RiverPipeline.java:30)
         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
         at java.lang.Thread.run(Thread.java:745)
 [2014-12-11 12:34:49,432][INFO ][river.jdbc.RiverMetrics  ] pipeline org.xbib.elasticsearch.plugin.jdbc.RiverPipeline@700dd36f is running: river jdbc/myriver metrics: 0 rows, 0.0 mean, (0.0 0.0 0.0), ingest metrics: elapsed 9 seconds, 0.0 bytes bytes, 0.0 bytes avg, 0 MB/s


 Please help!





Re: Elasticsearch Maven plugin on GitHub

2014-12-05 Thread Alex Cojocaru
Here's my suggestion:

modify StartElasticsearchNodeMojo#execute() and, right before the method
returns (i.e. after the ES node is started), read a System property called
waitIndefinitely. If the property is set, then wait indefinitely (see
below for details), else continue the execution.
You will have to provide that property when running maven, like mvn clean
verify -DwaitIndefinitely, for the plugin to wait indefinitely.

I hope that helps. Let me know if you need additional help.

To wait indefinitely:
see the waitIndefinitely() method in
http://svn.apache.org/viewvc/tomcat/maven-plugin/tags/tomcat-maven-plugin-2.0/tomcat7-maven-plugin/src/main/java/org/apache/tomcat/maven/plugin/tomcat7/run/AbstractRunMojo.java?view=markup
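
(A bare-bones sketch of that wait; the property name is the one proposed 
above, the rest is illustrative:

// At the end of StartElasticsearchNodeMojo#execute(): block forever when
// -DwaitIndefinitely is set, mirroring the tomcat7-maven-plugin approach.
if (System.getProperty("waitIndefinitely") != null) {
    final Object lock = new Object();
    synchronized (lock) {
        try {
            lock.wait(); // never notified; Ctrl-C the maven process stops the node
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
)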

alex


On Thu, Dec 4, 2014 at 2:13 AM, Chetan Padhye chetanpad...@gmail.com
wrote:

 Hi, good plugin. I tried to run it, but it starts and then stops once the pom
 execution is finished. How can we modify the plugin to keep it running once
 started? My intention is to use this plugin for demo installations, so I
 can install an elasticsearch node and start it on any machine for my demo.



 On Friday, 17 January 2014 06:14:22 UTC+5:30, AlexC wrote:

 If anyone is interested in using a Maven plugin to run Elasticsearch for
 integration testing, I just published one on GitHub:
 https://github.com/alexcojocaru/elasticsearch-maven-plugin.

 It is an alternative to starting a node through the code.

 The readme should provide enough information, but let me know if
 something is missing or not clear enough.
 It uses ES v0.90.7, but it can be easily updated to the latest ES version
 by changing the dependency version in the pom.xml file.

 alex

  --
 You received this message because you are subscribed to a topic in the
 Google Groups elasticsearch group.
 To unsubscribe from this topic, visit
 https://groups.google.com/d/topic/elasticsearch/0I2TGylTRHc/unsubscribe.
 To unsubscribe from this group and all its topics, send an email to
 elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/a9188126-93df-4d43-aaf8-4c324ceb12a3%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/a9188126-93df-4d43-aaf8-4c324ceb12a3%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.




-- 
Alex Cojocaru
Sr. Development Engineer @ Ping Identity
acojoc...@pingidentity.com | +1 604.697.7052



Re: Is there any way to pass search query in POST parameter rather than body?

2014-11-26 Thread Alex Roytman
Thanks Kelsey, that could be useful. I managed to get my UI framework
(ExtJS) to play better with POST, so I am not dependent on having to use GET
any more.

On Wed Nov 26 2014 at 5:54:43 PM Kelsey Hamer kelsey.ha...@gmail.com
wrote:

 I had a similar issue.
 I managed to get the parameters from the post by doing:

 @Override
 public void handleRequest(final RestRequest request, final RestChannel channel) {
     Map<String, String> params = new HashMap<String, String>();
     RestUtils.decodeQueryString(request.content().toUtf8(), 0, params);
     String paramValue = params.get("parameter");
     // DO SOMETHING
 }




 Notice that the json query you want to pass in doesn't need to be encoded
 on the client side (with an HTTP GET it needs to be).

 Hope that helps.


 On Thursday, January 31, 2013 10:03:33 AM UTC-8, AlexR wrote:


 I am already doing it with GET and the source parameter, and it works well.
 One huge benefit is that the size and start (and hopefully sort, but I have
 not tested it yet) url parameters override whatever is in source={...} - a
 big help integrating with UI components that manage paging and generate
 these http parameters.

 Now the problem is that for all practical purposes uri length is limited
 to 2000 characters, so GET may very well fail with bigger queries (as I
 said, a query with facet-based filters, the facets themselves, and filters
 can get pretty long, plus of course the url-encoding of all the spaces and {}).

 I wish the same functionality were available via POST. Couldn't ES check
 the encoding in the POST header and, if it is application/x-www-form-urlencoded,
 just extract the encoded parameters and use the source parameter just like
 it does with GET?

 Do you think I should put an enhancement request into Git?





Elasticsearch fuzzy intersection of two arrays

2014-11-12 Thread Alex Fedorov
I have an object in the Elasticsearch index that has a nested object which 
is a list of strings. 
I would like to do the intersection against this list in both exact and 
fuzzy ways.
So for example I have browser names with versions in the index like: 
"browsers": [{"name": "Chrome 38"}, {"name": "Firefox 32"}, {"name": "Safari 5"}]

The request could be:
[{"name": "Chrome 38"}, {"name": "IE 10"}]
Then I have just 1 exact match. 

Or another example: 
[{"name": "Chrome 39"}, {"name": "Firefox 33"}, {"name": "Safari 5"}]
Here I have 2 fuzzy matches (Levenshtein distance 2) and 1 exact match.

Those results which have more matches should be on top.
How would you write this kind of query?
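
(One possible shape for this, sketched and unverified: a bool query with one 
should clause per requested browser, adding fuzziness for the fuzzy case; 
more matching clauses means a higher score. This assumes browsers is a plain 
object field; a true nested mapping would need a nested query around each 
clause:

{
  "query": {
    "bool": {
      "should": [
        { "match": { "browsers.name": { "query": "Chrome 39",  "fuzziness": 2 } } },
        { "match": { "browsers.name": { "query": "Firefox 33", "fuzziness": 2 } } },
        { "match": { "browsers.name": { "query": "Safari 5",   "fuzziness": 2 } } }
      ]
    }
  }
}
)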

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9d47a409-acc1-466d-99ba-47e264f68360%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Kibana 4 - filters in a dashboard

2014-11-10 Thread Alex
Dear Elasticsearch team,

could you please clarify a question regarding the Kibana 4 vs Kibana 3 
dashboard feature?

In a Kibana 3 dashboard I was able to interact with widgets, i.e. drill down 
into the data by clicking on basically any of the widgets to add more filters 
to the current dashboard.
Is it supposed to work the same way in Kibana 4 as well? 

It does not work for me at all now, and I'm wondering whether it's something 
I can't figure out how to do or something that's not implemented yet.

Thanks in advance,
Alex.



Re: unable to make snapshots to NFS filesystem

2014-10-08 Thread Alex Harvey
Ciprian,

Thanks for your input - I had indeed missed that disk space failure and it 
turns out I was hitting an intermittent disk space issue.



unable to make snapshots to NFS filesystem

2014-09-29 Thread Alex Harvey
Hi all,

I have been struggling to put together a backup solution for my ES cluster.

As far as I understand the documentation at
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html

I can't understand why the following might be failing:

I have exported an NFS filesystem to both nodes of my 2-node ES cluster, 
mounted as /srv/backup.

I created the elasticsearch user on the NFS server too, and then:

[root@back01 ~]# ls -ld /srv/backup/es_backup

drwxrwx---. 3 elasticsearch elasticsearch 4096 Sep 29 18:37 
/srv/backup/es_backup

Start with a clean filesystem:

[root@logdata01 ~]# rm -rf /srv/backup/*

Register the backup area:

[root@logdata01 ~]# curl -s -XPUT http://localhost:9200/_snapshot/backup -d '{
  "type": "fs",
  "settings": {
    "location": "/srv/backup"
  }
}'

{"acknowledged":true}

Create a snapshot:

[root@logdata01 ~]# curl -XPUT 
'localhost:9200/_snapshot/backup/tcom_snapshot?wait_for_completion=true&pretty'

I then get failures on various shards
https://gist.github.com/alexharv074/b4c7d35028c425f70f20

Any help on how I could get this cluster into a sane state that can be 
backed up greatly appreciated.

Best regards
Alex



Re: unable to make snapshots to NFS filesystem

2014-09-29 Thread Alex Harvey
Thanks for responding.

It doesn't seem to be a permissions problem -

[root@logdata01 ~]# ls -ld /srv/backup

drwxrwx---. 3 elasticsearch elasticsearch 4096 Sep 29 18:43 /srv/backup

[root@logdata01 ~]# find /srv/backup/ \! -user elasticsearch -or \! -group 
elasticsearch

[root@logdata01 ~]# 

[root@logdata01 ~]# find /srv/backup -ls | head

1310734 drwxrwx---   3 elasticsearch elasticsearch 4096 Sep 29 18:43 /srv/backup
1310764 drwxr-xr-x  12 elasticsearch elasticsearch 4096 Sep 29 18:37 /srv/backup/indices
1310954 drwxr-xr-x   6 elasticsearch elasticsearch 4096 Sep 29 18:42 /srv/backup/indices/logstash-2014.09.28
1310968 -rw-r--r--   1 elasticsearch elasticsearch 4120 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/snapshot-tcom_snapshot
1311894 drwxr-xr-x   2 elasticsearch elasticsearch 4096 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/3
1311938 -rw-r--r--   1 elasticsearch elasticsearch 4443 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/3/__3
1312014 -rw-r--r--   1 elasticsearch elasticsearch  689 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/3/__b
1312024 -rw-r--r--   1 elasticsearch elasticsearch   61 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/3/__c
1312064 -rw-r--r--   1 elasticsearch elasticsearch  281 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/3/__g
1312004 -rw-r--r--   1 elasticsearch elasticsearch  349 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/3/__a


On Monday, September 29, 2014 8:02:42 PM UTC+10, Mark Walkom wrote:

 Can you do an ls -ld /srv/backup and provide the output?

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com








Re: Using ES as a primary datastore.

2014-09-17 Thread Alex Kamil
ES is a fantastic search engine, but there is some risk of data loss
(http://aphyr.com/posts/317-call-me-maybe-elasticsearch) and a few other
potential disadvantages
(https://www.quora.com/Why-should-I-NOT-use-ElasticSearch-as-my-primary-datastore)
which might or might not be relevant to you. You can always combine ES, via
the JDBC river (https://github.com/jprante/elasticsearch-river-jdbc), with a
stable, secure database, e.g. MySQL
(https://www.quora.com/How-do-i-use-Elastic-search-with-mysql-database-I-am-currently-experimenting-with-jdbc-river-but-will-it-be-fast-enough-in-production)
or HBase (http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html);
since you have lots of data, HBase might be a better option.


On Wed, Sep 17, 2014 at 8:04 AM, Thomas thomas.bo...@gmail.com wrote:

 Hi,

 You have to calculate the volume you will keep in one shard first, then
 break your volume into the number of shards you will maintain, and then
 scale accordingly into a number of nodes; at the least, as your volumes grow
 you should grow your cluster as well.

 It is difficult to predict what problems may arise; your case is too
 generic. What will be the usage of the cluster? What queries will you
 perform? Will you mostly do indexing and occasionally query, or will you
 query your data intensively?

 Most important, you need to think about how you will partition your data:
 will you have one index, or multiple indexes like a logstash approach?
 Maybe check here: https://www.found.no/foundation/sizing-elasticsearch/

 For data more than a year old, what will you do, delete it? Can you afford
 to lose data? Will you keep backups?

 IMHO, these are some of the questions you must answer in order to see
 whether such an approach suits your needs. It is hardware, structure and
 partitioning of your data.

 Thomas

 On Wednesday, 17 September 2014 13:41:55 UTC+3, P Suman wrote:

 Hello,

  We are planning to use ES as a primary datastore.

 Here is my usecase

 We receive a million transactions per day (all are inserts).
 Each transaction is around 500KB in size; each transaction has 10 fields
 and we should be able to search on all 10 fields.
 We want to keep around 1 yr worth of data, which comes to around 180TB.
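
 (A quick sanity check of that figure: 1,000,000 inserts/day x 500 KB is
 about 500 GB/day, and 500 GB x 365 is about 182 TB/year, so ~180TB is right
 for a single copy; each replica would add the same amount of raw storage
 again.)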

 Can you please let me know of any problems that might arise if I use
 elasticsearch as the primary datastore.



 Regards,
 Suman








Re: Is there any way to prevent ES from disclosing exception details in REST response?

2014-09-15 Thread Alex Roytman
Thanks Jorg,

Unfortunately it is not an option - we are not at liberty to touch anything
beyond our app servers. We are using the transport-wares servlet for ES, and I
could easily tweak AbstractServletRestChannel to handle RestChannel
responses with codes 400/500, but I would like to avoid modifying the code
directly and there is no way to do it nicely. I put a request on GitHub for
enhancements to the NodeServlet, but was hoping ES might have an option to
turn error details on/off. I think it would be nice to control the error
level in REST responses with three levels: suppress/message/stack-trace.

On Mon, Sep 15, 2014 at 6:01 PM, joergpra...@gmail.com wrote:

 You can put a reverse proxy like nginx between the ES cluster and the rest
 of the world and filter away all HTTP status 500 responses.

 Jörg

 On Mon, Sep 15, 2014 at 11:57 PM, AlexR roytm...@gmail.com wrote:

 We expose ES _search endpoint directly to consumers. When our REST API
 get scanned for security vulnerabilities it complains on ES returning
 exception details. For example a malformed query will be included in the
 response along with exception. While it is more or a less harmless the tool
 complains of various injections and internals disclosures. I would like to
 be able to turn error message in the response off (or substitute it with a
 generic message) in production while keeping normal response logic in
 development.

 Is there any way I can do it?
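
(A minimal nginx sketch of the reverse-proxy idea above, assuming ES on
localhost:9200; the named location is made up:

# Hide ES error bodies: intercept 4xx/5xx upstream responses, return a generic body.
location / {
    proxy_pass http://localhost:9200;
    proxy_intercept_errors on;
    error_page 400 500 = @es_error;
}
location @es_error {
    default_type application/json;
    return 500 '{"error":"internal error"}';
}
)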





Re: Is there any way to prevent ES from disclosing exception details in REST response?

2014-09-15 Thread Alex Roytman
I guess I could, but it would mean passing a response wrapper to capture the
output stream and then copying it to the real response, or discarding it in
case of an error. That would be a second copy of the response (the first one
being done in the NodeServlet) and would hurt performance for large responses :-(

On Mon, Sep 15, 2014 at 6:40 PM, joergpra...@gmail.com wrote:

 Then why don't you simply add a servlet filter that filters unwanted
 responses away?

 Jörg

 On Tue, Sep 16, 2014 at 12:21 AM, Alex Roytman roytm...@gmail.com wrote:

 Thanks Jorg,

 Unfortunately it is not an option - we are not at liberty to touch
 anything beyond our app servers. We are using transport-wares servlet for
 ES and I could easily tweak AbstractServletRestChannel to handle Rest
 Channel response with codes 400,500 but I would like to avoid modifying the
 code directly and there is no way to do it nicely. I put a request on
 github for enhancements of the NodeServlet but was hoping ES may have an
 option to turn error details on/off. I think it would be nice to control
 error level  in REST responses with three levels -
 suppress/message/stack-trace

 On Mon, Sep 15, 2014 at 6:01 PM, joergpra...@gmail.com 
 joergpra...@gmail.com wrote:

 You can put a revers proxy like nginx between ES cluster and the rest of
 the world and filter away all HTTP status 500 responses.

 Jörg

 On Mon, Sep 15, 2014 at 11:57 PM, AlexR roytm...@gmail.com wrote:

 We expose the ES _search endpoint directly to consumers. When our REST API
 gets scanned for security vulnerabilities, the scanner complains about ES
 returning exception details. For example, a malformed query will be included
 in the response along with the exception. While it is more or less harmless,
 the tool complains of various injections and internals disclosures. I would
 like to be able to turn error messages in the response off (or substitute
 them with a generic message) in production while keeping the normal response
 logic in development.

 Is there any way I can do it?


Re: Linking of query/search

2014-09-12 Thread Alex Kamil
You can combine ES with an RDBMS and run your SQL queries either directly
against the db, or pull data into ES via the JDBC River. I wrote about it here:
http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html
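
Registering a JDBC river that pulls rows into ES looks roughly like the
sketch below (a minimal example - the river name, JDBC URL, credentials and
SQL are placeholders to replace with your own):

curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
  "type": "jdbc",
  "jdbc": {
    "url": "jdbc:mysql://localhost:3306/test",
    "user": "",
    "password": "",
    "sql": "select * from orders"
  }
}'

Each selected row becomes a document in the river's target index, after which
it is searchable through the normal _search API.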


On Fri, Sep 12, 2014 at 10:55 AM, Ivan Brusic i...@brusic.com wrote:

 You cannot join documents in Lucene/Elasticsearch (at least not like a
 RDBMS). You would need to either denormalize your data, join on the client
 side or execute 2+ queries.

 --
 Ivan

 On Fri, Sep 12, 2014 at 12:45 AM, matej.zerov...@gmail.com wrote:

 Hello!

 Can anyone shine some light on my question?
 Is the query in question achievable in ES directly?

 If not, I can probably do that in application later, but it would be
 nicer if ES could serve me the final results.

 Matej



Connecting Hbase to Elasticsearch

2014-09-10 Thread Alex Kamil
I posted step-by-step instructions here
http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html on using
Apache Hbase/Phoenix with Elasticsearch JDBC River.

This might be useful to Elasticsearch users who want to use Hbase as a
primary data store, and to Hbase users who wish to enable full-text search
on their existing tables via Elasticsearch API.

Alex



Re: Backup and restore using snapshots

2014-09-10 Thread Alex Harvey
I could still use feedback on this plan.

On Sunday, August 31, 2014 9:08:12 PM UTC+10, Alex Harvey wrote:

 Hi all

 I could use some help getting my head around the snapshot and restore 
 functionality in ES.

 I have a requirement to do incremental daily tape backups and full backups 
 weekly using EMC's Avamar backup software.

 I'd really appreciate if someone can tell me if the following plan is 
 going to work -

 1)  Export an NFS filesystem from the storage node to both ES data nodes, 
 and mount that as /mnt/backup on both nodes.

 2)  From one of the ES nodes register this directory as the shared 
 repository: curl -XPUT 'http://localhost:9200/_snapshot/backup' -d 
 '{type: fs,settings: {location: /mnt/backup}}'

 3)  On Saturday do a full backup:

 i. Get a list of all snapshots using: curl -XGET 
 'localhost:9200/_snapshot/_status'
 ii. For each of these delete using a command like: curl -XDELETE 
 'localhost:9200/_snapshot/backup/snapshot_20140830'
 iii.  Create a full backup using:  curl -XPUT 
 localhost:9200/_snapshot/backup/snapshot_$(date 
 +%Y%m%d)?wait_for_completion=true
 iv.  Copy the /mnt/backup directory to tape telling Avamar to take a full 
 backup

 4)  On Sunday to Friday do incremental backups based on the Saturday 
 backup:

 i.  Simply run: curl -XPUT 
 localhost:9200/_snapshot/backup/snapshot_$(date 
 +%d%m%Y)?wait_for_completion=true
 ii.  Copy /mnt/backup to tape telling Avamar to take an incremental backup

 Is this plan going to work?  Is there a better way?

 Thanks very much in advance.

 Best regards,
 Alex




Get distinct data

2014-09-02 Thread Alex T
Hi all!

I have a problem with getting unique data from elasticsearch. I have the 
following documents:

[
  {
    "message": "Message 1",
    "author": {
      "id": 4,
      "name": "Author Name"
    },
    "sourceId": 123456789,
    "userId": 123456
  },
  {
    "message": "Message 1",
    "author": {
      "id": 4,
      "name": "Author Name"
    },
    "sourceId": 123456789,
    "userId": 654321
  }
]

The difference between these documents is in userId. When I send a query by 
author.id, I get a response with 2 documents.

Can I get distinct data by the sourceId field?




Re: Get distinct data

2014-09-02 Thread Alex T
Hi Vineeth! Thanks for your answer.

I used the terms aggregation, but I still get a response with 2 documents in 
the hits; example response data:


{
  "took": 23,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 2,
    "max_score": null,
    "hits": [
      {
        "_index": "feeditem_local",
        "_type": "FeedItem",
        "_id": "53dbe9cf1d7859e15f8b4599",
        "_score": null,
        "_source": {
          "sourceId": 123456789,
          "message": "Message 1",
          "author": {"id": 120816414},
          "userId": 123456
        },
        "sort": [1406921136000]
      },
      {
        "_index": "feeditem_local",
        "_type": "FeedItem",
        "_id": "53dbe9cf1d7859e15f8b4599",
        "_score": null,
        "_source": {
          "sourceId": 123456789,
          "message": "Message 1",
          "author": {"id": 120816414},
          "userId": 654321
        },
        "sort": [1406921136000]
      }
    ]
  },
  "aggregations": {
    "source": {
      "buckets": [
        {"key": 123456789, "doc_count": 2}
      ]
    }
  }
}
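
The hits section will always contain every matching document - the terms
aggregation only buckets them. One way to get a single representative
document per sourceId is a terms aggregation with a top_hits sub-aggregation
(available since ES 1.3) - a minimal sketch, reusing the index and type
names from the response above:

curl -XPOST 'localhost:9200/feeditem_local/FeedItem/_search' -d '{
  "size": 0,
  "aggs": {
    "by_source": {
      "terms": {"field": "sourceId"},
      "aggs": {
        "one_per_source": {"top_hits": {"size": 1}}
      }
    }
  }
}'

"size": 0 suppresses the duplicated hits themselves; each by_source bucket
then carries exactly one document under one_per_source.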




On Tuesday, September 2, 2014 at 9:45:41 AM UTC+3, vineeth mohan 
wrote:

 Hello Alex, 

 Term aggregation is here to save your day - 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation

 Thanks
   Vineeth


 On Tue, Sep 2, 2014 at 12:07 PM, Alex T atr...@gmail.com javascript: 
 wrote:

 Hi all!

 I have problem with getting unique data from elasticsearch. I have the 
 following documents:

 [
   {
     "message": "Message 1",
     "author": {
       "id": 4,
       "name": "Author Name"
     },
     "sourceId": 123456789,
     "userId": 123456
   },
   {
     "message": "Message 1",
     "author": {
       "id": 4,
       "name": "Author Name"
     },
     "sourceId": 123456789,
     "userId": 654321
   }
 ]

 The difference between these documents is in userId. When I send a query by 
 author.id, I get a response with 2 documents.

 Can I get distinct data by the sourceId field?




Backup and restore using snapshots

2014-08-31 Thread Alex Harvey
Hi all

I could use some help getting my head around the snapshot and restore 
functionality in ES.

I have a requirement to do incremental daily tape backups and full backups 
weekly using EMC's Avamar backup software.

I'd really appreciate if someone can tell me if the following plan is going 
to work -

1)  Export an NFS filesystem from the storage node to both ES data nodes, 
and mount that as /mnt/backup on both nodes.

2)  From one of the ES nodes register this directory as the shared 
repository: curl -XPUT 'http://localhost:9200/_snapshot/backup' -d 
'{type: fs,settings: {location: /mnt/backup}}'

3)  On Saturday do a full backup:

i. Get a list of all snapshots using: curl -XGET 
'localhost:9200/_snapshot/_status'
ii. For each of these delete using a command like: curl -XDELETE 
'localhost:9200/_snapshot/backup/snapshot_20140830'
iii.  Create a full backup using:  curl -XPUT 
localhost:9200/_snapshot/backup/snapshot_$(date 
+%Y%m%d)?wait_for_completion=true
iv.  Copy the /mnt/backup directory to tape telling Avamar to take a full 
backup

4)  On Sunday to Friday do incremental backups based on the Saturday backup:

i.  Simply run: curl -XPUT localhost:9200/_snapshot/backup/snapshot_$(date 
+%d%m%Y)?wait_for_completion=true
ii.  Copy /mnt/backup to tape telling Avamar to take an incremental backup

Is this plan going to work?  Is there a better way?

Thanks very much in advance.

Best regards,
Alex
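
One piece the plan doesn't cover is the restore side. A minimal sketch
against the same repository, assuming the snapshot name from step 3 and
remembering that the target indices must be closed or deleted before
restoring over them:

curl -XPOST 'localhost:9200/_snapshot/backup/snapshot_20140830/_restore?wait_for_completion=true'

It's worth rehearsing a restore before relying on the tape copies, since a
backup that has never been restore-tested is only a hope.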



Re: Duplicate function MVEL script

2014-08-28 Thread Alex S.V.
The problem with MVEL is that you can't redefine an already-defined function 
in a script instance. The Script class is instantiated once when the query 
starts, and is then executed again and again. MVEL is bad for complex 
scripting. Yes, you could use Groovy, and you should :)

I found a good way to use it with the following code:

import groovy.lang.Script

class MyScript extends Script {
  def run() {
    // your code goes here; the bound variables should also be available here
  }
}

So how it works:
1. Groovy compiles this script and puts it into the class cache.
2. On each query a MyScript instance is created (one per node).
3. On each document the run() method is executed (it should provide the 
appropriate return values for filter scripts, score scripts, sort scripts, 
and script fields).
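
For the concurrent-update case in the original question, another way around
the MVEL duplicate-function error is a plain Groovy update script with no
function definitions at all - a minimal sketch, with hypothetical index/type
names and a hypothetical numeric counter field:

curl -XPOST 'localhost:9200/myindex/mytype/1/_update' -d '{
  "script": "ctx._source.counter += delta",
  "lang": "groovy",
  "params": {"delta": 1}
}'

With no named functions in the script, concurrent updates have no function
definition to collide on.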

Alex

On Wednesday, August 27, 2014 5:50:11 PM UTC+3, k...@stylelabs.com wrote:

 Hello

 We are executing some concurrent updates on the same document using an 
 MVEL script together with some parameters. 
 The MVEL script contains some functions such as addRelations etc but 
 there is no sign of duplicate functions.

 ES throws the following error:

 [John Kafka][inet[/10.12.1.219:9300]][update]]; nested: 
 ElasticsearchIllegalArgumentException[failed to execute script]; nested: 
 *CompileException*[[Error: *duplicate function: addRelations*]
 [Near : {... def addRelations(relationNode, }]
  ^
 [Line: 1, Column: 1]

 ES Version 1.3.2

 If the updates are executed sequentially there is no error/problem with 
 the MVEL script.

 Any ideas?

 Best Regards,
 Kristof




Re: groovy for scripting

2014-08-26 Thread Alex S.V.
Providing a self-update:

I found that I can create a cross-request cache using the following script 
(like a cross-request incrementer):

POST /test/_search
{
  "query": {"match_all": {}},
  "script_fields": {
    "a": {
      "script": "import groovy.lang.Script;class A extends Script{static i=0;def run() {i++}}",
      "lang": "groovy"
    }
  }
}
}

Formatted for readability, the script is:

import groovy.lang.Script

class A extends Script{
  static i=0

  def run() {
 i++
  }
}

Actually the *i* variable here is not thread-safe, but the idea is clear - 
you need to define a class that inherits from Script and implement the 
abstract run() method. This class is also accessible from each node thread.
Now I'm looking for a solution to make a query-scoped counter (for a 
one-node configuration). I think it could be done by passing a unique 
query_id in the parameters, but I'm afraid of making the code 
non-thread-safe, or vice versa - thread-safe but with reduced performance.
Researching more...



groovy for scripting

2014-08-20 Thread Alex S.V.
I'm playing around with groovy scripting.

By checking the groovy-lang plugin source code I found the following steps in 
code execution:

1. Code compilation into script class

2. Script initialization via static method newInstance()

3. Script execution via calling the code on each document with binding 
document parameters

Now assume I have a class declaration in my script. Is it possible to execute 
the class definition and object initialization only once, and execute only a 
method on this object for each document?

Thanks

P.S. posting the same on SO



Re: A few questions about node types + usage

2014-08-18 Thread Alex
Hello again Mark, 

Thanks for your response. Your answers really are very helpful.

As with our previous conversation 
https://groups.google.com/d/topic/elasticsearch/ZouS4NVsTJw/discussion I 
am confused about how to make a client node also be master eligible. This 
is what I posted there, I would really like some help understanding this:

I've done more investigating and it seems that a Client (AKA Query) node 
cannot also be a Master node. As it says here 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#master-election

*Nodes can be excluded from becoming a master by setting node.master to 
false. Note, once a node is a client node (node.client set to true), it 
will not be allowed to become a master (node.master is automatically set to 
false).*

And from the elasticsearch.yml config file it says:

# 2. You want this node to only serve as a master: to not store any data and
#    to have free resources. This will be the coordinator of your cluster.
#
#node.master: true
#node.data: false
#
# 3. You want this node to be neither master nor data node, but
#    to act as a "search load balancer" (fetching data from nodes,
#    aggregating results, etc.)
#
#node.master: false
#node.data: false

So I'm wondering how exactly you set up your client nodes to also be master 
nodes. It seems like a master node can only either be purely a master or 
master + data.

Perhaps you could show the relevant parts of one of your client node's 
config?

Many thanks, Alex
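
(For reference, the combination Mark describes - master-eligible but holding
no data, so it also coordinates searches - corresponds to option 2 in the
elasticsearch.yml excerpt above:

node.master: true
node.data: false

Setting node.client: true is just shorthand for node.master: false plus
node.data: false, which is why a node configured as a client can never be
elected master. This is one reading of the docs quoted above, not an
authoritative answer.)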

On Saturday, 16 August 2014 01:04:37 UTC+1, Mark Walkom wrote:

 1 - Up to you. We use the http output and then just use a round robin A 
 record to our 3 masters.
 2 - They are routed but it makes more sense to specify.
 3 - You're right, but most people only use 1 or 2 masters which is why 
 they get recommended to have at least 3.
 4 - That sounds like a lot. We use masters that double as clients and they 
 only have 8GB, our use sounds similar and we don't have issues.

 I wouldn't bother with 3 client only nodes to start, use them as master 
 and client and then if you find you are hitting memory issues due to 
 queries you can re-evaluate things.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com javascript:
 web: www.campaignmonitor.com


 On 15 August 2014 20:11, Alex alex@gmail.com javascript: wrote:

 Bump. Any help? Thanks

 On Wednesday, 13 August 2014 12:10:14 UTC+1, Alex wrote:

 Hello I would like some clarification about node types and their usage. 

 We will have 3 client nodes and 6 data nodes. The 6 1TB data nodes can 
 also be masters (discovery.zen.minimum_master_nodes set to 4). We will 
 use Logstash and Kibana. Kibana will be used 24/7 by between a couple and 
 handfuls of people.

 Some questions:

1. Should incoming Logstash write requests be sent to the cluster in 
general (using the *cluster* setting in the *elasticsearch* output) 
or specifically to the client nodes or to the data nodes (via load 
balancer)? I am unsure what kind of node is best for handling writes.

2. If client nodes exist in the cluster are Kibana requests 
automatically routed to them? Do I need to somehow specify to Kibana 
 which 
nodes to contact?

3. I have heard different information about master nodes and the 
minimum_master_node setting. I've heard that you should have an odd 
 number 
of master nodes but I fail to see why the parity of the number of 
 masters 
matters as long as minimum_master_node is set to at least N/2 + 1. Does 
 it 
really need to be odd?

4. I have been advised that the client nodes will use huge amount of 
memory (which makes sense due to the nature of the Kibana facet 
 queries). 
64GB per client node was recommended but I have no idea if that sounds 
right or not. I don't have the ability to actually test it right now so 
 any 
more guidance on that would be helpful. 

 I'd be so grateful to hear from you even if you only know something 
 about one of my queries.

 Thank you for your time,
 Alex


Re: A few questions about node types + usage

2014-08-15 Thread Alex
Bump. Any help? Thanks

On Wednesday, 13 August 2014 12:10:14 UTC+1, Alex wrote:

 Hello I would like some clarification about node types and their usage. 

 We will have 3 client nodes and 6 data nodes. The 6 1TB data nodes can 
 also be masters (discovery.zen.minimum_master_nodes set to 4). We will 
 use Logstash and Kibana. Kibana will be used 24/7 by between a couple and 
 handfuls of people.

 Some questions:

1. Should incoming Logstash write requests be sent to the cluster in 
general (using the *cluster* setting in the *elasticsearch* output) or 
specifically to the client nodes or to the data nodes (via load balancer)? 
I am unsure what kind of node is best for handling writes.

2. If client nodes exist in the cluster are Kibana requests 
automatically routed to them? Do I need to somehow specify to Kibana which 
nodes to contact?

3. I have heard different information about master nodes and the 
minimum_master_node setting. I've heard that you should have an odd number 
of master nodes but I fail to see why the parity of the number of masters 
matters as long as minimum_master_node is set to at least N/2 + 1. Does it 
really need to be odd?

4. I have been advised that the client nodes will use huge amount of 
memory (which makes sense due to the nature of the Kibana facet queries). 
64GB per client node was recommended but I have no idea if that sounds 
right or not. I don't have the ability to actually test it right now so 
 any 
more guidance on that would be helpful.

 I'd be so grateful to hear from you even if you only know something about 
 one of my queries.

 Thank you for your time,
 Alex




Re: Recommendations needed for large ELK system design

2014-08-13 Thread Alex
Hi Mark,

I've done more investigating and it seems that a Client (AKA Query) node 
cannot also be a Master node. As it says here 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#master-election

*Nodes can be excluded from becoming a master by setting node.master to 
false. Note, once a node is a client node (node.client set to true), it 
will not be allowed to become a master (node.master is automatically set to 
false).*

And from the elasticsearch.yml config file it says:

# 2. You want this node to only serve as a master: to not store any data and
#    to have free resources. This will be the coordinator of your cluster.
#
#node.master: true
#node.data: false
#
# 3. You want this node to be neither master nor data node, but
#    to act as a "search load balancer" (fetching data from nodes,
#    aggregating results, etc.)
#
#node.master: false
#node.data: false

So I'm wondering how exactly you set up your client nodes to also be master 
nodes. It seems like a master node can only either be purely a master or 
master + data.

Regards, Alex

On Thursday, 31 July 2014 23:57:26 UTC+1, Mark Walkom wrote:

 1 - Curator FTW.
 2 - Masters handle cluster state, shard allocation and a whole bunch of 
 other stuff around managing the cluster and it's members and data. A node 
 that is master and data set to false is considered a search node. But the 
 role of being a master is not onerous, so it made sense for us to double up 
 the roles. We then just round robin any queries to these three masters.
 3 - Yes, but it's entirely dependent on your environment. If you're 
 happy with that and you can get the go-ahead then see where it takes you.
 4 - Quorum is automatic and having the n/2+1 means that the majority of 
 nodes will have to take place in an election, which reduces the possibility 
 of split brain. If you set the discovery settings then you are also 
 essentially setting the quorum settings.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com javascript:
 web: www.campaignmonitor.com


 On 31 July 2014 22:27, Alex alex@gmail.com javascript: wrote:

 Hello Mark,

 Thank you for your reply, it certainly helps to clarify many things.

 Of course I have some new questions for you!

1. I haven't looked into it much yet but I'm guessing Curator can 
handle different index naming schemes. E.g. logs-2014.06.30 and 
stats-2014.06.30. We'd actually be wanting to store the stats data for 2 
years and logs for 90 days so it would indeed be helpful to split the 
 data 
into different index sets. Do you use Curator?

2. You say that you have 3 masters that also handle queries... but 
I thought all masters did was handle queries? What is a master node that 
*doesn't* handle queries? Should we have search load balancer nodes? 
AKA not master and not data nodes.

3. In the interests of reducing the number of node combinations for 
us to test out would you say, then, that 3 master (and query(??)) only 
nodes, and the 6 1TB data only nodes would be good?

4. Quorum and split brain are new to me. This webpage 

 http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/
  about 
split brain recommends setting *discovery.zen.minimum_master_nodes* equal 
to *N/2 + 1*. This formula is similar to the one given in the 
documentation for quorum 

 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency:
  
index operations only succeed if a quorum (replicas/2+1) of active 
 shards 
are available. I completely understand the split brain issue, but not 
quorum. Is quorum handled automatically or should I change some settings? 

 Thanks again for your help, we appreciate your time and knowledge!
 Regards,
 Alex


 On Thursday, 31 July 2014 05:57:35 UTC+1, Mark Walkom wrote:

 1 - Looks ok, but why two replicas? You're chewing up disk for what 
 reason? Extra comments below.
 2 - It's personal preference really and depends on how your end points 
 send to redis.
 3 - 4GB for redis will cache quite a lot of data if you're only doing 50 
 events p/s (ie hours or even days based on what I've seen).
 4 - No, spread it out to all the nodes. More on that below though.
 5 - No it will handle that itself. Again, more on that below though.

 Suggestions;
 Set your indexes to (factors of) 6 shards, ie one per node, it spreads 
 query performance. I say factors of in that you can set it to 12 shards 
 per index to start and easily scale the node count and still spread the 
 load.
 Split your stats and your log data into different indexes, it'll make 
 management and retention easier.
 You can consider a master only node or (ideally) three that also handle 
 queries.
 Preferably have an uneven number of master eligible nodes, whether you 
 make them VMs

A few questions about node types + usage

2014-08-13 Thread Alex
Hello, I would like some clarification about node types and their usage. 

We will have 3 client nodes and 6 data nodes. The 6 1TB data nodes can also 
be masters (discovery.zen.minimum_master_nodes set to 4). We will use 
Logstash and Kibana. Kibana will be used 24/7 by between a couple and a 
handful of people.

Some questions:

   1. Should incoming Logstash write requests be sent to the cluster in 
   general (using the *cluster* setting in the *elasticsearch* output) or 
   specifically to the client nodes or to the data nodes (via load balancer)? 
   I am unsure what kind of node is best for handling writes.
   
   2. If client nodes exist in the cluster are Kibana requests 
   automatically routed to them? Do I need to somehow specify to Kibana which 
   nodes to contact?
   
   3. I have heard different information about master nodes and the 
   minimum_master_node setting. I've heard that you should have an odd number 
   of master nodes but I fail to see why the parity of the number of masters 
   matters as long as minimum_master_node is set to at least N/2 + 1. Does it 
   really need to be odd?
   
   4. I have been advised that the client nodes will use huge amount of 
   memory (which makes sense due to the nature of the Kibana facet queries). 
   64GB per client node was recommended but I have no idea if that sounds 
   right or not. I don't have the ability to actually test it right now so any 
   more guidance on that would be helpful.

I'd be so grateful to hear from you even if you only know something about 
one of my queries.

Thank you for your time,
Alex



LookupScript per shard modification (native scripting)

2014-08-04 Thread Alex S.V.
Hi,

I'm trying LookupScript example here:
https://github.com/imotov/elasticsearch-native-script-example/blob/master/src/main/java/org/elasticsearch/examples/nativescript/script/LookupScript.java

The idea of my script is to pre-cache all child documents in the 
LookupScript instance, but I want to query only the current shard's data. Is 
it possible? That way every shard instance would cache only its own documents.

Regards,
Alex



Re: Cosine Similarity ElasticSearch

2014-08-01 Thread Alex S.V.
Hi, 

I found some native script code from Igor Motov 
here: 
https://github.com/imotov/elasticsearch-native-script-example/blob/master/src/main/java/org/elasticsearch/examples/nativescript/script/CosineSimilarityScoreScript.java

and I'm now playing with it.

Alex
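
(Once a native script plugin like that is installed, it is invoked by its
registered name with "lang": "native". The script name and params below are
hypothetical placeholders - check the plugin source for the actual registered
name and expected parameters:

curl -XPOST 'localhost:9200/products/_search' -d '{
  "query": {
    "function_score": {
      "query": {"match_all": {}},
      "script_score": {
        "script": "cosine_sim_score",
        "lang": "native",
        "params": {"field": "features"}
      }
    }
  }
}'
)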

On Friday, August 1, 2014 11:53:24 AM UTC+3, Federico Bianchi wrote:

 Is there someone who can help us? 

 Thank you very much! 



 -- 
 View this message in context: 
 http://elasticsearch-users.115913.n3.nabble.com/Cosine-Similarity-ElasticSearch-tp4060620p4061039.html
  
 Sent from the ElasticSearch Users mailing list archive at Nabble.com. 




Re: Recommendations needed for large ELK system design

2014-08-01 Thread Alex
Ok thank you Mark, you've been extremely helpful and we now have a better 
idea about what we're doing!

-Alex

On Thursday, 31 July 2014 23:57:26 UTC+1, Mark Walkom wrote:

 1 - Curator FTW.
 2 - Masters handle cluster state, shard allocation and a whole bunch of 
 other stuff around managing the cluster and it's members and data. A node 
 that is master and data set to false is considered a search node. But the 
 role of being a master is not onerous, so it made sense for us to double up 
 the roles. We then just round robin any queries to these three masters.
 3 - Yes, but it's entirely dependent on your environment. If you're 
 happy with that and you can get the go-ahead then see where it takes you.
 4 - Quorum is automatic and having the n/2+1 means that the majority of 
 nodes will have to take place in an election, which reduces the possibility 
 of split brain. If you set the discovery settings then you are also 
 essentially setting the quorum settings.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com javascript:
 web: www.campaignmonitor.com


 On 31 July 2014 22:27, Alex alex@gmail.com javascript: wrote:

 Hello Mark,

 Thank you for your reply, it certainly helps to clarify many things.

 Of course I have some new questions for you!

1. I haven't looked into it much yet but I'm guessing Curator can 
handle different index naming schemes. E.g. logs-2014.06.30 and 
stats-2014.06.30. We'd actually be wanting to store the stats data for 2 
years and logs for 90 days so it would indeed be helpful to split the 
 data 
into different index sets. Do you use Curator?

2. You say that you have 3 masters that also handle queries... but 
I thought all masters did was handle queries? What is a master node that 
*doesn't* handle queries? Should we have search load balancer nodes? 
AKA not master and not data nodes.

3. In the interests of reducing the number of node combinations for 
us to test out would you say, then, that 3 master (and query(??)) only 
nodes, and the 6 1TB data only nodes would be good?

4. Quorum and split brain are new to me. This webpage 

 http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/
  about 
split brain recommends setting *discovery.zen.minimum_master_nodes* equal 
to *N/2 + 1*. This formula is similar to the one given in the 
documentation for quorum 

 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency:
  
index operations only succeed if a quorum (replicas/2+1) of active 
 shards 
are available. I completely understand the split brain issue, but not 
quorum. Is quorum handled automatically or should I change some settings? 

 Thanks again for your help, we appreciate your time and knowledge!
 Regards,
 Alex


 On Thursday, 31 July 2014 05:57:35 UTC+1, Mark Walkom wrote:

 1 - Looks ok, but why two replicas? You're chewing up disk for what 
 reason? Extra comments below.
 2 - It's personal preference really and depends on how your end points 
 send to redis.
 3 - 4GB for redis will cache quite a lot of data if you're only doing 50 
 events p/s (ie hours or even days based on what I've seen).
 4 - No, spread it out to all the nodes. More on that below though.
 5 - No it will handle that itself. Again, more on that below though.

 Suggestions;
 Set your indexes to (factors of) 6 shards, ie one per node, it spreads 
 query performance. I say factors of in that you can set it to 12 shards 
 per index to start and easily scale the node count and still spread the 
 load.
 Split your stats and your log data into different indexes, it'll make 
 management and retention easier.
 You can consider a master only node or (ideally) three that also handle 
 queries.
 Preferably have an uneven number of master eligible nodes, whether you 
 make them VMs or physicals, that way you can ensure quorum is reached with 
 minimal fuss and stop split brain.
 If you use VMs for master + query nodes then you might want to look at 
 load balancing the queries via an external service.

 To give you an idea, we have a 27 node cluster - 3 masters that also 
 handle queries and 24 data nodes. Masters are 8GB with small disks, data 
 nodes are 60GB (30 heap) and 512GB disk.
 We're running with one replica and have 11TB of logging data. At a high 
 level we're running out of disk more than heap or CPU and we're very write 
 heavy, with an average of 1K events p/s and comparatively minimal reads.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 31 July 2014 01:35, Alex alex@gmail.com wrote:

  Hello,

 We wish to set up an entire ELK system with the following features:

- Input from Logstash shippers located on 400 Linux VMs. Only a 
handful of log sources on each

3rd party scoring service

2014-07-31 Thread Alex S.V.
Hello,

My idea is to use a 3rd party scoring service (REST), and currently I'd like 
to use native scripts and play with NativeScriptFactory.
The approach has many drawbacks. 

Here is my problem - assume we have two entities - products and product 
prices. I need to filter by price. 
Price is a complex thing, because it depends on many factors, like the 
request date, remote user information, and custom provided parameters. With 
a regular parent-child relation and a has_child query it's too complex and 
too slow to implement using scripting (currently MVEL).

One more condition - I don't have many products - around 25K - but around 
25M different base price items (which are the basis for the future price 
calculation).
There are two ideas:
1. Have a service which returns the exact price for every product given the 
custom parameters. The drawback is that there would be 5 identical calls, 
one from each shard (with the default of 5 shards). In this case it doesn't 
matter where the base prices are stored - in an elasticsearch index, in a 
database, or in in-memory storage. 
2. Write code which operates over the child price documents on a concrete 
shard. In this case it would generate prices only for the products on that 
particular shard. But I don't know if I can access the shard's index, or 
make calls to the index from a concrete shard, in a NativeScriptFactory 
class. 

Could you point me in the right direction?

P.S. Initially I was interested in Redis-Elasticsearch 
example http://java.dzone.com/articles/connecting-redis-elasticsearch

Thanks,
Alex



Re: 3rd party scoring service

2014-07-31 Thread Alex S.V.
I think it's acceptable if the service responds within 20ms, using some 
Thrift protocol for example. It's much better than the current 500ms - 5s 
calculations using elasticsearch scripting. 
If we have 25K products then it could be around a 300KB data package from 
this service. The risk is possible broken communication or some increased 
latency.

Alex

On Thursday, July 31, 2014 1:59:36 PM UTC+3, Itamar Syn-Hershko wrote:

 You should bring the price over to Elasticsearch and not the other way 
 around. Scoring against an external service is an added friction with huge 
 performance costs.

 --

 Itamar Syn-Hershko
 http://code972.com | @synhershko https://twitter.com/synhershko
 Freelance Developer  Consultant
 Author of RavenDB in Action http://manning.com/synhershko/


 On Thu, Jul 31, 2014 at 1:50 PM, Alex S.V. alexs.v...@gmail.com 
 javascript: wrote:

 Hello,

 My idea is to use 3rd party scoring service (REST), and currently I'd 
 like to use native scripts and play with NativeScriptFactory.
 The approach has many drawbacks. 

 Here is my problem - assume we have two entities - products and product 
 prices. I should filter by price. 
 Price is a complex thing, because it depends on many factors, like 
 request date, remote user information, custom provided parameters. In case 
 of regular parent - child relation and has_child query it's too complex and 
 too slow to implement it using scripting (currently mvel).

 Also one more condition - i have not many products - around 25K, and 
 around 25M different base price items (which are basic for future price 
 calculation).
 There are next ideas:
 1. Have a service, which returns exact price for all product by custom 
 parameters like. The drawback is - there should be 5 same calls from each 
 shard (if 5 by default). In this case it doesn't matter, where base prices 
 are stored - in elasticsearch index, in database or in in-memory storage. 
 2. Write a code, which operates over child price documents on concrete 
 shard. In this case it will generate prices only for all properties from 
 particular shard. But I don't know, if I can access shard index or make 
 calls to the index from concrete shard in NativeScriptFactory class. 

 Could you point me the right way?

 P.S. Initially I was interested in Redis-Elasticsearch example 
 http://java.dzone.com/articles/connecting-redis-elasticsearch

 Thanks,
 Alex



Re: Recommendations needed for large ELK system design

2014-07-31 Thread Alex
Hello Mark,

Thank you for your reply, it certainly helps to clarify many things.

Of course I have some new questions for you!

   1. I haven't looked into it much yet but I'm guessing Curator can handle 
   different index naming schemes. E.g. logs-2014.06.30 and 
   stats-2014.06.30. We'd actually be wanting to store the stats data for 2 
   years and logs for 90 days so it would indeed be helpful to split the data 
   into different index sets. Do you use Curator?
   
   2. You say that you have 3 masters that also handle queries... but I 
   thought all masters did was handle queries? What is a master node that 
   *doesn't* handle queries? Should we have search load balancer nodes? AKA 
   not master and not data nodes.
   
   3. In the interests of reducing the number of node combinations for us 
   to test out would you say, then, that 3 master (and query(??)) only nodes, 
   and the 6 1TB data only nodes would be good?
   
   4. Quorum and split brain are new to me. This webpage 
   
http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/
 about 
   split brain recommends setting *discovery.zen.minimum_master_nodes* equal 
   to *N/2 + 1*. This formula is similar to the one given in the 
   documentation for quorum 
   
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency:
 
   index operations only succeed if a quorum (replicas/2+1) of active shards 
   are available. I completely understand the split brain issue, but not 
   quorum. Is quorum handled automatically or should I change some settings?
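
(A worked example of that formula: if all 6 data nodes are master-eligible,
N/2 + 1 = 6/2 + 1 = 4; with only 3 dedicated master-eligible nodes it would
be 3/2 + 1 = 2, using integer division. The setting can also be changed at
runtime instead of in elasticsearch.yml - a sketch:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "discovery.zen.minimum_master_nodes": 4
  }
}'

Note this setting only guards master election; the replicas/2+1 write quorum
is a separate check that ES applies automatically per index operation.)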

Thanks again for your help, we appreciate your time and knowledge!
Regards,
Alex

On Thursday, 31 July 2014 05:57:35 UTC+1, Mark Walkom wrote:

 1 - Looks ok, but why two replicas? You're chewing up disk for what 
 reason? Extra comments below.
 2 - It's personal preference really and depends on how your end points 
 send to redis.
 3 - 4GB for redis will cache quite a lot of data if you're only doing 50 
 events p/s (ie hours or even days based on what I've seen).
 4 - No, spread it out to all the nodes. More on that below though.
 5 - No it will handle that itself. Again, more on that below though.

 Suggestions;
 Set your indexes to (factors of) 6 shards, ie one per node, it spreads 
 query performance. I say factors of in that you can set it to 12 shards 
 per index to start and easily scale the node count and still spread the 
 load.
 Split your stats and your log data into different indexes, it'll make 
 management and retention easier.
 You can consider a master only node or (ideally) three that also handle 
 queries.
 Preferably have an uneven number of master eligible nodes, whether you 
 make them VMs or physicals, that way you can ensure quorum is reached with 
 minimal fuss and stop split brain.
 If you use VMs for master + query nodes then you might want to look at 
 load balancing the queries via an external service.
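
(Mark's sharding suggestion, expressed as an index template so that every
new daily index picks it up automatically - a sketch with a hypothetical
template name:

curl -XPUT 'localhost:9200/_template/logstash_shards' -d '{
  "template": "logstash-*",
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 1
  }
}'
)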

 To give you an idea, we have a 27 node cluster - 3 masters that also 
 handle queries and 24 data nodes. Masters are 8GB with small disks, data 
 nodes are 60GB (30 heap) and 512GB disk.
 We're running with one replica and have 11TB of logging data. At a high 
 level we're running out of disk more than heap or CPU and we're very write 
 heavy, with an average of 1K events p/s and comparatively minimal reads.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com javascript:
 web: www.campaignmonitor.com


 On 31 July 2014 01:35, Alex alex@gmail.com javascript: wrote:

 Hello,

 We wish to set up an entire ELK system with the following features:

- Input from Logstash shippers located on 400 Linux VMs. Only a 
handful of log sources on each VM. 
- Data retention for 30 days, which is roughly 2TB of data in indexed 
ES JSON form (not including replica shards)
- Estimated input data rate of 50 messages per second at peak hours. 
Mostly short or medium length one-line messages but there will be Java 
traces and very large service responses (in the form of XML) to deal with 
too. 
- The entire system would be on our company LAN.
- The stored data will be a mix of application logs (info, errors 
etc) and server stats (CPU, memory usage etc) and would mostly be 
 accessed 
through Kibana. 

 This is our current plan:

- Have the LS shippers perform minimal parsing (but would do 
multiline). Have them point to two load-balanced servers containing Redis 
and LS indexers (which would do all parsing). 
- 2 replica shards for each index, which ramps the total data storage 
up to 6TB
- ES cluster spread over 6 nodes. Each node is 1TB in size 
- LS indexers pointing to cluster.

 So I have a couple questions regarding the setup and would greatly 
 appreciate the advice of someone with experience!

1. Does the balance between the number of nodes, the number of 
replica

Recommendations needed for large ELK system design

2014-07-30 Thread Alex
Hello,

We wish to set up an entire ELK system with the following features:

   - Input from Logstash shippers located on 400 Linux VMs. Only a handful 
   of log sources on each VM.
   - Data retention for 30 days, which is roughly 2TB of data in indexed ES 
   JSON form (not including replica shards)
   - Estimated input data rate of 50 messages per second at peak hours. 
   Mostly short or medium length one-line messages but there will be Java 
   traces and very large service responses (in the form of XML) to deal with 
   too.
   - The entire system would be on our company LAN.
   - The stored data will be a mix of application logs (info, errors etc) 
   and server stats (CPU, memory usage etc) and would mostly be accessed 
   through Kibana.

This is our current plan:

   - Have the LS shippers perform minimal parsing (but would do multiline). 
   Have them point to two load-balanced servers containing Redis and LS 
   indexers (which would do all parsing).
   - 2 replica shards for each index, which ramps the total data storage up 
   to 6TB
   - ES cluster spread over 6 nodes. Each node is 1TB in size
   - LS indexers pointing to cluster.

So I have a couple of questions regarding the setup and would greatly 
appreciate the advice of someone with experience!

   1. Does the balance between the number of nodes, the number of replica 
   shards, and storage size of each node seem about right? We use 
   high-performance equipment and would expect minimal downtime.
   
   2. What is your recommendation for the system design of the LS indexers 
   and Redis? I've seen various designs with each indexer assigned to a single 
   Redis, or all indexers reading from all Redises.
   
   3. Leading from the previous question, what would your recommend data 
   size for the Redis servers be?
   
   4. Not sure what to do about master/data nodes. Assuming all the nodes 
   are on identical hardware would it be beneficial to have a node which is 
   only a master which would only handle requests?
   
   5. Do we need to do any additional load balancing on the ES nodes?

We are open to any and all suggestions. We have not yet committed to any 
particular design so can change if needed.

Thank you for your time and responses,
Alex



When to use multiple clusters

2014-07-23 Thread Alex Kehayias
I have several large indices (100M docs) on the same cluster. Is there any 
advice on when it is appropriate to separate into multiple clusters vs one 
large one? Each index has a slightly different usage profile (read vs write 
heavy, update vs insert). How many indices would you recommend for a single 
cluster? Is it ok to have many large indices on the same cluster? 

Thanks!



Re: When to use multiple clusters

2014-07-23 Thread Alex Kehayias
Thanks Mark! We're deploying on EC2 (always a good time). Seems like the 
mixture of different indices that have different usage profiles is leading 
to some performance issues that a dedicated cluster would be more 
appropriate for.


On Wednesday, July 23, 2014 7:04:34 PM UTC-4, Mark Walkom wrote:

 Depends what your hardware profiles are like, and a bunch of other things 
 related to you and your environment.
 eg If you have high end servers then it makes sense to put your heavy 
 read/write indexes into a cluster on those, then leave the rest for more 
 average machines.

 We have multiple clusters based on use. One for an application text based 
 search, one for application logging, one for system logging and we're going 
 to spin up another one for a new project we're starting. This might sound 
 like a waste of resources, and it probably is to a degree, but we have the 
 infrastructure for it and it makes things easier to manage.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com javascript:
 web: www.campaignmonitor.com


 On 24 July 2014 00:34, Alex Kehayias al...@shareablee.com javascript: 
 wrote:

 I have several large indices (100M docs) on the same cluster. Is there 
 any advice of when it is appropriate to separate into multiple clusters vs 
 one large one? Each index has a slightly different usage profile (read vs 
 write heavy, update vs insert). How many indices would you recommend for a 
 single cluster? Is it ok to have many large indices on the same cluster? 

 Thanks!



elasticsearch dynamic scripting vs static script - deployment

2014-07-11 Thread Alex S.V.
Hi,

We've also been hacked on our staging server because of open ports :)
I find dynamic scripting flexible for applications, but static scripting 
causes a bunch of problems:

1. I have to deploy it in a special directory on the elasticsearch node? We are 
using capistrano for web-app deployment and it's an easy procedure, though we 
would have to provide additional access to the elasticsearch node's filesystem.
2. I don't know how to support script versions - just append _v1, _v2, etc. 
suffixes to the filename?
3. Should I deploy on one node, or on each node? If I must deploy on each 
node, what happens if one node has a script and another doesn't?

Regards,
Alex




help resolving a classpath problem(?) with elasticsearch 1.2.1 in apache-storm

2014-06-23 Thread Alex Lovell-Troy
I'm using apache-storm as a data pipeline that indexes results with 
elasticsearch.  Using the latest I can find of all components, I get an 
error any time a storm component attempts to join elasticsearch as a Node 
client, which I believe will give me better performance than 
TransportClient.

Caused by: java.lang.IllegalArgumentException: A SPI class of type 
org.apache.lucene.codecs.PostingsFormat with name 'Direct' does not exist. 
You need to add the corresponding JAR file supporting this SPI to your 
classpath.The current classpath supports the following names: 
[XBloomFilter, es090, completion090]

According to https://github.com/elasticsearch/elasticsearch/issues/3350 
this is just how the SPI loader stuff works that lucene uses

I tried following the directions in the issue, but even with the shade 
plugin, I'm still seeing the same thing.

Does anyone have experience with this that can share a pom.xml snippet or 
guide me to some applicable docs?

-alex



Re: Need help on similarity ranking approach

2014-05-29 Thread Alex Ksikes
Hello,

I am not sure that would work. I'd first index your document, and then use 
mlt with this document id and 'include' set to true (added in the latest ES 
release). Then you'll know how far your documents are from the queried 
document. Also, make sure to pick up most of the terms by 
setting percent_terms_to_match=0, max_query_terms to a high value, 
and min_doc_freq=1. In order to know which terms from the queried document 
have matched in the response, you can use explain.
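
For illustration, a minimal sketch of such a call against the mlt API (index, type, id and the mlt_fields value are hypothetical):

curl -XGET 'localhost:9200/myindex/mytype/1/_mlt?mlt_fields=file&include=true&percent_terms_to_match=0&max_query_terms=500&min_doc_freq=1'

With include=true the queried document itself comes back as the top hit, and every other hit's _score can be read relative to that self-match.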

Alex

On Thursday, May 29, 2014 10:42:47 AM UTC+2, Rgs wrote:

 hi, 

 What i did now is, i have created a custom similarity & similarity provider 
 class which extends the DefaultSimilarity and AbstractSimilarityProvider classes 
 respectively and overridden the idf() method to return 1. 

 Now I'm getting some percentage values like 1, 0.987, 0.876 etc and 
 interpret it as 100%, 98%, 87% etc. 

 Can you please confirm whether this approach can be taken for finding the 
 percentage of similarity? 

 sorry for the late reply. 

 Thanks 
 Rgs 



 -- 
 View this message in context: 
 http://elasticsearch-users.115913.n3.nabble.com/Need-help-on-similarity-ranking-approach-tp4054847p4056680.html
  
 Sent from the ElasticSearch Users mailing list archive at Nabble.com. 




Re: Need help on similarity ranking approach

2014-05-29 Thread Alex Ksikes
Also this plugin could provide a solution to your problem:

http://yannbrrd.github.io/

On Thursday, May 29, 2014 10:42:47 AM UTC+2, Rgs wrote:

 hi, 

 What i did now is, i have created a custom similarity & similarity provider 
 class which extends the DefaultSimilarity and AbstractSimilarityProvider classes 
 respectively and overridden the idf() method to return 1. 

 Now I'm getting some percentage values like 1, 0.987, 0.876 etc and 
 interpret it as 100%, 98%, 87% etc. 

 Can you please confirm whether this approach can be taken for finding the 
 percentage of similarity? 

 sorry for the late reply. 

 Thanks 
 Rgs 



 -- 
 View this message in context: 
 http://elasticsearch-users.115913.n3.nabble.com/Need-help-on-similarity-ranking-approach-tp4054847p4056680.html
  
 Sent from the ElasticSearch Users mailing list archive at Nabble.com. 




Queries seem to ignore custom analyzer

2014-05-24 Thread Alex Philipp
I'm using a custom analyzer to stem possessive English. My custom analyzer 
seems to be ignored. As a sample search, we'll use McDonald's.

What I used to create my analyzer:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "stop",
            "pos_english"
          ]
        }
      },
      "filter": {
        "pos_english": {
          "type": "stemmer",
          "name": "possessive_english"
        }
      }
    }
  }
}

My mapping:

{
  "item": {
    "_boost": {
      "name": "custom_boost",
      "null_value": 1
    },
    "properties": {
      "servings": {
        "enabled": false,
        "type": "object"
      },
      "brand_name": {
        "index": "analyzed",
        "type": "string",
        "store": false
      },
      "food_name": {
        "index": "analyzed",
        "type": "string",
        "store": false
      }
    }
  }
}


When I test the analyzer for the text 'McDonald's', it seems to work 
properly:

{
  "tokens": [
    {
      "token": "mcdonald",
      "start_offset": 0,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

However, if I search for 'McDonald', I get no results.  If I search for 
 'McDonald's' (with the possessive), I get my expected results.  It seems 
like the analyzer is being ignored during the query.

Search query that returns no results:

{
  "query": {
    "match": {
      "_all": {
        "query": "mcdonalds"
      }
    }
  }
}

 Any idea what I'm doing wrong?
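
One hedged way to narrow this down is to check what the query text analyzes to (the index name myindex is hypothetical):

curl -XGET 'localhost:9200/myindex/_analyze?analyzer=default' -d 'mcdonalds'

If this returns the single token mcdonalds rather than mcdonald, the query can never match the indexed token: the possessive_english stemmer only strips the trailing 's when the apostrophe is actually present in the input.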



Object interpolation in template queries

2014-05-17 Thread Alex G
Hello,

I'm interested in using query templates - 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-template-query.html
 
- however for my purposes I was hoping ES treated queries as simple strings 
for the purpose of mustache interpretation, and I would like to be able to 
substitute in parameters more complex than partial strings - for example, to 
define a param as the contents of a value and then pass in an arbitrary 
object. 

IE something along the lines of 
{
  "query": {
    "template": {
      "query": {
        "filtered": {
          "filter": {
            "and": {{ filters }}
          }
        }
      },
      "params": {
        "filters": [
          { "terms": { "foo": ["a", "b"] } },
          { "terms": { "bar": ["q", "z"] } }
        ]
      }
    }
  }
}


Experimentation suggests this isn't supported but I understand that the 
query templates system is somewhat under construction or review - are there 
plans to offer support for passing in entire parts of queries via params or 
should I look at doing this kind of interpolation before the query gets to 
ES? Or is this possible and I'm simply doing it wrong?
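
For contrast, a minimal sketch of the form that is documented to work, where a param is substituted as a plain string:

{
  "query": {
    "template": {
      "query": { "match": { "text": "{{query_string}}" } },
      "params": {
        "query_string": "all about search"
      }
    }
  }
}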

Thanks,
Alex

-- 


--

*CONFIDENTIALITY NOTICE:* The information contained in this message may be 
privileged and/or confidential. It is the property of CrowdStrike.  If you are 
not the intended recipient, or responsible for delivering this message to the 
intended recipient, any review, forwarding, dissemination, distribution or 
copying of this communication or any attachment(s) is strictly prohibited. If 
you have received this message in error, please notify the sender immediately, 
and delete it and all attachments from your computer and network.

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8b83afe9-aa92-4751-8178-2c33bbc94428%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: more like this on numbers

2014-05-08 Thread Alex Ksikes
Hi Valentin,

For these types of searches, have you looked into range queries, perhaps
combined in a boolean query?

Alex
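
As a minimal sketch of that idea for this thread's example (a hypothetical numeric field price and a reference value of 300): documents that fall into the narrower bands match more should clauses and therefore score higher.

{
  "query": {
    "bool": {
      "should": [
        { "range": { "price": { "gte": 250, "lte": 350 } } },
        { "range": { "price": { "gte": 150, "lte": 450 } } },
        { "range": { "price": { "gte": 50, "lte": 550 } } }
      ]
    }
  }
}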
On May 7, 2014 4:14 PM, Valentin plet...@gmail.com wrote:

 Hi Alex,

 thanks. Good idea to convert the numbers into strings. But converting the
 number fields to string won't exactly solve my problem. Only if there would
 be an analyzer which breaks down numbers into multiple tokens. Eg 300 into
 100, 200, 300

 Cheers,
 Valentin

 On Tuesday, May 6, 2014 12:04:53 PM UTC+2, Alex Ksikes wrote:

 Hi Valentin,

 As you know, you can only perform mlt on fields which are analyzed.
 However, you can convert your other fields (number, ..) to text using a
 multi field with type string at indexing time.

 Cheers,

 Alex

 On Thursday, March 27, 2014 4:31:58 PM UTC+1, Valentin wrote:

 Hi,

 as far as I understand it the more like this query allows to find
 documents where the same tokens are used. I wonder if there is a
 possibility to find documents where a particular field is compared based on
 its value (number).

 Regards
 Valentin

 PS: elasticsearch rocks!





Re: MoreLikeThis can't identify that 2 documents with exactly same attachments are duplicates

2014-05-08 Thread Alex Ksikes
On May 8, 2014 8:09 AM, Zoran Jeremic zoran.jere...@gmail.com wrote:

 Hi Alex,

 Thank you for this explanation. This really helped me to understand how
it works, and now I managed to get results I was expecting just after
setting max_query_terms value to be 0 or some very high value. With these
results in my tests I was able to identify duplicates. I noticed couple of
things though.

 - I got much better results with web pages when I indexed the attachment as
 html source and used text extracted by Jsoup in the query, than when I indexed
 text extracted from the web page as the attachment and used the text in the query. I
 suppose that difference is related to the fact that Jsoup did not extract
 text in the same way as the Tika parser used by ES did.
  - There was a significant improvement in the results in the second test,
 when I had indexed 50 web pages, compared to the first test when I indexed 10 web
 pages. I deleted the index before each test. I suppose that this is related to
 the tf*idf.
  If so, does it make sense to provide some training set for elasticsearch
 that will be used to populate the index before the system starts to be used?

Perhaps you are asking for a background dataset to bias the selection of
interesting terms. This could make sense depending on your application.

 Could you please define relevant in your setting? In a corpus of very
similar documents, is your goal to find the ones which are oddly different?
Have you looked into ES significant terms?
 I have the service that recommends documents to the students based on
their current learning context. It creates tokenized string from titles,
descriptions and keywords of the course lessons student is working at the
moment. I'm using this string as input to the mlt_like_text to find some
interesting resources that could help them.
 I want to avoid having duplicates (or very similar documents) among top
documents that are recommended.
 My idea was that during the documents uploading (before I index it with
elasticsearch) I find if there already exists it's duplicate, and store
this information as ES document field. Later, in query I can specify that
duplicates are not recommended.

 Here you should probably strip the html tags, and solely index the text
in its own field.
 As I already mentioned this didn't give me good results for some reason.

 Do you think this approach would work fine with large textual documents,
e.g. pdf documents having couple of hundred of pages? My main concern is
related to performances of these queries using like_text, so that's why I
was trying to avoid this approach and use mlt with document id as input.

I don't think this approach would work well in this case, but you should
try. I think what you are after is to either extract good features for your
PDF documents and search on that, or fingerprinting. This could be
achieved by playing with analyzers.

 Thanks,
 Zoran



 On Wednesday, 7 May 2014 06:14:56 UTC-7, Alex Ksikes wrote:

 Hi Zoran,

 In a nutshell 'more like this' creates a large boolean disjunctive query
of 'max_query_terms' number of interesting terms from a text specified in
'like_text'. The interesting terms are picked up with respect to their
tf-idf scores in the whole corpus. This selection can be tuned with the
'min_term_freq', 'min_doc_freq', and 'max_doc_freq' parameters. The
number of boolean clauses that must match is controlled by
'percent_terms_to_match'. In the case of specifying only one field in
'fields', the analyzer used to pick up the terms in 'like_text' is the one
associated with the field, unless specified by 'analyzer'. So as
an example, the default is to create a boolean query of 25 interesting
terms where only 30% of the should clauses must match.

 On Wednesday, May 7, 2014 5:14:11 AM UTC+2, Zoran Jeremic wrote:

 Hi Alex,


 If you are looking for exact duplicates then hashing the file content,
and doing a search for that hash would do the job.
 This trick won't work for me as these are not exact duplicates. For
example, I have 10 students working on the same 100 pages long word
document. Each of these students could change only one sentence and upload
a document. The hash will be different, but it's 99,99 % same documents.
 I have the other service that uses mlt_like_text to recommend some
relevant documents, and my problem is if this document has best score, then
all duplicates will be among top hits and instead recommending users with
several most relevant documents I will recommend 10 instances of same
document.


 Could you please define relevant in your setting? In a corpus of very
similar documents, is your goal to find the ones which are oddly different?
Have you looked into ES significant terms?


 If you are looking for near duplicates, then I would recommend
extracting whatever text you have in your html, pdf, doc, indexing that and
running more like this with like_text set to that content.
 I tried that as well, and results are very disappointing, though I'm
not sure if that would

Re: How to find the difference between aggregate min from aggregate max(max - min) in ES?

2014-05-07 Thread Alex Mathew
Thank you Adrien Grand for the reply.
Is it possible to use aggregate functions inside a script?

On Wednesday, May 7, 2014 5:31:20 PM UTC+5:30, Adrien Grand wrote:

 Hi,

 There is no way to do it on the Elasticsearch side for the moment. It can 
 only be done on client side.


  On Wed, May 7, 2014 at 1:37 PM, Alex Mathew alexmathe...@gmail.com wrote:

 How to write an ES query to find the difference between max and min value 
 of a field?

 I am a newbie in Elasticsearch. In my case I feed a lot of events along 
 with session_id and time into Elasticsearch. My event structure is

 Event_name string
 Client_id  string
 App_id string
 Session_id string
 User_idstring
 Ip_address string
 Latitude   int64 
 Longitude  int64 
 Event_time time.Time 


 I want to find the lifetime of a session_id based on the fed events. For 
 that I can retrieve the maximum Event_time and minimum Event_time for a 
 particular session_id with the following ES query.

 {
   "size": 0,
   "query": {
     "match": {
       "Session_id": "dummySessionId"
     }
   },
   "aggs": {
     "max_time": {
       "max": {
         "field": "Time"
       }
     },
     "min_time": {
       "min": {
         "field": "Time"
       }
     }
   }
 }


 But what I exactly want is (max_time - min_time). How do I write the ES query 
 for that?





 -- 
 Adrien Grand
  


Re: MoreLikeThis can't identify that 2 documents with exactly same attachments are duplicates

2014-05-07 Thread Alex Ksikes
Hi Zoran,

In a nutshell 'more like this' creates a large boolean disjunctive query of 
'max_query_terms' number of interesting terms from a text specified in 
'like_text'. The interesting terms are picked up with respect to their 
tf-idf scores in the whole corpus. This selection can be tuned with the 
'min_term_freq', 'min_doc_freq', and 'max_doc_freq' parameters. The 
number of boolean clauses that must match is controlled by 
'percent_terms_to_match'. In the case of specifying only one field in 
'fields', the analyzer used to pick up the terms in 'like_text' is the one 
associated with the field, unless specified by 'analyzer'. So as 
an example, the default is to create a boolean query of 25 interesting 
terms where only 30% of the should clauses must match.
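
Spelled out as an equivalent query with those documented defaults made explicit (field name and like_text are hypothetical):

{
  "query": {
    "more_like_this": {
      "fields": ["body"],
      "like_text": "some sample text to find similar documents for",
      "max_query_terms": 25,
      "percent_terms_to_match": 0.3,
      "min_term_freq": 2,
      "min_doc_freq": 5
    }
  }
}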

On Wednesday, May 7, 2014 5:14:11 AM UTC+2, Zoran Jeremic wrote:

 Hi Alex,


 If you are looking for exact duplicates then hashing the file content, and 
 doing a search for that hash would do the job.
 This trick won't work for me as these are not exact duplicates. For 
 example, I have 10 students working on the same 100 pages long word 
 document. Each of these students could change only one sentence and upload 
 a document. The hash will be different, but it's 99,99 % same documents. 
 I have the other service that uses mlt_like_text to recommend some 
 relevant documents, and my problem is if this document has best score, then 
 all duplicates will be among top hits and instead recommending users with 
 several most relevant documents I will recommend 10 instances of same 
 document. 


Could you please define relevant in your setting? In a corpus of very 
similar documents, is your goal to find the ones which are oddly different? 
Have you looked into ES significant terms?
 

 If you are looking for near duplicates, then I would recommend extracting 
 whatever text you have in your html, pdf, doc, indexing that and running 
 more like this with like_text set to that content.
 I tried that as well, and results are very disappointing, though I'm not 
 sure if that would be good idea having in mind that long textual documents 
 could be used. For testing purpose, I made a simple test with 10 web pages. 
 Maybe I'm making some mistake there. What I did is to index 10 web pages 
 and store it in document as attachment. Content is stored as byte[]. Then 
 I'm using the same 10 pages, extract content using Jsoup, and try to find 
 similar web pages. Here is the code that I used to find similar web pages 
 to the provided one:
 System.out.println("Duplicates for link:" + link);
 System.out.println("");
 String indexName = ESIndexNames.INDEX_DOCUMENTS;
 String indexType = ESIndexTypes.DOCUMENT;
 String mapping = copyToStringFromClasspath("/org/prosolo/services/indexing/document-mapping.json");
 client.admin().indices().putMapping(putMappingRequest(indexName).type(indexType).source(mapping)).actionGet();
 URL url = new URL(link);
 org.jsoup.nodes.Document doc = Jsoup.connect(link).get();
 String html = doc.html(); // doc.text();
 QueryBuilder qb = null;
 // create the query
 qb = QueryBuilders.moreLikeThisQuery("file")
         .likeText(html).minTermFreq(0).minDocFreq(0);
 SearchResponse sr = client.prepareSearch(ESIndexNames.INDEX_DOCUMENTS)
         .setQuery(qb).addFields("url", "title", "contentType")
         .setFrom(0).setSize(5).execute().actionGet();
 if (sr != null) {
     SearchHits searchHits = sr.getHits();
     Iterator<SearchHit> hitsIter = searchHits.iterator();
     while (hitsIter.hasNext()) {
         SearchHit searchHit = hitsIter.next();
         System.out.println("Duplicate: " + searchHit.getId()
                 + " title: " + searchHit.getFields().get("url").getValue()
                 + " score: " + searchHit.getScore());
     }
 }

 And the results of the execution of this for each of 10 urls are:

 Duplicates for link: http://en.wikipedia.org/wiki/Mathematical_logic

 Duplicate: Crwk_36bTUCEso1ambs0bA URL: http://en.wikipedia.org/wiki/Mathematical_logic score: 0.3335998
 Duplicate: --3l-WRuQL2osXg71ixw7A URL: http://en.wikipedia.org/wiki/Chemistry score: 0.16319205
 Duplicate: 8dDa6HsBS12HrI0XgFVLvA URL: http://en.wikipedia.org/wiki/Formal_science score: 0.13035104
 Duplicate: 1APeDW0KQnWRv_8mihrz4A URL: http://en.wikipedia.org/wiki/Star score: 0.12292466
 Duplicate: 2NElV2ULQxqcbFhd2pVy0w URL: http://en.wikipedia.org/wiki/Crystallography score: 0.117023855

 Duplicates for link: http://en.wikipedia.org/wiki/Mathematical_statistics

 Duplicate: Crwk_36bTUCEso1ambs0bA URL: http://en.wikipedia.org/wiki

Re: MoreLikeThis can't identify that 2 documents with exactly same attachments are duplicates

2014-05-06 Thread Alex Ksikes
Hi Zoran,

If you are looking for exact duplicates then hashing the file content, and 
doing a search for that hash would do the job. If you are looking for near 
duplicates, then I would recommend extracting whatever text you have in 
your html, pdf, doc, indexing that and running more like this with 
like_text set to that content. Additionally you can perform a mlt search on 
more fields including the meta-data fields extracted with the attachment 
plugin. Hope this helps.

Alex
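
A minimal sketch of the exact-duplicate variant (index, type and the content_hash field are hypothetical; the hash, e.g. SHA-1 over the raw file bytes, is computed client-side, and the field should be mapped not_analyzed so it is matched verbatim):

curl -XPUT 'localhost:9200/docs/doc/1' -d '{
  "title": "Linguistics",
  "content_hash": "2fd4e1c67a2d28fced849ee1bb76e7391b93eb12"
}'

curl -XGET 'localhost:9200/docs/doc/_search' -d '{
  "query": { "term": { "content_hash": "2fd4e1c67a2d28fced849ee1bb76e7391b93eb12" } }
}'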

On Monday, May 5, 2014 8:08:30 PM UTC+2, Zoran Jeremic wrote:

 Hi Alex,

 Thank you for your explanation. It makes sense now. However, I'm not sure 
 I understood your proposal. 

 So I would adjust the mlt_fields accordingly, and possibly extract the 
 relevant portions of texts manually
 What do you mean by adjusting mlt_fields? The only shared field that is 
 guaranteed to be same is file. Different users could add different titles 
 to documents, but attach same or almost the same documents. If I compare 
 documents based on the other fields, it doesn't mean that it will match, 
 even though attached files are exactly the same.
 I'm also not sure what did you mean by extract the relevant portions of 
 text manually. How would I do that and what to do with it?

 Thanks,
 Zoran
  

 On Monday, 5 May 2014 01:23:49 UTC-7, Alex Ksikes wrote:

 Hi Zoran,

 Using the attachment type, you can text search over the attached document 
 meta-data, but not its actual content, as it is base 64 encoded. So I would 
 adjust the mlt_fields accordingly, and possibly extract the relevant 
 portions of texts manually. Also set percent_terms_to_match = 0, to ensure 
 that all boolean clauses match. Let me know how this works out for you.

 Cheers,

 Alex

 On Monday, May 5, 2014 5:50:07 AM UTC+2, Zoran Jeremic wrote:

 Hi guys,

 I have a document that stores a content of html file, pdf, doc  or other 
 textual document in one of it's fields as byte array using attachment 
 plugin. Mapping is as follows:

 { "document": {
     "properties": {
       "title": { "type": "string", "store": true },
       "description": { "type": "string", "store": "yes" },
       "contentType": { "type": "string", "store": "yes" },
       "url": { "store": "yes", "type": "string" },
       "visibility": { "store": "yes", "type": "string" },
       "ownerId": { "type": "long", "store": "yes" },
       "relatedToType": { "type": "string", "store": "yes" },
       "relatedToId": { "type": "long", "store": "yes" },
       "file": {
         "path": "full", "type": "attachment",
         "fields": {
           "author": { "type": "string" },
           "title": { "store": true, "type": "string" },
           "keywords": { "type": "string" },
           "file": { "store": true, "term_vector": "with_positions_offsets", "type": "string" },
           "name": { "type": "string" },
           "content_length": { "type": "integer" },
           "date": { "format": "dateOptionalTime", "type": "date" },
           "content_type": { "type": "string" }
         }
       }
     }
 }}
 And the code I'm using to store the document is:

 VisibilityType.PUBLIC

 These files seems to be stored fine and I can search content. However, I 
 need to identify if there are duplicates of web pages or files stored in 
 ES, so I don't return the same documents to the user as search or 
 recommendation result. My expectation was that I could use MoreLikeThis 
 after the document was indexed to identify if there are duplicates of that 
 document and accordingly to mark it as duplicate. However, results look 
 weird for me, or I don't understand very well how MoreLikeThis works.

 For example, I indexed the web page http://en.wikipedia.org/wiki/Linguistics 3 
 times, and all 3 documents in ES have exactly the same binary content 
 under file. Then for the following query:

 http://localhost:9200/documents/document/WpkcK-ZjSMi_l6iRq0Vuhg/_mlt?mlt_fields=file&min_doc_freq=1
 where the ID is the id of one of these documents, I got these results:
 http://en.wikipedia.org/wiki/Linguistics with score 0.6633003
 http://en.wikipedia.org/wiki/Linguistics with score 0.6197818
 http://en.wikipedia.org/wiki/Computational_linguistics with score 
 0.48509508
 ...

 For some other examples, scores for the same documents are much lower, 
 and sometimes (though not that often) I don't get duplicates on the first 
 positions. I would expect here to have score 1.0 or higher for documents 
 that are exactly the same, but it's not the case, and I can't figure out 
 how could I identify if there are duplicates in the Elasticsearch index.

 I would appreciate if somebody could explain if this is expected 
 behaviour or I didn't use it properly.

 Thanks,
 Zoran




Re: Need help on similarity ranking approach

2014-05-06 Thread Alex Ksikes
Hello,

What you want to know is the score of the document that has matched itself 
using more like this. The API excludes the queried document. However, it is 
equivalent to running a boolean query of 'more like this field' for each 
field of the queried document. This will give you, as the top result, the 
document that has matched itself, so that you can compute the percentage of 
similarity of the remaining matched documents.

Alex
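
A minimal sketch of that equivalent form, assuming the queried document has two fields, title and body, whose values are pasted in as like_text:

{
  "query": {
    "bool": {
      "should": [
        { "more_like_this_field": { "title": { "like_text": "title of the queried document", "percent_terms_to_match": 0 } } },
        { "more_like_this_field": { "body": { "like_text": "body of the queried document", "percent_terms_to_match": 0 } } }
      ]
    }
  }
}

The top hit is then the queried document itself, and dividing each remaining hit's _score by that self-match score gives a similarity percentage.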

On Friday, May 2, 2014 3:22:34 PM UTC+2, Rgs wrote:

 Thanks Binh Ly and Ivan Brusic for your replies. 

 I need to find the similarity in percentage of a document against other 
 documents and this will be considered for grouping the documents. 

 is it possible to get the similarity percentage using more like this 
 query? 
 or is any other way to calculate the percentage of similarity from the 
 query 
 result? 

 Eg:  document1 is 90% similar to document2. 
   document1 is 45% similar to document3 
   etc.. 

 Thanks 



 -- 
 View this message in context: 
 http://elasticsearch-users.115913.n3.nabble.com/Need-help-on-similarity-ranking-approach-tp4054847p4055227.html
  
 Sent from the ElasticSearch Users mailing list archive at Nabble.com. 




Re: MoreLikeThis ignores queries?

2014-05-06 Thread Alex Ksikes
Hello Alexey,

You should use the query DSL and not the more like this API. You can create 
a boolean query where one clause is your more like this query and the other 
one is your ignore category query (better use a filter here if you can). 

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html

However, 'more like this' of the DSL only takes a 'like_text' parameter; you 
cannot pass the id of the document. This will change in a subsequent 
version of ES. For now, to simulate this functionality, you can use 
multiple mlt queries, with 'like_text' set to the value of each field of the 
queried document, inside a boolean query. Let me know if this helps.

Alex
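
A minimal sketch of that combination, reusing the categories/movies values from the code quoted below (the like_text stands in for the content of the queried document):

{
  "query": {
    "filtered": {
      "query": {
        "more_like_this": {
          "fields": ["title", "description"],
          "like_text": "content of the queried document"
        }
      },
      "filter": {
        "bool": {
          "must_not": { "term": { "categories": "movies" } }
        }
      }
    }
  }
}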

On Wednesday, March 19, 2014 5:01:06 AM UTC+1, Alexey Bagryancev wrote:

 Can anyone help me? It really does not work...

 On Wednesday, March 19, 2014 at 2:05:49 AM UTC+7, Alexey Bagryancev wrote:

 Hi,

 I am trying to filter moreLikeThis results by adding additional query - 
 but it seems to ignore it at all.

 I tried to run my ignoreQuery separately and it works fine, but how to 
 make it work with moreLikeThis? Please help me.

 $ignoreQuery = $this->IgnoreCategoryQuery('movies');

 $this->resultsSet = $this->index->moreLikeThis(
     new \Elastica\Document($id),
     array_merge($this->mlt_fields, array('search_size' => $this->size, 'search_from' => $this->from)),
     $ignoreQuery);



 My IgnoreCategory function:

 public function IgnoreCategoryQuery($category = 'main')
 {
     $categoriesTermQuery = new \Elastica\Query\Term();
     $categoriesTermQuery->setTerm('categories', $category);

     $categoriesBoolQuery = new \Elastica\Query\Bool();
     $categoriesBoolQuery->addMustNot($categoriesTermQuery);

     return $categoriesBoolQuery;
 }






Re: Elastic Search MLT API, how to use fields with weights.

2014-05-06 Thread Alex Ksikes
I'd like to add to this that the mlt API is the same as a boolean query in the 
DSL made of multiple 'more like this field' clauses, where each field is set 
to the content of that field in the queried document.
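
A minimal sketch of that equivalence, with per-field boosts as Binh suggests (field names, boosts and the extracted texts are hypothetical):

{
  "query": {
    "bool": {
      "should": [
        { "more_like_this_field": { "title": { "like_text": "extracted title text", "boost": 3.0 } } },
        { "more_like_this_field": { "body": { "like_text": "extracted body text", "boost": 1.0 } } }
      ]
    }
  }
}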

On Thursday, February 20, 2014 4:20:36 PM UTC+1, Binh Ly wrote:

 I do not believe you can boost individual fields/terms separately in a MLT 
 query. Your best bet is to probably run a bool query of multiple MLT 
 queries each with a different field and boost, but you'll need to first 
 extract the MLT text before you can do this.




Re: Interesting Terms for MoreLikeThis Query in ElasticSearch

2014-05-06 Thread Alex Ksikes
You could always use explain to find out the best matching terms of any 
query. In order to get all the interesting terms, you could run a query 
where the top result document has matched itself.

Also the new significant terms might be of interest to you:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html
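
For example, a minimal significant_terms sketch (field name and query are hypothetical); the aggregation returns terms that are unusually frequent in the matching set compared to the corpus as a whole:

{
  "query": { "match": { "body": "bird flu" } },
  "aggregations": {
    "interesting_terms": {
      "significant_terms": { "field": "body" }
    }
  }
}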

On Thursday, January 30, 2014 9:59:02 PM UTC+1, api...@clearedgeit.com 
wrote:

 I have been trying to figure out how to get interesting terms using the 
 MLT query.  Does ElasticSearch have this functionality similar to solr or 
 if not, is there a work around?




Re: More like this scoring algorithm unclear

2014-05-06 Thread Alex Ksikes
Hi Maarten,

Your 'like_text' is analyzed the same way your 'product_id' field is 
analyzed, unless specified by 'analyzer'. I would recommend setting 
'percent_terms_to_match' to 0. However, if you are only searching over 
product ids, then a simple boolean query would do. If not, then I would 
create a boolean query where each clause is a 'more like this field' for 
each field of the queried document. This is actually what the mlt API does.

Cheers,

Alex
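
For the product-id-only case, a minimal sketch of such a boolean query, reusing the ids quoted below (minimum_should_match is optional):

{
  "query": {
    "terms": {
      "PRODUCT_ID": ["104004855475", "1001004002067765", "100200494210", "1002004004499883"],
      "minimum_should_match": 1
    }
  }
}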

On Wednesday, January 8, 2014 7:20:05 PM UTC+1, Maarten Roosendaal wrote:

 scoring algorithm is still vague but i got the query to act like the API, 
 although the results are different so i'm still doing it wrong, here's an 
 example:
 {
   "explain": true,
   "query": {
     "more_like_this": {
       "fields": [
         "PRODUCT_ID"
       ],
       "like_text": "104004855475 1001004002067765 100200494210 1002004004499883",
       "min_term_freq": 1,
       "min_doc_freq": 1,
       "max_query_terms": 1,
       "percent_terms_to_match": 0.5
     }
   },
   "from": 0,
   "size": 50,
   "sort": [],
   "facets": {}
 }

 the like_text contains product_id's from a wishlist for which i want to 
 find similar lists

 On Wednesday, January 8, 2014 at 4:50:53 PM UTC+1, Maarten Roosendaal wrote:

 Hi,

 Thanks, i'm not quite sure how to do that. I'm using:
  http://localhost:9200/lists/list/[id of list]/_mlt?mlt_field=product_id&min_term_freq=1&min_doc_freq=1

  the body does not seem to be respected (i'm using the elasticsearch head 
  plugin) if i add:
  {
    "explain": true
  }

 i've been trying to rewrite the mlt api as an mlt query but no luck so 
 far. Any suggestions?

 Thanks,
 Maarten

  On Wednesday, January 8, 2014 at 4:14:25 PM UTC+1, Justin Treher wrote:

 Hey Maarten,

 I would use the explain:true option to see just why your documents are 
 being scored higher than others. MoreLikeThis using the same fulltext 
 scoring as far as I know, so term position would affect score. 


 http://lucene.apache.org/core/3_0_3/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html

 Justin

 On Wednesday, January 8, 2014 3:04:47 AM UTC-5, Maarten Roosendaal wrote:

 Hi,

 I have a question about why the 'more like this' algorithm scores 
 documents higher than others, while they are (at first glance) the same.

 What i've done is index wishlist-documents which contain 1 property: 
 product_id, this property contains an array of product_id's (e.g. [1234, 
 , , ]. What i'm trying to do is find similair wishlist for a 
 given wishlist with id x. The MLT API seems to work, it returns other 
 documents which contain at least 1 of the product_id's from the original 
 list.

 But what is see is that, for example. i get 10 hits, the first 6 hits 
 contain the same (and only 1) product_id, this product_id is present in 
 the 
 original wishlist. What i would expect is that the score of the first 6 is 
 the same. However what i see is that only the first 2 have the same, the 
 next 2 a lower score and the next 2 even lower. Why is this?

 Also, i'm trying to write the MLT API as an MLT query, but somehow it 
 doesn't work. I would expect that i need to take the entire content of the 
 original product_id property and feed is as input for the 'like_text'. The 
 documentation is not very clear and doesn't provide examples so i'm a 
 little lost.

 Hope someone can give some pointers.

 Thanks,
 Maarten





Snapshot Restore Frequency

2014-05-06 Thread Alex Philipp
According to the docs, snapshot operations are online and only store diffs. 
 Is there any particular reason to not run them at a fairly high frequency? 
 E.g. every 15 minutes?

Alex
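
For reference, a minimal sketch of the two calls involved (repository name and location are hypothetical); after the one-time repository registration, each snapshot only copies segments that are not already in the repository:

curl -XPUT 'localhost:9200/_snapshot/my_backup' -d '{
  "type": "fs",
  "settings": { "location": "/mount/backups/es" }
}'

curl -XPUT 'localhost:9200/_snapshot/my_backup/snapshot_1'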



Re: Shared facet filtration

2014-04-09 Thread Alex G
Fantastic, that's exactly what I was looking for, thankyou!

On Wednesday, April 9, 2014 3:12:42 AM UTC+10, Ivan Brusic wrote:

 You should be able to use filtered queries instead, where the filter is 
 your facet filter: 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html

 The filtered query will filter documents before the query. Facets work on 
 the documents returned by the query, so if the documents are pre-filtered, 
 the facets will not even work on them.

 -- 
 Ivan
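
A minimal sketch of the request rewritten along those lines, with the shared filter hoisted into a single filtered query (using the same term filters as the original post quoted below):

{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "bool": {
          "must": [
            { "term": { "foo.bar": "test" } },
            { "term": { "baz": "test*" } }
          ]
        }
      }
    }
  },
  "facets": {
    "facetOne": { "terms": { "field": "facetOne.field", "size": 50 } },
    "facetTwo": { "terms": { "field": "facetTwo.field", "size": 50 } },
    "facetThree": { "terms": { "field": "facetThree.field", "size": 50 } }
  },
  "size": 0
}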




 On Mon, Apr 7, 2014 at 6:56 PM, Alex G alex@crowdstrike.com wrote:

 Hello,

 I’m implementing a faceted interface that requires that all the facets be 
 filtered by a shared filter - below is roughly how the queries currently 
 look. Is there a more efficient/performant way to make this kind of query? 
 I'm less fussed about actual query verbosity, but if there is some way of 
 sharing or referencing the repeated facet_filter other than search templates, 
 that’d be fantastic.

 Thanks,

 Alex

 {
   "facets": {
     "facetOne": {
       "facet_filter": {
         "bool": {
           "must": [
             { "term": { "foo.bar": "test" } },
             { "term": { "baz": "test*" } }
           ]
         }
       },
       "terms": {
         "field": "facetOne.field",
         "order": ["count"],
         "size": 50
       }
     },
     "facetTwo": {
       "facet_filter": {
         "bool": {
           "must": [
             { "term": { "foo.bar": "test" } },
             { "term": { "baz": "test*" } }
           ]
         }
       },
       "terms": {
         "field": "facetTwo.field",
         "order": ["count"],
         "size": 50
       }
     },
     "facetThree": {
       "facet_filter": {
         "bool": {
           "must": [
             { "term": { "foo.bar": "test" } },
             { "term": { "baz": "test*" } }
           ]
         }
       },
       "terms": {
         "field": "facetThree.field",
         "order": ["count"],
         "size": 50
       }
     }
   },
   "size": 0
 }








Re: synonyms in a query

2014-04-08 Thread Alex K
Hi Luiz,

thank you again for your reply.

I don't fully understand the part you mentioned:

After indexing one document with the title equal to core:

 curl -XPOST 'localhost:9200/myindex/test/1' -d '{
   "title": "core"
 }'


Sorry, I am pretty new to ES and don't understand very much yet.

Now what happens there?
And what if I don't have hardcoded synonyms, but a file which someone can 
fill out?
I need something like 

synonyms_path : "analysis/synonym.txt"

in my filter, but then what about the step you mentioned that I did not 
understand?

Sorry for all the trouble 
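
For what it's worth, a minimal sketch of the same settings with the hardcoded list swapped for a file (the path is resolved relative to the node's config directory; the file location is an assumption):

{
  "index": {
    "analysis": {
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms_path": "analysis/synonym.txt"
        }
      },
      "analyzer": {
        "synonym": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["synonym_filter"]
        }
      }
    }
  }
}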



On Monday, April 7, 2014 at 9:29:17 AM UTC+2, Alex K wrote:

 Hello there,

 i have a query, example is this:
 {
   "query": {
     "bool": {
       "should": [
         {
           "multi_match": {
             "query": "foo",
             "fields": ["TITLE", "SHORTDESC"],
             "type": "phrase_prefix"
           }
         },
         {
           "multi_match": {
             "query": "foo",
             "cutoff_frequency": null,
             "fields": ["TITLE", "SHORTDESC"]
           }
         }
       ]
     }
   },
   "filter": {
     "term": { "ACTIVE": 1 }
   },
   "sort": {
     "TITLE": { "order": "asc" }
   },
   "size": 7
 }

 Now I have the question if I can use synonyms here?

 I already saw that you can use a synonym-token inside an analyzer.
 But I have a query here, not an analyzer.
 Do I have to put an analyzer inside the query?

 I don't know much about ES yet, so this may be a total stupid question.
 Thank you in advance :-)




Re: synonyms in a query

2014-04-08 Thread Alex K
Hi Luiz,

thank you again for your reply.

A colleague of mine told me that I might be missing a plugin needed to use my 
settings file.
I will check this out and later write down here what I found out.

Sorry for all the trouble 

On Monday, April 7, 2014 at 9:29:17 AM UTC+2, Alex K wrote:

 Hello there,

 i have a query, example is this:
 {
   "query": {
     "bool": {
       "should": [
         {
           "multi_match": {
             "query": "foo",
             "fields": ["TITLE", "SHORTDESC"],
             "type": "phrase_prefix"
           }
         },
         {
           "multi_match": {
             "query": "foo",
             "cutoff_frequency": null,
             "fields": ["TITLE", "SHORTDESC"]
           }
         }
       ]
     }
   },
   "filter": {
     "term": { "ACTIVE": 1 }
   },
   "sort": {
     "TITLE": { "order": "asc" }
   },
   "size": 7
 }

 Now I have the question if I can use synonyms here?

 I already saw that you can use a synonym-token inside an analyzer.
 But I have a query here, not an analyzer.
 Do I have to put an analyzer inside the query?

 I don't know much about ES yet, so this may be a total stupid question.
 Thank you in advance :-)




Re: synonyms in a query

2014-04-08 Thread Alex K
Me again,

seems it was a local problem on my side.
The way Luiz mentioned is exactly the correct way.
Thank you very much, Luiz.
You really helped me out with this!

On Monday, April 7, 2014 at 9:29:17 AM UTC+2, Alex K wrote:

 Hello there,

 i have a query, example is this:
 {
   "query": {
     "bool": {
       "should": [
         {
           "multi_match": {
             "query": "foo",
             "fields": ["TITLE", "SHORTDESC"],
             "type": "phrase_prefix"
           }
         },
         {
           "multi_match": {
             "query": "foo",
             "cutoff_frequency": null,
             "fields": ["TITLE", "SHORTDESC"]
           }
         }
       ]
     }
   },
   "filter": {
     "term": { "ACTIVE": 1 }
   },
   "sort": {
     "TITLE": { "order": "asc" }
   },
   "size": 7
 }

 Now I have the question if I can use synonyms here?

 I already saw that you can use a synonym-token inside an analyzer.
 But I have a query here, not an analyzer.
 Do I have to put an analyzer inside the query?

 I don't know much about ES yet, so this may be a total stupid question.
 Thank you in advance :-)




Re: running a specific integration test

2014-04-08 Thread Alex Marandon


On Friday, 9 December 2011 10:21:40 UTC+1, Karussell wrote:

 It is possible:


  http://maven.apache.org/plugins/maven-surefire-plugin/examples/single-test.html

 http://stackoverflow.com/questions/1873995/run-a-single-test-method-with-maven


Is this advice still valid?

I've tried different variations of the mvn test command with no luck so 
far. Example :

$ ES_TEST_LOCAL=true mvn test 
-Dtest=SimpleValidateQueryTests#simpleValidateQuery 
[INFO] Scanning for projects...
[...]
Executing 501 suites with 3 JVMs.

[...]
Suite: org.elasticsearch.search.aggregations.bucket.GeoDistanceTests
[...]

Can you give me a cli example of executing a specific test? Or do I have to 
use an IDE?

Thanks,
Alex
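
One hedged guess: the "Executing 501 suites" banner suggests the build runs under the randomizedtesting plugin rather than plain surefire, in which case suites are selected with tests.class/tests.method instead of -Dtest:

mvn test -Dtests.class=*.SimpleValidateQueryTests -Dtests.method=simpleValidateQuery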



synonyms in a query

2014-04-07 Thread Alex K
Hello there,

i have a query, example is this:
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "foo",
            "fields": ["TITLE", "SHORTDESC"],
            "type": "phrase_prefix"
          }
        },
        {
          "multi_match": {
            "query": "foo",
            "cutoff_frequency": null,
            "fields": ["TITLE", "SHORTDESC"]
          }
        }
      ]
    }
  },
  "filter": {
    "term": { "ACTIVE": 1 }
  },
  "sort": {
    "TITLE": { "order": "asc" }
  },
  "size": 7
}

Now I have the question: can I use synonyms here?

I already saw that you can use a synonym token filter inside an analyzer.
But I have a query here, not an analyzer.
Do I have to put an analyzer inside the query?

I don't know much about ES yet, so this may be a total stupid question.
Thank you in advance :-)



Re: synonyms in a query

2014-04-07 Thread Alex K
Hello Luiz, thank you for your reply!

As we use rivers, I was told to declare the analyzer there.
It looks like this for me:
{
  "index": {
    "analysis": {
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "foo, foo bar => core"
          ]
        }
      },
      "analyzer": {
        "synonym": {
          "tokenizer": "whitespace",
          "filter": [
            "synonym_filter"
          ],
          "type": "custom"
        }
      }
    }
  }
}
which actually says, for testing purposes, 'if someone searches for 'foo' 
or 'foo bar', search for 'core''

Now my query uses the analyzer:
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "foo",
            "fields": ["TITLE", "SHORTDESC"],
            "type": "phrase_prefix",
            "analyzer": "synonym"
          }
        },
        {
          "multi_match": {
            "query": "foo",
            "cutoff_frequency": null,
            "fields": ["TITLE", "SHORTDESC"]
          }
        }
      ]
    }
  },
  "filter": {
    "term": { "ACTIVE": 1 }
  },
  "sort": {
    "TITLE": { "order": "asc" }
  },
  "size": 7
}

But I get an error there:
[...]nested: QueryParsingException[[test484] [multi_match] analyzer 
[synonym] not found];[...]

What am I doing wrong here?
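
One hedged guess: the analyzer was only declared in the river configuration, so it never became part of the index settings that the query parser consults. A sketch of registering it directly on the index (analysis settings can only be changed while the index is closed):

curl -XPOST 'localhost:9200/test484/_close'

curl -XPUT 'localhost:9200/test484/_settings' -d '{
  "analysis": {
    "filter": {
      "synonym_filter": {
        "type": "synonym",
        "synonyms": ["foo, foo bar => core"]
      }
    },
    "analyzer": {
      "synonym": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": ["synonym_filter"]
      }
    }
  }
}'

curl -XPOST 'localhost:9200/test484/_open'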


On Monday, April 7, 2014 at 9:29:17 AM UTC+2, Alex K wrote:

 Hello there,

 i have a query, example is this:
 {
   "query": {
     "bool": {
       "should": [
         {
           "multi_match": {
             "query": "foo",
             "fields": ["TITLE", "SHORTDESC"],
             "type": "phrase_prefix"
           }
         },
         {
           "multi_match": {
             "query": "foo",
             "cutoff_frequency": null,
             "fields": ["TITLE", "SHORTDESC"]
           }
         }
       ]
     }
   },
   "filter": {
     "term": { "ACTIVE": 1 }
   },
   "sort": {
     "TITLE": { "order": "asc" }
   },
   "size": 7
 }

 Now I have the question if I can use synonyms here?

 I already saw that you can use a synonym-token inside an analyzer.
 But I have a query here, not an analyzer.
 Do I have to put an analyzer inside the query?

 I don't know much about ES yet, so this may be a total stupid question.
 Thank you in advance :-)




Shared facet filtration

2014-04-07 Thread Alex G
 

Hello,

I’m implementing a faceted interface that requires that all the facets be 
filtered by a shared filter - below is roughly how the queries currently 
look. Is there a more efficient/performant way to make this kind of query? 
I'm less fussed about actual query verbosity, but if there is some way of 
sharing or referencing the repeated facet_filter other than search templates, 
that’d be fantastic.

Thanks,

Alex

{
  "facets": {
    "facetOne": {
      "facet_filter": {
        "bool": {
          "must": [
            { "term": { "foo.bar": "test" } },
            { "term": { "baz": "test*" } }
          ]
        }
      },
      "terms": {
        "field": "facetOne.field",
        "order": ["count"],
        "size": 50
      }
    },
    "facetTwo": {
      "facet_filter": {
        "bool": {
          "must": [
            { "term": { "foo.bar": "test" } },
            { "term": { "baz": "test*" } }
          ]
        }
      },
      "terms": {
        "field": "facetTwo.field",
        "order": ["count"],
        "size": 50
      }
    },
    "facetThree": {
      "facet_filter": {
        "bool": {
          "must": [
            { "term": { "foo.bar": "test" } },
            { "term": { "baz": "test*" } }
          ]
        }
      },
      "terms": {
        "field": "facetThree.field",
        "order": ["count"],
        "size": 50
      }
    }
  },
  "size": 0
}






Make a autosuggest-search searching in realtime doesn't work properly

2014-04-02 Thread Alex K
Hi there,

I have the following Request I send to ES:

{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [
            {
              "multi_match": {
                "query": "socks purple",
                "fields": ["TITLE"],
                "type": "phrase_prefix"
              }
            },
            {
              "multi_match": {
                "query": "socks purple",
                "fields": ["TITLE"]
              }
            }
          ]
        }
      },
      "filter": {
        "and": [
          {
            "terms": {
              "ACTIVE": [1]
            }
          }
        ]
      }
    }
  },
  "size": 7
}

Now, the first multi_match gives me good results when I input the words I 
search for in the correct order (e.g. "Purple Socks").
But when I enter them in a 'wrong' order (e.g. "Socks Purple") it doesn't find 
anything.
A colleague of mine said I could try using a second multi_match.
I don't have much knowledge of ES; almost all of the above was already there, 
I just extended the code with the second multi_match.
But now there is the problem that if I input "socks" it gives me all 
matches for "socks". Now when I continue to enter "purple", it gives me 
not just purple socks, but everything matching "purple" (although I would 
expect only purple socks).
Does anyone know what the problem here is?
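
One hedged explanation: the second multi_match defaults to operator or, so as soon as "purple" is typed, any title containing either word matches. Requiring all terms in the non-prefix clause would look like this:

{
  "multi_match": {
    "query": "socks purple",
    "fields": ["TITLE"],
    "operator": "and"
  }
}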



multi_match and cutoff_frequency

2014-03-31 Thread Alex K
Hello there,

I am a total ES-noob, so please forgive me if my question is weird or 
something ;-)

Currently I have the task of implementing cutoff_frequency for our 
Elasticsearch queries.
The current query looks like this:

{
    "query": {
        "bool": {
            "should": [
                {
                    "multi_match": {
                        "query": "the",
                        "cutoff_frequency": 0.001,
                        "fields": ["TITLE", "SHORTDESC"],
                        "type": "phrase_prefix"
                    }
                }
            ]
        }
    },
    "filter": {
        "term": {
            "ACTIVE": 1
        }
    },
    "sort": {
        "TITLE": {
            "order": "asc"
        }
    },
    "size": 7
}

This works perfectly fine, like before, BUT it seems that the 
cutoff_frequency there doesn't matter.
Is that the wrong place to put it?
Or does it not work with multi_match?

I have to admit that I haven't fully understood what cutoff_frequency 
does.
But I have lots of entries in the index for this query which have "The" in 
the title.
Wouldn't "cutoff_frequency": 0.001 mean that the word "the" is ignored if it 
occurs in more than 1/1000 of all the words in the titles?
(And yes, in case I understood it the other way around, I also tried 
1.0, which would mean 1000/1000 = every word, yes? It didn't make a 
difference for my query.)

Sorry for my bad English, I am German.
I hope I don't confuse anyone too much...
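
A hedged note, since this also went unanswered: as far as I know, 
cutoff_frequency only takes effect for boolean-type match queries, so with 
"type": "phrase_prefix" it is most likely ignored, which would explain why 
changing it makes no difference. To see it do anything, try the boolean form 
(same fields as above):

{
    "multi_match": {
        "query": "the",
        "cutoff_frequency": 0.001,
        "fields": ["TITLE", "SHORTDESC"]
    }
}

Also, the threshold is a fraction of documents, not of words: with 0.001, a 
term occurring in more than 0.1% of documents is treated as high-frequency 
and demoted to an optional clause, rather than ignored outright.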



Indexing performance with doc values (particularly with larger number of fields)

2014-03-23 Thread Alex at Ikanow
This might be more of a Lucene question, but a quick google didn't throw up 
anything.

Has anyone done/seen any benchmarking on indexing performance (overhead) 
due to using doc values?

I often index quite large JSON objects with many fields (e.g. 50). I'm 
trying to get a feel for whether I can just let all of them be doc values 
on the off chance I'll want to aggregate over them, or whether I need to 
pick beforehand which fields will support aggregation.

(A related question: presumably allowing a mix of doc values fields and 
"legacy" fields is a bad idea, because if you use doc values fields you 
want a low max heap so that the file cache has lots of memory available, 
whereas if you use the field cache you need a large heap. Is that about 
right, or am I missing something?)

Thanks for any insight!

Alex
Ikanow
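
No benchmark numbers to offer, but for per-field experiments, this is 
roughly the 1.x syntax for turning doc values on for selected fields 
(index type and field names are made up for illustration; numeric and 
not_analyzed string fields only):

{
    "mappings": {
        "event": {
            "properties": {
                "bytes_sent": {
                    "type": "long",
                    "fielddata": { "format": "doc_values" }
                },
                "status": {
                    "type": "string",
                    "index": "not_analyzed",
                    "fielddata": { "format": "doc_values" }
                }
            }
        }
    }
}

That at least makes a mixed setup easy to test: doc values on the fields you 
expect to aggregate over, the in-heap field data cache for the rest.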



Re: Install issues with Kibana3 vs elasticsearch 0.19.11

2014-03-20 Thread Alex at Ikanow
Never mind, I'm an idiot, it clearly mentions it needs 0.90.x in the README 
:(
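
For anyone landing here later: the root endpoint reports the running 
version, which would surface this kind of mismatch quickly. The exact 
response shape varies by release, but it is roughly:

curl http://localhost:9200/
{"ok": true, "version": {"number": "0.19.11", ...}, "tagline": "You Know, for Search"}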



Install issues with Kibana3 vs elasticsearch 0.19.11

2014-03-19 Thread Alex at Ikanow

I downloaded the latest Kibana3, popped it on a tomcat instance sharing 
space with my elasticsearch (0.19.11) instance, and tried to connect (both 
by using an ssh tunnel to connect localhost:9200 back to the server, and by 
opening port 9200 in the firewall).

In both cases, the browser makes a call to _nodes, which returns e.g.:

{"ok":true,"cluster_name":"infinite-dev","nodes":{"Yup-Cmn0QwCrkYI6l7SdRw":{"name":"Firefrost","transport_address":"inet[/10.113.42.186:9300]","hostname":"ip-10-113-42-186","http_address":"inet[/10.113.42.186:9200]"}}}

and then returns the following error:

TypeError: Cannot call method 'split' of undefined
    at http://SERVER/kibana-3.0.0/app/app.js:22:11260
    at he (http://SERVER/kibana-3.0.0/app/app.js:7:20041)
    at Function.Yb (http://SERVER/kibana-3.0.0/app/app.js:7:7025)
    at http://SERVER/kibana-3.0.0/app/app.js:22:11204
    at i (http://SERVER/kibana-3.0.0/app/app.js:9:458)
    at i (http://SERVER/kibana-3.0.0/app/app.js:9:458)
    at http://SERVER/kibana-3.0.0/app/app.js:9:1014
    at Object.f.$eval (http://SERVER/kibana-3.0.0/app/app.js:9:6963)
    at Object.f.$digest (http://SERVER/kibana-3.0.0/app/app.js:9:5755)
    at Object.f.$apply (http://SERVER/kibana-3.0.0/app/app.js:9:7111)

I don't see any other calls back to elasticsearch.

I couldn't find a statement anywhere of which versions Kibana3 is 
compatible with. Does it just need a later version, or am I doing something 
wrong? (Out of curiosity, does anyone know the earliest version it is 
compatible with? Though I'm planning to move to 1.0 soon anyway.)

Thanks for any insight/help anyone can provide!

Alex






Re: EsRejectedExecutionException when searching date based indices.

2014-02-26 Thread Alex Clark
That is correct, I was mixing the terms nodes and shards (sorry about 
that).  I'm running the test on a single node (machine).  I've chosen 20 
shards so we could eventually go to a 20-server cluster without 
re-indexing.  It's unlikely we'll ever need to go that high, but we never 
know, and given we receive 750 million messages a day, the thought of 
reindexing after collecting a year's worth of data makes me nervous.  If I 
can over-shard and avoid a massive reindex, then I'll be a happy guy.

I thought about reducing the 20 shards, but even if I go to, say, 5 shards on 
5 machines (1 shard per machine?), I'll still run into the issue if a 
user searches several years back.  Any other thoughts on a possible 
solution?  Would increasing the queue size be a good option?  Is there a 
downside (performance hit, running out of resources, etc.)?

Thanks again!
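
For reference, a hedged sketch of the queue-size option asked about above 
(David's reply below points at elasticsearch.yml): on 1.x the thread pool 
settings could, if memory serves, also be changed on a live cluster:

PUT /_cluster/settings
{
    "transient": {
        "threadpool.search.queue_size": 2000
    }
}

The downside is the one suspected above: a larger queue only buffers more 
shard requests in heap and hides back-pressure, so it trades rejections for 
memory and latency. Reducing the number of shards a single search fans out 
to is usually the sounder fix.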

On Tuesday, February 25, 2014 11:32:26 PM UTC-8, David Pilato wrote:

 You are mixing nodes and shards, right?
 How many elasticsearch nodes do you have to manage your 7300 shards?
 Why did you set 20 shards per index?

 You can increase the queue size in elasticsearch.yml but I'm not sure it's 
 the right thing to do here.

 My 2 cents

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs






EsRejectedExecutionException when searching date based indices.

2014-02-25 Thread Alex Clark
 

Hello all, I’m getting failed nodes when running searches and I’m hoping 
someone can point me in the right direction.  I have indices created per 
day to store messages.  The pattern is pretty straight forward: the index 
for January 1 is messages_20140101, for January 2 is messages_20140102 
and so on.  Each index is created against a template that specifies 20 
shards. A full year will give 365 indices * 20 shards = 7300 nodes. I have 
recently upgraded to ES 1.0.

When I search for all messages in a year (either using an alias or 
specifying “messages_2013*”), I get many failed nodes.  The reason given 
is: “EsRejectedExecutionException[rejected execution (queue capacity 1000) 
on 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@651b8924]”.
  
The more often I search, the fewer failed nodes I get (probably caching in 
ES) but I can’t get down to 0 failed nodes.  I’m using ES for analytics so 
the document counts coming back have to be accurate. The aggregate counts 
will change depending on the number of node failures.  We use the Java API 
to create a local node to index and search the documents.  However, we also 
see the issue if we use the URL search API on port 9200.

If I restrict the search to 30 days then I do not see any failures (it’s 
under 1000 nodes, so that's as expected).  However, it is a pretty common use 
case for our customers to search messages spanning an entire year.  Any 
suggestions on how I can prevent these failures?

Thank you for your help!



Re: Elasticsearch Maven plugin on GitHub

2014-01-17 Thread Alex Cojocaru
You are very welcome, David.

I believe the project is pretty much complete, for it also contains tests
which exercise the mojos.
As mentioned already, it depends on an ES version which is already old. I
will try to keep it up to date, but contributions of any sort are more than
welcome.

alex



On Fri, Jan 17, 2014 at 2:54 AM, David Pilato da...@pilato.fr wrote:

 Hey Alex

 That's great! I started a project like this some months ago but did not
 find enough time to finish it.
 Thanks for sharing it!

 --
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet (https://twitter.com/dadoonet) | @elasticsearchfr (https://twitter.com/elasticsearchfr)


 On 17 January 2014 at 01:44:26, AlexC (acojoc...@pingidentity.com) wrote:

 If anyone is interested in using a Maven plugin to run Elasticsearch for
 integration testing, I just published one on GitHub:
 https://github.com/alexcojocaru/elasticsearch-maven-plugin.

 It is an alternative to starting a node through the code.

 The readme should provide enough information, but let me know if something
 is missing or not clear enough.
 It uses ES v0.90.7, but it can be easily updated to the latest ES version
 by changing the dependency version in the pom.xml file.

 alex



