Re: ExpressionScriptCompilationException[Field [field name] used in expression does not exist in mappings];
It looks like the same question (but with more context/information) was asked here: http://stackoverflow.com/questions/28986964/expressions-with-dynamicly-generated-schemas-throw-exceptions-when-some-indices but it doesn't have any answers yet either. Does anyone here happen to know the best-practice way of addressing indices that are missing the mapping in question? I'd really hate to have to go through and hand-update them all to add the mapping :(

On Friday, March 6, 2015 at 8:27:48 AM UTC-8, Alex Schokking wrote:
[quoted text hidden; the original post appears in full later in this digest]
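One possible workaround, offered as a sketch rather than a verified fix: Lucene expressions only require the field to exist in the index mappings, not in every document, and documents without a value are read as 0 by doc['...'].value. So instead of hand-updating documents, it may be enough to add the mapping itself to each older index (the type name "logs" below is a placeholder; use whatever type your logstash events are indexed under):

curl -XPUT 'localhost:9200/logstash-2015.02.21/_mapping/logs' -d '{
  "properties": {
    "ads_found":    { "type": "long" },
    "pages_parsed": { "type": "long" }
  }
}'

Repeat for each older index; a matching index template would keep newly created indices consistent as well.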
Re: Perma-Unallocated primary shards after a node has left the cluster
Probably super evident, but the output above was actually from _cat/allocation?v, not _cat/recovery. Sorry about that.

On Wednesday, April 29, 2015 at 5:19:08 PM UTC-7, Alex Schokking wrote:
[quoted text hidden; the original post appears in full below]
Perma-Unallocated primary shards after a node has left the cluster
Hi guys, I would really appreciate some help understanding what's going on with shard allocation in this case.

Elasticsearch version: 1.4.4

We had 3 nodes with 1 shard and 1 replica per index (so 2 copies of everything in total). 1 node went down and the cluster went red. It started to reallocate shards as expected, and there were originally ~50 unallocated shards, 15 of them primaries and the rest replicas. It's been a few hours now and there are still 15 outstanding shards, all primaries, that don't seem to be getting re-allocated. I thought this would be a pretty standard scenario, so I was really hoping I wouldn't need to manually walk through and re-allocate the primary shards, but I'm not sure what else to try at this point to get back to green. Any pointers would be really appreciated.

Here are some of the relevant-seeming bits folks asked about on IRC. In the ES logs, for the unallocated index names, there are lines along the lines of:

[2015-04-29 22:08:22,803][DEBUG][action.admin.indices.stats] [Agent Axis] [webaccesslogs-2015.04.24][0], node[-r2iQnH4R-mcUy4NicCB5g], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@6a564a91]
org.elasticsearch.transport.SendRequestTransportException: [Jean-Paul Beaubier][inet[/10.155.165.126:9300]][indices:monitor/stats[s]]

(Jean-Paul Beaubier is the node that went down.)

_cat/recovery:

shards disk.used disk.avail disk.total disk.percent host              ip             node
   420    21.2gb       77gb     98.3gb           21 ip-10-234-164-148 10.234.164.148 Agent Axis
   420      41gb     57.2gb     98.3gb           41 ip-10-218-145-237 10.218.145.237 Ebon Seeker
    15                                                                               UNASSIGNED

I'm trying to understand why it's stuck in this state, given there is no other info in the logs, as far as I can tell, about why the shards can't be allocated. Shouldn't the replicas just be promoted in place to new primaries and then new replicas created on the other node?

Thanks and regards -- Alex
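One manual fallback, sketched here for a single shard using the index and node names from the output above, is the cluster reroute API with an explicit allocate command. Note that allow_primary: true creates an empty primary if no intact copy of the shard exists on disk, which loses whatever data was in that shard, so treat it as a last resort:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    {
      "allocate": {
        "index": "webaccesslogs-2015.04.24",
        "shard": 0,
        "node": "Agent Axis",
        "allow_primary": true
      }
    }
  ]
}'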
Re: Copying fields to a geopoint type ?
Were you ever able to figure out a solution to this? I'm in a similar boat.

On Thursday, September 11, 2014 at 2:14:29 AM UTC-7, Kushal Zamkade wrote:

Hello, I have created a location field by using the code below:

if [latitude] and [longitude] {
  mutate {
    rename => [ "latitude", "[location][lat]", "longitude", "[location][lon]" ]
  }
}

But when I check the location field type, it is not created as geo_point, and when I try to search on a geo_point I get the error below. Can you help me resolve this?

QueryParsingException[[logstash-2014.09.11] failed to find geo_point field [location1]];

On Thursday, April 10, 2014 at 2:42:22 AM UTC+5:30, Pascal VINCENT wrote:

Hi, I have included logstash in my stack and started to play with it. I'm sure it can do the trick I was looking for, and much more. Thank you ... [waiting for your blog post :)] Pascal.

On Mon, Apr 7, 2014 at 9:38 AM, Alexander Reelsen a...@spinscale.de wrote:

Hey, I don't know about your stack, but maybe logstash would be a good idea to add in there. It is more flexible than the CSV river and features a CSV input as well. You can easily change the structure of the data you want to index. This is how the logstash config would look:

if [latitude] and [longitude] {
  mutate {
    rename => [ "latitude", "[location][lat]", "longitude", "[location][lon]" ]
  }
}

I am currently working on a blog post about how to utilize elasticsearch, logstash and kibana on CSV-based data, which covers exactly this, and I hope to release it soonish on the .org blog. Stay tuned! :-) --Alex

On Thu, Apr 3, 2014 at 12:21 AM, Pascal VINCENT pasvi...@gmail.com wrote:

Hi, I'm new to elasticsearch. My use case is to load a CSV file containing agencies with geo locations; each line looks like:

id;label;address;zipcode;city;region;latitude;longitude;(and some other fields)

I'm using the csv river plugin to index the file. My mapping is:

{
  "office": {
    "properties": {
      (first fields omitted...)
      "latitude":  { "type": "double" },
      "longitude": { "type": "double" },
      "location":  { "type": "geo_point", "lat_lon": true }
    }
  }
}

I'd like to index the location .lon and .lat values from the latitude and longitude fields. I tried the copy_to function with no success:

"latitude":  { "type": "double", "copy_to": "location.lat" },
"longitude": { "type": "double", "copy_to": "location.lon" },

Is there any way to feed the location property from the latitude and longitude fields at indexing time? My point is that I don't want to modify the input CSV file to adapt it to the GeoJSON format (i.e. concatenate lat and lon into one field in the CSV file). Thank you for any hints. Pascal.
Re: Copying fields to a geopoint type ?
Woah, crazy, never would've thought of that. Thanks a lot for following up!

On Wed, Apr 1, 2015 at 12:31 PM, Pascal VINCENT pasvinc...@gmail.com wrote:

I finally came up with:

if [latitude] and [longitude] {
  mutate {
    add_field => [ "[location]", "%{longitude}" ]
    add_field => [ "[location]", "%{latitude}" ]
  }
  mutate {
    convert => [ "[location]", "float" ]
  }
}
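For completeness, the location field still needs a geo_point mapping for this to work; a minimal index-template sketch (the template name and index pattern here are assumptions, adjust to your own indices):

curl -XPUT 'localhost:9200/_template/geo_location' -d '{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "properties": {
        "location": { "type": "geo_point" }
      }
    }
  }
}'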
Cluster issue - raiseTimeoutFailure
Hi, I have a Java application that is indexing data into an Elasticsearch cluster (3 nodes). ES is well configured and working OK (indexing the data received from Java).

Cluster configuration for each node, from /etc/elasticsearch/elasticsearch.yml:

ES_MAX_MEM: 2g
ES_MIN_MEM: 2g
bootstrap:
  mlockall: true
cluster:
  name: clusterName
discovery:
  zen:
    ping:
      multicast:
        enabled: false
      unicast:
        hosts:
          - elasticsearch-test-2-node-1
          - elasticsearch-test-2-node-2
          - elasticsearch-test-2-node-3
http:
  max_initial_line_length: 48k
index:
  number_of_replicas: 2
  number_of_shards: 6
node:
  name: elasticsearch-test-2-node-3
threadpool:
  index:
    type: fixed
    size: 6
    queue_size: 1500
  search:
    type: fixed
    size: 6
    queue_size: 1200

When connecting to the ES cluster (from Java), I specify all the nodes: node1, node2, node3. The issue appears when I stop the 2 data nodes one by one (stopping elasticsearch). In this case the cluster health is yellow and I can see the remaining master node (using the head plugin). The master now has all the primary shards; the replicas are unassigned. But the Java application no longer indexes any data. The following exception appears in Java:

org.elasticsearch.action.UnavailableShardsException: [indexName][2] [3] shardIt, [1] active : Timeout waiting for [1m], request: index {[indexName][typeName][Id], source[{ . }]}
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.raiseTimeoutFailure(TransportShardReplicationOperationAction.java:548) ~[elasticsearch-1.1.0.jar:na]
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$3.onTimeout(TransportShardReplicationOperationAction.java:538) ~[elasticsearch-1.1.0.jar:na]
at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:491) ~[elasticsearch-1.1.0.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_51]
at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_51]

Shouldn't indexing work properly in this case, even with only the master? If I also kill the master, the next (logical) exception appears:

org.elasticsearch.client.transport.NoNodeAvailableException: No node available
at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:263) ~[elasticsearch-1.1.0.jar:na]
at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:231) ~[elasticsearch-1.1.0.jar:na]
at org.elasticsearch.client.transport.support.InternalTransportClient.execute(InternalTransportClient.java:106) ~[elasticsearch-1.1.0.jar:na]
at org.elasticsearch.client.support.AbstractClient.update(AbstractClient.java:107) ~[elasticsearch-1.1.0.jar:na]
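A likely explanation, offered as a hypothesis: the default write consistency for index operations is quorum, and with number_of_replicas: 2 a quorum is 2 of the 3 copies of each shard. With both data nodes down only the primary is active, so the write waits for the 1m timeout and raises UnavailableShardsException. A sketch of the per-request override on the 1.x Java API (index/type names and the json variable are placeholders):

import org.elasticsearch.action.WriteConsistencyLevel;
import org.elasticsearch.action.index.IndexResponse;

// accept the write as soon as the primary alone is available
IndexResponse response = client.prepareIndex("indexName", "typeName", id)
        .setSource(json)
        .setConsistencyLevel(WriteConsistencyLevel.ONE)
        .execute()
        .actionGet();

Alternatively, lowering index.number_of_replicas changes what a quorum means for the index as a whole.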
ElasticSearch across multiple data center architecture design options
Hi all, we are planning to use ELK for our log analysis. We have multiple data centers. Since a cluster that spans data centers is not recommended, we are going to have one ES cluster per data center. Here are the three design options we have:

1. Use snapshot/restore to replicate data across clusters.
2. Use a tribe node to achieve cross-cluster queries.
3. Ship and index logs to each cluster.

And here are our questions; any comments will be appreciated:

1. How complex is snapshot/restore? Does anyone have experience using it for this purpose?
2. Would the performance of a single tribe node be a concern or a bottleneck? Is it possible to have multiple tribe nodes for scale-out or load balancing? (See the config sketch below.)
3. Is it possible to customize Kibana so that it queries a different cluster depending on the query?

Thank you! Abigail
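On question 2: multiple tribe nodes behind a load balancer should be possible, since each tribe node is simply a client node that joins every configured cluster. A minimal elasticsearch.yml sketch for one tribe node (the cluster names are placeholders):

tribe:
  dc1:
    cluster.name: es-cluster-dc1
  dc2:
    cluster.name: es-cluster-dc2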
ExpressionScriptCompilationException[Field [field name] used in expression does not exist in mappings];
Hi there, We're just getting started with ELK and are using Elasticsearch 1.4.4 and Kibana 4.0 on Ubuntu 14.04. We needed to create a scripted field to calculate the ratio between two numeric fields. These fields are not on all events and only started appearing at all a day ago (so older indexes don't have them at all).

name: ads_per_page
script: doc['ads_found'].value / max(1, doc['pages_parsed'].value)

It seemed to be working great at first, but now Kibana has been resurfacing these elasticsearch errors constantly and I can't seem to find any information about it online (too new?). This repeats for every shard as far as I can tell (there are about 2 weeks of indexes there). Any suggestions would be appreciated.

Shard Failures
The following shard failures occurred:

- Index: logstash-2015.02.21 Shard: 0 Reason: SearchParseException[[logstash-2015.02.21][0]: query[ConstantScore(BooleanFilter(+cache(@timestamp:[1425571563220 TO 1425657963220])))],from[-1],size[500],sort[custom:@timestamp: org.elasticsearch.index.fielddata.fieldcomparator.LongValuesComparatorSource@4c1942d3!]: Parse Failure [Failed to parse source [{size:500,sort:{@timestamp:desc},query:{filtered:{query:{query_string:{analyze_wildcard:true,query:*}},filter:{bool:{must:[{range:{@timestamp:{gte:1425571563220,lte:1425657963220}}}],must_not:[],highlight:{pre_tags:[@kibana-highlighted-field@],post_tags:[@/kibana-highlighted-field@],fields:{*:{}}},aggs:{2:{date_histogram:{field:@timestamp,interval:30m,pre_zone:-08:00,pre_zone_adjust_large_interval:true,min_doc_count:0,extended_bounds:{min:1425571563220,max:1425657963220,fields:[*,_source],script_fields:{ads_per_page:{script:doc['ads_found'].value / max(1, doc['pages_parsed'].value),lang:expression}},fielddata_fields:[@timestamp]}]]]; nested: ExpressionScriptCompilationException[Field [ads_found] used in expression does not exist in mappings];
Re: Using function_score error
The error is in your groovy script, as indicated by GroovyScriptExecutionException. All the other info is just making it more difficult to help you.

script: _score doc['reviews'].value

Your script doesn't use any operator. It's likely that you just want to multiply: _score * doc['reviews'].value. In groovy, function call arguments do not need to be enclosed in brackets, e.g. println 'hello' is equivalent to println('hello'). By omitting the operator, your script is trying to call _score (which is some UpdateableFloat) with the document field as an argument. Cheers!

On Monday, November 3, 2014 at 7:46:07 PM UTC+1, Manuel Sciuto wrote:

I have an error. My mapping:

"mappings": {
  "comida": {
    "dynamic": true,
    "numeric_detection": true,
    "properties": {
      "id":      { "type": "integer" },
      "reviews": { "type": "integer" },
      "name":    { "analyzer": "myAnalyzerDestinos", "type": "string" }
    }
  },
  "actividades": {
    "dynamic": true,
    "numeric_detection": true,
    "properties": {
      "id":      { "type": "integer" },
      "reviews": { "type": "integer" },
      "name":    { "analyzer": "myAnalyzerDestinos", "type": "string" }
    }
  },
  "alojamiento": {
    "dynamic": true,
    "numeric_detection": true,
    "properties": {
      "id":      { "type": "integer" },
      "reviews": { "type": "integer" },
      "name":    { "analyzer": "myAnalyzerDestinos", "type": "string" }
    }
  },
  "transporte__servicios": {
    "dynamic": true,
    "numeric_detection": true,
    "properties": {
      "id":      { "type": "integer" },
      "reviews": { "type": "integer" },
      "name":    { "analyzer": "myAnalyzerDestinos", "type": "string" }
    }
  }
},

My query:

GET /business/_search
{
  "query": {
    "function_score": {
      "query": { "match": { "name": "sheraton" } },
      "script_score": {
        "script": "_score doc['reviews'].value",
        "lang": "groovy"
      }
    }
  }
}

Response:

{
  "error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[pGQYzpifRMumKUcblgTp2Q][business][0]: QueryPhaseExecutionException[[business][0]: query[function score (name:she name:sher name:shera name:sherat name:sherato name:sheraton,function=script[_score doc['reviews'].value], params [null])],from[0],size[10]: Query Failed [Failed to execute main query]]; nested: GroovyScriptExecutionException[MissingMethodException[No signature of method: org.elasticsearch.script.groovy.GroovyScriptEngineService$GroovyScript$UpdateableFloat.call() is applicable for argument types: (java.lang.Long) values: [11]\nPossible solutions: wait(long), wait(), abs(), any(), wait(long, int), and(java.lang.Number)]]; }{[pGQYzpifRMumKUcblgTp2Q][business][1]: QueryPhaseExecutionException[[business][1]: query[function score (name:she name:sher name:shera name:sherat name:sherato name:sheraton,function=script[_score doc['reviews'].value], params [null])],from[0],size[10]: Query Failed [Failed to execute main query]]; nested: GroovyScriptExecutionException[MissingMethodException[No signature of method: org.elasticsearch.script.groovy.GroovyScriptEngineService$GroovyScript$UpdateableFloat.call() is applicable for argument types: (java.lang.Long) values: [16]\nPossible solutions: wait(long), wait(), abs(), any(), wait(long, int), and(java.lang.Number)]]; }]",
  "status": 500
}

Why?

On Saturday, November 1, 2014 at 1:02:13 PM UTC-3, Ryan Ernst wrote:

The root cause of the error is here: ScriptException[dynamic scripting for [mvel] disabled]; I would guess you are running on ES 1.2 or 1.3? Dynamic scripting was disabled by default in 1.2, and for non-sandboxed languages in 1.3. In 1.4, the default script language was changed to Groovy, which is sandboxed and thus can be safely compiled dynamically.
See this blog for more details: http://www.elasticsearch.org/blog/scripting-security/ If running 1.3, you can simply change the language of the script:

GET /searchtube/_search
{
  "query": {
    "function_score": {
      "query": { "match": { "_all": "severed" } },
      "script_score": {
        "script": "_score * log(doc['likes'].value + doc['views'].value + 1)",
        "lang": "groovy"
      }
    }
  }
}

Although you could also use the expression lang for this simple script, which will be much faster! On Wednesday, October 29, 2014
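For reference, a sketch of the expressions variant Ryan mentions, applied to the reviews query above (assuming ES 1.4+, where "lang": "expression" is available out of the box):

GET /business/_search
{
  "query": {
    "function_score": {
      "query": { "match": { "name": "sheraton" } },
      "script_score": {
        "script": "_score * doc['reviews'].value",
        "lang": "expression"
      }
    }
  }
}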
Re: JsonObject to SortBuilder object
Here's a simpler way to ask the question. I've got this:

GeoDistanceSortBuilder sorter = new GeoDistanceSortBuilder("values.geo_location");
sorter.point(0.0, 0.0);
sorter.order(SortOrder.DESC);
sorter.unit(DistanceUnit.KILOMETERS);
sorter.geoDistance(GeoDistance.PLANE);
sorter.toString();

Which produces the string:

"_geo_distance" : { "values.geo_location" : [ 0.0, 0.0 ], "unit" : "km", "distance_type" : "plane", "reverse" : true }

I would like to do the opposite: I have the above string, and I want to turn it into a GeoDistanceSortBuilder without having to manually parse it.

On Monday, January 19, 2015 at 7:50:17 PM UTC-5, Alex Thurston wrote:
[quoted text hidden; the original post appears in full below]
JsonObject to SortBuilder object
I would like to turn an arbitrary JsonObject (which presumably follows the Search/Sort DSL) into a SortBuilder which can then be passed to SearchRequestBuilder::addSort. I've gotten this to work by simply parsing the JsonObject myself and calling the appropriate methods on the SortBuilder, but that means I have to implement the parsing for every variation of the DSL.

If I've got a Java JsonObject that looks like:

{ "first_name": "asc" }

OR

{ "first_name": { "order": "asc" } }

OR

{ "_geo_distance": { "my_position": { "order": "asc" } } }

all of which are valid JSON for the sort, I would imagine there's a way to call:

JsonObject sort_json = EXAMPLE FROM ABOVE;
SortBuilder sort = new SortBuilder();
sort.setSort(sort_json);

I'm almost certain I'm missing something, but can't for the life of me figure out how to do it. Thanks in advance.
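One workaround on the 1.x Java API, sketched under the assumption that you only need the sort applied to the request rather than an actual SortBuilder instance: pass the raw JSON through with setExtraSource, which merges an extra source fragment into the search body.

import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilders;

// sortJson is the arbitrary sort fragment discussed above
SearchRequestBuilder request = client.prepareSearch("my_index")
        .setQuery(QueryBuilders.matchAllQuery());
// wrap the raw fragment in a top-level "sort" array and merge it into the body
request.setExtraSource("{\"sort\":[" + sortJson.toString() + "]}");
SearchResponse response = request.execute().actionGet();

This sidesteps SortBuilder entirely, so it won't help if you need the parsed object itself.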
Re: Getting SQLFeatureNotSupportedException while connecting Hbase phoenix via river-jdbc
Rimita, this was fixed in Phoenix 3.1.0; please follow these instructions: http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html

On Thu, Dec 11, 2014 at 3:53 AM, cto@TCS rimita.mit...@gmail.com wrote:

Thank you so much.

On Thursday, December 11, 2014 12:49:49 PM UTC+5:30, cto@TCS wrote:

Hi, I have an HBase database and I use Phoenix as an RDBMS skin over it. Now I am trying to retrieve those data via Elasticsearch using the river-jdbc plugin. I am using the following:

1) elasticsearch-1.4.0
2) elasticsearch-river-jdbc-1.4.0.3.Beta1
3) phoenix-3.0.0-incubating-client
4) HBase 0.94.1

But I keep getting the following exception when I try to create a river:

[2014-12-11 12:34:48,957][INFO ][org.apache.zookeeper.ZooKeeper] Session: 0x14a380290fe001d closed
[2014-12-11 12:34:48,957][INFO ][org.apache.zookeeper.ClientCnxn] EventThread shut down
[2014-12-11 12:34:49,022][ERROR][river.jdbc.SimpleRiverSource] while opening read connection: jdbc:phoenix:localhost:2181 null
java.sql.SQLFeatureNotSupportedException
at org.apache.phoenix.jdbc.PhoenixConnection.setReadOnly(PhoenixConnection.java:587)
at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.getConnectionForReading(SimpleRiverSource.java:226)
at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.execute(SimpleRiverSource.java:376)
at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.fetch(SimpleRiverSource.java:320)
at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.fetch(SimpleRiverFlow.java:209)
at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.execute(SimpleRiverFlow.java:139)
at org.xbib.elasticsearch.plugin.jdbc.RiverPipeline.request(RiverPipeline.java:88)
at org.xbib.elasticsearch.plugin.jdbc.RiverPipeline.call(RiverPipeline.java:66)
at org.xbib.elasticsearch.plugin.jdbc.RiverPipeline.call(RiverPipeline.java:30)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2014-12-11 12:34:49,432][INFO ][river.jdbc.RiverMetrics ] pipeline org.xbib.elasticsearch.plugin.jdbc.RiverPipeline@700dd36f is running: river jdbc/myriver metrics: 0 rows, 0.0 mean, (0.0 0.0 0.0), ingest metrics: elapsed 9 seconds, 0.0 bytes bytes, 0.0 bytes avg, 0 MB/s

Please help!
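For reference, a minimal definition of the kind of river being created here (a sketch only: the river name, ZooKeeper address and SQL are placeholders, and the Phoenix 3.1.0+ client jar must be on the plugin's classpath):

curl -XPUT 'localhost:9200/_river/myriver/_meta' -d '{
  "type": "jdbc",
  "jdbc": {
    "url": "jdbc:phoenix:localhost:2181",
    "sql": "SELECT * FROM my_table"
  }
}'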
Re: Elasticsearch Maven plugin on GitHub
Here's my suggestion: modify StartElasticsearchNodeMojo#execute() and, right before the method returns (i.e. after the ES node is started), read a system property called waitIndefinitely. If the property is set, then wait indefinitely (see below for details), else continue the execution. You will have to provide that property when running maven, like mvn clean verify -DwaitIndefinitely, for the plugin to wait indefinitely. I hope that helps. Let me know if you need additional help.

To wait indefinitely: see the waitIndefinitely() method in http://svn.apache.org/viewvc/tomcat/maven-plugin/tags/tomcat-maven-plugin-2.0/tomcat7-maven-plugin/src/main/java/org/apache/tomcat/maven/plugin/tomcat7/run/AbstractRunMojo.java?view=markup

alex

On Thu, Dec 4, 2014 at 2:13 AM, Chetan Padhye chetanpad...@gmail.com wrote:

Hi, good plugin. I tried to run it, but it starts and then stops once the pom execution is finished. How can we modify the plugin to keep it running once started? My intention is to use this plugin for demo installations, so I can install an elasticsearch node and start it on any machine for my demo.

On Friday, 17 January 2014 06:14:22 UTC+5:30, AlexC wrote:

If anyone is interested in using a Maven plugin to run Elasticsearch for integration testing, I just published one on GitHub: https://github.com/alexcojocaru/elasticsearch-maven-plugin. It is an alternative to starting a node through the code. The readme should provide enough information, but let me know if something is missing or not clear enough. It uses ES v0.90.7, but it can be easily updated to the latest ES version by changing the dependency version in the pom.xml file. alex
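A sketch of what that could look like inside the Mojo, modeled on the Tomcat plugin linked above (the property name and the exact placement are assumptions):

// at the end of StartElasticsearchNodeMojo#execute(), after the node has started
if (Boolean.getBoolean("waitIndefinitely")) {
    waitIndefinitely();
}

private void waitIndefinitely() {
    Object lock = new Object();
    synchronized (lock) {
        try {
            // block the maven thread forever; Ctrl-C ends the build and the node
            lock.wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}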
Re: Is there any way to pass search query in POST parameter rather than body?
Thanks Kelsey, that could be useful. I managed to get my UI framework (ExtJS) to play better with POST, so I am not dependent on having to use GET any more.

On Wed Nov 26 2014 at 5:54:43 PM Kelsey Hamer kelsey.ha...@gmail.com wrote:

I had a similar issue. I managed to get the parameters from the POST by doing:

@Override
public void handleRequest(final RestRequest request, final RestChannel channel) {
    Map<String, String> params = new HashMap<String, String>();
    RestUtils.decodeQueryString(request.content().toUtf8(), 0, params);
    String paramValue = params.get("parameter");
    // DO SOMETHING
}

Notice that the JSON query you want to pass in doesn't need to be encoded on the client side (with an HTTP GET it needs to be). Hope that helps.

On Thursday, January 31, 2013 10:03:33 AM UTC-8, AlexR wrote:

I am already doing it with GET and the source parameter and it works well. One huge benefit is that the size and start (and hopefully sort, but I have not tested it yet) URL parameters override whatever is in source={...}, which is a big help integrating with UI components that manage paging and generate these HTTP parameters. Now the problem is that for all practical purposes URI length is limited to 2000 characters, so GET may very well fail with bigger queries (as I said, a query with facet-based filters, the facets themselves, and filters can get pretty long, plus of course URL-encoding of all the spaces and {}). I wish the same functionality were available via POST. Couldn't ES check the encoding in the POST header and, if it is application/x-www-form-urlencoded, just extract the encoded parameters and use the source parameter just like it does with GET? Do you think I should put an enhancement request into Git?
Elasticsearch fuzzy intersection of two arrays
I have an object in the Elasticsearch index that has a nested object which is a list of strings. I would like to do an intersection against this list in both exact and fuzzy ways. So for example I have browser names with versions in the index like:

"browsers": [{"name": "Chrome 38"}, {"name": "Firefox 32"}, {"name": "Safari 5"}]

The request could be:

[{"name": "Chrome 38"}, {"name": "IE 10"}]

Then I have just 1 exact match. Or another example:

[{"name": "Chrome 39"}, {"name": "Firefox 33"}, {"name": "Safari 5"}]

Here I have 2 fuzzy matches (Levenshtein distance = 2) and 1 exact match. Those results which have more matches should be on top. How would you write this kind of query?
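One possible shape for this, offered as a sketch (it assumes browsers is mapped as a nested type and that a fuzziness of 2 per clause is acceptable; the index name and values come from the example above): a nested query whose bool should clauses each fuzzy-match one requested browser, with score_mode sum so documents matching more clauses rank higher. Exact matches naturally outscore fuzzy ones, since fuzzy-expanded terms are scored with an edit-distance penalty.

GET /my_index/_search
{
  "query": {
    "nested": {
      "path": "browsers",
      "score_mode": "sum",
      "query": {
        "bool": {
          "should": [
            { "match": { "browsers.name": { "query": "Chrome 39",  "fuzziness": 2 } } },
            { "match": { "browsers.name": { "query": "Firefox 33", "fuzziness": 2 } } },
            { "match": { "browsers.name": { "query": "Safari 5",   "fuzziness": 2 } } }
          ]
        }
      }
    }
  }
}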
Kibana 4 - filters in a dashboard
Dear Elasticsearch team, could you please clarify a question regarding the Kibana 4 vs Kibana 3 dashboard feature? In a Kibana 3 dashboard I was able to interact with widgets, i.e. drill down into the data by clicking on basically any of the widgets to add more filters to the current dashboard. Is it supposed to work the same way in Kibana 4 as well? It does not work for me at all now, and I'm wondering whether it's something I can't figure out how to do, or something that's not implemented yet. Thanks in advance, Alex.
Re: unable to make snapshots to NFS filesystem
Ciprian, Thanks for your input - I had indeed missed that disk space failure and it turns out I was hitting an intermittent disk space issue.
unable to make snapshots to NFS filesystem
Hi all, I have been struggling to put together a backup solution for my ES cluster. As far as I understand the documentation at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html, I can't see why the following might be failing.

I have exported an NFS filesystem to both nodes of my 2-node ES cluster, mounted as /srv/backup. I created the elasticsearch user on the NFS server too, and then:

[root@back01 ~]# ls -ld /srv/backup/es_backup
drwxrwx---. 3 elasticsearch elasticsearch 4096 Sep 29 18:37 /srv/backup/es_backup

Start with a clean filesystem:

[root@logdata01 ~]# rm -rf /srv/backup/*

Register the backup area:

[root@logdata01 ~]# curl -s -XPUT http://localhost:9200/_snapshot/backup -d '{ "type": "fs", "settings": { "location": "/srv/backup" } }'
{"acknowledged":true}

Create a snapshot:

[root@logdata01 ~]# curl -XPUT 'localhost:9200/_snapshot/backup/tcom_snapshot?wait_for_completion=true&pretty'

I then get failures on various shards: https://gist.github.com/alexharv074/b4c7d35028c425f70f20

Any help on how I could get this cluster into a sane state that can be backed up would be greatly appreciated. Best regards, Alex
Re: unable to make snapshots to NFS filesystem
Thanks for responding. It doesn't seem to be a permissions problem:

[root@logdata01 ~]# ls -ld /srv/backup
drwxrwx---. 3 elasticsearch elasticsearch 4096 Sep 29 18:43 /srv/backup
[root@logdata01 ~]# find /srv/backup/ \! -user elasticsearch -or \! -group elasticsearch
[root@logdata01 ~]#
[root@logdata01 ~]# find /srv/backup -ls | head
1310734 drwxrwx---  3 elasticsearch elasticsearch 4096 Sep 29 18:43 /srv/backup
1310764 drwxr-xr-x 12 elasticsearch elasticsearch 4096 Sep 29 18:37 /srv/backup/indices
1310954 drwxr-xr-x  6 elasticsearch elasticsearch 4096 Sep 29 18:42 /srv/backup/indices/logstash-2014.09.28
1310968 -rw-r--r--  1 elasticsearch elasticsearch 4120 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/snapshot-tcom_snapshot
1311894 drwxr-xr-x  2 elasticsearch elasticsearch 4096 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/3
1311938 -rw-r--r--  1 elasticsearch elasticsearch 4443 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/3/__3
1312014 -rw-r--r--  1 elasticsearch elasticsearch  689 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/3/__b
1312024 -rw-r--r--  1 elasticsearch elasticsearch   61 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/3/__c
1312064 -rw-r--r--  1 elasticsearch elasticsearch  281 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/3/__g
1312004 -rw-r--r--  1 elasticsearch elasticsearch  349 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/3/__a

On Monday, September 29, 2014 8:02:42 PM UTC+10, Mark Walkom wrote:

Can you do an ls -ld /srv/backup and provide the output?

Regards, Mark Walkom
Infrastructure Engineer, Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 29 September 2014 18:45, Alex Harvey alexh...@gmail.com wrote:
[quoted text hidden; the original post appears in full above]
Re: Using ES as a primary datastore.
ES is a fantastic search engine, but there is some risk of data loss (http://aphyr.com/posts/317-call-me-maybe-elasticsearch) and a few other potential disadvantages (https://www.quora.com/Why-should-I-NOT-use-ElasticSearch-as-my-primary-datastore) which might or might not be relevant to you. You can always combine ES, via the JDBC river (https://github.com/jprante/elasticsearch-river-jdbc), with a stable, secure database, e.g. MySQL (https://www.quora.com/How-do-i-use-Elastic-search-with-mysql-database-I-am-currently-experimenting-with-jdbc-river-but-will-it-be-fast-enough-in-production) or HBase (http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html); since you have lots of data, HBase might be the better option.

On Wed, Sep 17, 2014 at 8:04 AM, Thomas thomas.bo...@gmail.com wrote:

Hi, you have to calculate the volume you will keep in one shard first, then break your volumes into the number of shards you will maintain, and then scale accordingly onto a number of nodes; or at least, as your volumes grow, you should grow your cluster as well. It is difficult to predict what problems may arise; your case is too generic. What will be the usage of the cluster? What queries will you perform? Will you mostly do indexing and only occasionally query, or will you query your data intensively? Most important, you need to think about how you will partition your data: will you have one index, multiple indices like a logstash approach, or something else? Maybe check here: https://www.found.no/foundation/sizing-elasticsearch/

For data more than a year old, what will you do, delete them? Can you afford to lose data? Will you keep backups? IMHO, these are some of the questions you must answer in order to see whether such an approach suits your needs. It comes down to hardware, and the structure and partitioning of your data. Thomas

On Wednesday, 17 September 2014 13:41:55 UTC+3, P Suman wrote:

Hello, we are planning to use ES as a primary datastore. Here is my use case: we receive a million transactions per day (all are inserts). Each transaction is around 500KB in size and has 10 fields; we should be able to search on all 10 fields. We want to keep around 1 year's worth of data, which comes to around 180TB. Can you please let me know any problems that might arise if I use elasticsearch as the primary datastore. Regards, Suman
Re: Is there any way to prevent ES from disclosing exception details in REST response?
Thanks Jorg, unfortunately it is not an option - we are not at liberty to touch anything beyond our app servers. We are using the transport-wares servlet for ES, and I could easily tweak AbstractServletRestChannel to handle the REST channel response with codes 400/500, but I would like to avoid modifying the code directly, and there is no way to do it nicely. I put a request on GitHub for enhancements to the NodeServlet, but was hoping ES might have an option to turn error details on/off. I think it would be nice to control the error level in REST responses with three levels: suppress/message/stack-trace.

On Mon, Sep 15, 2014 at 6:01 PM, joergpra...@gmail.com wrote:

You can put a reverse proxy like nginx between the ES cluster and the rest of the world and filter away all HTTP status 500 responses. Jörg

On Mon, Sep 15, 2014 at 11:57 PM, AlexR roytm...@gmail.com wrote:

We expose the ES _search endpoint directly to consumers. When our REST API gets scanned for security vulnerabilities, the scanner complains about ES returning exception details. For example, a malformed query will be included in the response along with the exception. While it is more or less harmless, the tool complains of various injections and internals disclosures. I would like to be able to turn the error message in the response off (or substitute it with a generic message) in production, while keeping the normal response logic in development. Is there any way I can do it?
Re: Is there any way to prevent ES from disclosing exception details in REST response?
I guess I could, but it would mean passing a response wrapper to capture the output stream and then copy it to the real response, or discard it in case of an error. That would be a second copy of the response - the first one being done in the NodeServlet - and will hurt performance for large responses :-(

On Mon, Sep 15, 2014 at 6:40 PM, joergpra...@gmail.com wrote:

Then why don't you simply add a servlet filter that filters unwanted responses away? Jörg

On Tue, Sep 16, 2014 at 12:21 AM, Alex Roytman roytm...@gmail.com wrote:
[quoted text hidden; see the earlier messages in this thread]
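For anyone who does accept the extra copy, a minimal sketch of such a filter (Servlet 3.0 style; the class names, the status threshold and the generic message are all assumptions, not transport-wares code):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintWriter;
import javax.servlet.*;
import javax.servlet.http.*;

public class ErrorMaskingFilter implements Filter {
    public void init(FilterConfig cfg) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletResponse http = (HttpServletResponse) res;
        BufferingWrapper wrapper = new BufferingWrapper(http);
        chain.doFilter(req, wrapper);
        wrapper.flushWriter();
        if (wrapper.getStatus() >= 400) {
            // hide the ES exception details behind a generic message
            http.setContentType("application/json");
            http.getWriter().write("{\"error\":\"request failed\"}");
        } else {
            // the second copy mentioned above
            wrapper.copyBodyTo(http.getOutputStream());
        }
    }
}

class BufferingWrapper extends HttpServletResponseWrapper {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private PrintWriter writer;

    BufferingWrapper(HttpServletResponse response) { super(response); }

    public ServletOutputStream getOutputStream() {
        return new ServletOutputStream() {
            public void write(int b) { buffer.write(b); }
        };
    }

    public PrintWriter getWriter() {
        if (writer == null) writer = new PrintWriter(buffer);
        return writer;
    }

    void flushWriter() { if (writer != null) writer.flush(); }
    void copyBodyTo(OutputStream out) throws IOException { buffer.writeTo(out); }
}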
Re: Linking of query/search
You can combine ES with an RDBMS and run your SQL queries either directly against the DB, or pull data via the JDBC River into ES. I wrote about it here: http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html On Fri, Sep 12, 2014 at 10:55 AM, Ivan Brusic i...@brusic.com wrote: You cannot join documents in Lucene/Elasticsearch (at least not like an RDBMS). You would need to either denormalize your data, join on the client side, or execute 2+ queries. -- Ivan On Fri, Sep 12, 2014 at 12:45 AM, matej.zerov...@gmail.com wrote: Hello! Can anyone shed some light on my question? Is the query in question achievable in ES directly? If not, I can probably do it in the application later, but it would be nicer if ES could serve me the final results. Matej -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAOtKWX623repUH5k2XbkFBFNu-b3cSKyObuyf793AVhOt3Gb-Q%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
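As a concrete illustration of the "join on the client side or execute 2+ queries" option Ivan mentions: a sketch against the ES 1.x Java TransportClient, with hypothetical authors/books indexes and an author_id field. This is just the two-query shape, not a drop-in solution.

import java.util.ArrayList;
import java.util.List;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

public class ClientSideJoin {
    public static void main(String[] args) {
        Client client = new TransportClient()
                .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));
        try {
            // Query 1: find the ids of the "parent" documents we care about.
            SearchResponse authors = client.prepareSearch("authors")
                    .setQuery(QueryBuilders.termQuery("country", "si"))
                    .setSize(100)
                    .execute().actionGet();

            List<String> authorIds = new ArrayList<String>();
            for (SearchHit hit : authors.getHits()) {
                authorIds.add(hit.getId());
            }

            // Query 2: fetch the related documents by those ids.
            SearchResponse books = client.prepareSearch("books")
                    .setQuery(QueryBuilders.termsQuery("author_id",
                            authorIds.toArray(new String[authorIds.size()])))
                    .execute().actionGet();

            System.out.println("joined hits: " + books.getHits().getTotalHits());
        } finally {
            client.close();
        }
    }
}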
Connecting Hbase to Elasticsearch
I posted step-by-step instructions here http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html on using Apache HBase/Phoenix with the Elasticsearch JDBC River. This might be useful to Elasticsearch users who want to use HBase as a primary data store, and to HBase users who wish to enable full-text search on their existing tables via the Elasticsearch API. Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAOtKWX4R81324NmKZou_zCT0e-DbFv%2BmWHg_pAinCmUapwyYcA%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
Re: Backup and restore using snapshots
I could still use feedback on this plan. On Sunday, August 31, 2014 9:08:12 PM UTC+10, Alex Harvey wrote: Hi all, I could use some help getting my head around the snapshot and restore functionality in ES. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6bea9d13-5137-41e6-842c-32fbe71c56b8%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Get distinct data
Hi all! I have a problem getting unique data from elasticsearch. I have the following documents:

[
  { "message": "Message 1", "author": { "id": 4, "name": "Author Name" }, "sourceId": 123456789, "userId": 123456 },
  { "message": "Message 1", "author": { "id": 4, "name": "Author Name" }, "sourceId": 123456789, "userId": 654321 }
]

The difference between these documents is the userId. When I send a query by author.id, I get a response with 2 documents. Can I get distinct data by the sourceId field? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1b59b2a2-484b-46cc-a95b-695e84e6d6eb%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Get distinct data
Hi Vineeth! Thanks for your answer. I use a terms aggregation, but I still get a response with 2 documents anyway. The response data, for example:

{
  "took": 23,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": null,
    "hits": [
      {
        "_index": "feeditem_local",
        "_type": "FeedItem",
        "_id": "53dbe9cf1d7859e15f8b4599",
        "_score": null,
        "_source": { "sourceId": 123456789, "message": "Message 1", "author": { "id": 120816414 }, "userId": 123456 },
        "sort": [1406921136000]
      },
      {
        "_index": "feeditem_local",
        "_type": "FeedItem",
        "_id": "53dbe9cf1d7859e15f8b4599",
        "_score": null,
        "_source": { "sourceId": 123456789, "message": "Message 1", "author": { "id": 120816414 }, "userId": 654321 },
        "sort": [1406921136000]
      }
    ]
  },
  "aggregations": {
    "source": {
      "buckets": [
        { "key": 123456789, "doc_count": 2 }
      ]
    }
  }
}

On Tuesday, September 2, 2014 at 9:45:41 AM UTC+3, vineeth mohan wrote: Hello Alex, The terms aggregation is here to save your day - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation Thanks Vineeth On Tue, Sep 2, 2014 at 12:07 PM, Alex T atr...@gmail.com wrote: Hi all! I have a problem getting unique data from elasticsearch. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8dccd0b8-972f-419c-bb94-3291d412844b%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
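The aggregation buckets above do deduplicate by sourceId; what is missing is a representative document per bucket. On ES 1.3+ one common way to get that is a top_hits sub-aggregation - a sketch using the index and field names from this thread, with "size": 0 so only the aggregation comes back (not tested against this exact mapping):

curl -XPOST 'localhost:9200/feeditem_local/_search' -d '{
  "size": 0,
  "query": { "term": { "author.id": 120816414 } },
  "aggs": {
    "by_source": {
      "terms": { "field": "sourceId" },
      "aggs": {
        "first_hit": { "top_hits": { "size": 1 } }
      }
    }
  }
}'

Each bucket then carries a single hit under first_hit.hits.hits, instead of the duplicates showing up in the main hits section.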
Backup and restore using snapshots
Hi all, I could use some help getting my head around the snapshot and restore functionality in ES. I have a requirement to do incremental daily tape backups and full backups weekly using EMC's Avamar backup software. I'd really appreciate it if someone could tell me whether the following plan is going to work:
1) Export an NFS filesystem from the storage node to both ES data nodes, and mount it as /mnt/backup on both nodes.
2) From one of the ES nodes, register this directory as the shared repository: curl -XPUT 'http://localhost:9200/_snapshot/backup' -d '{"type": "fs", "settings": {"location": "/mnt/backup"}}'
3) On Saturday, do a full backup:
i. Get a list of all snapshots in the repository: curl -XGET 'localhost:9200/_snapshot/backup/_all'
ii. Delete each of them with a command like: curl -XDELETE 'localhost:9200/_snapshot/backup/snapshot_20140830'
iii. Create a full backup: curl -XPUT localhost:9200/_snapshot/backup/snapshot_$(date +%Y%m%d)?wait_for_completion=true
iv. Copy the /mnt/backup directory to tape, telling Avamar to take a full backup.
4) On Sunday to Friday, do incremental backups based on the Saturday backup:
i. Simply run: curl -XPUT localhost:9200/_snapshot/backup/snapshot_$(date +%Y%m%d)?wait_for_completion=true
ii. Copy /mnt/backup to tape, telling Avamar to take an incremental backup.
Is this plan going to work? Is there a better way? Thanks very much in advance. Best regards, Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2a21ece7-533c-480e-9ac1-218d76d85385%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
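For the restore half (which the plan doesn't cover): once a tape copy is back in /mnt/backup, a snapshot can be restored through the same API - a sketch reusing the snapshot name from step 3.ii; note that any indices being restored over generally need to be closed or deleted first:

curl -XPOST 'localhost:9200/_snapshot/backup/snapshot_20140830/_restore?wait_for_completion=true'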
Re: Duplicate function MVEL script
The problem with MVEL is that you can't redefine an already-defined function within a script instance. The script class is instantiated once when the query starts, and is then executed again and again. MVEL is bad for complex scripting. Yes, you could use Groovy, and you should :) I found a good way to use it with the following code:

import groovy.lang.Script

class MyScript extends Script {
    def run() {
        // your code goes here; bound variables are also available here
    }
}

So how it works: 1. Groovy compiles this script and puts it in a class cache. 2. On each query a MyScript instance is created (one per node). 3. On each document the run() method is executed (it should provide the appropriate return value for a filter script, score script, sort script, or script fields). Alex On Wednesday, August 27, 2014 5:50:11 PM UTC+3, k...@stylelabs.com wrote: Hello, We are executing some concurrent updates on the same document using an MVEL script together with some parameters. The MVEL script contains some functions such as addRelations etc. but there is no sign of duplicate functions. ES throws the following error: [John Kafka][inet[/10.12.1.219:9300]][update]]; nested: ElasticsearchIllegalArgumentException[failed to execute script]; nested: *CompileException*[[Error: *duplicate function: addRelations*] [Near : {... def addRelations(relationNode, }] ^ [Line: 1, Column: 1] ES Version 1.3.2 If the updates are executed sequentially there is no error/problem with the MVEL script. Any ideas? Best Regards, Kristof -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e549c53b-90ae-41ca-b106-5c6e812417f7%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: groovy for scripting
Providing a self-update: I found that I could create a cross-request cache using the following script (like a cross-request incrementer): POST /test/_search { "query": {"match_all":{}}, "script_fields": { "a": { "script": "import groovy.lang.Script;class A extends Script{static i=0;def run() {i++}}", "lang": "groovy" } } } Formatted for readability, the script is:

import groovy.lang.Script

class A extends Script {
    static i = 0
    def run() {
        i++
    }
}

Note that the *i* variable here is not thread-safe, but the idea is clear - you define a class inherited from Script and implement the abstract run() method. This class is also accessible on each node thread. Now I'm looking for a solution to make a query-scoped counter (for a one-node configuration). I think it could be done by passing a unique query_id in the parameters, but I'm afraid of making the code non-thread-safe, or vice versa - thread-safe but with reduced performance. Researching more... -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fb402d2c-8820-4a1f-99e0-0453c0c82cf6%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
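If the non-thread-safety of the static i matters, java.util.concurrent is one way out. A sketch (untested) of the same script with an AtomicLong - written in Java syntax, which Groovy also accepts, so the class body could be pasted into the inline script as well. It is still a per-node counter, not the per-query counter being researched:

import groovy.lang.Script;
import java.util.concurrent.atomic.AtomicLong;

public class A extends Script {
    // One counter per node JVM; incrementAndGet is atomic across search threads.
    static final AtomicLong i = new AtomicLong();

    @Override
    public Object run() {
        return i.incrementAndGet();
    }
}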
groovy for scripting
I'm playing around with Groovy scripting. By checking the groovy-lang plugin source code I found the following steps in code execution: 1. Code compilation into a script class 2. Script initialization via the static method newInstance() 3. Script execution by calling the code on each document, with the document parameters bound. Now assume I have a class declaration in my script. Is it possible to execute the class definition and class object initialization only once, and execute only a method of this object on each document? Thanks P.S. posting the same on SO -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/bb3a9ca6-79fd-4ac6-ac78-ce0c102b9505%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: A few questions about node types + usage
Hello again Mark, Thanks for your response. Your answers really are very helpful. As with our previous conversation https://groups.google.com/d/topic/elasticsearch/ZouS4NVsTJw/discussion I am confused about how to make a client node also be master eligible. This is what I posted there, and I would really like some help understanding it: I've done more investigating and it seems that a Client (AKA Query) node cannot also be a Master node. As it says here http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#master-election *Nodes can be excluded from becoming a master by setting node.master to false. Note, once a node is a client node (node.client set to true), it will not be allowed to become a master (node.master is automatically set to false).* And from the elasticsearch.yml config file it says: *# 2. You want this node to only serve as a master: to not store any data and # to have free resources. This will be the coordinator of your cluster. # #node.master: true #node.data: false # # 3. You want this node to be neither master nor data node, but # to act as a search load balancer (fetching data from nodes, # aggregating results, etc.) # #node.master: false #node.data: false* So I'm wondering how exactly you set up your client nodes to also be master nodes. It seems like a master node can only either be purely a master or master + data. Perhaps you could show the relevant parts of one of your client node's config? Many thanks, Alex On Saturday, 16 August 2014 01:04:37 UTC+1, Mark Walkom wrote: 1 - Up to you. We use the http output and then just use a round robin A record to our 3 masters. 2 - They are routed but it makes more sense to specify. 3 - You're right, but most people only use 1 or 2 masters which is why they get recommended to have at least 3. 4 - That sounds like a lot. We use masters that double as clients and they only have 8GB, our use sounds similar and we don't have issues. I wouldn't bother with 3 client-only nodes to start; use them as master and client and then if you find you are hitting memory issues due to queries you can re-evaluate things. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 15 August 2014 20:11, Alex alex@gmail.com wrote: Bump. Any help? Thanks On Wednesday, 13 August 2014 12:10:14 UTC+1, Alex wrote: Hello, I would like some clarification about node types and their usage. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
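To make the role combinations in this thread concrete: the setup Mark describes - master-eligible query nodes that hold no data - is expressed with node.master/node.data alone, leaving node.client unset (since node.client: true is what forces node.master to false). A sketch of the three elasticsearch.yml variants being discussed:

# master + query node: master-eligible, holds no data
node.master: true
node.data: false

# data node: holds data, never elected master
# node.master: false
# node.data: true

# pure search load balancer: neither master nor data
# node.master: false
# node.data: false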
Re: A few questions about node types + usage
Bump. Any help? Thanks On Wednesday, 13 August 2014 12:10:14 UTC+1, Alex wrote: Hello, I would like some clarification about node types and their usage. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/70b16a1e-319c-4f7c-b129-b68258b3652f%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Recommendations needed for large ELK system design
Hi Mark, I've done more investigating and it seems that a Client (AKA Query) node cannot also be a Master node. As it says here http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#master-election *Nodes can be excluded from becoming a master by setting node.master to false. Note, once a node is a client node (node.client set to true), it will not be allowed to become a master (node.master is automatically set to false).* And from the elasticsearch.yml config file it says: *# 2. You want this node to only serve as a master: to not store any data and # to have free resources. This will be the coordinator of your cluster. # #node.master: true #node.data: false # # 3. You want this node to be neither master nor data node, but # to act as a search load balancer (fetching data from nodes, # aggregating results, etc.) # #node.master: false #node.data: false* So I'm wondering how exactly you set up your client nodes to also be master nodes. It seems like a master node can only either be purely a master or master + data. Regards, Alex On Thursday, 31 July 2014 23:57:26 UTC+1, Mark Walkom wrote: 1 - Curator FTW. 2 - Masters handle cluster state, shard allocation and a whole bunch of other stuff around managing the cluster and its members and data. A node that is master and data set to false is considered a search node. But the role of being a master is not onerous, so it made sense for us to double up the roles. We then just round robin any queries to these three masters. 3 - Yes, but it's entirely dependent on your environment. If you're happy with that and you can get the go-ahead then see where it takes you. 4 - Quorum is automatic and having the n/2+1 means that the majority of nodes will have to take part in an election, which reduces the possibility of split brain. If you set the discovery settings then you are also essentially setting the quorum settings. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 31 July 2014 22:27, Alex alex@gmail.com wrote: Hello Mark, Thank you for your reply, it certainly helps to clarify many things. Of course I have some new questions for you! 1. I haven't looked into it much yet but I'm guessing Curator can handle different index naming schemes. E.g. logs-2014.06.30 and stats-2014.06.30. We'd actually be wanting to store the stats data for 2 years and logs for 90 days so it would indeed be helpful to split the data into different index sets. Do you use Curator? 2. You say that you have 3 masters that also handle queries... but I thought all masters did was handle queries? What is a master node that *doesn't* handle queries? Should we have search load balancer nodes? AKA not master and not data nodes. 3. In the interests of reducing the number of node combinations for us to test out would you say, then, that 3 master (and query(??)) only nodes, and the 6 1TB data only nodes would be good? 4. Quorum and split brain are new to me. This webpage http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/ about split brain recommends setting *discovery.zen.minimum_master_nodes* equal to *N/2 + 1*. This formula is similar to the one given in the documentation for quorum http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency: index operations only succeed if a quorum (replicas/2+1) of active shards are available. I completely understand the split brain issue, but not quorum. Is quorum handled automatically or should I change some settings? Thanks again for your help, we appreciate your time and knowledge! Regards, Alex On Thursday, 31 July 2014 05:57:35 UTC+1, Mark Walkom wrote: 1 - Looks ok, but why two replicas? You're chewing up disk for what reason? Extra comments below. 2 - It's personal preference really and depends on how your end points send to redis. 3 - 4GB for redis will cache quite a lot of data if you're only doing 50 events p/s (ie hours or even days based on what I've seen). 4 - No, spread it out to all the nodes. More on that below though. 5 - No it will handle that itself. Again, more on that below though. Suggestions; Set your indexes to (factors of) 6 shards, ie one per node, it spreads query performance. I say factors of in that you can set it to 12 shards per index to start and easily scale the node count and still spread the load. Split your stats and your log data into different indexes, it'll make management and retention easier. You can consider a master only node or (ideally) three that also handle queries. Preferably have an uneven number of master eligible nodes, whether you make them VMs or physicals, that way you can ensure quorum is reached with minimal fuss and stop split brain.
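Mark's "factors of 6 shards" and split-indexes suggestions above are usually applied through index templates rather than per-index settings; a sketch with hypothetical names matching the logs-*/stats-* naming scheme discussed in this thread:

curl -XPUT 'localhost:9200/_template/logs' -d '{
  "template": "logs-*",
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 1
  }
}'

and similarly a stats template with "template": "stats-*", so each new daily index picks up the shard count automatically.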
A few questions about node types + usage
Hello, I would like some clarification about node types and their usage. We will have 3 client nodes and 6 data nodes. The 6 1TB data nodes can also be masters (discovery.zen.minimum_master_nodes set to 4). We will use Logstash and Kibana. Kibana will be used 24/7 by between a couple and a handful of people. Some questions: 1. Should incoming Logstash write requests be sent to the cluster in general (using the *cluster* setting in the *elasticsearch* output) or specifically to the client nodes or to the data nodes (via a load balancer)? I am unsure what kind of node is best for handling writes. 2. If client nodes exist in the cluster are Kibana requests automatically routed to them? Do I need to somehow specify to Kibana which nodes to contact? 3. I have heard different information about master nodes and the minimum_master_nodes setting. I've heard that you should have an odd number of master nodes, but I fail to see why the parity of the number of masters matters as long as minimum_master_nodes is set to at least N/2 + 1. Does it really need to be odd? 4. I have been advised that the client nodes will use a huge amount of memory (which makes sense due to the nature of the Kibana facet queries). 64GB per client node was recommended but I have no idea if that sounds right or not. I don't have the ability to actually test it right now so any more guidance on that would be helpful. I'd be so grateful to hear from you even if you only know something about one of my queries. Thank you for your time, Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fe5adb02-5cd6-4554-8993-28b8e24160fc%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
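For question 1, the choice mostly comes down to a few lines in the Logstash output. A sketch using the http protocol pointed at a load-balanced name in front of the client nodes (the hostname is hypothetical; with protocol => "node" or "transport" the *cluster* setting is what gets used instead):

output {
  elasticsearch {
    protocol => "http"
    host     => "es-client-lb.example.com"   # load balancer in front of the client nodes
    index    => "logs-%{+YYYY.MM.dd}"
  }
}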
LookupScript per shard modification (native scripting)
Hi, I'm trying the LookupScript example here: https://github.com/imotov/elasticsearch-native-script-example/blob/master/src/main/java/org/elasticsearch/examples/nativescript/script/LookupScript.java The idea of my script is to pre-cache all child documents in a LookupScript instance, but I want to query only the current shard's data. Is that possible? That way every shard instance caches only its own documents. Regards, Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/bada48a1-7b74-41a0-81a9-564b5061b605%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Cosine Similarity ElasticSearch
Hi, I found some native script code from Igor Motov here: https://github.com/imotov/elasticsearch-native-script-example/blob/master/src/main/java/org/elasticsearch/examples/nativescript/script/CosineSimilarityScoreScript.java and am now playing with it. Alex On Friday, August 1, 2014 11:53:24 AM UTC+3, Federico Bianchi wrote: Is there someone who can help us? Thank you very much! -- View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Cosine-Similarity-ElasticSearch-tp4060620p4061039.html Sent from the ElasticSearch Users mailing list archive at Nabble.com. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5b9025dd-0173-4b09-ae09-31a2f78e99d7%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Recommendations needed for large ELK system design
Ok thank you Mark, you've been extremely helpful and we now have a better idea about what we're doing! -Alex On Thursday, 31 July 2014 23:57:26 UTC+1, Mark Walkom wrote: 1 - Curator FTW.
3rd party scoring service
Hello, My idea is to use a 3rd party scoring service (REST); currently I'd like to use native scripts and play with NativeScriptFactory. The approach has many drawbacks. Here is my problem - assume we have two entities - products and product prices. I need to filter by price. Price is a complex thing, because it depends on many factors, like the request date, remote user information, and custom provided parameters. With a regular parent-child relation and a has_child query it's too complex and too slow to implement using scripting (currently MVEL). One more condition - I don't have many products - around 25K - but around 25M different base price items (which are the basis for the price calculation). Here are my ideas: 1. Have a service which returns the exact price for every product given custom parameters. The drawback is that there would be 5 identical calls, one from each shard (with the default of 5 shards). In this case it doesn't matter where the base prices are stored - in an elasticsearch index, in a database, or in in-memory storage. 2. Write code which operates over the child price documents on a concrete shard. In this case it would generate prices only for the products on that particular shard. But I don't know if I can access the shard's index, or make calls to the index from a concrete shard, in a NativeScriptFactory class. Could you point me in the right direction? P.S. Initially I was interested in the Redis-Elasticsearch example http://java.dzone.com/articles/connecting-redis-elasticsearch Thanks, Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/893b22dc-1415-475b-8675-596119f4f1f8%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
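On the NativeScriptFactory mechanics: here is a rough ES 1.x sketch following the pattern in Igor Motov's native-script examples repository (linked elsewhere in this digest). It does not answer the shard-local index access question, but it shows where per-document work plugs in - each shard executes its own instance against its local documents. The base_price field and the discount parameter are invented for illustration:

import java.util.Map;

import org.elasticsearch.common.Nullable;
import org.elasticsearch.script.AbstractSearchScript;
import org.elasticsearch.script.ExecutableScript;
import org.elasticsearch.script.NativeScriptFactory;

// Hypothetical per-document price calculation running on the shard.
public class PriceScript extends AbstractSearchScript {

    public static class Factory implements NativeScriptFactory {
        @Override
        public ExecutableScript newScript(@Nullable Map<String, Object> params) {
            double discount = params == null || params.get("discount") == null
                    ? 0.0 : ((Number) params.get("discount")).doubleValue();
            return new PriceScript(discount);
        }
    }

    private final double discount;

    private PriceScript(double discount) {
        this.discount = discount;
    }

    @Override
    public Object run() {
        // Reads the hypothetical base_price field's doc values for the current document.
        double base = docFieldDoubles("base_price").getValue();
        return base * (1.0 - discount);
    }
}

It would be registered from a plugin via module.registerScript("price", PriceScript.Factory.class) in onModule(ScriptModule), and invoked with "lang": "native", "script": "price" plus params.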
Re: 3rd party scoring service
I think it's acceptable if the service responds within 20ms, using some thrift protocol for example. It's much better than the current 500ms - 5s calculations using elasticsearch scripting. If we have 25K products then it would be around a 300Kb data package from this service. The risk is in possibly broken communication or increased latency. Alex On Thursday, July 31, 2014 1:59:36 PM UTC+3, Itamar Syn-Hershko wrote: You should bring the price over to Elasticsearch and not the other way around. Scoring against an external service is added friction with huge performance costs. -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer Consultant Author of RavenDB in Action http://manning.com/synhershko/ On Thu, Jul 31, 2014 at 1:50 PM, Alex S.V. alexs.v...@gmail.com wrote: Hello, My idea is to use a 3rd party scoring service (REST); currently I'd like to use native scripts and play with NativeScriptFactory. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c61f9637-3de8-4906-a2c4-49055dee2cd5%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Recommendations needed for large ELK system design
Hello Mark, Thank you for your reply, it certainly helps to clarify many things. Of course I have some new questions for you! 1. I haven't looked into it much yet but I'm guessing Curator can handle different index naming schemes. E.g. logs-2014.06.30 and stats-2014.06.30. We'd actually be wanting to store the stats data for 2 years and logs for 90 days so it would indeed be helpful to split the data into different index sets. Do you use Curator? 2. You say that you have 3 masters that also handle queries... but I thought all masters did was handle queries? What is a master node that *doesn't* handle queries? Should we have search load balancer nodes? AKA not master and not data nodes. 3. In the interests of reducing the number of node combinations for us to test out would you say, then, that 3 master (and query(??)) only nodes, and the 6 1TB data only nodes would be good? 4. Quorum and split brain are new to me. This webpage http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/ about split brain recommends setting *discovery.zen.minimum_master_nodes* equal to *N/2 + 1*. This formula is similar to the one given in the documentation for quorum http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency: index operations only succeed if a quorum (replicas/2+1) of active shards are available. I completely understand the split brain issue, but not quorum. Is quorum handled automatically or should I change some settings? Thanks again for your help, we appreciate your time and knowledge! Regards, Alex On Thursday, 31 July 2014 05:57:35 UTC+1, Mark Walkom wrote: 1 - Looks ok, but why two replicas? You're chewing up disk for what reason? Extra comments below. 2 - It's personal preference really and depends on how your end points send to redis. 3 - 4GB for redis will cache quite a lot of data if you're only doing 50 events p/s (ie hours or even days based on what I've seen). 4 - No, spread it out to all the nodes. More on that below though. 5 - No it will handle that itself. Again, more on that below though. Suggestions; Set your indexes to (factors of) 6 shards, ie one per node, it spreads query performance. I say factors of in that you can set it to 12 shards per index to start and easily scale the node count and still spread the load. Split your stats and your log data into different indexes, it'll make management and retention easier. You can consider a master only node or (ideally) three that also handle queries. Preferably have an uneven number of master eligible nodes, whether you make them VMs or physicals, that way you can ensure quorum is reached with minimal fuss and stop split brain. If you use VMs for master + query nodes then you might want to look at load balancing the queries via an external service. To give you an idea, we have a 27 node cluster - 3 masters that also handle queries and 24 data nodes. Masters are 8GB with small disks, data nodes are 60GB (30 heap) and 512GB disk. We're running with one replica and have 11TB of logging data. At a high level we're running out of disk more than heap or CPU and we're very write heavy, with an average of 1K events p/s and comparatively minimal reads. 
Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 31 July 2014 01:35, Alex alex@gmail.com wrote: Hello, We wish to set up an entire ELK system with the following features:
Recommendations needed for large ELK system design
Hello, We wish to set up an entire ELK system with the following features: - Input from Logstash shippers located on 400 Linux VMs. Only a handful of log sources on each VM. - Data retention for 30 days, which is roughly 2TB of data in indexed ES JSON form (not including replica shards). - An estimated input data rate of 50 messages per second at peak hours. Mostly short or medium-length one-line messages, but there will be Java traces and very large service responses (in the form of XML) to deal with too. - The entire system would be on our company LAN. - The stored data will be a mix of application logs (info, errors etc) and server stats (CPU, memory usage etc) and would mostly be accessed through Kibana. This is our current plan: - Have the LS shippers perform minimal parsing (but they would do multiline). Have them point to two load-balanced servers containing Redis and LS indexers (which would do all parsing). - 2 replica shards for each index, which ramps the total data storage up to 6TB. - An ES cluster spread over 6 nodes. Each node is 1TB in size. - LS indexers pointing to the cluster. So I have a couple of questions regarding the setup and would greatly appreciate the advice of someone with experience! 1. Does the balance between the number of nodes, the number of replica shards, and the storage size of each node seem about right? We use high-performance equipment and would expect minimal downtime. 2. What is your recommendation for the system design of the LS indexers and Redis? I've seen various designs with each indexer assigned to a single Redis, or all indexers reading from all Redises. 3. Leading on from the previous question, what would your recommended data size for the Redis servers be? 4. Not sure what to do about master/data nodes. Assuming all the nodes are on identical hardware, would it be beneficial to have a node which is only a master and would only handle requests? 5. Do we need to do any additional load balancing on the ES nodes? We are open to any and all suggestions. We have not yet committed to any particular design so can change if needed. Thank you for your time and responses, Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b0aee66a-35bb-4770-927b-d9c7e13ad9fc%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
When to use multiple clusters
I have several large indices (100M docs) on the same cluster. Is there any advice on when it is appropriate to separate into multiple clusters vs one large one? Each index has a slightly different usage profile (read- vs write-heavy, update vs insert). How many indices would you recommend for a single cluster? Is it OK to have many large indices on the same cluster? Thanks! -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/24ebf4dc-f281-4574-8cbb-cb049c4fac71%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: When to use multiple clusters
Thanks Mark! We're deploying on EC2 (always a good time). It seems like the mix of indices with different usage profiles is leading to performance issues for which a dedicated cluster would be more appropriate. On Wednesday, July 23, 2014 7:04:34 PM UTC-4, Mark Walkom wrote: Depends what your hardware profiles are like, and a bunch of other things related to you and your environment. E.g. if you have high-end servers then it makes sense to put your heavy read/write indexes into a cluster on those, then leave the rest for more average machines. We have multiple clusters based on use. One for application text-based search, one for application logging, one for system logging, and we're going to spin up another one for a new project we're starting. This might sound like a waste of resources, and it probably is to a degree, but we have the infrastructure for it and it makes things easier to manage. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 24 July 2014 00:34, Alex Kehayias al...@shareablee.com wrote: I have several large indices (100M docs) on the same cluster. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6571673c-472f-4013-9608-d511a9f66d86%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
elasticsearch dynamic scripting vs static script - deployment
Hi, We've also been hacked on our staging server because of open ports :) I find dynamic scripting flexible for applications, but static scripting causes a bunch of problems: 1. I have to deploy scripts to a special directory on each elasticsearch node? We are using capistrano for web-app deployment and it's an easy procedure, though we would have to provide additional access to the elasticsearch nodes' filesystems. 2. I don't know how to handle script versions - just append _v1, _v2, etc. suffixes to the filename? 3. Should I deploy to one node, or to each node? If I must deploy to each node - what happens if one node has a script and another doesn't? Regards, Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/20681e2f-bb8b-4602-8b19-ed27b661a88b%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
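For what it's worth, a sketch of the file-script mechanics being asked about (paths and names are hypothetical): scripts live under config/scripts on every node that can execute them and are referenced by file name without the extension. As far as I know, a node missing the file will fail shard requests that use it, so the deploy really does have to reach each node (question 3); a crude versioning scheme is a suffix in the file name (question 2):

# on every node: config/scripts/adjusted_price_v1.groovy
doc['price'].value * factor

# and at query time:
curl -XPOST 'localhost:9200/myindex/_search' -d '{
  "query": { "match_all": {} },
  "script_fields": {
    "adjusted": {
      "script": "adjusted_price_v1",
      "lang": "groovy",
      "params": { "factor": 0.9 }
    }
  }
}'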
help resolving a classpath problem(?) with elasticsearch 1.2.1 in apache-storm
I'm using apache-storm as a data pipeline that indexes results with elasticsearch. Using the latest versions I can find of all components, I get an error any time a storm component attempts to join elasticsearch as a Node client (which I believe will give me better performance than TransportClient): Caused by: java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Direct' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [XBloomFilter, es090, completion090] According to https://github.com/elasticsearch/elasticsearch/issues/3350 this is just how the SPI loader mechanism that Lucene uses works. I tried following the directions in the issue, but even with the shade plugin I'm still seeing the same thing. Does anyone have experience with this who can share a pom.xml snippet or point me to some applicable docs? -alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9b9b9e6c-ea3d-4de9-b7aa-53c4bbd40586%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
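On the pom.xml request: the usual cause is that several Lucene jars each ship their own META-INF/services/org.apache.lucene.codecs.PostingsFormat file, and a naive uber-jar keeps only one of them. A sketch of the commonly used fix (the surrounding executions/configuration will differ per project) is the shade plugin's ServicesResourceTransformer, which concatenates the service files instead; the 'Direct' postings format itself lives in the lucene-codecs artifact, so that jar needs to be a dependency too:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- merge META-INF/services entries from all jars instead of keeping the first -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>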
Re: Need help on similarity ranking approach
Hello, I am not sure that would work. I'd first index your document, and then use mlt with this document id and include set to true (added in the latest ES release). Then you'll know how far your documents are from the queried document. Also, make sure to pick up most of the terms by setting percent_terms_to_match=0, max_query_terms=<some high value> and min_doc_freq=1. In order to know which terms from the queried document matched in the response, you can use explain. Alex On Thursday, May 29, 2014 10:42:47 AM UTC+2, Rgs wrote: hi, What I did now is: I have created custom similarity and similarity provider classes which extend DefaultSimilarity and AbstractSimilarityProvider respectively, and overrode the idf() method to return 1. Now I'm getting some percentage values like 1, 0.987, 0.876 etc. and interpret them as 100%, 98%, 87% etc. Can you please confirm whether this approach can be used for finding the percentage of similarity? sorry for the late reply. Thanks Rgs -- View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Need-help-on-similarity-ranking-approach-tp4054847p4056680.html Sent from the ElasticSearch Users mailing list archive at Nabble.com. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/184a015f-fe68-4a24-999b-367d60d23798%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
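Spelled out as a query, the suggestion above corresponds to something like the following (index, field and id are placeholders; percent_terms_to_match is the ES 1.x parameter name):

curl -XPOST 'localhost:9200/myindex/_search?explain=true' -d '{
  "query": {
    "more_like_this": {
      "fields": ["body"],
      "ids": ["the-document-id"],
      "include": true,
      "percent_terms_to_match": 0,
      "min_term_freq": 1,
      "min_doc_freq": 1,
      "max_query_terms": 1000
    }
  }
}'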
Re: Need help on similarity ranking approach
Also, this plugin could provide a solution to your problem: http://yannbrrd.github.io/ On Thursday, May 29, 2014 10:42:47 AM UTC+2, Rgs wrote: hi, What I did now is: I have created custom similarity and similarity provider classes which extend DefaultSimilarity and AbstractSimilarityProvider respectively, and overrode the idf() method to return 1. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d4a2ee12-b9af-4142-a2e9-71b85cc9141c%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Queries seem to ignore custom analyzer
I'm using a custom analyzer to stem possessive English. My custom analyzer seems to be ignored. As a sample search, we'll use McDonald's. What I used to create my analyzer: { settings: { analysis: { analyzer: { default: { type: custom, tokenizer: standard, filter: [ standard, lowercase, stop, pos_english ] } }, filter: { pos_english: { type: stemmer, name: possessive_english } } } } } My mapping: { item: { _boost: { name: custom_boost, null_value: 1 }, properties: { servings: { enabled: false, type: object }, brand_name: { index: analyzed, type: string, store: false }, food_name: { index: analyzed, type: string, store: false } } } } When I test the analyzer on the text 'McDonald's', it seems to work properly: { tokens: [ { token: mcdonald, start_offset: 0, end_offset: 10, type: ALPHANUM, position: 1 } ] } However, if I search for 'McDonald', I get no results. If I search for 'McDonald's' (with the possessive), I get my expected results. It seems like the analyzer is being ignored during the query. Search query that returns no results: { query: { match: { _all: { query: mcdonalds } } } } Any idea what I'm doing wrong? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/93e3e3b3-1fc7-443d-970e-47bb43c757e4%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
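(The analyzer test above can be reproduced with the _analyze API - index name invented; an analyzer registered as default is picked up automatically:)
curl 'localhost:9200/myindex/_analyze?text=McDonald%27s'
A hedged observation: running the same call with text=mcdonalds should return the token mcdonalds, not mcdonald, because the possessive_english stemmer only strips an apostrophe-s. If so, the query string mcdonalds can never match the indexed token mcdonald even when the analyzer is applied, which may be what is happening here rather than the analyzer being ignored.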
Object interpolation in template queries
Hello, I'm interested in using query templates - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-template-query.html - however for my purposes I was hoping ES treated queries as simple strings for the purpose of mustache interpretation, and I would like to be able to substitute in parameters more complex than partial strings - for example to define a param as the contents of a value and then pass in an arbitrary object. I.e. something along the lines of { query: { template: { query: { filtered: { filter : { and : {{ filters }} } } }, params : { filters : [ {terms : { foo : [a,b ] } }, {terms : { bar : [q,z ] } } ] } } } } Experimentation suggests this isn't supported, but I understand that the query templates system is somewhat under construction or review - are there plans to offer support for passing in entire parts of queries via params, or should I look at doing this kind of interpolation before the query gets to ES? Or is this possible and I'm simply doing it wrong? Thanks, Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8b83afe9-aa92-4751-8178-2c33bbc94428%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
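(For contrast, the form that is documented to work substitutes scalar values only - a minimal sketch, field name and value invented:)
{ "query": { "template": { "query": { "match": { "foo": "{{param}}" } }, "params": { "param": "bar" } } } }
Anything structured - arrays of filter clauses, whole subqueries - appears to need interpolating client-side before the query is sent, as the poster suspects.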
Re: more like this on numbers
Hi Valentin, For these types of searches, have you looked into range queries, perhaps combined in a boolean query? Alex On May 7, 2014 4:14 PM, Valentin plet...@gmail.com wrote: Hi Alex, thanks. Good idea to convert the numbers into strings. But converting the number fields to string won't exactly solve my problem. Only if there were an analyzer which breaks down numbers into multiple tokens, e.g. 300 into 100, 200, 300. Cheers, Valentin On Tuesday, May 6, 2014 12:04:53 PM UTC+2, Alex Ksikes wrote: Hi Valentin, As you know, you can only perform mlt on fields which are analyzed. However, you can convert your other fields (number, ..) to text using a multi field with type string at indexing time. Cheers, Alex On Thursday, March 27, 2014 4:31:58 PM UTC+1, Valentin wrote: Hi, as far as I understand it the more like this query allows to find documents where the same tokens are used. I wonder if there is a possibility to find documents where a particular field is compared based on its value (number). Regards Valentin PS: elasticsearch rocks! -- You received this message because you are subscribed to a topic in the Google Groups elasticsearch group. To unsubscribe from this topic, visit https://groups.google.com/d/topic/elasticsearch/Wsye6JD__ys/unsubscribe. To unsubscribe from this group and all its topics, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/195f8fa2-821f-4556-b9ae-8924b35c859f%40googlegroups.com. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAMrXmPdWStJjTaW5%3D27MrMNLHPkK1hihgrs%3DDs-SAiHzHz9eAQ%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
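(Along the lines Alex suggests, a sketch - field name and bounds invented - that scores documents whose numeric field is near a reference value of 300, with a tighter, more highly weighted band and a wider, lower-weighted one, loosely imitating more-like-this for numbers:)
{ "query": { "bool": { "should": [ { "range": { "price": { "gte": 250, "lte": 350 } } }, { "range": { "price": { "gte": 100, "lte": 500, "boost": 0.5 } } } ] } } }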
Re: MoreLikeThis can't identify that 2 documents with exactly same attachments are duplicates
On May 8, 2014 8:09 AM, Zoran Jeremic zoran.jere...@gmail.com wrote: Hi Alex, Thank you for this explanation. It really helped me to understand how it works, and now I managed to get the results I was expecting just after setting the max_query_terms value to 0 or some very high value. With these results I was able to identify duplicates in my tests. I noticed a couple of things though. - I got much better results with web pages when I indexed the attachment as html source and used text extracted by Jsoup in the query, than when I indexed text extracted from the web page as the attachment and used that text in the query. I suppose the difference is related to the fact that Jsoup does not extract text in the same way as the Tika parser used by ES does. - There was a significant improvement in the results in the second test, where I had indexed 50 web pages, over the first test, where I indexed 10 web pages. I deleted the index before each test. I suppose this is related to tf*idf. If so, does it make sense to provide some training set for elasticsearch to populate the index with before the system starts to be used? Perhaps you are asking for a background dataset to bias the selection of interesting terms. This could make sense depending on your application. Could you please define relevant in your setting? In a corpus of very similar documents, is your goal to find the ones which are oddly different? Have you looked into ES significant terms? I have a service that recommends documents to students based on their current learning context. It creates a tokenized string from the titles, descriptions and keywords of the course lessons the student is working on at the moment. I'm using this string as input to mlt_like_text to find some interesting resources that could help them. I want to avoid having duplicates (or very similar documents) among the top recommended documents. My idea was that during document upload (before I index it with elasticsearch) I check whether a duplicate already exists, and store this information as an ES document field. Later, in the query, I can specify that duplicates are not recommended. Here you should probably strip the html tags, and solely index the text in its own field. As I already mentioned, this didn't give me good results for some reason. Do you think this approach would work fine with large textual documents, e.g. pdf documents of a couple of hundred pages? My main concern is the performance of these queries using like_text, which is why I was trying to avoid this approach and use mlt with a document id as input. I don't think this approach would work well in this case, but you should try. I think what you are after is to either extract good features from your PDF documents and search on those, or fingerprinting. This could be achieved by playing with analyzers. Thanks, Zoran On Wednesday, 7 May 2014 06:14:56 UTC-7, Alex Ksikes wrote: Hi Zoran, In a nutshell 'more like this' creates a large boolean disjunctive query of 'max_query_terms' number of interesting terms from a text specified in 'like_text'. The interesting terms are picked up with respect to their tf-idf scores in the whole corpus. These latter parameters can be tuned with the 'min_term_freq', 'min_doc_freq', and 'max_doc_freq' parameters. The number of boolean clauses that must match is controlled by 'percent_terms_to_match'. 
In the case of specifying only one field in 'fields', the analyzer used to pick up the terms in 'like_text' is the one associated with the field, unless overridden by 'analyzer'. So as an example, the default is to create a boolean query of 25 interesting terms where only 30% of the should clauses must match. On Wednesday, May 7, 2014 5:14:11 AM UTC+2, Zoran Jeremic wrote: Hi Alex, If you are looking for exact duplicates then hashing the file content, and doing a search for that hash would do the job. This trick won't work for me, as these are not exact duplicates. For example, I have 10 students working on the same 100 pages long word document. Each of these students could change only one sentence and upload the document. The hash will be different, but it's 99.99% the same document. I have another service that uses mlt_like_text to recommend some relevant documents, and my problem is that if this document has the best score, then all duplicates will be among the top hits, and instead of recommending several of the most relevant documents I will recommend 10 instances of the same document. Could you please define relevant in your setting? In a corpus of very similar documents, is your goal to find the ones which are oddly different? Have you looked into ES significant terms? If you are looking for near duplicates, then I would recommend extracting whatever text you have in your html, pdf, doc, indexing that and running more like this with like_text set to that content. I tried that as well, and the results are very disappointing, though I'm not sure if that would
Re: How to find the difference between aggregate min from aggregate max(max - min) in ES?
Thank you Adrien Grand for the reply. Is it possible to use aggregate functions inside a script?? On Wednesday, May 7, 2014 5:31:20 PM UTC+5:30, Adrien Grand wrote: Hi, There is no way to do it on the Elasticsearch side for the moment. It can only be done on the client side. On Wed, May 7, 2014 at 1:37 PM, Alex Mathew alexmathe...@gmail.com wrote: How to write an ES query to find the difference between the max and min value of a field? I am a newbie in Elasticsearch. In my case I feed a lot of events along with session_id and time into Elasticsearch. My event structure is Event_name string Client_id string App_id string Session_id string User_id string Ip_address string Latitude int64 Longitude int64 Event_time time.Time I want to find the lifetime of a session_id based on the fed events. For that I can retrieve the maximum Event_time and minimum Event_time for a particular session_id with the following ES query. { size: 0, query: { match: { Session_id: dummySessionId } }, aggs: { max_time: { max: { field: Time } }, min_time:{ min: { field: Time } } } } But what I actually want is (max_time - min_time). How to write the ES query for the same? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1e937884-4052-4a5a-91db-bc1449c43efe%40googlegroups.com. For more options, visit https://groups.google.com/d/optout. -- Adrien Grand -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ab72a9e2-60d4-4865-9c71-351b79322f29%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
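(Client-side, the subtraction is then done over the two values in the response, which on 1.x comes back shaped roughly like this - timestamps invented:)
{ "aggregations": { "max_time": { "value": 1399467300000 }, "min_time": { "value": 1399464000000 } } }
The session lifetime is aggregations.max_time.value - aggregations.min_time.value, computed by the caller.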
Re: MoreLikeThis can't identify that 2 documents with exactly same attachments are duplicates
Hi Zoran, In a nutshell 'more like this' creates a large boolean disjunctive query of 'max_query_terms' number of interesting terms from a text specified in 'like_text'. The interesting terms are picked up with respect to their tf-idf scores in the whole corpus. These latter parameters can be tuned with the 'min_term_freq', 'min_doc_freq', and 'max_doc_freq' parameters. The number of boolean clauses that must match is controlled by 'percent_terms_to_match'. In the case of specifying only one field in 'fields', the analyzer used to pick up the terms in 'like_text' is the one associated with the field, unless overridden by 'analyzer'. So as an example, the default is to create a boolean query of 25 interesting terms where only 30% of the should clauses must match. On Wednesday, May 7, 2014 5:14:11 AM UTC+2, Zoran Jeremic wrote: Hi Alex, If you are looking for exact duplicates then hashing the file content, and doing a search for that hash would do the job. This trick won't work for me, as these are not exact duplicates. For example, I have 10 students working on the same 100 pages long word document. Each of these students could change only one sentence and upload the document. The hash will be different, but it's 99.99% the same document. I have another service that uses mlt_like_text to recommend some relevant documents, and my problem is that if this document has the best score, then all duplicates will be among the top hits, and instead of recommending several of the most relevant documents I will recommend 10 instances of the same document. Could you please define relevant in your setting? In a corpus of very similar documents, is your goal to find the ones which are oddly different? Have you looked into ES significant terms? If you are looking for near duplicates, then I would recommend extracting whatever text you have in your html, pdf, doc, indexing that and running more like this with like_text set to that content. I tried that as well, and the results are very disappointing, though I'm not sure whether that would be a good idea given that long textual documents could be used. For testing purposes, I made a simple test with 10 web pages. Maybe I'm making some mistake there. What I did is to index 10 web pages and store them in documents as attachments. Content is stored as byte[]. Then I use the same 10 pages, extract the content using Jsoup, and try to find similar web pages. Here is the code that I used to find web pages similar to the provided one: System.out.println("Duplicates for link: " + link); System.out.println(); String indexName = ESIndexNames.INDEX_DOCUMENTS; String indexType = ESIndexTypes.DOCUMENT; String mapping = copyToStringFromClasspath("/org/prosolo/services/indexing/document-mapping.json"); client.admin().indices().putMapping(putMappingRequest(indexName).type(indexType).source(mapping)).actionGet(); URL url = new URL(link); org.jsoup.nodes.Document doc = Jsoup.connect(link).get(); String html = doc.html(); // doc.text(); QueryBuilder qb = null; // create the query qb = QueryBuilders.moreLikeThisQuery("file").likeText(html).minTermFreq(0).minDocFreq(0); SearchResponse sr = client.prepareSearch(ESIndexNames.INDEX_DOCUMENTS).setQuery(qb).addFields("url", "title", "contentType").setFrom(0).setSize(5).execute().actionGet(); if (sr != null) { SearchHits searchHits = sr.getHits(); Iterator<SearchHit> hitsIter = searchHits.iterator(); while (hitsIter.hasNext()) { SearchHit searchHit = hitsIter.next(); System.out.println("Duplicate: " + searchHit.getId() + " title: " + searchHit.getFields().get("url").
getValue() + " score: " + searchHit.getScore()); } } And the results of executing this for each of the 10 urls are: Duplicates for link: http://en.wikipedia.org/wiki/Mathematical_logic Duplicate: Crwk_36bTUCEso1ambs0bA URL: http://en.wikipedia.org/wiki/Mathematical_logic score: 0.3335998 Duplicate: --3l-WRuQL2osXg71ixw7A URL: http://en.wikipedia.org/wiki/Chemistry score: 0.16319205 Duplicate: 8dDa6HsBS12HrI0XgFVLvA URL: http://en.wikipedia.org/wiki/Formal_science score: 0.13035104 Duplicate: 1APeDW0KQnWRv_8mihrz4A URL: http://en.wikipedia.org/wiki/Star score: 0.12292466 Duplicate: 2NElV2ULQxqcbFhd2pVy0w URL: http://en.wikipedia.org/wiki/Crystallography score: 0.117023855 Duplicates for link: http://en.wikipedia.org/wiki/Mathematical_statistics Duplicate: Crwk_36bTUCEso1ambs0bA URL: http://en.wikipedia.org/wiki
Re: MoreLikeThis can't identify that 2 documents with exactly same attachments are duplicates
Hi Zoran, If you are looking for exact duplicates then hashing the file content, and doing a search for that hash would do the job. If you are looking for near duplicates, then I would recommend extracting whatever text you have in your html, pdf, doc, indexing that and running more like this with like_text set to that content. Additionally you can perform a mlt search on more fields including the meta-data fields extracted with the attachment plugin. Hope this helps. Alex On Monday, May 5, 2014 8:08:30 PM UTC+2, Zoran Jeremic wrote: Hi Alex, Thank you for your explanation. It makes sense now. However, I'm not sure I understood your proposal. So I would adjust the mlt_fields accordingly, and possibly extract the relevant portions of texts manually What do you mean by adjusting mlt_fields? The only shared field that is guaranteed to be same is file. Different users could add different titles to documents, but attach same or almost the same documents. If I compare documents based on the other fields, it doesn't mean that it will match, even though attached files are exactly the same. I'm also not sure what did you mean by extract the relevant portions of text manually. How would I do that and what to do with it? Thanks, Zoran On Monday, 5 May 2014 01:23:49 UTC-7, Alex Ksikes wrote: Hi Zoran, Using the attachment type, you can text search over the attached document meta-data, but not its actual content, as it is base 64 encoded. So I would adjust the mlt_fields accordingly, and possibly extract the relevant portions of texts manually. Also set percent_terms_to_match = 0, to ensure that all boolean clauses match. Let me know how this works out for you. Cheers, Alex On Monday, May 5, 2014 5:50:07 AM UTC+2, Zoran Jeremic wrote: Hi guys, I have a document that stores a content of html file, pdf, doc or other textual document in one of it's fields as byte array using attachment plugin. Mapping is as follows: { document:{ properties:{ title:{type:string,store:true }, description:{type:string,store:yes}, contentType:{type:string,store:yes}, url:{store:yes, type:string}, visibility: { store:yes, type:string}, ownerId: {type: long, store:yes }, relatedToType: { type: string, store:yes }, relatedToId: {type: long, store:yes }, file:{ path: full,type:attachment, fields:{ author: { type: string }, title: { store: true,type: string }, keywords: { type: string }, file: { store: true, term_vector: with_positions_offsets,type: string }, name: { type: string }, content_length: { type: integer }, date: { format: dateOptionalTime, type: date }, content_type: { type: string } } }} And the code I'm using to store the document is: VisibilityType.PUBLIC These files seems to be stored fine and I can search content. However, I need to identify if there are duplicates of web pages or files stored in ES, so I don't return the same documents to the user as search or recommendation result. My expectation was that I could use MoreLikeThis after the document was indexed to identify if there are duplicates of that document and accordingly to mark it as duplicate. However, results look weird for me, or I don't understand very well how MoreLikeThis works. For example, I indexed web page http://en.wikipedia.org/wiki/Linguistics3 times, and all 3 documents in ES have exactly the same binary content under file. 
Then for the following query: http://localhost:9200/documents/document/WpkcK-ZjSMi_l6iRq0Vuhg/_mlt?mlt_fields=file&min_doc_freq=1 where the ID is the id of one of these documents, I got these results: http://en.wikipedia.org/wiki/Linguistics with score 0.6633003 http://en.wikipedia.org/wiki/Linguistics with score 0.6197818 http://en.wikipedia.org/wiki/Computational_linguistics with score 0.48509508 ... For some other examples, scores for the same documents are much lower, and sometimes (though not that often) I don't get duplicates in the first positions. I would expect a score of 1.0 or higher here for documents that are exactly the same, but that's not the case, and I can't figure out how I could identify whether there are duplicates in the Elasticsearch index. I would appreciate it if somebody could explain whether this is expected behaviour or I'm not using it properly. Thanks, Zoran -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit
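(For the exact-duplicate case Alex mentions earlier in the thread, a minimal sketch - field name and hash value invented; the digest of the attachment bytes is computed client-side, indexed as a not_analyzed field, and matched with a term query:)
curl -XPUT 'localhost:9200/documents/document/1' -d '{ "title": "Linguistics", "content_sha1": "da39a3ee5e6b4b0d3255bfef95601890afd80709" }'
curl 'localhost:9200/documents/document/_search' -d '{ "query": { "term": { "content_sha1": "da39a3ee5e6b4b0d3255bfef95601890afd80709" } } }'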
Re: Need help on similarity ranking approach
Hello, What you want to know is the score of the document that has matched itself using more like this. The API excludes the queried document. However, it is equivalent to running a boolean query with a more like this field clause for each field of the queried document. This will give you, as the top result, the document that has matched itself, so that you can compute the percentage of similarity of the remaining matched documents. Alex On Friday, May 2, 2014 3:22:34 PM UTC+2, Rgs wrote: Thanks Binh Ly and Ivan Brusic for your replies. I need to find the similarity in percentage of a document against other documents, and this will be considered for grouping the documents. Is it possible to get the similarity percentage using the more like this query? Or is there any other way to calculate the percentage of similarity from the query result? E.g.: document1 is 90% similar to document2, document1 is 45% similar to document3, etc. Thanks -- View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Need-help-on-similarity-ranking-approach-tp4054847p4055227.html Sent from the ElasticSearch Users mailing list archive at Nabble.com. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/05db016b-1c2e-497c-9275-37dcccedfae3%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
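(Concretely, once the queried document matches itself as the top hit, the percentage for any other hit d is score(d) / score(top) * 100. The normalization has to be done client-side, since ES scores are not comparable across queries.)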
Re: MoreLikeThis ignores queries?
Hello Alexey, You should use the query DSL and not the more like this API. You can create a boolean query where one clause is your more like this query and the other one is your ignore category query (better use a filter here if you can). http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html However, more like this in the DSL only takes a like_text parameter; you cannot pass the id of the document. This will change in a subsequent version of ES. For now, to simulate this functionality, you can use multiple mlt queries with the like_text set to the value of each field of the queried document, inside a boolean query. Let me know if this helps. Alex On Wednesday, March 19, 2014 5:01:06 AM UTC+1, Alexey Bagryancev wrote: Can anyone help me? It really does not work... On Wednesday, March 19, 2014 at 2:05:49 UTC+7, Alexey Bagryancev wrote: Hi, I am trying to filter moreLikeThis results by adding an additional query - but it seems to be ignored entirely. I tried running my ignoreQuery separately and it works fine, but how do I make it work with moreLikeThis? Please help me. $ignoreQuery = $this->IgnoreCategoryQuery('movies'); $this->resultsSet = $this->index->moreLikeThis( new \Elastica\Document($id), array_merge($this->mlt_fields, array('search_size' => $this->size, 'search_from' => $this->from)), $ignoreQuery); My IgnoreCategory function: public function IgnoreCategoryQuery($category = 'main') { $categoriesTermQuery = new \Elastica\Query\Term(); $categoriesTermQuery->setTerm('categories', $category); $categoriesBoolQuery = new \Elastica\Query\Bool(); $categoriesBoolQuery->addMustNot($categoriesTermQuery); return $categoriesBoolQuery; } -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e605d6e2-b42b-4661-b819-90735a9581ec%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
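(A sketch of the combined query Alex describes - field names, like_text and category value invented:)
{ "query": { "bool": { "must": [ { "more_like_this": { "fields": ["title", "body"], "like_text": "text of the queried document", "min_term_freq": 1 } } ], "must_not": [ { "term": { "categories": "movies" } } ] } } }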
Re: Elastic Search MLT API, how to use fields with weights.
I'd like to add to this that the mlt API is the same as a boolean query in the DSL made of multiple more like this field clauses, where each clause is set to the content of the corresponding field of the queried document. On Thursday, February 20, 2014 4:20:36 PM UTC+1, Binh Ly wrote: I do not believe you can boost individual fields/terms separately in a MLT query. Your best bet is to probably run a bool query of multiple MLT queries each with a different field and boost, but you'll need to first extract the MLT text before you can do this. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2fcd0453-58dc-4a66-b7d9-2e785a2a7fa6%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
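(So per-field weights can be simulated along these lines - field names, texts and boosts invented; the like_text for each field has to be fetched from the queried document first:)
{ "query": { "bool": { "should": [ { "more_like_this_field": { "title": { "like_text": "title of the queried doc", "boost": 3.0 } } }, { "more_like_this_field": { "body": { "like_text": "body of the queried doc", "boost": 1.0 } } } ] } } }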
Re: Interesting Terms for MoreLikeThis Query in ElasticSearch
You could always use explain to find out the best matching terms of any query. In order to get all the interesting terms, you could run a query where the top result document has matched itself. Also the new significant terms might be of interest to you: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html On Thursday, January 30, 2014 9:59:02 PM UTC+1, api...@clearedgeit.com wrote: I have been trying to figure out how to get interesting terms using the MLT query. Does ElasticSearch have this functionality similar to solr or if not, is there a work around? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/201edd47-d5d1-4fcf-a520-184737b6b7ec%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
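(For the significant terms suggestion, a minimal aggregation sketch - field and query text invented:)
{ "query": { "match": { "body": "some document text" } }, "aggregations": { "interesting": { "significant_terms": { "field": "body" } } } }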
Re: More like this scoring algorithm unclear
Hi Maarten, Your 'like_text' is analyzed the same way your 'product_id' field is analyzed, unless specified by 'analyzer'. I would recommend setting 'percent_terms_to_match' to 0. However, if you are only searching over product ids then a simple boolean query would do. If not, then I would create a boolean query where each clause is a 'more like this field' for each field of the queried document. This is actually what the mlt API does. Cheers, Alex On Wednesday, January 8, 2014 7:20:05 PM UTC+1, Maarten Roosendaal wrote: The scoring algorithm is still vague, but I got the query to act like the API, although the results are different so I'm still doing it wrong. Here's an example: { explain: true, query: { more_like_this: { fields: [ PRODUCT_ID ], like_text: 104004855475 1001004002067765 100200494210 1002004004499883, min_term_freq: 1, min_doc_freq: 1, max_query_terms: 1, percent_terms_to_match: 0.5 } }, from: 0, size: 50, sort: [], facets: {} } The like_text contains product_id's from a wishlist for which I want to find similar lists. On Wednesday, January 8, 2014 16:50:53 UTC+1, Maarten Roosendaal wrote: Hi, Thanks, I'm not quite sure how to do that. I'm using: http://localhost:9200/lists/list/[id of list]/_mlt?mlt_field=product_id&min_term_freq=1&min_doc_freq=1 The body does not seem to be respected (I'm using the elasticsearch head plugin) if I add: { explain: true } I've been trying to rewrite the mlt API call as an mlt query but no luck so far. Any suggestions? Thanks, Maarten On Wednesday, January 8, 2014 16:14:25 UTC+1, Justin Treher wrote: Hey Maarten, I would use the explain: true option to see just why your documents are being scored higher than others. MoreLikeThis uses the same fulltext scoring as far as I know, so term position would affect the score. http://lucene.apache.org/core/3_0_3/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html Justin On Wednesday, January 8, 2014 3:04:47 AM UTC-5, Maarten Roosendaal wrote: Hi, I have a question about why the 'more like this' algorithm scores some documents higher than others, while they are (at first glance) the same. What I've done is index wishlist documents which contain 1 property: product_id; this property contains an array of product_id's (e.g. [1234, , , ]). What I'm trying to do is find similar wishlists for a given wishlist with id x. The MLT API seems to work: it returns other documents which contain at least 1 of the product_id's from the original list. But what I see is that, for example, I get 10 hits where the first 6 hits contain the same (and only 1) product_id, and this product_id is present in the original wishlist. What I would expect is that the score of the first 6 is the same. However what I see is that only the first 2 have the same score, the next 2 a lower score and the next 2 even lower. Why is this? Also, I'm trying to write the MLT API call as an MLT query, but somehow it doesn't work. I would expect that I need to take the entire content of the original product_id property and feed it as input for the 'like_text'. The documentation is not very clear and doesn't provide examples, so I'm a little lost. Hope someone can give some pointers. Thanks, Maarten -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/91734252-74d0-4001-becc-a184af0f2997%40googlegroups.com. 
For more options, visit https://groups.google.com/d/optout.
Snapshot Restore Frequency
According to the docs, snapshot operations are online and only store diffs. Is there any particular reason to not run them at a fairly high frequency? E.g. every 15 minutes? Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/60c3365b-7dec-487b-be45-c174ef992329%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
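(For reference, the operations in question on 1.x - repository name, location and snapshot name invented; since segments already copied are skipped, frequent snapshots mostly cost the per-snapshot metadata plus listing what changed:)
curl -XPUT 'localhost:9200/_snapshot/my_backup' -d '{ "type": "fs", "settings": { "location": "/mount/backups/my_backup" } }'
curl -XPUT 'localhost:9200/_snapshot/my_backup/snapshot_20150401t0915?wait_for_completion=false'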
Re: Shared facet filtration
Fantastic, that's exactly what I was looking for, thank you! On Wednesday, April 9, 2014 3:12:42 AM UTC+10, Ivan Brusic wrote: You should be able to use filtered queries instead, where the filter is your facet filter: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html The filtered query will filter documents before the query. Facets work on the documents returned by the query, so if the documents are pre-filtered, the facets will not even work on them. -- Ivan On Mon, Apr 7, 2014 at 6:56 PM, Alex G alex@crowdstrike.com wrote: Hello, I’m implementing a faceted interface that requires that all the facets be filtered by a shared filter - below is roughly how the queries currently look, is there a more efficient/performant way to make this kind of query? Less fussed about actual query verbosity but if there is some way of sharing or referencing the repeated facet_filter other than search templates that’d be fantastic. Thanks, Alex { facets: { facetOne: { facet_filter: { bool: { must: [ { term: { foo.bar: test } }, { term: { baz:test* } } ] } }, terms: { field: facetOne.field, order: [count], size: 50 } }, facetTwo: { facet_filter: { bool: { must: [ { term: { foo.bar: test } }, { term: { baz:test* } } ] } }, terms: { field: facetTwo.field, order: [count], size: 50 } }, facetThree: { facet_filter: { bool: { must: [ { term: { foo.bar: test } }, { term: { baz:test* } } ] } }, terms: { field: facetThree.field, order: [count], size: 50 } } }, size: 0 } -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0993b970-b60c-4e38-b42c-953394abdac1%40googlegroups.com. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com
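(Ivan's suggestion applied to the query above, roughly: the shared facet_filter moves into a single filtered query and the per-facet copies disappear -)
{ "query": { "filtered": { "query": { "match_all": {} }, "filter": { "bool": { "must": [ { "term": { "foo.bar": "test" } }, { "term": { "baz": "test*" } } ] } } } }, "facets": { "facetOne": { "terms": { "field": "facetOne.field", "size": 50 } }, "facetTwo": { "terms": { "field": "facetTwo.field", "size": 50 } }, "facetThree": { "terms": { "field": "facetThree.field", "size": 50 } } }, "size": 0 }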
Re: synonyms in a query
Hi Luiz, thank you again for your reply. I don't fully understand the part you mentioned: After indexing one document with the title equal to core: curl -XPOST 'localhost:9200/myindex/test/1' -d '{ title: core }' Sorry, I am pretty new to ES and don't understand very much yet. Now what happens there? And what if I don't have hardcoded synonyms, but a file which someone can fill out? I need something like synonyms_path : analysis/synonym.txt in my filter, but then what about the step you mentioned that I did not understand? Sorry for all the trouble. On Monday, April 7, 2014 at 09:29:17 UTC+2, Alex K wrote: Hello there, I have a query; an example is this: { query: { bool: { should: [ { multi_match: { query: foo, fields: [ TITLE, SHORTDESC ], type: phrase_prefix } }, { multi_match: { query: foo, cutoff_frequency: null, fields: [ TITLE, SHORTDESC ] } } ] } }, filter: { term: { ACTIVE: 1 } }, sort: { TITLE: { order: asc } }, size: 7 } Now I have the question: can I use synonyms here? I already saw that you can use a synonym token inside an analyzer. But I have a query here, not an analyzer. Do I have to put an analyzer inside the query? I don't know much about ES yet, so this may be a totally stupid question. Thank you in advance :-) -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/637c5a0a-89cc-47d3-9f59-785ddf6ccfc3%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: synonyms in a query
Hi Luiz, thank you again for your reply. A colleague of mine told me that I might be missing a plugin needed to use my settings file. I will check this out and write down here later what I found out. Sorry for all the trouble. On Monday, April 7, 2014 at 09:29:17 UTC+2, Alex K wrote: Hello there, I have a query; an example is this: { query: { bool: { should: [ { multi_match: { query: foo, fields: [ TITLE, SHORTDESC ], type: phrase_prefix } }, { multi_match: { query: foo, cutoff_frequency: null, fields: [ TITLE, SHORTDESC ] } } ] } }, filter: { term: { ACTIVE: 1 } }, sort: { TITLE: { order: asc } }, size: 7 } Now I have the question: can I use synonyms here? I already saw that you can use a synonym token inside an analyzer. But I have a query here, not an analyzer. Do I have to put an analyzer inside the query? I don't know much about ES yet, so this may be a totally stupid question. Thank you in advance :-) -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ba838067-1277-4db9-a8f9-e306d47d6591%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: synonyms in a query
Me again - it seems it was a local problem on my side. The way Luiz mentioned is exactly the correct way. Thank you very much, Luiz, you really helped me out with this! On Monday, April 7, 2014 at 09:29:17 UTC+2, Alex K wrote: Hello there, I have a query; an example is this: { query: { bool: { should: [ { multi_match: { query: foo, fields: [ TITLE, SHORTDESC ], type: phrase_prefix } }, { multi_match: { query: foo, cutoff_frequency: null, fields: [ TITLE, SHORTDESC ] } } ] } }, filter: { term: { ACTIVE: 1 } }, sort: { TITLE: { order: asc } }, size: 7 } Now I have the question: can I use synonyms here? I already saw that you can use a synonym token inside an analyzer. But I have a query here, not an analyzer. Do I have to put an analyzer inside the query? I don't know much about ES yet, so this may be a totally stupid question. Thank you in advance :-) -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7a1878b4-6c79-42a3-b0d7-5562d0cbdece%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: running a specific integration test
On Friday, 9 December 2011 10:21:40 UTC+1, Karussell wrote: It is possible: http://maven.apache.org/plugins/maven-surefire-plugin/examples/single-test.html http://stackoverflow.com/questions/1873995/run-a-single-test-method-with-maven Is this advice still valid? I've tried different variations of the mvn test command with no luck so far. Example : $ ES_TEST_LOCAL=true mvn test -Dtest=SimpleValidateQueryTests#simpleValidateQuery [INFO] Scanning for projects... [...] Executing 501 suites with 3 JVMs. [...] Suite: org.elasticsearch.search.aggregations.bucket.GeoDistanceTests [...] Can you give me a cli example of executing a specific test? Or do I have to use an IDE? Thanks, Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3465d32d-ca52-4482-83b6-45d55751a12b%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
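(A hedged sketch: the ES build uses the randomizedtesting runner rather than plain surefire, so the -Dtest flag from those links is ignored and the flags are tests.class/tests.method instead - the package path below is an assumption, check where the class lives in your checkout:)
mvn test -Dtests.class=org.elasticsearch.validate.SimpleValidateQueryTests -Dtests.method=simpleValidateQuery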
synonyms in a query
Hello there, I have a query; an example is this: { query: { bool: { should: [ { multi_match: { query: foo, fields: [ TITLE, SHORTDESC ], type: phrase_prefix } }, { multi_match: { query: foo, cutoff_frequency: null, fields: [ TITLE, SHORTDESC ] } } ] } }, filter: { term: { ACTIVE: 1 } }, sort: { TITLE: { order: asc } }, size: 7 } Now I have the question: can I use synonyms here? I already saw that you can use a synonym token inside an analyzer. But I have a query here, not an analyzer. Do I have to put an analyzer inside the query? I don't know much about ES yet, so this may be a totally stupid question. Thank you in advance :-) -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5fe2d157-b437-4bd8-8a18-8aa4f41f63fe%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: synonyms in a query
Hello Luiz, thank you for your reply! As we use rivers, I was told to declare the analyzer there. It looks like this for me: { index : { analysis : { filter : { synonym_filter : { type : synonym, synonyms : [ foo, foo bar => core ] } }, analyzer : { synonym : { tokenizer : whitespace, filter : [ synonym_filter ], type : custom } } } } } which actually says, for testing purposes, 'if someone searches for 'foo' or 'foo bar', search for 'core''. Now my query uses the analyzer: { query: { bool: { should: [ { multi_match: { query: foo, fields: [ TITLE, SHORTDESC ], type: phrase_prefix, analyzer: synonym } }, { multi_match: { query: foo, cutoff_frequency: null, fields: [ TITLE, SHORTDESC ] } } ] } }, filter: { term: { ACTIVE: 1 } }, sort: { TITLE: { order: asc } }, size: 7 } But I get an error there: [...]nested: QueryParsingException[[test484] [multi_match] analyzer [synonym] not found];[...] What am I doing wrong here? On Monday, April 7, 2014 at 09:29:17 UTC+2, Alex K wrote: Hello there, I have a query; an example is this: { query: { bool: { should: [ { multi_match: { query: foo, fields: [ TITLE, SHORTDESC ], type: phrase_prefix } }, { multi_match: { query: foo, cutoff_frequency: null, fields: [ TITLE, SHORTDESC ] } } ] } }, filter: { term: { ACTIVE: 1 } }, sort: { TITLE: { order: asc } }, size: 7 } Now I have the question: can I use synonyms here? I already saw that you can use a synonym token inside an analyzer. But I have a query here, not an analyzer. Do I have to put an analyzer inside the query? I don't know much about ES yet, so this may be a totally stupid question. Thank you in advance :-) -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3ec9e97d-f210-4a88-a269-f6306bf0266c%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
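(A hedged note on the analyzer [synonym] not found error: query-time analyzers are resolved against the settings of the index being searched, so declaring them only in the river is not enough - they have to be part of that index's analysis settings. A sketch using the index name from the error message; on 1.x the index must be closed while analysis settings are updated:)
curl -XPOST 'localhost:9200/test484/_close'
curl -XPUT 'localhost:9200/test484/_settings' -d '{ "analysis": { "filter": { "synonym_filter": { "type": "synonym", "synonyms": ["foo, foo bar => core"] } }, "analyzer": { "synonym": { "type": "custom", "tokenizer": "whitespace", "filter": ["synonym_filter"] } } } }'
curl -XPOST 'localhost:9200/test484/_open'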
Shared facet filtration
Hello, I’m implementing a faceted interface that requires that all the facets be filtered by a shared filter - below is roughly how the queries currently look, is there a more efficient/performant way to make this kind of query? Less fussed about actual query verbosity but if there is some way of sharing or referencing the repeated facet_filter other than search templates that’d be fantastic. Thanks, Alex { facets: { facetOne: { facet_filter: { bool: { must: [ { term: { foo.bar: test } }, { term: { baz:test* } } ] } }, terms: { field: facetOne.field, order: [count], size: 50 } }, facetTwo: { facet_filter: { bool: { must: [ { term: { foo.bar: test } }, { term: { baz:test* } } ] } }, terms: { field: facetTwo.field, order: [count], size: 50 } }, facetThree: { facet_filter: { bool: { must: [ { term: { foo.bar: test } }, { term: { baz:test* } } ] } }, terms: { field: facetThree.field, order: [count], size: 50 } } }, size: 0 } -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0993b970-b60c-4e38-b42c-953394abdac1%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Autosuggest-style search in realtime doesn't work properly
Hi there, I have the following request I send to ES: { query: { filtered: { query: { bool: { should: [ { multi_match: { query: socks purple, fields: [ TITLE ], type: phrase_prefix } }, { multi_match: { query: socks purple, fields: [ TITLE ] } } ] } }, filter: { and: [ { terms: { ACTIVE: [ 1 ] } } ] } } }, size: 7 } Now, the first multi_match gives me good results when I input the words in the correct order (e.g. Purple Socks). But when I enter them in the 'wrong' order (e.g. Socks Purple) it doesn't find anything. A colleague of mine said I could try using a second multi_match. I don't have much knowledge of ES; almost all of the above was already there, I just extended the code with the second multi_match. But now there is the problem that if I input socks it gives me all matches for socks. Now when I continue to enter purple, it gives me not just purple socks, but everything matching purple (although I would expect only purple socks). Does anyone know what the problem here is? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/074ec987-379b-4591-a5dc-0d2b482d4ec8%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
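(A likely explanation, hedged: the second multi_match defaults to operator or, so once both words are typed either one is enough to match. A sketch constraining all terms to match:)
{ "multi_match": { "query": "socks purple", "fields": ["TITLE"], "operator": "and" } }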
multi_match and cutoff_frequency
Hello there, I am a total ES-noob, so please forgive me if my question is weird or something ;-) Currently I have the task to implement the cutoff_frequency for our elasticsearch queries. The current query looks like this: { query:{ bool:{ should:[ {multi_match:{ query:the, cutoff_frequency:0.001, fields:[TITLE,SHORTDESC], type:phrase_prefix} } ] } }, filter:{ term:{ ACTIVE:1 } }, sort:{ TITLE:{ order:asc } }, size:7} This works perfectly fine, like before, BUT it seems that the cutoff_frequency there doesn't matter. Is it in the wrong place? Or does it not work with multi_match? I have to admit that I haven't fully understood what cutoff_frequency does. But I have lots of entries in the index for this query which have The in the title. Wouldn't cutoff_frequency:0.001 mean that the word the is ignored if it is in 1/1000 of all the words in the titles? (and yes, in case I understood it the other way around, I also tried 1.0, which would mean 1000/1000 = every word, yes? It didn't make a difference for my query.) Sorry for my bad English, I am German. I hope I don't confuse anyone too much... -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/94f1c37e-b01f-465a-bf55-55cf848613b3%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
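(A hedged observation: cutoff_frequency belongs to the boolean type of match/multi_match; combined with type phrase_prefix it is, as far as I can tell, ignored, which would explain seeing no effect. A sketch of the form where it does apply:)
{ "query": { "multi_match": { "query": "the", "fields": ["TITLE", "SHORTDESC"], "cutoff_frequency": 0.001 } } }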
Indexing performance with doc values (particularly with larger number of fields)
This might be more of a Lucene question, but a quick google didn't throw up anything. Has anyone done/seen any benchmarking on indexing performance (overhead) due to using doc values? I often index quite large JSON objects with many fields (e.g. 50), and I'm trying to get a feel for whether I can just let all of them be doc values on the off chance I'll want to aggregate over them, or whether I need to pick beforehand which fields will support aggregation. (A related question: presumably allowing a mix of doc values fields and legacy fields is a bad idea, because if you use doc values fields you want a low max heap so that the file cache has lots of memory available, whereas if you use the field cache you need a large heap - is that about right, or am I missing something?) Thanks for any insight! Alex Ikanow -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0361eda4-ab39-4536-b91a-ccb710921edd%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
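(For concreteness, enabling doc values per field in a recent 1.x mapping looks roughly like this - index, type and field names invented; on older 1.x releases the equivalent is fielddata: { format: doc_values }:)
curl -XPUT 'localhost:9200/myindex' -d '{ "mappings": { "event": { "properties": { "bytes_sent": { "type": "long", "doc_values": true } } } } }'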
Re: Install issues with Kibana3 vs elasticsearch 0.19.11
Never mind, I'm an idiot, it clearly mentions it needs 0.90.x in the README :( On Wednesday, March 19, 2014 12:49:46 PM UTC-4, Alex at Ikanow wrote: I downloaded the latest Kibana3, popped it on a tomcat instance sharing space with my elasticsearch (0.19.11) instance and tried to connect (both: using an ssh tunnel to connect localhost:9200 back to the server, and opening port 9200 in the firewall). In both cases, the browser makes a call to _nodes (eg returns {ok:true,cluster_name:infinite-dev,nodes:{Yup-Cmn0QwCrkYI6l7SdRw:{name:Firefrost,transport_address:inet[/10.113.42.186:9300],hostname:ip-10-113-42-186,http_address:inet[/10.113.42.186:9200]}}}) and then returns the following error:
TypeError: Cannot call method 'split' of undefined
at http://SERVER/kibana-3.0.0/app/app.js:22:11260
at he (http://SERVER/kibana-3.0.0/app/app.js:7:20041)
at Function.Yb (http://SERVER/kibana-3.0.0/app/app.js:7:7025)
at http://SERVER/kibana-3.0.0/app/app.js:22:11204
at i (http://SERVER/kibana-3.0.0/app/app.js:9:458)
at i (http://SERVER/kibana-3.0.0/app/app.js:9:458)
at http://SERVER/kibana-3.0.0/app/app.js:9:1014
at Object.f.$eval (http://SERVER/kibana-3.0.0/app/app.js:9:6963)
at Object.f.$digest (http://SERVER/kibana-3.0.0/app/app.js:9:5755)
at Object.f.$apply (http://SERVER/kibana-3.0.0/app/app.js:9:7111)
I don't see any other calls back to elasticsearch. I couldn't find a statement anywhere of which versions Kibana3 is compatible with - does it just need a later version (anyone know the earliest with which it is compatible, out of curiosity; though I'm planning to move to 1.0 anyway soon), or am I doing something wrong? Thanks for any insight/help anyone can provide! Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c6a2f998-89a9-4794-bdf8-15d1dcd26aae%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Install issues with Kibana3 vs elasticsearch 0.19.11
I downloaded the latest Kibana3, popped it on a tomcat instance sharing space with my elasticsearch (0.19.11) instance and tried to connect (both: using an ssh tunnel to connect localhost:9200 back to the server, and opening port 9200 in the firewall). In both cases, the browser makes a call to _nodes (eg returns {ok:true,cluster_name:infinite-dev,nodes:{Yup-Cmn0QwCrkYI6l7SdRw:{name:Firefrost,transport_address:inet[/10.113.42.186:9300],hostname:ip-10-113-42-186,http_address:inet[/10.113.42.186:9200]}}}) and then returns the following error:
TypeError: Cannot call method 'split' of undefined
at http://SERVER/kibana-3.0.0/app/app.js:22:11260
at he (http://SERVER/kibana-3.0.0/app/app.js:7:20041)
at Function.Yb (http://SERVER/kibana-3.0.0/app/app.js:7:7025)
at http://SERVER/kibana-3.0.0/app/app.js:22:11204
at i (http://SERVER/kibana-3.0.0/app/app.js:9:458)
at i (http://SERVER/kibana-3.0.0/app/app.js:9:458)
at http://SERVER/kibana-3.0.0/app/app.js:9:1014
at Object.f.$eval (http://SERVER/kibana-3.0.0/app/app.js:9:6963)
at Object.f.$digest (http://SERVER/kibana-3.0.0/app/app.js:9:5755)
at Object.f.$apply (http://SERVER/kibana-3.0.0/app/app.js:9:7111)
I don't see any other calls back to elasticsearch. I couldn't find a statement anywhere of which versions Kibana3 is compatible with - does it just need a later version (anyone know the earliest with which it is compatible, out of curiosity; though I'm planning to move to 1.0 anyway soon), or am I doing something wrong? Thanks for any insight/help anyone can provide! Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ed77f1e7-547d-4c87-b23f-ee97dece9533%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: EsRejectedExecutionException when searching date based indices.
That is correct, I was mixing the terms nodes and shards (sorry about that). I'm running the test on a single node (machine). I chose 20 shards so we could eventually grow to a 20-server cluster without re-indexing. It's unlikely we'll ever need to go that high, but you never know, and given we receive 750 million messages a day, the thought of reindexing after collecting a year's worth of data makes me nervous. If I can over-shard and avoid a massive reindex, I'll be a happy guy. I thought about reducing the 20 shards, but even if I go to, say, 5 shards on 5 machines (1 shard per machine?), I'll still run into the issue if a user searches several years back. Any other thoughts on a possible solution? Would increasing the queue size be a good option? Is there a downside (performance hit, running out of resources, etc.)? Thanks again! On Tuesday, February 25, 2014 11:32:26 PM UTC-8, David Pilato wrote: You are mixing nodes and shards, right? How many elasticsearch nodes do you have to manage your 7300 shards? Why did you set 20 shards per index? You can increase the queue size in elasticsearch.yml but I'm not sure it's the right thing to do here. My 2 cents -- David ;-) Twitter: @dadoonet / @elasticsearchfr / @scrutmydocs On 26 February 2014 at 01:36, Alex Clark wrote: Hello all, I'm getting failed nodes when running searches and I'm hoping someone can point me in the right direction. I have indices created per day to store messages. The pattern is pretty straightforward: the index for January 1 is messages_20140101, for January 2 is messages_20140102, and so on. Each index is created against a template that specifies 20 shards. A full year will give 365 indices * 20 shards = 7300 nodes. I have recently upgraded to ES 1.0. When I search for all messages in a year (either using an alias or specifying “messages_2013*”), I get many failed nodes. The reason given is: “EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@651b8924]”. The more often I search, the fewer failed nodes I get (probably caching in ES), but I can't get down to 0 failed nodes. I'm using ES for analytics, so the document counts coming back have to be accurate. The aggregate counts will change depending on the number of node failures. We use the Java API to create a local node to index and search the documents. However, we also see the issue if we use the URL search API on port 9200. If I restrict the search to 30 days then I do not see any failures (it's under 1000, so as expected). However, it is a pretty common use case for our customers to search messages spanning an entire year. Any suggestions on how I can prevent these failures? Thank you for your help!
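[Editor's note: for readers weighing the queue-size option David mentions, in Elasticsearch 1.x the search thread pool queue is, as far as I recall, also updatable at runtime through the cluster settings API, so it is cheap to experiment with. A hedged sketch; the value 2000 is illustrative, and a larger queue only buys headroom at the cost of memory, it does not reduce the shard fan-out:
    curl -XPUT localhost:9200/_cluster/settings -d '{
      "transient": { "threadpool.search.queue_size": 2000 }
    }'
The same key, threadpool.search.queue_size, can be set permanently in elasticsearch.yml, which is the route mentioned above.]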
EsRejectedExecutionException when searching date based indices.
Hello all, I'm getting failed nodes when running searches and I'm hoping someone can point me in the right direction. I have indices created per day to store messages. The pattern is pretty straightforward: the index for January 1 is messages_20140101, for January 2 is messages_20140102, and so on. Each index is created against a template that specifies 20 shards. A full year will give 365 indices * 20 shards = 7300 nodes. I have recently upgraded to ES 1.0. When I search for all messages in a year (either using an alias or specifying “messages_2013*”), I get many failed nodes. The reason given is: “EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@651b8924]”. The more often I search, the fewer failed nodes I get (probably caching in ES), but I can't get down to 0 failed nodes. I'm using ES for analytics, so the document counts coming back have to be accurate. The aggregate counts will change depending on the number of node failures. We use the Java API to create a local node to index and search the documents. However, we also see the issue if we use the URL search API on port 9200. If I restrict the search to 30 days then I do not see any failures (it's under 1000, so as expected). However, it is a pretty common use case for our customers to search messages spanning an entire year. Any suggestions on how I can prevent these failures? Thank you for your help!
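[Editor's note: since the daily indices are cut from a template, the over-sharding half of this problem can be dialled down for newly created indices without touching existing ones. A sketch against the ES 1.x index template API; the template name and the value 5 are illustrative, not from the thread:
    curl -XPUT localhost:9200/_template/messages -d '{
      "template": "messages_*",
      "settings": { "number_of_shards": 5 }
    }'
The arithmetic still bites, though: at 5 shards per day a one-year search fans out to 365 * 5 = 1825 shard requests, still above the default queue of 1000, so coarser (e.g. monthly) indices or a bigger queue would also be needed for year-long queries.]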
Re: Elasticsearch Maven plugin on GitHub
You are very welcome, David. I believe the project is pretty much complete, for it also contains tests which exercise the mojos. As mentioned already, it depends on an ES version which is already old. I will try to keep it up to date, but contributions of any sort are more than welcome. alex On Fri, Jan 17, 2014 at 2:54 AM, David Pilato wrote: Hey Alex, That's great! I started a project like this some months ago but did not find enough time to finish it. Thanks for sharing it! -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr On January 17, 2014 at 01:44:26, AlexC wrote: If anyone is interested in using a Maven plugin to run Elasticsearch for integration testing, I just published one on GitHub: https://github.com/alexcojocaru/elasticsearch-maven-plugin. It is an alternative to starting a node through the code. The readme should provide enough information, but let me know if something is missing or not clear enough. It uses ES v0.90.7, but it can be easily updated to the latest ES version by changing the dependency version in the pom.xml file. alex
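[Editor's note: to make the version-bump step concrete, updating the plugin to a newer Elasticsearch means changing the version of the org.elasticsearch:elasticsearch dependency in the project's pom.xml, roughly like this; the exact dependency block in the repo may differ, and 0.90.7 is simply the version named above:
    <dependency>
      <groupId>org.elasticsearch</groupId>
      <artifactId>elasticsearch</artifactId>
      <!-- bump this version to build against a newer Elasticsearch release -->
      <version>0.90.7</version>
    </dependency>
]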