Re: ExpressionScriptCompilationException[Field [field name] used in expression does not exist in mappings];
It looks like the same question (but with more context/information) was asked here: http://stackoverflow.com/questions/28986964/expressions-with-dynamicly-generated-schemas-throw-exceptions-when-some-indices but it doesn't have any answers yet either. Does anyone here happen to know the best-practice way of addressing indices that are missing the mapping in question? I'd really hate to have to go through and hand-update them all to add the mapping :(

On Friday, March 6, 2015 at 8:27:48 AM UTC-8, Alex Schokking wrote:
[quoted text hidden; the original post appears in full later in this digest]
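One possible workaround, offered as a sketch rather than a verified fix: Lucene expressions only require the field to exist in the index mappings, not in every document, and documents without a value are read as 0 by doc['...'].value. So instead of hand-updating documents, it may be enough to add the mapping itself to each older index (the type name "logs" below is a placeholder; use whatever type your logstash events are indexed under):

curl -XPUT 'localhost:9200/logstash-2015.02.21/_mapping/logs' -d '{
  "properties": {
    "ads_found":    { "type": "long" },
    "pages_parsed": { "type": "long" }
  }
}'

Repeat for each older index; a matching index template would keep newly created indices consistent as well.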
Re: Perma-Unallocated primary shards after a node has left the cluster
Probably super evident, but the output above was actually from _cat/allocation?v, not _cat/recovery. Sorry about that.

On Wednesday, April 29, 2015 at 5:19:08 PM UTC-7, Alex Schokking wrote:
[quoted text hidden; the original post appears in full below]
Perma-Unallocated primary shards after a node has left the cluster
Hi guys, I would really appreciate some help understanding what's going on with shard allocation in this case.

Elasticsearch version: 1.4.4

We had 3 nodes with 1 shard and 1 replica per index (so 2 copies of everything in total). 1 node went down and the cluster went red. It started to reallocate shards as expected, and there were originally ~50 unallocated shards, 15 of them primaries and the rest replicas. It's been a few hours now and there are still 15 outstanding shards, all primaries, that don't seem to be getting re-allocated. I thought this would be a pretty standard scenario, so I was really hoping I wouldn't need to manually walk through and re-allocate the primary shards, but I'm not sure what else to try at this point to get back to green. Any pointers would be really appreciated.

Here are some of the relevant-seeming bits folks asked about on IRC. In the ES logs, for the unallocated index names, there are lines along the lines of:

[2015-04-29 22:08:22,803][DEBUG][action.admin.indices.stats] [Agent Axis] [webaccesslogs-2015.04.24][0], node[-r2iQnH4R-mcUy4NicCB5g], [P], s[STARTED]: failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@6a564a91]
org.elasticsearch.transport.SendRequestTransportException: [Jean-Paul Beaubier][inet[/10.155.165.126:9300]][indices:monitor/stats[s]]

(Jean-Paul Beaubier is the node that went down.)

_cat/recovery:

shards disk.used disk.avail disk.total disk.percent host              ip             node
   420    21.2gb       77gb     98.3gb           21 ip-10-234-164-148 10.234.164.148 Agent Axis
   420      41gb     57.2gb     98.3gb           41 ip-10-218-145-237 10.218.145.237 Ebon Seeker
    15                                                                               UNASSIGNED

I'm trying to understand why it's stuck in this state, given there is no other info in the logs, as far as I can tell, about why the shards can't be allocated. Shouldn't the replicas just be promoted in place to new primaries and then new replicas created on the other node?

Thanks and regards -- Alex
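One manual fallback, sketched here for a single shard using the index and node names from the output above, is the cluster reroute API with an explicit allocate command. Note that allow_primary: true creates an empty primary if no intact copy of the shard exists on disk, which loses whatever data was in that shard, so treat it as a last resort:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    {
      "allocate": {
        "index": "webaccesslogs-2015.04.24",
        "shard": 0,
        "node": "Agent Axis",
        "allow_primary": true
      }
    }
  ]
}'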
Re: Copying fields to a geopoint type ?
Were you ever able to figure out a solution to this? I'm in a similar boat.

On Thursday, September 11, 2014 at 2:14:29 AM UTC-7, Kushal Zamkade wrote:

Hello, I have created a location field by using the code below:

if [latitude] and [longitude] {
  mutate {
    rename => [ "latitude", "[location][lat]", "longitude", "[location][lon]" ]
  }
}

But when I check the location field type, it is not created as geo_point, and when I try to search on a geo_point I get the error below. Can you help me resolve this?

QueryParsingException[[logstash-2014.09.11] failed to find geo_point field [location1]];

On Thursday, April 10, 2014 at 2:42:22 AM UTC+5:30, Pascal VINCENT wrote:

Hi, I have included logstash in my stack and started to play with it. I'm sure it can do the trick I was looking for, and much more. Thank you ... [waiting for your blog post :)] Pascal.

On Mon, Apr 7, 2014 at 9:38 AM, Alexander Reelsen a...@spinscale.de wrote:

Hey, I don't know about your stack, but maybe logstash would be a good idea to add in there. It is more flexible than the CSV river and features a CSV input as well. You can easily change the structure of the data you want to index. This is how the logstash config would look:

if [latitude] and [longitude] {
  mutate {
    rename => [ "latitude", "[location][lat]", "longitude", "[location][lon]" ]
  }
}

I am currently working on a blog post about how to utilize elasticsearch, logstash and kibana on CSV-based data, which covers exactly this, and I hope to release it soonish on the .org blog. Stay tuned! :-) --Alex

On Thu, Apr 3, 2014 at 12:21 AM, Pascal VINCENT pasvi...@gmail.com wrote:

Hi, I'm new to elasticsearch. My use case is to load a CSV file containing agencies with geo locations; each line looks like:

id;label;address;zipcode;city;region;latitude;longitude;(and some other fields)

I'm using the csv river plugin to index the file. My mapping is:

{
  "office": {
    "properties": {
      (first fields omitted...)
      "latitude":  { "type": "double" },
      "longitude": { "type": "double" },
      "location":  { "type": "geo_point", "lat_lon": true }
    }
  }
}

I'd like to index the location .lon and .lat values from the latitude and longitude fields. I tried the copy_to function with no success:

"latitude":  { "type": "double", "copy_to": "location.lat" },
"longitude": { "type": "double", "copy_to": "location.lon" },

Is there any way to feed the location property from the latitude and longitude fields at indexing time? My point is that I don't want to modify the input CSV file to adapt it to the GeoJSON format (i.e. concatenate lat and lon into one field in the CSV file). Thank you for any hints. Pascal.
Re: Copying fields to a geopoint type ?
Woah, crazy, never would've thought of that. Thanks a lot for following up!

On Wed, Apr 1, 2015 at 12:31 PM, Pascal VINCENT pasvinc...@gmail.com wrote:

I finally came up with:

if [latitude] and [longitude] {
  mutate {
    add_field => [ "[location]", "%{longitude}" ]
    add_field => [ "[location]", "%{latitude}" ]
  }
  mutate {
    convert => [ "[location]", "float" ]
  }
}
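For completeness, the location field still needs a geo_point mapping for this to work; a minimal index-template sketch (the template name and index pattern here are assumptions, adjust to your own indices):

curl -XPUT 'localhost:9200/_template/geo_location' -d '{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "properties": {
        "location": { "type": "geo_point" }
      }
    }
  }
}'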
Cluster issue - raiseTimeoutFailure
Hi, I have a Java application that is indexing data into an Elasticsearch cluster (3 nodes). ES is well configured and working OK (indexing the data received from Java).

Cluster configuration for each node, from /etc/elasticsearch/elasticsearch.yml:

ES_MAX_MEM: 2g
ES_MIN_MEM: 2g
bootstrap:
  mlockall: true
cluster:
  name: clusterName
discovery:
  zen:
    ping:
      multicast:
        enabled: false
      unicast:
        hosts:
          - elasticsearch-test-2-node-1
          - elasticsearch-test-2-node-2
          - elasticsearch-test-2-node-3
http:
  max_initial_line_length: 48k
index:
  number_of_replicas: 2
  number_of_shards: 6
node:
  name: elasticsearch-test-2-node-3
threadpool:
  index:
    type: fixed
    size: 6
    queue_size: 1500
  search:
    type: fixed
    size: 6
    queue_size: 1200

When connecting to the ES cluster (from Java), I specify all the nodes: node1, node2, node3. The issue appears when I stop the 2 data nodes one by one (stopping elasticsearch). In this case the cluster health is yellow and I can see the remaining master node (using the head plugin). The master now has all the primary shards; the replicas are unassigned. But the Java application no longer indexes any data. The following exception appears in Java:

org.elasticsearch.action.UnavailableShardsException: [indexName][2] [3] shardIt, [1] active : Timeout waiting for [1m], request: index {[indexName][typeName][Id], source[{ . }]}
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.raiseTimeoutFailure(TransportShardReplicationOperationAction.java:548) ~[elasticsearch-1.1.0.jar:na]
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$3.onTimeout(TransportShardReplicationOperationAction.java:538) ~[elasticsearch-1.1.0.jar:na]
at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:491) ~[elasticsearch-1.1.0.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_51]
at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_51]

Shouldn't indexing work properly in this case, even with only the master? If I also kill the master, the next (logical) exception appears:

org.elasticsearch.client.transport.NoNodeAvailableException: No node available
at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:263) ~[elasticsearch-1.1.0.jar:na]
at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:231) ~[elasticsearch-1.1.0.jar:na]
at org.elasticsearch.client.transport.support.InternalTransportClient.execute(InternalTransportClient.java:106) ~[elasticsearch-1.1.0.jar:na]
at org.elasticsearch.client.support.AbstractClient.update(AbstractClient.java:107) ~[elasticsearch-1.1.0.jar:na]
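A likely explanation, offered as a hypothesis: the default write consistency for index operations is quorum, and with number_of_replicas: 2 a quorum is 2 of the 3 copies of each shard. With both data nodes down only the primary is active, so the write waits for the 1m timeout and raises UnavailableShardsException. A sketch of the per-request override on the 1.x Java API (index/type names and the json variable are placeholders):

import org.elasticsearch.action.WriteConsistencyLevel;
import org.elasticsearch.action.index.IndexResponse;

// accept the write as soon as the primary alone is available
IndexResponse response = client.prepareIndex("indexName", "typeName", id)
        .setSource(json)
        .setConsistencyLevel(WriteConsistencyLevel.ONE)
        .execute()
        .actionGet();

Alternatively, lowering index.number_of_replicas changes what a quorum means for the index as a whole.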
ElasticSearch across multiple data center architecture design options
Hi all, we are planning to use ELK for our log analysis. We have multiple data centers. Since a cluster that spans data centers is not recommended, we are going to have one ES cluster per data center. Here are the three design options we have:

1. Use snapshot/restore to replicate data across clusters.
2. Use a tribe node to achieve cross-cluster queries.
3. Ship and index logs to each cluster.

And here are our questions; any comments will be appreciated:

1. How complex is snapshot/restore? Does anyone have experience using it for this purpose?
2. Would the performance of a single tribe node be a concern or a bottleneck? Is it possible to have multiple tribe nodes for scale-out or load balancing? (See the config sketch below.)
3. Is it possible to customize Kibana so that it queries a different cluster depending on the query?

Thank you! Abigail
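On question 2: multiple tribe nodes behind a load balancer should be possible, since each tribe node is simply a client node that joins every configured cluster. A minimal elasticsearch.yml sketch for one tribe node (the cluster names are placeholders):

tribe:
  dc1:
    cluster.name: es-cluster-dc1
  dc2:
    cluster.name: es-cluster-dc2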
ExpressionScriptCompilationException[Field [field name] used in expression does not exist in mappings];
Hi there, We're just getting started with ELK and are using Elasticsearch 1.4.4 and Kibana 4.0 on Ubuntu 14.04. We needed to create a scripted field to calculate the ratio between two numeric fields. These fields are not on all events and only started appearing at all a day ago (so older indexes don't have them at all).

name: ads_per_page
script: doc['ads_found'].value / max(1, doc['pages_parsed'].value)

It seemed to be working great at first, but now Kibana has been resurfacing these elasticsearch errors constantly and I can't seem to find any information about it online (too new?). This repeats for every shard as far as I can tell (there are about 2 weeks of indexes there). Any suggestions would be appreciated.

Shard Failures
The following shard failures occurred:

- Index: logstash-2015.02.21 Shard: 0 Reason: SearchParseException[[logstash-2015.02.21][0]: query[ConstantScore(BooleanFilter(+cache(@timestamp:[1425571563220 TO 1425657963220])))],from[-1],size[500],sort[custom:@timestamp: org.elasticsearch.index.fielddata.fieldcomparator.LongValuesComparatorSource@4c1942d3!]: Parse Failure [Failed to parse source [{size:500,sort:{@timestamp:desc},query:{filtered:{query:{query_string:{analyze_wildcard:true,query:*}},filter:{bool:{must:[{range:{@timestamp:{gte:1425571563220,lte:1425657963220}}}],must_not:[],highlight:{pre_tags:[@kibana-highlighted-field@],post_tags:[@/kibana-highlighted-field@],fields:{*:{}}},aggs:{2:{date_histogram:{field:@timestamp,interval:30m,pre_zone:-08:00,pre_zone_adjust_large_interval:true,min_doc_count:0,extended_bounds:{min:1425571563220,max:1425657963220,fields:[*,_source],script_fields:{ads_per_page:{script:doc['ads_found'].value / max(1, doc['pages_parsed'].value),lang:expression}},fielddata_fields:[@timestamp]}]]]; nested: ExpressionScriptCompilationException[Field [ads_found] used in expression does not exist in mappings];
Re: Using function_score error
The error is in your groovy script, as indicated by GroovyScriptExecutionException. All the other info is just making it more difficult to help you.

script: _score doc['reviews'].value

Your script doesn't use any operator. It's likely that you just want to multiply: _score * doc['reviews'].value. In groovy, function call arguments do not need to be enclosed in brackets, e.g. println 'hello' is equivalent to println('hello'). By omitting the operator, your script is trying to call _score (which is some UpdateableFloat) with the document field as an argument. Cheers!

On Monday, November 3, 2014 at 7:46:07 PM UTC+1, Manuel Sciuto wrote:

I have an error. My mapping:

"mappings": {
  "comida": {
    "dynamic": true,
    "numeric_detection": true,
    "properties": {
      "id":      { "type": "integer" },
      "reviews": { "type": "integer" },
      "name":    { "analyzer": "myAnalyzerDestinos", "type": "string" }
    }
  },
  "actividades": {
    "dynamic": true,
    "numeric_detection": true,
    "properties": {
      "id":      { "type": "integer" },
      "reviews": { "type": "integer" },
      "name":    { "analyzer": "myAnalyzerDestinos", "type": "string" }
    }
  },
  "alojamiento": {
    "dynamic": true,
    "numeric_detection": true,
    "properties": {
      "id":      { "type": "integer" },
      "reviews": { "type": "integer" },
      "name":    { "analyzer": "myAnalyzerDestinos", "type": "string" }
    }
  },
  "transporte__servicios": {
    "dynamic": true,
    "numeric_detection": true,
    "properties": {
      "id":      { "type": "integer" },
      "reviews": { "type": "integer" },
      "name":    { "analyzer": "myAnalyzerDestinos", "type": "string" }
    }
  }
},

My query:

GET /business/_search
{
  "query": {
    "function_score": {
      "query": { "match": { "name": "sheraton" } },
      "script_score": {
        "script": "_score doc['reviews'].value",
        "lang": "groovy"
      }
    }
  }
}

Response:

{
  "error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[pGQYzpifRMumKUcblgTp2Q][business][0]: QueryPhaseExecutionException[[business][0]: query[function score (name:she name:sher name:shera name:sherat name:sherato name:sheraton,function=script[_score doc['reviews'].value], params [null])],from[0],size[10]: Query Failed [Failed to execute main query]]; nested: GroovyScriptExecutionException[MissingMethodException[No signature of method: org.elasticsearch.script.groovy.GroovyScriptEngineService$GroovyScript$UpdateableFloat.call() is applicable for argument types: (java.lang.Long) values: [11]\nPossible solutions: wait(long), wait(), abs(), any(), wait(long, int), and(java.lang.Number)]]; }{[pGQYzpifRMumKUcblgTp2Q][business][1]: QueryPhaseExecutionException[[business][1]: query[function score (name:she name:sher name:shera name:sherat name:sherato name:sheraton,function=script[_score doc['reviews'].value], params [null])],from[0],size[10]: Query Failed [Failed to execute main query]]; nested: GroovyScriptExecutionException[MissingMethodException[No signature of method: org.elasticsearch.script.groovy.GroovyScriptEngineService$GroovyScript$UpdateableFloat.call() is applicable for argument types: (java.lang.Long) values: [16]\nPossible solutions: wait(long), wait(), abs(), any(), wait(long, int), and(java.lang.Number)]]; }]",
  "status": 500
}

Why?

On Saturday, November 1, 2014 at 1:02:13 PM UTC-3, Ryan Ernst wrote:

The root cause of the error is here: ScriptException[dynamic scripting for [mvel] disabled]; I would guess you are running on ES 1.2 or 1.3? Dynamic scripting was disabled by default in 1.2, and for non-sandboxed languages in 1.3. In 1.4, the default script language was changed to Groovy, which is sandboxed and thus can be safely compiled dynamically.
See this blog for more details: http://www.elasticsearch.org/blog/scripting-security/ If running 1.3, you can simply change the language of the script:

GET /searchtube/_search
{
  "query": {
    "function_score": {
      "query": { "match": { "_all": "severed" } },
      "script_score": {
        "script": "_score * log(doc['likes'].value + doc['views'].value + 1)",
        "lang": "groovy"
      }
    }
  }
}

Although you could also use the expression lang for this simple script, which will be much faster! On Wednesday, October 29, 2014
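For reference, a sketch of the expressions variant Ryan mentions, applied to the reviews query above (assuming ES 1.4+, where "lang": "expression" is available out of the box):

GET /business/_search
{
  "query": {
    "function_score": {
      "query": { "match": { "name": "sheraton" } },
      "script_score": {
        "script": "_score * doc['reviews'].value",
        "lang": "expression"
      }
    }
  }
}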
Re: JsonObject to SortBuilder object
Here's a simpler way to ask the question. I've got this:

GeoDistanceSortBuilder sorter = new GeoDistanceSortBuilder("values.geo_location");
sorter.point(0.0, 0.0);
sorter.order(SortOrder.DESC);
sorter.unit(DistanceUnit.KILOMETERS);
sorter.geoDistance(GeoDistance.PLANE);
sorter.toString();

Which produces the string:

"_geo_distance" : { "values.geo_location" : [ 0.0, 0.0 ], "unit" : "km", "distance_type" : "plane", "reverse" : true }

I would like to do the opposite: I have the above string, and I want to turn it into a GeoDistanceSortBuilder without having to manually parse it.

On Monday, January 19, 2015 at 7:50:17 PM UTC-5, Alex Thurston wrote:
[quoted text hidden; the original post appears in full below]
JsonObject to SortBuilder object
I would like to turn an arbitrary JsonObject (which presumably follows the Search/Sort DSL) into a SortBuilder which can then be passed to SearchRequestBuilder::addSort. I've gotten this to work by simply parsing the JsonObject myself and calling the appropriate methods on the SortBuilder, but that means I have to implement the parsing for every variation of the DSL.

If I've got a Java JsonObject that looks like:

{ "first_name": "asc" }

OR

{ "first_name": { "order": "asc" } }

OR

{ "_geo_distance": { "my_position": { "order": "asc" } } }

all of which are valid JSON for the sort, I would imagine there's a way to call:

JsonObject sort_json = EXAMPLE FROM ABOVE;
SortBuilder sort = new SortBuilder();
sort.setSort(sort_json);

I'm almost certain I'm missing something, but can't for the life of me figure out how to do it. Thanks in advance.
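One workaround on the 1.x Java API, sketched under the assumption that you only need the sort applied to the request rather than an actual SortBuilder instance: pass the raw JSON through with setExtraSource, which merges an extra source fragment into the search body.

import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilders;

// sortJson is the arbitrary sort fragment discussed above
SearchRequestBuilder request = client.prepareSearch("my_index")
        .setQuery(QueryBuilders.matchAllQuery());
// wrap the raw fragment in a top-level "sort" array and merge it into the body
request.setExtraSource("{\"sort\":[" + sortJson.toString() + "]}");
SearchResponse response = request.execute().actionGet();

This sidesteps SortBuilder entirely, so it won't help if you need the parsed object itself.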
Re: Getting SQLFeatureNotSupportedException while connecting Hbase phoenix via river-jdbc
Rimita, this was fixed in Phoenix 3.1.0; please follow these instructions: http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html

On Thu, Dec 11, 2014 at 3:53 AM, cto@TCS rimita.mit...@gmail.com wrote:

Thank you so much.

On Thursday, December 11, 2014 12:49:49 PM UTC+5:30, cto@TCS wrote:

Hi, I have an HBase database and I use Phoenix as an RDBMS skin over it. Now I am trying to retrieve those data via Elasticsearch using the river-jdbc plugin. I am using the following:

1) elasticsearch-1.4.0
2) elasticsearch-river-jdbc-1.4.0.3.Beta1
3) phoenix-3.0.0-incubating-client
4) HBase 0.94.1

But I keep getting the following exception when I try to create a river:

[2014-12-11 12:34:48,957][INFO ][org.apache.zookeeper.ZooKeeper] Session: 0x14a380290fe001d closed
[2014-12-11 12:34:48,957][INFO ][org.apache.zookeeper.ClientCnxn] EventThread shut down
[2014-12-11 12:34:49,022][ERROR][river.jdbc.SimpleRiverSource] while opening read connection: jdbc:phoenix:localhost:2181 null
java.sql.SQLFeatureNotSupportedException
at org.apache.phoenix.jdbc.PhoenixConnection.setReadOnly(PhoenixConnection.java:587)
at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.getConnectionForReading(SimpleRiverSource.java:226)
at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.execute(SimpleRiverSource.java:376)
at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.fetch(SimpleRiverSource.java:320)
at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.fetch(SimpleRiverFlow.java:209)
at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow.execute(SimpleRiverFlow.java:139)
at org.xbib.elasticsearch.plugin.jdbc.RiverPipeline.request(RiverPipeline.java:88)
at org.xbib.elasticsearch.plugin.jdbc.RiverPipeline.call(RiverPipeline.java:66)
at org.xbib.elasticsearch.plugin.jdbc.RiverPipeline.call(RiverPipeline.java:30)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2014-12-11 12:34:49,432][INFO ][river.jdbc.RiverMetrics ] pipeline org.xbib.elasticsearch.plugin.jdbc.RiverPipeline@700dd36f is running: river jdbc/myriver metrics: 0 rows, 0.0 mean, (0.0 0.0 0.0), ingest metrics: elapsed 9 seconds, 0.0 bytes bytes, 0.0 bytes avg, 0 MB/s

Please help!
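For reference, a minimal definition of the kind of river being created here (a sketch only: the river name, ZooKeeper address and SQL are placeholders, and the Phoenix 3.1.0+ client jar must be on the plugin's classpath):

curl -XPUT 'localhost:9200/_river/myriver/_meta' -d '{
  "type": "jdbc",
  "jdbc": {
    "url": "jdbc:phoenix:localhost:2181",
    "sql": "SELECT * FROM my_table"
  }
}'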
Re: Elasticsearch Maven plugin on GitHub
Here's my suggestion: modify StartElasticsearchNodeMojo#execute() and, right before the method returns (i.e. after the ES node is started), read a system property called waitIndefinitely. If the property is set, then wait indefinitely (see below for details), else continue the execution. You will have to provide that property when running maven, like mvn clean verify -DwaitIndefinitely, for the plugin to wait indefinitely. I hope that helps. Let me know if you need additional help.

To wait indefinitely: see the waitIndefinitely() method in http://svn.apache.org/viewvc/tomcat/maven-plugin/tags/tomcat-maven-plugin-2.0/tomcat7-maven-plugin/src/main/java/org/apache/tomcat/maven/plugin/tomcat7/run/AbstractRunMojo.java?view=markup

alex

On Thu, Dec 4, 2014 at 2:13 AM, Chetan Padhye chetanpad...@gmail.com wrote:

Hi, good plugin. I tried to run it, but it starts and then stops once the pom execution is finished. How can we modify the plugin to keep it running once started? My intention is to use this plugin for demo installations, so I can install an elasticsearch node and start it on any machine for my demo.

On Friday, 17 January 2014 06:14:22 UTC+5:30, AlexC wrote:

If anyone is interested in using a Maven plugin to run Elasticsearch for integration testing, I just published one on GitHub: https://github.com/alexcojocaru/elasticsearch-maven-plugin. It is an alternative to starting a node through the code. The readme should provide enough information, but let me know if something is missing or not clear enough. It uses ES v0.90.7, but it can be easily updated to the latest ES version by changing the dependency version in the pom.xml file. alex
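A sketch of what that could look like inside the Mojo, modeled on the Tomcat plugin linked above (the property name and the exact placement are assumptions):

// at the end of StartElasticsearchNodeMojo#execute(), after the node has started
if (Boolean.getBoolean("waitIndefinitely")) {
    waitIndefinitely();
}

private void waitIndefinitely() {
    Object lock = new Object();
    synchronized (lock) {
        try {
            // block the maven thread forever; Ctrl-C ends the build and the node
            lock.wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}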
Re: Is there any way to pass search query in POST parameter rather than body?
Thanks Kelsey, that could be useful. I managed to get my UI framework (ExtJS) to play better with POST, so I am not dependent on having to use GET any more.

On Wed Nov 26 2014 at 5:54:43 PM Kelsey Hamer kelsey.ha...@gmail.com wrote:

I had a similar issue. I managed to get the parameters from the POST by doing:

@Override
public void handleRequest(final RestRequest request, final RestChannel channel) {
    Map<String, String> params = new HashMap<String, String>();
    RestUtils.decodeQueryString(request.content().toUtf8(), 0, params);
    String paramValue = params.get("parameter");
    // DO SOMETHING
}

Notice that the JSON query you want to pass in doesn't need to be encoded on the client side (with an HTTP GET it needs to be). Hope that helps.

On Thursday, January 31, 2013 10:03:33 AM UTC-8, AlexR wrote:

I am already doing it with GET and the source parameter and it works well. One huge benefit is that the size and start (and hopefully sort, but I have not tested it yet) URL parameters override whatever is in source={...}, which is a big help integrating with UI components that manage paging and generate these HTTP parameters. Now the problem is that for all practical purposes URI length is limited to 2000 characters, so GET may very well fail with bigger queries (as I said, a query with facet-based filters, the facets themselves, and filters can get pretty long, plus of course URL-encoding of all the spaces and {}). I wish the same functionality were available via POST. Couldn't ES check the encoding in the POST header and, if it is application/x-www-form-urlencoded, just extract the encoded parameters and use the source parameter just like it does with GET? Do you think I should put an enhancement request into Git?
Elasticsearch fuzzy intersection of two arrays
I have an object in the Elasticsearch index that has a nested object which is a list of strings. I would like to do an intersection against this list in both exact and fuzzy ways. So for example I have browser names with versions in the index like:

"browsers": [{"name": "Chrome 38"}, {"name": "Firefox 32"}, {"name": "Safari 5"}]

The request could be:

[{"name": "Chrome 38"}, {"name": "IE 10"}]

Then I have just 1 exact match. Or another example:

[{"name": "Chrome 39"}, {"name": "Firefox 33"}, {"name": "Safari 5"}]

Here I have 2 fuzzy matches (Levenshtein distance = 2) and 1 exact match. Those results which have more matches should be on top. How would you write this kind of query?
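One possible shape for this, offered as a sketch (it assumes browsers is mapped as a nested type and that a fuzziness of 2 per clause is acceptable; the index name and values come from the example above): a nested query whose bool should clauses each fuzzy-match one requested browser, with score_mode sum so documents matching more clauses rank higher. Exact matches naturally outscore fuzzy ones, since fuzzy-expanded terms are scored with an edit-distance penalty.

GET /my_index/_search
{
  "query": {
    "nested": {
      "path": "browsers",
      "score_mode": "sum",
      "query": {
        "bool": {
          "should": [
            { "match": { "browsers.name": { "query": "Chrome 39",  "fuzziness": 2 } } },
            { "match": { "browsers.name": { "query": "Firefox 33", "fuzziness": 2 } } },
            { "match": { "browsers.name": { "query": "Safari 5",   "fuzziness": 2 } } }
          ]
        }
      }
    }
  }
}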
Kibana 4 - filters in a dashboard
Dear Elasticsearch team, could you please clarify a question regarding the Kibana 4 vs Kibana 3 dashboard feature? In a Kibana 3 dashboard I was able to interact with widgets, i.e. drill down into the data by clicking on basically any of the widgets to add more filters to the current dashboard. Is it supposed to work the same way in Kibana 4 as well? It does not work for me at all now, and I'm wondering whether it's something I can't figure out how to do, or something that's not implemented yet. Thanks in advance, Alex.
Re: unable to make snapshots to NFS filesystem
Ciprian, Thanks for your input - I had indeed missed that disk space failure and it turns out I was hitting an intermittent disk space issue.
unable to make snapshots to NFS filesystem
Hi all, I have been struggling to put together a backup solution for my ES cluster. As far as I understand the documentation at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html, I can't see why the following might be failing.

I have exported an NFS filesystem to both nodes of my 2-node ES cluster, mounted as /srv/backup. I created the elasticsearch user on the NFS server too, and then:

[root@back01 ~]# ls -ld /srv/backup/es_backup
drwxrwx---. 3 elasticsearch elasticsearch 4096 Sep 29 18:37 /srv/backup/es_backup

Start with a clean filesystem:

[root@logdata01 ~]# rm -rf /srv/backup/*

Register the backup area:

[root@logdata01 ~]# curl -s -XPUT http://localhost:9200/_snapshot/backup -d '{ "type": "fs", "settings": { "location": "/srv/backup" } }'
{"acknowledged":true}

Create a snapshot:

[root@logdata01 ~]# curl -XPUT 'localhost:9200/_snapshot/backup/tcom_snapshot?wait_for_completion=true&pretty'

I then get failures on various shards: https://gist.github.com/alexharv074/b4c7d35028c425f70f20

Any help on how I could get this cluster into a sane state that can be backed up would be greatly appreciated. Best regards, Alex
Re: unable to make snapshots to NFS filesystem
Thanks for responding. It doesn't seem to be a permissions problem:

[root@logdata01 ~]# ls -ld /srv/backup
drwxrwx---. 3 elasticsearch elasticsearch 4096 Sep 29 18:43 /srv/backup
[root@logdata01 ~]# find /srv/backup/ \! -user elasticsearch -or \! -group elasticsearch
[root@logdata01 ~]#
[root@logdata01 ~]# find /srv/backup -ls | head
1310734 drwxrwx---  3 elasticsearch elasticsearch 4096 Sep 29 18:43 /srv/backup
1310764 drwxr-xr-x 12 elasticsearch elasticsearch 4096 Sep 29 18:37 /srv/backup/indices
1310954 drwxr-xr-x  6 elasticsearch elasticsearch 4096 Sep 29 18:42 /srv/backup/indices/logstash-2014.09.28
1310968 -rw-r--r--  1 elasticsearch elasticsearch 4120 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/snapshot-tcom_snapshot
1311894 drwxr-xr-x  2 elasticsearch elasticsearch 4096 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/3
1311938 -rw-r--r--  1 elasticsearch elasticsearch 4443 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/3/__3
1312014 -rw-r--r--  1 elasticsearch elasticsearch  689 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/3/__b
1312024 -rw-r--r--  1 elasticsearch elasticsearch   61 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/3/__c
1312064 -rw-r--r--  1 elasticsearch elasticsearch  281 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/3/__g
1312004 -rw-r--r--  1 elasticsearch elasticsearch  349 Sep 29 18:37 /srv/backup/indices/logstash-2014.09.28/3/__a

On Monday, September 29, 2014 8:02:42 PM UTC+10, Mark Walkom wrote:

Can you do an ls -ld /srv/backup and provide the output?

Regards, Mark Walkom
Infrastructure Engineer, Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 29 September 2014 18:45, Alex Harvey alexh...@gmail.com wrote:
[quoted text hidden; the original post appears in full above]
Re: Using ES as a primary datastore.
ES is a fantastic search engine, but there is some risk of data loss (http://aphyr.com/posts/317-call-me-maybe-elasticsearch) and a few other potential disadvantages (https://www.quora.com/Why-should-I-NOT-use-ElasticSearch-as-my-primary-datastore) which might or might not be relevant to you. You can always combine ES, via the JDBC river (https://github.com/jprante/elasticsearch-river-jdbc), with a stable, secure database, e.g. MySQL (https://www.quora.com/How-do-i-use-Elastic-search-with-mysql-database-I-am-currently-experimenting-with-jdbc-river-but-will-it-be-fast-enough-in-production) or HBase (http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html); since you have lots of data, HBase might be the better option.

On Wed, Sep 17, 2014 at 8:04 AM, Thomas thomas.bo...@gmail.com wrote:

Hi, you have to calculate the volume you will keep in one shard first, then break your volumes into the number of shards you will maintain, and then scale accordingly onto a number of nodes; or at least, as your volumes grow, you should grow your cluster as well. It is difficult to predict what problems may arise; your case is too generic. What will be the usage of the cluster? What queries will you perform? Will you mostly do indexing and only occasionally query, or will you query your data intensively? Most important, you need to think about how you will partition your data: will you have one index, multiple indices like a logstash approach, or something else? Maybe check here: https://www.found.no/foundation/sizing-elasticsearch/

For data more than a year old, what will you do, delete them? Can you afford to lose data? Will you keep backups? IMHO, these are some of the questions you must answer in order to see whether such an approach suits your needs. It comes down to hardware, and the structure and partitioning of your data. Thomas

On Wednesday, 17 September 2014 13:41:55 UTC+3, P Suman wrote:

Hello, we are planning to use ES as a primary datastore. Here is my use case: we receive a million transactions per day (all are inserts). Each transaction is around 500KB in size and has 10 fields; we should be able to search on all 10 fields. We want to keep around 1 year's worth of data, which comes to around 180TB. Can you please let me know any problems that might arise if I use elasticsearch as the primary datastore. Regards, Suman
Re: Is there any way to prevent ES from disclosing exception details in REST response?
Thanks Jorg, unfortunately it is not an option - we are not at liberty to touch anything beyond our app servers. We are using the transport-wares servlet for ES, and I could easily tweak AbstractServletRestChannel to handle the REST channel response with codes 400/500, but I would like to avoid modifying the code directly, and there is no way to do it nicely. I put a request on GitHub for enhancements to the NodeServlet, but was hoping ES might have an option to turn error details on/off. I think it would be nice to control the error level in REST responses with three levels: suppress/message/stack-trace.

On Mon, Sep 15, 2014 at 6:01 PM, joergpra...@gmail.com wrote:

You can put a reverse proxy like nginx between the ES cluster and the rest of the world and filter away all HTTP status 500 responses. Jörg

On Mon, Sep 15, 2014 at 11:57 PM, AlexR roytm...@gmail.com wrote:

We expose the ES _search endpoint directly to consumers. When our REST API gets scanned for security vulnerabilities, the scanner complains about ES returning exception details. For example, a malformed query will be included in the response along with the exception. While it is more or less harmless, the tool complains of various injections and internals disclosures. I would like to be able to turn the error message in the response off (or substitute it with a generic message) in production, while keeping the normal response logic in development. Is there any way I can do it?
Re: Is there any way to prevent ES from disclosing exception details in REST response?
I guess I could, but it would mean passing a response wrapper to capture the output stream and then copy it to the real response, or discard it in case of an error. That would be a second copy of the response - the first one being done in the NodeServlet - and will hurt performance for large responses :-(

On Mon, Sep 15, 2014 at 6:40 PM, joergpra...@gmail.com wrote:

Then why don't you simply add a servlet filter that filters unwanted responses away? Jörg

On Tue, Sep 16, 2014 at 12:21 AM, Alex Roytman roytm...@gmail.com wrote:
[quoted text hidden; see the earlier messages in this thread]
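For anyone who does accept the extra copy, a minimal sketch of such a filter (Servlet 3.0 style; the class names, the status threshold and the generic message are all assumptions, not transport-wares code):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintWriter;
import javax.servlet.*;
import javax.servlet.http.*;

public class ErrorMaskingFilter implements Filter {
    public void init(FilterConfig cfg) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletResponse http = (HttpServletResponse) res;
        BufferingWrapper wrapper = new BufferingWrapper(http);
        chain.doFilter(req, wrapper);
        wrapper.flushWriter();
        if (wrapper.getStatus() >= 400) {
            // hide the ES exception details behind a generic message
            http.setContentType("application/json");
            http.getWriter().write("{\"error\":\"request failed\"}");
        } else {
            // the second copy mentioned above
            wrapper.copyBodyTo(http.getOutputStream());
        }
    }
}

class BufferingWrapper extends HttpServletResponseWrapper {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private PrintWriter writer;

    BufferingWrapper(HttpServletResponse response) { super(response); }

    public ServletOutputStream getOutputStream() {
        return new ServletOutputStream() {
            public void write(int b) { buffer.write(b); }
        };
    }

    public PrintWriter getWriter() {
        if (writer == null) writer = new PrintWriter(buffer);
        return writer;
    }

    void flushWriter() { if (writer != null) writer.flush(); }
    void copyBodyTo(OutputStream out) throws IOException { buffer.writeTo(out); }
}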
Re: Linking of query/search
You can combine ES with an RDBMS and run your SQL queries either directly against the DB, or pull data via the JDBC River into ES. I wrote about it here: http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html On Fri, Sep 12, 2014 at 10:55 AM, Ivan Brusic i...@brusic.com wrote: You cannot join documents in Lucene/Elasticsearch (at least not like an RDBMS). You would need to either denormalize your data, join on the client side, or execute 2+ queries. -- Ivan On Fri, Sep 12, 2014 at 12:45 AM, matej.zerov...@gmail.com wrote: Hello! Can anyone shed some light on my question? Is the query in question achievable in ES directly? If not, I can probably do it in the application later, but it would be nicer if ES could serve me the final results. Matej -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAOtKWX623repUH5k2XbkFBFNu-b3cSKyObuyf793AVhOt3Gb-Q%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
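As a concrete illustration of the "join on the client side or execute 2+ queries" option Ivan mentions: a sketch against the ES 1.x Java TransportClient, with hypothetical authors/books indexes and an author_id field. This is just the two-query shape, not a drop-in solution.

import java.util.ArrayList;
import java.util.List;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

public class ClientSideJoin {
    public static void main(String[] args) {
        Client client = new TransportClient()
                .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));
        try {
            // Query 1: find the ids of the "parent" documents we care about.
            SearchResponse authors = client.prepareSearch("authors")
                    .setQuery(QueryBuilders.termQuery("country", "si"))
                    .setSize(100)
                    .execute().actionGet();

            List<String> authorIds = new ArrayList<String>();
            for (SearchHit hit : authors.getHits()) {
                authorIds.add(hit.getId());
            }

            // Query 2: fetch the related documents by those ids.
            SearchResponse books = client.prepareSearch("books")
                    .setQuery(QueryBuilders.termsQuery("author_id",
                            authorIds.toArray(new String[authorIds.size()])))
                    .execute().actionGet();

            System.out.println("joined hits: " + books.getHits().getTotalHits());
        } finally {
            client.close();
        }
    }
}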
Connecting Hbase to Elasticsearch
I posted step-by-step instructions here http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html on using Apache HBase/Phoenix with the Elasticsearch JDBC River. This might be useful to Elasticsearch users who want to use HBase as a primary data store, and to HBase users who wish to enable full-text search on their existing tables via the Elasticsearch API. Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAOtKWX4R81324NmKZou_zCT0e-DbFv%2BmWHg_pAinCmUapwyYcA%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
Re: Backup and restore using snapshots
I could still use feedback on this plan. On Sunday, August 31, 2014 9:08:12 PM UTC+10, Alex Harvey wrote: Hi all, I could use some help getting my head around the snapshot and restore functionality in ES. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6bea9d13-5137-41e6-842c-32fbe71c56b8%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Get distinct data
Hi all! I have a problem getting unique data from elasticsearch. I have the following documents:

[
  { "message": "Message 1", "author": { "id": 4, "name": "Author Name" }, "sourceId": 123456789, "userId": 123456 },
  { "message": "Message 1", "author": { "id": 4, "name": "Author Name" }, "sourceId": 123456789, "userId": 654321 }
]

The difference between these documents is the userId. When I send a query by author.id, I get a response with 2 documents. Can I get distinct data by the sourceId field? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1b59b2a2-484b-46cc-a95b-695e84e6d6eb%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Get distinct data
Hi Vineeth! Thanks for your answer. I use a terms aggregation, but I still get a response with 2 documents anyway. The response data, for example:

{
  "took": 23,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": null,
    "hits": [
      {
        "_index": "feeditem_local",
        "_type": "FeedItem",
        "_id": "53dbe9cf1d7859e15f8b4599",
        "_score": null,
        "_source": { "sourceId": 123456789, "message": "Message 1", "author": { "id": 120816414 }, "userId": 123456 },
        "sort": [1406921136000]
      },
      {
        "_index": "feeditem_local",
        "_type": "FeedItem",
        "_id": "53dbe9cf1d7859e15f8b4599",
        "_score": null,
        "_source": { "sourceId": 123456789, "message": "Message 1", "author": { "id": 120816414 }, "userId": 654321 },
        "sort": [1406921136000]
      }
    ]
  },
  "aggregations": {
    "source": {
      "buckets": [
        { "key": 123456789, "doc_count": 2 }
      ]
    }
  }
}

On Tuesday, September 2, 2014 at 9:45:41 AM UTC+3, vineeth mohan wrote: Hello Alex, The terms aggregation is here to save your day - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation Thanks Vineeth On Tue, Sep 2, 2014 at 12:07 PM, Alex T atr...@gmail.com wrote: Hi all! I have a problem getting unique data from elasticsearch. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8dccd0b8-972f-419c-bb94-3291d412844b%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
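The aggregation buckets above do deduplicate by sourceId; what is missing is a representative document per bucket. On ES 1.3+ one common way to get that is a top_hits sub-aggregation - a sketch using the index and field names from this thread, with "size": 0 so only the aggregation comes back (not tested against this exact mapping):

curl -XPOST 'localhost:9200/feeditem_local/_search' -d '{
  "size": 0,
  "query": { "term": { "author.id": 120816414 } },
  "aggs": {
    "by_source": {
      "terms": { "field": "sourceId" },
      "aggs": {
        "first_hit": { "top_hits": { "size": 1 } }
      }
    }
  }
}'

Each bucket then carries a single hit under first_hit.hits.hits, instead of the duplicates showing up in the main hits section.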
Backup and restore using snapshots
Hi all, I could use some help getting my head around the snapshot and restore functionality in ES. I have a requirement to do incremental daily tape backups and full backups weekly using EMC's Avamar backup software. I'd really appreciate it if someone could tell me whether the following plan is going to work:
1) Export an NFS filesystem from the storage node to both ES data nodes, and mount it as /mnt/backup on both nodes.
2) From one of the ES nodes, register this directory as the shared repository: curl -XPUT 'http://localhost:9200/_snapshot/backup' -d '{"type": "fs", "settings": {"location": "/mnt/backup"}}'
3) On Saturday, do a full backup:
i. Get a list of all snapshots in the repository: curl -XGET 'localhost:9200/_snapshot/backup/_all'
ii. Delete each of them with a command like: curl -XDELETE 'localhost:9200/_snapshot/backup/snapshot_20140830'
iii. Create a full backup: curl -XPUT localhost:9200/_snapshot/backup/snapshot_$(date +%Y%m%d)?wait_for_completion=true
iv. Copy the /mnt/backup directory to tape, telling Avamar to take a full backup.
4) On Sunday to Friday, do incremental backups based on the Saturday backup:
i. Simply run: curl -XPUT localhost:9200/_snapshot/backup/snapshot_$(date +%Y%m%d)?wait_for_completion=true
ii. Copy /mnt/backup to tape, telling Avamar to take an incremental backup.
Is this plan going to work? Is there a better way? Thanks very much in advance. Best regards, Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2a21ece7-533c-480e-9ac1-218d76d85385%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
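For the restore half (which the plan doesn't cover): once a tape copy is back in /mnt/backup, a snapshot can be restored through the same API - a sketch reusing the snapshot name from step 3.ii; note that any indices being restored over generally need to be closed or deleted first:

curl -XPOST 'localhost:9200/_snapshot/backup/snapshot_20140830/_restore?wait_for_completion=true'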
Re: Duplicate function MVEL script
The problem with MVEL is that you can't redefine an already-defined function within a script instance. The script class is instantiated once when the query starts, and is then executed again and again. MVEL is bad for complex scripting. Yes, you could use Groovy, and you should :) I found a good way to use it with the following code:

import groovy.lang.Script

class MyScript extends Script {
    def run() {
        // your code goes here; bound variables are also available here
    }
}

So how it works: 1. Groovy compiles this script and puts it in a class cache. 2. On each query a MyScript instance is created (one per node). 3. On each document the run() method is executed (it should provide the appropriate return value for a filter script, score script, sort script, or script fields). Alex On Wednesday, August 27, 2014 5:50:11 PM UTC+3, k...@stylelabs.com wrote: Hello, We are executing some concurrent updates on the same document using an MVEL script together with some parameters. The MVEL script contains some functions such as addRelations etc. but there is no sign of duplicate functions. ES throws the following error: [John Kafka][inet[/10.12.1.219:9300]][update]]; nested: ElasticsearchIllegalArgumentException[failed to execute script]; nested: *CompileException*[[Error: *duplicate function: addRelations*] [Near : {... def addRelations(relationNode, }] ^ [Line: 1, Column: 1] ES Version 1.3.2 If the updates are executed sequentially there is no error/problem with the MVEL script. Any ideas? Best Regards, Kristof -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e549c53b-90ae-41ca-b106-5c6e812417f7%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: groovy for scripting
Providing a self-update: I found that I could create a cross-request cache using the following script (like a cross-request incrementer): POST /test/_search { "query": {"match_all":{}}, "script_fields": { "a": { "script": "import groovy.lang.Script;class A extends Script{static i=0;def run() {i++}}", "lang": "groovy" } } } Formatted for readability, the script is:

import groovy.lang.Script

class A extends Script {
    static i = 0
    def run() {
        i++
    }
}

Note that the *i* variable here is not thread-safe, but the idea is clear - you define a class inherited from Script and implement the abstract run() method. This class is also accessible on each node thread. Now I'm looking for a solution to make a query-scoped counter (for a one-node configuration). I think it could be done by passing a unique query_id in the parameters, but I'm afraid of making the code non-thread-safe, or vice versa - thread-safe but with reduced performance. Researching more... -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fb402d2c-8820-4a1f-99e0-0453c0c82cf6%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
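If the non-thread-safety of the static i matters, java.util.concurrent is one way out. A sketch (untested) of the same script with an AtomicLong - written in Java syntax, which Groovy also accepts, so the class body could be pasted into the inline script as well. It is still a per-node counter, not the per-query counter being researched:

import groovy.lang.Script;
import java.util.concurrent.atomic.AtomicLong;

public class A extends Script {
    // One counter per node JVM; incrementAndGet is atomic across search threads.
    static final AtomicLong i = new AtomicLong();

    @Override
    public Object run() {
        return i.incrementAndGet();
    }
}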
groovy for scripting
I'm playing around with Groovy scripting. By checking the groovy-lang plugin source code I found the following steps in code execution: 1. Code compilation into a script class 2. Script initialization via the static method newInstance() 3. Script execution by calling the code on each document, with the document parameters bound. Now assume I have a class declaration in my script. Is it possible to execute the class definition and class object initialization only once, and execute only a method of this object on each document? Thanks P.S. posting the same on SO -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/bb3a9ca6-79fd-4ac6-ac78-ce0c102b9505%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: A few questions about node types + usage
Hello again Mark, Thanks for your response. Your answers really are very helpful. As with our previous conversation https://groups.google.com/d/topic/elasticsearch/ZouS4NVsTJw/discussion I am confused about how to make a client node also be master eligible. This is what I posted there, and I would really like some help understanding it: I've done more investigating and it seems that a Client (AKA Query) node cannot also be a Master node. As it says here http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#master-election *Nodes can be excluded from becoming a master by setting node.master to false. Note, once a node is a client node (node.client set to true), it will not be allowed to become a master (node.master is automatically set to false).* And from the elasticsearch.yml config file it says: *# 2. You want this node to only serve as a master: to not store any data and # to have free resources. This will be the coordinator of your cluster. # #node.master: true #node.data: false # # 3. You want this node to be neither master nor data node, but # to act as a search load balancer (fetching data from nodes, # aggregating results, etc.) # #node.master: false #node.data: false* So I'm wondering how exactly you set up your client nodes to also be master nodes. It seems like a master node can only either be purely a master or master + data. Perhaps you could show the relevant parts of one of your client node's config? Many thanks, Alex On Saturday, 16 August 2014 01:04:37 UTC+1, Mark Walkom wrote: 1 - Up to you. We use the http output and then just use a round robin A record to our 3 masters. 2 - They are routed but it makes more sense to specify. 3 - You're right, but most people only use 1 or 2 masters which is why they get recommended to have at least 3. 4 - That sounds like a lot. We use masters that double as clients and they only have 8GB, our use sounds similar and we don't have issues. I wouldn't bother with 3 client-only nodes to start; use them as master and client and then if you find you are hitting memory issues due to queries you can re-evaluate things. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 15 August 2014 20:11, Alex alex@gmail.com wrote: Bump. Any help? Thanks On Wednesday, 13 August 2014 12:10:14 UTC+1, Alex wrote: Hello, I would like some clarification about node types and their usage. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
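To make the role combinations in this thread concrete: the setup Mark describes - master-eligible query nodes that hold no data - is expressed with node.master/node.data alone, leaving node.client unset (since node.client: true is what forces node.master to false). A sketch of the three elasticsearch.yml variants being discussed:

# master + query node: master-eligible, holds no data
node.master: true
node.data: false

# data node: holds data, never elected master
# node.master: false
# node.data: true

# pure search load balancer: neither master nor data
# node.master: false
# node.data: false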
Re: A few questions about node types + usage
Bump. Any help? Thanks On Wednesday, 13 August 2014 12:10:14 UTC+1, Alex wrote: Hello, I would like some clarification about node types and their usage. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/70b16a1e-319c-4f7c-b129-b68258b3652f%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Recommendations needed for large ELK system design
Hi Mark, I've done more investigating and it seems that a Client (AKA Query) node cannot also be a Master node. As it says here http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#master-election *Nodes can be excluded from becoming a master by setting node.master to false. Note, once a node is a client node (node.client set to true), it will not be allowed to become a master (node.master is automatically set to false).* And from the elasticsearch.yml config file it says: *# 2. You want this node to only serve as a master: to not store any data and # to have free resources. This will be the coordinator of your cluster. # #node.master: true #node.data: false # # 3. You want this node to be neither master nor data node, but # to act as a search load balancer (fetching data from nodes, # aggregating results, etc.) # #node.master: false #node.data: false* So I'm wondering how exactly you set up your client nodes to also be master nodes. It seems like a master node can only either be purely a master or master + data. Regards, Alex On Thursday, 31 July 2014 23:57:26 UTC+1, Mark Walkom wrote: 1 - Curator FTW. 2 - Masters handle cluster state, shard allocation and a whole bunch of other stuff around managing the cluster and its members and data. A node that is master and data set to false is considered a search node. But the role of being a master is not onerous, so it made sense for us to double up the roles. We then just round robin any queries to these three masters. 3 - Yes, but it's entirely dependent on your environment. If you're happy with that and you can get the go-ahead then see where it takes you. 4 - Quorum is automatic and having the n/2+1 means that the majority of nodes will have to take part in an election, which reduces the possibility of split brain. If you set the discovery settings then you are also essentially setting the quorum settings. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 31 July 2014 22:27, Alex alex@gmail.com wrote: Hello Mark, Thank you for your reply, it certainly helps to clarify many things. Of course I have some new questions for you! 1. I haven't looked into it much yet but I'm guessing Curator can handle different index naming schemes. E.g. logs-2014.06.30 and stats-2014.06.30. We'd actually be wanting to store the stats data for 2 years and logs for 90 days so it would indeed be helpful to split the data into different index sets. Do you use Curator? 2. You say that you have 3 masters that also handle queries... but I thought all masters did was handle queries? What is a master node that *doesn't* handle queries? Should we have search load balancer nodes? AKA not master and not data nodes. 3. In the interests of reducing the number of node combinations for us to test out would you say, then, that 3 master (and query(??)) only nodes, and the 6 1TB data only nodes would be good? 4. Quorum and split brain are new to me. This webpage http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/ about split brain recommends setting *discovery.zen.minimum_master_nodes* equal to *N/2 + 1*. This formula is similar to the one given in the documentation for quorum http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency: index operations only succeed if a quorum (replicas/2+1) of active shards are available. I completely understand the split brain issue, but not quorum. Is quorum handled automatically or should I change some settings? Thanks again for your help, we appreciate your time and knowledge! Regards, Alex On Thursday, 31 July 2014 05:57:35 UTC+1, Mark Walkom wrote: 1 - Looks ok, but why two replicas? You're chewing up disk for what reason? Extra comments below. 2 - It's personal preference really and depends on how your end points send to redis. 3 - 4GB for redis will cache quite a lot of data if you're only doing 50 events p/s (ie hours or even days based on what I've seen). 4 - No, spread it out to all the nodes. More on that below though. 5 - No it will handle that itself. Again, more on that below though. Suggestions; Set your indexes to (factors of) 6 shards, ie one per node, it spreads query performance. I say factors of in that you can set it to 12 shards per index to start and easily scale the node count and still spread the load. Split your stats and your log data into different indexes, it'll make management and retention easier. You can consider a master only node or (ideally) three that also handle queries. Preferably have an uneven number of master eligible nodes, whether you make them VMs or physicals, that way you can ensure quorum is reached with minimal fuss and stop split brain.
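Mark's "factors of 6 shards" and split-indexes suggestions above are usually applied through index templates rather than per-index settings; a sketch with hypothetical names matching the logs-*/stats-* naming scheme discussed in this thread:

curl -XPUT 'localhost:9200/_template/logs' -d '{
  "template": "logs-*",
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 1
  }
}'

and similarly a stats template with "template": "stats-*", so each new daily index picks up the shard count automatically.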
A few questions about node types + usage
Hello, I would like some clarification about node types and their usage. We will have 3 client nodes and 6 data nodes. The 6 1TB data nodes can also be masters (discovery.zen.minimum_master_nodes set to 4). We will use Logstash and Kibana. Kibana will be used 24/7 by between a couple and a handful of people. Some questions: 1. Should incoming Logstash write requests be sent to the cluster in general (using the *cluster* setting in the *elasticsearch* output) or specifically to the client nodes or to the data nodes (via a load balancer)? I am unsure what kind of node is best for handling writes. 2. If client nodes exist in the cluster are Kibana requests automatically routed to them? Do I need to somehow specify to Kibana which nodes to contact? 3. I have heard different information about master nodes and the minimum_master_nodes setting. I've heard that you should have an odd number of master nodes, but I fail to see why the parity of the number of masters matters as long as minimum_master_nodes is set to at least N/2 + 1. Does it really need to be odd? 4. I have been advised that the client nodes will use a huge amount of memory (which makes sense due to the nature of the Kibana facet queries). 64GB per client node was recommended but I have no idea if that sounds right or not. I don't have the ability to actually test it right now so any more guidance on that would be helpful. I'd be so grateful to hear from you even if you only know something about one of my queries. Thank you for your time, Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fe5adb02-5cd6-4554-8993-28b8e24160fc%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
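For question 1, the choice mostly comes down to a few lines in the Logstash output. A sketch using the http protocol pointed at a load-balanced name in front of the client nodes (the hostname is hypothetical; with protocol => "node" or "transport" the *cluster* setting is what gets used instead):

output {
  elasticsearch {
    protocol => "http"
    host     => "es-client-lb.example.com"   # load balancer in front of the client nodes
    index    => "logs-%{+YYYY.MM.dd}"
  }
}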
LookupScript per shard modification (native scripting)
Hi, I'm trying the LookupScript example here: https://github.com/imotov/elasticsearch-native-script-example/blob/master/src/main/java/org/elasticsearch/examples/nativescript/script/LookupScript.java The idea of my script is to pre-cache all child documents in a LookupScript instance, but I want to query only the current shard's data. Is that possible? That way every shard instance caches only its own documents. Regards, Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/bada48a1-7b74-41a0-81a9-564b5061b605%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Cosine Similarity ElasticSearch
Hi, I found some native script code from Igor Motov here: https://github.com/imotov/elasticsearch-native-script-example/blob/master/src/main/java/org/elasticsearch/examples/nativescript/script/CosineSimilarityScoreScript.java and am now playing with it. Alex On Friday, August 1, 2014 11:53:24 AM UTC+3, Federico Bianchi wrote: Is there someone who can help us? Thank you very much! -- View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Cosine-Similarity-ElasticSearch-tp4060620p4061039.html Sent from the ElasticSearch Users mailing list archive at Nabble.com. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5b9025dd-0173-4b09-ae09-31a2f78e99d7%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Recommendations needed for large ELK system design
Ok thank you Mark, you've been extremely helpful and we now have a better idea about what we're doing! -Alex On Thursday, 31 July 2014 23:57:26 UTC+1, Mark Walkom wrote: 1 - Curator FTW.
3rd party scoring service
Hello, My idea is to use a 3rd party scoring service (REST); currently I'd like to use native scripts and play with NativeScriptFactory. The approach has many drawbacks. Here is my problem - assume we have two entities - products and product prices. I need to filter by price. Price is a complex thing, because it depends on many factors, like the request date, remote user information, and custom provided parameters. With a regular parent-child relation and a has_child query it's too complex and too slow to implement using scripting (currently MVEL). One more condition - I don't have many products - around 25K - but around 25M different base price items (which are the basis for the price calculation). Here are my ideas: 1. Have a service which returns the exact price for every product given custom parameters. The drawback is that there would be 5 identical calls, one from each shard (with the default of 5 shards). In this case it doesn't matter where the base prices are stored - in an elasticsearch index, in a database, or in in-memory storage. 2. Write code which operates over the child price documents on a concrete shard. In this case it would generate prices only for the products on that particular shard. But I don't know if I can access the shard's index, or make calls to the index from a concrete shard, in a NativeScriptFactory class. Could you point me in the right direction? P.S. Initially I was interested in the Redis-Elasticsearch example http://java.dzone.com/articles/connecting-redis-elasticsearch Thanks, Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/893b22dc-1415-475b-8675-596119f4f1f8%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
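On the NativeScriptFactory mechanics: here is a rough ES 1.x sketch following the pattern in Igor Motov's native-script examples repository (linked elsewhere in this digest). It does not answer the shard-local index access question, but it shows where per-document work plugs in - each shard executes its own instance against its local documents. The base_price field and the discount parameter are invented for illustration:

import java.util.Map;

import org.elasticsearch.common.Nullable;
import org.elasticsearch.script.AbstractSearchScript;
import org.elasticsearch.script.ExecutableScript;
import org.elasticsearch.script.NativeScriptFactory;

// Hypothetical per-document price calculation running on the shard.
public class PriceScript extends AbstractSearchScript {

    public static class Factory implements NativeScriptFactory {
        @Override
        public ExecutableScript newScript(@Nullable Map<String, Object> params) {
            double discount = params == null || params.get("discount") == null
                    ? 0.0 : ((Number) params.get("discount")).doubleValue();
            return new PriceScript(discount);
        }
    }

    private final double discount;

    private PriceScript(double discount) {
        this.discount = discount;
    }

    @Override
    public Object run() {
        // Reads the hypothetical base_price field's doc values for the current document.
        double base = docFieldDoubles("base_price").getValue();
        return base * (1.0 - discount);
    }
}

It would be registered from a plugin via module.registerScript("price", PriceScript.Factory.class) in onModule(ScriptModule), and invoked with "lang": "native", "script": "price" plus params.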
Re: 3rd party scoring service
I think it's acceptable if the service responds within 20ms, using some thrift protocol for example. It's much better than the current 500ms - 5s calculations using elasticsearch scripting. If we have 25K products then it would be around a 300Kb data package from this service. The risk is in possibly broken communication or increased latency. Alex On Thursday, July 31, 2014 1:59:36 PM UTC+3, Itamar Syn-Hershko wrote: You should bring the price over to Elasticsearch and not the other way around. Scoring against an external service is added friction with huge performance costs. -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer Consultant Author of RavenDB in Action http://manning.com/synhershko/ On Thu, Jul 31, 2014 at 1:50 PM, Alex S.V. alexs.v...@gmail.com wrote: Hello, My idea is to use a 3rd party scoring service (REST); currently I'd like to use native scripts and play with NativeScriptFactory. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c61f9637-3de8-4906-a2c4-49055dee2cd5%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Recommendations needed for large ELK system design
Hello Mark, Thank you for your reply, it certainly helps to clarify many things. Of course I have some new questions for you! 1. I haven't looked into it much yet but I'm guessing Curator can handle different index naming schemes. E.g. logs-2014.06.30 and stats-2014.06.30. We'd actually be wanting to store the stats data for 2 years and logs for 90 days so it would indeed be helpful to split the data into different index sets. Do you use Curator? 2. You say that you have 3 masters that also handle queries... but I thought all masters did was handle queries? What is a master node that *doesn't* handle queries? Should we have search load balancer nodes? AKA not master and not data nodes. 3. In the interests of reducing the number of node combinations for us to test out would you say, then, that 3 master (and query(??)) only nodes, and the 6 1TB data only nodes would be good? 4. Quorum and split brain are new to me. This webpage http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/ about split brain recommends setting *discovery.zen.minimum_master_nodes* equal to *N/2 + 1*. This formula is similar to the one given in the documentation for quorum http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency: index operations only succeed if a quorum (replicas/2+1) of active shards are available. I completely understand the split brain issue, but not quorum. Is quorum handled automatically or should I change some settings? Thanks again for your help, we appreciate your time and knowledge! Regards, Alex On Thursday, 31 July 2014 05:57:35 UTC+1, Mark Walkom wrote: 1 - Looks ok, but why two replicas? You're chewing up disk for what reason? Extra comments below. 2 - It's personal preference really and depends on how your end points send to redis. 3 - 4GB for redis will cache quite a lot of data if you're only doing 50 events p/s (ie hours or even days based on what I've seen). 4 - No, spread it out to all the nodes. More on that below though. 5 - No it will handle that itself. Again, more on that below though. Suggestions; Set your indexes to (factors of) 6 shards, ie one per node, it spreads query performance. I say factors of in that you can set it to 12 shards per index to start and easily scale the node count and still spread the load. Split your stats and your log data into different indexes, it'll make management and retention easier. You can consider a master only node or (ideally) three that also handle queries. Preferably have an uneven number of master eligible nodes, whether you make them VMs or physicals, that way you can ensure quorum is reached with minimal fuss and stop split brain. If you use VMs for master + query nodes then you might want to look at load balancing the queries via an external service. To give you an idea, we have a 27 node cluster - 3 masters that also handle queries and 24 data nodes. Masters are 8GB with small disks, data nodes are 60GB (30 heap) and 512GB disk. We're running with one replica and have 11TB of logging data. At a high level we're running out of disk more than heap or CPU and we're very write heavy, with an average of 1K events p/s and comparatively minimal reads. 
Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 31 July 2014 01:35, Alex alex@gmail.com wrote: Hello, We wish to set up an entire ELK system with the following features:
Recommendations needed for large ELK system design
Hello, We wish to set up an entire ELK system with the following features: - Input from Logstash shippers located on 400 Linux VMs. Only a handful of log sources on each VM. - Data retention for 30 days, which is roughly 2TB of data in indexed ES JSON form (not including replica shards). - An estimated input data rate of 50 messages per second at peak hours. Mostly short or medium-length one-line messages, but there will be Java traces and very large service responses (in the form of XML) to deal with too. - The entire system would be on our company LAN. - The stored data will be a mix of application logs (info, errors etc) and server stats (CPU, memory usage etc) and would mostly be accessed through Kibana. This is our current plan: - Have the LS shippers perform minimal parsing (but they would do multiline). Have them point to two load-balanced servers containing Redis and LS indexers (which would do all parsing). - 2 replica shards for each index, which ramps the total data storage up to 6TB. - An ES cluster spread over 6 nodes. Each node is 1TB in size. - LS indexers pointing to the cluster. So I have a couple of questions regarding the setup and would greatly appreciate the advice of someone with experience! 1. Does the balance between the number of nodes, the number of replica shards, and the storage size of each node seem about right? We use high-performance equipment and would expect minimal downtime. 2. What is your recommendation for the system design of the LS indexers and Redis? I've seen various designs with each indexer assigned to a single Redis, or all indexers reading from all Redises. 3. Leading on from the previous question, what would your recommended data size for the Redis servers be? 4. Not sure what to do about master/data nodes. Assuming all the nodes are on identical hardware, would it be beneficial to have a node which is only a master and would only handle requests? 5. Do we need to do any additional load balancing on the ES nodes? We are open to any and all suggestions. We have not yet committed to any particular design so can change if needed. Thank you for your time and responses, Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b0aee66a-35bb-4770-927b-d9c7e13ad9fc%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
When to use multiple clusters
I have several large indices (100M docs) on the same cluster. Is there any advice on when it is appropriate to separate into multiple clusters vs one large one? Each index has a slightly different usage profile (read- vs write-heavy, update vs insert). How many indices would you recommend for a single cluster? Is it OK to have many large indices on the same cluster? Thanks! -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/24ebf4dc-f281-4574-8cbb-cb049c4fac71%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: When to use multiple clusters
Thanks Mark! We're deploying on EC2 (always a good time). It seems like the mix of indices with different usage profiles is leading to performance issues for which a dedicated cluster would be more appropriate. On Wednesday, July 23, 2014 7:04:34 PM UTC-4, Mark Walkom wrote: Depends what your hardware profiles are like, and a bunch of other things related to you and your environment. E.g. if you have high-end servers then it makes sense to put your heavy read/write indexes into a cluster on those, then leave the rest for more average machines. We have multiple clusters based on use. One for application text-based search, one for application logging, one for system logging, and we're going to spin up another one for a new project we're starting. This might sound like a waste of resources, and it probably is to a degree, but we have the infrastructure for it and it makes things easier to manage. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 24 July 2014 00:34, Alex Kehayias al...@shareablee.com wrote: I have several large indices (100M docs) on the same cluster. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6571673c-472f-4013-9608-d511a9f66d86%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
elasticsearch dynamic scripting vs static script - deployment
Hi, We've also been hacked on our staging server because of open ports :) I find dynamic scripting flexible for applications, but static scripting causes a bunch of problems: 1. I have to deploy scripts to a special directory on each elasticsearch node? We are using capistrano for web-app deployment and it's an easy procedure, though we would have to provide additional access to the elasticsearch nodes' filesystems. 2. I don't know how to handle script versions - just append _v1, _v2, etc. suffixes to the filename? 3. Should I deploy to one node, or to each node? If I must deploy to each node - what happens if one node has a script and another doesn't? Regards, Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/20681e2f-bb8b-4602-8b19-ed27b661a88b%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
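For what it's worth, a sketch of the file-script mechanics being asked about (paths and names are hypothetical): scripts live under config/scripts on every node that can execute them and are referenced by file name without the extension. As far as I know, a node missing the file will fail shard requests that use it, so the deploy really does have to reach each node (question 3); a crude versioning scheme is a suffix in the file name (question 2):

# on every node: config/scripts/adjusted_price_v1.groovy
doc['price'].value * factor

# and at query time:
curl -XPOST 'localhost:9200/myindex/_search' -d '{
  "query": { "match_all": {} },
  "script_fields": {
    "adjusted": {
      "script": "adjusted_price_v1",
      "lang": "groovy",
      "params": { "factor": 0.9 }
    }
  }
}'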
help resolving a classpath problem(?) with elasticsearch 1.2.1 in apache-storm
I'm using apache-storm as a data pipeline that indexes results with elasticsearch. Using the latest versions I can find of all components, I get an error any time a storm component attempts to join elasticsearch as a Node client (which I believe will give me better performance than TransportClient): Caused by: java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Direct' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [XBloomFilter, es090, completion090] According to https://github.com/elasticsearch/elasticsearch/issues/3350 this is just how the SPI loader mechanism that Lucene uses works. I tried following the directions in the issue, but even with the shade plugin I'm still seeing the same thing. Does anyone have experience with this who can share a pom.xml snippet or point me to some applicable docs? -alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9b9b9e6c-ea3d-4de9-b7aa-53c4bbd40586%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
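On the pom.xml request: the usual cause is that several Lucene jars each ship their own META-INF/services/org.apache.lucene.codecs.PostingsFormat file, and a naive uber-jar keeps only one of them. A sketch of the commonly used fix (the surrounding executions/configuration will differ per project) is the shade plugin's ServicesResourceTransformer, which concatenates the service files instead; the 'Direct' postings format itself lives in the lucene-codecs artifact, so that jar needs to be a dependency too:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- merge META-INF/services entries from all jars instead of keeping the first -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>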
Re: Need help on similarity ranking approach
Hello, I am not sure that would work. I'd first index your document, and then use mlt with this document id and include set to true (added in the latest ES release). Then you'll know how far your documents are from the queried document. Also, make sure to pick up most of the terms by setting percent_terms_to_match=0, max_query_terms=<some high value> and min_doc_freq=1. In order to know which terms from the queried document matched in the response, you can use explain. Alex On Thursday, May 29, 2014 10:42:47 AM UTC+2, Rgs wrote: hi, What I did now is: I have created custom similarity and similarity provider classes which extend DefaultSimilarity and AbstractSimilarityProvider respectively, and overrode the idf() method to return 1. Now I'm getting some percentage values like 1, 0.987, 0.876 etc. and interpret them as 100%, 98%, 87% etc. Can you please confirm whether this approach can be used for finding the percentage of similarity? sorry for the late reply. Thanks Rgs -- View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Need-help-on-similarity-ranking-approach-tp4054847p4056680.html Sent from the ElasticSearch Users mailing list archive at Nabble.com. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/184a015f-fe68-4a24-999b-367d60d23798%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
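Spelled out as a query, the suggestion above corresponds to something like the following (index, field and id are placeholders; percent_terms_to_match is the ES 1.x parameter name):

curl -XPOST 'localhost:9200/myindex/_search?explain=true' -d '{
  "query": {
    "more_like_this": {
      "fields": ["body"],
      "ids": ["the-document-id"],
      "include": true,
      "percent_terms_to_match": 0,
      "min_term_freq": 1,
      "min_doc_freq": 1,
      "max_query_terms": 1000
    }
  }
}'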
Re: Need help on similarity ranking approach
Also, this plugin could provide a solution to your problem: http://yannbrrd.github.io/ On Thursday, May 29, 2014 10:42:47 AM UTC+2, Rgs wrote: hi, What I did now is: I have created custom similarity and similarity provider classes which extend DefaultSimilarity and AbstractSimilarityProvider respectively, and overrode the idf() method to return 1. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d4a2ee12-b9af-4142-a2e9-71b85cc9141c%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Queries seem to ignore custom analyzer
I'm using a custom analyzer to stem possessive English. My custom analyzer seems to be ignored. As a sample search, we'll use McDonald's. What I used to create my analyzer: { settings: { analysis: { analyzer: { default: { type: custom, tokenizer: standard, filter: [ standard, lowercase, stop, pos_english ] } }, filter: { pos_english: { type: stemmer, name: possessive_english } } } } } My mapping: { item: { _boost: { name: custom_boost, null_value: 1 }, properties: { servings: { enabled: false, type: object }, brand_name: { index: analyzed, type: string, store: false }, food_name: { index: analyzed, type: string, store: false } } } } When I test the analyzer on the text 'McDonald's', it seems to work properly: { tokens: [ { token: mcdonald, start_offset: 0, end_offset: 10, type: ALPHANUM, position: 1 } ] } However, if I search for 'McDonald', I get no results. If I search for 'McDonald's' (with the possessive), I get my expected results. It seems like the analyzer is being ignored during the query. Search query that returns no results: { query: { match: { _all: { query: mcdonalds } } } } Any idea what I'm doing wrong? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/93e3e3b3-1fc7-443d-970e-47bb43c757e4%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
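(The analyzer test above can be reproduced with the _analyze API - index name invented; an analyzer registered as default is picked up automatically:)
curl 'localhost:9200/myindex/_analyze?text=McDonald%27s'
A hedged observation: running the same call with text=mcdonalds should return the token mcdonalds, not mcdonald, because the possessive_english stemmer only strips an apostrophe-s. If so, the query string mcdonalds can never match the indexed token mcdonald even when the analyzer is applied, which may be what is happening here rather than the analyzer being ignored.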
Object interpolation in template queries
Hello, I'm interested in using query templates - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-template-query.html - however for my purposes I was hoping ES treated queries as simple strings for the purpose of mustache interpretation, and I would like to be able to substitute in parameters more complex than partial strings - for example to define a param as the contents of a value and then pass in an arbitrary object. I.e. something along the lines of { query: { template: { query: { filtered: { filter : { and : {{ filters }} } } }, params : { filters : [ {terms : { foo : [a,b ] } }, {terms : { bar : [q,z ] } } ] } } } } Experimentation suggests this isn't supported, but I understand that the query templates system is somewhat under construction or review - are there plans to offer support for passing in entire parts of queries via params, or should I look at doing this kind of interpolation before the query gets to ES? Or is this possible and I'm simply doing it wrong? Thanks, Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8b83afe9-aa92-4751-8178-2c33bbc94428%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
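(For contrast, the form that is documented to work substitutes scalar values only - a minimal sketch, field name and value invented:)
{ "query": { "template": { "query": { "match": { "foo": "{{param}}" } }, "params": { "param": "bar" } } } }
Anything structured - arrays of filter clauses, whole subqueries - appears to need interpolating client-side before the query is sent, as the poster suspects.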
Re: more like this on numbers
Hi Valentin, For these types of searches, have you looked into range queries, perhaps combined in a boolean query? Alex On May 7, 2014 4:14 PM, Valentin plet...@gmail.com wrote: Hi Alex, thanks. Good idea to convert the numbers into strings. But converting the number fields to string won't exactly solve my problem. Only if there were an analyzer which breaks down numbers into multiple tokens, e.g. 300 into 100, 200, 300. Cheers, Valentin On Tuesday, May 6, 2014 12:04:53 PM UTC+2, Alex Ksikes wrote: Hi Valentin, As you know, you can only perform mlt on fields which are analyzed. However, you can convert your other fields (number, ..) to text using a multi field with type string at indexing time. Cheers, Alex On Thursday, March 27, 2014 4:31:58 PM UTC+1, Valentin wrote: Hi, as far as I understand it the more like this query allows to find documents where the same tokens are used. I wonder if there is a possibility to find documents where a particular field is compared based on its value (number). Regards Valentin PS: elasticsearch rocks! -- You received this message because you are subscribed to a topic in the Google Groups elasticsearch group. To unsubscribe from this topic, visit https://groups.google.com/d/topic/elasticsearch/Wsye6JD__ys/unsubscribe. To unsubscribe from this group and all its topics, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/195f8fa2-821f-4556-b9ae-8924b35c859f%40googlegroups.com. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAMrXmPdWStJjTaW5%3D27MrMNLHPkK1hihgrs%3DDs-SAiHzHz9eAQ%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
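(Along the lines Alex suggests, a sketch - field name and bounds invented - that scores documents whose numeric field is near a reference value of 300, with a tighter, more highly weighted band and a wider, lower-weighted one, loosely imitating more-like-this for numbers:)
{ "query": { "bool": { "should": [ { "range": { "price": { "gte": 250, "lte": 350 } } }, { "range": { "price": { "gte": 100, "lte": 500, "boost": 0.5 } } } ] } } }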
Re: MoreLikeThis can't identify that 2 documents with exactly same attachments are duplicates
On May 8, 2014 8:09 AM, Zoran Jeremic zoran.jere...@gmail.com wrote: Hi Alex, Thank you for this explanation. It really helped me to understand how it works, and now I managed to get the results I was expecting just after setting the max_query_terms value to 0 or some very high value. With these results I was able to identify duplicates in my tests. I noticed a couple of things though. - I got much better results with web pages when I indexed the attachment as html source and used text extracted by Jsoup in the query, than when I indexed text extracted from the web page as the attachment and used that text in the query. I suppose the difference is related to the fact that Jsoup does not extract text in the same way as the Tika parser used by ES does. - There was a significant improvement in the results in the second test, where I had indexed 50 web pages, over the first test, where I indexed 10 web pages. I deleted the index before each test. I suppose this is related to tf*idf. If so, does it make sense to provide some training set for elasticsearch to populate the index with before the system starts to be used? Perhaps you are asking for a background dataset to bias the selection of interesting terms. This could make sense depending on your application. Could you please define relevant in your setting? In a corpus of very similar documents, is your goal to find the ones which are oddly different? Have you looked into ES significant terms? I have a service that recommends documents to students based on their current learning context. It creates a tokenized string from the titles, descriptions and keywords of the course lessons the student is working on at the moment. I'm using this string as input to mlt_like_text to find some interesting resources that could help them. I want to avoid having duplicates (or very similar documents) among the top recommended documents. My idea was that during document upload (before I index it with elasticsearch) I check whether a duplicate already exists, and store this information as an ES document field. Later, in the query, I can specify that duplicates are not recommended. Here you should probably strip the html tags, and solely index the text in its own field. As I already mentioned, this didn't give me good results for some reason. Do you think this approach would work fine with large textual documents, e.g. pdf documents of a couple of hundred pages? My main concern is the performance of these queries using like_text, which is why I was trying to avoid this approach and use mlt with a document id as input. I don't think this approach would work well in this case, but you should try. I think what you are after is to either extract good features from your PDF documents and search on those, or fingerprinting. This could be achieved by playing with analyzers. Thanks, Zoran On Wednesday, 7 May 2014 06:14:56 UTC-7, Alex Ksikes wrote: Hi Zoran, In a nutshell 'more like this' creates a large boolean disjunctive query of 'max_query_terms' number of interesting terms from a text specified in 'like_text'. The interesting terms are picked up with respect to their tf-idf scores in the whole corpus. These latter parameters can be tuned with the 'min_term_freq', 'min_doc_freq', and 'max_doc_freq' parameters. The number of boolean clauses that must match is controlled by 'percent_terms_to_match'. 
In the case of specifying only one field in 'fields', the analyzer used to pick up the terms in 'like_text' is the one associated with the field, unless overridden by 'analyzer'. So as an example, the default is to create a boolean query of 25 interesting terms where only 30% of the should clauses must match. On Wednesday, May 7, 2014 5:14:11 AM UTC+2, Zoran Jeremic wrote: Hi Alex, If you are looking for exact duplicates then hashing the file content, and doing a search for that hash would do the job. This trick won't work for me, as these are not exact duplicates. For example, I have 10 students working on the same 100 pages long word document. Each of these students could change only one sentence and upload the document. The hash will be different, but it's 99.99% the same document. I have another service that uses mlt_like_text to recommend some relevant documents, and my problem is that if this document has the best score, then all duplicates will be among the top hits, and instead of recommending several of the most relevant documents I will recommend 10 instances of the same document. Could you please define relevant in your setting? In a corpus of very similar documents, is your goal to find the ones which are oddly different? Have you looked into ES significant terms? If you are looking for near duplicates, then I would recommend extracting whatever text you have in your html, pdf, doc, indexing that and running more like this with like_text set to that content. I tried that as well, and the results are very disappointing, though I'm not sure if that would
Re: How to find the difference between aggregate min from aggregate max(max - min) in ES?
Thank you Adrien Grand for the reply. Is it possible to use aggregate functions inside a script?? On Wednesday, May 7, 2014 5:31:20 PM UTC+5:30, Adrien Grand wrote: Hi, There is no way to do it on the Elasticsearch side for the moment. It can only be done on the client side. On Wed, May 7, 2014 at 1:37 PM, Alex Mathew alexmathe...@gmail.com wrote: How to write an ES query to find the difference between the max and min value of a field? I am a newbie in Elasticsearch. In my case I feed a lot of events along with session_id and time into Elasticsearch. My event structure is Event_name string Client_id string App_id string Session_id string User_id string Ip_address string Latitude int64 Longitude int64 Event_time time.Time I want to find the lifetime of a session_id based on the fed events. For that I can retrieve the maximum Event_time and minimum Event_time for a particular session_id with the following ES query. { size: 0, query: { match: { Session_id: dummySessionId } }, aggs: { max_time: { max: { field: Time } }, min_time:{ min: { field: Time } } } } But what I actually want is (max_time - min_time). How to write the ES query for the same? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1e937884-4052-4a5a-91db-bc1449c43efe%40googlegroups.com. For more options, visit https://groups.google.com/d/optout. -- Adrien Grand -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ab72a9e2-60d4-4865-9c71-351b79322f29%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
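(Client-side, the subtraction is then done over the two values in the response, which on 1.x comes back shaped roughly like this - timestamps invented:)
{ "aggregations": { "max_time": { "value": 1399467300000 }, "min_time": { "value": 1399464000000 } } }
The session lifetime is aggregations.max_time.value - aggregations.min_time.value, computed by the caller.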
Re: MoreLikeThis can't identify that 2 documents with exactly same attachments are duplicates
Hi Zoran, In a nutshell 'more like this' creates a large boolean disjunctive query of 'max_query_terms' number of interesting terms from a text specified in 'like_text'. The interesting terms are picked up with respect to their tf-idf scores in the whole corpus. These latter parameters can be tuned with the 'min_term_freq', 'min_doc_freq', and 'max_doc_freq' parameters. The number of boolean clauses that must match is controlled by 'percent_terms_to_match'. In the case of specifying only one field in 'fields', the analyzer used to pick up the terms in 'like_text' is the one associated with the field, unless overridden by 'analyzer'. So as an example, the default is to create a boolean query of 25 interesting terms where only 30% of the should clauses must match. On Wednesday, May 7, 2014 5:14:11 AM UTC+2, Zoran Jeremic wrote: Hi Alex, If you are looking for exact duplicates then hashing the file content, and doing a search for that hash would do the job. This trick won't work for me, as these are not exact duplicates. For example, I have 10 students working on the same 100 pages long word document. Each of these students could change only one sentence and upload the document. The hash will be different, but it's 99.99% the same document. I have another service that uses mlt_like_text to recommend some relevant documents, and my problem is that if this document has the best score, then all duplicates will be among the top hits, and instead of recommending several of the most relevant documents I will recommend 10 instances of the same document. Could you please define relevant in your setting? In a corpus of very similar documents, is your goal to find the ones which are oddly different? Have you looked into ES significant terms? If you are looking for near duplicates, then I would recommend extracting whatever text you have in your html, pdf, doc, indexing that and running more like this with like_text set to that content. I tried that as well, and the results are very disappointing, though I'm not sure whether that would be a good idea given that long textual documents could be used. For testing purposes, I made a simple test with 10 web pages. Maybe I'm making some mistake there. What I did is to index 10 web pages and store them in documents as attachments. Content is stored as byte[]. Then I use the same 10 pages, extract the content using Jsoup, and try to find similar web pages. Here is the code that I used to find web pages similar to the provided one: System.out.println("Duplicates for link: " + link); System.out.println(); String indexName = ESIndexNames.INDEX_DOCUMENTS; String indexType = ESIndexTypes.DOCUMENT; String mapping = copyToStringFromClasspath("/org/prosolo/services/indexing/document-mapping.json"); client.admin().indices().putMapping(putMappingRequest(indexName).type(indexType).source(mapping)).actionGet(); URL url = new URL(link); org.jsoup.nodes.Document doc = Jsoup.connect(link).get(); String html = doc.html(); // doc.text(); QueryBuilder qb = null; // create the query qb = QueryBuilders.moreLikeThisQuery("file").likeText(html).minTermFreq(0).minDocFreq(0); SearchResponse sr = client.prepareSearch(ESIndexNames.INDEX_DOCUMENTS).setQuery(qb).addFields("url", "title", "contentType").setFrom(0).setSize(5).execute().actionGet(); if (sr != null) { SearchHits searchHits = sr.getHits(); Iterator<SearchHit> hitsIter = searchHits.iterator(); while (hitsIter.hasNext()) { SearchHit searchHit = hitsIter.next(); System.out.println("Duplicate: " + searchHit.getId() + " title: " + searchHit.getFields().get("url").
getValue() + " score: " + searchHit.getScore()); } } And the results of executing this for each of the 10 urls are: Duplicates for link: http://en.wikipedia.org/wiki/Mathematical_logic Duplicate: Crwk_36bTUCEso1ambs0bA URL: http://en.wikipedia.org/wiki/Mathematical_logic score: 0.3335998 Duplicate: --3l-WRuQL2osXg71ixw7A URL: http://en.wikipedia.org/wiki/Chemistry score: 0.16319205 Duplicate: 8dDa6HsBS12HrI0XgFVLvA URL: http://en.wikipedia.org/wiki/Formal_science score: 0.13035104 Duplicate: 1APeDW0KQnWRv_8mihrz4A URL: http://en.wikipedia.org/wiki/Star score: 0.12292466 Duplicate: 2NElV2ULQxqcbFhd2pVy0w URL: http://en.wikipedia.org/wiki/Crystallography score: 0.117023855 Duplicates for link: http://en.wikipedia.org/wiki/Mathematical_statistics Duplicate: Crwk_36bTUCEso1ambs0bA URL: http://en.wikipedia.org/wiki
Re: MoreLikeThis can't identify that 2 documents with exactly same attachments are duplicates
Hi Zoran, If you are looking for exact duplicates then hashing the file content, and doing a search for that hash would do the job. If you are looking for near duplicates, then I would recommend extracting whatever text you have in your html, pdf, doc, indexing that and running more like this with like_text set to that content. Additionally you can perform a mlt search on more fields including the meta-data fields extracted with the attachment plugin. Hope this helps. Alex On Monday, May 5, 2014 8:08:30 PM UTC+2, Zoran Jeremic wrote: Hi Alex, Thank you for your explanation. It makes sense now. However, I'm not sure I understood your proposal. So I would adjust the mlt_fields accordingly, and possibly extract the relevant portions of texts manually What do you mean by adjusting mlt_fields? The only shared field that is guaranteed to be same is file. Different users could add different titles to documents, but attach same or almost the same documents. If I compare documents based on the other fields, it doesn't mean that it will match, even though attached files are exactly the same. I'm also not sure what did you mean by extract the relevant portions of text manually. How would I do that and what to do with it? Thanks, Zoran On Monday, 5 May 2014 01:23:49 UTC-7, Alex Ksikes wrote: Hi Zoran, Using the attachment type, you can text search over the attached document meta-data, but not its actual content, as it is base 64 encoded. So I would adjust the mlt_fields accordingly, and possibly extract the relevant portions of texts manually. Also set percent_terms_to_match = 0, to ensure that all boolean clauses match. Let me know how this works out for you. Cheers, Alex On Monday, May 5, 2014 5:50:07 AM UTC+2, Zoran Jeremic wrote: Hi guys, I have a document that stores a content of html file, pdf, doc or other textual document in one of it's fields as byte array using attachment plugin. Mapping is as follows: { document:{ properties:{ title:{type:string,store:true }, description:{type:string,store:yes}, contentType:{type:string,store:yes}, url:{store:yes, type:string}, visibility: { store:yes, type:string}, ownerId: {type: long, store:yes }, relatedToType: { type: string, store:yes }, relatedToId: {type: long, store:yes }, file:{ path: full,type:attachment, fields:{ author: { type: string }, title: { store: true,type: string }, keywords: { type: string }, file: { store: true, term_vector: with_positions_offsets,type: string }, name: { type: string }, content_length: { type: integer }, date: { format: dateOptionalTime, type: date }, content_type: { type: string } } }} And the code I'm using to store the document is: VisibilityType.PUBLIC These files seems to be stored fine and I can search content. However, I need to identify if there are duplicates of web pages or files stored in ES, so I don't return the same documents to the user as search or recommendation result. My expectation was that I could use MoreLikeThis after the document was indexed to identify if there are duplicates of that document and accordingly to mark it as duplicate. However, results look weird for me, or I don't understand very well how MoreLikeThis works. For example, I indexed web page http://en.wikipedia.org/wiki/Linguistics3 times, and all 3 documents in ES have exactly the same binary content under file. 
Then for the following query: http://localhost:9200/documents/document/WpkcK-ZjSMi_l6iRq0Vuhg/_mlt?mlt_fields=file&min_doc_freq=1 where the ID is the id of one of these documents, I got these results: http://en.wikipedia.org/wiki/Linguistics with score 0.6633003 http://en.wikipedia.org/wiki/Linguistics with score 0.6197818 http://en.wikipedia.org/wiki/Computational_linguistics with score 0.48509508 ... For some other examples, scores for the same documents are much lower, and sometimes (though not that often) I don't get duplicates in the first positions. I would expect a score of 1.0 or higher here for documents that are exactly the same, but that's not the case, and I can't figure out how I could identify whether there are duplicates in the Elasticsearch index. I would appreciate it if somebody could explain whether this is expected behaviour or I'm not using it properly. Thanks, Zoran -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit
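(For the exact-duplicate case Alex mentions earlier in the thread, a minimal sketch - field name and hash value invented; the digest of the attachment bytes is computed client-side, indexed as a not_analyzed field, and matched with a term query:)
curl -XPUT 'localhost:9200/documents/document/1' -d '{ "title": "Linguistics", "content_sha1": "da39a3ee5e6b4b0d3255bfef95601890afd80709" }'
curl 'localhost:9200/documents/document/_search' -d '{ "query": { "term": { "content_sha1": "da39a3ee5e6b4b0d3255bfef95601890afd80709" } } }'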
Re: Need help on similarity ranking approach
Hello, What you want to know is the score of the document that has matched itself using more like this. The API excludes the queried document. However, it is equivalent to running a boolean query with a more like this field clause for each field of the queried document. This will give you, as the top result, the document that has matched itself, so that you can compute the percentage of similarity of the remaining matched documents. Alex On Friday, May 2, 2014 3:22:34 PM UTC+2, Rgs wrote: Thanks Binh Ly and Ivan Brusic for your replies. I need to find the similarity in percentage of a document against other documents, and this will be considered for grouping the documents. Is it possible to get the similarity percentage using the more like this query? Or is there any other way to calculate the percentage of similarity from the query result? E.g.: document1 is 90% similar to document2, document1 is 45% similar to document3, etc. Thanks -- View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Need-help-on-similarity-ranking-approach-tp4054847p4055227.html Sent from the ElasticSearch Users mailing list archive at Nabble.com. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/05db016b-1c2e-497c-9275-37dcccedfae3%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
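(Concretely, once the queried document matches itself as the top hit, the percentage for any other hit d is score(d) / score(top) * 100. The normalization has to be done client-side, since ES scores are not comparable across queries.)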
Re: MoreLikeThis ignores queries?
Hello Alexey, You should use the query DSL and not the more like this API. You can create a boolean query where one clause is your more like this query and the other one is your ignore category query (better use a filter here if you can). http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html However, more like this in the DSL only takes a like_text parameter; you cannot pass the id of the document. This will change in a subsequent version of ES. For now, to simulate this functionality, you can use multiple mlt queries with the like_text set to the value of each field of the queried document, inside a boolean query. Let me know if this helps. Alex On Wednesday, March 19, 2014 5:01:06 AM UTC+1, Alexey Bagryancev wrote: Can anyone help me? It really does not work... On Wednesday, March 19, 2014 at 2:05:49 UTC+7, Alexey Bagryancev wrote: Hi, I am trying to filter moreLikeThis results by adding an additional query - but it seems to be ignored entirely. I tried running my ignoreQuery separately and it works fine, but how do I make it work with moreLikeThis? Please help me. $ignoreQuery = $this->IgnoreCategoryQuery('movies'); $this->resultsSet = $this->index->moreLikeThis( new \Elastica\Document($id), array_merge($this->mlt_fields, array('search_size' => $this->size, 'search_from' => $this->from)), $ignoreQuery); My IgnoreCategory function: public function IgnoreCategoryQuery($category = 'main') { $categoriesTermQuery = new \Elastica\Query\Term(); $categoriesTermQuery->setTerm('categories', $category); $categoriesBoolQuery = new \Elastica\Query\Bool(); $categoriesBoolQuery->addMustNot($categoriesTermQuery); return $categoriesBoolQuery; } -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e605d6e2-b42b-4661-b819-90735a9581ec%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
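(A sketch of the combined query Alex describes - field names, like_text and category value invented:)
{ "query": { "bool": { "must": [ { "more_like_this": { "fields": ["title", "body"], "like_text": "text of the queried document", "min_term_freq": 1 } } ], "must_not": [ { "term": { "categories": "movies" } } ] } } }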
Re: Elastic Search MLT API, how to use fields with weights.
I'd like to add to this that the mlt API is the same as a boolean query in the DSL made of multiple more like this field clauses, where each clause is set to the content of the corresponding field of the queried document. On Thursday, February 20, 2014 4:20:36 PM UTC+1, Binh Ly wrote: I do not believe you can boost individual fields/terms separately in a MLT query. Your best bet is to probably run a bool query of multiple MLT queries each with a different field and boost, but you'll need to first extract the MLT text before you can do this. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2fcd0453-58dc-4a66-b7d9-2e785a2a7fa6%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
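(So per-field weights can be simulated along these lines - field names, texts and boosts invented; the like_text for each field has to be fetched from the queried document first:)
{ "query": { "bool": { "should": [ { "more_like_this_field": { "title": { "like_text": "title of the queried doc", "boost": 3.0 } } }, { "more_like_this_field": { "body": { "like_text": "body of the queried doc", "boost": 1.0 } } } ] } } }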
Re: Interesting Terms for MoreLikeThis Query in ElasticSearch
You could always use explain to find out the best matching terms of any query. In order to get all the interesting terms, you could run a query where the top result document has matched itself. Also the new significant terms might be of interest to you: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html On Thursday, January 30, 2014 9:59:02 PM UTC+1, api...@clearedgeit.com wrote: I have been trying to figure out how to get interesting terms using the MLT query. Does ElasticSearch have this functionality similar to solr or if not, is there a work around? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/201edd47-d5d1-4fcf-a520-184737b6b7ec%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
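(For the significant terms suggestion, a minimal aggregation sketch - field and query text invented:)
{ "query": { "match": { "body": "some document text" } }, "aggregations": { "interesting": { "significant_terms": { "field": "body" } } } }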
Re: More like this scoring algorithm unclear
Hi Maarten, Your 'like_text' is analyzed the same way your 'product_id' field is analyzed, unless specified by 'analyzer'. I would recommend setting 'percent_terms_to_match' to 0. However, if you are only searching over product ids then a simple boolean query would do. If not, then I would create a boolean query where each clause is a 'more like this field' for each field of the queried document. This is actually what the mlt API does. Cheers, Alex On Wednesday, January 8, 2014 7:20:05 PM UTC+1, Maarten Roosendaal wrote: The scoring algorithm is still vague, but I got the query to act like the API, although the results are different so I'm still doing it wrong. Here's an example: { explain: true, query: { more_like_this: { fields: [ PRODUCT_ID ], like_text: 104004855475 1001004002067765 100200494210 1002004004499883, min_term_freq: 1, min_doc_freq: 1, max_query_terms: 1, percent_terms_to_match: 0.5 } }, from: 0, size: 50, sort: [], facets: {} } The like_text contains product_id's from a wishlist for which I want to find similar lists. On Wednesday, January 8, 2014 16:50:53 UTC+1, Maarten Roosendaal wrote: Hi, Thanks, I'm not quite sure how to do that. I'm using: http://localhost:9200/lists/list/[id of list]/_mlt?mlt_field=product_id&min_term_freq=1&min_doc_freq=1 The body does not seem to be respected (I'm using the elasticsearch head plugin) if I add: { explain: true } I've been trying to rewrite the mlt API call as an mlt query but no luck so far. Any suggestions? Thanks, Maarten On Wednesday, January 8, 2014 16:14:25 UTC+1, Justin Treher wrote: Hey Maarten, I would use the explain: true option to see just why your documents are being scored higher than others. MoreLikeThis uses the same fulltext scoring as far as I know, so term position would affect the score. http://lucene.apache.org/core/3_0_3/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html Justin On Wednesday, January 8, 2014 3:04:47 AM UTC-5, Maarten Roosendaal wrote: Hi, I have a question about why the 'more like this' algorithm scores some documents higher than others, while they are (at first glance) the same. What I've done is index wishlist documents which contain 1 property: product_id; this property contains an array of product_id's (e.g. [1234, , , ]). What I'm trying to do is find similar wishlists for a given wishlist with id x. The MLT API seems to work: it returns other documents which contain at least 1 of the product_id's from the original list. But what I see is that, for example, I get 10 hits where the first 6 hits contain the same (and only 1) product_id, and this product_id is present in the original wishlist. What I would expect is that the score of the first 6 is the same. However what I see is that only the first 2 have the same score, the next 2 a lower score and the next 2 even lower. Why is this? Also, I'm trying to write the MLT API call as an MLT query, but somehow it doesn't work. I would expect that I need to take the entire content of the original product_id property and feed it as input for the 'like_text'. The documentation is not very clear and doesn't provide examples, so I'm a little lost. Hope someone can give some pointers. Thanks, Maarten -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/91734252-74d0-4001-becc-a184af0f2997%40googlegroups.com. 
For more options, visit https://groups.google.com/d/optout.
Snapshot Restore Frequency
According to the docs, snapshot operations are online and only store diffs. Is there any particular reason to not run them at a fairly high frequency? E.g. every 15 minutes? Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/60c3365b-7dec-487b-be45-c174ef992329%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
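(For reference, the operations in question on 1.x - repository name, location and snapshot name invented; since segments already copied are skipped, frequent snapshots mostly cost the per-snapshot metadata plus listing what changed:)
curl -XPUT 'localhost:9200/_snapshot/my_backup' -d '{ "type": "fs", "settings": { "location": "/mount/backups/my_backup" } }'
curl -XPUT 'localhost:9200/_snapshot/my_backup/snapshot_20150401t0915?wait_for_completion=false'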
Re: Shared facet filtration
Fantastic, that's exactly what I was looking for, thank you! On Wednesday, April 9, 2014 3:12:42 AM UTC+10, Ivan Brusic wrote: You should be able to use filtered queries instead, where the filter is your facet filter: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html The filtered query will filter documents before the query. Facets work on the documents returned by the query, so if the documents are pre-filtered, the facets will not even work on them. -- Ivan On Mon, Apr 7, 2014 at 6:56 PM, Alex G alex@crowdstrike.com wrote: Hello, I’m implementing a faceted interface that requires that all the facets be filtered by a shared filter - below is roughly how the queries currently look, is there a more efficient/performant way to make this kind of query? Less fussed about actual query verbosity but if there is some way of sharing or referencing the repeated facet_filter other than search templates that’d be fantastic. Thanks, Alex { facets: { facetOne: { facet_filter: { bool: { must: [ { term: { foo.bar: test } }, { term: { baz:test* } } ] } }, terms: { field: facetOne.field, order: [count], size: 50 } }, facetTwo: { facet_filter: { bool: { must: [ { term: { foo.bar: test } }, { term: { baz:test* } } ] } }, terms: { field: facetTwo.field, order: [count], size: 50 } }, facetThree: { facet_filter: { bool: { must: [ { term: { foo.bar: test } }, { term: { baz:test* } } ] } }, terms: { field: facetThree.field, order: [count], size: 50 } } }, size: 0 } -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0993b970-b60c-4e38-b42c-953394abdac1%40googlegroups.com. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com
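(Ivan's suggestion applied to the query above, roughly: the shared facet_filter moves into a single filtered query and the per-facet copies disappear -)
{ "query": { "filtered": { "query": { "match_all": {} }, "filter": { "bool": { "must": [ { "term": { "foo.bar": "test" } }, { "term": { "baz": "test*" } } ] } } } }, "facets": { "facetOne": { "terms": { "field": "facetOne.field", "size": 50 } }, "facetTwo": { "terms": { "field": "facetTwo.field", "size": 50 } }, "facetThree": { "terms": { "field": "facetThree.field", "size": 50 } } }, "size": 0 }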
Re: synonyms in a query
Hi Luiz, thank you again for your reply. I don't fully understand the part you mentioned: After indexing one document with the title equal to core: curl -XPOST 'localhost:9200/myindex/test/1' -d '{ title: core }' Sorry, I am pretty new to ES and don't understand very much yet. Now what happens there? And what if I don't have hardcoded synonyms, but a file which someone can fill out? I need something like synonyms_path : analysis/synonym.txt in my filter, but then what about the step you mentioned that I did not understand? Sorry for all the trouble. On Monday, April 7, 2014 at 09:29:17 UTC+2, Alex K wrote: Hello there, I have a query; an example is this: { query: { bool: { should: [ { multi_match: { query: foo, fields: [ TITLE, SHORTDESC ], type: phrase_prefix } }, { multi_match: { query: foo, cutoff_frequency: null, fields: [ TITLE, SHORTDESC ] } } ] } }, filter: { term: { ACTIVE: 1 } }, sort: { TITLE: { order: asc } }, size: 7 } Now I have the question: can I use synonyms here? I already saw that you can use a synonym token inside an analyzer. But I have a query here, not an analyzer. Do I have to put an analyzer inside the query? I don't know much about ES yet, so this may be a totally stupid question. Thank you in advance :-) -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/637c5a0a-89cc-47d3-9f59-785ddf6ccfc3%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: synonyms in a query
Hi Luiz, thank you again for your reply. A colleague of mine told me that I might be missing a plugin needed to use my settings file. I will check this out and write down here later what I found out. Sorry for all the trouble. On Monday, April 7, 2014 at 09:29:17 UTC+2, Alex K wrote: Hello there, I have a query; an example is this: { query: { bool: { should: [ { multi_match: { query: foo, fields: [ TITLE, SHORTDESC ], type: phrase_prefix } }, { multi_match: { query: foo, cutoff_frequency: null, fields: [ TITLE, SHORTDESC ] } } ] } }, filter: { term: { ACTIVE: 1 } }, sort: { TITLE: { order: asc } }, size: 7 } Now I have the question: can I use synonyms here? I already saw that you can use a synonym token inside an analyzer. But I have a query here, not an analyzer. Do I have to put an analyzer inside the query? I don't know much about ES yet, so this may be a totally stupid question. Thank you in advance :-) -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ba838067-1277-4db9-a8f9-e306d47d6591%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: synonyms in a query
Me again - it seems it was a local problem on my side. The way Luiz mentioned is exactly the correct way. Thank you very much, Luiz, you really helped me out with this! On Monday, April 7, 2014 at 09:29:17 UTC+2, Alex K wrote: Hello there, I have a query; an example is this: { query: { bool: { should: [ { multi_match: { query: foo, fields: [ TITLE, SHORTDESC ], type: phrase_prefix } }, { multi_match: { query: foo, cutoff_frequency: null, fields: [ TITLE, SHORTDESC ] } } ] } }, filter: { term: { ACTIVE: 1 } }, sort: { TITLE: { order: asc } }, size: 7 } Now I have the question: can I use synonyms here? I already saw that you can use a synonym token inside an analyzer. But I have a query here, not an analyzer. Do I have to put an analyzer inside the query? I don't know much about ES yet, so this may be a totally stupid question. Thank you in advance :-) -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7a1878b4-6c79-42a3-b0d7-5562d0cbdece%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: running a specific integration test
On Friday, 9 December 2011 10:21:40 UTC+1, Karussell wrote: It is possible: http://maven.apache.org/plugins/maven-surefire-plugin/examples/single-test.html http://stackoverflow.com/questions/1873995/run-a-single-test-method-with-maven Is this advice still valid? I've tried different variations of the mvn test command with no luck so far. Example : $ ES_TEST_LOCAL=true mvn test -Dtest=SimpleValidateQueryTests#simpleValidateQuery [INFO] Scanning for projects... [...] Executing 501 suites with 3 JVMs. [...] Suite: org.elasticsearch.search.aggregations.bucket.GeoDistanceTests [...] Can you give me a cli example of executing a specific test? Or do I have to use an IDE? Thanks, Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3465d32d-ca52-4482-83b6-45d55751a12b%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
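(A hedged sketch: the ES build uses the randomizedtesting runner rather than plain surefire, so the -Dtest flag from those links is ignored and the flags are tests.class/tests.method instead - the package path below is an assumption, check where the class lives in your checkout:)
mvn test -Dtests.class=org.elasticsearch.validate.SimpleValidateQueryTests -Dtests.method=simpleValidateQuery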
synonyms in a query
Hello there, I have a query; an example is this: { query: { bool: { should: [ { multi_match: { query: foo, fields: [ TITLE, SHORTDESC ], type: phrase_prefix } }, { multi_match: { query: foo, cutoff_frequency: null, fields: [ TITLE, SHORTDESC ] } } ] } }, filter: { term: { ACTIVE: 1 } }, sort: { TITLE: { order: asc } }, size: 7 } Now I have the question: can I use synonyms here? I already saw that you can use a synonym token inside an analyzer. But I have a query here, not an analyzer. Do I have to put an analyzer inside the query? I don't know much about ES yet, so this may be a totally stupid question. Thank you in advance :-) -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5fe2d157-b437-4bd8-8a18-8aa4f41f63fe%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: synonyms in a query
Hello Luiz, thank you for your reply! As we use rivers, I was told to declare the analyzer there. It looks like this for me: { index : { analysis : { filter : { synonym_filter : { type : synonym, synonyms : [ foo, foo bar => core ] } }, analyzer : { synonym : { tokenizer : whitespace, filter : [ synonym_filter ], type : custom } } } } } which actually says, for testing purposes, 'if someone searches for 'foo' or 'foo bar', search for 'core''. Now my query uses the analyzer: { query: { bool: { should: [ { multi_match: { query: foo, fields: [ TITLE, SHORTDESC ], type: phrase_prefix, analyzer: synonym } }, { multi_match: { query: foo, cutoff_frequency: null, fields: [ TITLE, SHORTDESC ] } } ] } }, filter: { term: { ACTIVE: 1 } }, sort: { TITLE: { order: asc } }, size: 7 } But I get an error there: [...]nested: QueryParsingException[[test484] [multi_match] analyzer [synonym] not found];[...] What am I doing wrong here? On Monday, April 7, 2014 at 09:29:17 UTC+2, Alex K wrote: Hello there, I have a query; an example is this: { query: { bool: { should: [ { multi_match: { query: foo, fields: [ TITLE, SHORTDESC ], type: phrase_prefix } }, { multi_match: { query: foo, cutoff_frequency: null, fields: [ TITLE, SHORTDESC ] } } ] } }, filter: { term: { ACTIVE: 1 } }, sort: { TITLE: { order: asc } }, size: 7 } Now I have the question: can I use synonyms here? I already saw that you can use a synonym token inside an analyzer. But I have a query here, not an analyzer. Do I have to put an analyzer inside the query? I don't know much about ES yet, so this may be a totally stupid question. Thank you in advance :-) -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3ec9e97d-f210-4a88-a269-f6306bf0266c%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
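(A hedged note on the analyzer [synonym] not found error: query-time analyzers are resolved against the settings of the index being searched, so declaring them only in the river is not enough - they have to be part of that index's analysis settings. A sketch using the index name from the error message; on 1.x the index must be closed while analysis settings are updated:)
curl -XPOST 'localhost:9200/test484/_close'
curl -XPUT 'localhost:9200/test484/_settings' -d '{ "analysis": { "filter": { "synonym_filter": { "type": "synonym", "synonyms": ["foo, foo bar => core"] } }, "analyzer": { "synonym": { "type": "custom", "tokenizer": "whitespace", "filter": ["synonym_filter"] } } } }'
curl -XPOST 'localhost:9200/test484/_open'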
Shared facet filtration
Hello, I’m implementing a faceted interface that requires that all the facets be filtered by a shared filter - below is roughly how the queries currently look, is there a more efficient/performant way to make this kind of query? Less fussed about actual query verbosity but if there is some way of sharing or referencing the repeated facet_filter other than search templates that’d be fantastic. Thanks, Alex { facets: { facetOne: { facet_filter: { bool: { must: [ { term: { foo.bar: test } }, { term: { baz:test* } } ] } }, terms: { field: facetOne.field, order: [count], size: 50 } }, facetTwo: { facet_filter: { bool: { must: [ { term: { foo.bar: test } }, { term: { baz:test* } } ] } }, terms: { field: facetTwo.field, order: [count], size: 50 } }, facetThree: { facet_filter: { bool: { must: [ { term: { foo.bar: test } }, { term: { baz:test* } } ] } }, terms: { field: facetThree.field, order: [count], size: 50 } } }, size: 0 } -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0993b970-b60c-4e38-b42c-953394abdac1%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Autosuggest-style search in realtime doesn't work properly
Hi there, I have the following request I send to ES: { query: { filtered: { query: { bool: { should: [ { multi_match: { query: socks purple, fields: [ TITLE ], type: phrase_prefix } }, { multi_match: { query: socks purple, fields: [ TITLE ] } } ] } }, filter: { and: [ { terms: { ACTIVE: [ 1 ] } } ] } } }, size: 7 } Now, the first multi_match gives me good results when I input the words in the correct order (e.g. Purple Socks). But when I enter them in the 'wrong' order (e.g. Socks Purple) it doesn't find anything. A colleague of mine said I could try using a second multi_match. I don't have much knowledge of ES; almost all of the above was already there, I just extended the code with the second multi_match. But now there is the problem that if I input socks it gives me all matches for socks. Now when I continue to enter purple, it gives me not just purple socks, but everything matching purple (although I would expect only purple socks). Does anyone know what the problem here is? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/074ec987-379b-4591-a5dc-0d2b482d4ec8%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
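(A likely explanation, hedged: the second multi_match defaults to operator or, so once both words are typed either one is enough to match. A sketch constraining all terms to match:)
{ "multi_match": { "query": "socks purple", "fields": ["TITLE"], "operator": "and" } }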
multi_match and cutoff_frequency
Hello there, I am a total ES-noob, so please forgive me if my question is weird or something ;-) Currently I have the task to implement the cutoff_frequency for our elasticsearch queries. The current query looks like this: { query:{ bool:{ should:[ {multi_match:{ query:the, cutoff_frequency:0.001, fields:[TITLE,SHORTDESC], type:phrase_prefix} } ] } }, filter:{ term:{ ACTIVE:1 } }, sort:{ TITLE:{ order:asc } }, size:7} This works perfectly fine, like before, BUT it seems that the cutoff_frequency there doesn't matter. Is it in the wrong place? Or does it not work with multi_match? I have to admit that I haven't fully understood what cutoff_frequency does. But I have lots of entries in the index for this query which have The in the title. Wouldn't cutoff_frequency:0.001 mean that the word the is ignored if it is in 1/1000 of all the words in the titles? (and yes, in case I understood it the other way around, I also tried 1.0, which would mean 1000/1000 = every word, yes? It didn't make a difference for my query.) Sorry for my bad English, I am German. I hope I don't confuse anyone too much... -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/94f1c37e-b01f-465a-bf55-55cf848613b3%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
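(A hedged observation: cutoff_frequency belongs to the boolean type of match/multi_match; combined with type phrase_prefix it is, as far as I can tell, ignored, which would explain seeing no effect. A sketch of the form where it does apply:)
{ "query": { "multi_match": { "query": "the", "fields": ["TITLE", "SHORTDESC"], "cutoff_frequency": 0.001 } } }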
Indexing performance with doc values (particularly with larger number of fields)
This might be more of a Lucene question, but a quick google didn't throw up anything. Has anyone done/seen any benchmarking on indexing performance (overhead) due to using doc values? I often index quite large JSON objects with many fields (e.g. 50), and I'm trying to get a feel for whether I can just let all of them be doc values on the off chance I'll want to aggregate over them, or whether I need to pick beforehand which fields will support aggregation. (A related question: presumably allowing a mix of doc values fields and legacy fields is a bad idea, because if you use doc values fields you want a low max heap so that the file cache has lots of memory available, whereas if you use the field cache you need a large heap - is that about right, or am I missing something?) Thanks for any insight! Alex Ikanow -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0361eda4-ab39-4536-b91a-ccb710921edd%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
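(For concreteness, enabling doc values per field in a recent 1.x mapping looks roughly like this - index, type and field names invented; on older 1.x releases the equivalent is fielddata: { format: doc_values }:)
curl -XPUT 'localhost:9200/myindex' -d '{ "mappings": { "event": { "properties": { "bytes_sent": { "type": "long", "doc_values": true } } } } }'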
Re: Install issues with Kibana3 vs elasticsearch 0.19.11
Never mind, I'm an idiot, it clearly mentions it needs 0.90.x in the README :( On Wednesday, March 19, 2014 12:49:46 PM UTC-4, Alex at Ikanow wrote: I downloaded the latest Kibana3, popped it on a tomcat instance sharing space with my elasticsearch (0.19.11) instance and tried to connect (both: using an ssh tunnel to connect localhost:9200 back to the server, and opening port 9200 in the firewall). In both cases, the browser makes a call to _nodes (eg returns {ok:true,cluster_name:infinite-dev,nodes:{Yup-Cmn0QwCrkYI6l7SdRw:{name:Firefrost,transport_address:inet[/10.113.42.186:9300],hostname:ip-10-113-42-186,http_address:inet[/10.113.42.186:9200]}}}) and then returns the following error:
TypeError: Cannot call method 'split' of undefined
at http://SERVER/kibana-3.0.0/app/app.js:22:11260
at he (http://SERVER/kibana-3.0.0/app/app.js:7:20041)
at Function.Yb (http://SERVER/kibana-3.0.0/app/app.js:7:7025)
at http://SERVER/kibana-3.0.0/app/app.js:22:11204
at i (http://SERVER/kibana-3.0.0/app/app.js:9:458)
at i (http://SERVER/kibana-3.0.0/app/app.js:9:458)
at http://SERVER/kibana-3.0.0/app/app.js:9:1014
at Object.f.$eval (http://SERVER/kibana-3.0.0/app/app.js:9:6963)
at Object.f.$digest (http://SERVER/kibana-3.0.0/app/app.js:9:5755)
at Object.f.$apply (http://SERVER/kibana-3.0.0/app/app.js:9:7111)
I don't see any other calls back to elasticsearch. I couldn't find a statement anywhere of which versions Kibana3 is compatible with - does it just need a later version (anyone know the earliest with which it is compatible, out of curiosity; though I'm planning to move to 1.0 anyway soon), or am I doing something wrong? Thanks for any insight/help anyone can provide! Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c6a2f998-89a9-4794-bdf8-15d1dcd26aae%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Install issues with Kibana3 vs elasticsearch 0.19.11
I downloaded the latest Kibana3, popped it on a tomcat instance sharing space with my elasticsearch (0.19.11) instance and tried to connect (both: using an ssh tunnel to connect localhost:9200 back to the server, and opening port 9200 in the firewall). In both cases, the browser makes a call to _nodes (eg returns {ok:true,cluster_name:infinite-dev,nodes:{Yup-Cmn0QwCrkYI6l7SdRw:{name:Firefrost,transport_address:inet[/10.113.42.186:9300],hostname:ip-10-113-42-186,http_address:inet[/10.113.42.186:9200]}}}) and then returns the following error:
TypeError: Cannot call method 'split' of undefined
at http://SERVER/kibana-3.0.0/app/app.js:22:11260
at he (http://SERVER/kibana-3.0.0/app/app.js:7:20041)
at Function.Yb (http://SERVER/kibana-3.0.0/app/app.js:7:7025)
at http://SERVER/kibana-3.0.0/app/app.js:22:11204
at i (http://SERVER/kibana-3.0.0/app/app.js:9:458)
at i (http://SERVER/kibana-3.0.0/app/app.js:9:458)
at http://SERVER/kibana-3.0.0/app/app.js:9:1014
at Object.f.$eval (http://SERVER/kibana-3.0.0/app/app.js:9:6963)
at Object.f.$digest (http://SERVER/kibana-3.0.0/app/app.js:9:5755)
at Object.f.$apply (http://SERVER/kibana-3.0.0/app/app.js:9:7111)
I don't see any other calls back to elasticsearch. I couldn't find a statement anywhere of which versions Kibana3 is compatible with - does it just need a later version (anyone know the earliest with which it is compatible, out of curiosity; though I'm planning to move to 1.0 anyway soon), or am I doing something wrong? Thanks for any insight/help anyone can provide! Alex -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ed77f1e7-547d-4c87-b23f-ee97dece9533%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: EsRejectedExecutionException when searching date based indices.
That is correct, I was mixing the terms nodes and shards (sorry about that). I'm running the test on a single node (machine). I chose 20 shards so we could eventually grow to a 20-server cluster without re-indexing. It's unlikely we'll ever need to go that high, but you never know, and given we receive 750 million messages a day, the thought of reindexing after collecting a year's worth of data makes me nervous. If I can over-shard and avoid a massive reindex, I'll be a happy guy. I thought about reducing the 20 shards, but even if I go to, say, 5 shards on 5 machines (1 shard per machine?), I'll still run into the issue if a user searches several years back. Any other thoughts on a possible solution? Would increasing the queue size be a good option? Is there a downside (performance hit, running out of resources, etc.)? Thanks again! On Tuesday, February 25, 2014 11:32:26 PM UTC-8, David Pilato wrote: You are mixing nodes and shards, right? How many elasticsearch nodes do you have to manage your 7300 shards? Why did you set 20 shards per index? You can increase the queue size in elasticsearch.yml but I'm not sure it's the right thing to do here. My 2 cents -- David ;-) Twitter: @dadoonet / @elasticsearchfr / @scrutmydocs On 26 February 2014 at 01:36, Alex Clark wrote: Hello all, I'm getting failed nodes when running searches and I'm hoping someone can point me in the right direction. I have indices created per day to store messages. The pattern is pretty straightforward: the index for January 1 is messages_20140101, for January 2 is messages_20140102, and so on. Each index is created against a template that specifies 20 shards. A full year will give 365 indices * 20 shards = 7300 nodes. I have recently upgraded to ES 1.0. When I search for all messages in a year (either using an alias or specifying “messages_2013*”), I get many failed nodes. The reason given is: “EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@651b8924]”. The more often I search, the fewer failed nodes I get (probably caching in ES), but I can't get down to 0 failed nodes. I'm using ES for analytics, so the document counts coming back have to be accurate. The aggregate counts will change depending on the number of node failures. We use the Java API to create a local node to index and search the documents. However, we also see the issue if we use the URL search API on port 9200. If I restrict the search to 30 days then I do not see any failures (it's under 1000, so as expected). However, it is a pretty common use case for our customers to search messages spanning an entire year. Any suggestions on how I can prevent these failures? Thank you for your help!
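[Editor's note: for readers weighing the queue-size option David mentions, in Elasticsearch 1.x the search thread pool queue is, as far as I recall, also updatable at runtime through the cluster settings API, so it is cheap to experiment with. A hedged sketch; the value 2000 is illustrative, and a larger queue only buys headroom at the cost of memory, it does not reduce the shard fan-out:
    curl -XPUT localhost:9200/_cluster/settings -d '{
      "transient": { "threadpool.search.queue_size": 2000 }
    }'
The same key, threadpool.search.queue_size, can be set permanently in elasticsearch.yml, which is the route mentioned above.]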
EsRejectedExecutionException when searching date based indices.
Hello all, I'm getting failed nodes when running searches and I'm hoping someone can point me in the right direction. I have indices created per day to store messages. The pattern is pretty straightforward: the index for January 1 is messages_20140101, for January 2 is messages_20140102, and so on. Each index is created against a template that specifies 20 shards. A full year will give 365 indices * 20 shards = 7300 nodes. I have recently upgraded to ES 1.0. When I search for all messages in a year (either using an alias or specifying “messages_2013*”), I get many failed nodes. The reason given is: “EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@651b8924]”. The more often I search, the fewer failed nodes I get (probably caching in ES), but I can't get down to 0 failed nodes. I'm using ES for analytics, so the document counts coming back have to be accurate. The aggregate counts will change depending on the number of node failures. We use the Java API to create a local node to index and search the documents. However, we also see the issue if we use the URL search API on port 9200. If I restrict the search to 30 days then I do not see any failures (it's under 1000, so as expected). However, it is a pretty common use case for our customers to search messages spanning an entire year. Any suggestions on how I can prevent these failures? Thank you for your help!
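[Editor's note: since the daily indices are cut from a template, the over-sharding half of this problem can be dialled down for newly created indices without touching existing ones. A sketch against the ES 1.x index template API; the template name and the value 5 are illustrative, not from the thread:
    curl -XPUT localhost:9200/_template/messages -d '{
      "template": "messages_*",
      "settings": { "number_of_shards": 5 }
    }'
The arithmetic still bites, though: at 5 shards per day a one-year search fans out to 365 * 5 = 1825 shard requests, still above the default queue of 1000, so coarser (e.g. monthly) indices or a bigger queue would also be needed for year-long queries.]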
Re: Elasticsearch Maven plugin on GitHub
You are very welcome, David. I believe the project is pretty much complete, for it also contains tests which exercise the mojos. As mentioned already, it depends on an ES version which is already old. I will try to keep it up to date, but contributions of any sort are more than welcome. alex On Fri, Jan 17, 2014 at 2:54 AM, David Pilato wrote: Hey Alex, That's great! I started a project like this some months ago but did not find enough time to finish it. Thanks for sharing it! -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr On January 17, 2014 at 01:44:26, AlexC wrote: If anyone is interested in using a Maven plugin to run Elasticsearch for integration testing, I just published one on GitHub: https://github.com/alexcojocaru/elasticsearch-maven-plugin. It is an alternative to starting a node through the code. The readme should provide enough information, but let me know if something is missing or not clear enough. It uses ES v0.90.7, but it can be easily updated to the latest ES version by changing the dependency version in the pom.xml file. alex
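[Editor's note: to make the version-bump step concrete, updating the plugin to a newer Elasticsearch means changing the version of the org.elasticsearch:elasticsearch dependency in the project's pom.xml, roughly like this; the exact dependency block in the repo may differ, and 0.90.7 is simply the version named above:
    <dependency>
      <groupId>org.elasticsearch</groupId>
      <artifactId>elasticsearch</artifactId>
      <!-- bump this version to build against a newer Elasticsearch release -->
      <version>0.90.7</version>
    </dependency>
]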