Re: implementing a plugin to process the whole input document

2014-05-23 Thread joergpra...@gmail.com
In answer to (1), in each custom mapper, you have access to ParseContext in the method public void parse(ParseContext context) throws IOException In the ParseContext, you can access _source with the source() method to do whatever you want, e.g. copy it, parse it, index it again etc. (2) is a sli

Re: implementing a plugin to process the whole input document

2014-05-23 Thread joergpra...@gmail.com
Do you plan to implement SPARQL endpoint on Elasticsearch? That would be one wonderful asset missing in my portfolio for supporting library catalog indexing and search, all I do with RDF and Elasticsearch is based on JSON-LD. Jörg On Fri, May 23, 2014 at 8:17 PM, Jakub Kotowski wrote: > Great,

Re: ElasticSearch HTTP Server on Android ?

2014-05-25 Thread joergpra...@gmail.com
On Android, vanilla Elasticsearch does not compile/run out of the box, but, if you can live with a modified version (e.g. less JVM monitoring) there are chances to get an Android version adapted, because Google Android SDK supports Java 7 features. Note, Dalvik does not support invokedynamic so scr

Re: Question about time based indexes/rolling indexes and eviction policies?

2014-05-26 Thread joergpra...@gmail.com
1. I will add a timeseries mode to my JDBC plugin soon. Right now you can create timestamps with bash (or your favorite shell) and append it as a suffix to the index name into the river/feeder creation call, but this can be automated. No ETA yet. 2. This is also a nifty feature, I will experiment
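
A minimal sketch of the timestamp-suffix idea mentioned above, written in Python rather than bash. The base name `myjdbc` and the date format are assumptions chosen for illustration; they are not defaults of the JDBC plugin.

```python
from datetime import datetime, timezone


def timestamped_index(base, when=None, fmt="%Y.%m.%d"):
    """Return the base index name with a date suffix appended.

    `base` and `fmt` are illustrative; pick whatever naming scheme
    your river/feeder creation call should receive.
    """
    when = when or datetime.now(timezone.utc)
    return "{}-{}".format(base, when.strftime(fmt))


# Fixed date so the result is reproducible.
print(timestamped_index("myjdbc", datetime(2014, 5, 26)))
```

The resulting string would be passed as the index name in the river/feeder creation call, so each run writes to a fresh time-based index.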

Re: prefix query with multiple "prefixes"/words

2014-05-26 Thread joergpra...@gmail.com
See https://gist.github.com/jprante/bdf9a9755a64bc23afbe Jörg On Mon, May 26, 2014 at 1:21 PM, Felix Schwarz wrote: > > I'm looking for a query so that "foo baz" matches the following documents: > "foobar baz" > "foo baz" > "baz foo" > "foo bazilion" > > Basically I'd like to break down the se

Re: Update Mapping for JDBC river freezes till next request is received

2014-05-27 Thread joergpra...@gmail.com
You should upgrade ES, there were bugs fixed regarding cluster update service and rivers. Jörg On Tue, May 27, 2014 at 6:44 PM, André Morais wrote: > Hello, > > I am using the JDBC river plugin (latest version with the name " > elasticsearch-river-jdbc-2.2.1.jar" on ES 0.90.5) and recently fou

Re: implementing a plugin to process the whole input document

2014-05-27 Thread joergpra...@gmail.com
Yes, it is (not only) relevant to library catalog indexing, because Bibframe, a new project by Library of Congress, is built on RDF, and next-generation library systems will embrace W3C semantic web technologies. The RDF data I generate is indexed in JSON-LD format into Elasticsearch but for SPARQ

Re: Sequence Numbers for Replica Recovery

2014-05-27 Thread joergpra...@gmail.com
I'm not sure if this is related but there is work on designing sequence numbers that are decentralized time based UUIDs. If they were assigned to Lucene segments, shards could declare what segments they already have, when a recovery process runs. Feature is planned for 1.3 https://github.com/elast

Re: looking for heavy write optimization

2014-05-27 Thread joergpra...@gmail.com
For maximum write performance, you should - use the fastest disk subsystem (SSD) - use RAID 0 with an expensive controller to max out IO bandwidth - do not run more than one ES instance per server - do not use virtual servers, use physical servers - for ES data folder, disable access time flag (noatime),

Re: How to retrieve just certain amount of docs from a larger query?

2014-05-28 Thread joergpra...@gmail.com
Look into the scan/scroll query http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html It works like a cursor that iterates through all docs of a query result Jörg On Wed, May 28, 2014 at 1:42 PM, Tom wrote: > Hi, > > i need to fire a query against la
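
The cursor behaviour can be sketched without a running cluster. In this sketch `open_cursor` and `fetch_page` are hypothetical stand-ins for the initial scan request and the follow-up scroll requests; they are not part of any Elasticsearch client library.

```python
def iterate_scroll(open_cursor, fetch_page):
    """Yield every hit by following scroll ids until a page is empty.

    open_cursor() -> scroll id of the freshly opened cursor
    fetch_page(scroll_id) -> (hits, next_scroll_id)
    Both callables are placeholders for real HTTP or client calls.
    """
    scroll_id = open_cursor()
    while True:
        hits, scroll_id = fetch_page(scroll_id)
        if not hits:
            break  # an empty page means the cursor is exhausted
        for hit in hits:
            yield hit


# Fake three-page result set standing in for a cluster.
pages = {"c0": (["a", "b"], "c1"), "c1": (["c"], "c2"), "c2": ([], "c2")}
docs = list(iterate_scroll(lambda: "c0", lambda sid: pages[sid]))
print(docs)
```

The loop always uses the scroll id returned by the most recent page, which is how a scroll cursor is meant to be advanced.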

Re: Elasticsearch and Smile encoded JSON

2014-05-29 Thread joergpra...@gmail.com
1. No (the cluster state of ES - not part of Lucene - is saved to disk in SMILE format) 2. No. 3. Yes, you can use SMILE on XContentBuilder classes. The result can be transported to the cluster, the decoding of SMILE is done transparently. Because the transport is LZF compressed by default, you sh

Re: Elasticsearch and Smile encoded JSON

2014-05-30 Thread joergpra...@gmail.com
SMILE and/or CBOR, the communication/storage won’t be compressed > using LZF? > > - Drew > > > On May 29, 2014, at 2:52 PM, joergpra...@gmail.com wrote: > > 1. No (the cluster state of ES - not part of Lucene - is saved to disk in > SMILE format) > > 2. No. > > 3.

Re: IDF per customer, many customers per index - best practices

2014-05-30 Thread joergpra...@gmail.com
IDF is calculated per shard, and only in DFS search types, it is calculated over all nodes in an initial scatter phase. http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_search_options.html#_literal_search_type_literal If you are concerned about IDF in a single multi-user index p
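
A small worked example of why per-shard IDF can differ from the IDF computed over all shards. It assumes Lucene's classic TF/IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)); the shard statistics are invented for illustration.

```python
import math


def idf(num_docs, doc_freq):
    """Classic Lucene TF/IDF inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))


# Invented statistics: the same term is rare on shard 0, common on shard 1.
shards = [
    {"num_docs": 1000, "doc_freq": 10},
    {"num_docs": 1000, "doc_freq": 200},
]

per_shard = [idf(s["num_docs"], s["doc_freq"]) for s in shards]
global_idf = idf(
    sum(s["num_docs"] for s in shards),
    sum(s["doc_freq"] for s in shards),
)

print(per_shard, global_idf)
```

With the default search type each shard scores with its own value from `per_shard`; the DFS search types first collect the statistics so that every shard scores with something like `global_idf`.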

Re: Failing unit tests on a fresh fork

2014-05-30 Thread joergpra...@gmail.com
When forking the master, things are expected to be quite volatile, and you should allow the ES core team a few hours or even days to let the dust settle down. Don't worry too much if things are temporarily broken on master, for stable builds, there are tagged releases... Jörg On Fri, May 30, 201

Re: Improving a slow running Match_All Query

2014-05-30 Thread joergpra...@gmail.com
Is "match_all" always running at that time or is it getting faster after a first run? Did you run an optimize with maximum number of segments? What is your segment count? Jörg On Fri, May 30, 2014 at 9:20 PM, wrote: > *Bump* > > > On Wednesday, May 28, 2014 4:10:26 PM UTC-7, sai...@roblox.com

Re: Elasticsearch 1.20 and 1.1.2

2014-05-31 Thread joergpra...@gmail.com
Just look into org.elasticsearch.rest.BytesRestResponse, it supersedes XContentRestResponse Jörg On Sat, May 31, 2014 at 12:28 AM, Ben McCann wrote: > Jörg thanks for the heads up about XContentRestResponse going away. I've > run into that as an issue with a river I help maintain. Do you know

Re: ES 1.1.1 - Plugins _site not found

2014-05-31 Thread joergpra...@gmail.com
Each time you start a node, may it be a (transport) client node or a server node, all plugins are checked/loaded at initialization. Each plugin, also jvm plugins on the classpath, is by default examined if a directory named "_site" can be accessed. The purpose is to classify a plugin as site plugi

Re: Elasticsearch 1.20 and 1.1.2

2014-05-31 Thread joergpra...@gmail.com
ons for > replacing XContentThrowableRestResponse and RestXContentBuilder? > > Thanks, > Ben > > > > On Sat, May 31, 2014 at 2:35 AM, joergpra...@gmail.com < > joergpra...@gmail.com> wrote: > >> Just look into org.elasticsearch.rest.BytesRestResponse, it

Re: RFC 6902 requires variant type mapping

2014-06-02 Thread joergpra...@gmail.com
You'd have to use a plugin for this kind of operation, because vanilla ES does not support RFC 6902. I'm also interested in supporting HTTP PATCH by Elasticsearch, because this is a must have for modifying resources due to the rules of Linked Data Platform (LDP) http://www.w3.org/TR/2014/WD-ldp-20

Re: Configuring cross-cloud cluster via REST API

2014-06-02 Thread joergpra...@gmail.com
You have to restart the whole cluster. Switching discovery while running a cluster is not possible. Jörg On Mon, Jun 2, 2014 at 12:49 PM, Martin Harris < martin.har...@cloudsoftcorp.com> wrote: > Hi Folks, > > I'm trying to setup a cross-cloud elastic-search cluster. As it's > cross-cloud, the

[ANN] Elasticsearch Simple Action Plugin

2014-06-03 Thread joergpra...@gmail.com
Hi, many of us want to start writing extensions for Elasticsearch. Apart from submitting pull requests to the core code, one great advantage of Elasticsearch is the plugin mechanism. Here, custom code can be hooked into Elasticsearch, without having to ask for inclusion into the core code. Neverthele

Re: Migration from Solr to ElasticSearch

2014-06-03 Thread joergpra...@gmail.com
If you have indexed the data in Solr, you should consider a tool that can traverse the Lucene index and reconstruct the documents. This is not a straightforward process, as you know already, because analyzed fields look different than the original input. The reconstruction may not recover the orig

Re: Migration from Solr to ElasticSearch

2014-06-03 Thread joergpra...@gmail.com
If you can iterate over the Solr index doc ids and fetch the source docs from a secondary storage, you should consider doing this first. This is the most straightforward method for reindexing. Otherwise, if you can not access the filesystem storage for the docs (for whatever reason), the idea woul

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-03 Thread joergpra...@gmail.com
Usually, plugins that extend internal ES functionality should be installed on all nodes. This is easy to remember and preferable from an administrative view. All the nodes in the ES cluster must have access to plugin code under all circumstances, especially when executing actions, mappers, routers,

Re: What's using memory in ElasticSearch? (Details to follow...)

2014-06-03 Thread joergpra...@gmail.com
What ES version is this? Your segment count is very high (>1000) which is not efficient. Maybe index.codec.bloom.load: false can help reducing heap mem usage. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-codec.html Jörg -- You received this message becau

Re: All primary shards are in same node. Why? Version 1.1.1

2014-06-03 Thread joergpra...@gmail.com
Primary shards are addressed first when writing, but it is a myth they do all the writing. Secondary shards do the writing too, but only some milliseconds later. There is nothing to worry about. Jörg On Tue, Jun 3, 2014 at 9:49 PM, Santiago Ferrer Deheza < sa.ferrer.deh...@gmail.com> wrote: >

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-03 Thread joergpra...@gmail.com
Not sure if I understand your concern completely - as long as you're doing things right in your code, it should be possible to allocate resources only when required - this holds also for plugins. Jörg On Tue, Jun 3, 2014 at 11:48 PM, virgil wrote: > Thank you Jörg. I see the point. But if the

Re: Best cluster environment for search

2014-06-03 Thread joergpra...@gmail.com
Can you show your test code? You seem to look at the wrong settings - by adjusting node number, shard number, replica number alone, you can not find out the maximum node performance. E.g. concurrency settings, index optimizations, query optimizations, thread pooling, and most of all, fast disk sub

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread joergpra...@gmail.com
You need resources on all nodes that hold shards, you can not do it with just one instance, because ES index is distributed. Rescoring would be very expensive if you did it on an extra central instance with an extra scatter/gather phase. It is also very expensive in scripting. A better method is a

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread joergpra...@gmail.com
Sorry, the plugin is outdated, a better start is by looking at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-similarity.html Jörg On Wed, Jun 4, 2014 at 10:07 AM, joergpra...@gmail.com < joergpra...@gmail.com> wrote: > You need resources on all n

Re: iptablex trojan experiences?

2014-06-04 Thread joergpra...@gmail.com
One very essential feature, from the very beginning, is that Elasticsearch instances, when started, automatically form a cluster over the network. This is only possible in an open network environment and by having multicast enabled. Are you aware, that by talking about "safe" configuration option

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread joergpra...@gmail.com
re is little documentation about the internals and there are > no code level comments. I always meant to experiment with the different > action hierarchies via simple plugins and document my findings. Perhaps one > day... > > Cheers, > > Ivan > > > On Wed, Jun 4, 2014

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread joergpra...@gmail.com
As said, it is true that scoring scripts (like the function score scripts or the AbstractSearchScript) need to reside on data nodes. Accessing fields is a low level operation in a script so it is not possible to install such a boost plugin that uses scripting on a data-less node. You would have to i

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-04 Thread joergpra...@gmail.com
veloper & Consultant > Author of RavenDB in Action <http://manning.com/synhershko/> > > > On Tue, Jun 3, 2014 at 6:15 PM, joergpra...@gmail.com < > joergpra...@gmail.com> wrote: > >> Hi, >> >> many of us want to start writing extensions for Elasticse

Re: Best cluster environment for search

2014-06-04 Thread joergpra...@gmail.com
Why do you use terms on the _id field and not the ids filter? The ids filter is more efficient since it reuses the _uid field which is cached by default. Do the terms in the query vary from query to query? If so, caching might kill your heap. Another possible issue is that your query is not distribut
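
For comparison, the two request shapes can be sketched as plain dictionaries. This assumes the ES 1.x query DSL with the `filtered` query; the document ids are made up.

```python
import json


def ids_filter(doc_ids):
    """Filter on document ids using the dedicated ids filter."""
    return {"ids": {"values": list(doc_ids)}}


def terms_on_id(doc_ids):
    """The alternative being questioned: a terms filter on _id."""
    return {"terms": {"_id": list(doc_ids)}}


def filtered_search(filter_body):
    """Wrap a filter in a match_all filtered query (ES 1.x style)."""
    return {"query": {"filtered": {"query": {"match_all": {}},
                                   "filter": filter_body}}}


body = filtered_search(ids_filter(["1", "2"]))
print(json.dumps(body, indent=2))
```

Only the filter clause changes between the two variants, so swapping one for the other in a test is a one-line change.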

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-05 Thread joergpra...@gmail.com
One more hint, you see org.elasticsearch.common.lucene.search.function.FieldValueFunction This implements the ScoreFunction and fetches boost values from a configured field in the doc, for use by the Java API for FunctionScoreQuery. If you can write a custom ScoreFunction, you could implement an

Re: Inter-document Queries

2014-06-05 Thread joergpra...@gmail.com
A suggestion for the path model: - index also the path depth, and name the fields with the depth level - execute a nested aggregation query over the path depth levels Example doc with path info: { "path0" : "promo/A", "path1" : "sale/B" ... } In this doc you know the user went from "pr
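
A minimal sketch of how such a document could be generated from an ordered list of visited steps. The function name is hypothetical; the field naming follows the `path0`, `path1` pattern from the example doc.

```python
def path_doc(steps):
    """Build a document whose field names carry the depth level."""
    return {"path{}".format(depth): step for depth, step in enumerate(steps)}


doc = path_doc(["promo/A", "sale/B"])
print(doc)
```

Because the depth is encoded in the field name, an aggregation on `path0` can be nested with one on `path1` to count transitions between levels.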

Re: Java Client - Error Handling

2014-06-05 Thread joergpra...@gmail.com
Do you use TransportClient or NodeClient? On NodeClient, you are tied to the cluster, as the node is a part of it; on TransportClient, you can count the connected nodes. The discovery mechanism behind the scenes sends "ping" actions every few seconds for you. If an action fails, you will see

Re: Best cluster environment for search

2014-06-05 Thread joergpra...@gmail.com
Ah, that is a simple resolution, thanks for highlighting it. Jörg On Thu, Jun 5, 2014 at 2:38 PM, Marcelo Paes Rech wrote: > Hi Jörg. Thanks for your reply again. > > As I said, I already had used ids filter, but I got the same behaviour. > > I realized what was wrong. Maybe it could be a bug

Re: Java Client - Error Handling

2014-06-05 Thread joergpra...@gmail.com
Check the Elasticsearch test code. There, you can see how Java API works. For example GetIndexTemplatesResponse response = client().admin().indices().prepareGetTemplates().get(); You can get an empty response if template does not exist, or the execution throws an exception, when something went w

Re: Shard count and plugin questions

2014-06-05 Thread joergpra...@gmail.com
The knapsack plugin does not come with a downtime. You can increase shards on the fly by copying an index over to another index (even on another cluster). The index should be write disabled during copy though. Increasing replica level is a very simple command, no index copy required. It seems you

Re: Shard count and plugin questions

2014-06-05 Thread joergpra...@gmail.com
table. > This is by design, so we don't get document version collisions between > data centers. What are some current mechanisms in use in production > environments to replicate indexes across regions? I just can't seem to > find any. Rivers was my initial thinking so regions ca

Re: A plugin to change the result set before sending it back to the http client

2014-06-05 Thread joergpra...@gmail.com
Just a quick question, do you just want to extract a field from the json source? There are field filters and parameters for shaping such a JSON result, maybe they can already help? Or can you give an example of the problem? Jörg On Thu, Jun 5, 2014 at 7:45 PM, Mario Mueller wrote: > Hey fol

Re: A plugin to change the result set before sending it back to the http client

2014-06-05 Thread joergpra...@gmail.com
Ah, now I get it. Source without metadata, listed in the order of hits. This should be easy to do in a plugin. REST filter is the way to go for PHP. Just a minute... Jörg On Thu, Jun 5, 2014 at 9:49 PM, Ivan Brusic wrote: > There is no way to eliminate returning the search metadata. It has be

Re: A plugin to change the result set before sending it back to the http client

2014-06-05 Thread joergpra...@gmail.com
stener that takes a SearchResponse and creates a > simplified version with no metadata. > > Should be an interesting quick plugin, but it looks like Jorg is going to > beat me to it (I'm still at work for several more hours). > > -- > Ivan > > > On Thu, Jun

Re: Shard count and plugin questions

2014-06-05 Thread joergpra...@gmail.com
an create more replicas. Our systems > is about 50/50 now, some users are even read/write, others are very read > heavy. I'll probably come up with 2 indexing strategies we can apply to an > application's index based on the heuristics from the operations they're > perform

Re: A plugin to change the result set before sending it back to the http client

2014-06-05 Thread joergpra...@gmail.com
, but > I noticed that you provided your own parseSearchRequest, but still > call RestSearchAction.parseSearchRequest from inside handleRequest. Did I > misinterpret the code or is that a mistake? > > -- > Ivan > > > On Thu, Jun 5, 2014 at 2:37 PM, joergpra...@gmail.com < >

Re: Could a custom Aggregator be used for general purpose Map/Reduce or bulk update?

2014-06-05 Thread joergpra...@gmail.com
I try to answer some of the queries though I must admit, I am not too familiar with the aggregation source code yet (still exploring). Aggregations work like a search, they are "embedded" into the search actions, and work over the result set of a search. They run in each shard, just like the

Re: If I set index.number_of_replica:1, then the minimum number of nodes should be 3 to assure that the status of the cluster is gree?

2014-06-06 Thread joergpra...@gmail.com
1. No. Did you change the configuration? You have two data nodes connected? 2. You do not need to be concerned where primary shards are allocated, secondary shards play the same role (except primaries receive writes first a few milliseconds earlier than secondaries). Elasticsearch randomly allocat

Re: A plugin to change the result set before sending it back to the http client

2014-06-06 Thread joergpra...@gmail.com
I drink Kölsch only :) ävver et hätt noh immer joot jejange Greetings from Cologne! Jörg On Fri, Jun 6, 2014 at 7:14 AM, Mario Mueller wrote: > You guys are totally awesome! Thanks a lot! If you ever visit Duesseldorf > drop me a line, I owe you a beer. > > @Brian: > Interesting approach, but

Re: Correct way to use TransportClient connection object

2014-06-06 Thread joergpra...@gmail.com
Closing the transport client may not be enough. Try this: - wait for all outstanding actions (all actions send responses asynchronously) - then shut down client.threadpool() (perhaps with shutdownNow() or shutdown()), this effectively prevents new actions from being started - then close the trans

Re: Analyzing queries in the client side of Elasticsearch but not on the server

2014-06-06 Thread joergpra...@gmail.com
Please ask your question here. Thanks. Jörg On Fri, Jun 6, 2014 at 9:28 AM, ohw wrote: > Hi folks > > I just asked a question in StackOverflow, please have a look if you have > encountered similar problem or have some input to it. > > Thanks in advance! > > -- > You received this message becau

Re: Analyzing queries in the client side of Elasticsearch but not on the server

2014-06-06 Thread joergpra...@gmail.com
The Query DSL is not equivalent to Lucene Query but close to, with enhancements. If you want to make use of Lucene Query, and you already decided to write a plugin for scoring, so why don't you just add your query parsers to the plugin? Jörg On Fri, Jun 6, 2014 at 9:39 AM, ohw wrote: > Sure,

Re: Analyzing queries in the client side of Elasticsearch but not on the server

2014-06-06 Thread joergpra...@gmail.com
he query parsers into > elasticsearch, would you please elaborate more on this? > > > On Fri, Jun 6, 2014 at 4:53 PM, joergpra...@gmail.com < > joergpra...@gmail.com> wrote: > >> The Query DSL is not equivalent to Lucene Query but close to, with >> enhancements. &

Re: If I set index.number_of_replica:1, then the minimum number of nodes should be 3 to assure that the status of the cluster is gree?

2014-06-06 Thread joergpra...@gmail.com
I index >> data, the state of the cluster is green. I have no idea why this >> happened..Is there something I ignore? >> >> I want to know how ES allocates nodes. Is there some reference? I googled >> but couldn't find it. >> >> Thank you :D >>

Re: Get by _id doesn't work but search does.

2014-06-06 Thread joergpra...@gmail.com
Look here for the tool and how to use it http://www.elasticsearch.org/blog/tool-help-routing-issues-elasticsearch-1-2-0/ Jörg On Fri, Jun 6, 2014 at 11:24 AM, Luke Wilson-Mawer < lukewilsonma...@gmail.com> wrote: > Great, thanks Adrien. I will eagerly await the tool. > > Kind regards, > > Luke

Re: What's using memory in ElasticSearch? (Details to follow...)

2014-06-06 Thread joergpra...@gmail.com
No, the settings will not merge existing segments unless you call the _optimize action via the API. And have some patience. Thousands of segments take time - also, they need quite a few memory resources to merge... I suggest backing up your data first, to stay safe if the merging fails / aborts... Jörg On T

Re: Max doc size for indexing over HTTP

2014-06-06 Thread joergpra...@gmail.com
1gb is a very large document and it is unusual to index such sizes. There is a limit check against the heap. In order to be able to process such length, you need a large heap alone to store the document source. Depending on analyzer, heap demand increases even more. You can index documents of arb

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-06 Thread joergpra...@gmail.com
I mean, you can add a MyOwnFunctionBuilder/MyOwnFunctionParser to Elasticsearch via plugin. See package org.elasticsearch.index.query.functionscore for the standard implementations. The functionscore code is masterpiece quality - no need to modify existing code! It is pluggable. A close example t

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-06 Thread joergpra...@gmail.com
For an example function score plugin implementation, see https://github.com/elasticsearch/elasticsearch/blob/master/src/test/java/org/elasticsearch/search/functionscore/FunctionScorePluginTests.java Jörg On Fri, Jun 6, 2014 at 7:10 PM, joergpra...@gmail.com wrote: > I mean, you can ad

Re: [ANN] Elasticsearch Simple Action Plugin

2014-06-07 Thread joergpra...@gmail.com
I have implemented a function score based conditional boost plugin for demonstration. Very useful for faking relevance scoring, in dependency of document field values which were originally not meant to contribute for boosting. A list of boost values can be specified in dependency of indexed value

Re: What's the difference between bind_host and publish_host in ElasticSearch?

2014-06-07 Thread joergpra...@gmail.com
"bind_host" is the host that an Elasticsearch node uses in the socket bind call when starting the network. Due to socket programming model, you can "bind" to an address. By referencing an "address", the socket allows access to one or all underlying network devices. There are several addresses with

Re: What's using memory in ElasticSearch? (Details to follow...)

2014-06-07 Thread joergpra...@gmail.com
Maybe the segment count is just counting new segments as they are created... can you look into the data folders to examine if the segment file count is still high? And can you verify if the settings are really active... not sure what's going on without seeing details. The _optimize call takes a p

Re: compresstion in ES 1.2.1

2014-06-08 Thread joergpra...@gmail.com
Compression is always enabled by default. Jörg On Sun, Jun 8, 2014 at 6:01 PM, sri <1.fr@gmail.com> wrote: > Hello everyone, > > I have read posts and blogs on how elasticsearch compression can be > enabled in the previous versions(0.17 - 0.19). > > I am currently using ES 1.2.1, i wasn't a

Re: compresstion in ES 1.2.1

2014-06-08 Thread joergpra...@gmail.com
The Elasticsearch file size does not only contain compressed fields, but much more. For example, term vectors, norms, etc. You would have to disable field attributes you do not want. Also note, Elasticsearch has replica enabled by default, and segment count is not optimized automatically. Jörg O

Re: compresstion in ES 1.2.1

2014-06-08 Thread joergpra...@gmail.com
Lucene uses LZ4 compression http://blog.jpountz.net/post/35667727458/stored-fields-compression-in-lucene-4-1 so you should not run ES on a ZFS file system with compression enabled. Jörg On Sun, Jun 8, 2014 at 8:47 PM, Patrick Proniewski wrote: > Hello, > > I don't know how it's compressed b

Re: compresstion in ES 1.2.1

2014-06-08 Thread joergpra...@gmail.com
Try this index template for new index creations curl -XPUT 'localhost:9200/_template/template1' -d ' { "template" : "*", "mappings" : { "_default_" : { "_source" : { "enabled" : false }, "_all" : { "enabled" : false} } } } ' See also http://www

Re: JDBC river: trouble getting analyzer in type mapping to be applied

2014-06-09 Thread joergpra...@gmail.com
There is a bug in the JDBC river introduced recently that prevents it from using the type_mapping parameter if there is no index_settings parameter defined. It will be fixed asap. A workaround might be adding an empty settings parameter like "index_settings" : {} Jörg On Mon, Jun 9, 2014 at 1:00

Re: Cannot Increase Write TPS in Elasticsearch by adding more nodes

2014-06-09 Thread joergpra...@gmail.com
There are many reasons that may cause this, just to name a few - benchmarking tool setup ( do they show correct numbers?) - network bandwidth limits - cluster setup (e.g. complex mapping, high latency between nodes) - pattern of the data input - method of data input (bulk vs. index, HTTP vs. Java

Re: Cannot Increase Write TPS in Elasticsearch by adding more nodes

2014-06-09 Thread joergpra...@gmail.com
How do you try to figure out you're hitting limits? I have not enough information to help. Marvel, Elastic HQ, etc. are all very useful tools but should be combined with OS-related monitoring to get an overall picture. Jörg On Mon, Jun 9, 2014 at 9:31 PM, pranav amin wrote: > Thanks Jorg for

Re: Exposing elastic search query APIs at a public endpoint

2014-06-10 Thread joergpra...@gmail.com
It depends on your requirements and your product strategy - both are possible with pros and cons: - are your users proficient in a report language? Do they already write report specs in a "standard" report language? Do you want to support this report language standard? Would you like to share report st

Re: elasticsearch Java API for function_score query

2014-06-10 Thread joergpra...@gmail.com
Try this import org.elasticsearch.action.search.SearchRequest; import org.elasticsearch.index.query.functionscore.FunctionScoreQueryBuilder; import java.util.Arrays; import static org.elasticsearch.client.Requests.searchRequest; import static org.elasticsearch.index.query.FilterBuilders.termsFil

Re: Is ES es.index.store.type=memory equivalent to Lucene's RAMDirectory?

2014-06-10 Thread joergpra...@gmail.com
Yes, it is equivalent. MMapDirectory is already using as much memory as possible, for reading data. The RAMDirectory store is for when you want to push all data onto the heap, typically for volatile unit tests. For a large index, it only puts a burden on the heap and your performance will suffer from GC. Jö

Re: Cannot Increase Write TPS in Elasticsearch by adding more nodes

2014-06-10 Thread joergpra...@gmail.com
On bare metal I can process sustained 10-12 MB/sec on a single node. Maybe you can measure throughput in bytes per second, this is easier to compare. Jörg On Tue, Jun 10, 2014 at 6:19 PM, pranav amin wrote: > Thanks Mark. > > We are using Java version - 1.7.0_25 > > What is your document size
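
Converting a documents-per-second figure into bytes per second is simple arithmetic. The numbers below are invented; only the conversion itself is the point.

```python
def throughput_mb_per_sec(doc_count, avg_doc_bytes, seconds):
    """Indexing throughput in MB/sec (1 MB = 1024 * 1024 bytes)."""
    return doc_count * avg_doc_bytes / float(seconds) / (1024 * 1024)


# Invented example: 5000 docs of 2 KB each, indexed in one second.
rate = throughput_mb_per_sec(doc_count=5000, avg_doc_bytes=2048, seconds=1)
print(round(rate, 2))
```

Reporting the rate this way makes runs with different document sizes comparable, which a plain TPS figure does not.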

Re: Problem when indexing data

2014-06-10 Thread joergpra...@gmail.com
Just install Cygwin https://www.cygwin.com/ and leave Windows crappy console behind. Jörg On Tue, Jun 10, 2014 at 9:31 PM, Aaliyah wrote: > > > I've already read a lot about installing and setting of elasticsearch. > Most of them are for non-windosw OS. However, it seems the principle is > sor

Re: Creating a browse interface from ES

2014-06-11 Thread joergpra...@gmail.com
Welcome to the show :) I also build library catalogs on Elasticsearch professionally. Some time ago I wrote a Perl Dancer starter app just to show what very basic features like a hit list and facets look like. https://github.com/jprante/Elasticsearch-Dancer-App The browsing UI you mean is a t

Re: Urgent

2014-06-11 Thread joergpra...@gmail.com
Have you tried the "schedule" setting in JDBC river plugin? https://github.com/jprante/elasticsearch-river-jdbc#time-scheduled-execution-of-jdbc-river You can also try the feeder mode of the JDBC plugin, combined with cronjob from your crontab. Best, Jörg 2014-06-11 11:27 GMT+02:00 Sekrafi Is

Re: Performance as a sql result cache

2014-06-11 Thread joergpra...@gmail.com
You should run your search query more than just once. The first time executed, ES will load the Lucene index fields, and ramp up internal resources, which adds some overhead. Subsequent queries will be faster (around 1ms on my MacBook Pro with SSD but SSD is not important, it is the filesystem cach
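
A minimal timing harness that separates the first (cold) run from the later (warm) runs. Here `run_query` is a hypothetical stand-in for whatever executes the search; it is not an Elasticsearch API.

```python
import time


def time_runs(run_query, runs=5):
    """Return (first_run_seconds, list_of_later_run_seconds)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings[0], timings[1:]


# Placeholder workload instead of a real search request.
cold, warm = time_runs(lambda: sum(range(1000)), runs=4)
print(cold, warm)
```

Comparing `cold` against the values in `warm` shows how much of the first response time is one-off loading overhead.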

Re: Slow search perfomance when using mmap versus memory.

2014-06-11 Thread joergpra...@gmail.com
Can you share your setup configuration, and an example document and a query? So it is possible to recreate your situation? Also interesting would be OS version, ES version, Java JVM version. Thanks, Jörg On Wed, Jun 11, 2014 at 6:44 PM, MikeP wrote: > Our servers have 130 GB of RAM and we ar

Re: Slow search perfomance when using mmap versus memory.

2014-06-11 Thread joergpra...@gmail.com
started). Index store "memory" is not faster. Jörg On Wed, Jun 11, 2014 at 11:09 PM, joergpra...@gmail.com < joergpra...@gmail.com> wrote: > Can you share your setup configuration, and an example document and a > query? So it is possible to recreate your situation? > >

Re: Query Result Caching in Elasticsearch similar to SOLR

2014-06-11 Thread joergpra...@gmail.com
In Elasticsearch you use filters in queries where the results are cached. More info: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-cache.html Jörg On Wed, Jun 11, 2014 at 10:00 PM, wrote: > Is there a way to mimic the Query Result Caching >

Re: Slow search perfomance when using mmap versus memory.

2014-06-12 Thread joergpra...@gmail.com
You should use a boolean query and wrap it into a constant score query. Constant score query is important, otherwise each clause will lead to score calculation which has a significant impact on the overall search response time. There is also a notable difference of performance on AWS between "memor
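
A sketch of the wrapped query as a request body, assuming the ES 1.x DSL where `constant_score` accepts a `query` clause. The field names and values are made up.

```python
import json


def constant_score_bool(must_terms):
    """Wrap a bool query of term clauses in constant_score.

    must_terms: iterable of (field, value) pairs; names are illustrative.
    """
    clauses = [{"term": {field: value}} for field, value in must_terms]
    return {"query": {"constant_score": {"query": {"bool": {"must": clauses}}}}}


body = constant_score_bool([("status", "active"), ("type", "book")])
print(json.dumps(body, indent=2))
```

Every matching document then receives the same score, so no per-clause scoring work is needed.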

Re: Elastic Search and consistency

2014-06-12 Thread joergpra...@gmail.com
I think the documentation is quite clear, but I try to explain in my own words. 1.1 Not sure what you mean "after the quorum check". Write consistency is a model where ES makes sure there are enough recipients (nodes) before writes are executed. consistency=quorum fails if you have too few nodes t
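
The quorum arithmetic can be written out as follows. This reflects my understanding of the ES 1.x behaviour (quorum = int(copies / 2) + 1, enforced only when there are more than two shard copies) and should be treated as an assumption to check against the documentation for your version.

```python
def required_active_copies(number_of_replicas):
    """Shard copies that must be active before a quorum write proceeds.

    Assumption: copies = primary + replicas, and the quorum rule only
    applies when there are more than two copies.
    """
    copies = 1 + number_of_replicas
    if copies > 2:
        return copies // 2 + 1
    return 1


for replicas in (0, 1, 2, 3):
    print(replicas, required_active_copies(replicas))
```

Under this assumption, an index with two replicas needs two of its three copies available, while an index with a single replica still accepts writes with only the primary active.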

Re: Securing Data in Elasticsearch

2014-06-12 Thread joergpra...@gmail.com
There are a lot of methods to tamper with ES files, and physically, everything is possible to modify in files as long as your operating system permits more than something like "append-only" mode for ES files (not that I know this would work) So it depends on your requirements about the security le

Re: Sorting on timestamps from multiple fields

2014-06-12 Thread joergpra...@gmail.com
If you have two (or more) date fields to sort on, look at "copy_to" mapping feature to copy them over to a third field e.g. "sort_date". So you have a single field you can happily sort on, without having to change fields in the source. Same method works for tag/category fields in different inde
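A small sketch of what such a mapping could look like, assuming the `copy_to` mapping parameter as available since Elasticsearch 1.0. The type name `event` and the source fields `created` and `updated` are hypothetical; only `sort_date` comes from the message above. The script only builds and prints the mapping body.

```python
import json


def sort_date_mapping(doc_type, date_fields, target="sort_date"):
    """Build a mapping in which every date field is copied to one target field."""
    # Each source field keeps its own mapping and additionally copies
    # its value into the shared target field.
    props = {f: {"type": "date", "copy_to": target} for f in date_fields}
    # The target field itself is a plain date field used for sorting.
    props[target] = {"type": "date"}
    return {doc_type: {"properties": props}}


# Hypothetical type and field names, for illustration only.
mapping = sort_date_mapping("event", ["created", "updated"])
print(json.dumps(mapping, indent=2))

# A search would then sort on the single combined field.
search_body = {"query": {"match_all": {}}, "sort": [{"sort_date": {"order": "desc"}}]}
```

If one document carries several of the source fields, `sort_date` becomes multi-valued, so it is probably worth deciding explicitly which value the sort should use.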

Re: ES 1.2.1 sort by _timestamp

2014-06-12 Thread joergpra...@gmail.com
Do you set the timestamp value from your client or do you let ES fill it in for you? Do you run more than one node? Are the clocks on your nodes running synchronously? Jörg On Thu, Jun 12, 2014 at 2:13 PM, Stefan Eberl wrote: > Hey all, > > I have a question regarding sorting by _timestamp. > > The

Re: Securing Data in Elasticsearch

2014-06-12 Thread joergpra...@gmail.com
If you want ES-level security, you should first reduce attack vectors, by closing down all the open ports and resources that are not necessary. One step would be to disable HTTP REST API completely (port 9200) and run Logstash Elasticsearch output only http://logstash.net/docs/1.4.1/outputs/elasti

Re: implementing a plugin to process the whole input document

2014-06-12 Thread joergpra...@gmail.com
Short answer: modifying the source after having executed a standard index or bulk action is not possible. Long answer: it depends, if you look at https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/index/TransportIndexAction.java#L188 you can see how

Re: Cassandra with JDBC river plugin

2014-06-13 Thread joergpra...@gmail.com
The Cassandra Java Driver is not a JDBC driver. Jörg On Fri, Jun 13, 2014 at 11:11 AM, Abhishek Mukherjee <4271...@gmail.com> wrote: > Checking the Elasticsearch log files I found this. > > No suitable driver found for jdbc:cassandra:// > 192.168.1.103:9160/transactionlogdb > at java.sql.Driver

Re: Runtime JRE?

2014-06-13 Thread joergpra...@gmail.com
Yes, you can use Java Server JRE. It is a build without Java desktop graphics library (aka headless JVM). Jörg On Fri, Jun 13, 2014 at 1:53 PM, wrote: > I know the guide says the following: > > While a JRE can be used for the Elasticsearch service, due to its use of a > client VM (as oppose to

Re: Securing Data in Elasticsearch

2014-06-13 Thread joergpra...@gmail.com
You should start HTTP only on localhost then and run Kibana on a selected number of nodes only. There are some authentication solutions for Kibana. I am not able to find security features like audit trails or preventing writes in Kibana/ES so you have to take care. Assessing Kibana for attacks ov

[ANN] Elasticsearch syslog plugin

2014-06-14 Thread joergpra...@gmail.com
Hi, here is a small plugin for Elasticsearch for receiving syslog messages via UDP or TCP. It is very similar to the bulk UDP module, but can parse syslog RFC messages. https://github.com/jprante/elasticsearch-syslog As always, feedback is most welcome. Best, Jörg -- You received this messag

Re: Elastic Search and consistency

2014-06-15 Thread joergpra...@gmail.com
index.gateway.local.sync: 0 is related to durability, it means, the underlying data is really going to disk by using the guarantee of FileChannel.force(false). This destroys performance compared to the default value of ES, because there are a lot more I/O operations on OS layer when fsync() is perf

Re: Securing Data in Elasticsearch

2014-06-15 Thread joergpra...@gmail.com
From what I know about Kibana, it just uses the HTTP API _search endpoint, but I have not examined it more thoroughly. It is quite simple to set up an nginx/apache reverse proxy to filter requests. You should add http: host: 127.0.0.1 to your config/elasticsearch.yml to ensure that HTTP RES

Re: Securing Data in Elasticsearch

2014-06-15 Thread joergpra...@gmail.com
No, with the setting, you can run Logstash and Kibana on different hosts. Only on ES node side, you start an additional nginx/apache, to wrap the HTTP 9200 port service with a HTTP port 80 reverse proxy service. On Kibana, you change all port 9200 configs to port 80 configs (also the remote host

Re: Creating a browse interface from ES

2014-06-16 Thread joergpra...@gmail.com
What about this: - build author name index - page size is static (e.g. 20) - absolute position: you must index each author name with absolute position info (sort author names before indexing, use a counter and increment it while indexing) - sort asc/desc works on author's name keyword analyzed
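The first steps of the list above (sort the names, attach a counter as absolute position, use a static page size) can be sketched in plain Python. This is only an illustration under assumptions: the function names are hypothetical, the case-insensitive sort order is a choice made here, and the page lookup is done in memory, whereas in Elasticsearch it would presumably be a range query on the indexed position field.

```python
import bisect


def build_author_index(names):
    """Sort author names and attach an absolute position to each one."""
    # Sort before indexing; ties under casefold are broken by the raw
    # name so the order is deterministic.
    ordered = sorted(set(names), key=lambda n: (n.casefold(), n))
    # The counter is simply the rank in the sorted list.
    return [{"author": n, "position": i} for i, n in enumerate(ordered)]


def page_for(entries, query, page_size=20):
    """Return the fixed-size page that contains the first name >= query."""
    if not entries:
        return []
    keys = [e["author"].casefold() for e in entries]
    # Absolute position of the first name that sorts at or after the query,
    # clamped to the last entry when the query sorts after everything.
    pos = min(bisect.bisect_left(keys, query.casefold()), len(entries) - 1)
    # Pages are aligned to multiples of the static page size.
    start = (pos // page_size) * page_size
    return entries[start:start + page_size]


entries = build_author_index(["Zola", "adams", "Brown", "Eco"])
for entry in page_for(entries, "b", page_size=2):
    print(entry["position"], entry["author"])
```

Because every name carries its absolute position, jumping to the previous or next page is just a matter of subtracting or adding the page size.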

Re: IllegalArgumentException[No type mapped for [43]], version 1.2.1

2014-06-16 Thread joergpra...@gmail.com
I guess you hit the following condition: - you insert data with bulk indexing - your index has dynamic mapping and already has huge field mappings - bulk requests span over many nodes / shards / replicas and introduce tons of new fields into the dynamic mapping - you do not wait for bulk respon

Re: Creating a browse interface from ES

2014-06-17 Thread joergpra...@gmail.com
hey do not return exact counts, only an estimated count. For "register search" you need absolutely exact counts. Jörg On Tue, Jun 17, 2014 at 7:28 AM, Robin Sheat wrote: > joergpra...@gmail.com schreef op ma 16-06-2014 om 13:12 [+0200]: > > > > This is how I implement "

Re: Elasticsearch support for Java 1.8?

2014-06-17 Thread joergpra...@gmail.com
Scripting issues were due to MVEL, but with MVEL 2.2.0.Final, this has been fixed in ES. So yes, you can run ES on Java 8 JVM. Jörg On Tue, Jun 17, 2014 at 3:58 PM, Georgi Ivanov wrote: > As far as I know , ES will work just fine with java 1.8, > except script support. > > I read some article

Re: Scroll Questions

2014-06-17 Thread joergpra...@gmail.com
1. yes 2. facet/aggregations are not very useful while scrolling (I doubt they even work at all) because scrolling works on shard level and aggregations work on indices level 3. a scroll request takes resources. The purpose of ClearScrollRequest is to release those resources explicitly. This is i
