Index fields configuration - suggestions

2018-03-05 Thread Kumaran Ramasubramanian
Hi all, Regarding the configuration of each field (stored? analyzed? sortable? numeric?): Elasticsearch uses the cluster state to hold these settings per index, while Solr keeps them in XML format. If we have data centers in multiple locations, is there any better way of

Is docvalue sorted by value?

2018-03-05 Thread Tony Ma
As I understand it, doc values (binary doc values / numeric doc values) are stored in document-id order. Sorted numeric doc values just means that if a document has multiple values, the values are sorted within that document; across different documents, the values are still ordered by
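The layout described in this message can be pictured with a small toy model. This is only a sketch under stated assumptions: plain Python lists stand in for a doc-values column, the function name and sample data are hypothetical, and nothing here reflects Lucene's actual API or on-disk format.

```python
# Toy model only: the list index plays the role of the document id.
# This is NOT Lucene's storage format, just an illustration of the ordering.
def build_sorted_numeric_column(docs):
    """Return one sorted list of values per document, kept in doc-id order."""
    return [sorted(values) for values in docs]

# Hypothetical input: three documents (ids 0, 1, 2) with multi-valued fields.
raw = [[30, 10], [5], [20, 7, 9]]
column = build_sorted_numeric_column(raw)

# Values are sorted inside each document, but the column as a whole is
# still addressed by document id rather than ordered by value.
for doc_id, values in enumerate(column):
    print(doc_id, values)
```

In this model, document 1 holds the smallest value overall yet still sits in the middle of the column, which is the distinction the message draws.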

Re: Is docvalue sorted by value?

2018-03-05 Thread Dominik Safaric
> So, can doc values be persisted with order of values, not document id? This
> should be fast in sort scenario that the values are pre-ordered instead of
> scan/sort at runtime.
No, unfortunately doc values cannot be persisted in order. Lucene stores these values internally as a

Re: Is docvalue sorted by value?

2018-03-05 Thread Erick Erickson
I think there are two issues here that are being conflated. 1> _within_ a document, i.e. for a multi-valued field, the values are stored, as Dominik says, as a SORTED_SET. Not only will they be returned (if you return from docValues rather than stored) in lexical order, but identical values will be

Re: Recommendation for doing a search plus collecting extra information?

2018-03-05 Thread Trejkaz
I did some experiments. As it turns out, changing SortedNumericSortField to SortField had no effect on the timings at all. However, changing the SortField.Type from LONG to INT makes queries come back 3 times faster. (20ms vs. 6.5ms comparing the fastest runs for each.) Why would using int be 3

Re: [EXTERNAL] - Re: Is docvalue sorted by value?

2018-03-05 Thread Tony Ma
Hi Erick, I raised this question about the sorting scenario you mentioned in #2. Suppose there are about 100 hit docs and my query just wants the top 2. If the values are not sorted, it has to iterate over all 100 docs and find the top 2 with a priority queue. If the values are already sorted, it just needs to
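The "iterate all hits and keep the top 2 in a priority queue" approach mentioned here can be sketched as follows. This is a minimal illustration using Python's standard `heapq` module, not Lucene's collector code; the function name `top_k_by_value` and the sample values are assumptions made up for the example.

```python
import heapq

def top_k_by_value(hit_doc_ids, value_of, k):
    """Scan every hit once, keeping only the k largest values in a min-heap."""
    heap = []  # holds (value, doc_id); the smallest kept entry sits at heap[0]
    for doc_id in hit_doc_ids:
        entry = (value_of[doc_id], doc_id)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # The new entry beats the weakest kept entry, so swap it in.
            heapq.heapreplace(heap, entry)
    # Report doc ids from the largest value to the smallest.
    return [doc_id for _, doc_id in sorted(heap, reverse=True)]

# Hypothetical data: 100 hits, each with a distinct value looked up by doc id.
values = {doc_id: (doc_id * 37) % 101 for doc_id in range(100)}
top2 = top_k_by_value(range(100), values, k=2)
print(top2)
```

Every hit is still visited once, and the heap never grows beyond k entries. That per-hit scan is the runtime cost the question hopes pre-sorted values could avoid.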