RE: Stored fields and OS file caching

2014-04-05 Thread Toke Eskildsen
Vitaly Funstein [vfunst...@gmail.com] wrote:
> It's a bit of a guess on my part, but I did get better write and search
> performance with size <= 2K, as opposed to the default 16K.
For search that sounds plausible as that is very random access heavy and the disk cache will contain a larger amount

Re: Stored fields and OS file caching

2014-04-05 Thread Jack Krupansky
. -- Jack Krupansky
-----Original Message-----
From: Adrien Grand
Sent: Friday, April 4, 2014 4:50 PM
To: java-user@lucene.apache.org
Subject: Re: Stored fields and OS file caching
Hi Vitaly, Doc values are indeed well-suited for grouping and sorting. However stored fields remain better at returning

Re: Stored fields and OS file caching

2014-04-04 Thread Vitaly Funstein
Thanks for the explanation, Adrien. I do have a couple of follow-up questions. Isn't this block size used for file caching OS-dependent? And if 4K happens to be the most commonly used size, wouldn't it make more sense for the default stored fields format to have a chunk size equal to or smaller tha

Re: Stored fields and OS file caching

2014-04-04 Thread Adrien Grand
Hi Vitaly, Doc values are indeed well-suited for grouping and sorting. However stored fields remain better at returning field values to users since they guarantee a worst-case of one disk seek per document. The filesystem cache typically caches data by blocks of 4KB. This plays more nicely with d
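
To make the block-size point above concrete, here is a rough sketch (not taken from the thread) that counts how many 4KB cache blocks a single read overlaps. The function name `pages_touched`, the sample offsets, and the idea that a chunk is read as one contiguous byte range on disk are all assumptions for illustration; the 4KB block size and the 2K/16K chunk sizes are the figures mentioned in this thread.

```python
PAGE_SIZE = 4096  # assumed cache block size (the 4KB figure from the message above)


def pages_touched(offset: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Count the cache blocks overlapped by the byte range [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return last - first + 1


if __name__ == "__main__":
    # Hypothetical on-disk chunk sizes: 2K, 4K and the 16K default discussed here.
    for chunk_size in (2 * 1024, 4 * 1024, 16 * 1024):
        aligned = pages_touched(0, chunk_size)       # chunk starts on a block boundary
        unaligned = pages_touched(1000, chunk_size)  # chunk starts mid-block
        print(f"{chunk_size:>6} bytes: {aligned} block(s) aligned, {unaligned} unaligned")
```

This only models how a byte range maps onto fixed-size blocks. It says nothing about compression, read-ahead, or how Lucene actually lays out chunks, so treat the counts as an upper-level intuition rather than a measurement.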

Re: Stored fields and OS file caching

2014-04-04 Thread Vitaly Funstein
I use stored fields to load values for the following use cases:
- to return per-document values as is, requested by the user - similar to listing DB columns you are interested in, in a "select ..." clause.
- to perform aggregate function calculations while forming the result set (if requested).
- f

RE: Stored fields and OS file caching

2014-04-04 Thread Uwe Schindler
Hi, What are you doing with the stored fields? They are not deprecated and also not really slow, unless you scan over millions of documents in random access order. To display search results, DocValues are of no use. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetap