Hi, Mikhail Your words is very encouraging. I was thinking i might need to do another Lucene custom query to apply my business-specific "index-only sort" and "early-termination"... SortField API says can use any numeric/String field, which is very perfect. (In this way, Lucene should be able to a high-perf Top-N query if SortField can support dynamically-generated ranking scores... not only the native indexed numeric/String fields)
Mikhail Khludnev <m...@apache.org> 于2019年12月31日周二 下午4:41写道: > Hello, 小鱼儿. > > On Tue, Dec 31, 2019 at 6:32 AM 小鱼儿 <ctengc...@gmail.com> wrote: > > > Assume i first use keyword search to get a DocIDSet from inverted index, > > then i want to sort these docIds by some numeric field, like a > > `updateTime`, does Lucene do this without need of loading the Document > > objects but only with an sorted index on `updateTime`? > > 1. Lucene doesn't load Document objects from stored fields files while > sorting for sure. > 2. Lucene uses dedicated columnar data structure (DocVaues made index time, > or in the worst case lazily loaded FieldCache) to obtain field values while > collecting search results from inverted index. > 3. One deviation from this generic algorithm is sorted index and early > termination, that's probably what you meant in "Index-Only Search > Optimization". > > > > Which i call it > > "Index-Only Sort Optimization" (MUST be some equal concepts in RDBMS?) > > > > And since Lucene has a `SortField` API, what does it do the sort? I > thought > > > It brings up TopFieldCollector instead of the default TopScoreDocCollector. > > > SortField is just a post-processing... > > > Not really. Scoring/sorting should be done along side with searching to > reduce memory footprint by storing only top candidate results in a binary > heap. > IIRC it's described in this classic paper > http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf > > > -- > Sincerely yours > Mikhail Khludnev >