Hi, I can confirm the below mentioned problems trying to migrate to 2.9.
Our Lucene-based (2.4) app uses custom multi-level sorting on a lot of different fields and pretty large indexes (> 100m docs). Most of the fields that we sort on are strings, some with up to 400 characters in length. A lot of the strings share a common prefix, so comparing these character-by-character is costly. As the comparison proved to be a hotspot (and ordinary string field caches a memory-hog) we've been using a similar approach like the one mentioned by John Wang for years now, starting with Lucene 1.x: Using a map from doc id to ordinals (int->int) where the ordinal represents the order imposed by the (costly) comparison. This has been rather simple to implement and gives our app sizeable performance gains. Furthermore, it saves a lot of memory compared to standard field caches. With the new API, though, the simple mapping from doc id to pre-generated sort ordinals won't work anymore. This is the only reason that has prevented us from upgrading to 2.9. Currently it seems to us that adapting to the new sorting API would be either much more complicated to implement, or - when reverting to standard field cache-based sorting - slower and use far more memory. On the other hand, we might not have a full understanding of 2.9's inner workings yet, being used to (hopefully not stuck with) our a.m. custom solution. Other than that, 2.9 seems to be a very nice release. Thanks for the great work! -Tom On Tue, Oct 20, 2009 at 8:55 AM, John Wang <john.w...@gmail.com> wrote: [...] Let me provide an example: We have a multi valued field on integers, we define a sort on this set of strings by defining a comparator on each value to be similar to a lex order, instead of compare on characters, we do on strings, we also want to keep the multi value representation as we do filtering and facet counting on it. The in memory representation is similar to the UnInvertedField in Solr. Implementing a sort with the old API was rather simple, as we only needed mapping from a docid to a set of ordinals. With the new api, we needed to do a "conversion", which would mean mapping a set of String/ordinals back to a doc. Which is to me, is not trivial, let alone performance implications. That actually gave us to motivation to see if the old api can handle the segment level changes that was made in 2.9 (which in my opinion is the best thing in lucene since payloads :) ) So after some investigation, with code and big O analysis, and discussions with Mike and Yonik, on our end, we feel given the performance numbers, it is unnecessary to go with the more complicated API. Thanks -John