: with (in my overridden process() method):
:
: String[] selectFields = {"id", "fileName"}; // the subset of fields I am interested in
: TopDocs results = searcher.search(cmd.getQuery(), 100000); // custom spanquery, and many/all hits
: /* save hit info (doc & score) */
: /* maybe process SpanQuery.getSpans() here, but perhaps try "doc
:    oriented results" processing approach(?) for tokenization
:    caching/optimization? */

For an approach like this (where you get the top N matches, then process those N to get the spans) you can actually use the existing QueryComponent as is, and just add your own SearchComponent that runs after it and inspects the DocList in the QueryResult to get the Spans and record whatever data you want. Doing that would have the added benefit of leveraging the existing filter/query caches when doing the main search (you would still need to use the caching APIs if you wanted to cache your post-processing work).

The alternate approach using a HitCollector (or the code you've got now asking for TopDocs) bypasses all of Solr's caching. It will work fine; it's just a question of what you want.

-Hoss
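A minimal sketch of the "search first, then post-process the top N" pattern described above, written in plain Java rather than against Solr's API, because the relevant Solr/Lucene signatures differ between versions. Every name here (`SpanPostProcessor`, `SpanSource`, `Span`) is hypothetical. In a real SearchComponent the doc ids would come from the DocList that QueryComponent leaves in the response, `spansFor` would wrap whatever span-reading call your Lucene version provides, and the small LRU map stands in for one of Solr's own configurable caches. It assumes one processor instance per query and searcher, since internal doc ids are only stable within a single searcher.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these are NOT Solr or Lucene classes.
class SpanPostProcessor {

    /** Hypothetical span record: [start, end) token positions of one match. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Hypothetical hook that extracts the spans for one doc (where SpanQuery spans would be read). */
    interface SpanSource {
        List<Span> spansFor(int docId);
    }

    private final SpanSource source;
    // Access-ordered LinkedHashMap acting as a small LRU cache for post-processing results.
    // Keyed by doc id only, so it assumes one instance per query/searcher.
    private final Map<Integer, List<Span>> cache;
    private int computeCount = 0;

    SpanPostProcessor(SpanSource source, int maxCacheEntries) {
        this.source = source;
        this.cache = new LinkedHashMap<Integer, List<Span>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, List<Span>> eldest) {
                // Evict the least recently used entry once the cache is over capacity.
                return size() > maxCacheEntries;
            }
        };
    }

    /** Runs after the main search: walks the top-N doc ids and records span data for each. */
    Map<Integer, List<Span>> process(int[] topDocIds) {
        Map<Integer, List<Span>> out = new LinkedHashMap<>();
        for (int docId : topDocIds) {
            List<Span> spans = cache.get(docId);
            if (spans == null) {
                // Cache miss: do the (expensive) span extraction once, then remember it.
                spans = source.spansFor(docId);
                computeCount++;
                cache.put(docId, spans);
            }
            out.put(docId, spans);
        }
        return out;
    }

    /** Number of times span extraction actually ran (i.e. cache misses). */
    int computeCount() {
        return computeCount;
    }
}
```

Keeping the span work in a separate post-processing step means the main search still goes through the normal query and filter caches; only the extra per-document work needs its own caching.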