Hi,

The purpose of the project is true real-time (RT) search, not NRT, with one specific condition: when an updated document meets a fixed criterion, it must be filtered out of all subsequent results (the document must not be reused). This criterion is present in the search query, but of course it has no effect on uncommitted documents.

What I wrote is a combination of the following:

- an UpdateRequestProcessor in the update chain that stores the document's unique key in a local cache when the condition is met
- a postCommit listener that clears the cache
- a PostFilter that collects only the documents not found in the cache, activated in the search query as an fq parameter

Functionally it does the job; however, on large indexes the filter takes a performance hit. The problematic index has 18 million documents in 13 GB, and queries return 25,000 documents on average. The VM has 8 cores and 20 GB of RAM, and uses Nimble storage (a combination of SSD and HDD). Without my code, Solr works like a charm.

My guess so far is that the filter has to fetch the unique key for every document in the results, which consumes a lot of resources.

What would be your advice?

- Could I use the internal document id instead of a field value? This id would have to be available in both the UpdateRequestProcessor and the PostFilter: is that the case, and how can I access it? I suppose the SolrInputDocument in the request processor doesn't have it yet anyway.
- If I reduce the autoSoftCommit maxDocs value (how far?), would it be wise (and feasible) to convert the PostFilter into a plain filter query such as "*:* NOT (id:1 OR id:2)" or something similar? How could I implement this, and how do I estimate the filter cost so that Solr executes it at the right position?
- Maybe I took the wrong path altogether?

Thanks in advance,
John
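
P.S. To make the second question more concrete, here is a minimal sketch of the cache lifecycle and of how such a plain filter query string could be built from the cached keys. It is a simplified illustration using only the JDK, not the actual plugin code: the class name PendingExclusions, its method names, and the choice to quote and escape each key are all assumptions made for the example, and the field name "id" is simply the one from the query above.

```java
import java.util.TreeSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical holder for the unique keys excluded since the last commit. */
final class PendingExclusions {
    // Thread-safe set: updates and searches run concurrently.
    private final Set<String> keys = ConcurrentHashMap.newKeySet();

    /** Would be called from the update processor when the condition is met. */
    void add(String uniqueKey) {
        keys.add(uniqueKey);
    }

    /** Would be called from the postCommit listener. */
    void clear() {
        keys.clear();
    }

    /** Membership test, as the PostFilter would do per collected document. */
    boolean contains(String uniqueKey) {
        return keys.contains(uniqueKey);
    }

    /**
     * Builds a filter query string of the form
     * *:* NOT field:("k1" OR "k2"), or returns null when nothing is pending
     * (in which case no fq needs to be added at all).
     */
    String toFilterQuery(String field) {
        // Snapshot and sort so the output is stable for a given set of keys.
        TreeSet<String> snapshot = new TreeSet<>(keys);
        if (snapshot.isEmpty()) {
            return null;
        }
        StringBuilder sb = new StringBuilder("*:* NOT ").append(field).append(":(");
        boolean first = true;
        for (String key : snapshot) {
            if (!first) {
                sb.append(" OR ");
            }
            // Quote each key; inside quotes only backslash and double quote need escaping.
            sb.append('"')
              .append(key.replace("\\", "\\\\").replace("\"", "\\\""))
              .append('"');
            first = false;
        }
        return sb.append(')').toString();
    }
}
```

With keys 1 and 2 in the cache, toFilterQuery("id") would produce *:* NOT id:("1" OR "2"). The string changes every time a key is added, so I assume each distinct value would be treated as a new filter, which is part of why I am asking how far the soft commit interval would have to be reduced.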