Hello all,
Here is the situation I am facing.

I am migrating from SOLR 4 to SOLR 7. SOLR 4 runs on Tomcat 8, while SOLR 7 
runs with the built-in Jetty 9.
The largest core contains about 1,800,000 documents (about 3 GB).

The migration went through smoothly. But something's bothering me.

I have a PostFilter to collect only some documents according to a pre-selected 
list.

Here is the code of my org.apache.solr.search.DelegatingCollector subclass:

        @Override
        protected void doSetNextReader(LeafReaderContext context) throws IOException {
                this.reader = context.reader();
                super.doSetNextReader(context);
        }

        @Override
        public void collect(int docNumber) throws IOException {
                if (null != this.reader
                                && isValid(this.reader.document(docNumber).get("customid")))
                {
                        super.collect(docNumber);
                }
        }

        private boolean isValid(String customId) {
                boolean valid = false;
                // customMap is a HashMap<String, String> containing the custom IDs
                // to keep. It contains an average of 2k items.
                if (null != customMap)
                {
                        valid = customMap.get(customId) != null;
                }

                return valid;
        }
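
For reference, the lookup in isValid amounts to a set-membership test. Below is a minimal standalone sketch of that check, assuming every key in customMap maps to a non-null value. The class name CustomIdLookup and the sample IDs are hypothetical and not part of the actual plugin. I do not expect this part to explain the QTime difference; it only shows what the check does.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical standalone version of the isValid check (not the real plugin class).
final class CustomIdLookup {
    // IDs to keep; stands in for the keys of customMap (about 2k entries on average).
    private final Set<String> allowedIds;

    CustomIdLookup(Set<String> allowedIds) {
        this.allowedIds = allowedIds;
    }

    // Returns true only when the ID is present in the pre-selected list.
    boolean isValid(String customId) {
        return allowedIds != null && customId != null && allowedIds.contains(customId);
    }

    public static void main(String[] args) {
        Set<String> ids = new HashSet<>(Arrays.asList("100034_001", "100035_002"));
        CustomIdLookup lookup = new CustomIdLookup(ids);
        System.out.println(lookup.isValid("100034_001"));
        System.out.println(lookup.isValid("999999_999"));
    }
}
```

The hash lookup itself is constant time on average, so its cost per collected document should be small compared to loading the document.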

And here is an example of a query sent to SOLR:

        
/select?fq=%7B!MyPostFilter%20sessionid%3DWST0DEV-QS-5BEEB1CC28B45580F92CCCEA32727083&q=system%20upgrade

So, the problem is:
        - It runs pretty fast on SOLR 4, with an average QTime of 30.
        - But on SOLR 7 it is awfully slow, with an average QTime around 25000!

And I am wondering what could be causing such poor performance...

With a very simplified (or should I say transparent) collect function (see 
below), there is no degradation. I ran this test only to exclude the 
server/platform from the equation.

        @Override
        public void collect(int docNumber) throws IOException {
                super.collect(docNumber);
        }

My guess is that since LUCENE 7, there have been drastic changes in the way the 
API accesses documents, but I am not sure I have understood everything.
I got this from the following post: 
https://stackoverflow.com/questions/48474506/how-to-get-docvalue-by-document-id-in-lucene-7

I suppose this has something to do with the issues I am facing.
But I have no idea how to upgrade/change my PostFilter and/or 
DelegatingCollector to get back to good performance.

If any LUCENE/SOLR experts could provide some hints or leads, it would be very 
much appreciated.
Thanks in advance.


PS:
In the core schema:

        <field name="customid" type="string" indexed="true" stored="true" required="true" multiValued="false" />

This field is string-type as it can be something like "100034_001".

In the solrconfig.xml:

        <queryParser name="MyPostFilter" class="solrpostfilter.MyQueryPaser"/>

I can share the full schema and solrconfig.xml files if needed but so far, 
there is no other particular configuration in there.