I work with Aditya, so this picks up where Aditya left off.

Here are some observations from running a query on a particular unique id.
The document corresponding to this unique id is fairly large: if we ran the
query without an fl list, the total response would be around 6MB. However,
we use an fl list to retrieve only a subset of the document. A script run
from a different box uses curl to call the server for the same unique ids,
but with different fl lists. After the first few runs of the search
(something like q=id:foo), we change the fl list to return other fields,
possibly a larger set than the first query returned for the same id.
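A rough Python equivalent of that curl script, for anyone trying to reproduce
this (it is not the script we actually ran). It assumes a Solr core at a
hypothetical URL (`http://localhost:8983/solr/collection1/select`) and
hypothetical field names; adjust both for your setup. It requests the same id
several times with one fl list, then switches to the next, timing each full
response.

```python
import time
import urllib.parse
import urllib.request

# Hypothetical Solr endpoint: host, port and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def build_query_url(doc_id, fl_fields, base=SOLR_SELECT):
    """Build a select URL that fetches one document by unique id with an fl list."""
    params = {
        "q": f"id:{doc_id}",       # same shape as q=id:foo in the post
        "fl": ",".join(fl_fields),  # comma-separated field list
    }
    return base + "?" + urllib.parse.urlencode(params)


def timed_fetch(url, timeout=300):
    """Fetch the URL and return (elapsed seconds, response size in bytes)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read()  # read the whole body, like curl does
    return time.perf_counter() - start, len(body)


def run_sequence(doc_id, fl_lists, repeats=3):
    """Query the same id `repeats` times per fl list, switching lists in order."""
    for fl in fl_lists:
        url = build_query_url(doc_id, fl)
        for _ in range(repeats):
            elapsed, size = timed_fetch(url)
            print(f"fl={','.join(fl)} size={size} bytes time={elapsed:.2f}s")
```

Against a live server, something like
`run_sequence("foo", [["id", "title"], ["id", "title", "body"]])` (hypothetical
id and fields) prints one timing line per request, so a jump on the first
request after the fl list changes stands out.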

1. The curl client blocks when the fl list changes. VisualVM shows 50% CPU
utilization.
2. This spin continues until the result is returned to the curl client.
3. We see the same thing from a browser. This reproduces the problem and
shows that the spin occurs after the server has finished searching for the
document (the Solr log contains an entry with the QTime for this query) and
is now trying to return the response. The browser waits until all the data
is received and only then renders the page. So what is taking the server so
long to respond to the client?
4. The VisualVM sampler shows getFields() at the top of the list. Since it
stays at the top, I believe the server may be spinning there.
5. Restart the server running Solr.
6. Run the same query from the browser first; it returns in a couple of
seconds.
7. Run the same curl script; it also sails through the query, with the
server responding almost immediately.
8. Monitoring the sampler this time, you _don't_ see the CPU spinning on
getFields().
9. Change the firstSearcher definitions in solrconfig.xml to add the
uniqueId in the q parameter, and restart.
10. This time the curl script runs well.
11. If the server is restarted again and we run the curl script with the
blocking (spinning) query first, the script sails through again.

From these observations alone, it seems that the Solr 4.1 code takes a
wrong turn somewhere for large responses when it sees the same query again
with a different fl list. If the spinning query is pre-cached (via the
firstSearcher change in solrconfig.xml, via the browser, or by running it
ahead of other queries for the same id), it works fine after the first run.
However, running it after the same search with a different fl list triggers
the blocking. This did not happen with Solr 3.5, so it looks like a
regression. The above is repeatable for us.

Question: Why is this happening on Solr 4.1? For now, the workaround may be
to pre-cache the queries for large documents in solrconfig.xml.
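If there are many such documents, the warming entries can be generated rather
than written by hand. A minimal sketch, assuming the usual
`solr.QuerySenderListener` layout from the example solrconfig.xml (where these
listeners sit inside the `<query>` section). The ids and fl lists are
hypothetical, and including the fl in the warming query is an assumption on
our part; step 9 above only added the uniqueId to q.

```python
import xml.etree.ElementTree as ET


def warming_listener(queries, event="firstSearcher"):
    """Build a QuerySenderListener element that runs each (q, fl) pair on warm-up.

    `queries` is a list of (q, fl) tuples; pass fl=None to warm with q only.
    """
    listener = ET.Element(
        "listener", {"event": event, "class": "solr.QuerySenderListener"}
    )
    arr = ET.SubElement(listener, "arr", {"name": "queries"})
    for q, fl in queries:
        lst = ET.SubElement(arr, "lst")  # one <lst> per warming query
        ET.SubElement(lst, "str", {"name": "q"}).text = q
        if fl:  # optional field list (assumption, see lead-in)
            ET.SubElement(lst, "str", {"name": "fl"}).text = fl
    return listener


if __name__ == "__main__":
    # Hypothetical unique id and field list, for illustration only.
    elem = warming_listener([("id:foo", "id,title,body")])
    print(ET.tostring(elem, encoding="unicode"))
```

The printed fragment can be pasted into solrconfig.xml in place of the
existing firstSearcher listener.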

We would appreciate hearing from others who have run into this issue, to
help validate what we are seeing. Thanks.

Best regards,
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr3-5-Vs-Solr4-1-Help-please-tp4043543p4044742.html
Sent from the Solr - User mailing list archive at Nabble.com.
