I work with Aditya, so this picks up where Aditya left off.
Here are some observations from running a query against one particular unique id. The document for this uniqueId is fairly large: if we ran the query without an fl list, the total response would be in the neighborhood of 6MB. However, we use an fl list to fetch only a subset of the document. A script run from a different box uses curl to call the server for the same uniqueId, but with different fl lists. After the first few runs of the search (something like q=id:foo), we change the fl list to return a different set of fields, perhaps larger than the first, for the same id.

1. The curl client blocks when the fl list changes. VisualVM shows 50% CPU utilization.
2. This spin continues until the result is returned to the curl client.
3. We see the same thing from a browser, which reproduces the problem and helps identify that the spin occurs after the server has finished searching for the document (we see an entry in the solr log file containing the QTime for this query) and is now trying to return it. The browser waits until all the data is received and only then renders the page. So what is taking so long for the server to respond to the client?
4. Monitoring the sampler in VisualVM, getFields() is at the top of the list, so I believe it may be spinning there.
5. Restart the server running SOLR.
6. Run the same query from the browser; it returns in a couple of seconds.
7. Run the same curl script; it sails through the query as well, with the server responding almost immediately.
8. Monitoring the sampler this time around, you _don't_ see the CPU spinning on getFields().
9. I change the firstSearcher definition in solrconfig.xml to include the uniqueId in the q parameter, and restart.
10. This time the curl script runs well.
11. If the server is restarted again and we run the curl script with the blocking (spinning) query right at the top, the script sails through again.

Just from these observations, it seems that SOLR 4.1 takes a wrong turn somewhere for large responses when it comes across the same query again with a different fl list. If the spinning query is pre-cached (via the firstSearcher change in solrconfig.xml, via the browser, or by running it ahead of other queries for the same id), it works fine after the first run. However, running it after the same search with a different fl list does have an effect. This did not happen with SOLR 3.5 and seems like a regression. The above is repeatable for us.

Question: Why is this happening on SOLR 4.1? The workaround for now may be to pre-cache the queries for large documents in solrconfig.xml. We would appreciate hearing from others facing this issue, which would also validate what we see.

Thanks.

Best regards,
-- Sandeep

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr3-5-Vs-Solr4-1-Help-please-tp4043543p4044742.html
Sent from the Solr - User mailing list archive at Nabble.com.
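
P.S. In case it helps anyone try to reproduce this, the curl sequence described above can be sketched roughly as follows. This is only an approximation of our script, not the script itself: the host, the core path (collection1), the id (foo) and the field names (title, body) are placeholders I am assuming for illustration, so substitute your own. It issues the same q=id:... query twice with different fl lists and prints the total time the client waited for each.

```shell
#!/bin/sh
# Sketch only: SOLR_URL, DOC_ID and the field names below are assumed
# placeholders, not values from our actual setup.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/collection1/select}"
DOC_ID="${DOC_ID:-foo}"

# Build the select URL for one id ($1) and one fl list ($2).
build_url() {
  printf '%s?q=id:%s&fl=%s\n' "$SOLR_URL" "$1" "$2"
}

# Send one request, discard the body, and print the total time
# the client waited for the full response.
time_query() {
  url=$(build_url "$1" "$2")
  curl -s -o /dev/null -w "fl=$2 total=%{time_total}s\n" "$url"
}

# Nothing is sent unless RUN=1 is set in the environment.
if [ "${RUN:-0}" = "1" ]; then
  time_query "$DOC_ID" "id,title"        # first fl list
  time_query "$DOC_ID" "id,title"        # repeat with the same fl list
  time_query "$DOC_ID" "id,title,body"   # changed (larger) fl list
fi
```

Run it as RUN=1 sh repro.sh (repro.sh being whatever you name the file) from a box other than the server. Comparing the total time printed here with the QTime in the solr log for the same request should show whether the delay is in the search itself or in writing the response.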