I'm having an issue where Solr becomes unresponsive for unknown reasons.
Client requests time out for minutes at a time (sometimes only some
requests time out while others work fine). The logs don't reveal any
clues, other than a big gap.
Example:
INFO - 2015-07-17 14:39:57.195;
org.apache.solr.update.processor.LogUpdateProcessor; [collection1]
webapp=/solr path=/update params={omitHeader=false&wt=json}
{add=[16ce1c27-558c-4307-bdb6-e88ea93c9b2a
(1506981128084389888)],commit=} 0 32
INFO - 2015-07-17 14:40:02.716; org.apache.solr.core.SolrCore;
[collection1] webapp=/solr path=/select
params={omitHeader=true&sort=objectid+asc&fl=objectid&start=0&q=*:*&wt=json&fq=grades:("39")&fq=keywords:("1193"+OR+"1198"+OR+"11532206"+OR+"11532216"+OR+"17787406"+OR+"147664140"+OR+"147664142"+OR+"147664180"+OR+"325388273"+OR+"342808011")&fq=orgobjectid:("514119")&fq=subjects:("Language+Arts")&rows=1000}
hits=4 status=0 QTime=54
INFO - 2015-07-17 14:40:07.756; org.apache.solr.core.SolrCore;
[collection1] webapp=/solr path=/select
params={omitHeader=true&sort=objectid+asc&fl=objectid&start=0&q=*:*&wt=json&fq=grades:("43")&fq=keywords:("1134"+OR+"1158"+OR+"1160"+OR+"1175"+OR+"1193"+OR+"1208"+OR+"1209"+OR+"1213"+OR+"1215"+OR+"7838251"+OR+"7838265"+OR+"8877368"+OR+"11532189"+OR+"14736433"+OR+"14736436"+OR+"15392964"+OR+"15392969"+OR+"17787380"+OR+"17787385"+OR+"17787388"+OR+"17787389"+OR+"17787396"+OR+"17787397"+OR+"17787400"+OR+"17787405"+OR+"17787406"+OR+"26538072"+OR+"27982226"+OR+"28551934"+OR+"28551953"+OR+"466877542"+OR+"466877543"+OR+"476555246")&fq=orgobjectid:("392236052")&fq=subjects:("Language+Arts")&rows=1000}
hits=17 status=0 QTime=381
INFO - 2015-07-17 14:47:27.223; org.apache.solr.core.SolrCore;
[collection1] webapp=/solr path=/select
params={omitHeader=true&fl=*,score&start=0&q=*:*&wt=json&fq=orgobjectid:(672130365)&rows=1000}
hits=0 status=0 QTime=0
You'll notice that there is a 7-minute gap between the 3rd and 4th log
entries there. The only exceptions that show up are a few
EofExceptions/Broken Pipes, but my assumption is that they are end users
prematurely stopping their requests. Updates happen periodically
throughout the day with soft commits. Hard commits are configured to run
every 15 sec, and we only optimize once at night. Disk I/O and memory
usage are normal during these hiccups. The only abnormal thing is a load
average of about 3 (where 1.5 is the normal load).
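
For anyone who wants to look for the same kind of gaps in their own logs,
here is a minimal sketch of how they can be spotted. It assumes every log
entry starts with a level and a timestamp in the format shown in the
excerpt above (e.g. "INFO - 2015-07-17 14:39:57.195;"). The function name
find_gaps and the 60-second threshold are my own choices for illustration,
not anything provided by Solr.

```python
import re
from datetime import datetime, timedelta

# Matches the prefix of each log entry, e.g. "INFO - 2015-07-17 14:39:57.195;"
# (format assumed from the excerpt above).
TIMESTAMP = re.compile(
    r"^[A-Z]+\s+-\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3});"
)


def find_gaps(lines, threshold=timedelta(seconds=60)):
    """Return (previous, current, gap) tuples for consecutive log entries
    whose timestamps are further apart than `threshold`."""
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            # Continuation of a wrapped entry (params, hits, QTime, ...).
            continue
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps
```

Passing an open log file (or any iterable of lines) to find_gaps returns
one tuple per silent period. A gap only shows that nothing was logged in
that window; it does not say why.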
Any ideas as to what's going on?
--
*jeremy ashcraft*