Thank you, Shawn.  It looks like the setting is being applied.  This could be some sort of chain reaction:

A drive or server fails.  HDFS starts re-replicating blocks, which causes network congestion.  The Solr 7 nodes can't talk to each other, so they initiate replication, which causes more network congestion, which causes more replicas to replicate, and eventually HBase (we run HBase and Solr on the same machines) can't talk either.  That is my running hypothesis, anyway!

We've made a change to limit how much bandwidth HDFS can use.  One issue we have seen is that replicas fail to replicate and then retry, over and over.  I believe they are hitting a timeout error; is that timeout adjustable?

-------------


{
  "responseHeader":{
    "status":0,
    "QTime":134,
    "params":{
      "echoParams":"all",
      "indent":"true",
      "wt":"json",
      "command":"details",
      "maxWriteMBPerSec":"75"}},
  "details":{
    "indexSize":"156.72 GB",
    "indexPath":"hdfs://nameservice1:8020/solr7.1.0/UNCLASS/core_node106/data/index/",
    "commits":[[
        "indexVersion",1528860019189,
        "generation",8188,
        "filelist",["_10k8.cfe",
          "_10k8.cfs",
          "_10k8.si",
          "_10k8_1.liv",
          "_1l1j.cfe",
          "_1l1j.cfs",
          "_1l1j.si",
          "_1l1j_2.liv",
          "_289p.cfe",
          "_289p.cfs",
          "_289p.si",
          "_30fj.cfe",
          "_30fj.cfs",
          "_30fj.si",
          "_30fj_8o.liv",
          "_3ugu.cfe",
          "_3ugu.cfs",
          "_3ugu.si",
          "_3uno.cfe",
          "_3uno.cfs",
          "_3uno.si",
          "_3x64.cfe",
          "_3x64.cfs",
          "_3x64.si",
          "_3zt7.cfe",
          "_3zt7.cfs",
          "_3zt7.si",
          "_43mm.cfe",
          "_43mm.cfs",
          "_43mm.si",
          "_43mm_o.liv",
          "_487a.cfe",
          "_487a.cfs",
          "_487a.si",
          "_4cxd.cfe",
          "_4cxd.cfs",
          "_4cxd.si",
          "_4eux.cfe",
          "_4eux.cfs",
          "_4eux.si",
          "_4jez.cfe",
          "_4jez.cfs",
          "_4jez.si",
          "_4jez_f.liv",
          "_4jgn.cfe",
          "_4jgn.cfs",
          "_4jgn.si",
          "_4jgn_d.liv",
          "_4jlm.cfe",
          "_4jlm.cfs",
          "_4jlm.si",
          "_4jlm_9.liv",
          "_4jm6.cfe",
          "_4jm6.cfs",
          "_4jm6.si",
          "_4jm6_b.liv",
          "_4jmr.cfe",
          "_4jmr.cfs",
          "_4jmr.si",
          "_4jmr_2.liv",
          "_4jna.cfe",
          "_4jna.cfs",
          "_4jna.si",
          "_4jna_4.liv",
          "_4joy.cfe",
          "_4joy.cfs",
          "_4joy.si",
          "_4joy_5.liv",
          "_4jpi.cfe",
          "_4jpi.cfs",
          "_4jpi.si",
          "_4jpi_4.liv",
          "_4jq2.cfe",
          "_4jq2.cfs",
          "_4jq2.si",
          "_4jq2_4.liv",
          "_4jqm.cfe",
          "_4jqm.cfs",
          "_4jqm.si",
          "_4jqm_1.liv",
          "_4jqn.cfe",
          "_4jqn.cfs",
          "_4jqn.si",
          "_4jqn_2.liv",
          "_4jqo.cfe",
          "_4jqo.cfs",
          "_4jqo.si",
          "_4jqp.cfe",
          "_4jqp.cfs",
          "_4jqp.si",
          "_4jqq.cfe",
          "_4jqq.cfs",
          "_4jqq.si",
          "_4jqq_1.liv",
          "_4jqr.cfe",
          "_4jqr.cfs",
          "_4jqr.si",
          "_4jqs.cfe",
          "_4jqs.cfs",
          "_4jqs.si",
          "_4jqt.cfe",
          "_4jqt.cfs",
          "_4jqt.si",
          "_4jqu.cfe",
          "_4jqu.cfs",
          "_4jqu.si",
          "_4jqv.cfe",
          "_4jqv.cfs",
          "_4jqv.si",
          "_4jqv_1.liv",
          "_4jqw.cfe",
          "_4jqw.cfs",
          "_4jqw.si",
          "_4jqw_1.liv",
          "_4jqx.cfe",
          "_4jqx.cfs",
          "_4jqx.si",
          "_4jqx_1.liv",
          "_4jqy.cfe",
          "_4jqy.cfs",
          "_4jqy.si",
          "_4jqy_1.liv",
          "_4jqz.cfe",
          "_4jqz.cfs",
          "_4jqz.si",
          "_4jqz_1.liv",
          "_4jr0.cfe",
          "_4jr0.cfs",
          "_4jr0.si",
          "_4jr0_1.liv",
          "_4jr1.cfe",
          "_4jr1.cfs",
          "_4jr1.si",
          "_4jr2.cfe",
          "_4jr2.cfs",
          "_4jr2.si",
          "_4jr3.cfe",
          "_4jr3.cfs",
          "_4jr3.si",
          "_4jr3_1.liv",
          "_4jr4.cfe",
          "_4jr4.cfs",
          "_4jr4.si",
          "_4jr4_1.liv",
          "_4jr5.cfe",
          "_4jr5.cfs",
          "_4jr5.si",
          "_4jr6.cfe",
          "_4jr6.cfs",
          "_4jr6.si",
          "_4jr6_1.liv",
          "_4jr7.cfe",
          "_4jr7.cfs",
          "_4jr7.si",
          "_4jr8.cfe",
          "_4jr8.cfs",
          "_4jr8.si",
          "_4jr9.cfe",
          "_4jr9.cfs",
          "_4jr9.si",
          "_4jr9_1.liv",
          "_4jra.cfe",
          "_4jra.cfs",
          "_4jra.si",
          "_4jra_1.liv",
          "_4jrb.cfe",
          "_4jrb.cfs",
          "_4jrb.si",
          "_4jrb_1.liv",
          "_4jrc.cfe",
          "_4jrc.cfs",
          "_4jrc.si",
          "_4jrc_1.liv",
          "_4jrd.cfe",
          "_4jrd.cfs",
          "_4jrd.si",
          "_4jre.cfe",
          "_4jre.cfs",
          "_4jre.si",
          "_4jrf.cfe",
          "_4jrf.cfs",
          "_4jrf.si",
          "_4jrg.cfe",
          "_4jrg.cfs",
          "_4jrg.si",
          "_4jrh.cfe",
          "_4jrh.cfs",
          "_4jrh.si",
          "_4jri.cfe",
          "_4jri.cfs",
          "_4jri.si",
          "_4jri_1.liv",
          "_4jrj.cfe",
          "_4jrj.cfs",
          "_4jrj.si",
          "_4jrk.cfe",
          "_4jrk.cfs",
          "_4jrk.si",
          "_4jrl.cfe",
          "_4jrl.cfs",
          "_4jrl.si",
          "_itc.cfe",
          "_itc.cfs",
          "_itc.si",
          "_itc_2s.liv",
          "segments_6bg"]],
      [
        "indexVersion",1528861822922,
        "generation",8189,
        "filelist",["_10k8.cfe",
          "_10k8.cfs",
          "_10k8.si",
          "_10k8_1.liv",
          "_1l1j.cfe",
          "_1l1j.cfs",
          "_1l1j.si",
          "_1l1j_2.liv",
          "_289p.cfe",
          "_289p.cfs",
          "_289p.si",
          "_30fj.cfe",
          "_30fj.cfs",
          "_30fj.si",
          "_30fj_8o.liv",
          "_3ugu.cfe",
          "_3ugu.cfs",
          "_3ugu.si",
          "_3uno.cfe",
          "_3uno.cfs",
          "_3uno.si",
          "_3x64.cfe",
          "_3x64.cfs",
          "_3x64.si",
          "_3zt7.cfe",
          "_3zt7.cfs",
          "_3zt7.si",
          "_43mm.cfe",
          "_43mm.cfs",
          "_43mm.si",
          "_43mm_o.liv",
          "_487a.cfe",
          "_487a.cfs",
          "_487a.si",
          "_4cxd.cfe",
          "_4cxd.cfs",
          "_4cxd.si",
          "_4eux.cfe",
          "_4eux.cfs",
          "_4eux.si",
          "_4jez.cfe",
          "_4jez.cfs",
          "_4jez.si",
          "_4jez_f.liv",
          "_4jgn.cfe",
          "_4jgn.cfs",
          "_4jgn.si",
          "_4jgn_d.liv",
          "_4jlm.cfe",
          "_4jlm.cfs",
          "_4jlm.si",
          "_4jlm_9.liv",
          "_4jm6.cfe",
          "_4jm6.cfs",
          "_4jm6.si",
          "_4jm6_c.liv",
          "_4jmr.cfe",
          "_4jmr.cfs",
          "_4jmr.si",
          "_4jmr_3.liv",
          "_4jna.cfe",
          "_4jna.cfs",
          "_4jna.si",
          "_4jna_5.liv",
          "_4joy.cfe",
          "_4joy.cfs",
          "_4joy.si",
          "_4joy_6.liv",
          "_4jpi.cfe",
          "_4jpi.cfs",
          "_4jpi.si",
          "_4jpi_4.liv",
          "_4jq2.cfe",
          "_4jq2.cfs",
          "_4jq2.si",
          "_4jq2_4.liv",
          "_4jqm.cfe",
          "_4jqm.cfs",
          "_4jqm.si",
          "_4jqm_1.liv",
          "_4jqn.cfe",
          "_4jqn.cfs",
          "_4jqn.si",
          "_4jqn_2.liv",
          "_4jqr.cfe",
          "_4jqr.cfs",
          "_4jqr.si",
          "_4jqu.cfe",
          "_4jqu.cfs",
          "_4jqu.si",
          "_4jqv.cfe",
          "_4jqv.cfs",
          "_4jqv.si",
          "_4jqv_1.liv",
          "_4jqw.cfe",
          "_4jqw.cfs",
          "_4jqw.si",
          "_4jqw_1.liv",
          "_4jqy.cfe",
          "_4jqy.cfs",
          "_4jqy.si",
          "_4jqy_1.liv",
          "_4jqz.cfe",
          "_4jqz.cfs",
          "_4jqz.si",
          "_4jqz_1.liv",
          "_4jr0.cfe",
          "_4jr0.cfs",
          "_4jr0.si",
          "_4jr0_1.liv",
          "_4jr3.cfe",
          "_4jr3.cfs",
          "_4jr3.si",
          "_4jr3_1.liv",
          "_4jr6.cfe",
          "_4jr6.cfs",
          "_4jr6.si",
          "_4jr6_2.liv",
          "_4jr8.cfe",
          "_4jr8.cfs",
          "_4jr8.si",
          "_4jr9.cfe",
          "_4jr9.cfs",
          "_4jr9.si",
          "_4jr9_1.liv",
          "_4jra.cfe",
          "_4jra.cfs",
          "_4jra.si",
          "_4jra_1.liv",
          "_4jrb.cfe",
          "_4jrb.cfs",
          "_4jrb.si",
          "_4jrb_1.liv",
          "_4jrd.cfe",
          "_4jrd.cfs",
          "_4jrd.si",
          "_4jre.cfe",
          "_4jre.cfs",
          "_4jre.si",
          "_4jrh.cfe",
          "_4jrh.cfs",
          "_4jrh.si",
          "_4jro.cfe",
          "_4jro.cfs",
          "_4jro.si",
          "_4jrp.cfe",
          "_4jrp.cfs",
          "_4jrp.si",
          "_4jrq.cfe",
          "_4jrq.cfs",
          "_4jrq.si",
          "_4jrr.cfe",
          "_4jrr.cfs",
          "_4jrr.si",
          "_4jrr_1.liv",
          "_4jrs.cfe",
          "_4jrs.cfs",
          "_4jrs.si",
          "_4jrt.cfe",
          "_4jrt.cfs",
          "_4jrt.si",
          "_4jru.cfe",
          "_4jru.cfs",
          "_4jru.si",
          "_4jrv.cfe",
          "_4jrv.cfs",
          "_4jrv.si",
          "_4jrw.cfe",
          "_4jrw.cfs",
          "_4jrw.si",
          "_4jrx.cfe",
          "_4jrx.cfs",
          "_4jrx.si",
          "_4jry.cfe",
          "_4jry.cfs",
          "_4jry.si",
          "_4jrz.cfe",
          "_4jrz.cfs",
          "_4jrz.si",
          "_4js0.cfe",
          "_4js0.cfs",
          "_4js0.si",
          "_4js1.cfe",
          "_4js1.cfs",
          "_4js1.si",
          "_4js2.cfe",
          "_4js2.cfs",
          "_4js2.si",
          "_4js3.cfe",
          "_4js3.cfs",
          "_4js3.si",
          "_itc.cfe",
          "_itc.cfs",
          "_itc.si",
          "_itc_2s.liv",
          "segments_6bh"]]],
    "isMaster":"true",
    "isSlave":"false",
    "indexVersion":1528861822922,
    "generation":8189,
    "master":{
      "replicateAfter":["commit"],
      "replicationEnabled":"true",
      "replicableVersion":1528861822922,
      "replicableGeneration":8189}}}

-----------------
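As a rough sanity check on the timeout theory, here is a back-of-the-envelope estimate of how long a full copy of this core would take if the 75 MB/s limit were the only constraint. It assumes the throttle applies to the whole transfer and that the reported "GB" means 1024 MB; both are assumptions on my part, and the function name is just for illustration.

```python
def min_transfer_seconds(index_size_gb, max_write_mb_per_sec):
    """Lower bound on the time to copy an index at a fixed write throttle.

    Assumes 1 GB = 1024 MB and that nothing else (network contention,
    HDFS overhead, retries) slows the copy down.
    """
    index_size_mb = index_size_gb * 1024
    return index_size_mb / max_write_mb_per_sec


# Values taken from the details response above: 156.72 GB index, 75 MB/s limit.
seconds = min_transfer_seconds(156.72, 75)
print(f"{seconds / 60:.1f} minutes")  # prints "35.7 minutes"
```

So even in the best case a full copy of this core takes over half an hour, which is a long window for something to time out in.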

-Joe


On 6/12/2018 11:48 AM, Shawn Heisey wrote:
On 6/11/2018 9:46 AM, Joe Obernberger wrote:
We are seeing an issue on our Solr Cloud 7.3.1 cluster where
replication starts and pegs network interfaces so aggressively that
other tasks cannot talk.  We will see it peg a bonded 2GB interface.
In some cases the replication fails over and over until it finally
succeeds and the replica comes back up.  Usually the error is a timeout.

Has anyone seen this?  We've tried adjusting the /replication
requestHandler and setting:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="defaults">
    <str name="maxWriteMBPerSec">75</str>
  </lst>
</requestHandler>

Here's something I'd like you to try.  Open a browser and visit the URL
for the handler with some specific parameters, so we can see if that
config is actually being applied.  Substitute the correct host, port,
and collection name:

http://host:port/solr/collection/replication?command=details&echoParams=all&wt=json&indent=true

And provide the full raw JSON response.

On a solr 7.3.0 example, I added your replication handler definition,
and this is the result of visiting a similar URL:

{
   "responseHeader":{
     "status":0,
     "QTime":5,
     "params":{
       "echoParams":"all",
       "indent":"true",
       "wt":"json",
       "command":"details",
       "maxWriteMBPerSec":"75"}},
   "details":{
     "indexSize":"6.27 KB",
     "indexPath":"C:\\Users\\sheisey\\Downloads\\solr-7.3.0\\server\\solr\\foo\\data\\index/",
     "commits":[[
         "indexVersion",1528213960436,
         "generation",4,
         "filelist",["_0.fdt",
           "_0.fdx",
           "_0.fnm",
           "_0.si",
           "_0_Lucene50_0.doc",
           "_0_Lucene50_0.tim",
           "_0_Lucene50_0.tip",
           "_0_Lucene70_0.dvd",
           "_0_Lucene70_0.dvm",
           "_1.fdt",
           "_1.fdx",
           "_1.fnm",
           "_1.nvd",
           "_1.nvm",
           "_1.si",
           "_1_Lucene50_0.doc",
           "_1_Lucene50_0.pos",
           "_1_Lucene50_0.tim",
           "_1_Lucene50_0.tip",
           "_1_Lucene70_0.dvd",
           "_1_Lucene70_0.dvm",
           "_2.fdt",
           "_2.fdx",
           "_2.fnm",
           "_2.nvd",
           "_2.nvm",
           "_2.si",
           "_2_Lucene50_0.doc",
           "_2_Lucene50_0.pos",
           "_2_Lucene50_0.tim",
           "_2_Lucene50_0.tip",
           "_2_Lucene70_0.dvd",
           "_2_Lucene70_0.dvm",
           "segments_4"]]],
     "isMaster":"true",
     "isSlave":"false",
     "indexVersion":1528213960436,
     "generation":4,
     "master":{
       "replicateAfter":["commit"],
       "replicationEnabled":"true"}}}

The maxWriteMBPerSec parameter can be seen in the response header, so on
this system, it looks like it's working.
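
If you'd rather script this check than eyeball the JSON, something along these lines should work. It is only a sketch: the helper names are made up for illustration, and the base URL and collection are placeholders you would substitute yourself. It builds the same URL as above and pulls the echoed maxWriteMBPerSec out of responseHeader.params.

```python
import json
from urllib.parse import urlencode


def details_url(base_url, collection):
    """Build the replication 'details' URL with echoParams=all."""
    query = urlencode({
        "command": "details",
        "echoParams": "all",
        "wt": "json",
        "indent": "true",
    })
    return f"{base_url}/solr/{collection}/replication?{query}"


def echoed_max_write_mb(response_text):
    """Return the maxWriteMBPerSec echoed in the response header, or None.

    None means the parameter was not echoed, which suggests the handler
    defaults are not being applied to this request.
    """
    header = json.loads(response_text).get("responseHeader", {})
    value = header.get("params", {}).get("maxWriteMBPerSec")
    return float(value) if value is not None else None
```

The response body can be fetched with urllib.request.urlopen(details_url(...)).read() and passed straight to echoed_max_write_mb.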

Thanks,
Shawn

