Thanks for the super quick reply. The logs are pretty big, but one thing comes up over and over again:

Leader side:

ERROR - 2013-06-21 01:44:24.014; org.apache.solr.common.SolrException; shard update error StdNode: http://xxx:xxx:xx:xx:8983/solr/collection1/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://xxx:xxx:xx:xx:8983/solr/collection1 returned non ok status:500, message:Internal Server Error
ERROR - 2013-06-21 01:44:24.015; org.apache.solr.common.SolrException; shard update error StdNode: http://xxx:xxx:xx:xx:8983/solr/collection1/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://xxx:xxx:xx:xx:8983/solr/collection1 returned non ok status:500, message:Internal Server Error
ERROR - 2013-06-21 01:44:24.015; org.apache.solr.common.SolrException; shard update error StdNode: http://xxx:xxx:xx:xx:8983/solr/collection1/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://xxx:xxx:xx:xx:8983/solr/collection1 returned non ok status:500, message:Internal Server Error

Non-Leader side:

757682 [RecoveryThread] ERROR org.apache.solr.update.PeerSync – PeerSync: core=collection1 url=http://xxx:xxx:xx:xx:8983/solr Error applying updates from [Ljava.lang.String;@1be0799a ,update=[1, 1438251416655233024, SolrInputDocument[type=topic, fullId=9ce54310-d89a-11e2-b89d-22000af02b44, account=account1, site=mySite, topic=topic5, id=account1mySitetopic5, totalCount=195, approvedCount=195, declinedCount=0, flaggedCount=0, createdOn=2013-06-19T04:42:14.329Z, updatedOn=2013-06-19T04:42:14.386Z, _version_=1438251416655233024]]
java.lang.UnsupportedOperationException
        at org.apache.lucene.queries.function.FunctionValues.longVal(FunctionValues.java:46)
        at org.apache.solr.update.VersionInfo.getVersionFromIndex(VersionInfo.java:201)
        at org.apache.solr.update.UpdateLog.lookupVersion(UpdateLog.java:718)
        at org.apache.solr.update.VersionInfo.lookupVersion(VersionInfo.java:184)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:635)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:398)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
        at org.apache.solr.update.PeerSync.handleUpdates(PeerSync.java:487)
        at org.apache.solr.update.PeerSync.handleResponse(PeerSync.java:335)
        at org.apache.solr.update.PeerSync.sync(PeerSync.java:265)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:366)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)

Unfortunately I don't see what kind of UnsupportedOperation this could be referring to.

Many thanks,
Sven

On Fri, Jun 21, 2013 at 11:44 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:

> This doesn't seem right. A leader will ask a replica to recover only
> when an update request could not be forwarded to it. Can you check
> your leader logs to see why updates are not being sent through to
> replicas?
>
> On Fri, Jun 21, 2013 at 7:03 AM, Sven Stark <sven.st...@m-square.com.au> wrote:
> > Hello,
> >
> > first: I am pretty much a Solr newcomer, so don't necessarily assume basic
> > solr knowledge.
> >
> > My problem is that in my setup SolrCloud seems to create way too much
> > network traffic for replication. I hope I'm just missing some proper config
> > options.
> > Here's the setup first:
> >
> > * I am running a five node SolrCloud cluster on top of an external 5 node
> > zookeeper cluster, according to logs and clusterstate.json all nodes find
> > each other and are happy
> > * Solr version is now 4.3.1, but the problem also existed on 4.1.0 ( I
> > thought upgrade might solve the issue because of
> > https://issues.apache.org/jira/browse/SOLR-4471)
> > * there is only one shard
> > * solr.xml and solrconfig.xml are out of the box, except for the enabled
> > soft commit
> >
> > <autoSoftCommit>
> > <maxTime>1000</maxTime>
> > </autoSoftCommit>
> >
> > * our index is minimal at the moment (dev and testing stage) 20-30Mb, about
> > 30k small docs
> >
> > The issue is when I run smallish load tests against our app which posts ca
> > 1-2 docs/sec to solr, the SolrCloud leader creates outgoing network traffic
> > of 20-30Mbyte/sec and the non-leader receive 4-8MByte/sec each.
> >
> > The non-leaders logs are full of entries like
> >
> > INFO - 2013-06-21 01:08:58.624; org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that we recover
> > INFO - 2013-06-21 01:08:58.640; org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that we recover
> > INFO - 2013-06-21 01:08:58.643; org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that we recover
> > INFO - 2013-06-21 01:08:58.651; org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that we recover
> > INFO - 2013-06-21 01:08:58.892; org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that we recover
> > INFO - 2013-06-21 01:08:58.893; org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that we recover
> >
> > So my assumption is I am making config errors and the cloud leader tries to
> > push the index to all non-leaders over and over again.
> > But I couldn't really find much doco on how to properly configure SolrCloud
> > replication online.
> >
> > Any hints and help much appreciated. I can provide more info or data, just
> > let me know what you need.
> >
> > Thanks in advance,
> > Sven
> >
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
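
PS: Since the logs are too big to read by eye, a small script can tally how often the two recurring messages show up per second on each node. This is only a rough sketch: the marker strings and the timestamp format are copied from the excerpts in this thread, while the function name count_per_second is made up for illustration and is not part of Solr.

```python
import re
from collections import Counter

# Marker strings copied from the log excerpts in this thread.
RECOVERY_MARKER = "It has been requested that we recover"
UPDATE_ERROR_MARKER = "shard update error"

# Matches timestamps such as "2013-06-21 01:08:58.624" and captures them
# truncated to the second.
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d{3}")


def count_per_second(log_text, marker):
    """Count lines containing `marker`, grouped by the latest timestamp seen.

    Remembering the latest timestamp also copes with entries that a mail
    client wrapped so that the timestamp and the message sit on separate lines.
    Hypothetical helper for illustration only.
    """
    counts = Counter()
    current_second = None
    for line in log_text.splitlines():
        match = TIMESTAMP.search(line)
        if match:
            current_second = match.group(1)
        if marker in line and current_second is not None:
            counts[current_second] += 1
    return counts


def report(log_text):
    """Print per-second counts for both recurring messages."""
    for marker in (RECOVERY_MARKER, UPDATE_ERROR_MARKER):
        print(marker)
        for second, n in sorted(count_per_second(log_text, marker).items()):
            print("  ", second, n)
```

Calling report() on the text of a leader log and a replica log should make it easy to see whether bursts of "shard update error" on the leader line up in time with the recovery requests on the replicas.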