Re: Replication Question
On 8/2/2017 8:56 AM, Michael B. Klein wrote: > SCALE DOWN > 1) Call admin/collections?action=BACKUP for each collection to a > shared NFS volume > 2) Shut down all the nodes > > SCALE UP > 1) Spin up 2 Zookeeper nodes and wait for them to stabilize > 2) Spin up 3 Solr nodes and wait for them to show up under Zookeeper's > live_nodes > 3) Call admin/collections?action=RESTORE to put all the collections back > > This has been working very well, for the most part, with the following > complications/observations: > > 1) If I don't optimize each collection right before BACKUP, the backup > fails (see the attached solr_backup_error.json). Sounds like you're being hit by this at backup time: https://issues.apache.org/jira/browse/SOLR-9120 There's a patch in the issue which I have not verified and tested. The workaround of optimizing the collection is not one I would have thought of. > 2) If I don't specify a replicationFactor during RESTORE, the admin > interface's Cloud diagram only shows one active node per collection. > Is this expected? Am I required to specify the replicationFactor > unless I'm using a shared HDFS volume for solr data? The documentation for RESTORE (looking at the 6.6 docs) says that the restored collection will have the same number of shards and replicas as the original collection. Your experience says that either the documentation is wrong or the version of Solr you're running doesn't behave that way, and might have a bug. > 3) If I don't specify maxShardsPerNode=1 during RESTORE, I get a > warning message in the response, even though the restore seems to succeed. I would like to see that warning, including whatever stacktrace is present. It might be expected, but I'd like to look into it. > 4) Aside from the replicationFactor parameter on the CREATE/RESTORE, I > do not currently have any replication stuff configured (as it seems I > should not). Correct, you don't need any replication configured. It's not for cloud mode. > 5) At the time my "1-in-3 requests are failing" issue occurred, the > Cloud diagram looked like the attached solr_admin_cloud_diagram.png. > It seemed to think all replicas were live and synced and happy, and > because I was accessing solr through a round-robin load balancer, I > was never able to tell which node was out of sync. > > If it happens again, I'll make node-by-node requests and try to figure > out what's different about the failing one. But the fact that this > happened (and the way it happened) is making me wonder if/how I can > automate this automated staging environment scaling reliably and with > confidence that it will Just Work™. That image didn't make it to the mailing list. Your JSON showing errors did, though. Your description of the diagram is good -- sounds like it was all green and looked exactly how you expected it to look. What you've described sounds like there may be a problem in the RESTORE action on the collections API, or possibly a problem with your shared storage where you put the backups, so the restored data on one replica isn't faithful to the backup. I don't know very much about that code, and what you've described makes me think that this is going to be a hard one to track down. Thanks, Shawn
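For the node-by-node check described above, here is a minimal SolrJ sketch that queries each replica directly with distrib=false, so the load balancer and the distributed fan-out are both taken out of the picture. It assumes a SolrJ 6.x client; the node URLs and collection name are placeholders.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ReplicaCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical node URLs; replace with the three Solr nodes behind the load balancer.
        String[] nodes = {"http://solr1:8983/solr", "http://solr2:8983/solr", "http://solr3:8983/solr"};

        SolrQuery q = new SolrQuery("id:hd76s004z");
        q.set("distrib", "false"); // ask only the local replica, no distributed fan-out

        for (String baseUrl : nodes) {
            try (HttpSolrClient client = new HttpSolrClient.Builder(baseUrl).build()) {
                QueryResponse rsp = client.query("collection1", q); // placeholder collection name
                System.out.println(baseUrl + " -> numFound=" + rsp.getResults().getNumFound());
            }
        }
    }
}

A replica that reports numFound=0 while the others report 1 is the one that missed the update.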
Re: Replication Question
And the one that isn't getting the updates is the one marked in the cloud diagram as the leader. /me bangs head on desk On Wed, Aug 2, 2017 at 10:31 AM, Michael B. Kleinwrote: > Another observation: After bringing the cluster back up just now, the > "1-in-3 nodes don't get the updates" issue persists, even with the cloud > diagram showing 3 nodes, all green. > > On Wed, Aug 2, 2017 at 9:56 AM, Michael B. Klein > wrote: > >> Thanks for your responses, Shawn and Erick. >> >> Some clarification questions, but first a description of my >> (non-standard) use case: >> >> My Zookeeper/SolrCloud cluster is running on Amazon AWS. Things are >> working well so far on the production cluster (knock wood); its the staging >> cluster that's giving me fits. Here's why: In order to save money, I have >> the AWS auto-scaler scale the cluster down to zero nodes when it's not in >> use. Here's the (automated) procedure: >> >> SCALE DOWN >> 1) Call admin/collections?action=BACKUP for each collection to a shared >> NFS volume >> 2) Shut down all the nodes >> >> SCALE UP >> 1) Spin up 2 Zookeeper nodes and wait for them to stabilize >> 2) Spin up 3 Solr nodes and wait for them to show up under Zookeeper's >> live_nodes >> 3) Call admin/collections?action=RESTORE to put all the collections back >> >> This has been working very well, for the most part, with the following >> complications/observations: >> >> 1) If I don't optimize each collection right before BACKUP, the backup >> fails (see the attached solr_backup_error.json). >> 2) If I don't specify a replicationFactor during RESTORE, the admin >> interface's Cloud diagram only shows one active node per collection. Is >> this expected? Am I required to specify the replicationFactor unless I'm >> using a shared HDFS volume for solr data? >> 3) If I don't specify maxShardsPerNode=1 during RESTORE, I get a warning >> message in the response, even though the restore seems to succeed. >> 4) Aside from the replicationFactor parameter on the CREATE/RESTORE, I do >> not currently have any replication stuff configured (as it seems I should >> not). >> 5) At the time my "1-in-3 requests are failing" issue occurred, the Cloud >> diagram looked like the attached solr_admin_cloud_diagram.png. It seemed to >> think all replicas were live and synced and happy, and because I was >> accessing solr through a round-robin load balancer, I was never able to >> tell which node was out of sync. >> >> If it happens again, I'll make node-by-node requests and try to figure >> out what's different about the failing one. But the fact that this happened >> (and the way it happened) is making me wonder if/how I can automate this >> automated staging environment scaling reliably and with confidence that it >> will Just Work™. >> >> Comments and suggestions would be GREATLY appreciated. >> >> Michael >> >> >> >> On Tue, Aug 1, 2017 at 8:14 PM, Erick Erickson >> wrote: >> >>> And please do not use optimize unless your index is >>> totally static. I only recommend it when the pattern is >>> to update the index periodically, like every day or >>> something and not update any docs in between times. >>> >>> Implied in Shawn's e-mail was that you should undo >>> anything you've done in terms of configuring replication, >>> just go with the defaults. >>> >>> Finally, my bet is that your problematic Solr node is misconfigured. >>> >>> Best, >>> Erick >>> >>> On Tue, Aug 1, 2017 at 2:36 PM, Shawn Heisey >>> wrote: >>> > On 8/1/2017 12:09 PM, Michael B. 
Klein wrote: >>> >> I have a 3-node solrcloud cluster orchestrated by zookeeper. Most >>> stuff >>> >> seems to be working OK, except that one of the nodes never seems to >>> get its >>> >> replica updated. >>> >> >>> >> Queries take place through a non-caching, round-robin load balancer. >>> The >>> >> collection looks fine, with one shard and a replicationFactor of 3. >>> >> Everything in the cloud diagram is green. >>> >> >>> >> But if I (for example) select?q=id:hd76s004z, the results come up >>> empty 1 >>> >> out of every 3 times. >>> >> >>> >> Even several minutes after a commit and optimize, one replica still >>> isn’t >>> >> returning the right info. >>> >> >>> >> Do I need to configure my `solrconfig.xml` with `replicateAfter` >>> options on >>> >> the `/replication` requestHandler, or is that a non-solrcloud, >>> >> standalone-replication thing? >>> > >>> > This is one of the more confusing aspects of SolrCloud. >>> > >>> > When everything is working perfectly in a SolrCloud install, the >>> feature >>> > in Solr called "replication" is *never* used. SolrCloud does require >>> > the replication feature, though ... which is what makes this whole >>> thing >>> > very confusing. >>> > >>> > Replication is used to replicate an entire Lucene index (consisting of >>> a >>> > bunch of files on the disk) from a core on a master server to a core on >>> > a slave server.
Re: Replication Question
Another observation: After bringing the cluster back up just now, the "1-in-3 nodes don't get the updates" issue persists, even with the cloud diagram showing 3 nodes, all green. On Wed, Aug 2, 2017 at 9:56 AM, Michael B. Kleinwrote: > Thanks for your responses, Shawn and Erick. > > Some clarification questions, but first a description of my (non-standard) > use case: > > My Zookeeper/SolrCloud cluster is running on Amazon AWS. Things are > working well so far on the production cluster (knock wood); its the staging > cluster that's giving me fits. Here's why: In order to save money, I have > the AWS auto-scaler scale the cluster down to zero nodes when it's not in > use. Here's the (automated) procedure: > > SCALE DOWN > 1) Call admin/collections?action=BACKUP for each collection to a shared > NFS volume > 2) Shut down all the nodes > > SCALE UP > 1) Spin up 2 Zookeeper nodes and wait for them to stabilize > 2) Spin up 3 Solr nodes and wait for them to show up under Zookeeper's > live_nodes > 3) Call admin/collections?action=RESTORE to put all the collections back > > This has been working very well, for the most part, with the following > complications/observations: > > 1) If I don't optimize each collection right before BACKUP, the backup > fails (see the attached solr_backup_error.json). > 2) If I don't specify a replicationFactor during RESTORE, the admin > interface's Cloud diagram only shows one active node per collection. Is > this expected? Am I required to specify the replicationFactor unless I'm > using a shared HDFS volume for solr data? > 3) If I don't specify maxShardsPerNode=1 during RESTORE, I get a warning > message in the response, even though the restore seems to succeed. > 4) Aside from the replicationFactor parameter on the CREATE/RESTORE, I do > not currently have any replication stuff configured (as it seems I should > not). > 5) At the time my "1-in-3 requests are failing" issue occurred, the Cloud > diagram looked like the attached solr_admin_cloud_diagram.png. It seemed to > think all replicas were live and synced and happy, and because I was > accessing solr through a round-robin load balancer, I was never able to > tell which node was out of sync. > > If it happens again, I'll make node-by-node requests and try to figure out > what's different about the failing one. But the fact that this happened > (and the way it happened) is making me wonder if/how I can automate this > automated staging environment scaling reliably and with confidence that it > will Just Work™. > > Comments and suggestions would be GREATLY appreciated. > > Michael > > > > On Tue, Aug 1, 2017 at 8:14 PM, Erick Erickson > wrote: > >> And please do not use optimize unless your index is >> totally static. I only recommend it when the pattern is >> to update the index periodically, like every day or >> something and not update any docs in between times. >> >> Implied in Shawn's e-mail was that you should undo >> anything you've done in terms of configuring replication, >> just go with the defaults. >> >> Finally, my bet is that your problematic Solr node is misconfigured. >> >> Best, >> Erick >> >> On Tue, Aug 1, 2017 at 2:36 PM, Shawn Heisey wrote: >> > On 8/1/2017 12:09 PM, Michael B. Klein wrote: >> >> I have a 3-node solrcloud cluster orchestrated by zookeeper. Most stuff >> >> seems to be working OK, except that one of the nodes never seems to >> get its >> >> replica updated. >> >> >> >> Queries take place through a non-caching, round-robin load balancer. 
>> The >> >> collection looks fine, with one shard and a replicationFactor of 3. >> >> Everything in the cloud diagram is green. >> >> >> >> But if I (for example) select?q=id:hd76s004z, the results come up >> empty 1 >> >> out of every 3 times. >> >> >> >> Even several minutes after a commit and optimize, one replica still >> isn’t >> >> returning the right info. >> >> >> >> Do I need to configure my `solrconfig.xml` with `replicateAfter` >> options on >> >> the `/replication` requestHandler, or is that a non-solrcloud, >> >> standalone-replication thing? >> > >> > This is one of the more confusing aspects of SolrCloud. >> > >> > When everything is working perfectly in a SolrCloud install, the feature >> > in Solr called "replication" is *never* used. SolrCloud does require >> > the replication feature, though ... which is what makes this whole thing >> > very confusing. >> > >> > Replication is used to replicate an entire Lucene index (consisting of a >> > bunch of files on the disk) from a core on a master server to a core on >> > a slave server. This is how replication was done before SolrCloud was >> > created. >> > >> > The way that SolrCloud keeps replicas in sync is *entirely* different. >> > SolrCloud has no masters and no slaves. When you index or delete a >> > document in a SolrCloud collection, the request is forwarded to the >> > leader of the correct shard for that
Re: Replication Question
Thanks for your responses, Shawn and Erick. Some clarification questions, but first a description of my (non-standard) use case: My Zookeeper/SolrCloud cluster is running on Amazon AWS. Things are working well so far on the production cluster (knock wood); its the staging cluster that's giving me fits. Here's why: In order to save money, I have the AWS auto-scaler scale the cluster down to zero nodes when it's not in use. Here's the (automated) procedure: SCALE DOWN 1) Call admin/collections?action=BACKUP for each collection to a shared NFS volume 2) Shut down all the nodes SCALE UP 1) Spin up 2 Zookeeper nodes and wait for them to stabilize 2) Spin up 3 Solr nodes and wait for them to show up under Zookeeper's live_nodes 3) Call admin/collections?action=RESTORE to put all the collections back This has been working very well, for the most part, with the following complications/observations: 1) If I don't optimize each collection right before BACKUP, the backup fails (see the attached solr_backup_error.json). 2) If I don't specify a replicationFactor during RESTORE, the admin interface's Cloud diagram only shows one active node per collection. Is this expected? Am I required to specify the replicationFactor unless I'm using a shared HDFS volume for solr data? 3) If I don't specify maxShardsPerNode=1 during RESTORE, I get a warning message in the response, even though the restore seems to succeed. 4) Aside from the replicationFactor parameter on the CREATE/RESTORE, I do not currently have any replication stuff configured (as it seems I should not). 5) At the time my "1-in-3 requests are failing" issue occurred, the Cloud diagram looked like the attached solr_admin_cloud_diagram.png. It seemed to think all replicas were live and synced and happy, and because I was accessing solr through a round-robin load balancer, I was never able to tell which node was out of sync. If it happens again, I'll make node-by-node requests and try to figure out what's different about the failing one. But the fact that this happened (and the way it happened) is making me wonder if/how I can automate this automated staging environment scaling reliably and with confidence that it will Just Work™. Comments and suggestions would be GREATLY appreciated. Michael On Tue, Aug 1, 2017 at 8:14 PM, Erick Ericksonwrote: > And please do not use optimize unless your index is > totally static. I only recommend it when the pattern is > to update the index periodically, like every day or > something and not update any docs in between times. > > Implied in Shawn's e-mail was that you should undo > anything you've done in terms of configuring replication, > just go with the defaults. > > Finally, my bet is that your problematic Solr node is misconfigured. > > Best, > Erick > > On Tue, Aug 1, 2017 at 2:36 PM, Shawn Heisey wrote: > > On 8/1/2017 12:09 PM, Michael B. Klein wrote: > >> I have a 3-node solrcloud cluster orchestrated by zookeeper. Most stuff > >> seems to be working OK, except that one of the nodes never seems to get > its > >> replica updated. > >> > >> Queries take place through a non-caching, round-robin load balancer. The > >> collection looks fine, with one shard and a replicationFactor of 3. > >> Everything in the cloud diagram is green. > >> > >> But if I (for example) select?q=id:hd76s004z, the results come up empty > 1 > >> out of every 3 times. > >> > >> Even several minutes after a commit and optimize, one replica still > isn’t > >> returning the right info. 
> >> > >> Do I need to configure my `solrconfig.xml` with `replicateAfter` > options on > >> the `/replication` requestHandler, or is that a non-solrcloud, > >> standalone-replication thing? > > > > This is one of the more confusing aspects of SolrCloud. > > > > When everything is working perfectly in a SolrCloud install, the feature > > in Solr called "replication" is *never* used. SolrCloud does require > > the replication feature, though ... which is what makes this whole thing > > very confusing. > > > > Replication is used to replicate an entire Lucene index (consisting of a > > bunch of files on the disk) from a core on a master server to a core on > > a slave server. This is how replication was done before SolrCloud was > > created. > > > > The way that SolrCloud keeps replicas in sync is *entirely* different. > > SolrCloud has no masters and no slaves. When you index or delete a > > document in a SolrCloud collection, the request is forwarded to the > > leader of the correct shard for that document. The leader then sends a > > copy of that request to all the other replicas, and each replica > > (including the leader) independently handles the updates that are in the > > request. Since all replicas index the same content, they stay in sync. > > > > What SolrCloud does with the replication feature is index recovery. In > > some situations recovery can be done from the leader's transaction log, > >
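As a rough illustration of the scale-down/scale-up calls in the procedure above, the sketch below drives the Collections API over plain HTTP from Java. The host, collection name, and backup location are placeholders, and the replicationFactor/maxShardsPerNode parameters are the ones discussed in this thread; adjust them to your cluster.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class CollectionSnapshot {
    static final String SOLR = "http://solr1:8983/solr";   // placeholder node URL
    static final String BACKUP_DIR = "/mnt/solr-backups";  // shared NFS mount (placeholder)

    static String call(String query) throws Exception {
        URL url = new URL(SOLR + "/admin/collections?" + query);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            for (String line; (line = in.readLine()) != null; ) body.append(line).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) throws Exception {
        String collection = "collection1";                   // placeholder collection name
        String loc = URLEncoder.encode(BACKUP_DIR, "UTF-8");

        // SCALE DOWN: snapshot the collection to the shared volume before shutting nodes down.
        System.out.println(call("action=BACKUP&name=" + collection + "-bak"
                + "&collection=" + collection + "&location=" + loc));

        // SCALE UP: restore once the new nodes appear under live_nodes.
        System.out.println(call("action=RESTORE&name=" + collection + "-bak"
                + "&collection=" + collection + "&location=" + loc
                + "&replicationFactor=3&maxShardsPerNode=1"));
    }
}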
Re: Replication Question
And please do not use optimize unless your index is totally static. I only recommend it when the pattern is to update the index periodically, like every day or something and not update any docs in between times. Implied in Shawn's e-mail was that you should undo anything you've done in terms of configuring replication, just go with the defaults. Finally, my bet is that your problematic Solr node is misconfigured. Best, Erick On Tue, Aug 1, 2017 at 2:36 PM, Shawn Heiseywrote: > On 8/1/2017 12:09 PM, Michael B. Klein wrote: >> I have a 3-node solrcloud cluster orchestrated by zookeeper. Most stuff >> seems to be working OK, except that one of the nodes never seems to get its >> replica updated. >> >> Queries take place through a non-caching, round-robin load balancer. The >> collection looks fine, with one shard and a replicationFactor of 3. >> Everything in the cloud diagram is green. >> >> But if I (for example) select?q=id:hd76s004z, the results come up empty 1 >> out of every 3 times. >> >> Even several minutes after a commit and optimize, one replica still isn’t >> returning the right info. >> >> Do I need to configure my `solrconfig.xml` with `replicateAfter` options on >> the `/replication` requestHandler, or is that a non-solrcloud, >> standalone-replication thing? > > This is one of the more confusing aspects of SolrCloud. > > When everything is working perfectly in a SolrCloud install, the feature > in Solr called "replication" is *never* used. SolrCloud does require > the replication feature, though ... which is what makes this whole thing > very confusing. > > Replication is used to replicate an entire Lucene index (consisting of a > bunch of files on the disk) from a core on a master server to a core on > a slave server. This is how replication was done before SolrCloud was > created. > > The way that SolrCloud keeps replicas in sync is *entirely* different. > SolrCloud has no masters and no slaves. When you index or delete a > document in a SolrCloud collection, the request is forwarded to the > leader of the correct shard for that document. The leader then sends a > copy of that request to all the other replicas, and each replica > (including the leader) independently handles the updates that are in the > request. Since all replicas index the same content, they stay in sync. > > What SolrCloud does with the replication feature is index recovery. In > some situations recovery can be done from the leader's transaction log, > but when a replica has gotten so far out of sync that the only option > available is to completely replace the index on the bad replica, > SolrCloud will fire up the replication feature and create an exact copy > of the index from the replica that is currently elected as leader. > SolrCloud temporarily designates the leader core as master and the bad > replica as slave, then initiates a one-time replication. This is all > completely automated and requires no configuration or input from the > administrator. > > The configuration elements you have asked about are for the old > master-slave replication setup and do not apply to SolrCloud at all. > > What I would recommend that you do to solve your immediate issue: Shut > down the Solr instance that is having the problem, rename the "data" > directory in the core that isn't working right to something else, and > start Solr back up. As long as you still have at least one good replica > in the cloud, SolrCloud will see that the index data is gone and copy > the index from the leader. 
You could delete the data directory instead > of renaming it, but that would leave you with no "undo" option. > > Thanks, > Shawn >
Re: Replication Question
On 8/1/2017 12:09 PM, Michael B. Klein wrote: > I have a 3-node solrcloud cluster orchestrated by zookeeper. Most stuff > seems to be working OK, except that one of the nodes never seems to get its > replica updated. > > Queries take place through a non-caching, round-robin load balancer. The > collection looks fine, with one shard and a replicationFactor of 3. > Everything in the cloud diagram is green. > > But if I (for example) select?q=id:hd76s004z, the results come up empty 1 > out of every 3 times. > > Even several minutes after a commit and optimize, one replica still isn’t > returning the right info. > > Do I need to configure my `solrconfig.xml` with `replicateAfter` options on > the `/replication` requestHandler, or is that a non-solrcloud, > standalone-replication thing? This is one of the more confusing aspects of SolrCloud. When everything is working perfectly in a SolrCloud install, the feature in Solr called "replication" is *never* used. SolrCloud does require the replication feature, though ... which is what makes this whole thing very confusing. Replication is used to replicate an entire Lucene index (consisting of a bunch of files on the disk) from a core on a master server to a core on a slave server. This is how replication was done before SolrCloud was created. The way that SolrCloud keeps replicas in sync is *entirely* different. SolrCloud has no masters and no slaves. When you index or delete a document in a SolrCloud collection, the request is forwarded to the leader of the correct shard for that document. The leader then sends a copy of that request to all the other replicas, and each replica (including the leader) independently handles the updates that are in the request. Since all replicas index the same content, they stay in sync. What SolrCloud does with the replication feature is index recovery. In some situations recovery can be done from the leader's transaction log, but when a replica has gotten so far out of sync that the only option available is to completely replace the index on the bad replica, SolrCloud will fire up the replication feature and create an exact copy of the index from the replica that is currently elected as leader. SolrCloud temporarily designates the leader core as master and the bad replica as slave, then initiates a one-time replication. This is all completely automated and requires no configuration or input from the administrator. The configuration elements you have asked about are for the old master-slave replication setup and do not apply to SolrCloud at all. What I would recommend that you do to solve your immediate issue: Shut down the Solr instance that is having the problem, rename the "data" directory in the core that isn't working right to something else, and start Solr back up. As long as you still have at least one good replica in the cloud, SolrCloud will see that the index data is gone and copy the index from the leader. You could delete the data directory instead of renaming it, but that would leave you with no "undo" option. Thanks, Shawn
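If you want to script the rename step rather than do it by hand, a small Java sketch follows; the core data path is hypothetical, and the Solr instance on that node must be stopped first.

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ParkBadReplicaData {
    public static void main(String[] args) throws Exception {
        // Hypothetical core data directory on the problem node; adjust to your layout.
        Path data = Paths.get("/var/solr/data/collection1_shard1_replica3/data");
        Path parked = data.resolveSibling("data.bad-" + System.currentTimeMillis());
        Files.move(data, parked); // rename rather than delete, so there is still an "undo"
        System.out.println("Moved " + data + " -> " + parked);
    }
}

On the next start, SolrCloud sees the empty core and pulls a fresh copy of the index from the current leader, as described above.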
Replication Question
I have a 3-node solrcloud cluster orchestrated by zookeeper. Most stuff seems to be working OK, except that one of the nodes never seems to get its replica updated. Queries take place through a non-caching, round-robin load balancer. The collection looks fine, with one shard and a replicationFactor of 3. Everything in the cloud diagram is green. But if I (for example) select?q=id:hd76s004z, the results come up empty 1 out of every 3 times. Even several minutes after a commit and optimize, one replica still isn’t returning the right info. Do I need to configure my `solrconfig.xml` with `replicateAfter` options on the `/replication` requestHandler, or is that a non-solrcloud, standalone-replication thing? Michael
SOLR replication question?
I am currently using SOLR 4.4 but am not planning to use SolrCloud in the very near future. I have a 3 master / 3 slave setup, with each master linked to its corresponding slave, and auto polling disabled. We do both push (using MQ) and pull indexing using a SOLRJ indexing program. I have enabled soft commit on the slave (to view the changes pushed by the queue immediately). I am thinking of doing the batch indexing on the master (optimize and hard commit) and push indexing on both master and slave. I am trying to do more testing with my configuration but thought of getting some answers before diving very deep... Since the queue pushes the docs to both master and slave, there is a possibility of the slave having more records than the master (when the master is busy doing batch indexing). What would happen if the slave has additional segments compared to the master? Will those be deleted when the replication happens? If a message is pushed from a queue to both master and slave during replication, will there be a latency in seeing that document even if we use soft commit on the slave? We want to make sure that we are not missing any documents from the queue (since it's updated via the UI and we don't really store that data anywhere except in the index). -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-replication-question-tp4081161.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR replication question?
I am currently using SOLR 4.4 but am not planning to use SolrCloud in the very near future. I have a 3 master / 3 slave setup, with each master linked to its corresponding slave, and auto polling disabled. We do both push (using MQ) and pull indexing using a SOLRJ indexing program. I have enabled soft commit on the slave (to view the changes pushed by the queue immediately). I am thinking of doing the batch indexing on the master (optimize and hard commit) and push indexing on both master and slave. I am trying to do more testing with my configuration but thought of getting some answers before diving very deep... Since the queue pushes the docs to both master and slave, there is a possibility of the slave having more records than the master (when the master is busy doing batch indexing). What would happen if the slave has additional segments compared to the master? Will those be deleted when the replication happens? If a message is pushed from a queue to both master and slave during replication, will there be a latency in seeing that document even if we use soft commit on the slave? We want to make sure that we are not missing any documents from the queue (since it's updated via the UI and we don't really store that data anywhere except in the index) If you are doing replication, then all updates must go to the master server. You cannot update the slave directly. When the replication happens, the slave will be identical to the master... Any documents sent to only the slave will be lost. Replication will happen according to the interval you have configured, or, since you say you have disabled polling, whenever you manually trigger a replication. SolrCloud would probably be a better fit for you. With a properly configured SolrCloud you just index to any host in the cloud and documents end up exactly where they need to go, and all replicas get updated. Thanks, Shawn
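For reference, this is roughly what the push side can look like from SolrJ with an explicit soft commit, aimed at the master per the point above that documents sent only to a slave are lost. It assumes a SolrJ 4.x HttpSolrServer; the URL and field names are placeholders.

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class QueuePush {
    public static void main(String[] args) throws Exception {
        // Always index against the master in a master/slave setup (placeholder URL).
        HttpSolrServer master = new HttpSolrServer("http://master1:8983/solr/core1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "msg-42");               // hypothetical fields from the MQ payload
        doc.addField("body", "payload from queue");
        master.add(doc);

        // waitFlush=true, waitSearcher=true, softCommit=true:
        // the document becomes searchable quickly without forcing a full hard commit.
        master.commit(true, true, true);

        master.shutdown();
    }
}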
Re: Solr HTTP Replication Question
Okay one last note... just for closure... looks like it was addressed in solr 4.1+ (I was looking at 4.0). On Thu, Jan 24, 2013 at 11:14 PM, Amit Nithian anith...@gmail.com wrote: Okay so after some debugging I found the problem. The replication piece will download the index from the master server and move the files to the index directory, but during the commit phase these older-generation files are deleted and the index is essentially left intact. I noticed that a full copy is needed if the index is stale (meaning that files in common between the master and slave have different sizes), but I also think a full copy should be needed if the slave's generation is higher than the master's. In short, to me it's not sufficient to simply say a full copy is needed if the slave's index version is >= the master's index version. I'll create a patch and file a bug along with a more thorough writeup of how I got into this state. Thanks! Amit On Thu, Jan 24, 2013 at 2:33 PM, Amit Nithian anith...@gmail.com wrote: Does Solr's replication look at the generation difference between master and slave when determining whether or not to replicate? To be more clear: what happens if a slave's generation is higher than the master's, yet the slave's index version is less than the master's index version? I looked at the source and didn't see any reason why the generation matters other than fetching the file list from the master for a given generation. It's too wordy to explain how this happened, so I'll go into details on that if anyone cares. Thanks! Amit
Re: Solr HTTP Replication Question
Okay so after some debugging I found the problem. The replication piece will download the index from the master server and move the files to the index directory, but during the commit phase these older-generation files are deleted and the index is essentially left intact. I noticed that a full copy is needed if the index is stale (meaning that files in common between the master and slave have different sizes), but I also think a full copy should be needed if the slave's generation is higher than the master's. In short, to me it's not sufficient to simply say a full copy is needed if the slave's index version is >= the master's index version. I'll create a patch and file a bug along with a more thorough writeup of how I got into this state. Thanks! Amit On Thu, Jan 24, 2013 at 2:33 PM, Amit Nithian anith...@gmail.com wrote: Does Solr's replication look at the generation difference between master and slave when determining whether or not to replicate? To be more clear: what happens if a slave's generation is higher than the master's, yet the slave's index version is less than the master's index version? I looked at the source and didn't see any reason why the generation matters other than fetching the file list from the master for a given generation. It's too wordy to explain how this happened, so I'll go into details on that if anyone cares. Thanks! Amit
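As a sketch only (this is not Solr's actual SnapPuller code), the stricter condition being proposed might look like the following, with hypothetical version and generation values:

public class FullCopyCheck {
    // Proposed rule: force a full copy when the slave's version has run ahead of the
    // master's, OR when the slave's generation is higher, OR when replication is forced.
    static boolean isFullCopyNeeded(long masterVersion, long masterGeneration,
                                    long slaveVersion, long slaveGeneration,
                                    boolean forceReplication) {
        return slaveVersion >= masterVersion
                || slaveGeneration > masterGeneration
                || forceReplication;
    }

    public static void main(String[] args) {
        // The scenario from this thread: slave generation ahead, slave version behind.
        // The generation clause makes this true; a version-only check would say false.
        System.out.println(isFullCopyNeeded(100L, 5L, 90L, 7L, false));
    }
}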
Re: SolrCloud replication question
Hi, Interesting article in your link. What servlet container do you use and how is it configured wrt. threads etc? You should be able to utilize all CPUs with a single Solr index, given that you are not I/O bound.. Also, what is your mergeFactor? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 9. juli 2012, at 22:11, avenka wrote: Hmm, never mind my question about replicating using symlinks. Given that replication on a single machine improves throughput, I should be able to get a similar improvement by simply sharding on a single machine. As also observed at http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ I am now benchmarking my workload to compare replication vs. sharding performance on a single machine. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761p3994017.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud replication question
The symlink thing sounds... complicated, but as you say you're going another route. The indexing speed you're seeing is surprisingly slow; I'd get to the root of the timeouts before giving up. SolrCloud simply _can't_ be that slow by design; something about your setup is causing that, I suspect. The timeouts you're seeing are certainly a clue here. Incoming updates have a couple of things happen: 1) the incoming request is pulled apart. Any docs for this shard are indexed and forwarded to any replicas. 2) any docs that are for a different shard are packed up and forwarded to the leader for that shard, which in turn distributes them to any replicas. So I _suspect_ that indexing will be a bit slower, since there's some additional communication going on. But not _that_ much slower. Any clue what your slow server is doing that would cause it to time out? Best Erick On Mon, Jul 9, 2012 at 4:11 PM, avenka ave...@gmail.com wrote: Hmm, never mind my question about replicating using symlinks. Given that replication on a single machine improves throughput, I should be able to get a similar improvement by simply sharding on a single machine. As also observed at http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ I am now benchmarking my workload to compare replication vs. sharding performance on a single machine. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761p3994017.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud replication question
No, you're misunderstanding the setup. Each replica has a complete index. Updates get automatically forwarded to _both_ nodes for a particular shard. So, when a doc comes in to be indexed, it gets sent to the leader for, say, shard1. From there: 1) it gets indexed on the leader, 2) it gets forwarded to the replica(s) where it gets indexed locally. Each replica has a complete index (for that shard). There is no master/slave setup any more. And you do _not_ have to configure replication. Best Erick On Sun, Jul 8, 2012 at 1:03 PM, avenka ave...@gmail.com wrote: I am trying to wrap my head around replication in SolrCloud. I tried the setup at http://wiki.apache.org/solr/SolrCloud/. I mainly need replication for high query throughput. The setup at the URL above appears to maintain just one copy of the index at the primary node (instead of a replicated index as in a master/slave configuration). Will I still get roughly an n-fold increase in query throughput with n replicas? And if so, why would one do master/slave replication with multiple copies of the index at all? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761.html Sent from the Solr - User mailing list archive at Nabble.com.
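A minimal SolrJ sketch of what that looks like in practice, assuming a 4.x CloudSolrServer (ZooKeeper addresses, collection, and fields are placeholders): the client talks to ZooKeeper, the update is routed to the shard leader, and the replicas index it themselves, with no replication handler configuration anywhere.

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexing {
    public static void main(String[] args) throws Exception {
        // Point the client at the ZooKeeper ensemble, not at any particular Solr node.
        CloudSolrServer cloud = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        cloud.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "example");
        cloud.add(doc);    // forwarded to the shard leader, then to its replicas
        cloud.commit();

        cloud.shutdown();
    }
}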
Re: SolrCloud replication question
Erick, thanks. I now do see segment files in an index.timestamp directory at the replicas. Not sure why they were not getting populated earlier. I have a couple more questions, the second is more elaborate - let me know if I should move it to a separate thread. (1) The speed of adding documents in SolrCloud is excruciatingly slow. It takes about 30-50 seconds to add a batch of 100 documents (and about twice that to add 200, etc.) to the primary but just ~10 seconds to add 5K documents in batches of 200 on a standalone solr 4 server. The log files indicate that the primary is timing out with messages like below and Cloud-Graph in the UI shows the other two replicas in orange after starting green. org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://localhost:7574/solr Any idea why? (3) I am seriously considering using symbolic links for a replicated solr setup with completely independent instances on a *single machine*. Tell me if I am thinking about this incorrectly. Here is my reasoning: (a) Master/slave replication in 3.6 simply seems old school as it doesn't have the nice consistency properties of SolrCloud. Polling say every 20 seconds means I don't know exactly how up-to-speed each replica is, which will complicate my request re-distribution. (b) SolrCloud seems like a great alternative to master/slave replication. But it seems slow (see 1) and having played with it, I don't feel comfortable with the maturity of ZK integration (or my comprehension of it) in solr 4 alpha. (c) Symbolic links seem like the fastest and most space-efficient solution *provided* there is only a single writer, which is just fine for me. I plan to run completely separate solr instances with one designated as the primary and do the following operations in sequence: Add a batch to the primary and commit -- From each replica's index directory, remove all symlinks and re-create symlinks to segment files in the primary (but not the write.lock file) -- Call update?commit=true to force replicas to re-load their in-memory index -- Do whatever read-only processing is required on the batch using the primary and all replicas by manually (randomly) distributing read requests -- Repeat sequence. Is there any downside to 3(c) (other than maintaining a trivial script to manage symlinks and call commit)? I tested it on small index sizes and it seems to work fine. The throughput improves with more replicas (for 2-4 replicas) as a single replica is not enough to saturate the machine (due to high query latency). Am I overlooking something in this setup? Overall, I need high throughput and minimal latency from the time a document is added to the time it is available at a replica. SolrCloud's automated request redirection, consistency, and fault-tolerance is awesome for a physically distributed setup, but I don't see how it beats 3(c) in a single-writer, single-machine, replicated setup. AV On Jul 9, 2012, at 9:43 AM, Erick Erickson [via Lucene] wrote: No, you're misunderstanding the setup. Each replica has a complete index. Updates get automatically forwarded to _both_ nodes for a particular shard. So, when a doc comes in to be indexed, it gets sent to the leader for, say, shard1. From there: 1 it gets indexed on the leader 2 it gets forwarded to the replica(s) where it gets indexed locally. Each replica has a complete index (for that shard). There is no master/slave setup any more. And you do _not_ have to configure replication. 
Best Erick On Sun, Jul 8, 2012 at 1:03 PM, avenka [hidden email] wrote: I am trying to wrap my head around replication in SolrCloud. I tried the setup at http://wiki.apache.org/solr/SolrCloud/. I mainly need replication for high query throughput. The setup at the URL above appears to maintain just one copy of the index at the primary node (instead of a replicated index as in a master/slave configuration). Will I still get roughly an n-fold increase in query throughput with n replicas? And if so, why would one do master/slave replication with multiple copies of the index at all? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761p3993960.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud replication question
Hmm, never mind my question about replicating using symlinks. Given that replication on a single machine improves throughput, I should be able to get a similar improvement by simply sharding on a single machine. As also observed at http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ I am now benchmarking my workload to compare replication vs. sharding performance on a single machine. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761p3994017.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrCloud replication question
I am trying to wrap my head around replication in SolrCloud. I tried the setup at http://wiki.apache.org/solr/SolrCloud/. I mainly need replication for high query throughput. The setup at the URL above appears to maintain just one copy of the index at the primary node (instead of a replicated index as in a master/slave configuration). Will I still get roughly an n-fold increase in query throughput with n replicas? And if so, why would one do master/slave replication with multiple copies of the index at all? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Simple Slave Replication Question
Hello, Had to leave the office so didn't get a chance to reply. Nothing in the logs. Just ran one through from the ingest tool. Same results full copy of the index. Is it something to do with: server.commit(); server.optimize(); I call this at the end of the ingestion. Would optimize then work across the whole index? Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 15:10 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Also, what happens if, instead of adding the 40K docs you add just one and commit? 2012/3/23 Tomás Fernández Löbbe tomasflo...@gmail.com Have you changed the mergeFactor or are you using 10 as in the example solrconfig? What do you see in the slave's log during replication? Do you see any line like Skipping download for...? On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: I just have a index directory. I push the documents through with a change to a field. Im using SOLRJ to do this. Im using the guide from the wiki to setup the replication. When the feed of updates to the master finishes I call a commit again using SOLRJ. I then have a poll period of 5 minutes from the slave. When it kicks in I see a new version of the index and then it copys the full 5gb index. Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 14:29 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Hi Ben, only new segments are replicated from master to slave. In a situation where all the segments are new, this will cause the index to be fully replicated, but this rarely happen with incremental updates. It can also happen if the slave Solr assumes it has an invalid index. Are you committing or optimizing on the slaves? After replication, the index directory on the slaves is called index or index.timestamp? Tomás On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: So do you just simpy address this with big nic and network pipes. -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, Im looking at the replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master is it standard to always copy over the full index, currently 5gb in size. If this is standard what do people do who have massive 200gb indexs, does it not take a while to bring the slaves inline with the master? Thanks Ben This e-mail is sent on behalf of Trader Media Group Limited, Registered Office: Auto Trader House, Cutbush Park Industrial Estate, Danehill, Lower Earley, Reading, Berkshire, RG6 4UT(Registered in England No. 4768833). This email and any files transmitted with it are confidential and may be legally privileged, and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender. This email message has been swept for the presence of computer viruses. 
Re: Simple Slave Replication Question
It's the optimize step. Optimize essentially forces all the segments to be copied into a single new segment, which means that your entire index will be replicated to the slaves. In recent Solrs, there's usually no need to optimize, so unless and until you can demonstrate a noticeable change, I'd just leave the optimize step off. In fact, trunk renames it to forceMerge or something just because it's so common for people to think "of course I want to optimize my index!" and get the unintended consequences you're seeing, even though the optimize doesn't actually do that much good in most cases. Some people just do the optimize once a day (or week or whatever) during off-peak hours as a compromise. Best Erick On Mon, Mar 26, 2012 at 5:02 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, Had to leave the office so didn't get a chance to reply. Nothing in the logs. Just ran one through from the ingest tool. Same results full copy of the index. Is it something to do with: server.commit(); server.optimize(); I call this at the end of the ingestion. Would optimize then work across the whole index? Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 15:10 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Also, what happens if, instead of adding the 40K docs you add just one and commit? 2012/3/23 Tomás Fernández Löbbe tomasflo...@gmail.com Have you changed the mergeFactor or are you using 10 as in the example solrconfig? What do you see in the slave's log during replication? Do you see any line like Skipping download for...? On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: I just have a index directory. I push the documents through with a change to a field. Im using SOLRJ to do this. Im using the guide from the wiki to setup the replication. When the feed of updates to the master finishes I call a commit again using SOLRJ. I then have a poll period of 5 minutes from the slave. When it kicks in I see a new version of the index and then it copys the full 5gb index. Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 14:29 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Hi Ben, only new segments are replicated from master to slave. In a situation where all the segments are new, this will cause the index to be fully replicated, but this rarely happen with incremental updates. It can also happen if the slave Solr assumes it has an invalid index. Are you committing or optimizing on the slaves? After replication, the index directory on the slaves is called index or index.timestamp? Tomás On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: So do you just simpy address this with big nic and network pipes. -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, Im looking at the replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master is it standard to always copy over the full index, currently 5gb in size.
If this is standard what do people do who have massive 200gb indexs, does it not take a while to bring the slaves inline with the master? Thanks Ben
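For illustration, a hedged SolrJ sketch of the end of such an ingest run without the optimize call (master URL and fields are placeholders, assuming a 4.x-style HttpSolrServer). As explained above, a plain commit produces only new segments for the slaves to copy, whereas an optimize rewrites the whole index into one segment and forces a full copy.

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class Ingest {
    public static void main(String[] args) throws Exception {
        HttpSolrServer master = new HttpSolrServer("http://master:8983/solr"); // placeholder URL

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 40000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "rec-" + i);     // hypothetical fields
            doc.addField("price", i);
            batch.add(doc);
            if (batch.size() == 1000) {         // send in manageable chunks
                master.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) master.add(batch);

        master.commit();      // new segments only; slaves pull just the changed files
        // master.optimize(); // deliberately omitted: would merge to one segment and
                              // trigger a full 5gb copy on every poll

        master.shutdown();
    }
}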
RE: Simple Slave Replication Question
That's great information. Thanks for all the help and guidance, its been invaluable. Thanks Ben -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 26 March 2012 12:21 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question It's the optimize step. Optimize essentially forces all the segments to be copied into a single new segment, which means that your entire index will be replicated to the slaves. In recent Solrs, there's usually no need to optimize, so unless and until you can demonstrate a noticeable change, I'd just leave the optimize step off. In fact, trunk renames it to forceMerge or something just because it's so common for people to think of course I want to optimize my index! and get the unintended consequences you're seeing even thought the optimize doesn't actually do that much good in most cases. Some people just do the optimize once a day (or week or whatever) during off-peak hours as a compromise. Best Erick On Mon, Mar 26, 2012 at 5:02 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, Had to leave the office so didn't get a chance to reply. Nothing in the logs. Just ran one through from the ingest tool. Same results full copy of the index. Is it something to do with: server.commit(); server.optimize(); I call this at the end of the ingestion. Would optimize then work across the whole index? Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 15:10 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Also, what happens if, instead of adding the 40K docs you add just one and commit? 2012/3/23 Tomás Fernández Löbbe tomasflo...@gmail.com Have you changed the mergeFactor or are you using 10 as in the example solrconfig? What do you see in the slave's log during replication? Do you see any line like Skipping download for...? On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: I just have a index directory. I push the documents through with a change to a field. Im using SOLRJ to do this. Im using the guide from the wiki to setup the replication. When the feed of updates to the master finishes I call a commit again using SOLRJ. I then have a poll period of 5 minutes from the slave. When it kicks in I see a new version of the index and then it copys the full 5gb index. Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 14:29 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Hi Ben, only new segments are replicated from master to slave. In a situation where all the segments are new, this will cause the index to be fully replicated, but this rarely happen with incremental updates. It can also happen if the slave Solr assumes it has an invalid index. Are you committing or optimizing on the slaves? After replication, the index directory on the slaves is called index or index.timestamp? Tomás On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: So do you just simpy address this with big nic and network pipes. -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. 
/Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, Im looking at the replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master is it standard to always copy over the full index, currently 5gb in size. If this is standard what do people do who have massive 200gb indexs, does it not take a while to bring the slaves inline with the master? Thanks Ben
Simple Slave Replication Question
Hello, I'm looking at replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master, is it standard to always copy over the full index, currently 5gb in size? If this is standard, what do people do who have massive 200gb indexes? Does it not take a while to bring the slaves in line with the master? Thanks Ben
Re: Simple Slave Replication Question
I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, Im looking at the replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master is it standard to always copy over the full index, currently 5gb in size. If this is standard what do people do who have massive 200gb indexs, does it not take a while to bring the slaves inline with the master? Thanks Ben This e-mail is sent on behalf of Trader Media Group Limited, Registered Office: Auto Trader House, Cutbush Park Industrial Estate, Danehill, Lower Earley, Reading, Berkshire, RG6 4UT(Registered in England No. 4768833). This email and any files transmitted with it are confidential and may be legally privileged, and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender. This email message has been swept for the presence of computer viruses.
RE: Simple Slave Replication Question
So do you just simply address this with big NICs and network pipes? -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, I'm looking at replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master, is it standard to always copy over the full index, currently 5GB in size? If this is standard, what do people do who have massive 200GB indexes? Does it not take a while to bring the slaves in line with the master? Thanks Ben
Re: Simple Slave Replication Question
Hi Ben, only new segments are replicated from master to slave. In a situation where all the segments are new, this will cause the index to be fully replicated, but this rarely happens with incremental updates. It can also happen if the slave Solr assumes it has an invalid index. Are you committing or optimizing on the slaves? After replication, is the index directory on the slaves called index or index.timestamp? Tomás On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: So do you just simply address this with big NICs and network pipes? -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, I'm looking at replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master, is it standard to always copy over the full index, currently 5GB in size? If this is standard, what do people do who have massive 200GB indexes? Does it not take a while to bring the slaves in line with the master? Thanks Ben
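If it helps with diagnosis, the slave's replication handler can also be asked what it last pulled. The sketch below simply issues an HTTP GET against the handler and prints the response; the host, port, and core path are assumptions, while command=details is the standard ReplicationHandler status command.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class ReplicationDetails {
    public static void main(String[] args) throws Exception {
        // The details command reports index version, generation and the time
        // of the last replication cycle on the slave.
        URL url = new URL("http://slave-host:8983/solr/replication?command=details&wt=json");
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        in.close();
    }
}

Comparing the reported index generation before and after a slave poll makes it easier to see whether the slave fetched a few new segments or decided to do a full copy.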
RE: Simple Slave Replication Question
I just have an index directory. I push the documents through with a change to a field. I'm using SolrJ to do this. I'm using the guide from the wiki to set up the replication. When the feed of updates to the master finishes I call a commit again using SolrJ. I then have a poll period of 5 minutes from the slave. When it kicks in I see a new version of the index and then it copies the full 5GB index. Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 14:29 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Hi Ben, only new segments are replicated from master to slave. In a situation where all the segments are new, this will cause the index to be fully replicated, but this rarely happens with incremental updates. It can also happen if the slave Solr assumes it has an invalid index. Are you committing or optimizing on the slaves? After replication, is the index directory on the slaves called index or index.timestamp? Tomás On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: So do you just simply address this with big NICs and network pipes? -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, I'm looking at replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master, is it standard to always copy over the full index, currently 5GB in size? If this is standard, what do people do who have massive 200GB indexes? Does it not take a while to bring the slaves in line with the master? Thanks Ben
Re: Simple Slave Replication Question
Have you changed the mergeFactor or are you using 10 as in the example solrconfig? What do you see in the slave's log during replication? Do you see any line like "Skipping download for..."? On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: I just have an index directory. I push the documents through with a change to a field. I'm using SolrJ to do this. I'm using the guide from the wiki to set up the replication. When the feed of updates to the master finishes I call a commit again using SolrJ. I then have a poll period of 5 minutes from the slave. When it kicks in I see a new version of the index and then it copies the full 5GB index. Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 14:29 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Hi Ben, only new segments are replicated from master to slave. In a situation where all the segments are new, this will cause the index to be fully replicated, but this rarely happens with incremental updates. It can also happen if the slave Solr assumes it has an invalid index. Are you committing or optimizing on the slaves? After replication, is the index directory on the slaves called index or index.timestamp? Tomás On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: So do you just simply address this with big NICs and network pipes? -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, I'm looking at replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master, is it standard to always copy over the full index, currently 5GB in size? If this is standard, what do people do who have massive 200GB indexes? Does it not take a while to bring the slaves in line with the master? Thanks Ben
Re: Simple Slave Replication Question
Also, what happens if, instead of adding the 40K docs, you add just one and commit? 2012/3/23 Tomás Fernández Löbbe tomasflo...@gmail.com Have you changed the mergeFactor or are you using 10 as in the example solrconfig? What do you see in the slave's log during replication? Do you see any line like "Skipping download for..."? On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: I just have an index directory. I push the documents through with a change to a field. I'm using SolrJ to do this. I'm using the guide from the wiki to set up the replication. When the feed of updates to the master finishes I call a commit again using SolrJ. I then have a poll period of 5 minutes from the slave. When it kicks in I see a new version of the index and then it copies the full 5GB index. Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 14:29 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Hi Ben, only new segments are replicated from master to slave. In a situation where all the segments are new, this will cause the index to be fully replicated, but this rarely happens with incremental updates. It can also happen if the slave Solr assumes it has an invalid index. Are you committing or optimizing on the slaves? After replication, is the index directory on the slaves called index or index.timestamp? Tomás On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: So do you just simply address this with big NICs and network pipes? -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, I'm looking at replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master, is it standard to always copy over the full index, currently 5GB in size? If this is standard, what do people do who have massive 200GB indexes? Does it not take a while to bring the slaves in line with the master? Thanks Ben
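Tomás's one-document experiment is easy to script. A rough SolrJ sketch (host and field names are invented) would update a single existing document and commit; if the slave's next poll still triggers a full 5GB download, the problem is not the size of the 40K batch itself.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SingleDocTest {
    public static void main(String[] args) throws Exception {
        SolrServer master = new CommonsHttpSolrServer("http://master-host:8983/solr");

        // Re-index one existing document with a small change to a field.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");                 // an existing unique key
        doc.addField("title", "single-doc change");  // the modified field
        master.add(doc);

        // Commit only (no optimize) so the change lands in one tiny new segment.
        master.commit();
    }
}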
Re: SolrCloud Replication Question
On Feb 14, 2012, at 10:57 PM, Jamie Johnson wrote: Not sure if this is expected or not. Nope - should be already resolved or will be today though. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Ok, great. Just wanted to make sure someone was aware. Thanks for looking into this. On Thu, Feb 16, 2012 at 8:26 AM, Mark Miller markrmil...@gmail.com wrote: On Feb 14, 2012, at 10:57 PM, Jamie Johnson wrote: Not sure if this is expected or not. Nope - should be already resolved or will be today though. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti:
Re: SolrCloud Replication Question
Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti: - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Thanks Mark, not a huge rush, just me trying to get to use the latest stuff on our project. On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote: Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti: - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Okay Jamie, I think I have a handle on this. It looks like an issue with what config files are being used by cores created with the admin core handler - I think it's just picking up default config and not the correct config for the collection. This means they end up using config that has no UpdateLog defined - and so recovery fails. I've added more logging around this so that it's easy to determine that. I'm investigating more and working on a test + fix. I'll file a JIRA issue soon as well. - Mark On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote: Thanks Mark, not a huge rush, just me trying to get to use the latest stuff on our project. On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote: Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti: - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Sounds good, if I pull the latest from trunk and rerun will that be useful or were you able to duplicate my issue now? On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller markrmil...@gmail.com wrote: Okay Jamie, I think I have a handle on this. It looks like an issue with what config files are being used by cores created with the admin core handler - I think it's just picking up default config and not the correct config for the collection. This means they end up using config that has no UpdateLog defined - and so recovery fails. I've added more logging around this so that it's easy to determine that. I'm investigating more and working on a test + fix. I'll file a JIRA issue soon as well. - Mark On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote: Thanks Mark, not a huge rush, just me trying to get to use the latest stuff on our project. On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote: Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti: - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Doh - looks like I was just seeing a test issue. Do you mind updating and trying the latest rev? At the least there should be some better logging around the recovery. I'll keep working on tests in the meantime. - Mark On Feb 14, 2012, at 3:15 PM, Jamie Johnson wrote: Sounds good, if I pull the latest from trunk and rerun will that be useful or were you able to duplicate my issue now? On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller markrmil...@gmail.com wrote: Okay Jamie, I think I have a handle on this. It looks like an issue with what config files are being used by cores created with the admin core handler - I think it's just picking up default config and not the correct config for the collection. This means they end up using config that has no UpdateLog defined - and so recovery fails. I've added more logging around this so that it's easy to determine that. I'm investigating more and working on a test + fix. I'll file a JIRA issue soon as well. - Mark On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote: Thanks Mark, not a huge rush, just me trying to get to use the latest stuff on our project. On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote: Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti: - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Doing so now, will let you know if I continue to see the same issues On Tue, Feb 14, 2012 at 4:59 PM, Mark Miller markrmil...@gmail.com wrote: Doh - looks like I was just seeing a test issue. Do you mind updating and trying the latest rev? At the least there should be some better logging around the recovery. I'll keep working on tests in the meantime. - Mark On Feb 14, 2012, at 3:15 PM, Jamie Johnson wrote: Sounds good, if I pull the latest from trunk and rerun will that be useful or were you able to duplicate my issue now? On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller markrmil...@gmail.com wrote: Okay Jamie, I think I have a handle on this. It looks like an issue with what config files are being used by cores created with the admin core handler - I think it's just picking up default config and not the correct config for the collection. This means they end up using config that has no UpdateLog defined - and so recovery fails. I've added more logging around this so that it's easy to determine that. I'm investigating more and working on a test + fix. I'll file a JIRA issue soon as well. - Mark On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote: Thanks Mark, not a huge rush, just me trying to get to use the latest stuff on our project. On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote: Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti: - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
All of the nodes now show as being Active. When starting the replicas I did receive the following message though. Not sure if this is expected or not. INFO: Attempting to replicate from http://JamiesMac.local:8501/solr/slice2_shard2/ Feb 14, 2012 10:53:34 PM org.apache.solr.common.SolrException log SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: null java.lang.NullPointerException at org.apache.solr.handler.admin.CoreAdminHandler.handlePrepRecoveryAction(CoreAdminHandler.java:646) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:358) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:172) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) null java.lang.NullPointerExceptionat org.apache.solr.handler.admin.CoreAdminHandler.handlePrepRecoveryAction(CoreAdminHandler.java:646) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:358) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:172) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) request: http://JamiesMac.local:8501/solr/admin/cores?action=PREPRECOVERY&core=slice2_shard2&nodeName=JamiesMac.local:8502_solr&coreNodeName=JamiesMac.local:8502_solr_slice2_shard1&wt=javabin&version=2 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:433) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251) at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:120) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:208) Feb 14, 2012 10:53:34 PM org.apache.solr.update.UpdateLog dropBufferedUpdates
Re: SolrCloud Replication Question
I don't see any errors in the log. Here are the scripts I'm running, and to create the cores I run the following commands: curl 'http://localhost:8501/solr/admin/cores?action=CREATE&name=slice1_shard1&collection=collection1&shard=slice1&collection.configName=config1' curl 'http://localhost:8501/solr/admin/cores?action=CREATE&name=slice2_shard2&collection=collection1&shard=slice2&collection.configName=config1' curl 'http://localhost:8502/solr/admin/cores?action=CREATE&name=slice2_shard1&collection=collection1&shard=slice2&collection.configName=config1' curl 'http://localhost:8502/solr/admin/cores?action=CREATE&name=slice1_shard2&collection=collection1&shard=slice1&collection.configName=config1' After doing this the nodes are immediately marked as down in clusterstate.json. Restarting the solr instances, I see that whichever I start first shows up as active, and the other is down. There are no errors in the logs either. On Sat, Feb 11, 2012 at 9:48 PM, Mark Miller markrmil...@gmail.com wrote: Yeah, that is what I would expect - for a node to be marked as down, it either didn't finish starting, or it gave up recovering...either case should be logged. You might try searching for the recover keyword and see if there are any interesting bits around that. Meanwhile, I have dug up a couple issues around recovery and committed fixes to trunk - still playing around... On Feb 11, 2012, at 8:44 PM, Jamie Johnson wrote: I didn't see anything in the logs, would it be an error? On Sat, Feb 11, 2012 at 3:58 PM, Mark Miller markrmil...@gmail.com wrote: On Feb 11, 2012, at 3:08 PM, Jamie Johnson wrote: I wiped the zk and started over (when I switch networks I get different host names and honestly haven't dug into why). That being said the latest state shows all in sync, why would the cores show up as down? If recovery fails X times (say because the leader can't be reached from the replica), a node is marked as down. It can't be active, and technically it has stopped trying to recover (it tries X times and eventually gives up until you restart it). Side note, I recently ran into this issue: SOLR-3122 - fix coming soon. Not sure if you have looked at your logs or not, but perhaps it's involved. - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com bootstrap.sh Description: Bourne shell script start.sh Description: Bourne shell script start.sh Description: Bourne shell script <?xml version="1.0" encoding="UTF-8" ?> <solr persistent="true"> <cores adminPath="/admin/cores" zkClientTimeout="1" hostPort="8501" hostContext="solr"> </cores> </solr>
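For completeness, the same four CoreAdmin CREATE calls can be issued from Java instead of curl. The sketch below just replays the URLs above with plain HTTP GETs; the ports, core names, and config name come from the commands above and are otherwise specific to this particular setup.

import java.io.InputStream;
import java.net.URL;

public class CreateCores {
    public static void main(String[] args) throws Exception {
        String[] creates = {
            "http://localhost:8501/solr/admin/cores?action=CREATE&name=slice1_shard1&collection=collection1&shard=slice1&collection.configName=config1",
            "http://localhost:8501/solr/admin/cores?action=CREATE&name=slice2_shard2&collection=collection1&shard=slice2&collection.configName=config1",
            "http://localhost:8502/solr/admin/cores?action=CREATE&name=slice2_shard1&collection=collection1&shard=slice2&collection.configName=config1",
            "http://localhost:8502/solr/admin/cores?action=CREATE&name=slice1_shard2&collection=collection1&shard=slice1&collection.configName=config1"
        };
        for (String u : creates) {
            // Each request creates one core and registers it with its collection/shard in ZooKeeper.
            InputStream in = new URL(u).openStream();
            in.close();
            System.out.println("created: " + u);
        }
    }
}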
Re: SolrCloud Replication Question
Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com wrote:
Re: SolrCloud Replication Question
Yes, I have the following layout on the FS:

./bootstrap.sh
./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc)
./slice1
  - start.sh
  - solr.xml
  - slice1_shard1
    - data
  - slice2_shard2
    - data
./slice2
  - start.sh
  - solr.xml
  - slice2_shard1
    - data
  - slice1_shard2
    - data

If it matters, I'm running everything from localhost, zk and the solr shards. On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com wrote:
Re: SolrCloud Replication Question
On Feb 10, 2012, at 9:40 PM, Jamie Johnson wrote: how'd you resolve this issue? I was basing my guess on seeing JamiesMac.local and jamiesmac in your first cluster state dump - your latest doesn't seem to mismatch like that though. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
I wiped the zk and started over (when I switch networks I get different host names and honestly haven't dug into why). That being said the latest state shows all in sync, why would the cores show up as down? On Sat, Feb 11, 2012 at 11:08 AM, Mark Miller markrmil...@gmail.com wrote: On Feb 10, 2012, at 9:40 PM, Jamie Johnson wrote: how'd you resolve this issue? I was basing my guess on seeing JamiesMac.local and jamiesmac in your first cluster state dump - your latest doesn't seem to mismatch like that though. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
On Feb 11, 2012, at 3:08 PM, Jamie Johnson wrote: I wiped the zk and started over (when I switch networks I get different host names and honestly haven't dug into why). That being said the latest state shows all in sync, why would the cores show up as down? If recovery fails X times (say because the leader can't be reached from the replica), a node is marked as down. It can't be active, and technically it has stopped trying to recover (it tries X times and eventually give up until you restart it). Side note, I recently ran into this issue: SOLR-3122 - fix coming soon. Not sure if you have looked at your logs or not, but perhaps it's involved. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
I didn't see anything in the logs, would it be an error? On Sat, Feb 11, 2012 at 3:58 PM, Mark Miller markrmil...@gmail.com wrote: On Feb 11, 2012, at 3:08 PM, Jamie Johnson wrote: I wiped the zk and started over (when I switch networks I get different host names and honestly haven't dug into why). That being said the latest state shows all in sync, why would the cores show up as down? If recovery fails X times (say because the leader can't be reached from the replica), a node is marked as down. It can't be active, and technically it has stopped trying to recover (it tries X times and eventually give up until you restart it). Side note, I recently ran into this issue: SOLR-3122 - fix coming soon. Not sure if you have looked at your logs or not, but perhaps it's involved. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Yeah, that is what I would expect - for a node to be marked as down, it either didn't finish starting, or it gave up recovering...either case should be logged. You might try searching for the recover keyword and see if there are any interesting bits around that. Meanwhile, I have dug up a couple issues around recovery and committed fixes to trunk - still playing around... On Feb 11, 2012, at 8:44 PM, Jamie Johnson wrote: I didn't see anything in the logs, would it be an error? On Sat, Feb 11, 2012 at 3:58 PM, Mark Miller markrmil...@gmail.com wrote: On Feb 11, 2012, at 3:08 PM, Jamie Johnson wrote: I wiped the zk and started over (when I switch networks I get different host names and honestly haven't dug into why). That being said the latest state shows all in sync, why would the cores show up as down? If recovery fails X times (say because the leader can't be reached from the replica), a node is marked as down. It can't be active, and technically it has stopped trying to recover (it tries X times and eventually give up until you restart it). Side note, I recently ran into this issue: SOLR-3122 - fix coming soon. Not sure if you have looked at your logs or not, but perhaps it's involved. - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
SolrCloud Replication Question
I know that the latest Solr Cloud doesn't use standard replication but I have a question about how it appears to be working. I currently have the following cluster state:

{"collection1":{
  "slice1":{
    "JamiesMac.local:8501_solr_slice1_shard1":{"shard_id":"slice1", "state":"active", "core":"slice1_shard1", "collection":"collection1", "node_name":"JamiesMac.local:8501_solr", "base_url":"http://JamiesMac.local:8501/solr"},
    "JamiesMac.local:8502_solr_slice1_shard2":{"shard_id":"slice1", "state":"active", "core":"slice1_shard2", "collection":"collection1", "node_name":"JamiesMac.local:8502_solr", "base_url":"http://JamiesMac.local:8502/solr"},
    "jamiesmac:8501_solr_slice1_shard1":{"shard_id":"slice1", "state":"down", "core":"slice1_shard1", "collection":"collection1", "node_name":"jamiesmac:8501_solr", "base_url":"http://jamiesmac:8501/solr"},
    "jamiesmac:8502_solr_slice1_shard2":{"shard_id":"slice1", "leader":"true", "state":"active", "core":"slice1_shard2", "collection":"collection1", "node_name":"jamiesmac:8502_solr", "base_url":"http://jamiesmac:8502/solr"}},
  "slice2":{
    "JamiesMac.local:8501_solr_slice2_shard2":{"shard_id":"slice2", "state":"active", "core":"slice2_shard2", "collection":"collection1", "node_name":"JamiesMac.local:8501_solr", "base_url":"http://JamiesMac.local:8501/solr"},
    "JamiesMac.local:8502_solr_slice2_shard1":{"shard_id":"slice2", "state":"active", "core":"slice2_shard1", "collection":"collection1", "node_name":"JamiesMac.local:8502_solr", "base_url":"http://JamiesMac.local:8502/solr"},
    "jamiesmac:8501_solr_slice2_shard2":{"shard_id":"slice2", "state":"down", "core":"slice2_shard2", "collection":"collection1", "node_name":"jamiesmac:8501_solr", "base_url":"http://jamiesmac:8501/solr"},
    "jamiesmac:8502_solr_slice2_shard1":{"shard_id":"slice2", "leader":"true", "state":"active", "core":"slice2_shard1", "collection":"collection1", "node_name":"jamiesmac:8502_solr", "base_url":"http://jamiesmac:8502/solr"}}}}

I then added some docs to the following shards using SolrJ: http://localhost:8502/solr/slice2_shard1 http://localhost:8502/solr/slice1_shard2 I then bring back up the other cores and I don't see replication happening. Looking at the stats for each core I see that on the 8501 instance (the instance that was off) the number of docs is 0, so I've clearly set something up incorrectly. Any help on this would be greatly appreciated.
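One thing that may matter for this symptom is that the documents are being posted straight to individual core URLs. With the cloud-aware SolrJ client they can instead be sent via ZooKeeper, which routes each document to the current shard leader and lets the normal distributed-update path forward it to the replicas. A rough sketch follows; it assumes the trunk-era CloudSolrServer client and its setDefaultCollection method, so treat the exact class and method names as assumptions for whatever revision is in use.

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndex {
    public static void main(String[] args) throws Exception {
        // Connect through ZooKeeper rather than a fixed core URL, so the client
        // always sees the current cluster state and shard leaders.
        CloudSolrServer cloud = new CloudSolrServer("localhost:2181");
        cloud.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "cloud-doc-1");
        cloud.add(doc);
        cloud.commit();
    }
}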
Re: SolrCloud Replication Question
Can you explain a little more how you doing this? How are you bringing the cores down and then back up? Shutting down a full solr instance, unloading the core? On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: I know that the latest Solr Cloud doesn't use standard replication but I have a question about how it appears to be working. I currently have the following cluster state {collection1:{ slice1:{ JamiesMac.local:8501_solr_slice1_shard1:{ shard_id:slice1, state:active, core:slice1_shard1, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice1_shard2:{ shard_id:slice1, state:active, core:slice1_shard2, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice1_shard1:{ shard_id:slice1, state:down, core:slice1_shard1, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice1_shard2:{ shard_id:slice1, leader:true, state:active, core:slice1_shard2, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr}}, slice2:{ JamiesMac.local:8501_solr_slice2_shard2:{ shard_id:slice2, state:active, core:slice2_shard2, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice2_shard1:{ shard_id:slice2, state:active, core:slice2_shard1, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice2_shard2:{ shard_id:slice2, state:down, core:slice2_shard2, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice2_shard1:{ shard_id:slice2, leader:true, state:active, core:slice2_shard1, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr I then added some docs to the following shards using SolrJ http://localhost:8502/solr/slice2_shard1 http://localhost:8502/solr/slice1_shard2 I then bring back up the other cores and I don't see replication happening. Looking at the stats for each core I see that on the 8501 instance (the instance that was off) the number of docs is 0, so I've clearly set something up incorrectly. Any help on this would be greatly appreciated. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Sorry, I shut down the full solr instance. On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller markrmil...@gmail.com wrote: Can you explain a little more how you doing this? How are you bringing the cores down and then back up? Shutting down a full solr instance, unloading the core? On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: I know that the latest Solr Cloud doesn't use standard replication but I have a question about how it appears to be working. I currently have the following cluster state {collection1:{ slice1:{ JamiesMac.local:8501_solr_slice1_shard1:{ shard_id:slice1, state:active, core:slice1_shard1, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice1_shard2:{ shard_id:slice1, state:active, core:slice1_shard2, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice1_shard1:{ shard_id:slice1, state:down, core:slice1_shard1, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice1_shard2:{ shard_id:slice1, leader:true, state:active, core:slice1_shard2, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr}}, slice2:{ JamiesMac.local:8501_solr_slice2_shard2:{ shard_id:slice2, state:active, core:slice2_shard2, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice2_shard1:{ shard_id:slice2, state:active, core:slice2_shard1, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice2_shard2:{ shard_id:slice2, state:down, core:slice2_shard2, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice2_shard1:{ shard_id:slice2, leader:true, state:active, core:slice2_shard1, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr I then added some docs to the following shards using SolrJ http://localhost:8502/solr/slice2_shard1 http://localhost:8502/solr/slice1_shard2 I then bring back up the other cores and I don't see replication happening. Looking at the stats for each core I see that on the 8501 instance (the instance that was off) the number of docs is 0, so I've clearly set something up incorrectly. Any help on this would be greatly appreciated. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Sorry for pinging this again, is more information needed on this? I can provide more details but am not sure what to provide. On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson jej2...@gmail.com wrote: Sorry, I shut down the full solr instance. On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller markrmil...@gmail.com wrote: Can you explain a little more how you doing this? How are you bringing the cores down and then back up? Shutting down a full solr instance, unloading the core? On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: I know that the latest Solr Cloud doesn't use standard replication but I have a question about how it appears to be working. I currently have the following cluster state {collection1:{ slice1:{ JamiesMac.local:8501_solr_slice1_shard1:{ shard_id:slice1, state:active, core:slice1_shard1, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice1_shard2:{ shard_id:slice1, state:active, core:slice1_shard2, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice1_shard1:{ shard_id:slice1, state:down, core:slice1_shard1, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice1_shard2:{ shard_id:slice1, leader:true, state:active, core:slice1_shard2, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr}}, slice2:{ JamiesMac.local:8501_solr_slice2_shard2:{ shard_id:slice2, state:active, core:slice2_shard2, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice2_shard1:{ shard_id:slice2, state:active, core:slice2_shard1, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice2_shard2:{ shard_id:slice2, state:down, core:slice2_shard2, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice2_shard1:{ shard_id:slice2, leader:true, state:active, core:slice2_shard1, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr I then added some docs to the following shards using SolrJ http://localhost:8502/solr/slice2_shard1 http://localhost:8502/solr/slice1_shard2 I then bring back up the other cores and I don't see replication happening. Looking at the stats for each core I see that on the 8501 instance (the instance that was off) the number of docs is 0, so I've clearly set something up incorrectly. Any help on this would be greatly appreciated. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
I'm trying, but so far I don't see anything. I'll have to try and mimic your setup closer it seems. I tried starting up 6 solr instances on different ports as 2 shards, each with a replication factor of 3. Then I indexed 20k documents to the cluster and verified doc counts. Then I shutdown all the replicas so that only one instance served each shard. Then I indexed 20k documents to the cluster. Then I started the downed nodes and verified that they where in a recovery state. After enough time went by I checked and verified document counts on each instance - they where as expected. I guess next I can try a similar experiment using multiple cores, but if you notice anything that stands out that is largely different in what you are doing, let me know. The cores that are behind, does it say they are down, recovering, or active in zookeeper? On Feb 10, 2012, at 4:48 PM, Jamie Johnson wrote: Sorry for pinging this again, is more information needed on this? I can provide more details but am not sure what to provide. On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson jej2...@gmail.com wrote: Sorry, I shut down the full solr instance. On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller markrmil...@gmail.com wrote: Can you explain a little more how you doing this? How are you bringing the cores down and then back up? Shutting down a full solr instance, unloading the core? On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: I know that the latest Solr Cloud doesn't use standard replication but I have a question about how it appears to be working. I currently have the following cluster state {collection1:{ slice1:{ JamiesMac.local:8501_solr_slice1_shard1:{ shard_id:slice1, state:active, core:slice1_shard1, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice1_shard2:{ shard_id:slice1, state:active, core:slice1_shard2, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice1_shard1:{ shard_id:slice1, state:down, core:slice1_shard1, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice1_shard2:{ shard_id:slice1, leader:true, state:active, core:slice1_shard2, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr}}, slice2:{ JamiesMac.local:8501_solr_slice2_shard2:{ shard_id:slice2, state:active, core:slice2_shard2, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice2_shard1:{ shard_id:slice2, state:active, core:slice2_shard1, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice2_shard2:{ shard_id:slice2, state:down, core:slice2_shard2, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice2_shard1:{ shard_id:slice2, leader:true, state:active, core:slice2_shard1, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr I then added some docs to the following shards using SolrJ http://localhost:8502/solr/slice2_shard1 http://localhost:8502/solr/slice1_shard2 I then bring back up the other cores and I don't see replication happening. Looking at the stats for each core I see that on the 8501 instance (the instance that was off) the number of docs is 0, so I've clearly set something up incorrectly. 
Any help on this would be greatly appreciated. - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
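Mark's verification step (checking document counts on every instance after recovery) is straightforward to automate as well. A small SolrJ sketch along these lines, with core URLs matching the layout discussed in this thread, asks each core for its total hit count; replicas of the same shard should report the same number, so a core that is consistently lower is the one that missed the updates.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CountDocs {
    public static void main(String[] args) throws Exception {
        String[] cores = {
            "http://localhost:8501/solr/slice1_shard1",
            "http://localhost:8501/solr/slice2_shard2",
            "http://localhost:8502/solr/slice2_shard1",
            "http://localhost:8502/solr/slice1_shard2"
        };
        for (String coreUrl : cores) {
            SolrServer server = new CommonsHttpSolrServer(coreUrl);
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0); // only the total count is needed, not the documents
            QueryResponse rsp = server.query(q);
            System.out.println(coreUrl + " -> " + rsp.getResults().getNumFound() + " docs");
        }
    }
}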
Re: SolrCloud Replication Question
Also, it will help if you can mention the exact version of solrcloud you are talking about in each issue - I know you have one from the old branch, and I assume a version off trunk you are playing with - so a heads up on which and if trunk, what rev or day will help in the case that I'm trying to dupe issues that have been addressed. - Mark On Feb 10, 2012, at 6:09 PM, Mark Miller wrote: I'm trying, but so far I don't see anything. I'll have to try and mimic your setup closer it seems. I tried starting up 6 solr instances on different ports as 2 shards, each with a replication factor of 3. Then I indexed 20k documents to the cluster and verified doc counts. Then I shutdown all the replicas so that only one instance served each shard. Then I indexed 20k documents to the cluster. Then I started the downed nodes and verified that they where in a recovery state. After enough time went by I checked and verified document counts on each instance - they where as expected. I guess next I can try a similar experiment using multiple cores, but if you notice anything that stands out that is largely different in what you are doing, let me know. The cores that are behind, does it say they are down, recovering, or active in zookeeper? On Feb 10, 2012, at 4:48 PM, Jamie Johnson wrote: Sorry for pinging this again, is more information needed on this? I can provide more details but am not sure what to provide. On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson jej2...@gmail.com wrote: Sorry, I shut down the full solr instance. On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller markrmil...@gmail.com wrote: Can you explain a little more how you doing this? How are you bringing the cores down and then back up? Shutting down a full solr instance, unloading the core? On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: I know that the latest Solr Cloud doesn't use standard replication but I have a question about how it appears to be working. 
I currently have the following cluster state {collection1:{ slice1:{ JamiesMac.local:8501_solr_slice1_shard1:{ shard_id:slice1, state:active, core:slice1_shard1, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice1_shard2:{ shard_id:slice1, state:active, core:slice1_shard2, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice1_shard1:{ shard_id:slice1, state:down, core:slice1_shard1, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice1_shard2:{ shard_id:slice1, leader:true, state:active, core:slice1_shard2, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr}}, slice2:{ JamiesMac.local:8501_solr_slice2_shard2:{ shard_id:slice2, state:active, core:slice2_shard2, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice2_shard1:{ shard_id:slice2, state:active, core:slice2_shard1, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice2_shard2:{ shard_id:slice2, state:down, core:slice2_shard2, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice2_shard1:{ shard_id:slice2, leader:true, state:active, core:slice2_shard1, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr I then added some docs to the following shards using SolrJ http://localhost:8502/solr/slice2_shard1 http://localhost:8502/solr/slice1_shard2 I then bring back up the other cores and I don't see replication happening. Looking at the stats for each core I see that on the 8501 instance (the instance that was off) the number of docs is 0, so I've clearly set something up incorrectly. Any help on this would be greatly appreciated. - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Nothing seems that different. In regards to the states of each, I'll try to verify tonight. This was using a version I pulled from SVN trunk yesterday morning.

On Fri, Feb 10, 2012 at 6:22 PM, Mark Miller <markrmil...@gmail.com> wrote:

Also, it will help if you can mention the exact version of SolrCloud you are talking about in each issue - I know you have one from the old branch, and I assume a version off trunk you are playing with - so a heads up on which, and if trunk, what rev or day, will help in case I'm trying to dupe issues that have already been addressed.

- Mark
Re: SolrCloud Replication Question
Thanks. If the given ZK snapshot was the end state, then two nodes are marked as down. Generally that happens because replication failed - if you have not already, I'd check the logs for those two nodes.

- Mark

On Fri, Feb 10, 2012 at 7:35 PM, Jamie Johnson <jej2...@gmail.com> wrote:

Nothing seems that different. In regards to the states of each, I'll try to verify tonight. This was using a version I pulled from SVN trunk yesterday morning.
Re: SolrCloud Replication Question
On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:

jamiesmac

Another note: I have no idea if this is involved, but when I do tests with my linux box and mac I run into the following: my linux box auto-detects its address as halfmetal and my macbook as mbpro.local. If I accept those defaults, my mac cannot reach my linux box - it can only reach the linux box through halfmetal.local - so I have to override the host on the linux box to advertise as halfmetal.local, and then they can talk.

In the bad case, if my leaders were on the linux box, they would be able to forward to the mac no problem, but then if shards on the mac needed to recover, they would fail to reach the linux box through the halfmetal address.

- Mark Miller
lucidimagination.com
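One way to spot this kind of host-advertisement mismatch is to read back what each replica actually registered in ZooKeeper (node_name, base_url, state) and then check that every base_url resolves from every machine in the cluster. A rough sketch with an 8.x-style SolrJ CloudSolrClient (the ZK-era API at the time of this thread was different); the collection name and ZooKeeper address are assumptions:

    import java.util.List;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.cloud.DocCollection;
    import org.apache.solr.common.cloud.Replica;

    public class ShowRegisteredReplicas {
        public static void main(String[] args) throws Exception {
            // Assumed ZooKeeper address; adjust to your ensemble.
            try (CloudSolrClient client =
                     new CloudSolrClient.Builder(List.of("localhost:2181"), Optional.empty()).build()) {
                client.connect();
                DocCollection coll = client.getClusterStateProvider()
                                           .getClusterState().getCollection("collection1");
                for (Replica r : coll.getReplicas()) {
                    // base_url is the address other nodes use to reach this replica;
                    // if it does not resolve cluster-wide, forwarding and recovery will fail.
                    System.out.println(r.getNodeName() + "  " + r.getBaseUrl() + "  " + r.getState());
                }
            }
        }
    }

If a replica's base_url uses a hostname that only its own machine can resolve, that matches the failure mode described above.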
Re: SolrCloud Replication Question
Hmm... perhaps I'm seeing the issue you're speaking of. I have everything running right now and my state is as follows:

{collection1:{
  slice1:{
    JamiesMac.local:8501_solr_slice1_shard1:{
      shard_id:slice1, leader:true, state:active, core:slice1_shard1, collection:collection1,
      node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr},
    JamiesMac.local:8502_solr_slice1_shard2:{
      shard_id:slice1, state:down, core:slice1_shard2, collection:collection1,
      node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}},
  slice2:{
    JamiesMac.local:8502_solr_slice2_shard1:{
      shard_id:slice2, leader:true, state:active, core:slice2_shard1, collection:collection1,
      node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr},
    JamiesMac.local:8501_solr_slice2_shard2:{
      shard_id:slice2, state:down, core:slice2_shard2, collection:dataspace,
      node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}}}}

How'd you resolve this issue?

On Fri, Feb 10, 2012 at 8:49 PM, Mark Miller <markrmil...@gmail.com> wrote:

Another note: I have no idea if this is involved, but when I do tests with my linux box and mac I run into the following ...

- Mark Miller
lucidimagination.com
Replication question
I have replication set up with:

    <str name="pollInterval">00:00:60</str>

I assumed that meant it would poll the master for updates once a minute, but my logs make it look like it is trying to sync up almost constantly. Below is an example of my log from just one minute in time. Am I reading this wrong? This is from one of the slaves; I have 2 of them, so my master's log file is double this. Is this normal?

May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.

--
View this message in context: http://lucene.472066.n3.nabble.com/Replication-question-tp2909157p2909157.html
Sent from the Solr - User mailing list archive at Nabble.com.
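When checking how a slave is actually configured to poll, it can help to ask the replication handler directly rather than reading the logs: /replication?command=details reports the slave's replication settings and sync status. A rough SolrJ sketch that just dumps the whole response; the core URL is an assumption, and the exact response layout of the legacy handler should be verified against your own output:

    import java.util.Map;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.GenericSolrRequest;
    import org.apache.solr.common.params.MapSolrParams;

    public class ReplicationDetails {
        public static void main(String[] args) throws Exception {
            // Assumed slave core URL; adjust to your own host/core.
            try (HttpSolrClient client =
                     new HttpSolrClient.Builder("http://localhost:8983/solr/core1").build()) {
                GenericSolrRequest details = new GenericSolrRequest(
                        SolrRequest.METHOD.GET, "/replication",
                        new MapSolrParams(Map.of("command", "details")));
                // Dump the full details response, which includes the slave's
                // replication configuration and its view of the master.
                System.out.println(client.request(details));
            }
        }
    }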
solr1.4 replication question
Hi,

I am fairly new to Solr and have set up two servers, one as a master and the other as a slave. I have a load balancer in front with 2 different VIPs: one distributes gets/reads evenly across the master and slave, and the other sends posts/updates just to the master. If the master fails, the second VIP automatically switches updates over to the slave. But if that happens, is there a way to automatically switch which server is master and which is slave, instead of going into solrconfig.xml and then restarting the instances? Any recommendations for the best way to set this up?

Thanks
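One way to check which box currently holds the master role, without touching configs, is to ask each server's replication handler: /replication?command=details reports master/slave status, which a failover script or health check could key off. A rough SolrJ sketch; the isMaster/isSlave response keys and the server/core URLs are assumptions from memory of the legacy master/slave handler, so verify them against your own response:

    import java.util.Map;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.GenericSolrRequest;
    import org.apache.solr.common.params.MapSolrParams;
    import org.apache.solr.common.util.NamedList;

    public class WhoIsMaster {
        public static void main(String[] args) throws Exception {
            // Hypothetical core URLs for the two servers behind the VIPs.
            String[] servers = {"http://solr-a:8983/solr/core1", "http://solr-b:8983/solr/core1"};
            for (String url : servers) {
                try (HttpSolrClient client = new HttpSolrClient.Builder(url).build()) {
                    NamedList<Object> rsp = client.request(new GenericSolrRequest(
                            SolrRequest.METHOD.GET, "/replication",
                            new MapSolrParams(Map.of("command", "details"))));
                    // "details" is a nested section; isMaster/isSlave are assumed key names.
                    NamedList<?> details = (NamedList<?>) rsp.get("details");
                    System.out.println(url + "  isMaster=" + details.get("isMaster")
                            + "  isSlave=" + details.get("isSlave"));
                }
            }
        }
    }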