Re: Replication Question
On 8/2/2017 8:56 AM, Michael B. Klein wrote: > SCALE DOWN > 1) Call admin/collections?action=BACKUP for each collection to a > shared NFS volume > 2) Shut down all the nodes > > SCALE UP > 1) Spin up 2 Zookeeper nodes and wait for them to stabilize > 2) Spin up 3 Solr nodes and wait for them to show up under Zookeeper's > live_nodes > 3) Call admin/collections?action=RESTORE to put all the collections back > > This has been working very well, for the most part, with the following > complications/observations: > > 1) If I don't optimize each collection right before BACKUP, the backup > fails (see the attached solr_backup_error.json). Sounds like you're being hit by this at backup time: https://issues.apache.org/jira/browse/SOLR-9120 There's a patch in the issue which I have not verified and tested. The workaround of optimizing the collection is not one I would have thought of. > 2) If I don't specify a replicationFactor during RESTORE, the admin > interface's Cloud diagram only shows one active node per collection. > Is this expected? Am I required to specify the replicationFactor > unless I'm using a shared HDFS volume for solr data? The documentation for RESTORE (looking at the 6.6 docs) says that the restored collection will have the same number of shards and replicas as the original collection. Your experience says that either the documentation is wrong or the version of Solr you're running doesn't behave that way, and might have a bug. > 3) If I don't specify maxShardsPerNode=1 during RESTORE, I get a > warning message in the response, even though the restore seems to succeed. I would like to see that warning, including whatever stacktrace is present. It might be expected, but I'd like to look into it. > 4) Aside from the replicationFactor parameter on the CREATE/RESTORE, I > do not currently have any replication stuff configured (as it seems I > should not). Correct, you don't need any replication configured. It's not for cloud mode. > 5) At the time my "1-in-3 requests are failing" issue occurred, the > Cloud diagram looked like the attached solr_admin_cloud_diagram.png. > It seemed to think all replicas were live and synced and happy, and > because I was accessing solr through a round-robin load balancer, I > was never able to tell which node was out of sync. > > If it happens again, I'll make node-by-node requests and try to figure > out what's different about the failing one. But the fact that this > happened (and the way it happened) is making me wonder if/how I can > automate this automated staging environment scaling reliably and with > confidence that it will Just Work™. That image didn't make it to the mailing list. Your JSON showing errors did, though. Your description of the diagram is good -- sounds like it was all green and looked exactly how you expected it to look. What you've described sounds like there may be a problem in the RESTORE action on the collections API, or possibly a problem with your shared storage where you put the backups, so the restored data on one replica isn't faithful to the backup. I don't know very much about that code, and what you've described makes me think that this is going to be a hard one to track down. Thanks, Shawn
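For the node-by-node check described above, here is a minimal SolrJ sketch that queries each replica directly with distrib=false, so the load balancer and the distributed fan-out are both taken out of the picture. It assumes a SolrJ 6.x client; the node URLs and collection name are placeholders.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ReplicaCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical node URLs; replace with the three Solr nodes behind the load balancer.
        String[] nodes = {"http://solr1:8983/solr", "http://solr2:8983/solr", "http://solr3:8983/solr"};

        SolrQuery q = new SolrQuery("id:hd76s004z");
        q.set("distrib", "false"); // ask only the local replica, no distributed fan-out

        for (String baseUrl : nodes) {
            try (HttpSolrClient client = new HttpSolrClient.Builder(baseUrl).build()) {
                QueryResponse rsp = client.query("collection1", q); // placeholder collection name
                System.out.println(baseUrl + " -> numFound=" + rsp.getResults().getNumFound());
            }
        }
    }
}

A replica that reports numFound=0 while the others report 1 is the one that missed the update.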
Re: Replication Question
And the one that isn't getting the updates is the one marked in the cloud diagram as the leader. /me bangs head on desk On Wed, Aug 2, 2017 at 10:31 AM, Michael B. Kleinwrote: > Another observation: After bringing the cluster back up just now, the > "1-in-3 nodes don't get the updates" issue persists, even with the cloud > diagram showing 3 nodes, all green. > > On Wed, Aug 2, 2017 at 9:56 AM, Michael B. Klein > wrote: > >> Thanks for your responses, Shawn and Erick. >> >> Some clarification questions, but first a description of my >> (non-standard) use case: >> >> My Zookeeper/SolrCloud cluster is running on Amazon AWS. Things are >> working well so far on the production cluster (knock wood); its the staging >> cluster that's giving me fits. Here's why: In order to save money, I have >> the AWS auto-scaler scale the cluster down to zero nodes when it's not in >> use. Here's the (automated) procedure: >> >> SCALE DOWN >> 1) Call admin/collections?action=BACKUP for each collection to a shared >> NFS volume >> 2) Shut down all the nodes >> >> SCALE UP >> 1) Spin up 2 Zookeeper nodes and wait for them to stabilize >> 2) Spin up 3 Solr nodes and wait for them to show up under Zookeeper's >> live_nodes >> 3) Call admin/collections?action=RESTORE to put all the collections back >> >> This has been working very well, for the most part, with the following >> complications/observations: >> >> 1) If I don't optimize each collection right before BACKUP, the backup >> fails (see the attached solr_backup_error.json). >> 2) If I don't specify a replicationFactor during RESTORE, the admin >> interface's Cloud diagram only shows one active node per collection. Is >> this expected? Am I required to specify the replicationFactor unless I'm >> using a shared HDFS volume for solr data? >> 3) If I don't specify maxShardsPerNode=1 during RESTORE, I get a warning >> message in the response, even though the restore seems to succeed. >> 4) Aside from the replicationFactor parameter on the CREATE/RESTORE, I do >> not currently have any replication stuff configured (as it seems I should >> not). >> 5) At the time my "1-in-3 requests are failing" issue occurred, the Cloud >> diagram looked like the attached solr_admin_cloud_diagram.png. It seemed to >> think all replicas were live and synced and happy, and because I was >> accessing solr through a round-robin load balancer, I was never able to >> tell which node was out of sync. >> >> If it happens again, I'll make node-by-node requests and try to figure >> out what's different about the failing one. But the fact that this happened >> (and the way it happened) is making me wonder if/how I can automate this >> automated staging environment scaling reliably and with confidence that it >> will Just Work™. >> >> Comments and suggestions would be GREATLY appreciated. >> >> Michael >> >> >> >> On Tue, Aug 1, 2017 at 8:14 PM, Erick Erickson >> wrote: >> >>> And please do not use optimize unless your index is >>> totally static. I only recommend it when the pattern is >>> to update the index periodically, like every day or >>> something and not update any docs in between times. >>> >>> Implied in Shawn's e-mail was that you should undo >>> anything you've done in terms of configuring replication, >>> just go with the defaults. >>> >>> Finally, my bet is that your problematic Solr node is misconfigured. >>> >>> Best, >>> Erick >>> >>> On Tue, Aug 1, 2017 at 2:36 PM, Shawn Heisey >>> wrote: >>> > On 8/1/2017 12:09 PM, Michael B. 
Klein wrote: >>> >> I have a 3-node solrcloud cluster orchestrated by zookeeper. Most >>> stuff >>> >> seems to be working OK, except that one of the nodes never seems to >>> get its >>> >> replica updated. >>> >> >>> >> Queries take place through a non-caching, round-robin load balancer. >>> The >>> >> collection looks fine, with one shard and a replicationFactor of 3. >>> >> Everything in the cloud diagram is green. >>> >> >>> >> But if I (for example) select?q=id:hd76s004z, the results come up >>> empty 1 >>> >> out of every 3 times. >>> >> >>> >> Even several minutes after a commit and optimize, one replica still >>> isn’t >>> >> returning the right info. >>> >> >>> >> Do I need to configure my `solrconfig.xml` with `replicateAfter` >>> options on >>> >> the `/replication` requestHandler, or is that a non-solrcloud, >>> >> standalone-replication thing? >>> > >>> > This is one of the more confusing aspects of SolrCloud. >>> > >>> > When everything is working perfectly in a SolrCloud install, the >>> feature >>> > in Solr called "replication" is *never* used. SolrCloud does require >>> > the replication feature, though ... which is what makes this whole >>> thing >>> > very confusing. >>> > >>> > Replication is used to replicate an entire Lucene index (consisting of >>> a >>> > bunch of files on the disk) from a core on a master server to a core on >>> > a slave server.
Re: Replication Question
Another observation: After bringing the cluster back up just now, the "1-in-3 nodes don't get the updates" issue persists, even with the cloud diagram showing 3 nodes, all green. On Wed, Aug 2, 2017 at 9:56 AM, Michael B. Kleinwrote: > Thanks for your responses, Shawn and Erick. > > Some clarification questions, but first a description of my (non-standard) > use case: > > My Zookeeper/SolrCloud cluster is running on Amazon AWS. Things are > working well so far on the production cluster (knock wood); its the staging > cluster that's giving me fits. Here's why: In order to save money, I have > the AWS auto-scaler scale the cluster down to zero nodes when it's not in > use. Here's the (automated) procedure: > > SCALE DOWN > 1) Call admin/collections?action=BACKUP for each collection to a shared > NFS volume > 2) Shut down all the nodes > > SCALE UP > 1) Spin up 2 Zookeeper nodes and wait for them to stabilize > 2) Spin up 3 Solr nodes and wait for them to show up under Zookeeper's > live_nodes > 3) Call admin/collections?action=RESTORE to put all the collections back > > This has been working very well, for the most part, with the following > complications/observations: > > 1) If I don't optimize each collection right before BACKUP, the backup > fails (see the attached solr_backup_error.json). > 2) If I don't specify a replicationFactor during RESTORE, the admin > interface's Cloud diagram only shows one active node per collection. Is > this expected? Am I required to specify the replicationFactor unless I'm > using a shared HDFS volume for solr data? > 3) If I don't specify maxShardsPerNode=1 during RESTORE, I get a warning > message in the response, even though the restore seems to succeed. > 4) Aside from the replicationFactor parameter on the CREATE/RESTORE, I do > not currently have any replication stuff configured (as it seems I should > not). > 5) At the time my "1-in-3 requests are failing" issue occurred, the Cloud > diagram looked like the attached solr_admin_cloud_diagram.png. It seemed to > think all replicas were live and synced and happy, and because I was > accessing solr through a round-robin load balancer, I was never able to > tell which node was out of sync. > > If it happens again, I'll make node-by-node requests and try to figure out > what's different about the failing one. But the fact that this happened > (and the way it happened) is making me wonder if/how I can automate this > automated staging environment scaling reliably and with confidence that it > will Just Work™. > > Comments and suggestions would be GREATLY appreciated. > > Michael > > > > On Tue, Aug 1, 2017 at 8:14 PM, Erick Erickson > wrote: > >> And please do not use optimize unless your index is >> totally static. I only recommend it when the pattern is >> to update the index periodically, like every day or >> something and not update any docs in between times. >> >> Implied in Shawn's e-mail was that you should undo >> anything you've done in terms of configuring replication, >> just go with the defaults. >> >> Finally, my bet is that your problematic Solr node is misconfigured. >> >> Best, >> Erick >> >> On Tue, Aug 1, 2017 at 2:36 PM, Shawn Heisey wrote: >> > On 8/1/2017 12:09 PM, Michael B. Klein wrote: >> >> I have a 3-node solrcloud cluster orchestrated by zookeeper. Most stuff >> >> seems to be working OK, except that one of the nodes never seems to >> get its >> >> replica updated. >> >> >> >> Queries take place through a non-caching, round-robin load balancer. 
>> The >> >> collection looks fine, with one shard and a replicationFactor of 3. >> >> Everything in the cloud diagram is green. >> >> >> >> But if I (for example) select?q=id:hd76s004z, the results come up >> empty 1 >> >> out of every 3 times. >> >> >> >> Even several minutes after a commit and optimize, one replica still >> isn’t >> >> returning the right info. >> >> >> >> Do I need to configure my `solrconfig.xml` with `replicateAfter` >> options on >> >> the `/replication` requestHandler, or is that a non-solrcloud, >> >> standalone-replication thing? >> > >> > This is one of the more confusing aspects of SolrCloud. >> > >> > When everything is working perfectly in a SolrCloud install, the feature >> > in Solr called "replication" is *never* used. SolrCloud does require >> > the replication feature, though ... which is what makes this whole thing >> > very confusing. >> > >> > Replication is used to replicate an entire Lucene index (consisting of a >> > bunch of files on the disk) from a core on a master server to a core on >> > a slave server. This is how replication was done before SolrCloud was >> > created. >> > >> > The way that SolrCloud keeps replicas in sync is *entirely* different. >> > SolrCloud has no masters and no slaves. When you index or delete a >> > document in a SolrCloud collection, the request is forwarded to the >> > leader of the correct shard for that
Re: Replication Question
Thanks for your responses, Shawn and Erick. Some clarification questions, but first a description of my (non-standard) use case: My Zookeeper/SolrCloud cluster is running on Amazon AWS. Things are working well so far on the production cluster (knock wood); its the staging cluster that's giving me fits. Here's why: In order to save money, I have the AWS auto-scaler scale the cluster down to zero nodes when it's not in use. Here's the (automated) procedure: SCALE DOWN 1) Call admin/collections?action=BACKUP for each collection to a shared NFS volume 2) Shut down all the nodes SCALE UP 1) Spin up 2 Zookeeper nodes and wait for them to stabilize 2) Spin up 3 Solr nodes and wait for them to show up under Zookeeper's live_nodes 3) Call admin/collections?action=RESTORE to put all the collections back This has been working very well, for the most part, with the following complications/observations: 1) If I don't optimize each collection right before BACKUP, the backup fails (see the attached solr_backup_error.json). 2) If I don't specify a replicationFactor during RESTORE, the admin interface's Cloud diagram only shows one active node per collection. Is this expected? Am I required to specify the replicationFactor unless I'm using a shared HDFS volume for solr data? 3) If I don't specify maxShardsPerNode=1 during RESTORE, I get a warning message in the response, even though the restore seems to succeed. 4) Aside from the replicationFactor parameter on the CREATE/RESTORE, I do not currently have any replication stuff configured (as it seems I should not). 5) At the time my "1-in-3 requests are failing" issue occurred, the Cloud diagram looked like the attached solr_admin_cloud_diagram.png. It seemed to think all replicas were live and synced and happy, and because I was accessing solr through a round-robin load balancer, I was never able to tell which node was out of sync. If it happens again, I'll make node-by-node requests and try to figure out what's different about the failing one. But the fact that this happened (and the way it happened) is making me wonder if/how I can automate this automated staging environment scaling reliably and with confidence that it will Just Work™. Comments and suggestions would be GREATLY appreciated. Michael On Tue, Aug 1, 2017 at 8:14 PM, Erick Ericksonwrote: > And please do not use optimize unless your index is > totally static. I only recommend it when the pattern is > to update the index periodically, like every day or > something and not update any docs in between times. > > Implied in Shawn's e-mail was that you should undo > anything you've done in terms of configuring replication, > just go with the defaults. > > Finally, my bet is that your problematic Solr node is misconfigured. > > Best, > Erick > > On Tue, Aug 1, 2017 at 2:36 PM, Shawn Heisey wrote: > > On 8/1/2017 12:09 PM, Michael B. Klein wrote: > >> I have a 3-node solrcloud cluster orchestrated by zookeeper. Most stuff > >> seems to be working OK, except that one of the nodes never seems to get > its > >> replica updated. > >> > >> Queries take place through a non-caching, round-robin load balancer. The > >> collection looks fine, with one shard and a replicationFactor of 3. > >> Everything in the cloud diagram is green. > >> > >> But if I (for example) select?q=id:hd76s004z, the results come up empty > 1 > >> out of every 3 times. > >> > >> Even several minutes after a commit and optimize, one replica still > isn’t > >> returning the right info. 
> >> > >> Do I need to configure my `solrconfig.xml` with `replicateAfter` > options on > >> the `/replication` requestHandler, or is that a non-solrcloud, > >> standalone-replication thing? > > > > This is one of the more confusing aspects of SolrCloud. > > > > When everything is working perfectly in a SolrCloud install, the feature > > in Solr called "replication" is *never* used. SolrCloud does require > > the replication feature, though ... which is what makes this whole thing > > very confusing. > > > > Replication is used to replicate an entire Lucene index (consisting of a > > bunch of files on the disk) from a core on a master server to a core on > > a slave server. This is how replication was done before SolrCloud was > > created. > > > > The way that SolrCloud keeps replicas in sync is *entirely* different. > > SolrCloud has no masters and no slaves. When you index or delete a > > document in a SolrCloud collection, the request is forwarded to the > > leader of the correct shard for that document. The leader then sends a > > copy of that request to all the other replicas, and each replica > > (including the leader) independently handles the updates that are in the > > request. Since all replicas index the same content, they stay in sync. > > > > What SolrCloud does with the replication feature is index recovery. In > > some situations recovery can be done from the leader's transaction log, > >
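As a rough illustration of the scale-down/scale-up calls in the procedure above, the sketch below drives the Collections API over plain HTTP from Java. The host, collection name, and backup location are placeholders, and the replicationFactor/maxShardsPerNode parameters are the ones discussed in this thread; adjust them to your cluster.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class CollectionSnapshot {
    static final String SOLR = "http://solr1:8983/solr";   // placeholder node URL
    static final String BACKUP_DIR = "/mnt/solr-backups";  // shared NFS mount (placeholder)

    static String call(String query) throws Exception {
        URL url = new URL(SOLR + "/admin/collections?" + query);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            for (String line; (line = in.readLine()) != null; ) body.append(line).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) throws Exception {
        String collection = "collection1";                   // placeholder collection name
        String loc = URLEncoder.encode(BACKUP_DIR, "UTF-8");

        // SCALE DOWN: snapshot the collection to the shared volume before shutting nodes down.
        System.out.println(call("action=BACKUP&name=" + collection + "-bak"
                + "&collection=" + collection + "&location=" + loc));

        // SCALE UP: restore once the new nodes appear under live_nodes.
        System.out.println(call("action=RESTORE&name=" + collection + "-bak"
                + "&collection=" + collection + "&location=" + loc
                + "&replicationFactor=3&maxShardsPerNode=1"));
    }
}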
Re: Replication Question
And please do not use optimize unless your index is totally static. I only recommend it when the pattern is to update the index periodically, like every day or something and not update any docs in between times. Implied in Shawn's e-mail was that you should undo anything you've done in terms of configuring replication, just go with the defaults. Finally, my bet is that your problematic Solr node is misconfigured. Best, Erick On Tue, Aug 1, 2017 at 2:36 PM, Shawn Heiseywrote: > On 8/1/2017 12:09 PM, Michael B. Klein wrote: >> I have a 3-node solrcloud cluster orchestrated by zookeeper. Most stuff >> seems to be working OK, except that one of the nodes never seems to get its >> replica updated. >> >> Queries take place through a non-caching, round-robin load balancer. The >> collection looks fine, with one shard and a replicationFactor of 3. >> Everything in the cloud diagram is green. >> >> But if I (for example) select?q=id:hd76s004z, the results come up empty 1 >> out of every 3 times. >> >> Even several minutes after a commit and optimize, one replica still isn’t >> returning the right info. >> >> Do I need to configure my `solrconfig.xml` with `replicateAfter` options on >> the `/replication` requestHandler, or is that a non-solrcloud, >> standalone-replication thing? > > This is one of the more confusing aspects of SolrCloud. > > When everything is working perfectly in a SolrCloud install, the feature > in Solr called "replication" is *never* used. SolrCloud does require > the replication feature, though ... which is what makes this whole thing > very confusing. > > Replication is used to replicate an entire Lucene index (consisting of a > bunch of files on the disk) from a core on a master server to a core on > a slave server. This is how replication was done before SolrCloud was > created. > > The way that SolrCloud keeps replicas in sync is *entirely* different. > SolrCloud has no masters and no slaves. When you index or delete a > document in a SolrCloud collection, the request is forwarded to the > leader of the correct shard for that document. The leader then sends a > copy of that request to all the other replicas, and each replica > (including the leader) independently handles the updates that are in the > request. Since all replicas index the same content, they stay in sync. > > What SolrCloud does with the replication feature is index recovery. In > some situations recovery can be done from the leader's transaction log, > but when a replica has gotten so far out of sync that the only option > available is to completely replace the index on the bad replica, > SolrCloud will fire up the replication feature and create an exact copy > of the index from the replica that is currently elected as leader. > SolrCloud temporarily designates the leader core as master and the bad > replica as slave, then initiates a one-time replication. This is all > completely automated and requires no configuration or input from the > administrator. > > The configuration elements you have asked about are for the old > master-slave replication setup and do not apply to SolrCloud at all. > > What I would recommend that you do to solve your immediate issue: Shut > down the Solr instance that is having the problem, rename the "data" > directory in the core that isn't working right to something else, and > start Solr back up. As long as you still have at least one good replica > in the cloud, SolrCloud will see that the index data is gone and copy > the index from the leader. 
You could delete the data directory instead > of renaming it, but that would leave you with no "undo" option. > > Thanks, > Shawn >
Re: Replication Question
On 8/1/2017 12:09 PM, Michael B. Klein wrote: > I have a 3-node solrcloud cluster orchestrated by zookeeper. Most stuff > seems to be working OK, except that one of the nodes never seems to get its > replica updated. > > Queries take place through a non-caching, round-robin load balancer. The > collection looks fine, with one shard and a replicationFactor of 3. > Everything in the cloud diagram is green. > > But if I (for example) select?q=id:hd76s004z, the results come up empty 1 > out of every 3 times. > > Even several minutes after a commit and optimize, one replica still isn’t > returning the right info. > > Do I need to configure my `solrconfig.xml` with `replicateAfter` options on > the `/replication` requestHandler, or is that a non-solrcloud, > standalone-replication thing? This is one of the more confusing aspects of SolrCloud. When everything is working perfectly in a SolrCloud install, the feature in Solr called "replication" is *never* used. SolrCloud does require the replication feature, though ... which is what makes this whole thing very confusing. Replication is used to replicate an entire Lucene index (consisting of a bunch of files on the disk) from a core on a master server to a core on a slave server. This is how replication was done before SolrCloud was created. The way that SolrCloud keeps replicas in sync is *entirely* different. SolrCloud has no masters and no slaves. When you index or delete a document in a SolrCloud collection, the request is forwarded to the leader of the correct shard for that document. The leader then sends a copy of that request to all the other replicas, and each replica (including the leader) independently handles the updates that are in the request. Since all replicas index the same content, they stay in sync. What SolrCloud does with the replication feature is index recovery. In some situations recovery can be done from the leader's transaction log, but when a replica has gotten so far out of sync that the only option available is to completely replace the index on the bad replica, SolrCloud will fire up the replication feature and create an exact copy of the index from the replica that is currently elected as leader. SolrCloud temporarily designates the leader core as master and the bad replica as slave, then initiates a one-time replication. This is all completely automated and requires no configuration or input from the administrator. The configuration elements you have asked about are for the old master-slave replication setup and do not apply to SolrCloud at all. What I would recommend that you do to solve your immediate issue: Shut down the Solr instance that is having the problem, rename the "data" directory in the core that isn't working right to something else, and start Solr back up. As long as you still have at least one good replica in the cloud, SolrCloud will see that the index data is gone and copy the index from the leader. You could delete the data directory instead of renaming it, but that would leave you with no "undo" option. Thanks, Shawn
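If you want to script the rename step rather than do it by hand, a small Java sketch follows; the core data path is hypothetical, and the Solr instance on that node must be stopped first.

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ParkBadReplicaData {
    public static void main(String[] args) throws Exception {
        // Hypothetical core data directory on the problem node; adjust to your layout.
        Path data = Paths.get("/var/solr/data/collection1_shard1_replica3/data");
        Path parked = data.resolveSibling("data.bad-" + System.currentTimeMillis());
        Files.move(data, parked); // rename rather than delete, so there is still an "undo"
        System.out.println("Moved " + data + " -> " + parked);
    }
}

On the next start, SolrCloud sees the empty core and pulls a fresh copy of the index from the current leader, as described above.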
Replication Question
I have a 3-node solrcloud cluster orchestrated by zookeeper. Most stuff seems to be working OK, except that one of the nodes never seems to get its replica updated. Queries take place through a non-caching, round-robin load balancer. The collection looks fine, with one shard and a replicationFactor of 3. Everything in the cloud diagram is green. But if I (for example) select?q=id:hd76s004z, the results come up empty 1 out of every 3 times. Even several minutes after a commit and optimize, one replica still isn’t returning the right info. Do I need to configure my `solrconfig.xml` with `replicateAfter` options on the `/replication` requestHandler, or is that a non-solrcloud, standalone-replication thing? Michael
SOLR replication question?
I am currently using SOLR 4.4 but am not planning to use SolrCloud in the very near future. I have a 3 master / 3 slave setup, with each master linked to its corresponding slave, and auto polling disabled. We do both push (using MQ) and pull indexing using a SOLRJ indexing program. I have enabled soft commit on the slave (to view the changes pushed by the queue immediately). I am thinking of doing the batch indexing on the master (optimize and hard commit) and push indexing on both master and slave. I am trying to do more testing with my configuration but thought of getting some answers before diving very deep... Since the queue pushes the docs to both master and slave, there is a possibility of the slave having more records than the master (when the master is busy doing batch indexing). What would happen if the slave has additional segments compared to the master? Will those be deleted when the replication happens? If a message is pushed from a queue to both master and slave during replication, will there be a latency in seeing that document even if we use soft commit on the slave? We want to make sure that we are not missing any documents from the queue (since it's updated via the UI and we don't really store that data anywhere except in the index). -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-replication-question-tp4081161.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR replication question?
I am currently using SOLR 4.4 but am not planning to use SolrCloud in the very near future. I have a 3 master / 3 slave setup, with each master linked to its corresponding slave, and auto polling disabled. We do both push (using MQ) and pull indexing using a SOLRJ indexing program. I have enabled soft commit on the slave (to view the changes pushed by the queue immediately). I am thinking of doing the batch indexing on the master (optimize and hard commit) and push indexing on both master and slave. I am trying to do more testing with my configuration but thought of getting some answers before diving very deep... Since the queue pushes the docs to both master and slave, there is a possibility of the slave having more records than the master (when the master is busy doing batch indexing). What would happen if the slave has additional segments compared to the master? Will those be deleted when the replication happens? If a message is pushed from a queue to both master and slave during replication, will there be a latency in seeing that document even if we use soft commit on the slave? We want to make sure that we are not missing any documents from the queue (since it's updated via the UI and we don't really store that data anywhere except in the index) If you are doing replication, then all updates must go to the master server. You cannot update the slave directly. When the replication happens, the slave will be identical to the master... Any documents sent to only the slave will be lost. Replication will happen according to the interval you have configured, or, since you say you have disabled polling, whenever you manually trigger a replication. SolrCloud would probably be a better fit for you. With a properly configured SolrCloud you just index to any host in the cloud and documents end up exactly where they need to go, and all replicas get updated. Thanks, Shawn
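For reference, this is roughly what the push side can look like from SolrJ with an explicit soft commit, aimed at the master per the point above that documents sent only to a slave are lost. It assumes a SolrJ 4.x HttpSolrServer; the URL and field names are placeholders.

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class QueuePush {
    public static void main(String[] args) throws Exception {
        // Always index against the master in a master/slave setup (placeholder URL).
        HttpSolrServer master = new HttpSolrServer("http://master1:8983/solr/core1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "msg-42");               // hypothetical fields from the MQ payload
        doc.addField("body", "payload from queue");
        master.add(doc);

        // waitFlush=true, waitSearcher=true, softCommit=true:
        // the document becomes searchable quickly without forcing a full hard commit.
        master.commit(true, true, true);

        master.shutdown();
    }
}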
Re: Solr HTTP Replication Question
Okay one last note... just for closure... looks like it was addressed in solr 4.1+ (I was looking at 4.0). On Thu, Jan 24, 2013 at 11:14 PM, Amit Nithian anith...@gmail.com wrote: Okay so after some debugging I found the problem. The replication piece will download the index from the master server and move the files to the index directory, but during the commit phase these older-generation files are deleted and the index is essentially left intact. I noticed that a full copy is needed if the index is stale (meaning that files in common between the master and slave have different sizes), but I also think a full copy should be needed if the slave's generation is higher than the master's. In short, to me it's not sufficient to simply say a full copy is needed if the slave's index version is >= the master's index version. I'll create a patch and file a bug along with a more thorough writeup of how I got into this state. Thanks! Amit On Thu, Jan 24, 2013 at 2:33 PM, Amit Nithian anith...@gmail.com wrote: Does Solr's replication look at the generation difference between master and slave when determining whether or not to replicate? To be more clear: what happens if a slave's generation is higher than the master's, yet the slave's index version is less than the master's index version? I looked at the source and didn't see any reason why the generation matters other than fetching the file list from the master for a given generation. It's too wordy to explain how this happened, so I'll go into details on that if anyone cares. Thanks! Amit
Re: Solr HTTP Replication Question
Okay so after some debugging I found the problem. The replication piece will download the index from the master server and move the files to the index directory, but during the commit phase these older-generation files are deleted and the index is essentially left intact. I noticed that a full copy is needed if the index is stale (meaning that files in common between the master and slave have different sizes), but I also think a full copy should be needed if the slave's generation is higher than the master's. In short, to me it's not sufficient to simply say a full copy is needed if the slave's index version is >= the master's index version. I'll create a patch and file a bug along with a more thorough writeup of how I got into this state. Thanks! Amit On Thu, Jan 24, 2013 at 2:33 PM, Amit Nithian anith...@gmail.com wrote: Does Solr's replication look at the generation difference between master and slave when determining whether or not to replicate? To be more clear: what happens if a slave's generation is higher than the master's, yet the slave's index version is less than the master's index version? I looked at the source and didn't see any reason why the generation matters other than fetching the file list from the master for a given generation. It's too wordy to explain how this happened, so I'll go into details on that if anyone cares. Thanks! Amit
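As a sketch only (this is not Solr's actual SnapPuller code), the stricter condition being proposed might look like the following, with hypothetical version and generation values:

public class FullCopyCheck {
    // Proposed rule: force a full copy when the slave's version has run ahead of the
    // master's, OR when the slave's generation is higher, OR when replication is forced.
    static boolean isFullCopyNeeded(long masterVersion, long masterGeneration,
                                    long slaveVersion, long slaveGeneration,
                                    boolean forceReplication) {
        return slaveVersion >= masterVersion
                || slaveGeneration > masterGeneration
                || forceReplication;
    }

    public static void main(String[] args) {
        // The scenario from this thread: slave generation ahead, slave version behind.
        // The generation clause makes this true; a version-only check would say false.
        System.out.println(isFullCopyNeeded(100L, 5L, 90L, 7L, false));
    }
}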
Re: SolrCloud replication question
Hi, Interesting article in your link. What servlet container do you use and how is it configured wrt. threads etc? You should be able to utilize all CPUs with a single Solr index, given that you are not I/O bound.. Also, what is your mergeFactor? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 9. juli 2012, at 22:11, avenka wrote: Hmm, never mind my question about replicating using symlinks. Given that replication on a single machine improves throughput, I should be able to get a similar improvement by simply sharding on a single machine. As also observed at http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ I am now benchmarking my workload to compare replication vs. sharding performance on a single machine. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761p3994017.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud replication question
The symlink thing sounds... complicated, but as you say you're going another route. The indexing speed you're seeing is surprisingly slow; I'd get to the root of the timeouts before giving up. SolrCloud simply _can't_ be that slow by design; something about your setup is causing that, I suspect. The timeouts you're seeing are certainly a clue here. Incoming updates have a couple of things happen: 1) the incoming request is pulled apart. Any docs for this shard are indexed and forwarded to any replicas. 2) any docs that are for a different shard are packed up and forwarded to the leader for that shard, which in turn distributes them to any replicas. So I _suspect_ that indexing will be a bit slower, since there's some additional communication going on. But not _that_ much slower. Any clue what your slow server is doing that would cause it to time out? Best Erick On Mon, Jul 9, 2012 at 4:11 PM, avenka ave...@gmail.com wrote: Hmm, never mind my question about replicating using symlinks. Given that replication on a single machine improves throughput, I should be able to get a similar improvement by simply sharding on a single machine. As also observed at http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ I am now benchmarking my workload to compare replication vs. sharding performance on a single machine. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761p3994017.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud replication question
No, you're misunderstanding the setup. Each replica has a complete index. Updates get automatically forwarded to _both_ nodes for a particular shard. So, when a doc comes in to be indexed, it gets sent to the leader for, say, shard1. From there: 1) it gets indexed on the leader, 2) it gets forwarded to the replica(s) where it gets indexed locally. Each replica has a complete index (for that shard). There is no master/slave setup any more. And you do _not_ have to configure replication. Best Erick On Sun, Jul 8, 2012 at 1:03 PM, avenka ave...@gmail.com wrote: I am trying to wrap my head around replication in SolrCloud. I tried the setup at http://wiki.apache.org/solr/SolrCloud/. I mainly need replication for high query throughput. The setup at the URL above appears to maintain just one copy of the index at the primary node (instead of a replicated index as in a master/slave configuration). Will I still get roughly an n-fold increase in query throughput with n replicas? And if so, why would one do master/slave replication with multiple copies of the index at all? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761.html Sent from the Solr - User mailing list archive at Nabble.com.
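A minimal SolrJ sketch of what that looks like in practice, assuming a 4.x CloudSolrServer (ZooKeeper addresses, collection, and fields are placeholders): the client talks to ZooKeeper, the update is routed to the shard leader, and the replicas index it themselves, with no replication handler configuration anywhere.

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexing {
    public static void main(String[] args) throws Exception {
        // Point the client at the ZooKeeper ensemble, not at any particular Solr node.
        CloudSolrServer cloud = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        cloud.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "example");
        cloud.add(doc);    // forwarded to the shard leader, then to its replicas
        cloud.commit();

        cloud.shutdown();
    }
}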
Re: SolrCloud replication question
Erick, thanks. I now do see segment files in an index.timestamp directory at the replicas. Not sure why they were not getting populated earlier. I have a couple more questions, the second is more elaborate - let me know if I should move it to a separate thread. (1) The speed of adding documents in SolrCloud is excruciatingly slow. It takes about 30-50 seconds to add a batch of 100 documents (and about twice that to add 200, etc.) to the primary but just ~10 seconds to add 5K documents in batches of 200 on a standalone solr 4 server. The log files indicate that the primary is timing out with messages like below and Cloud-Graph in the UI shows the other two replicas in orange after starting green. org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://localhost:7574/solr Any idea why? (3) I am seriously considering using symbolic links for a replicated solr setup with completely independent instances on a *single machine*. Tell me if I am thinking about this incorrectly. Here is my reasoning: (a) Master/slave replication in 3.6 simply seems old school as it doesn't have the nice consistency properties of SolrCloud. Polling say every 20 seconds means I don't know exactly how up-to-speed each replica is, which will complicate my request re-distribution. (b) SolrCloud seems like a great alternative to master/slave replication. But it seems slow (see 1) and having played with it, I don't feel comfortable with the maturity of ZK integration (or my comprehension of it) in solr 4 alpha. (c) Symbolic links seem like the fastest and most space-efficient solution *provided* there is only a single writer, which is just fine for me. I plan to run completely separate solr instances with one designated as the primary and do the following operations in sequence: Add a batch to the primary and commit -- From each replica's index directory, remove all symlinks and re-create symlinks to segment files in the primary (but not the write.lock file) -- Call update?commit=true to force replicas to re-load their in-memory index -- Do whatever read-only processing is required on the batch using the primary and all replicas by manually (randomly) distributing read requests -- Repeat sequence. Is there any downside to 3(c) (other than maintaining a trivial script to manage symlinks and call commit)? I tested it on small index sizes and it seems to work fine. The throughput improves with more replicas (for 2-4 replicas) as a single replica is not enough to saturate the machine (due to high query latency). Am I overlooking something in this setup? Overall, I need high throughput and minimal latency from the time a document is added to the time it is available at a replica. SolrCloud's automated request redirection, consistency, and fault-tolerance is awesome for a physically distributed setup, but I don't see how it beats 3(c) in a single-writer, single-machine, replicated setup. AV On Jul 9, 2012, at 9:43 AM, Erick Erickson [via Lucene] wrote: No, you're misunderstanding the setup. Each replica has a complete index. Updates get automatically forwarded to _both_ nodes for a particular shard. So, when a doc comes in to be indexed, it gets sent to the leader for, say, shard1. From there: 1 it gets indexed on the leader 2 it gets forwarded to the replica(s) where it gets indexed locally. Each replica has a complete index (for that shard). There is no master/slave setup any more. And you do _not_ have to configure replication. 
Best Erick On Sun, Jul 8, 2012 at 1:03 PM, avenka [hidden email] wrote: I am trying to wrap my head around replication in SolrCloud. I tried the setup at http://wiki.apache.org/solr/SolrCloud/. I mainly need replication for high query throughput. The setup at the URL above appears to maintain just one copy of the index at the primary node (instead of a replicated index as in a master/slave configuration). Will I still get roughly an n-fold increase in query throughput with n replicas? And if so, why would one do master/slave replication with multiple copies of the index at all? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761p3993960.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud replication question
Hmm, never mind my question about replicating using symlinks. Given that replication on a single machine improves throughput, I should be able to get a similar improvement by simply sharding on a single machine. As also observed at http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ I am now benchmarking my workload to compare replication vs. sharding performance on a single machine. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761p3994017.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrCloud replication question
I am trying to wrap my head around replication in SolrCloud. I tried the setup at http://wiki.apache.org/solr/SolrCloud/. I mainly need replication for high query throughput. The setup at the URL above appears to maintain just one copy of the index at the primary node (instead of a replicated index as in a master/slave configuration). Will I still get roughly an n-fold increase in query throughput with n replicas? And if so, why would one do master/slave replication with multiple copies of the index at all? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Simple Slave Replication Question
Hello, Had to leave the office so didn't get a chance to reply. Nothing in the logs. Just ran one through from the ingest tool. Same results full copy of the index. Is it something to do with: server.commit(); server.optimize(); I call this at the end of the ingestion. Would optimize then work across the whole index? Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 15:10 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Also, what happens if, instead of adding the 40K docs you add just one and commit? 2012/3/23 Tomás Fernández Löbbe tomasflo...@gmail.com Have you changed the mergeFactor or are you using 10 as in the example solrconfig? What do you see in the slave's log during replication? Do you see any line like Skipping download for...? On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: I just have a index directory. I push the documents through with a change to a field. Im using SOLRJ to do this. Im using the guide from the wiki to setup the replication. When the feed of updates to the master finishes I call a commit again using SOLRJ. I then have a poll period of 5 minutes from the slave. When it kicks in I see a new version of the index and then it copys the full 5gb index. Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 14:29 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Hi Ben, only new segments are replicated from master to slave. In a situation where all the segments are new, this will cause the index to be fully replicated, but this rarely happen with incremental updates. It can also happen if the slave Solr assumes it has an invalid index. Are you committing or optimizing on the slaves? After replication, the index directory on the slaves is called index or index.timestamp? Tomás On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: So do you just simpy address this with big nic and network pipes. -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, Im looking at the replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master is it standard to always copy over the full index, currently 5gb in size. If this is standard what do people do who have massive 200gb indexs, does it not take a while to bring the slaves inline with the master? Thanks Ben This e-mail is sent on behalf of Trader Media Group Limited, Registered Office: Auto Trader House, Cutbush Park Industrial Estate, Danehill, Lower Earley, Reading, Berkshire, RG6 4UT(Registered in England No. 4768833). This email and any files transmitted with it are confidential and may be legally privileged, and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender. This email message has been swept for the presence of computer viruses. 
Re: Simple Slave Replication Question
It's the optimize step. Optimize essentially forces all the segments to be copied into a single new segment, which means that your entire index will be replicated to the slaves. In recent Solrs, there's usually no need to optimize, so unless and until you can demonstrate a noticeable change, I'd just leave the optimize step off. In fact, trunk renames it to forceMerge or something just because it's so common for people to think "of course I want to optimize my index!" and get the unintended consequences you're seeing, even though the optimize doesn't actually do that much good in most cases. Some people just do the optimize once a day (or week or whatever) during off-peak hours as a compromise. Best Erick On Mon, Mar 26, 2012 at 5:02 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, Had to leave the office so didn't get a chance to reply. Nothing in the logs. Just ran one through from the ingest tool. Same results full copy of the index. Is it something to do with: server.commit(); server.optimize(); I call this at the end of the ingestion. Would optimize then work across the whole index? Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 15:10 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Also, what happens if, instead of adding the 40K docs you add just one and commit? 2012/3/23 Tomás Fernández Löbbe tomasflo...@gmail.com Have you changed the mergeFactor or are you using 10 as in the example solrconfig? What do you see in the slave's log during replication? Do you see any line like Skipping download for...? On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: I just have a index directory. I push the documents through with a change to a field. Im using SOLRJ to do this. Im using the guide from the wiki to setup the replication. When the feed of updates to the master finishes I call a commit again using SOLRJ. I then have a poll period of 5 minutes from the slave. When it kicks in I see a new version of the index and then it copys the full 5gb index. Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 14:29 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Hi Ben, only new segments are replicated from master to slave. In a situation where all the segments are new, this will cause the index to be fully replicated, but this rarely happen with incremental updates. It can also happen if the slave Solr assumes it has an invalid index. Are you committing or optimizing on the slaves? After replication, the index directory on the slaves is called index or index.timestamp? Tomás On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: So do you just simpy address this with big nic and network pipes. -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, Im looking at the replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master is it standard to always copy over the full index, currently 5gb in size.
If this is standard what do people do who have massive 200gb indexs, does it not take a while to bring the slaves inline with the master? Thanks Ben
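For illustration, a hedged SolrJ sketch of the end of such an ingest run without the optimize call (master URL and fields are placeholders, assuming a 4.x-style HttpSolrServer). As explained above, a plain commit produces only new segments for the slaves to copy, whereas an optimize rewrites the whole index into one segment and forces a full copy.

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class Ingest {
    public static void main(String[] args) throws Exception {
        HttpSolrServer master = new HttpSolrServer("http://master:8983/solr"); // placeholder URL

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 40000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "rec-" + i);     // hypothetical fields
            doc.addField("price", i);
            batch.add(doc);
            if (batch.size() == 1000) {         // send in manageable chunks
                master.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) master.add(batch);

        master.commit();      // new segments only; slaves pull just the changed files
        // master.optimize(); // deliberately omitted: would merge to one segment and
                              // trigger a full 5gb copy on every poll

        master.shutdown();
    }
}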
RE: Simple Slave Replication Question
That's great information. Thanks for all the help and guidance, its been invaluable. Thanks Ben -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 26 March 2012 12:21 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question It's the optimize step. Optimize essentially forces all the segments to be copied into a single new segment, which means that your entire index will be replicated to the slaves. In recent Solrs, there's usually no need to optimize, so unless and until you can demonstrate a noticeable change, I'd just leave the optimize step off. In fact, trunk renames it to forceMerge or something just because it's so common for people to think of course I want to optimize my index! and get the unintended consequences you're seeing even thought the optimize doesn't actually do that much good in most cases. Some people just do the optimize once a day (or week or whatever) during off-peak hours as a compromise. Best Erick On Mon, Mar 26, 2012 at 5:02 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, Had to leave the office so didn't get a chance to reply. Nothing in the logs. Just ran one through from the ingest tool. Same results full copy of the index. Is it something to do with: server.commit(); server.optimize(); I call this at the end of the ingestion. Would optimize then work across the whole index? Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 15:10 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Also, what happens if, instead of adding the 40K docs you add just one and commit? 2012/3/23 Tomás Fernández Löbbe tomasflo...@gmail.com Have you changed the mergeFactor or are you using 10 as in the example solrconfig? What do you see in the slave's log during replication? Do you see any line like Skipping download for...? On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: I just have a index directory. I push the documents through with a change to a field. Im using SOLRJ to do this. Im using the guide from the wiki to setup the replication. When the feed of updates to the master finishes I call a commit again using SOLRJ. I then have a poll period of 5 minutes from the slave. When it kicks in I see a new version of the index and then it copys the full 5gb index. Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 14:29 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Hi Ben, only new segments are replicated from master to slave. In a situation where all the segments are new, this will cause the index to be fully replicated, but this rarely happen with incremental updates. It can also happen if the slave Solr assumes it has an invalid index. Are you committing or optimizing on the slaves? After replication, the index directory on the slaves is called index or index.timestamp? Tomás On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: So do you just simpy address this with big nic and network pipes. -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. 
/Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, Im looking at the replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master is it standard to always copy over the full index, currently 5gb in size. If this is standard what do people do who have massive 200gb indexs, does it not take a while to bring the slaves inline with the master? Thanks Ben
Simple Slave Replication Question
Hello, I'm looking at replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master, is it standard to always copy over the full index, currently 5gb in size? If this is standard, what do people do who have massive 200gb indexes? Does it not take a while to bring the slaves in line with the master? Thanks Ben
Re: Simple Slave Replication Question
I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, Im looking at the replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master is it standard to always copy over the full index, currently 5gb in size. If this is standard what do people do who have massive 200gb indexs, does it not take a while to bring the slaves inline with the master? Thanks Ben This e-mail is sent on behalf of Trader Media Group Limited, Registered Office: Auto Trader House, Cutbush Park Industrial Estate, Danehill, Lower Earley, Reading, Berkshire, RG6 4UT(Registered in England No. 4768833). This email and any files transmitted with it are confidential and may be legally privileged, and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender. This email message has been swept for the presence of computer viruses.
RE: Simple Slave Replication Question
So do you just simply address this with big NICs and network pipes? -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, I'm looking at replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master, is it standard to always copy over the full index, currently 5GB in size? If this is standard, what do people do who have massive 200GB indexes? Does it not take a while to bring the slaves in line with the master? Thanks Ben
Re: Simple Slave Replication Question
Hi Ben, only new segments are replicated from master to slave. In a situation where all the segments are new, this will cause the index to be fully replicated, but this rarely happens with incremental updates. It can also happen if the slave Solr assumes it has an invalid index. Are you committing or optimizing on the slaves? After replication, is the index directory on the slaves called index or index.timestamp? Tomás On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: So do you just simply address this with big NICs and network pipes? -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, I'm looking at replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master, is it standard to always copy over the full index, currently 5GB in size? If this is standard, what do people do who have massive 200GB indexes? Does it not take a while to bring the slaves in line with the master? Thanks Ben
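If it helps with diagnosis, the slave's replication handler can also be asked what it last pulled. The sketch below simply issues an HTTP GET against the handler and prints the response; the host, port, and core path are assumptions, while command=details is the standard ReplicationHandler status command.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class ReplicationDetails {
    public static void main(String[] args) throws Exception {
        // The details command reports index version, generation and the time
        // of the last replication cycle on the slave.
        URL url = new URL("http://slave-host:8983/solr/replication?command=details&wt=json");
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        in.close();
    }
}

Comparing the reported index generation before and after a slave poll makes it easier to see whether the slave fetched a few new segments or decided to do a full copy.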
RE: Simple Slave Replication Question
I just have an index directory. I push the documents through with a change to a field. I'm using SolrJ to do this. I'm using the guide from the wiki to set up the replication. When the feed of updates to the master finishes I call a commit again using SolrJ. I then have a poll period of 5 minutes from the slave. When it kicks in I see a new version of the index and then it copies the full 5GB index. Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 14:29 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Hi Ben, only new segments are replicated from master to slave. In a situation where all the segments are new, this will cause the index to be fully replicated, but this rarely happens with incremental updates. It can also happen if the slave Solr assumes it has an invalid index. Are you committing or optimizing on the slaves? After replication, is the index directory on the slaves called index or index.timestamp? Tomás On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: So do you just simply address this with big NICs and network pipes? -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, I'm looking at replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master, is it standard to always copy over the full index, currently 5GB in size? If this is standard, what do people do who have massive 200GB indexes? Does it not take a while to bring the slaves in line with the master? Thanks Ben
Re: Simple Slave Replication Question
Have you changed the mergeFactor or are you using 10 as in the example solrconfig? What do you see in the slave's log during replication? Do you see any line like "Skipping download for..."? On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: I just have an index directory. I push the documents through with a change to a field. I'm using SolrJ to do this. I'm using the guide from the wiki to set up the replication. When the feed of updates to the master finishes I call a commit again using SolrJ. I then have a poll period of 5 minutes from the slave. When it kicks in I see a new version of the index and then it copies the full 5GB index. Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 14:29 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Hi Ben, only new segments are replicated from master to slave. In a situation where all the segments are new, this will cause the index to be fully replicated, but this rarely happens with incremental updates. It can also happen if the slave Solr assumes it has an invalid index. Are you committing or optimizing on the slaves? After replication, is the index directory on the slaves called index or index.timestamp? Tomás On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: So do you just simply address this with big NICs and network pipes? -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, I'm looking at replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master, is it standard to always copy over the full index, currently 5GB in size? If this is standard, what do people do who have massive 200GB indexes? Does it not take a while to bring the slaves in line with the master? Thanks Ben
Re: Simple Slave Replication Question
Also, what happens if, instead of adding the 40K docs, you add just one and commit? 2012/3/23 Tomás Fernández Löbbe tomasflo...@gmail.com Have you changed the mergeFactor or are you using 10 as in the example solrconfig? What do you see in the slave's log during replication? Do you see any line like "Skipping download for..."? On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: I just have an index directory. I push the documents through with a change to a field. I'm using SolrJ to do this. I'm using the guide from the wiki to set up the replication. When the feed of updates to the master finishes I call a commit again using SolrJ. I then have a poll period of 5 minutes from the slave. When it kicks in I see a new version of the index and then it copies the full 5GB index. Thanks Ben -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: 23 March 2012 14:29 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question Hi Ben, only new segments are replicated from master to slave. In a situation where all the segments are new, this will cause the index to be fully replicated, but this rarely happens with incremental updates. It can also happen if the slave Solr assumes it has an invalid index. Are you committing or optimizing on the slaves? After replication, is the index directory on the slaves called index or index.timestamp? Tomás On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: So do you just simply address this with big NICs and network pipes? -Original Message- From: Martin Koch [mailto:m...@issuu.com] Sent: 23 March 2012 14:07 To: solr-user@lucene.apache.org Subject: Re: Simple Slave Replication Question I guess this would depend on network bandwidth, but we move around 150G/hour when hooking up a new slave to the master. /Martin On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy ben.mccar...@tradermedia.co.uk wrote: Hello, I'm looking at replication from a master to a number of slaves. I have configured it and it appears to be working. When updating 40K records on the master, is it standard to always copy over the full index, currently 5GB in size? If this is standard, what do people do who have massive 200GB indexes? Does it not take a while to bring the slaves in line with the master? Thanks Ben
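Tomás's one-document experiment is easy to script. A rough SolrJ sketch (host and field names are invented) would update a single existing document and commit; if the slave's next poll still triggers a full 5GB download, the problem is not the size of the 40K batch itself.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SingleDocTest {
    public static void main(String[] args) throws Exception {
        SolrServer master = new CommonsHttpSolrServer("http://master-host:8983/solr");

        // Re-index one existing document with a small change to a field.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");                 // an existing unique key
        doc.addField("title", "single-doc change");  // the modified field
        master.add(doc);

        // Commit only (no optimize) so the change lands in one tiny new segment.
        master.commit();
    }
}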
Re: SolrCloud Replication Question
On Feb 14, 2012, at 10:57 PM, Jamie Johnson wrote: Not sure if this is expected or not. Nope - should be already resolved or will be today though. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Ok, great. Just wanted to make sure someone was aware. Thanks for looking into this. On Thu, Feb 16, 2012 at 8:26 AM, Mark Miller markrmil...@gmail.com wrote: On Feb 14, 2012, at 10:57 PM, Jamie Johnson wrote: Not sure if this is expected or not. Nope - should be already resolved or will be today though. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti:
Re: SolrCloud Replication Question
Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti: - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Thanks Mark, not a huge rush, just me trying to get to use the latest stuff on our project. On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote: Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti: - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Okay Jamie, I think I have a handle on this. It looks like an issue with what config files are being used by cores created with the admin core handler - I think it's just picking up default config and not the correct config for the collection. This means they end up using config that has no UpdateLog defined - and so recovery fails. I've added more logging around this so that it's easy to determine that. I'm investigating more and working on a test + fix. I'll file a JIRA issue soon as well. - Mark On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote: Thanks Mark, not a huge rush, just me trying to get to use the latest stuff on our project. On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote: Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti: - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Sounds good, if I pull the latest from trunk and rerun will that be useful or were you able to duplicate my issue now? On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller markrmil...@gmail.com wrote: Okay Jamie, I think I have a handle on this. It looks like an issue with what config files are being used by cores created with the admin core handler - I think it's just picking up default config and not the correct config for the collection. This means they end up using config that has no UpdateLog defined - and so recovery fails. I've added more logging around this so that it's easy to determine that. I'm investigating more and working on a test + fix. I'll file a JIRA issue soon as well. - Mark On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote: Thanks Mark, not a huge rush, just me trying to get to use the latest stuff on our project. On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote: Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti: - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Doh - looks like I was just seeing a test issue. Do you mind updating and trying the latest rev? At the least there should be some better logging around the recovery. I'll keep working on tests in the meantime. - Mark On Feb 14, 2012, at 3:15 PM, Jamie Johnson wrote: Sounds good, if I pull the latest from trunk and rerun will that be useful or were you able to duplicate my issue now? On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller markrmil...@gmail.com wrote: Okay Jamie, I think I have a handle on this. It looks like an issue with what config files are being used by cores created with the admin core handler - I think it's just picking up default config and not the correct config for the collection. This means they end up using config that has no UpdateLog defined - and so recovery fails. I've added more logging around this so that it's easy to determine that. I'm investigating more and working on a test + fix. I'll file a JIRA issue soon as well. - Mark On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote: Thanks Mark, not a huge rush, just me trying to get to use the latest stuff on our project. On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote: Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti: - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Doing so now, will let you know if I continue to see the same issues On Tue, Feb 14, 2012 at 4:59 PM, Mark Miller markrmil...@gmail.com wrote: Doh - looks like I was just seeing a test issue. Do you mind updating and trying the latest rev? At the least there should be some better logging around the recovery. I'll keep working on tests in the meantime. - Mark On Feb 14, 2012, at 3:15 PM, Jamie Johnson wrote: Sounds good, if I pull the latest from trunk and rerun will that be useful or were you able to duplicate my issue now? On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller markrmil...@gmail.com wrote: Okay Jamie, I think I have a handle on this. It looks like an issue with what config files are being used by cores created with the admin core handler - I think it's just picking up default config and not the correct config for the collection. This means they end up using config that has no UpdateLog defined - and so recovery fails. I've added more logging around this so that it's easy to determine that. I'm investigating more and working on a test + fix. I'll file a JIRA issue soon as well. - Mark On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote: Thanks Mark, not a huge rush, just me trying to get to use the latest stuff on our project. On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote: Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti: - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
All of the nodes now show as being Active. When starting the replicas I did receive the following message though. Not sure if this is expected or not. INFO: Attempting to replicate from http://JamiesMac.local:8501/solr/slice2_shard2/ Feb 14, 2012 10:53:34 PM org.apache.solr.common.SolrException log SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: null java.lang.NullPointerException at org.apache.solr.handler.admin.CoreAdminHandler.handlePrepRecoveryAction(CoreAdminHandler.java:646) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:358) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:172) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) null java.lang.NullPointerExceptionat org.apache.solr.handler.admin.CoreAdminHandler.handlePrepRecoveryAction(CoreAdminHandler.java:646) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:358) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:172) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) request: http://JamiesMac.local:8501/solr/admin/cores?action=PREPRECOVERY&core=slice2_shard2&nodeName=JamiesMac.local:8502_solr&coreNodeName=JamiesMac.local:8502_solr_slice2_shard1&wt=javabin&version=2 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:433) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251) at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:120) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:208) Feb 14, 2012 10:53:34 PM org.apache.solr.update.UpdateLog dropBufferedUpdates
Re: SolrCloud Replication Question
I don't see any errors in the log. Here are the scripts I'm running, and to create the cores I run the following commands: curl 'http://localhost:8501/solr/admin/cores?action=CREATE&name=slice1_shard1&collection=collection1&shard=slice1&collection.configName=config1' curl 'http://localhost:8501/solr/admin/cores?action=CREATE&name=slice2_shard2&collection=collection1&shard=slice2&collection.configName=config1' curl 'http://localhost:8502/solr/admin/cores?action=CREATE&name=slice2_shard1&collection=collection1&shard=slice2&collection.configName=config1' curl 'http://localhost:8502/solr/admin/cores?action=CREATE&name=slice1_shard2&collection=collection1&shard=slice1&collection.configName=config1' After doing this the nodes are immediately marked as down in clusterstate.json. Restarting the solr instances, I see that whichever I start first shows up as active, and the other is down. There are no errors in the logs either. On Sat, Feb 11, 2012 at 9:48 PM, Mark Miller markrmil...@gmail.com wrote: Yeah, that is what I would expect - for a node to be marked as down, it either didn't finish starting, or it gave up recovering...either case should be logged. You might try searching for the recover keyword and see if there are any interesting bits around that. Meanwhile, I have dug up a couple issues around recovery and committed fixes to trunk - still playing around... On Feb 11, 2012, at 8:44 PM, Jamie Johnson wrote: I didn't see anything in the logs, would it be an error? On Sat, Feb 11, 2012 at 3:58 PM, Mark Miller markrmil...@gmail.com wrote: On Feb 11, 2012, at 3:08 PM, Jamie Johnson wrote: I wiped the zk and started over (when I switch networks I get different host names and honestly haven't dug into why). That being said the latest state shows all in sync, why would the cores show up as down? If recovery fails X times (say because the leader can't be reached from the replica), a node is marked as down. It can't be active, and technically it has stopped trying to recover (it tries X times and eventually gives up until you restart it). Side note, I recently ran into this issue: SOLR-3122 - fix coming soon. Not sure if you have looked at your logs or not, but perhaps it's involved. - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com bootstrap.sh Description: Bourne shell script start.sh Description: Bourne shell script start.sh Description: Bourne shell script <?xml version="1.0" encoding="UTF-8" ?> <solr persistent="true"> <cores adminPath="/admin/cores" zkClientTimeout="1" hostPort="8501" hostContext="solr"> </cores> </solr>
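For completeness, the same four CoreAdmin CREATE calls can be issued from Java instead of curl. The sketch below just replays the URLs above with plain HTTP GETs; the ports, core names, and config name come from the commands above and are otherwise specific to this particular setup.

import java.io.InputStream;
import java.net.URL;

public class CreateCores {
    public static void main(String[] args) throws Exception {
        String[] creates = {
            "http://localhost:8501/solr/admin/cores?action=CREATE&name=slice1_shard1&collection=collection1&shard=slice1&collection.configName=config1",
            "http://localhost:8501/solr/admin/cores?action=CREATE&name=slice2_shard2&collection=collection1&shard=slice2&collection.configName=config1",
            "http://localhost:8502/solr/admin/cores?action=CREATE&name=slice2_shard1&collection=collection1&shard=slice2&collection.configName=config1",
            "http://localhost:8502/solr/admin/cores?action=CREATE&name=slice1_shard2&collection=collection1&shard=slice1&collection.configName=config1"
        };
        for (String u : creates) {
            // Each request creates one core and registers it with its collection/shard in ZooKeeper.
            InputStream in = new URL(u).openStream();
            in.close();
            System.out.println("created: " + u);
        }
    }
}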
Re: SolrCloud Replication Question
Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com wrote:
Re: SolrCloud Replication Question
Yes, I have the following layout on the FS:

./bootstrap.sh
./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc)
./slice1
  - start.sh
  - solr.xml
  - slice1_shard1
    - data
  - slice2_shard2
    - data
./slice2
  - start.sh
  - solr.xml
  - slice2_shard1
    - data
  - slice1_shard2
    - data

If it matters, I'm running everything from localhost, zk and the solr shards. On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com wrote:
Re: SolrCloud Replication Question
On Feb 10, 2012, at 9:40 PM, Jamie Johnson wrote: how'd you resolve this issue? I was basing my guess on seeing JamiesMac.local and jamiesmac in your first cluster state dump - your latest doesn't seem to mismatch like that though. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
I wiped the zk and started over (when I switch networks I get different host names and honestly haven't dug into why). That being said the latest state shows all in sync, why would the cores show up as down? On Sat, Feb 11, 2012 at 11:08 AM, Mark Miller markrmil...@gmail.com wrote: On Feb 10, 2012, at 9:40 PM, Jamie Johnson wrote: how'd you resolve this issue? I was basing my guess on seeing JamiesMac.local and jamiesmac in your first cluster state dump - your latest doesn't seem to mismatch like that though. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
On Feb 11, 2012, at 3:08 PM, Jamie Johnson wrote: I wiped the zk and started over (when I switch networks I get different host names and honestly haven't dug into why). That being said the latest state shows all in sync, why would the cores show up as down? If recovery fails X times (say because the leader can't be reached from the replica), a node is marked as down. It can't be active, and technically it has stopped trying to recover (it tries X times and eventually give up until you restart it). Side note, I recently ran into this issue: SOLR-3122 - fix coming soon. Not sure if you have looked at your logs or not, but perhaps it's involved. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
I didn't see anything in the logs, would it be an error? On Sat, Feb 11, 2012 at 3:58 PM, Mark Miller markrmil...@gmail.com wrote: On Feb 11, 2012, at 3:08 PM, Jamie Johnson wrote: I wiped the zk and started over (when I switch networks I get different host names and honestly haven't dug into why). That being said the latest state shows all in sync, why would the cores show up as down? If recovery fails X times (say because the leader can't be reached from the replica), a node is marked as down. It can't be active, and technically it has stopped trying to recover (it tries X times and eventually give up until you restart it). Side note, I recently ran into this issue: SOLR-3122 - fix coming soon. Not sure if you have looked at your logs or not, but perhaps it's involved. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Yeah, that is what I would expect - for a node to be marked as down, it either didn't finish starting, or it gave up recovering...either case should be logged. You might try searching for the recover keyword and see if there are any interesting bits around that. Meanwhile, I have dug up a couple issues around recovery and committed fixes to trunk - still playing around... On Feb 11, 2012, at 8:44 PM, Jamie Johnson wrote: I didn't see anything in the logs, would it be an error? On Sat, Feb 11, 2012 at 3:58 PM, Mark Miller markrmil...@gmail.com wrote: On Feb 11, 2012, at 3:08 PM, Jamie Johnson wrote: I wiped the zk and started over (when I switch networks I get different host names and honestly haven't dug into why). That being said the latest state shows all in sync, why would the cores show up as down? If recovery fails X times (say because the leader can't be reached from the replica), a node is marked as down. It can't be active, and technically it has stopped trying to recover (it tries X times and eventually give up until you restart it). Side note, I recently ran into this issue: SOLR-3122 - fix coming soon. Not sure if you have looked at your logs or not, but perhaps it's involved. - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
SolrCloud Replication Question
I know that the latest Solr Cloud doesn't use standard replication but I have a question about how it appears to be working. I currently have the following cluster state:

{"collection1":{
  "slice1":{
    "JamiesMac.local:8501_solr_slice1_shard1":{"shard_id":"slice1", "state":"active", "core":"slice1_shard1", "collection":"collection1", "node_name":"JamiesMac.local:8501_solr", "base_url":"http://JamiesMac.local:8501/solr"},
    "JamiesMac.local:8502_solr_slice1_shard2":{"shard_id":"slice1", "state":"active", "core":"slice1_shard2", "collection":"collection1", "node_name":"JamiesMac.local:8502_solr", "base_url":"http://JamiesMac.local:8502/solr"},
    "jamiesmac:8501_solr_slice1_shard1":{"shard_id":"slice1", "state":"down", "core":"slice1_shard1", "collection":"collection1", "node_name":"jamiesmac:8501_solr", "base_url":"http://jamiesmac:8501/solr"},
    "jamiesmac:8502_solr_slice1_shard2":{"shard_id":"slice1", "leader":"true", "state":"active", "core":"slice1_shard2", "collection":"collection1", "node_name":"jamiesmac:8502_solr", "base_url":"http://jamiesmac:8502/solr"}},
  "slice2":{
    "JamiesMac.local:8501_solr_slice2_shard2":{"shard_id":"slice2", "state":"active", "core":"slice2_shard2", "collection":"collection1", "node_name":"JamiesMac.local:8501_solr", "base_url":"http://JamiesMac.local:8501/solr"},
    "JamiesMac.local:8502_solr_slice2_shard1":{"shard_id":"slice2", "state":"active", "core":"slice2_shard1", "collection":"collection1", "node_name":"JamiesMac.local:8502_solr", "base_url":"http://JamiesMac.local:8502/solr"},
    "jamiesmac:8501_solr_slice2_shard2":{"shard_id":"slice2", "state":"down", "core":"slice2_shard2", "collection":"collection1", "node_name":"jamiesmac:8501_solr", "base_url":"http://jamiesmac:8501/solr"},
    "jamiesmac:8502_solr_slice2_shard1":{"shard_id":"slice2", "leader":"true", "state":"active", "core":"slice2_shard1", "collection":"collection1", "node_name":"jamiesmac:8502_solr", "base_url":"http://jamiesmac:8502/solr"}}}}

I then added some docs to the following shards using SolrJ: http://localhost:8502/solr/slice2_shard1 http://localhost:8502/solr/slice1_shard2 I then bring back up the other cores and I don't see replication happening. Looking at the stats for each core I see that on the 8501 instance (the instance that was off) the number of docs is 0, so I've clearly set something up incorrectly. Any help on this would be greatly appreciated.
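One thing that may matter for this symptom is that the documents are being posted straight to individual core URLs. With the cloud-aware SolrJ client they can instead be sent via ZooKeeper, which routes each document to the current shard leader and lets the normal distributed-update path forward it to the replicas. A rough sketch follows; it assumes the trunk-era CloudSolrServer client and its setDefaultCollection method, so treat the exact class and method names as assumptions for whatever revision is in use.

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndex {
    public static void main(String[] args) throws Exception {
        // Connect through ZooKeeper rather than a fixed core URL, so the client
        // always sees the current cluster state and shard leaders.
        CloudSolrServer cloud = new CloudSolrServer("localhost:2181");
        cloud.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "cloud-doc-1");
        cloud.add(doc);
        cloud.commit();
    }
}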
Re: SolrCloud Replication Question
Can you explain a little more how you doing this? How are you bringing the cores down and then back up? Shutting down a full solr instance, unloading the core? On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: I know that the latest Solr Cloud doesn't use standard replication but I have a question about how it appears to be working. I currently have the following cluster state {collection1:{ slice1:{ JamiesMac.local:8501_solr_slice1_shard1:{ shard_id:slice1, state:active, core:slice1_shard1, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice1_shard2:{ shard_id:slice1, state:active, core:slice1_shard2, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice1_shard1:{ shard_id:slice1, state:down, core:slice1_shard1, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice1_shard2:{ shard_id:slice1, leader:true, state:active, core:slice1_shard2, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr}}, slice2:{ JamiesMac.local:8501_solr_slice2_shard2:{ shard_id:slice2, state:active, core:slice2_shard2, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice2_shard1:{ shard_id:slice2, state:active, core:slice2_shard1, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice2_shard2:{ shard_id:slice2, state:down, core:slice2_shard2, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice2_shard1:{ shard_id:slice2, leader:true, state:active, core:slice2_shard1, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr I then added some docs to the following shards using SolrJ http://localhost:8502/solr/slice2_shard1 http://localhost:8502/solr/slice1_shard2 I then bring back up the other cores and I don't see replication happening. Looking at the stats for each core I see that on the 8501 instance (the instance that was off) the number of docs is 0, so I've clearly set something up incorrectly. Any help on this would be greatly appreciated. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Sorry, I shut down the full solr instance. On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller markrmil...@gmail.com wrote: Can you explain a little more how you doing this? How are you bringing the cores down and then back up? Shutting down a full solr instance, unloading the core? On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: I know that the latest Solr Cloud doesn't use standard replication but I have a question about how it appears to be working. I currently have the following cluster state {collection1:{ slice1:{ JamiesMac.local:8501_solr_slice1_shard1:{ shard_id:slice1, state:active, core:slice1_shard1, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice1_shard2:{ shard_id:slice1, state:active, core:slice1_shard2, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice1_shard1:{ shard_id:slice1, state:down, core:slice1_shard1, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice1_shard2:{ shard_id:slice1, leader:true, state:active, core:slice1_shard2, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr}}, slice2:{ JamiesMac.local:8501_solr_slice2_shard2:{ shard_id:slice2, state:active, core:slice2_shard2, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice2_shard1:{ shard_id:slice2, state:active, core:slice2_shard1, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice2_shard2:{ shard_id:slice2, state:down, core:slice2_shard2, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice2_shard1:{ shard_id:slice2, leader:true, state:active, core:slice2_shard1, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr I then added some docs to the following shards using SolrJ http://localhost:8502/solr/slice2_shard1 http://localhost:8502/solr/slice1_shard2 I then bring back up the other cores and I don't see replication happening. Looking at the stats for each core I see that on the 8501 instance (the instance that was off) the number of docs is 0, so I've clearly set something up incorrectly. Any help on this would be greatly appreciated. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Sorry for pinging this again, is more information needed on this? I can provide more details but am not sure what to provide. On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson jej2...@gmail.com wrote: Sorry, I shut down the full solr instance. On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller markrmil...@gmail.com wrote: Can you explain a little more how you doing this? How are you bringing the cores down and then back up? Shutting down a full solr instance, unloading the core? On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: I know that the latest Solr Cloud doesn't use standard replication but I have a question about how it appears to be working. I currently have the following cluster state {collection1:{ slice1:{ JamiesMac.local:8501_solr_slice1_shard1:{ shard_id:slice1, state:active, core:slice1_shard1, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice1_shard2:{ shard_id:slice1, state:active, core:slice1_shard2, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice1_shard1:{ shard_id:slice1, state:down, core:slice1_shard1, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice1_shard2:{ shard_id:slice1, leader:true, state:active, core:slice1_shard2, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr}}, slice2:{ JamiesMac.local:8501_solr_slice2_shard2:{ shard_id:slice2, state:active, core:slice2_shard2, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice2_shard1:{ shard_id:slice2, state:active, core:slice2_shard1, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice2_shard2:{ shard_id:slice2, state:down, core:slice2_shard2, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice2_shard1:{ shard_id:slice2, leader:true, state:active, core:slice2_shard1, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr I then added some docs to the following shards using SolrJ http://localhost:8502/solr/slice2_shard1 http://localhost:8502/solr/slice1_shard2 I then bring back up the other cores and I don't see replication happening. Looking at the stats for each core I see that on the 8501 instance (the instance that was off) the number of docs is 0, so I've clearly set something up incorrectly. Any help on this would be greatly appreciated. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
I'm trying, but so far I don't see anything. I'll have to try and mimic your setup closer it seems. I tried starting up 6 solr instances on different ports as 2 shards, each with a replication factor of 3. Then I indexed 20k documents to the cluster and verified doc counts. Then I shutdown all the replicas so that only one instance served each shard. Then I indexed 20k documents to the cluster. Then I started the downed nodes and verified that they where in a recovery state. After enough time went by I checked and verified document counts on each instance - they where as expected. I guess next I can try a similar experiment using multiple cores, but if you notice anything that stands out that is largely different in what you are doing, let me know. The cores that are behind, does it say they are down, recovering, or active in zookeeper? On Feb 10, 2012, at 4:48 PM, Jamie Johnson wrote: Sorry for pinging this again, is more information needed on this? I can provide more details but am not sure what to provide. On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson jej2...@gmail.com wrote: Sorry, I shut down the full solr instance. On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller markrmil...@gmail.com wrote: Can you explain a little more how you doing this? How are you bringing the cores down and then back up? Shutting down a full solr instance, unloading the core? On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: I know that the latest Solr Cloud doesn't use standard replication but I have a question about how it appears to be working. I currently have the following cluster state {collection1:{ slice1:{ JamiesMac.local:8501_solr_slice1_shard1:{ shard_id:slice1, state:active, core:slice1_shard1, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice1_shard2:{ shard_id:slice1, state:active, core:slice1_shard2, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice1_shard1:{ shard_id:slice1, state:down, core:slice1_shard1, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice1_shard2:{ shard_id:slice1, leader:true, state:active, core:slice1_shard2, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr}}, slice2:{ JamiesMac.local:8501_solr_slice2_shard2:{ shard_id:slice2, state:active, core:slice2_shard2, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice2_shard1:{ shard_id:slice2, state:active, core:slice2_shard1, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice2_shard2:{ shard_id:slice2, state:down, core:slice2_shard2, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice2_shard1:{ shard_id:slice2, leader:true, state:active, core:slice2_shard1, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr I then added some docs to the following shards using SolrJ http://localhost:8502/solr/slice2_shard1 http://localhost:8502/solr/slice1_shard2 I then bring back up the other cores and I don't see replication happening. Looking at the stats for each core I see that on the 8501 instance (the instance that was off) the number of docs is 0, so I've clearly set something up incorrectly. 
Any help on this would be greatly appreciated. - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
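Mark's verification step (checking document counts on every instance after recovery) is straightforward to automate as well. A small SolrJ sketch along these lines, with core URLs matching the layout discussed in this thread, asks each core for its total hit count; replicas of the same shard should report the same number, so a core that is consistently lower is the one that missed the updates.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CountDocs {
    public static void main(String[] args) throws Exception {
        String[] cores = {
            "http://localhost:8501/solr/slice1_shard1",
            "http://localhost:8501/solr/slice2_shard2",
            "http://localhost:8502/solr/slice2_shard1",
            "http://localhost:8502/solr/slice1_shard2"
        };
        for (String coreUrl : cores) {
            SolrServer server = new CommonsHttpSolrServer(coreUrl);
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0); // only the total count is needed, not the documents
            QueryResponse rsp = server.query(q);
            System.out.println(coreUrl + " -> " + rsp.getResults().getNumFound() + " docs");
        }
    }
}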
Re: SolrCloud Replication Question
Also, it will help if you can mention the exact version of solrcloud you are talking about in each issue - I know you have one from the old branch, and I assume a version off trunk you are playing with - so a heads up on which and if trunk, what rev or day will help in the case that I'm trying to dupe issues that have been addressed. - Mark On Feb 10, 2012, at 6:09 PM, Mark Miller wrote: I'm trying, but so far I don't see anything. I'll have to try and mimic your setup closer it seems. I tried starting up 6 solr instances on different ports as 2 shards, each with a replication factor of 3. Then I indexed 20k documents to the cluster and verified doc counts. Then I shutdown all the replicas so that only one instance served each shard. Then I indexed 20k documents to the cluster. Then I started the downed nodes and verified that they where in a recovery state. After enough time went by I checked and verified document counts on each instance - they where as expected. I guess next I can try a similar experiment using multiple cores, but if you notice anything that stands out that is largely different in what you are doing, let me know. The cores that are behind, does it say they are down, recovering, or active in zookeeper? On Feb 10, 2012, at 4:48 PM, Jamie Johnson wrote: Sorry for pinging this again, is more information needed on this? I can provide more details but am not sure what to provide. On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson jej2...@gmail.com wrote: Sorry, I shut down the full solr instance. On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller markrmil...@gmail.com wrote: Can you explain a little more how you doing this? How are you bringing the cores down and then back up? Shutting down a full solr instance, unloading the core? On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: I know that the latest Solr Cloud doesn't use standard replication but I have a question about how it appears to be working. 
I currently have the following cluster state {collection1:{ slice1:{ JamiesMac.local:8501_solr_slice1_shard1:{ shard_id:slice1, state:active, core:slice1_shard1, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice1_shard2:{ shard_id:slice1, state:active, core:slice1_shard2, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice1_shard1:{ shard_id:slice1, state:down, core:slice1_shard1, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice1_shard2:{ shard_id:slice1, leader:true, state:active, core:slice1_shard2, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr}}, slice2:{ JamiesMac.local:8501_solr_slice2_shard2:{ shard_id:slice2, state:active, core:slice2_shard2, collection:collection1, node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}, JamiesMac.local:8502_solr_slice2_shard1:{ shard_id:slice2, state:active, core:slice2_shard1, collection:collection1, node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}, jamiesmac:8501_solr_slice2_shard2:{ shard_id:slice2, state:down, core:slice2_shard2, collection:collection1, node_name:jamiesmac:8501_solr, base_url:http://jamiesmac:8501/solr}, jamiesmac:8502_solr_slice2_shard1:{ shard_id:slice2, leader:true, state:active, core:slice2_shard1, collection:collection1, node_name:jamiesmac:8502_solr, base_url:http://jamiesmac:8502/solr I then added some docs to the following shards using SolrJ http://localhost:8502/solr/slice2_shard1 http://localhost:8502/solr/slice1_shard2 I then bring back up the other cores and I don't see replication happening. Looking at the stats for each core I see that on the 8501 instance (the instance that was off) the number of docs is 0, so I've clearly set something up incorrectly. Any help on this would be greatly appreciated. - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Nothing seems that different. In regards to the states of each, I'll try to verify tonight. This was using a version I pulled from SVN trunk yesterday morning.

On Fri, Feb 10, 2012 at 6:22 PM, Mark Miller <markrmil...@gmail.com> wrote:

Also, it will help if you can mention the exact version of SolrCloud you are talking about in each issue - I know you have one from the old branch, and I assume a version off trunk you are playing with - so a heads up on which, and if trunk, what rev or day, will help in case I'm trying to dupe issues that have already been addressed.

- Mark
Re: SolrCloud Replication Question
Thanks. If the given ZK snapshot was the end state, then two nodes are marked as down. Generally that happens because replication failed - if you have not already, I'd check the logs for those two nodes.

- Mark

On Fri, Feb 10, 2012 at 7:35 PM, Jamie Johnson <jej2...@gmail.com> wrote:

Nothing seems that different. In regards to the states of each, I'll try to verify tonight. This was using a version I pulled from SVN trunk yesterday morning.
Re: SolrCloud Replication Question
On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:

jamiesmac

Another note: I have no idea if this is involved, but when I do tests with my linux box and mac I run into the following: my linux box auto-detects its address as halfmetal and my macbook as mbpro.local. If I accept those defaults, my mac cannot reach my linux box - it can only reach the linux box through halfmetal.local - so I have to override the host on the linux box to advertise as halfmetal.local, and then they can talk.

In the bad case, if my leaders were on the linux box, they would be able to forward to the mac no problem, but then if shards on the mac needed to recover, they would fail to reach the linux box through the halfmetal address.

- Mark Miller
lucidimagination.com
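One way to spot this kind of host-advertisement mismatch is to read back what each replica actually registered in ZooKeeper (node_name, base_url, state) and then check that every base_url resolves from every machine in the cluster. A rough sketch with an 8.x-style SolrJ CloudSolrClient (the ZK-era API at the time of this thread was different); the collection name and ZooKeeper address are assumptions:

    import java.util.List;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.cloud.DocCollection;
    import org.apache.solr.common.cloud.Replica;

    public class ShowRegisteredReplicas {
        public static void main(String[] args) throws Exception {
            // Assumed ZooKeeper address; adjust to your ensemble.
            try (CloudSolrClient client =
                     new CloudSolrClient.Builder(List.of("localhost:2181"), Optional.empty()).build()) {
                client.connect();
                DocCollection coll = client.getClusterStateProvider()
                                           .getClusterState().getCollection("collection1");
                for (Replica r : coll.getReplicas()) {
                    // base_url is the address other nodes use to reach this replica;
                    // if it does not resolve cluster-wide, forwarding and recovery will fail.
                    System.out.println(r.getNodeName() + "  " + r.getBaseUrl() + "  " + r.getState());
                }
            }
        }
    }

If a replica's base_url uses a hostname that only its own machine can resolve, that matches the failure mode described above.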
Re: SolrCloud Replication Question
Hmm... perhaps I'm seeing the issue you're speaking of. I have everything running right now and my state is as follows:

{collection1:{
  slice1:{
    JamiesMac.local:8501_solr_slice1_shard1:{
      shard_id:slice1, leader:true, state:active, core:slice1_shard1, collection:collection1,
      node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr},
    JamiesMac.local:8502_solr_slice1_shard2:{
      shard_id:slice1, state:down, core:slice1_shard2, collection:collection1,
      node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr}},
  slice2:{
    JamiesMac.local:8502_solr_slice2_shard1:{
      shard_id:slice2, leader:true, state:active, core:slice2_shard1, collection:collection1,
      node_name:JamiesMac.local:8502_solr, base_url:http://JamiesMac.local:8502/solr},
    JamiesMac.local:8501_solr_slice2_shard2:{
      shard_id:slice2, state:down, core:slice2_shard2, collection:dataspace,
      node_name:JamiesMac.local:8501_solr, base_url:http://JamiesMac.local:8501/solr}}}}

How'd you resolve this issue?

On Fri, Feb 10, 2012 at 8:49 PM, Mark Miller <markrmil...@gmail.com> wrote:

Another note: I have no idea if this is involved, but when I do tests with my linux box and mac I run into the following ...

- Mark Miller
lucidimagination.com
Replication question
I have replication set up with:

    <str name="pollInterval">00:00:60</str>

I assumed that meant it would poll the master for updates once a minute, but my logs make it look like it is trying to sync up almost constantly. Below is an example of my log from just one minute in time. Am I reading this wrong? This is from one of the slaves; I have 2 of them, so my master's log file is double this. Is this normal?

May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master.

--
View this message in context: http://lucene.472066.n3.nabble.com/Replication-question-tp2909157p2909157.html
Sent from the Solr - User mailing list archive at Nabble.com.
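When checking how a slave is actually configured to poll, it can help to ask the replication handler directly rather than reading the logs: /replication?command=details reports the slave's replication settings and sync status. A rough SolrJ sketch that just dumps the whole response; the core URL is an assumption, and the exact response layout of the legacy handler should be verified against your own output:

    import java.util.Map;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.GenericSolrRequest;
    import org.apache.solr.common.params.MapSolrParams;

    public class ReplicationDetails {
        public static void main(String[] args) throws Exception {
            // Assumed slave core URL; adjust to your own host/core.
            try (HttpSolrClient client =
                     new HttpSolrClient.Builder("http://localhost:8983/solr/core1").build()) {
                GenericSolrRequest details = new GenericSolrRequest(
                        SolrRequest.METHOD.GET, "/replication",
                        new MapSolrParams(Map.of("command", "details")));
                // Dump the full details response, which includes the slave's
                // replication configuration and its view of the master.
                System.out.println(client.request(details));
            }
        }
    }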
solr1.4 replication question
Hi,

I am fairly new to Solr and have set up two servers, one as a master and the other as a slave. I have a load balancer in front with 2 different VIPs: one distributes gets/reads evenly across the master and slave, and the other sends posts/updates just to the master. If the master fails, the second VIP automatically switches updates over to the slave. But if that happens, is there a way to automatically switch which server is master and which is slave, instead of going into solrconfig.xml and then restarting the instances? Any recommendations for the best way to set this up?

Thanks
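One way to check which box currently holds the master role, without touching configs, is to ask each server's replication handler: /replication?command=details reports master/slave status, which a failover script or health check could key off. A rough SolrJ sketch; the isMaster/isSlave response keys and the server/core URLs are assumptions from memory of the legacy master/slave handler, so verify them against your own response:

    import java.util.Map;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.GenericSolrRequest;
    import org.apache.solr.common.params.MapSolrParams;
    import org.apache.solr.common.util.NamedList;

    public class WhoIsMaster {
        public static void main(String[] args) throws Exception {
            // Hypothetical core URLs for the two servers behind the VIPs.
            String[] servers = {"http://solr-a:8983/solr/core1", "http://solr-b:8983/solr/core1"};
            for (String url : servers) {
                try (HttpSolrClient client = new HttpSolrClient.Builder(url).build()) {
                    NamedList<Object> rsp = client.request(new GenericSolrRequest(
                            SolrRequest.METHOD.GET, "/replication",
                            new MapSolrParams(Map.of("command", "details"))));
                    // "details" is a nested section; isMaster/isSlave are assumed key names.
                    NamedList<?> details = (NamedList<?>) rsp.get("details");
                    System.out.println(url + "  isMaster=" + details.get("isMaster")
                            + "  isSlave=" + details.get("isSlave"));
                }
            }
        }
    }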