Re: Replication in soft commit

2020-09-03 Thread Emir Arnautović
Hi Tushar,
This use case is not suitable for the master/slave (MS) model. You should go with
SolrCloud, or, if that is too much overhead for you, run separate Solr instances,
each doing its own indexing. Solr provides eventual consistency anyway, so you
should have some sort of session stickiness in place even if you use the MS model.

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 3 Sep 2020, at 13:54, Tushar Arora  wrote:
> 
> Hi Emir,
> Thanks for the response.
> Actually the use case is real time indexing from DB to solr in every second
> on the master server using queueing mechanism.
> So, I think instead of doing hard commits every second we should go for
> soft commits. And doing hard commits after some intervals.
> And we have to replicate the data to slave immediately.
> 
> Regards,
> Tushar
> On Thu, 3 Sep 2020 at 16:17, Emir Arnautović 
> wrote:
> 
>> Hi Tushar,
>> Replication is file based process and hard commit is when segment is
>> flushed to disk. It is not common that you use soft commits on master. The
>> only usecase that I can think of is when you read your index as part of
>> indexing process, but even that is bad practice and should be avoided.
>> 
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 3 Sep 2020, at 08:38, Tushar Arora  wrote:
>>> 
>>> Hi,
>>> I want to ask if the soft commit works in replication.
>>> One of our use cases deals with indexing the data every second on a
>> master
>>> server. And then it has to replicate to slaves. So if we use soft commit,
>>> then does the data replicate immediately to the slave server or after the
>>> hard commit takes place.
>>> Use cases require transfer of data from master to slave immediately.
>>> 
>>> Regards,
>>> Tushar
>> 
>> 



Re: Replication in soft commit

2020-09-03 Thread Tushar Arora
Hi Emir,
Thanks for the response.
Actually, the use case is real-time indexing from the DB to Solr every second
on the master server, using a queueing mechanism.
So I think that instead of doing hard commits every second, we should go for
soft commits and do hard commits at some interval.
And we have to replicate the data to the slaves immediately.
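
For reference, here is a minimal sketch of that commit policy in solrconfig.xml
(the interval values are illustrative assumptions, not recommendations):

  <!-- inside <updateHandler class="solr.DirectUpdateHandler2"> on the master -->

  <!-- soft commit every second: makes new documents searchable on the master,
       but writes no segment files -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>

  <!-- hard commit at a longer interval: this is what flushes segments to disk;
       openSearcher=false keeps it cheap -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

As discussed below, only the hard commits produce index files that a slave can
pull, so "replicate immediately" is bounded by the hard-commit interval plus
the slave's poll interval.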

Regards,
Tushar
On Thu, 3 Sep 2020 at 16:17, Emir Arnautović 
wrote:

> Hi Tushar,
> Replication is file based process and hard commit is when segment is
> flushed to disk. It is not common that you use soft commits on master. The
> only usecase that I can think of is when you read your index as part of
> indexing process, but even that is bad practice and should be avoided.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 3 Sep 2020, at 08:38, Tushar Arora  wrote:
> >
> > Hi,
> > I want to ask if the soft commit works in replication.
> > One of our use cases deals with indexing the data every second on a
> master
> > server. And then it has to replicate to slaves. So if we use soft commit,
> > then does the data replicate immediately to the slave server or after the
> > hard commit takes place.
> > Use cases require transfer of data from master to slave immediately.
> >
> > Regards,
> > Tushar
>
>


Re: Replication in soft commit

2020-09-03 Thread Emir Arnautović
Hi Tushar,
Replication is a file-based process, and a hard commit is when a segment is
flushed to disk. It is not common to use soft commits on a master. The only
use case I can think of is reading your own index as part of the indexing
process, but even that is bad practice and should be avoided.
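
For context, a minimal master/slave replication sketch in solrconfig.xml (host,
core name and poll interval are placeholder assumptions); replicateAfter=commit
is why only hard commits make new data visible to the slave:

  <!-- on the master -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">startup</str>
    </lst>
  </requestHandler>

  <!-- on the slave -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/core1/replication</str>
      <str name="pollInterval">00:00:20</str>
    </lst>
  </requestHandler>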

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 3 Sep 2020, at 08:38, Tushar Arora  wrote:
> 
> Hi,
> I want to ask if the soft commit works in replication.
> One of our use cases deals with indexing the data every second on a master
> server. And then it has to replicate to slaves. So if we use soft commit,
> then does the data replicate immediately to the slave server or after the
> hard commit takes place.
> Use cases require transfer of data from master to slave immediately.
> 
> Regards,
> Tushar



Re: Replication of Solr Model and feature store

2020-08-20 Thread Monica Skidmore
Thank you, Krishan.  It did turn out to be a mismatch of a feature name between
the model and the feature store.

And thanks for the information that the API can struggle when there is a 
corrupt model or feature store file – we get the model and feature store from 
our data science team and have had issues a couple of times.

Monica D Skidmore
Lead Engineer, Core Search


CareerBuilder.com<https://www.careerbuilder.com/> | 
Blog<https://www.careerbuilder.com/advice> | Press 
Room<https://press.careerbuilder.com/>




From: krishan goyal 
Date: Friday, August 7, 2020 at 10:28 AM
To: "solr-user@lucene.apache.org" , Monica 
Skidmore 
Cc: Christine Poerschke 
Subject: Re: Replication of Solr Model and feature store

Hi Monica,

Replication is working fine for me. You just have to add the 
_schema_feature-store.json and _schema_model-store.json to confFiles under 
/replication in solrconfig.xml

I think the issue you are seeing is where the model is referencing a feature 
which is not present in the feature store. Or the feature weights for the model 
are incorrect. The issue in solr is that it doesn't return you the right 
exception but throws a model not found exception

Try these ways to fix it
1. verify feature weights are < 1. I am not sure why having weights > 1 is an 
issue but apparently it is in some random cases
2. verify all features used in the model file _schema_model-store.json are 
actually present in the feature file _schema_feature-store.json.

Another issue with solr LTR is if you have a corrupt model/feature file, you 
can't update/delete it via the API in some cases. you would need to change the 
respective _schema_model-store.json and _schema_feature-store.json files and 
reload the cores for the changes to take effect.

Please try these and let me know if the issue still exists

On Thu, Aug 6, 2020 at 11:18 PM Monica Skidmore 
mailto:monica.skidm...@careerbuilder.com>> 
wrote:
I would be interested in the answer here, as well.  We're using LTR 
successfully on Solr 7.3 and Solr 8.3 in cloud mode, but we're struggling to 
load a simple, test model on 8.3 in master/slave mode.   The FeatureStore 
appears to load, but we're not sure it's loading correctly, either. Here are 
some details from the engineer on our team who is leading that effort:

"I'm getting a ClassCastException when uploading a Model. Using the debugger, 
was able to see the line throwing the exception is: 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:488)

Apparently it cannot find: org.apache.solr.ltr.model.LinearModel, although the 
features appear to be created without issues with the following class: 
org.apache.solr.ltr.feature.FieldValueFeature

Another thing we were able to see is that the List features has a list 
of null elements, so that made us think there may be some issues when creating 
the instances of Feature.

We had begun to believe this might be related to the fact that we are running 
Solr in Master/Slave config. Was LTR ever tested on non-cloud deployments??

Any help is appreciated."

Monica D Skidmore
Lead Engineer, Core Search



CareerBuilder.com <https://www.careerbuilder.com/> | Blog
<https://www.careerbuilder.com/advice> | Press Room
<https://press.careerbuilder.com/>




On 7/24/20, 7:58 AM, "Christine Poerschke (BLOOMBERG/ LONDON)" 
mailto:cpoersc...@bloomberg.net>> wrote:

Hi Krishan,

Could you share what version of Solr you are using?

And I wonder if the observed behaviour could be reproduced e.g. with the 
techproducts example, changes not applying after reload [1] sounds like a bug 
if so.

Hope that helps.

Regards,

Christine

[1]
https://lucene.apache.org/solr/guide/8_6/learning-to-rank.html#applying-changes

Re: Replication of Solr Model and feature store

2020-08-07 Thread krishan goyal
Hi Monica,

Replication is working fine for me. You just have to add the
_schema_feature-store.json and _schema_model-store.json to confFiles under
/replication in solrconfig.xml
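
A sketch of what that can look like on the master (the file list is
illustrative; keep whatever confFiles you already replicate):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <!-- ship the LTR stores along with the other configuration files -->
      <str name="confFiles">schema.xml,_schema_feature-store.json,_schema_model-store.json</str>
    </lst>
  </requestHandler>

A slave that downloads at least one changed configuration file reloads the
core instead of doing a plain commit, so the updated stores take effect there.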

I think the issue you are seeing is that the model references a feature
which is not present in the feature store, or that the feature weights for
the model are incorrect. The problem in Solr is that it doesn't return the
right exception, but throws a "model not found" exception instead

Try these ways to fix it:
1. Verify that the feature weights are < 1. I am not sure why having weights > 1
is an issue, but apparently it is in some random cases.
2. Verify that all features used in the model file _schema_model-store.json are
actually present in the feature file _schema_feature-store.json.

Another issue with Solr LTR is that if you have a corrupt model/feature file,
you can't update/delete it via the API in some cases. You would need to edit
the respective _schema_model-store.json and _schema_feature-store.json files
and reload the cores for the changes to take effect.

Please try these and let me know if the issue still exists

On Thu, Aug 6, 2020 at 11:18 PM Monica Skidmore <
monica.skidm...@careerbuilder.com> wrote:

> I would be interested in the answer here, as well.  We're using LTR
> successfully on Solr 7.3 and Solr 8.3 in cloud mode, but we're struggling
> to load a simple, test model on 8.3 in master/slave mode.   The
> FeatureStore appears to load, but we're not sure it's loading correctly,
> either. Here are some details from the engineer on our team who is leading
> that effort:
>
> "I'm getting a ClassCastException when uploading a Model. Using the
> debugger, was able to see the line throwing the exception is:
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:488)
>
> Apparently it cannot find: org.apache.solr.ltr.model.LinearModel, although
> the features appear to be created without issues with the following class:
> org.apache.solr.ltr.feature.FieldValueFeature
>
> Another thing we were able to see is that the List features has a
> list of null elements, so that made us think there may be some issues when
> creating the instances of Feature.
>
> We had begun to believe this might be related to the fact that we are
> running Solr in Master/Slave config. Was LTR ever tested on non-cloud
> deployments??
>
> Any help is appreciated."
>
> Monica D Skidmore
> Lead Engineer, Core Search
>
>
>
> CareerBuilder.com <https://www.careerbuilder.com/> | Blog <
> https://www.careerbuilder.com/advice> | Press Room <
> https://press.careerbuilder.com/>
>
>
>
>
> On 7/24/20, 7:58 AM, "Christine Poerschke (BLOOMBERG/ LONDON)" <
> cpoersc...@bloomberg.net> wrote:
>
> Hi Krishan,
>
> Could you share what version of Solr you are using?
>
> And I wonder if the observed behaviour could be reproduced e.g. with
> the techproducts example, changes not applying after reload [1] sounds like
> a bug if so.
>
> Hope that helps.
>
> Regards,
>
> Christine
>
> [1]
> https://lucene.apache.org/solr/guide/8_6/learning-to-rank.html#applying-changes
>
> From: solr-user@lucene.apache.org At: 07/22/20 14:00:59To:
> solr-user@lucene.apache.org
> Subject: Re: Replication of Solr Model and feature store
>
> Adding more details here
>
> I need some help on how to enable the solr LTR model and features on
> all
> nodes of a solr cluster.
>
> I am unable to replicate the model and the feature store though from
> any
> master to its slaves with the replication API ? And unable to find any
> documentation for the same. Is replication possible?
>
> Without replication, would I have to individually update all nodes of a
> cluster ? Or can the feature and model files be read as a resource
> (like
> config or schema) so that I can replicate the file or add the file to
> my
> deployments.
>
>
> On Wed, Jul 22, 2020 at 5:53 PM krishan goyal 
> wrote:
>
> > Bump. Any one has an idea how to proceed here ?
> >
> > On Wed, Jul 8, 2020 at 5:41 PM krishan goyal 
> > wrote:
> >
> >> Hi,
> >>
> >> How do I enable replication of the model and feature store ?
> >>
> >> Thanks
> >> Krishan
> >>
> >
>
>
>
>


Re: Replication of Solr Model and feature store

2020-08-06 Thread Monica Skidmore
I would be interested in the answer here, as well.  We're using LTR
successfully on Solr 7.3 and Solr 8.3 in cloud mode, but we're struggling to
load a simple test model on 8.3 in master/slave mode.  The FeatureStore
appears to load, but we're not sure it's loading correctly, either. Here are
some details from the engineer on our team who is leading that effort:

"I'm getting a ClassCastException when uploading a Model. Using the debugger, 
was able to see the line throwing the exception is: 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:488)

Apparently it cannot find: org.apache.solr.ltr.model.LinearModel, although the 
features appear to be created without issues with the following class: 
org.apache.solr.ltr.feature.FieldValueFeature

Another thing we were able to see is that the List<Feature> features has a list
of null elements, which made us think there may be some issues when creating
the instances of Feature.

We had begun to believe this might be related to the fact that we are running 
Solr in Master/Slave config. Was LTR ever tested on non-cloud deployments??
 
Any help is appreciated."

Monica D Skidmore
Lead Engineer, Core Search



CareerBuilder.com <https://www.careerbuilder.com/> | Blog 
<https://www.careerbuilder.com/advice> | Press Room 
<https://press.careerbuilder.com/>
 
 
 

On 7/24/20, 7:58 AM, "Christine Poerschke (BLOOMBERG/ LONDON)" 
 wrote:

Hi Krishan,

Could you share what version of Solr you are using?

And I wonder if the observed behaviour could be reproduced e.g. with the 
techproducts example, changes not applying after reload [1] sounds like a bug 
if so.

Hope that helps.

Regards,

Christine

[1] 
https://lucene.apache.org/solr/guide/8_6/learning-to-rank.html#applying-changes

From: solr-user@lucene.apache.org At: 07/22/20 14:00:59To:  
solr-user@lucene.apache.org
Subject: Re: Replication of Solr Model and feature store

Adding more details here

I need some help on how to enable the solr LTR model and features on all
nodes of a solr cluster.

I am unable to replicate the model and the feature store though from any
master to its slaves with the replication API ? And unable to find any
documentation for the same. Is replication possible?

Without replication, would I have to individually update all nodes of a
cluster ? Or can the feature and model files be read as a resource (like
config or schema) so that I can replicate the file or add the file to my
deployments.


On Wed, Jul 22, 2020 at 5:53 PM krishan goyal  wrote:

> Bump. Any one has an idea how to proceed here ?
>
> On Wed, Jul 8, 2020 at 5:41 PM krishan goyal 
> wrote:
>
>> Hi,
>>
>> How do I enable replication of the model and feature store ?
>>
>> Thanks
>> Krishan
>>
>





Re: Replication of Solr Model and feature store

2020-07-28 Thread krishan goyal
Hi Christine,

I am using Solr 7.7

I am able to get it replicated now. I didn't know that the feature and
model stores are saved as files in the config structure; by providing
these file names to the /replication handler, I can replicate them.

I guess this is something that could be added to the LTR documentation.
I will try to raise a PR for this.


On Fri, Jul 24, 2020 at 5:28 PM Christine Poerschke (BLOOMBERG/ LONDON) <
cpoersc...@bloomberg.net> wrote:

> Hi Krishan,
>
> Could you share what version of Solr you are using?
>
> And I wonder if the observed behaviour could be reproduced e.g. with the
> techproducts example, changes not applying after reload [1] sounds like a
> bug if so.
>
> Hope that helps.
>
> Regards,
>
> Christine
>
> [1]
> https://lucene.apache.org/solr/guide/8_6/learning-to-rank.html#applying-changes
>
> From: solr-user@lucene.apache.org At: 07/22/20 14:00:59To:
> solr-user@lucene.apache.org
> Subject: Re: Replication of Solr Model and feature store
>
> Adding more details here
>
> I need some help on how to enable the solr LTR model and features on all
> nodes of a solr cluster.
>
> I am unable to replicate the model and the feature store though from any
> master to its slaves with the replication API ? And unable to find any
> documentation for the same. Is replication possible?
>
> Without replication, would I have to individually update all nodes of a
> cluster ? Or can the feature and model files be read as a resource (like
> config or schema) so that I can replicate the file or add the file to my
> deployments.
>
>
> On Wed, Jul 22, 2020 at 5:53 PM krishan goyal 
> wrote:
>
> > Bump. Any one has an idea how to proceed here ?
> >
> > On Wed, Jul 8, 2020 at 5:41 PM krishan goyal 
> > wrote:
> >
> >> Hi,
> >>
> >> How do I enable replication of the model and feature store ?
> >>
> >> Thanks
> >> Krishan
> >>
> >
>
>
>


Re: Replication of Solr Model and feature store

2020-07-24 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Krishan,

Could you share what version of Solr you are using?

And I wonder if the observed behaviour could be reproduced, e.g. with the
techproducts example; changes not applying after a reload [1] sounds like a bug
if so.

Hope that helps.

Regards,

Christine

[1] 
https://lucene.apache.org/solr/guide/8_6/learning-to-rank.html#applying-changes

From: solr-user@lucene.apache.org At: 07/22/20 14:00:59To:  
solr-user@lucene.apache.org
Subject: Re: Replication of Solr Model and feature store

Adding more details here

I need some help on how to enable the solr LTR model and features on all
nodes of a solr cluster.

I am unable to replicate the model and the feature store though from any
master to its slaves with the replication API ? And unable to find any
documentation for the same. Is replication possible?

Without replication, would I have to individually update all nodes of a
cluster ? Or can the feature and model files be read as a resource (like
config or schema) so that I can replicate the file or add the file to my
deployments.


On Wed, Jul 22, 2020 at 5:53 PM krishan goyal  wrote:

> Bump. Any one has an idea how to proceed here ?
>
> On Wed, Jul 8, 2020 at 5:41 PM krishan goyal 
> wrote:
>
>> Hi,
>>
>> How do I enable replication of the model and feature store ?
>>
>> Thanks
>> Krishan
>>
>




Re: Replication of Solr Model and feature store

2020-07-22 Thread krishan goyal
Adding more details here

I need some help on how to enable the Solr LTR model and features on all
nodes of a Solr cluster.

I am unable to replicate the model and the feature store from a master to
its slaves with the replication API, and I am unable to find any
documentation for this. Is replication possible?

Without replication, would I have to update all nodes of a cluster
individually? Or can the feature and model files be read as a resource (like
the config or schema) so that I can replicate the files or add them to my
deployments?


On Wed, Jul 22, 2020 at 5:53 PM krishan goyal  wrote:

> Bump. Any one has an idea how to proceed here ?
>
> On Wed, Jul 8, 2020 at 5:41 PM krishan goyal 
> wrote:
>
>> Hi,
>>
>> How do I enable replication of the model and feature store ?
>>
>> Thanks
>> Krishan
>>
>


Re: Replication of Solr Model and feature store

2020-07-22 Thread krishan goyal
Bump. Does anyone have an idea of how to proceed here?

On Wed, Jul 8, 2020 at 5:41 PM krishan goyal  wrote:

> Hi,
>
> How do I enable replication of the model and feature store ?
>
> Thanks
> Krishan
>


Re: Replication Iteration

2019-09-13 Thread Paras Lehana
Hi Akreeti,

How much should I set "commitReserveDuration" for 2.62 GB ?


That's why I asked you about the time taken by the replication. You can
easily get a hint about it after manually starting a replication.
commitReserveDuration should be set to roughly the time taken to download
5MB from master to slave. Do read the resource I shared.
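
As a back-of-the-envelope example (link speeds assumed, not measured): 5MB is
about 40 Mbit, so on a ~10 Mbit/s link that is roughly 4 seconds and the
10-second default is fine, while on a ~1 Mbit/s link it is closer to 40 seconds
and the reserve should be raised. A sketch of where the setting lives on the
master in solrconfig.xml, with an illustrative value:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <!-- raise this if downloading 5MB to the slave takes longer than 10s -->
      <str name="commitReserveDuration">00:00:45</str>
    </lst>
  </requestHandler>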


On Fri, 13 Sep 2019 at 14:24, Akreeti Agarwal  wrote:

> Hi,
>
> I have no idea about how much time is taken for successful replication for
> 2.62 GB. How much should I set "commitReserveDuration" for 2.62 GB ?
>
> Thanks & Regards,
> Akreeti Agarwal
>
> -Original Message-
> From: Paras Lehana 
> Sent: Thursday, September 12, 2019 6:46 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Replication Iteration
>
> Hey Akreeti,
>
> 00:00:10
>
>
> Have you tried increasing *commitReserveDuration*? Do you have any idea
> how much time your successful replications take for 2.62 GB?
>
>
>
> On Wed, 11 Sep 2019 at 22:30, Akreeti Agarwal  wrote:
>
> > Hi,
> >
> > It fails many times, sharing the iteration:
> >
> > Passed:
> > Wed Sep 11 16:49:18 UTC 2019
> > Wed Sep 11 16:48:56 UTC 2019
> > Wed Sep 11 16:48:36 UTC 2019
> > Wed Sep 11 16:48:18 UTC 2019
> > Wed Sep 11 16:47:55 UTC 2019
> > Wed Sep 11 16:47:35 UTC 2019
> > Wed Sep 11 16:47:16 UTC 2019
> > Wed Sep 11 16:46:55 UTC 2019
> > Wed Sep 11 16:46:33 UTC 2019
> >
> > Failed:
> > Wed Sep 11 16:42:47 UTC 2019
> > Wed Sep 11 16:36:07 UTC 2019
> > Wed Sep 11 16:32:47 UTC 2019
> > Wed Sep 11 16:29:27 UTC 2019
> > Wed Sep 11 16:24:27 UTC 2019
> > Wed Sep 11 16:11:07 UTC 2019
> > Wed Sep 11 16:09:47 UTC 2019
> > Wed Sep 11 16:05:47 UTC 2019
> > Wed Sep 11 15:53:27 UTC 2019
> > Wed Sep 11 15:51:47 UTC 2019
> >
> > Memory details:
> >
> >  total   used   free sharedbuffers cached
> > Mem: 15947  15549398 62198   5650
> > -/+ buffers/cache:   9700   6246
> > Swap:0  0  0
> >
> >
> > Thanks & Regards,
> > Akreeti Agarwal
> >
> >
> > -Original Message-
> > From: Jon Kjær Amundsen 
> > Sent: Wednesday, September 11, 2019 7:28 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Replication Iteration
> >
> > Is it every time it fails, or just sometimes?
> > What is the timestamps on the failed and passed iterations?
> > And how much disk space do you have available on the slave?
> >
> > Venlig hilsen/Best regards
> >
> > *Jon Kjær Amundsen*
> > Developer
> >
> >
> > Phone: +45 7023 9080
> > E-mail: j...@udbudsvagten.dk
> > Web:
> > https://apc01.safelinks.protection.outlook.com/?url=www.udbudsvagten.d
> > kdata=02%7C01%7CAkreetiA%40hcl.com%7Cf7230fabcabe4144303c08d73783
> > 6ad1%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038909894888022
> > mp;sdata=jABLGHXUJv6DyJ6vqiHRHgr0PbiP8Si2gMDD5fMxDls%3Dreserved=0
> > Parken - Tårn D - 5. Sal
> > Øster Allé 48 | DK - 2100 København
> >
> > <
> > https://apc01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fdk.li
> > nkedin.com%2Fin%2FJonKjaerAmundsen%2Fdata=02%7C01%7CAkreetiA%40hc
> > l.com%7Cf7230fabcabe4144303c08d737836ad1%7C189de737c93a4f5a8b686f4ca99
> > 41912%7C0%7C0%7C637038909894888022sdata=UTMNsDYtN2OiLfkItVP6JwE3V
> > WSPhhWHvCGBIZfLifU%3Dreserved=0
> > >
> >
> > Intelligent Offentlig Samhandel
> > *Før, under og efter udbud*
> >
> > *Følg UdbudsVagten og markedet her Linkedin <
> > https://apc01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.l
> > inkedin.com%2Fgroups%3FgroupDashboard%3D%26gid%3D1862353data=02%7
> > C01%7CAkreetiA%40hcl.com%7Cf7230fabcabe4144303c08d737836ad1%7C189de737
> > c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038909894888022sdata=tI2RNk
> > nFQjpFZZHaIM7gedcVdwEs6uJGgQH7FEHX5ag%3Dreserved=0>
> > *
> >
> >
> > Den ons. 11. sep. 2019 kl. 15.23 skrev Akreeti Agarwal  >:
> >
> > > My index size is 2.62 GB, and :
> > > 00:00:10
> > >
> > > Thanks & Regards,
> > > Akreeti Agarwal
> > >
> > >
> > > -Original Message-
> > > From: Paras Lehana 
> > > Sent: Wednesday, September 11, 2019 5:39 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Replication Iteration
> > >
> > > What is the size of your index? Is it too big? How fast is your link
>

RE: Replication Iteration

2019-09-13 Thread Akreeti Agarwal
Hi,

I have no idea how much time a successful replication takes for 2.62 GB. What
should I set "commitReserveDuration" to for 2.62 GB?

Thanks & Regards,
Akreeti Agarwal

-Original Message-
From: Paras Lehana  
Sent: Thursday, September 12, 2019 6:46 PM
To: solr-user@lucene.apache.org
Subject: Re: Replication Iteration

Hey Akreeti,

00:00:10


Have you tried increasing *commitReserveDuration*? Do you have any idea how 
much time your successful replications take for 2.62 GB?



On Wed, 11 Sep 2019 at 22:30, Akreeti Agarwal  wrote:

> Hi,
>
> It fails many times, sharing the iteration:
>
> Passed:
> Wed Sep 11 16:49:18 UTC 2019
> Wed Sep 11 16:48:56 UTC 2019
> Wed Sep 11 16:48:36 UTC 2019
> Wed Sep 11 16:48:18 UTC 2019
> Wed Sep 11 16:47:55 UTC 2019
> Wed Sep 11 16:47:35 UTC 2019
> Wed Sep 11 16:47:16 UTC 2019
> Wed Sep 11 16:46:55 UTC 2019
> Wed Sep 11 16:46:33 UTC 2019
>
> Failed:
> Wed Sep 11 16:42:47 UTC 2019
> Wed Sep 11 16:36:07 UTC 2019
> Wed Sep 11 16:32:47 UTC 2019
> Wed Sep 11 16:29:27 UTC 2019
> Wed Sep 11 16:24:27 UTC 2019
> Wed Sep 11 16:11:07 UTC 2019
> Wed Sep 11 16:09:47 UTC 2019
> Wed Sep 11 16:05:47 UTC 2019
> Wed Sep 11 15:53:27 UTC 2019
> Wed Sep 11 15:51:47 UTC 2019
>
> Memory details:
>
>  total   used   free sharedbuffers cached
> Mem: 15947  15549398 62198   5650
> -/+ buffers/cache:   9700   6246
> Swap:0  0  0
>
>
> Thanks & Regards,
> Akreeti Agarwal
>
>
> -Original Message-
> From: Jon Kjær Amundsen 
> Sent: Wednesday, September 11, 2019 7:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Replication Iteration
>
> Is it every time it fails, or just sometimes?
> What is the timestamps on the failed and passed iterations?
> And how much disk space do you have available on the slave?
>
> Venlig hilsen/Best regards
>
> *Jon Kjær Amundsen*
> Developer
>
>
> Phone: +45 7023 9080
> E-mail: j...@udbudsvagten.dk
> Web:
> https://apc01.safelinks.protection.outlook.com/?url=www.udbudsvagten.d
> kdata=02%7C01%7CAkreetiA%40hcl.com%7Cf7230fabcabe4144303c08d73783
> 6ad1%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038909894888022
> mp;sdata=jABLGHXUJv6DyJ6vqiHRHgr0PbiP8Si2gMDD5fMxDls%3Dreserved=0
> Parken - Tårn D - 5. Sal
> Øster Allé 48 | DK - 2100 København
>
> <
> https://apc01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fdk.li
> nkedin.com%2Fin%2FJonKjaerAmundsen%2Fdata=02%7C01%7CAkreetiA%40hc
> l.com%7Cf7230fabcabe4144303c08d737836ad1%7C189de737c93a4f5a8b686f4ca99
> 41912%7C0%7C0%7C637038909894888022sdata=UTMNsDYtN2OiLfkItVP6JwE3V
> WSPhhWHvCGBIZfLifU%3Dreserved=0
> >
>
> Intelligent Offentlig Samhandel
> *Før, under og efter udbud*
>
> *Følg UdbudsVagten og markedet her Linkedin < 
> https://apc01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.l
> inkedin.com%2Fgroups%3FgroupDashboard%3D%26gid%3D1862353data=02%7
> C01%7CAkreetiA%40hcl.com%7Cf7230fabcabe4144303c08d737836ad1%7C189de737
> c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038909894888022sdata=tI2RNk
> nFQjpFZZHaIM7gedcVdwEs6uJGgQH7FEHX5ag%3Dreserved=0>
> *
>
>
> Den ons. 11. sep. 2019 kl. 15.23 skrev Akreeti Agarwal :
>
> > My index size is 2.62 GB, and :
> > 00:00:10
> >
> > Thanks & Regards,
> > Akreeti Agarwal
> >
> >
> > -Original Message-
> > From: Paras Lehana 
> > Sent: Wednesday, September 11, 2019 5:39 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Replication Iteration
> >
> > What is the size of your index? Is it too big? How fast is your link 
> > between master and slave?
> >
> > I'm asking these because, for larger indexes, you may want to raise 
> > commitReserveDuration defined in ReplicationHandler in solrconfig.xml.
> >
> > 00:00:10
> >
> >
> > From SolrReplication
> > <
> > https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcw
> > ik 
> > i.apache.org%2Fconfluence%2Fdisplay%2Fsolr%2FSolrReplicationdat
> > a= 
> > 02%7C01%7CAkreetiA%40hcl.com%7Cd0ccab4b71554c325d0e08d736c019bb%7C18
> > 9d 
> > e737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038071020061996sdata=
> > 4p
> > rY6X8%2FXVTbFj9OAlgQ1t6Vq7GFytRPHNzPsQkFktc%3Dreserved=0
> > >
> > documentation for Master
> > <
> > https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcw
> > ik 
> > i.apache.org%2Fconfluence%2Fdisplay%2Fsolr%2FSolrReplication%23Maste
> > r&
> > amp;data=02%7C01%7CAkreetiA%40hcl.com%7Cd0ccab4b71554c325d0e

Re: Replication Iteration

2019-09-12 Thread Paras Lehana
Hey Akreeti,

<str name="commitReserveDuration">00:00:10</str>


Have you tried increasing *commitReserveDuration*? Do you have any idea how
much time your successful replications take for 2.62 GB?



On Wed, 11 Sep 2019 at 22:30, Akreeti Agarwal  wrote:

> Hi,
>
> It fails many times, sharing the iteration:
>
> Passed:
> Wed Sep 11 16:49:18 UTC 2019
> Wed Sep 11 16:48:56 UTC 2019
> Wed Sep 11 16:48:36 UTC 2019
> Wed Sep 11 16:48:18 UTC 2019
> Wed Sep 11 16:47:55 UTC 2019
> Wed Sep 11 16:47:35 UTC 2019
> Wed Sep 11 16:47:16 UTC 2019
> Wed Sep 11 16:46:55 UTC 2019
> Wed Sep 11 16:46:33 UTC 2019
>
> Failed:
> Wed Sep 11 16:42:47 UTC 2019
> Wed Sep 11 16:36:07 UTC 2019
> Wed Sep 11 16:32:47 UTC 2019
> Wed Sep 11 16:29:27 UTC 2019
> Wed Sep 11 16:24:27 UTC 2019
> Wed Sep 11 16:11:07 UTC 2019
> Wed Sep 11 16:09:47 UTC 2019
> Wed Sep 11 16:05:47 UTC 2019
> Wed Sep 11 15:53:27 UTC 2019
> Wed Sep 11 15:51:47 UTC 2019
>
> Memory details:
>
>  total   used   free sharedbuffers cached
> Mem: 15947  15549398 62198   5650
> -/+ buffers/cache:   9700   6246
> Swap:0  0  0
>
>
> Thanks & Regards,
> Akreeti Agarwal
>
>
> -Original Message-----
> From: Jon Kjær Amundsen 
> Sent: Wednesday, September 11, 2019 7:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Replication Iteration
>
> Is it every time it fails, or just sometimes?
> What is the timestamps on the failed and passed iterations?
> And how much disk space do you have available on the slave?
>
> Venlig hilsen/Best regards
>
> *Jon Kjær Amundsen*
> Developer
>
>
> Phone: +45 7023 9080
> E-mail: j...@udbudsvagten.dk
> Web:
> https://apc01.safelinks.protection.outlook.com/?url=www.udbudsvagten.dkdata=02%7C01%7CAkreetiA%40hcl.com%7Cd0ccab4b71554c325d0e08d736c019bb%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038071020061996sdata=DSWnCimBzOilM8DoS4efsbgE%2BP%2BG2RDP0IUjolch7Z4%3Dreserved=0
> Parken - Tårn D - 5. Sal
> Øster Allé 48 | DK - 2100 København
>
> <
> https://apc01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fdk.linkedin.com%2Fin%2FJonKjaerAmundsen%2Fdata=02%7C01%7CAkreetiA%40hcl.com%7Cd0ccab4b71554c325d0e08d736c019bb%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038071020061996sdata=84xwmZ%2BEj6u9sXiSmM7QMKHpkDMpYMpltFNhKT6v7F0%3Dreserved=0
> >
>
> Intelligent Offentlig Samhandel
> *Før, under og efter udbud*
>
> *Følg UdbudsVagten og markedet her Linkedin <
> https://apc01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.linkedin.com%2Fgroups%3FgroupDashboard%3D%26gid%3D1862353data=02%7C01%7CAkreetiA%40hcl.com%7Cd0ccab4b71554c325d0e08d736c019bb%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038071020061996sdata=pFZ%2FUDSmCzzuUJur74FMF9%2FDXSCKkJHCs8x4Wj3Jyfo%3Dreserved=0>
> *
>
>
> Den ons. 11. sep. 2019 kl. 15.23 skrev Akreeti Agarwal :
>
> > My index size is 2.62 GB, and :
> > 00:00:10
> >
> > Thanks & Regards,
> > Akreeti Agarwal
> >
> >
> > -Original Message-
> > From: Paras Lehana 
> > Sent: Wednesday, September 11, 2019 5:39 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Replication Iteration
> >
> > What is the size of your index? Is it too big? How fast is your link
> > between master and slave?
> >
> > I'm asking these because, for larger indexes, you may want to raise
> > commitReserveDuration defined in ReplicationHandler in solrconfig.xml.
> >
> > 00:00:10
> >
> >
> > From SolrReplication
> > <
> > https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwik
> > i.apache.org%2Fconfluence%2Fdisplay%2Fsolr%2FSolrReplicationdata=
> > 02%7C01%7CAkreetiA%40hcl.com%7Cd0ccab4b71554c325d0e08d736c019bb%7C189d
> > e737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038071020061996sdata=4p
> > rY6X8%2FXVTbFj9OAlgQ1t6Vq7GFytRPHNzPsQkFktc%3Dreserved=0
> > >
> > documentation for Master
> > <
> > https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwik
> > i.apache.org%2Fconfluence%2Fdisplay%2Fsolr%2FSolrReplication%23Master&
> > amp;data=02%7C01%7CAkreetiA%40hcl.com%7Cd0ccab4b71554c325d0e08d736c019
> > bb%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038071020061996
> > ;sdata=vPlsIvqTrwzq0wSkBkMiJc3C7IOaXpQWdwG06xRlG9k%3Dreserved=0
> > >:
> >
> > If your commits are very frequent and network is particularly slow,
> > you can
> > > tweak an extra attribute  > name="commitReserveDuration">00:00:10.
> > > This is roughly the time taken to download 5MB from master to slave.
> > &g

RE: Replication Iteration

2019-09-11 Thread Akreeti Agarwal
Hi,

It fails many times, sharing the iteration:

Passed:
Wed Sep 11 16:49:18 UTC 2019
Wed Sep 11 16:48:56 UTC 2019
Wed Sep 11 16:48:36 UTC 2019
Wed Sep 11 16:48:18 UTC 2019
Wed Sep 11 16:47:55 UTC 2019
Wed Sep 11 16:47:35 UTC 2019
Wed Sep 11 16:47:16 UTC 2019
Wed Sep 11 16:46:55 UTC 2019
Wed Sep 11 16:46:33 UTC 2019

Failed:
Wed Sep 11 16:42:47 UTC 2019
Wed Sep 11 16:36:07 UTC 2019
Wed Sep 11 16:32:47 UTC 2019
Wed Sep 11 16:29:27 UTC 2019
Wed Sep 11 16:24:27 UTC 2019
Wed Sep 11 16:11:07 UTC 2019
Wed Sep 11 16:09:47 UTC 2019
Wed Sep 11 16:05:47 UTC 2019
Wed Sep 11 15:53:27 UTC 2019
Wed Sep 11 15:51:47 UTC 2019

Memory details:

 total   used   free sharedbuffers cached
Mem: 15947  15549398 62198   5650
-/+ buffers/cache:   9700   6246
Swap:0  0  0


Thanks & Regards,
Akreeti Agarwal


-Original Message-
From: Jon Kjær Amundsen  
Sent: Wednesday, September 11, 2019 7:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Replication Iteration

Is it every time it fails, or just sometimes?
What is the timestamps on the failed and passed iterations?
And how much disk space do you have available on the slave?

Venlig hilsen/Best regards

*Jon Kjær Amundsen*
Developer


Phone: +45 7023 9080
E-mail: j...@udbudsvagten.dk
Web: 
https://apc01.safelinks.protection.outlook.com/?url=www.udbudsvagten.dkdata=02%7C01%7CAkreetiA%40hcl.com%7Cd0ccab4b71554c325d0e08d736c019bb%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038071020061996sdata=DSWnCimBzOilM8DoS4efsbgE%2BP%2BG2RDP0IUjolch7Z4%3Dreserved=0
Parken - Tårn D - 5. Sal
Øster Allé 48 | DK - 2100 København

<https://apc01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fdk.linkedin.com%2Fin%2FJonKjaerAmundsen%2Fdata=02%7C01%7CAkreetiA%40hcl.com%7Cd0ccab4b71554c325d0e08d736c019bb%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038071020061996sdata=84xwmZ%2BEj6u9sXiSmM7QMKHpkDMpYMpltFNhKT6v7F0%3Dreserved=0>

Intelligent Offentlig Samhandel
*Før, under og efter udbud*

*Følg UdbudsVagten og markedet her Linkedin 
<https://apc01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.linkedin.com%2Fgroups%3FgroupDashboard%3D%26gid%3D1862353data=02%7C01%7CAkreetiA%40hcl.com%7Cd0ccab4b71554c325d0e08d736c019bb%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038071020061996sdata=pFZ%2FUDSmCzzuUJur74FMF9%2FDXSCKkJHCs8x4Wj3Jyfo%3Dreserved=0>
 *


Den ons. 11. sep. 2019 kl. 15.23 skrev Akreeti Agarwal :

> My index size is 2.62 GB, and :
> 00:00:10
>
> Thanks & Regards,
> Akreeti Agarwal
>
>
> -Original Message-
> From: Paras Lehana 
> Sent: Wednesday, September 11, 2019 5:39 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Replication Iteration
>
> What is the size of your index? Is it too big? How fast is your link 
> between master and slave?
>
> I'm asking these because, for larger indexes, you may want to raise 
> commitReserveDuration defined in ReplicationHandler in solrconfig.xml.
>
> 00:00:10
>
>
> From SolrReplication
> <
> https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwik
> i.apache.org%2Fconfluence%2Fdisplay%2Fsolr%2FSolrReplicationdata=
> 02%7C01%7CAkreetiA%40hcl.com%7Cd0ccab4b71554c325d0e08d736c019bb%7C189d
> e737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038071020061996sdata=4p
> rY6X8%2FXVTbFj9OAlgQ1t6Vq7GFytRPHNzPsQkFktc%3Dreserved=0
> >
> documentation for Master
> <
> https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwik
> i.apache.org%2Fconfluence%2Fdisplay%2Fsolr%2FSolrReplication%23Master&
> amp;data=02%7C01%7CAkreetiA%40hcl.com%7Cd0ccab4b71554c325d0e08d736c019
> bb%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038071020061996
> ;sdata=vPlsIvqTrwzq0wSkBkMiJc3C7IOaXpQWdwG06xRlG9k%3Dreserved=0
> >:
>
> If your commits are very frequent and network is particularly slow, 
> you can
> > tweak an extra attribute  name="commitReserveDuration">00:00:10.
> > This is roughly the time taken to download 5MB from master to slave.
> > Default is 10 secs.
>
>
>  Do check the hyperlinks.
>
> On Wed, 11 Sep 2019 at 17:25, Akreeti Agarwal  wrote:
>
> > I am seeing the logs on both UI and file, but I only see this error:
> >
> > ReplicationHandler
> > Index fetch failed :org.apache.solr.common.SolrException: Unable to 
> > download segments_znow completely. Downloaded 0!=2217
> >
> > Thanks & Regards,
> > Akreeti Agarwal
> >
> > -Original Message-
> > From: Paras Lehana 
> > Sent: Wednesday, September 11, 2019 5:17 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Replication Iteration
> >
> > Hi Akreeti,
> >
> > Have you tried using the old UI to see errors? I had always 
> > 

Re: Replication Iteration

2019-09-11 Thread Jon Kjær Amundsen
Is it every time it fails, or just sometimes?
What are the timestamps on the failed and passed iterations?
And how much disk space do you have available on the slave?

Venlig hilsen/Best regards

*Jon Kjær Amundsen*
Developer


Phone: +45 7023 9080
E-mail: j...@udbudsvagten.dk
Web: www.udbudsvagten.dk
Parken - Tårn D - 5. Sal
Øster Allé 48 | DK - 2100 København

<http://dk.linkedin.com/in/JonKjaerAmundsen/>

Intelligent Offentlig Samhandel
*Before, during and after tenders*

*Follow UdbudsVagten and the market on LinkedIn
<http://www.linkedin.com/groups?groupDashboard=&gid=1862353> *


On Wed, 11 Sep 2019 at 15:23, Akreeti Agarwal wrote:

> My index size is 2.62 GB, and :
> 00:00:10
>
> Thanks & Regards,
> Akreeti Agarwal
>
>
> -Original Message-
> From: Paras Lehana 
> Sent: Wednesday, September 11, 2019 5:39 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Replication Iteration
>
> What is the size of your index? Is it too big? How fast is your link
> between master and slave?
>
> I'm asking these because, for larger indexes, you may want to raise
> commitReserveDuration defined in ReplicationHandler in solrconfig.xml.
>
> 00:00:10
>
>
> From SolrReplication
> <
> https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2Fsolr%2FSolrReplicationdata=02%7C01%7CAkreetiA%40hcl.com%7C23769a9cc4094154170008d736b0f190%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038005907686643sdata=NEwvmMCXOFB7v56IuNJ8cJglInSJZjLTSUQTT6d07f0%3Dreserved=0
> >
> documentation for Master
> <
> https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2Fsolr%2FSolrReplication%23Masterdata=02%7C01%7CAkreetiA%40hcl.com%7C23769a9cc4094154170008d736b0f190%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038005907686643sdata=5gX%2FbeaPUQWxMkJ8bD4Jt%2BB7feB509VHD25d66iLCnU%3Dreserved=0
> >:
>
> If your commits are very frequent and network is particularly slow, you can
> > tweak an extra attribute  name="commitReserveDuration">00:00:10.
> > This is roughly the time taken to download 5MB from master to slave.
> > Default is 10 secs.
>
>
>  Do check the hyperlinks.
>
> On Wed, 11 Sep 2019 at 17:25, Akreeti Agarwal  wrote:
>
> > I am seeing the logs on both UI and file, but I only see this error:
> >
> > ReplicationHandler
> > Index fetch failed :org.apache.solr.common.SolrException: Unable to
> > download segments_znow completely. Downloaded 0!=2217
> >
> > Thanks & Regards,
> > Akreeti Agarwal
> >
> > -Original Message-
> > From: Paras Lehana 
> > Sent: Wednesday, September 11, 2019 5:17 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Replication Iteration
> >
> > Hi Akreeti,
> >
> > Have you tried using the old UI to see errors? I had always
> > experienced not seeing status updates about replication in the newer
> > UI. Check for the option on top right of Solr UI.
> >
> > And where are you seeing logs - on solr UI or from a file?
> >
> > On Wed, 11 Sep 2019 at 16:12, Akreeti Agarwal  wrote:
> >
> > > In the logs I don't see any errors, mostly after every  1-2 min
> > > replication fails and I am not able to identify the root cause for it.
> > >
> > > Thanks & Regards,
> > > Akreeti Agarwal
> > >
> > > -Original Message-
> > > From: Jon Kjær Amundsen 
> > > Sent: Wednesday, September 11, 2019 12:15 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Replication Iteration
> > >
> > > It depends on the timestamps.
> > > The red iterations are failed replications and the green are passed
> > > replications.
> > > If the newest timestamp is green the latest replication went well,
> > > if it is red, it failed.
> > >
> > > You should check the solr log on the slave if a recent replication
> > > have failed to see the cause.
> > >
> > > Venlig hilsen/Best regards
> > >
> > > *Jon Kjær Amundsen*
> > > Developer
> > >
> > >
> > > Phone: +45 7023 9080
> > > E-mail: j...@udbudsvagten.dk
> > > Web:
> > > https://apc01.safelinks.protection.outlook.com/?url=www.udbudsvagten
> > > .d
> > > kdata=02%7C01%7CAkreetiA%40hcl.com%7Cdac98a88d43446d88f8208d736
> > > ad
> > > ebfa%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C1%7C637037992931651416
> > > 
> > > mp;sdata=HBrEXexrQZ2UrhmDwGaZfnvn4XjbawVGq8PnMDA5ocA%3Dreserved
> > > =0
> > > Parken - Tårn D - 5

RE: Replication Iteration

2019-09-11 Thread Akreeti Agarwal
My index size is 2.62 GB, and:
<str name="commitReserveDuration">00:00:10</str>

Thanks & Regards,
Akreeti Agarwal


-Original Message-
From: Paras Lehana  
Sent: Wednesday, September 11, 2019 5:39 PM
To: solr-user@lucene.apache.org
Subject: Re: Replication Iteration

What is the size of your index? Is it too big? How fast is your link between 
master and slave?

I'm asking these because, for larger indexes, you may want to raise 
commitReserveDuration defined in ReplicationHandler in solrconfig.xml.

00:00:10


From SolrReplication
<https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2Fsolr%2FSolrReplicationdata=02%7C01%7CAkreetiA%40hcl.com%7C23769a9cc4094154170008d736b0f190%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038005907686643sdata=NEwvmMCXOFB7v56IuNJ8cJglInSJZjLTSUQTT6d07f0%3Dreserved=0>
documentation for Master
<https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2Fsolr%2FSolrReplication%23Masterdata=02%7C01%7CAkreetiA%40hcl.com%7C23769a9cc4094154170008d736b0f190%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C0%7C637038005907686643sdata=5gX%2FbeaPUQWxMkJ8bD4Jt%2BB7feB509VHD25d66iLCnU%3Dreserved=0>:

If your commits are very frequent and network is particularly slow, you can
> tweak an extra attribute 00:00:10.
> This is roughly the time taken to download 5MB from master to slave.
> Default is 10 secs.


 Do check the hyperlinks.

On Wed, 11 Sep 2019 at 17:25, Akreeti Agarwal  wrote:

> I am seeing the logs on both UI and file, but I only see this error:
>
> ReplicationHandler
> Index fetch failed :org.apache.solr.common.SolrException: Unable to 
> download segments_znow completely. Downloaded 0!=2217
>
> Thanks & Regards,
> Akreeti Agarwal
>
> -Original Message-
> From: Paras Lehana 
> Sent: Wednesday, September 11, 2019 5:17 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Replication Iteration
>
> Hi Akreeti,
>
> Have you tried using the old UI to see errors? I had always 
> experienced not seeing status updates about replication in the newer 
> UI. Check for the option on top right of Solr UI.
>
> And where are you seeing logs - on solr UI or from a file?
>
> On Wed, 11 Sep 2019 at 16:12, Akreeti Agarwal  wrote:
>
> > In the logs I don't see any errors, mostly after every  1-2 min 
> > replication fails and I am not able to identify the root cause for it.
> >
> > Thanks & Regards,
> > Akreeti Agarwal
> >
> > -Original Message-
> > From: Jon Kjær Amundsen 
> > Sent: Wednesday, September 11, 2019 12:15 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Replication Iteration
> >
> > It depends on the timestamps.
> > The red iterations are failed replications and the green are passed 
> > replications.
> > If the newest timestamp is green the latest replication went well, 
> > if it is red, it failed.
> >
> > You should check the solr log on the slave if a recent replication 
> > have failed to see the cause.
> >
> > Venlig hilsen/Best regards
> >
> > *Jon Kjær Amundsen*
> > Developer
> >
> >
> > Phone: +45 7023 9080
> > E-mail: j...@udbudsvagten.dk
> > Web:
> > https://apc01.safelinks.protection.outlook.com/?url=www.udbudsvagten
> > .d 
> > kdata=02%7C01%7CAkreetiA%40hcl.com%7Cdac98a88d43446d88f8208d736
> > ad 
> > ebfa%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C1%7C637037992931651416
> > 
> > mp;sdata=HBrEXexrQZ2UrhmDwGaZfnvn4XjbawVGq8PnMDA5ocA%3Dreserved
> > =0
> > Parken - Tårn D - 5. Sal
> > Øster Allé 48 | DK - 2100 København
> >
> > <
> > https://apc01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fdk.
> > li 
> > nkedin.com%2Fin%2FJonKjaerAmundsen%2Fdata=02%7C01%7CAkreetiA%40
> > hc
> > l.com%7Cdac98a88d43446d88f8208d736adebfa%7C189de737c93a4f5a8b686f4ca
> > 99 
> > 41912%7C0%7C1%7C637037992931651416sdata=9sealYl8sRN7Et2Oc%2F8Nx
> > nx
> > ooisjC15hPV5y9NSCgKg%3Dreserved=0
> > >
> >
> > Intelligent Offentlig Samhandel
> > *Før, under og efter udbud*
> >
> > *Følg UdbudsVagten og markedet her Linkedin < 
> > https://apc01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww
> > .l
> > inkedin.com%2Fgroups%3FgroupDashboard%3D%26gid%3D1862353data=02
> > %7
> > C01%7CAkreetiA%40hcl.com%7Cdac98a88d43446d88f8208d736adebfa%7C189de7
> > 37
> > c93a4f5a8b686f4ca9941912%7C0%7C1%7C637037992931651416sdata=zQVT
> > %2 FJjMY6GlU4IVHZEMc6P4EWh9MrMcWEybBonKn7w%3Dreserved=0>
> > *
> >
> >
> > Den ons. 11. sep. 2019 kl. 06.35 skrev Akreeti Agarwal 
> > >:
> >
> 

Re: Replication Iteration

2019-09-11 Thread Paras Lehana
What is the size of your index? Is it too big? How fast is your link
between master and slave?

I'm asking these because, for larger indexes, you may want to raise
commitReserveDuration defined in ReplicationHandler in solrconfig.xml.

<str name="commitReserveDuration">00:00:10</str>


From SolrReplication
<https://cwiki.apache.org/confluence/display/solr/SolrReplication>
documentation for Master
<https://cwiki.apache.org/confluence/display/solr/SolrReplication#Master>:

If your commits are very frequent and network is particularly slow, you can
> tweak an extra attribute <str name="commitReserveDuration">00:00:10</str>.
> This is roughly the time taken to download 5MB from master to slave.
> Default is 10 secs.


 Do check the hyperlinks.

On Wed, 11 Sep 2019 at 17:25, Akreeti Agarwal  wrote:

> I am seeing the logs on both UI and file, but I only see this error:
>
> ReplicationHandler
> Index fetch failed :org.apache.solr.common.SolrException: Unable to
> download segments_znow completely. Downloaded 0!=2217
>
> Thanks & Regards,
> Akreeti Agarwal
>
> -Original Message-
> From: Paras Lehana 
> Sent: Wednesday, September 11, 2019 5:17 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Replication Iteration
>
> Hi Akreeti,
>
> Have you tried using the old UI to see errors? I had always experienced
> not seeing status updates about replication in the newer UI. Check for the
> option on top right of Solr UI.
>
> And where are you seeing logs - on solr UI or from a file?
>
> On Wed, 11 Sep 2019 at 16:12, Akreeti Agarwal  wrote:
>
> > In the logs I don't see any errors, mostly after every  1-2 min
> > replication fails and I am not able to identify the root cause for it.
> >
> > Thanks & Regards,
> > Akreeti Agarwal
> >
> > -Original Message-
> > From: Jon Kjær Amundsen 
> > Sent: Wednesday, September 11, 2019 12:15 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Replication Iteration
> >
> > It depends on the timestamps.
> > The red iterations are failed replications and the green are passed
> > replications.
> > If the newest timestamp is green the latest replication went well, if
> > it is red, it failed.
> >
> > You should check the solr log on the slave if a recent replication
> > have failed to see the cause.
> >
> > Venlig hilsen/Best regards
> >
> > *Jon Kjær Amundsen*
> > Developer
> >
> >
> > Phone: +45 7023 9080
> > E-mail: j...@udbudsvagten.dk
> > Web:
> > https://apc01.safelinks.protection.outlook.com/?url=www.udbudsvagten.d
> > kdata=02%7C01%7CAkreetiA%40hcl.com%7Cdac98a88d43446d88f8208d736ad
> > ebfa%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C1%7C637037992931651416
> > mp;sdata=HBrEXexrQZ2UrhmDwGaZfnvn4XjbawVGq8PnMDA5ocA%3Dreserved=0
> > Parken - Tårn D - 5. Sal
> > Øster Allé 48 | DK - 2100 København
> >
> > <
> > https://apc01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fdk.li
> > nkedin.com%2Fin%2FJonKjaerAmundsen%2Fdata=02%7C01%7CAkreetiA%40hc
> > l.com%7Cdac98a88d43446d88f8208d736adebfa%7C189de737c93a4f5a8b686f4ca99
> > 41912%7C0%7C1%7C637037992931651416sdata=9sealYl8sRN7Et2Oc%2F8Nxnx
> > ooisjC15hPV5y9NSCgKg%3Dreserved=0
> > >
> >
> > Intelligent Offentlig Samhandel
> > *Før, under og efter udbud*
> >
> > *Følg UdbudsVagten og markedet her Linkedin <
> > https://apc01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.l
> > inkedin.com%2Fgroups%3FgroupDashboard%3D%26gid%3D1862353data=02%7
> > C01%7CAkreetiA%40hcl.com%7Cdac98a88d43446d88f8208d736adebfa%7C189de737
> > c93a4f5a8b686f4ca9941912%7C0%7C1%7C637037992931651416sdata=zQVT%2
> > FJjMY6GlU4IVHZEMc6P4EWh9MrMcWEybBonKn7w%3Dreserved=0>
> > *
> >
> >
> > Den ons. 11. sep. 2019 kl. 06.35 skrev Akreeti Agarwal  >:
> >
> > > Hi All,
> > >
> > > I am using solr-5.5.5, in which I have one master and two slaves. I
> > > see some red and some green replication iteration on my slave side.
> > > What does these red and green iteration means?
> > > Will this cause problem?
> > >
> > > Thanks & Regards,
> > > Akreeti Agarwal
> > >

RE: Replication Iteration

2019-09-11 Thread Akreeti Agarwal
I am seeing the logs on both UI and file, but I only see this error:

ReplicationHandler
Index fetch failed :org.apache.solr.common.SolrException: Unable to download 
segments_znow completely. Downloaded 0!=2217

Thanks & Regards,
Akreeti Agarwal

-Original Message-
From: Paras Lehana  
Sent: Wednesday, September 11, 2019 5:17 PM
To: solr-user@lucene.apache.org
Subject: Re: Replication Iteration

Hi Akreeti,

Have you tried using the old UI to see errors? I had always experienced not 
seeing status updates about replication in the newer UI. Check for the option 
on top right of Solr UI.

And where are you seeing logs - on solr UI or from a file?

On Wed, 11 Sep 2019 at 16:12, Akreeti Agarwal  wrote:

> In the logs I don't see any errors, mostly after every  1-2 min 
> replication fails and I am not able to identify the root cause for it.
>
> Thanks & Regards,
> Akreeti Agarwal
>
> -Original Message-
> From: Jon Kjær Amundsen 
> Sent: Wednesday, September 11, 2019 12:15 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Replication Iteration
>
> It depends on the timestamps.
> The red iterations are failed replications and the green are passed 
> replications.
> If the newest timestamp is green the latest replication went well, if 
> it is red, it failed.
>
> You should check the solr log on the slave if a recent replication 
> have failed to see the cause.
>
> Venlig hilsen/Best regards
>
> *Jon Kjær Amundsen*
> Developer
>
>
> Phone: +45 7023 9080
> E-mail: j...@udbudsvagten.dk
> Web:
> https://apc01.safelinks.protection.outlook.com/?url=www.udbudsvagten.d
> kdata=02%7C01%7CAkreetiA%40hcl.com%7Cdac98a88d43446d88f8208d736ad
> ebfa%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C1%7C637037992931651416
> mp;sdata=HBrEXexrQZ2UrhmDwGaZfnvn4XjbawVGq8PnMDA5ocA%3Dreserved=0
> Parken - Tårn D - 5. Sal
> Øster Allé 48 | DK - 2100 København
>
> <
> https://apc01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fdk.li
> nkedin.com%2Fin%2FJonKjaerAmundsen%2Fdata=02%7C01%7CAkreetiA%40hc
> l.com%7Cdac98a88d43446d88f8208d736adebfa%7C189de737c93a4f5a8b686f4ca99
> 41912%7C0%7C1%7C637037992931651416sdata=9sealYl8sRN7Et2Oc%2F8Nxnx
> ooisjC15hPV5y9NSCgKg%3Dreserved=0
> >
>
> Intelligent Offentlig Samhandel
> *Før, under og efter udbud*
>
> *Følg UdbudsVagten og markedet her Linkedin < 
> https://apc01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.l
> inkedin.com%2Fgroups%3FgroupDashboard%3D%26gid%3D1862353data=02%7
> C01%7CAkreetiA%40hcl.com%7Cdac98a88d43446d88f8208d736adebfa%7C189de737
> c93a4f5a8b686f4ca9941912%7C0%7C1%7C637037992931651416sdata=zQVT%2
> FJjMY6GlU4IVHZEMc6P4EWh9MrMcWEybBonKn7w%3Dreserved=0>
> *
>
>
> Den ons. 11. sep. 2019 kl. 06.35 skrev Akreeti Agarwal :
>
> > Hi All,
> >
> > I am using solr-5.5.5, in which I have one master and two slaves. I 
> > see some red and some green replication iteration on my slave side.
> > What does these red and green iteration means?
> > Will this cause problem?
> >
> > Thanks & Regards,
> > Akreeti Agarwal
> >
>


--
--
Regards,

*Paras Lehana* [65871]
Software Programmer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142, Noida, UP, IN - 
201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*

--
IMPORTANT:
NEVER share your IndiaMART OTP/ Password with anyone.


Re: Replication Iteration

2019-09-11 Thread Paras Lehana
Hi Akreeti,

Have you tried using the old UI to see errors? I have always found that the
newer UI does not show status updates about replication. Check for the
option at the top right of the Solr UI.

And where are you seeing the logs - in the Solr UI or in a file?

On Wed, 11 Sep 2019 at 16:12, Akreeti Agarwal  wrote:

> In the logs I don't see any errors, mostly after every  1-2 min
> replication fails and I am not able to identify the root cause for it.
>
> Thanks & Regards,
> Akreeti Agarwal
>
> -Original Message-
> From: Jon Kjær Amundsen 
> Sent: Wednesday, September 11, 2019 12:15 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Replication Iteration
>
> It depends on the timestamps.
> The red iterations are failed replications and the green are passed
> replications.
> If the newest timestamp is green the latest replication went well, if it
> is red, it failed.
>
> You should check the solr log on the slave if a recent replication have
> failed to see the cause.
>
> Venlig hilsen/Best regards
>
> *Jon Kjær Amundsen*
> Developer
>
>
> Phone: +45 7023 9080
> E-mail: j...@udbudsvagten.dk
> Web: www.udbudsvagten.dk
> Parken - Tårn D - 5. Sal
> Øster Allé 48 | DK - 2100 København
>
> <http://dk.linkedin.com/in/JonKjaerAmundsen/>
>
> Intelligent Offentlig Samhandel
> *Før, under og efter udbud*
>
> *Følg UdbudsVagten og markedet her Linkedin <http://www.linkedin.com/groups?groupDashboard=&gid=1862353>*
>
>
> Den ons. 11. sep. 2019 kl. 06.35 skrev Akreeti Agarwal :
>
> > Hi All,
> >
> > I am using solr-5.5.5, in which I have one master and two slaves. I
> > see some red and some green replication iteration on my slave side.
> > What does these red and green iteration means?
> > Will this cause problem?
> >
> > Thanks & Regards,
> > Akreeti Agarwal
> >
> > ::DISCLAIMER::
> > 
> > The contents of this e-mail and any attachment(s) are confidential and
> > intended for the named recipient(s) only. E-mail transmission is not
> > guaranteed to be secure or error-free as information could be
> > intercepted, corrupted, lost, destroyed, arrive late or incomplete, or
> > may contain viruses in transmission. The e mail and its contents (with
> > or without referred errors) shall therefore not attach any liability
> > on the originator or HCL or its affiliates. Views or opinions, if any,
> > presented in this email are solely those of the author and may not
> > necessarily reflect the views or opinions of HCL or its affiliates.
> > Any form of reproduction, dissemination, copying, disclosure,
> > modification, distribution and / or publication of this message
> > without the prior written consent of authorized representative of HCL
> > is strictly prohibited. If you have received this email in error
> > please delete it and notify the sender immediately. Before opening any
> > email and/or attachments, please check them for viruses and other
> defects.
> > 
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Software Programmer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*

-- 
IMPORTANT: 
NEVER share your IndiaMART OTP/ Password with anyone.
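
For anyone following this thread, the same red/green history can be read
outside the admin UI through the replication handler on the slave. A minimal
sketch; the host, port and core name ("mycore") are placeholders for your own
setup:

    # Ask the slave core for its replication status and recent timestamps
    curl -s "http://slave-host:8983/solr/mycore/replication?command=details"

The "slave" section of the response includes lists of timestamps for recent
successful and failed replications, which correspond to the green and red
iterations shown in the UI.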


RE: Replication Iteration

2019-09-11 Thread Akreeti Agarwal
In the logs I don't see any errors. The replication fails roughly every 1-2
minutes, and I am not able to identify the root cause.

Thanks & Regards,
Akreeti Agarwal

-Original Message-
From: Jon Kjær Amundsen  
Sent: Wednesday, September 11, 2019 12:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Replication Iteration

It depends on the timestamps.
The red iterations are failed replications and the green are passed 
replications.
If the newest timestamp is green the latest replication went well, if it is 
red, it failed.

If a recent replication has failed, check the Solr log on the slave to see the cause.

Venlig hilsen/Best regards

*Jon Kjær Amundsen*
Developer


Phone: +45 7023 9080
E-mail: j...@udbudsvagten.dk
Web: www.udbudsvagten.dk
Parken - Tårn D - 5. Sal
Øster Allé 48 | DK - 2100 København

<http://dk.linkedin.com/in/JonKjaerAmundsen/>

Intelligent Offentlig Samhandel
*Før, under og efter udbud*

*Følg UdbudsVagten og markedet her Linkedin <http://www.linkedin.com/groups?groupDashboard=&gid=1862353>*


Den ons. 11. sep. 2019 kl. 06.35 skrev Akreeti Agarwal :

> Hi All,
>
> I am using solr-5.5.5, in which I have one master and two slaves. I 
> see some red and some green replication iteration on my slave side.
> What does these red and green iteration means?
> Will this cause problem?
>
> Thanks & Regards,
> Akreeti Agarwal
>
> ::DISCLAIMER::
> 
> The contents of this e-mail and any attachment(s) are confidential and 
> intended for the named recipient(s) only. E-mail transmission is not 
> guaranteed to be secure or error-free as information could be 
> intercepted, corrupted, lost, destroyed, arrive late or incomplete, or 
> may contain viruses in transmission. The e mail and its contents (with 
> or without referred errors) shall therefore not attach any liability 
> on the originator or HCL or its affiliates. Views or opinions, if any, 
> presented in this email are solely those of the author and may not 
> necessarily reflect the views or opinions of HCL or its affiliates. 
> Any form of reproduction, dissemination, copying, disclosure, 
> modification, distribution and / or publication of this message 
> without the prior written consent of authorized representative of HCL 
> is strictly prohibited. If you have received this email in error 
> please delete it and notify the sender immediately. Before opening any 
> email and/or attachments, please check them for viruses and other defects.
> 
>


Re: Replication Iteration

2019-09-11 Thread Jon Kjær Amundsen
It depends on the timestamps.
The red iterations are failed replications and the green are passed
replications.
If the newest timestamp is green the latest replication went well, if it is
red, it failed.

If a recent replication has failed, check the Solr log on the slave to see the cause.

Venlig hilsen/Best regards

*Jon Kjær Amundsen*
Developer


Phone: +45 7023 9080
E-mail: j...@udbudsvagten.dk
Web: www.udbudsvagten.dk
Parken - Tårn D - 5. Sal
Øster Allé 48 | DK - 2100 København



Intelligent Offentlig Samhandel
*Før, under og efter udbud*

*Følg UdbudsVagten og markedet her Linkedin
 *


Den ons. 11. sep. 2019 kl. 06.35 skrev Akreeti Agarwal :

> Hi All,
>
> I am using solr-5.5.5, in which I have one master and two slaves. I see
> some red and some green replication iteration on my slave side.
> What does these red and green iteration means?
> Will this cause problem?
>
> Thanks & Regards,
> Akreeti Agarwal
>
> ::DISCLAIMER::
> 
> The contents of this e-mail and any attachment(s) are confidential and
> intended for the named recipient(s) only. E-mail transmission is not
> guaranteed to be secure or error-free as information could be intercepted,
> corrupted, lost, destroyed, arrive late or incomplete, or may contain
> viruses in transmission. The e mail and its contents (with or without
> referred errors) shall therefore not attach any liability on the originator
> or HCL or its affiliates. Views or opinions, if any, presented in this
> email are solely those of the author and may not necessarily reflect the
> views or opinions of HCL or its affiliates. Any form of reproduction,
> dissemination, copying, disclosure, modification, distribution and / or
> publication of this message without the prior written consent of authorized
> representative of HCL is strictly prohibited. If you have received this
> email in error please delete it and notify the sender immediately. Before
> opening any email and/or attachments, please check them for viruses and
> other defects.
> 
>
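
A practical way to follow this advice is to watch the slave's log while a
replication cycle runs. A minimal sketch, assuming a default installation
where the log is written to server/logs/solr.log (adjust the path to your
layout):

    # Watch replication activity and errors reported by the index fetcher
    tail -f /path/to/solr/server/logs/solr.log | grep -iE "IndexFetcher|replication"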


Re: Replication issue with version 0 index in SOLR 7.5

2019-06-26 Thread Patrick Bordelon
One other question related to this.

I know the change was made for a specific problem that was occurring, but has
it caused a problem similar to mine for anyone else?

We're looking at changing the second 'if' statement to add an extra
conditional that prevents it from performing the "deleteAll" operation unless
explicitly requested.

The idea is to use skipCommitOnMasterVersionZero and set it so that the
if statement is never true for a newly created index on the primary.

We're going to try some modifications on our polling strategy as a temporary
solution while we test out changing that section of the index fetcher.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
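
For readers of the archive, the change being described would look roughly
like the sketch below. This is not the actual patch, only an illustration
against the 7.5 snippet quoted later in this thread; keepLocalIndexOnEmptyMaster
is an invented name for the extra conditional Patrick mentions:

    // Sketch of the guard in IndexFetcher.fetchLatestIndex(): only wipe the
    // local index for a version-0 master if the (hypothetical) extra flag says so.
    if (latestVersion == 0L) {
      if (commit.getGeneration() != 0 && !keepLocalIndexOnEmptyMaster) {
        // since we won't get the files for an empty index,
        // we just clear ours and commit
        ...
      }
    }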


Re: Replication issue with version 0 index in SOLR 7.5

2019-06-25 Thread Patrick Bordelon
I removed the "replicateAfter startup" entry from our solrconfig.xml file. However,
that didn't solve the issue. When I rebuilt the primary, the associated
replicas all went to 0 documents.





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Replication issue with version 0 index in SOLR 7.5

2019-06-25 Thread Mikhail Khludnev
OK, probably dropping the startup trigger will help. Another idea:
set replication.enable.master=false and enable it once the master index is
built after restart.

On Tue, Jun 25, 2019 at 6:18 PM Patrick Bordelon <
patrick.borde...@coxautoinc.com> wrote:

> We are currently using the replicate after commit and startup
>
> <lst name="master">
>   <str name="enable">${replication.enable.master:false}</str>
>   <str name="replicateAfter">commit</str>
>   <str name="replicateAfter">startup</str>
>   <str name="confFiles">schema.xml,stopwords.txt</str>
> </lst>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Sincerely yours
Mikhail Khludnev
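
Since the enable flag in the quoted solrconfig.xml is driven by the
replication.enable.master system property, the idea above can be sketched
like this (the property name comes from the config in this thread; everything
else is a placeholder for your own start script):

    # Bring the rebuilt primary up with replication disabled
    bin/solr start -Dreplication.enable.master=false

    # ...rebuild and hard commit the index...

    # Restart with replication enabled so slaves only ever see a non-empty index
    bin/solr start -Dreplication.enable.master=true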


Re: Replication issue with version 0 index in SOLR 7.5

2019-06-25 Thread Patrick Bordelon
We are currently using the replicate after commit and startup


<lst name="master">
  <str name="enable">${replication.enable.master:false}</str>
  <str name="replicateAfter">commit</str>
  <str name="replicateAfter">startup</str>
  <str name="confFiles">schema.xml,stopwords.txt</str>
</lst>



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Replication issue with version 0 index in SOLR 7.5

2019-06-25 Thread Mikhail Khludnev
Note: it seems that the current Solr logic relies on the master's disks being
persistent.
https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java#L615


On Tue, Jun 25, 2019 at 3:16 PM Mikhail Khludnev  wrote:

> Hello, Patrick.
> Can commit help you?
>
> On Tue, Jun 25, 2019 at 12:55 AM Patrick Bordelon <
> patrick.borde...@coxautoinc.com> wrote:
>
>> Hi,
>>
>> We recently upgraded to SOLR 7.5 in AWS, we had previously been running
>> SOLR
>> 6.5. In our current configuration we have our applications broken into a
>> single instance primary environment and a multi-instance replica
>> environment
>> separated behind a load balancer for each environment.
>>
>> Until recently we've been able to reload the primary without the replicas
>> updating until there was a full index. However when we upgraded to 7.5 we
>> started noticing that after terminating and rebuilding a primary instance
>> that the associated replicas would all start showing 0 documents in all
>> indexes. After some research we believe we've tracked down the issue.
>> SOLR-11293.
>>
>> SOLR-11293 changes
>> <
>> https://issues.apache.org/jira/browse/SOLR-11293?focusedCommentId=16182379=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16182379>
>>
>>
>> This fix changed the way the replication handler checks before updating a
>> replica when the primary has an empty index. Whether it's from deleting
>> the
>> old index or from terminating the instance.
>>
>> This is the code as it was in 6.5 replication handler
>>
>>   if (latestVersion == 0L) {
>> if (forceReplication && commit.getGeneration() != 0) {
>>   // since we won't get the files for an empty index,
>>   // we just clear ours and commit
>>   RefCounted<IndexWriter> iw =
>> solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore);
>>   try {
>> iw.get().deleteAll();
>>   } finally {
>> iw.decref();
>>   }
>>   SolrQueryRequest req = new LocalSolrQueryRequest(solrCore, new
>> ModifiableSolrParams());
>>   solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req,
>> false));
>> }
>>
>>
>> Without forced replication the index on the replica won't perform the
>> deleteAll operation and will keep the old index until a new index version
>> is
>> created.
>>
>> However in 7.5 the code was changed to this.
>>
>>   if (latestVersion == 0L) {
>> if (commit.getGeneration() != 0) {
>>   // since we won't get the files for an empty index,
>>   // we just clear ours and commit
>>   log.info("New index in Master. Deleting mine...");
>>   RefCounted<IndexWriter> iw =
>> solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore);
>>   try {
>> iw.get().deleteAll();
>>   } finally {
>> iw.decref();
>>   }
>>   assert TestInjection.injectDelayBeforeSlaveCommitRefresh();
>>   if (skipCommitOnMasterVersionZero) {
>> openNewSearcherAndUpdateCommitPoint();
>>   } else {
>> SolrQueryRequest req = new LocalSolrQueryRequest(solrCore, new
>> ModifiableSolrParams());
>> solrCore.getUpdateHandler().commit(new
>> CommitUpdateCommand(req,
>> false));
>>   }
>> }
>>
>> With the removal of the forceReplication check we believe the replica
>> always
>> deletes its index when it detects that a new version-0 index is created.
>>
>> This is a problem as we can't afford to have active replicas with 0
>> documents on them in the event of a failure of the primary. Since we can't
>> control the termination on AWS instances this opens up a problem as any
>> primary outage has a chance of jeopardizing the replicas viability.
>>
>> Is there a way to restore this functionality in the current or future
>> releases? We are willing to upgrade to a later version including the
>> latest
>> if it will help resolve this problem.
>>
>> If you suggest we use a load balancer health check to prevent this we
>> already are. However the load balancer type we are using (application)
>> has a
>> feature that allows access through it when all instances under it are
>> failing. This bypasses our health check and still allows the replicas to
>> poll from the primary even when it's not fully loaded. We can't change
>> load
>> balancer types as there are other features that we are taking advantage of
>> and can't change currently.
>>
>>
>>
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Replication issue with version 0 index in SOLR 7.5

2019-06-25 Thread Mikhail Khludnev
Hello, Patrick.
Can commit help you?

On Tue, Jun 25, 2019 at 12:55 AM Patrick Bordelon <
patrick.borde...@coxautoinc.com> wrote:

> Hi,
>
> We recently upgraded to SOLR 7.5 in AWS, we had previously been running
> SOLR
> 6.5. In our current configuration we have our applications broken into a
> single instance primary environment and a multi-instance replica
> environment
> separated behind a load balancer for each environment.
>
> Until recently we've been able to reload the primary without the replicas
> updating until there was a full index. However when we upgraded to 7.5 we
> started noticing that after terminating and rebuilding a primary instance
> that the associated replicas would all start showing 0 documents in all
> indexes. After some research we believe we've tracked down the issue.
> SOLR-11293.
>
> SOLR-11293 changes
> <
> https://issues.apache.org/jira/browse/SOLR-11293?focusedCommentId=16182379=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16182379>
>
>
> This fix changed the way the replication handler checks before updating a
> replica when the primary has an empty index. Whether it's from deleting the
> old index or from terminating the instance.
>
> This is the code as it was in 6.5 replication handler
>
>   if (latestVersion == 0L) {
> if (forceReplication && commit.getGeneration() != 0) {
>   // since we won't get the files for an empty index,
>   // we just clear ours and commit
>   RefCounted<IndexWriter> iw =
> solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore);
>   try {
> iw.get().deleteAll();
>   } finally {
> iw.decref();
>   }
>   SolrQueryRequest req = new LocalSolrQueryRequest(solrCore, new
> ModifiableSolrParams());
>   solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req,
> false));
> }
>
>
> Without forced replication the index on the replica won't perform the
> deleteAll operation and will keep the old index until a new index version
> is
> created.
>
> However in 7.5 the code was changed to this.
>
>   if (latestVersion == 0L) {
> if (commit.getGeneration() != 0) {
>   // since we won't get the files for an empty index,
>   // we just clear ours and commit
>   log.info("New index in Master. Deleting mine...");
>   RefCounted<IndexWriter> iw =
> solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore);
>   try {
> iw.get().deleteAll();
>   } finally {
> iw.decref();
>   }
>   assert TestInjection.injectDelayBeforeSlaveCommitRefresh();
>   if (skipCommitOnMasterVersionZero) {
> openNewSearcherAndUpdateCommitPoint();
>   } else {
> SolrQueryRequest req = new LocalSolrQueryRequest(solrCore, new
> ModifiableSolrParams());
> solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req,
> false));
>   }
> }
>
> With the removal of the forceReplication check we believe the replica
> always
> deletes its index when it detects that a new version-0 index is created.
>
> This is a problem as we can't afford to have active replicas with 0
> documents on them in the event of a failure of the primary. Since we can't
> control the termination on AWS instances this opens up a problem as any
> primary outage has a chance of jeopardizing the replicas viability.
>
> Is there a way to restore this functionality in the current or future
> releases? We are willing to upgrade to a later version including the latest
> if it will help resolve this problem.
>
> If you suggest we use a load balancer health check to prevent this we
> already are. However the load balancer type we are using (application) has
> a
> feature that allows access through it when all instances under it are
> failing. This bypasses our health check and still allows the replicas to
> poll from the primary even when it's not fully loaded. We can't change load
> balancer types as there are other features that we are taking advantage of
> and can't change currently.
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Sincerely yours
Mikhail Khludnev
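
For reference, the "can commit help you?" suggestion amounts to forcing a hard
commit on the rebuilt primary before any slave polls it, e.g. (host and core
name are placeholders):

    curl "http://primary-host:8983/solr/mycore/update?commit=true"

Once the rebuilt index has been hard committed, its version is no longer zero,
so the latestVersion == 0 branch quoted above should not be taken on the slaves.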


Re: Replication error in SOLR-6.5.1

2018-09-26 Thread Erick Erickson
bq. In all my solr servers I have 40% free space

Well, clearly that's not enough if you're getting this error: "No
space left on device"

Solr/Lucene need _at least_ as much free space as the indexes occupy.
In some circumstances it can require more. It sounds like you're
having an issue with full replications all happening at the same time
and effectively at least doubling your index space requirements.

I'd fix that problem first, the other messages likely will go away.
Either get bigger disks, make your indexes smaller or move some of the
replicas to a new machine.

Best,
Erick
On Tue, Sep 25, 2018 at 10:20 PM SOLR4189  wrote:
>
> Hi all,
>
> I use SOLR-6.5.1. A couple of weeks ago I started to use the replication feature
> in cloud mode without overriding the default behavior of the ReplicationHandler.
>
> After deploying the replication feature to production, almost every day I hit
> these errors:
> SolrException: Unable to download  completely. Downloaded x!=y
> OR
> SolrException: Unable to download  completely. (Downloaded x of y
> bytes) No space left on device
> OR
> Error deleting file: 
> NoSuchFileException: /opt/solr//data/index./
>
> I get all of these errors when a replica is recovering, sometimes after a
> physical machine failure and sometimes after a simple Solr restart. Today I
> have only one workaround: after the 5th unsuccessful recovery, I
> remove the replica and add it anew.
>
> On all my Solr servers I have 40% free space, and hard/soft commit is 5 minutes.
>
>
> What's wrong here and what can be done to correct these errors?
> Is it due to free space, the commitReserveDuration parameter, or something else?
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
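
A quick way to check whether a node has enough headroom for a full replication
is to compare the index size with the free space on the same filesystem. A
minimal sketch, with paths modelled on the error messages above (adjust them
to your layout):

    # Size of each index directory for the core
    du -sh /opt/solr/<core>/data/index*

    # Free space on the filesystem holding the data directory
    df -h /opt/solr

If the largest index directory is bigger than the free space reported by df,
a full fetch will fail with "No space left on device" exactly as reported here.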


Re: /replication?command=details does not show infos for all replicas on the core

2018-07-02 Thread Arturas Mazeika
Hi Shawn,
hi Erick,
hi et al.,

Very nice clarifications indeed. I also looked at the index replication
section. In addition to the clarifications in this thread, it shed quite
a bit of light on the area (and shows that I need to read the SolrCloud
part of the manual more thoroughly). Thanks a lot indeed!

Cheers,
Arturas


On Fri, Jun 29, 2018 at 5:44 PM, Shawn Heisey  wrote:

> On 6/29/2018 8:47 AM, Arturas Mazeika wrote:
>
>> Out of curiosity: some cores give infos for both shards (through
>> replication query) and some only for one (if you still be able to see the
>> prev post). I wonder why..
>>
>
> Adding to what Erick said:
>
> If SolrCloud has initiated a replication on that core at some point since
> that Solr instance started, then you might see both the master and slave
> side of that replication reported by the replication handler.  If a
> replication has never been initiated, then you will only see info about the
> local core.
>
> The replication handler is used by SolrCloud for two things:
>
> 1) Index recovery when a replica gets too far out of sync.
> 2) Replicating data to TLOG and PULL replica types (new in 7.x).
>
> Thanks,
> Shawn
>
>


Re: /replication?command=details does not show infos for all replicas on the core

2018-06-29 Thread Shawn Heisey

On 6/29/2018 8:47 AM, Arturas Mazeika wrote:

Out of curiosity: some cores give infos for both shards (through
replication query) and some only for one (if you still be able to see the
prev post). I wonder why..


Adding to what Erick said:

If SolrCloud has initiated a replication on that core at some point 
since that Solr instance started, then you might see both the master and 
slave side of that replication reported by the replication handler.  If 
a replication has never been initiated, then you will only see info 
about the local core.


The replication handler is used by SolrCloud for two things:

1) Index recovery when a replica gets too far out of sync.
2) Replicating data to TLOG and PULL replica types (new in 7.x).

Thanks,
Shawn



Re: /replication?command=details does not show infos for all replicas on the core

2018-06-29 Thread Erick Erickson
Arturas:

Please make yourself a promise, "Only use the collections commands" ;)
At least for a while.

Trying to mix collection-level commands and core-level commands is
extremely confusing at the start. Under the covers, the Collections
API _uses_ the Core API, but in a very precise manner. Any seemingly
innocent mistake will be hard to untangle.

For your first question: "I wonder why the infos for the second
replica are not shown..." the answer is that you are using a
core-level API which does not "understand" anything about SolrCloud,
it's all purely local to that instance. So it's doing exactly what you
ask it to; reporting on the status of cores (replicas) _on that
particular Solr instance_. The _Collections_ API _is_ cloud/Zookeeper
aware and will report them all. What it does is fire the core-level
command out to all live Solr nodes and assemble the response into a
single cluster-wide report.

Second, the core-level "replication" command is all about old-style
master/slave index replication and I have no idea what it's reporting
on when you ask for replication status in SolrCloud. It has nothing to
do with, say, "replication factor" or anything else cloud related as
Shawn indicates. Old-style master/slave is used in SolrCloud under the
covers for "full sync", perhaps that's happened sometime (although
ideally it won't happen at all unless something goes wrong with normal
indexing and the only option is to copy the entire index from the
leader). The take-away is that the replication command is probably not
doing what you think it is.

Best,
Erick

On Fri, Jun 29, 2018 at 7:47 AM, Arturas Mazeika  wrote:
> Hi Shawn et al,
>
> Thanks a lot for the clarification. It makes a lot of sense and explains
> which functionality needs to be used to get the infos :-).
>
> Out of curiosity: some cores give infos for both shards (through
> replication query) and some only for one (if you still be able to see the
> prev post). I wonder why..
>
> Cheers,
> Arturas
>
> On Fri, Jun 29, 2018 at 4:30 PM, Shawn Heisey  wrote:
>
>> On 6/29/2018 7:53 AM, Arturas Mazeika wrote:
>>
>>> but the query reports infos on only one shard:
>>>
>>> F:\solr_server\solr-7.2.1>curl -s
>>> http://localhost:9996/solr/de_wiki_man/replication?command=details | grep
>>> "indexPath\|indexSize"
>>>  "indexSize":"15.04 GB",
>>>
>>> "indexPath":"F:\\solr_server\\solr-7.2.1\\example\\cloud\\no
>>> de4\\solr\\de_wiki_man_shard4_replica_n12\\data\\index/",
>>>
>>> I wonder why the infos for the second replica are not shown. Comments?
>>>
>>
>> SolrCloud is aware of (and uses) the replication feature, but the
>> replication feature is not cloud-aware.  It is a core-level feature (not a
>> cloud-specific feature) and is only aware of that one specific core (shard
>> replica).  This is not likely to ever change.
>>
>> Thanks,
>> Shawn
>>
>>
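
As a cloud-aware counterpart to the core-level query above, the Collections
API reports the state of every replica in one call. A minimal sketch against
the same collection and port used earlier in the thread:

    curl -s "http://localhost:9996/solr/admin/collections?action=CLUSTERSTATUS&collection=de_wiki_man"

This returns every shard and replica of de_wiki_man with its node, state and
leader flag, regardless of which instance receives the request.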


Re: /replication?command=details does not show infos for all replicas on the core

2018-06-29 Thread Arturas Mazeika
Hi Shawn et al,

Thanks a lot for the clarification. It makes a lot of sense and explains
which functionality needs to be used to get the info :-).

Out of curiosity: some cores give info for both shards (through the
replication query) and some only for one (if you are still able to see the
previous post). I wonder why..

Cheers,
Arturas

On Fri, Jun 29, 2018 at 4:30 PM, Shawn Heisey  wrote:

> On 6/29/2018 7:53 AM, Arturas Mazeika wrote:
>
>> but the query reports infos on only one shard:
>>
>> F:\solr_server\solr-7.2.1>curl -s
>> http://localhost:9996/solr/de_wiki_man/replication?command=details | grep
>> "indexPath\|indexSize"
>>  "indexSize":"15.04 GB",
>>
>> "indexPath":"F:\\solr_server\\solr-7.2.1\\example\\cloud\\no
>> de4\\solr\\de_wiki_man_shard4_replica_n12\\data\\index/",
>>
>> I wonder why the infos for the second replica are not shown. Comments?
>>
>
> SolrCloud is aware of (and uses) the replication feature, but the
> replication feature is not cloud-aware.  It is a core-level feature (not a
> cloud-specific feature) and is only aware of that one specific core (shard
> replica).  This is not likely to ever change.
>
> Thanks,
> Shawn
>
>


Re: /replication?command=details does not show infos for all replicas on the core

2018-06-29 Thread Shawn Heisey

On 6/29/2018 7:53 AM, Arturas Mazeika wrote:

but the query reports infos on only one shard:

F:\solr_server\solr-7.2.1>curl -s
http://localhost:9996/solr/de_wiki_man/replication?command=details | grep
"indexPath\|indexSize"
 "indexSize":"15.04 GB",

"indexPath":"F:\\solr_server\\solr-7.2.1\\example\\cloud\\node4\\solr\\de_wiki_man_shard4_replica_n12\\data\\index/",

I wonder why the infos for the second replica are not shown. Comments?


SolrCloud is aware of (and uses) the replication feature, but the 
replication feature is not cloud-aware.  It is a core-level feature (not 
a cloud-specific feature) and is only aware of that one specific core 
(shard replica).  This is not likely to ever change.


Thanks,
Shawn



Re: replication

2018-04-13 Thread Shawn Heisey

On 4/10/2018 9:14 AM, Erick Erickson wrote:

The very first thing I'd do is set up a simple SolrCloud setup and
give it a spin. Unless your indexing load is quite heavy, the added
work the NRT replicas have in SolrCloud isn't a problem so worrying
about that is premature optimization unless you have a heavy load.







Also, you should understand something that has come to my attention
recently (and is backed up by documentation):  If the master does a soft
commit and the segment that was committed remains in memory (not flushed
to disk), that segment will NOT be replicated to the slaves.  It has to
get flushed to disk before it can be replicated.


Erick, I wasn't sure whether you caught onto that part of what I said.  
The "backed up by documentation" part is this:  The reference guide 
specifically says that TLOG and PULL replica types do not support NRT 
indexing (soft commit).  I was actually unaware of that limitation, but 
in hindsight it makes sense.


Thanks,
Shawn
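
In practice this means a master that relies on soft commits still needs hard
commits at a reasonable interval, so that segments reach disk and become
visible to the replication handler. A minimal solrconfig.xml sketch (the
intervals are only examples, not recommendations):

    <autoCommit>
      <!-- hard commit: flushes segments to disk so slaves can copy them -->
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <!-- soft commit: near-real-time visibility on the master only -->
      <maxTime>2000</maxTime>
    </autoSoftCommit>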



Re: replication

2018-04-13 Thread John Blythe
great. thanks, erick!

--
John Blythe

On Wed, Apr 11, 2018 at 12:16 PM, Erick Erickson 
wrote:

> bq: are you simply flagging the fact that we wouldn't direct the queries
> to A
> v. B v. C since SolrCloud will make the decisions itself as to which part
> of the distro gets hit for the operation
>
> Yep. SolrCloud takes care of it all itself. I should also add that there
> are
> about a zillion metrics now available in Solr that you can use to make the
> best use of hardware, including things like CPU usage, I/O, GC etc.
> SolrCloud
> doesn't _yet_ make use of these but will in future. The current software LB
> does a pretty simple round-robin distribution.
>
> Best,
> Erick
>
> On Wed, Apr 11, 2018 at 5:57 AM, John Blythe  wrote:
> > thanks, erick. great info.
> >
> > although you can't (yet) direct queries to one or the other. So just
> making
> >> them all NRT and forgetting about it is reasonable.
> >
> >
> > are you simply flagging the fact that we wouldn't direct the queries to A
> > v. B v. C since SolrCloud will make the decisions itself as to which part
> > of the distro gets hit for the operation? if not, can you expound on
> this a
> > bit more?
> >
> > The very nature of merging is such that you will _always_ get large
> merges
> >> until you have 5G segments (by default)
> >
> >
> > bummer
> >
> > Quite possible, but you have to route things yourself. But in that case
> >> you're limited to one machine to handle all your NRT traffic. I skimmed
> >> your post so don't know whether your NRT traffic load is high enough to
> >> worry about.
> >
> >
> > ok. i think we'll take a two-pronged approach. for the immediate purposes
> > of trying to solve an issue we've begun encountering we will begin
> > thoroughtesting the load between various operations in the master-slave
> > setup we've set up. pending the results, we can roll forward w a
> temporary
> > patch in which all end-user touch points route through the primary box
> for
> > read/write while large scale operations/processing we do in the
> background
> > will point to the ELB the slaves are sitting behind. we'll also begin
> > setting up a simple solrcloud instance to toy with per your suggestion
> > above. inb4 tons more questions on my part :)
> >
> > thanks!
> >
> > --
> > John Blythe
> >
> > On Tue, Apr 10, 2018 at 11:14 AM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> >> bq: should we try to bite the solrcloud bullet and be done w it
> >>
> >> that's what I'd do. As of 7.0 there are different "flavors", TLOG,
> >> PULL and NRT so that's also a possibility, although you can't (yet)
> >> direct queries to one or the other. So just making them all NRT and
> >> forgetting about it is reasonable.
> >>
> >> bq:  is there some more config work we could put in place to avoid ...
> >> commit issue and the ultra large merge dangers
> >>
> >> No. The very nature of merging is such that you will _always_ get
> >> large merges until you have 5G segments (by default). The max segment
> >> size (outside "optimize/forceMerge/expungeDeletes" which you shouldn't
> >> do) is 5G so the steady-state worst-case segment pull is limited to
> >> that.
> >>
> >> bq: maybe for our initial need we use Master for writing and user
> >> access in NRT events, but slaves for the heavier backend
> >>
> >> Quite possible, but you have to route things yourself. But in that
> >> case you're limited to one machine to handle all your NRT traffic. I
> >> skimmed your post so don't know whether your NRT traffic load is high
> >> enough to worry about.
> >>
> >> The very first thing I'd do is set up a simple SolrCloud setup and
> >> give it a spin. Unless your indexing load is quite heavy, the added
> >> work the NRT replicas have in SolrCloud isn't a problem so worrying
> >> about that is premature optimization unless you have a heavy load.
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Apr 9, 2018 at 4:36 PM, John Blythe 
> wrote:
> >> > Thanks a bunch for the thorough reply, Shawn.
> >> >
> >> > Phew. We’d chosen to go w Master-slave replication instead of
> SolrCloud
> >> per
> >> > the sudden need we had encountered and the desire to avoid the nuances
> >> and
> >> > changes related to moving to SolrCloud. But so much for this being a
> more
> >> > straightforward solution, huh?
> >> >
> >> > Few questions:
> >> > - should we try to bite the solrcloud bullet and be done w it?
> >> > - is there some more config work we could put in place to avoid the
> soft
> >> > commit issue and the ultra large merge dangers, keeping the
> replications
> >> > happening quickly?
> >> > - maybe for our initial need we use Master for writing and user
> access in
> >> > NRT events, but slaves for the heavier backend processing. Thoughts?
> >> > - anyone do consulting on this that would be interested in chatting?
> >> >
> >> > Thanks again!
> >> >
> >> > On Mon, Apr 9, 2018 at 18:18 Shawn Heisey 

Re: replication

2018-04-11 Thread Erick Erickson
bq: are you simply flagging the fact that we wouldn't direct the queries to A
v. B v. C since SolrCloud will make the decisions itself as to which part
of the distro gets hit for the operation

Yep. SolrCloud takes care of it all itself. I should also add that there are
about a zillion metrics now available in Solr that you can use to make the
best use of hardware, including things like CPU usage, I/O, GC etc. SolrCloud
doesn't _yet_ make use of these but will in future. The current software LB
does a pretty simple round-robin distribution.

Best,
Erick

On Wed, Apr 11, 2018 at 5:57 AM, John Blythe  wrote:
> thanks, erick. great info.
>
> although you can't (yet) direct queries to one or the other. So just making
>> them all NRT and forgetting about it is reasonable.
>
>
> are you simply flagging the fact that we wouldn't direct the queries to A
> v. B v. C since SolrCloud will make the decisions itself as to which part
> of the distro gets hit for the operation? if not, can you expound on this a
> bit more?
>
> The very nature of merging is such that you will _always_ get large merges
>> until you have 5G segments (by default)
>
>
> bummer
>
> Quite possible, but you have to route things yourself. But in that case
>> you're limited to one machine to handle all your NRT traffic. I skimmed
>> your post so don't know whether your NRT traffic load is high enough to
>> worry about.
>
>
> ok. i think we'll take a two-pronged approach. for the immediate purposes
> of trying to solve an issue we've begun encountering we will begin
> thoroughtesting the load between various operations in the master-slave
> setup we've set up. pending the results, we can roll forward w a temporary
> patch in which all end-user touch points route through the primary box for
> read/write while large scale operations/processing we do in the background
> will point to the ELB the slaves are sitting behind. we'll also begin
> setting up a simple solrcloud instance to toy with per your suggestion
> above. inb4 tons more questions on my part :)
>
> thanks!
>
> --
> John Blythe
>
> On Tue, Apr 10, 2018 at 11:14 AM, Erick Erickson 
> wrote:
>
>> bq: should we try to bite the solrcloud bullet and be done w it
>>
>> that's what I'd do. As of 7.0 there are different "flavors", TLOG,
>> PULL and NRT so that's also a possibility, although you can't (yet)
>> direct queries to one or the other. So just making them all NRT and
>> forgetting about it is reasonable.
>>
>> bq:  is there some more config work we could put in place to avoid ...
>> commit issue and the ultra large merge dangers
>>
>> No. The very nature of merging is such that you will _always_ get
>> large merges until you have 5G segments (by default). The max segment
>> size (outside "optimize/forceMerge/expungeDeletes" which you shouldn't
>> do) is 5G so the steady-state worst-case segment pull is limited to
>> that.
>>
>> bq: maybe for our initial need we use Master for writing and user
>> access in NRT events, but slaves for the heavier backend
>>
>> Quite possible, but you have to route things yourself. But in that
>> case you're limited to one machine to handle all your NRT traffic. I
>> skimmed your post so don't know whether your NRT traffic load is high
>> enough to worry about.
>>
>> The very first thing I'd do is set up a simple SolrCloud setup and
>> give it a spin. Unless your indexing load is quite heavy, the added
>> work the NRT replicas have in SolrCloud isn't a problem so worrying
>> about that is premature optimization unless you have a heavy load.
>>
>> Best,
>> Erick
>>
>> On Mon, Apr 9, 2018 at 4:36 PM, John Blythe  wrote:
>> > Thanks a bunch for the thorough reply, Shawn.
>> >
>> > Phew. We’d chosen to go w Master-slave replication instead of SolrCloud
>> per
>> > the sudden need we had encountered and the desire to avoid the nuances
>> and
>> > changes related to moving to SolrCloud. But so much for this being a more
>> > straightforward solution, huh?
>> >
>> > Few questions:
>> > - should we try to bite the solrcloud bullet and be done w it?
>> > - is there some more config work we could put in place to avoid the soft
>> > commit issue and the ultra large merge dangers, keeping the replications
>> > happening quickly?
>> > - maybe for our initial need we use Master for writing and user access in
>> > NRT events, but slaves for the heavier backend processing. Thoughts?
>> > - anyone do consulting on this that would be interested in chatting?
>> >
>> > Thanks again!
>> >
>> > On Mon, Apr 9, 2018 at 18:18 Shawn Heisey  wrote:
>> >
>> >> On 4/9/2018 12:15 PM, John Blythe wrote:
>> >> > we're starting to dive into master/slave replication architecture.
>> we'll
>> >> > have 1 master w 4 slaves behind it. our app is NRT. if user performs
>> an
>> >> > action in section A's data they may choose to jump to section B which
>> >> will
>> >> > be dependent on having the updates 
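
The metrics Erick refers to are exposed over HTTP, so an external load
balancer or monitoring job can read them directly. A minimal sketch (host and
port are placeholders; the group filter only narrows the output):

    # JVM and OS level metrics (heap, GC, CPU, system load) for one node
    curl -s "http://localhost:8983/solr/admin/metrics?group=jvm,os"

    # Core-level metrics such as request rates and cache statistics
    curl -s "http://localhost:8983/solr/admin/metrics?group=core"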

Re: replication

2018-04-11 Thread John Blythe
thanks, erick. great info.

although you can't (yet) direct queries to one or the other. So just making
> them all NRT and forgetting about it is reasonable.


are you simply flagging the fact that we wouldn't direct the queries to A
v. B v. C since SolrCloud will make the decisions itself as to which part
of the distro gets hit for the operation? if not, can you expound on this a
bit more?

The very nature of merging is such that you will _always_ get large merges
> until you have 5G segments (by default)


bummer

Quite possible, but you have to route things yourself. But in that case
> you're limited to one machine to handle all your NRT traffic. I skimmed
> your post so don't know whether your NRT traffic load is high enough to
> worry about.


ok. i think we'll take a two-pronged approach. for the immediate purposes
of trying to solve an issue we've begun encountering, we will begin
thoroughly testing the load between various operations in the master-slave
setup we've set up. pending the results, we can roll forward w a temporary
patch in which all end-user touch points route through the primary box for
read/write while large scale operations/processing we do in the background
will point to the ELB the slaves are sitting behind. we'll also begin
setting up a simple solrcloud instance to toy with per your suggestion
above. inb4 tons more questions on my part :)

thanks!

--
John Blythe

On Tue, Apr 10, 2018 at 11:14 AM, Erick Erickson 
wrote:

> bq: should we try to bite the solrcloud bullet and be done w it
>
> that's what I'd do. As of 7.0 there are different "flavors", TLOG,
> PULL and NRT so that's also a possibility, although you can't (yet)
> direct queries to one or the other. So just making them all NRT and
> forgetting about it is reasonable.
>
> bq:  is there some more config work we could put in place to avoid ...
> commit issue and the ultra large merge dangers
>
> No. The very nature of merging is such that you will _always_ get
> large merges until you have 5G segments (by default). The max segment
> size (outside "optimize/forceMerge/expungeDeletes" which you shouldn't
> do) is 5G so the steady-state worst-case segment pull is limited to
> that.
>
> bq: maybe for our initial need we use Master for writing and user
> access in NRT events, but slaves for the heavier backend
>
> Quite possible, but you have to route things yourself. But in that
> case you're limited to one machine to handle all your NRT traffic. I
> skimmed your post so don't know whether your NRT traffic load is high
> enough to worry about.
>
> The very first thing I'd do is set up a simple SolrCloud setup and
> give it a spin. Unless your indexing load is quite heavy, the added
> work the NRT replicas have in SolrCloud isn't a problem so worrying
> about that is premature optimization unless you have a heavy load.
>
> Best,
> Erick
>
> On Mon, Apr 9, 2018 at 4:36 PM, John Blythe  wrote:
> > Thanks a bunch for the thorough reply, Shawn.
> >
> > Phew. We’d chosen to go w Master-slave replication instead of SolrCloud
> per
> > the sudden need we had encountered and the desire to avoid the nuances
> and
> > changes related to moving to SolrCloud. But so much for this being a more
> > straightforward solution, huh?
> >
> > Few questions:
> > - should we try to bite the solrcloud bullet and be done w it?
> > - is there some more config work we could put in place to avoid the soft
> > commit issue and the ultra large merge dangers, keeping the replications
> > happening quickly?
> > - maybe for our initial need we use Master for writing and user access in
> > NRT events, but slaves for the heavier backend processing. Thoughts?
> > - anyone do consulting on this that would be interested in chatting?
> >
> > Thanks again!
> >
> > On Mon, Apr 9, 2018 at 18:18 Shawn Heisey  wrote:
> >
> >> On 4/9/2018 12:15 PM, John Blythe wrote:
> >> > we're starting to dive into master/slave replication architecture.
> we'll
> >> > have 1 master w 4 slaves behind it. our app is NRT. if user performs
> an
> >> > action in section A's data they may choose to jump to section B which
> >> will
> >> > be dependent on having the updates from their action in section A. as
> >> such,
> >> > we're thinking that the replication time should be set to 1-2s (the
> >> chances
> >> > of them arriving at section B quickly enough to catch the 2s gap is
> >> highly
> >> > unlikely at best).
> >>
> >> Once you start talking about master-slave replication, my assumption is
> >> that you're not running SolrCloud.  You would NOT want to try and mix
> >> SolrCloud with replication.  The features do not play well together.
> >> SolrCloud with NRT replicas (this is the only replica type that exists
> >> in 6.x and earlier) may be a better option than master-slave
> replication.
> >>
> >> > since the replicas will simply be looking for new files it seems like
> >> this
> >> > would be a lightweight 

Re: replication

2018-04-10 Thread Erick Erickson
bq: should we try to bite the solrcloud bullet and be done w it

that's what I'd do. As of 7.0 there are different "flavors", TLOG,
PULL and NRT so that's also a possibility, although you can't (yet)
direct queries to one or the other. So just making them all NRT and
forgetting about it is reasonable.

bq:  is there some more config work we could put in place to avoid ...
commit issue and the ultra large merge dangers

No. The very nature of merging is such that you will _always_ get
large merges until you have 5G segments (by default). The max segment
size (outside "optimize/forceMerge/expungeDeletes" which you shouldn't
do) is 5G so the steady-state worst-case segment pull is limited to
that.

bq: maybe for our initial need we use Master for writing and user
access in NRT events, but slaves for the heavier backend

Quite possible, but you have to route things yourself. But in that
case you're limited to one machine to handle all your NRT traffic. I
skimmed your post so don't know whether your NRT traffic load is high
enough to worry about.

The very first thing I'd do is set up a simple SolrCloud setup and
give it a spin. Unless your indexing load is quite heavy, the added
work the NRT replicas have in SolrCloud isn't a problem so worrying
about that is premature optimization unless you have a heavy load.

Best,
Erick

On Mon, Apr 9, 2018 at 4:36 PM, John Blythe  wrote:
> Thanks a bunch for the thorough reply, Shawn.
>
> Phew. We’d chosen to go w Master-slave replication instead of SolrCloud per
> the sudden need we had encountered and the desire to avoid the nuances and
> changes related to moving to SolrCloud. But so much for this being a more
> straightforward solution, huh?
>
> Few questions:
> - should we try to bite the solrcloud bullet and be done w it?
> - is there some more config work we could put in place to avoid the soft
> commit issue and the ultra large merge dangers, keeping the replications
> happening quickly?
> - maybe for our initial need we use Master for writing and user access in
> NRT events, but slaves for the heavier backend processing. Thoughts?
> - anyone do consulting on this that would be interested in chatting?
>
> Thanks again!
>
> On Mon, Apr 9, 2018 at 18:18 Shawn Heisey  wrote:
>
>> On 4/9/2018 12:15 PM, John Blythe wrote:
>> > we're starting to dive into master/slave replication architecture. we'll
>> > have 1 master w 4 slaves behind it. our app is NRT. if user performs an
>> > action in section A's data they may choose to jump to section B which
>> will
>> > be dependent on having the updates from their action in section A. as
>> such,
>> > we're thinking that the replication time should be set to 1-2s (the
>> chances
>> > of them arriving at section B quickly enough to catch the 2s gap is
>> highly
>> > unlikely at best).
>>
>> Once you start talking about master-slave replication, my assumption is
>> that you're not running SolrCloud.  You would NOT want to try and mix
>> SolrCloud with replication.  The features do not play well together.
>> SolrCloud with NRT replicas (this is the only replica type that exists
>> in 6.x and earlier) may be a better option than master-slave replication.
>>
>> > since the replicas will simply be looking for new files it seems like
>> this
>> > would be a lightweight operation even every couple seconds for 4
>> replicas.
>> > that said, i'm going *entirely* off of assumption at this point and
>> wanted
>> > to check in w you all to see any nuances, gotchas, hidden landmines, etc.
>> > that we should be considering before rolling things out.
>>
>> Most of the time, you'd be correct to think that indexing is going to
>> create a new small segment and replication will have little work to do.
>> But as you create more and more segments, eventually Lucene is going to
>> start merging those segments.  For discussion purposes, I'm going to
>> describe a situation where each new segment during indexing is about
>> 100KB in size, and the merge policy is left at the default settings.
>> I'm also going to assume that no documents are getting deleted or
>> reindexed (which will delete the old version).  Deleted documents can
>> have an impact on merging, but it will usually only be a dramatic impact
>> if there are a LOT of deleted documents.
>>
>> The first ten segments created will be this 100KB size.  Then Lucene is
>> going to see that there are enough segments to trigger the merge policy
>> - it's going to combine ten of those segments into one that's
>> approximately one megabyte.  Repeat this ten times, and ten of those 1
>> megabyte segments will be combined into one ten megabyte segment.
>> Repeat all of THAT ten times, and there will be a 100 megabyte segment.
>> And there will eventually be another level creating 1 gigabyte
>> segments.  If the index is below 5GB in size, the entire thing *could*
>> be merged into one segment by this process.
>>
>> The end result of all this:  

Re: replication

2018-04-09 Thread John Blythe
Thanks a bunch for the thorough reply, Shawn.

Phew. We’d chosen to go w Master-slave replication instead of SolrCloud per
the sudden need we had encountered and the desire to avoid the nuances and
changes related to moving to SolrCloud. But so much for this being a more
straightforward solution, huh?

Few questions:
- should we try to bite the solrcloud bullet and be done w it?
- is there some more config work we could put in place to avoid the soft
commit issue and the ultra large merge dangers, keeping the replications
happening quickly?
- maybe for our initial need we use Master for writing and user access in
NRT events, but slaves for the heavier backend processing. Thoughts?
- anyone do consulting on this that would be interested in chatting?

Thanks again!

On Mon, Apr 9, 2018 at 18:18 Shawn Heisey  wrote:

> On 4/9/2018 12:15 PM, John Blythe wrote:
> > we're starting to dive into master/slave replication architecture. we'll
> > have 1 master w 4 slaves behind it. our app is NRT. if user performs an
> > action in section A's data they may choose to jump to section B which
> will
> > be dependent on having the updates from their action in section A. as
> such,
> > we're thinking that the replication time should be set to 1-2s (the
> chances
> > of them arriving at section B quickly enough to catch the 2s gap is
> highly
> > unlikely at best).
>
> Once you start talking about master-slave replication, my assumption is
> that you're not running SolrCloud.  You would NOT want to try and mix
> SolrCloud with replication.  The features do not play well together.
> SolrCloud with NRT replicas (this is the only replica type that exists
> in 6.x and earlier) may be a better option than master-slave replication.
>
> > since the replicas will simply be looking for new files it seems like
> this
> > would be a lightweight operation even every couple seconds for 4
> replicas.
> > that said, i'm going *entirely* off of assumption at this point and
> wanted
> > to check in w you all to see any nuances, gotchas, hidden landmines, etc.
> > that we should be considering before rolling things out.
>
> Most of the time, you'd be correct to think that indexing is going to
> create a new small segment and replication will have little work to do.
> But as you create more and more segments, eventually Lucene is going to
> start merging those segments.  For discussion purposes, I'm going to
> describe a situation where each new segment during indexing is about
> 100KB in size, and the merge policy is left at the default settings.
> I'm also going to assume that no documents are getting deleted or
> reindexed (which will delete the old version).  Deleted documents can
> have an impact on merging, but it will usually only be a dramatic impact
> if there are a LOT of deleted documents.
>
> The first ten segments created will be this 100KB size.  Then Lucene is
> going to see that there are enough segments to trigger the merge policy
> - it's going to combine ten of those segments into one that's
> approximately one megabyte.  Repeat this ten times, and ten of those 1
> megabyte segments will be combined into one ten megabyte segment.
> Repeat all of THAT ten times, and there will be a 100 megabyte segment.
> And there will eventually be another level creating 1 gigabyte
> segments.  If the index is below 5GB in size, the entire thing *could*
> be merged into one segment by this process.
>
> The end result of all this:  Replication is not always going to be
> super-quick.  If merging creates a 1 gigabyte segment, then the amount
> of time to transfer that new segment is going to depend on how fast your
> disks are, and how fast your network is.  If you're using commodity SATA
> drives in the 4 to 10 terabyte range and a gigabit network, the network
> is probably going to be the bottleneck -- assuming that the system has
> plenty of memory and isn't under a high load.  If the network is the
> bottleneck in that situation, it's probably going to take close to ten
> seconds to transfer a 1GB segment, and the greater part of a minute to
> transfer a 5GB segment, which is the biggest one that the default merge
> policy configuration will create without an optimize operation.
>
> Also, you should understand something that has come to my attention
> recently (and is backed up by documentation):  If the master does a soft
> commit and the segment that was committed remains in memory (not flushed
> to disk), that segment will NOT be replicated to the slaves.  It has to
> get flushed to disk before it can be replicated.
>
> Thanks,
> Shawn
>
> --
John Blythe


Re: replication

2018-04-09 Thread Shawn Heisey
On 4/9/2018 12:15 PM, John Blythe wrote:
> we're starting to dive into master/slave replication architecture. we'll
> have 1 master w 4 slaves behind it. our app is NRT. if user performs an
> action in section A's data they may choose to jump to section B which will
> be dependent on having the updates from their action in section A. as such,
> we're thinking that the replication time should be set to 1-2s (the chances
> of them arriving at section B quickly enough to catch the 2s gap is highly
> unlikely at best).

Once you start talking about master-slave replication, my assumption is
that you're not running SolrCloud.  You would NOT want to try and mix
SolrCloud with replication.  The features do not play well together. 
SolrCloud with NRT replicas (this is the only replica type that exists
in 6.x and earlier) may be a better option than master-slave replication.

> since the replicas will simply be looking for new files it seems like this
> would be a lightweight operation even every couple seconds for 4 replicas.
> that said, i'm going *entirely* off of assumption at this point and wanted
> to check in w you all to see any nuances, gotchas, hidden landmines, etc.
> that we should be considering before rolling things out.

Most of the time, you'd be correct to think that indexing is going to
create a new small segment and replication will have little work to do. 
But as you create more and more segments, eventually Lucene is going to
start merging those segments.  For discussion purposes, I'm going to
describe a situation where each new segment during indexing is about
100KB in size, and the merge policy is left at the default settings. 
I'm also going to assume that no documents are getting deleted or
reindexed (which will delete the old version).  Deleted documents can
have an impact on merging, but it will usually only be a dramatic impact
if there are a LOT of deleted documents.

The first ten segments created will be this 100KB size.  Then Lucene is
going to see that there are enough segments to trigger the merge policy
- it's going to combine ten of those segments into one that's
approximately one megabyte.  Repeat this ten times, and ten of those 1
megabyte segments will be combined into one ten megabyte segment. 
Repeat all of THAT ten times, and there will be a 100 megabyte segment. 
And there will eventually be another level creating 1 gigabyte
segments.  If the index is below 5GB in size, the entire thing *could*
be merged into one segment by this process.
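
(For reference, the merge behavior described above corresponds roughly to the default TieredMergePolicy settings. Spelled out explicitly in the <indexConfig> section of solrconfig.xml they would look something like the sketch below; since these are the defaults, you normally don't need to set them at all:)

    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <!-- merge roughly ten segments at a time, as in the walkthrough above -->
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
      <!-- the ~5GB cap mentioned above; only an optimize/forceMerge goes beyond it -->
      <double name="maxMergedSegmentMB">5120</double>
    </mergePolicyFactory>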

The end result of all this:  Replication is not always going to be
super-quick.  If merging creates a 1 gigabyte segment, then the amount
of time to transfer that new segment is going to depend on how fast your
disks are, and how fast your network is.  If you're using commodity SATA
drives in the 4 to 10 terabyte range and a gigabit network, the network
is probably going to be the bottleneck -- assuming that the system has
plenty of memory and isn't under a high load.  If the network is the
bottleneck in that situation, it's probably going to take close to ten
seconds to transfer a 1GB segment, and the greater part of a minute to
transfer a 5GB segment, which is the biggest one that the default merge
policy configuration will create without an optimize operation.

Also, you should understand something that has come to my attention
recently (and is backed up by documentation):  If the master does a soft
commit and the segment that was committed remains in memory (not flushed
to disk), that segment will NOT be replicated to the slaves.  It has to
get flushed to disk before it can be replicated.
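
(A common way to get the best of both on the master is to pair frequent soft commits with a less frequent hard commit, so in-memory segments are regularly flushed and become eligible for replication. A sketch for the <updateHandler> section of solrconfig.xml; the intervals are only examples and should be tuned to your own tolerance for delay:)

    <autoCommit>
      <maxTime>60000</maxTime>            <!-- hard commit (flush to disk) every 60s -->
      <openSearcher>false</openSearcher>  <!-- visibility comes from the soft commits below -->
    </autoCommit>
    <autoSoftCommit>
      <maxTime>1000</maxTime>             <!-- soft commit every second for NRT search on the master -->
    </autoSoftCommit>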

Thanks,
Shawn



Re: Replication in Master Slave Solr setup

2018-03-19 Thread Erick Erickson
The OP was making an invalid assumption I think, that the index would
replicate _whenever_ the index changed. But that's not necessarily
true, although it's the most common (and default) case.

From the ref guide:

'If you use "startup", you need to have a "commit" and/or "optimize"
entry also if you want to trigger replication on future commits or
optimizes.'

So you can set up your master/slave installation to replicate on
master startup but never again automatically _unless_ replicateAfter
is also set to commit or optimize too. You could then explicitly send
a replication API "fetchIndex" command if you wanted total control of
when replications happened.

For instance, imagine a setup where you updated your master index over
the course of a day, but only wanted the results available for
yesterday on the slaves. Setting your poll interval wouldn't help
because that timer starts whenever you start your slave. Set
replicateAfter to onStartup then at midnight each night use the
replication fetchIndex API call on each of the slaves.

Somewhat of a corner case, but possible.
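
(A sketch of that setup, with made-up host and core names: the master advertises a new index version only at startup, and each slave is triggered explicitly when you want it to pull:)

    <!-- master solrconfig.xml -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">startup</str>
      </lst>
    </requestHandler>

    <!-- then, e.g. at midnight, trigger each slave by hand:
         http://slave-host:8983/solr/core1/replication?command=fetchindex -->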

Best,
Erick


On Sun, Mar 18, 2018 at 10:06 PM, Shawn Heisey  wrote:
> On 3/17/2018 8:06 PM, vracks wrote:
>>
>> Basic Questions about the Replication in Master Slave Solr Setup.
>>
>> 1) Can Master push the changes to Slaves using the replication handler
>
>
> Replication is always pull -- the slave asks the master if there's anything
> to copy.
>
>> 2) If the Answer to the above question is no, then what is use of having
>> the
>> option of replicateAfter in the replicationHandler, since only the Slave
>> is
>> going to poll the master at a particular interval.
>
>
> The replicateAfter options control when the master will tell a polling slave
> that there is a change.
>
>> If the answer to the above question is yes, then I wanted to know how the master
>> knows about the Slave instances to which to push the changes.
>
>
> The master does not know about slaves until they connect. It does not push
> changes.
>
> Thanks,
> Shawn
>


Re: Replication in Master Slave Solr setup

2018-03-18 Thread Shawn Heisey

On 3/17/2018 8:06 PM, vracks wrote:

Basic Questions about the Replication in Master Slave Solr Setup.

1) Can Master push the changes to Slaves using the replication handler


Replication is always pull -- the slave asks the master if there's 
anything to copy.



2) If the Answer to the above question is no, then what is use of having the
option of replicateAfter in the replicationHandler, since only the Slave is
going to poll the master at a particular interval.


The replicateAfter options control when the master will tell a polling 
slave that there is a change.
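
(For context, a minimal pull/polling setup along those lines might look like this in solrconfig.xml; the host name, core name, interval, and confFiles below are made up:)

    <!-- on the master -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

    <!-- on the slave: it pulls from the master, polling every minute -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master-host:8983/solr/core1/replication</str>
        <str name="pollInterval">00:01:00</str>
      </lst>
    </requestHandler>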



If the answer to the above question is yes, then I wanted to know how the master
knows about the Slave instances to which to push the changes.


The master does not know about slaves until they connect. It does not 
push changes.


Thanks,
Shawn



Re: Replication Factor Bug in Collections Restore API?

2018-01-05 Thread Ansgar Wiechers
On 2018-01-04 Shalin Shekhar Mangar wrote:
> Sounds like a bug. Can you please open a Jira issue?

https://issues.apache.org/jira/browse/SOLR-11823

Regards
Ansgar Wiechers


Re: Replication Factor Bug in Collections Restore API?

2018-01-04 Thread Shalin Shekhar Mangar
Sounds like a bug. Can you please open a Jira issue?

On Thu, Jan 4, 2018 at 8:37 PM, Ansgar Wiechers
 wrote:
> Hi all.
>
> I'm running Solr 7.1 in SolrCloud mode ona a 3-node cluster and tried
> using the backup/restore API for the first time. Backup worked fine, but
> when trying to restore the backed-up collection I ran into an unexpected
> problem with the replication factor setting.
>
> Below command attempts to restore a backup of the collection "demo" with
> 3 shards, creating 2 replicas per shard:
>
> # curl -s -k 
> 'https://localhost:8983/solr/admin/collections?action=restore&name=demo&location=/srv/backup/solr/solr-dev&collection=demo&maxShardsPerNode=2&replicationFactor=2'
> {
>   "error": {
> "code": 400,
> "msg": "Solr cloud with available number of nodes:3 is insufficient for 
> restoring a collection with 3 shards, total replicas per shard 6 and 
> maxShardsPerNode 2. Consider increasing maxShardsPerNode value OR number of 
> available nodes.",
> "metadata": [
>   "error-class",
>   "org.apache.solr.common.SolrException",
>   "root-error-class",
>   "org.apache.solr.common.SolrException"
> ]
>   },
>   "exception": {
> "rspCode": 400,
> "msg": "Solr cloud with available number of nodes:3 is insufficient for 
> restoring a collection with 3 shards, total replicas per shard 6 and 
> maxShardsPerNode 2. Consider increasing maxShardsPerNode value OR number of 
> available nodes."
>   },
>   "Operation restore caused exception:": 
> "org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
> Solr cloud with available number of nodes:3 is insufficient for restoring a 
> collection with 3 shards, total replicas per shard 6 and maxShardsPerNode 2. 
> Consider increasing maxShardsPerNode value OR number of available nodes.",
>   "responseHeader": {
> "QTime": 28,
> "status": 400
>   }
> }
>
> It looks to me like the restore API multiplies the replication factor
> with the number of nodes, which is not how the replication factor
> behaves in other contexts. The documentation[1] also didn't lead me to
> expect this behavior:
>
>> replicationFactor
>>
>>The number of replicas to be created for each shard.
>
> Is this expected behavior (by anyone but me)?
> Should I report it as a bug?
>
> [1]: https://lucene.apache.org/solr/guide/7_1/collections-api.html
>
> Regards
> Ansgar Wiechers
> --
> "Abstractions save us time working, but they don't save us time learning."
> --Joel Spolsky



-- 
Regards,
Shalin Shekhar Mangar.


Re: Replication on startup takes a long time

2017-09-25 Thread Erick Erickson
Emir:

OK, thanks for pointing that out, that relieves me a lot!

Erick

On Mon, Sep 25, 2017 at 1:03 AM, Emir Arnautović
 wrote:
> Hi Eric,
> I don’t think that there are some bugs with searcher reopening - this is a 
> scenario with a new slave:
>
> “But when I add a *new* slave pointing to the master…”
>
> So expected to have zero results until replication finishes.
>
> Regards,
> Emir
>
>> On 23 Sep 2017, at 19:21, Erick Erickson  wrote:
>>
>> First I'd like to say that I wish more people would take the time like
>> you have to fully describe the problem and your observations, it makes
>> it so much nicer than having half-a-dozen back and forths! Thanks!
>>
>> Just so it doesn't get buried in the rest of the response, I do tend
>> to go on... I suspect you have a suggester configured. The
>> index-based suggesters read through your _entire_ index, all the
>> stored fields from all the documents and process them into an FST or
>> "sidecar" index. See:
>> https://lucidworks.com/2015/03/04/solr-suggester/. If this is true
>> they might be being built on the slaves whenever a replication
>> happens. Hmmm, if this is true, let us know. You can tell by removing
>> the suggester from the config and timing again. It seems like in the
>> master/slave config we should copy these down but don't know if it's
>> been tested.
>>
>> If they are being built on the slaves, you might try commenting out
>> all of the buildOn bits on the slave configurations. Frankly I
>> don't know if building the suggester structures on the master would
>> propagate them to the slave correctly if the slave doesn't build them,
>> but it would certainly be a fat clue if it changed the load time on
>> the slaves and we could look some more at options.
>>
>> Observation 1: Allocating 40G of memory for an index only 12G seems
>> like overkill. This isn't the root of your problem, but a 12G index
>> shouldn't need near 40G of JVM. In fact, due to MMapDirectory being
>> used (see Uwe Schindler's blog here:
>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html)
>> I'd guess you can get away with MUCH less memory, maybe as low as 8G
>> or so. The wildcard here would be the size of your caches, especially
>> your filterCache configured in solrconfig.xml. Like I mentioned, this
>> isn't the root of your replication issue, just sayin'.
>>
>> Observation 2: Hard commits (the <autoCommit> setting) are not very
>> expensive operations with openSearcher=false. Again this isn't the root
>> of your problem but consider removing the number of docs limitation
>> and just making it time-based, say every minute. Long blog on the
>> topic here: 
>> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/.
>> You might be accumulating pretty large transaction logs (assuming you
>> haven't disabled them) to no good purpose. Given your observation that
>> the actual transmission of the index takes 2 minutes, this is probably
>> not something to worry about much, but is worth checking.
>>
>> Question 1:
>>
>> Solr should be doing nothing other than opening a new searcher, which
>> should be roughly the "autowarm" time on master plus (perhaps)
>> suggester build. Your observation that autowarming takes quite a bit
>> of time (evidenced by much shorter times when you set the counts to
>> zero) is a smoking gun that you're probably doing far too much
>> autowarming. HOWEVER, during this interval the replica should be
>> serving queries from the old searcher so something else is going on
>> here. Autowarming is actually pretty simple, perhaps this will help
>> you to keep in mind while tuning:
>>
>> The queryResultCache and filterCache are essentially maps where the
>> key is just the text of the clause (simplifying here). So for the
>> queryResultCache the key is the entire search request. For the
>> filterCache, the key is just the "fq" clause. autowarm count in each
>> just means the number of keys that are replayed when a new searcher is
>> opened. I usually start with a pretty small number, on the order of
>> 10-20. The purpose of them is just to keep from experiencing a delay
>> when the first few searches are performed after a searcher is opened.
>>
>> My bet: you won't notice a measurable difference when dropping the
>> autowarm counts drastically in terms of query response, but you will
>> save the startup time. I also suspect you can reduce the size of the
>> caches drastically, but don't know what you have them set to, it's a
>> guess.
>>
>> As to what's happening such that you serve queries with zero counts,
>> my best guess at this point is that you are rebuilding
>> autosuggesters. We shouldn't be serving queries from the new
>> searcher during this interval, if confirmed we need to raise a JIRA.
>>
>> Question 2: see above, autosuggester?
>>
>> Question 3a: documents should become searchable on the slave when 1>
>> all the segments are copied, 2> autowarm is completed.

Re: Replication on startup takes a long time

2017-09-25 Thread Emir Arnautović
Hi Eric,
I don’t think that there are some bugs with searcher reopening - this is a 
scenario with a new slave:

“But when I add a *new* slave pointing to the master…”

So expected to have zero results until replication finishes.

Regards,
Emir

> On 23 Sep 2017, at 19:21, Erick Erickson  wrote:
> 
> First I'd like to say that I wish more people would take the time like
> you have to fully describe the problem and your observations, it makes
> it so much nicer than having half-a-dozen back and forths! Thanks!
> 
> Just so it doesn't get buried in the rest of the response, I do tend
> to go on... I suspect you have a suggester configured. The
> index-based suggesters read through your _entire_ index, all the
> stored fields from all the documents and process them into an FST or
> "sidecar" index. See:
> https://lucidworks.com/2015/03/04/solr-suggester/. If this is true
> they might be being built on the slaves whenever a replication
> happens. Hmmm, if this is true, let us know. You can tell by removing
> the suggester from the config and timing again. It seems like in the
> master/slave config we should copy these down but don't know if it's
> been tested.
> 
> If they are being built on the slaves, you might try commenting out
> all of the buildOn bits on the slave configurations. Frankly I
> don't know if building the suggester structures on the master would
> propagate them to the slave correctly if the slave doesn't build them,
> but it would certainly be a fat clue if it changed the load time on
> the slaves and we could look some more at options.
> 
> Observation 1: Allocating 40G of memory for an index only 12G seems
> like overkill. This isn't the root of your problem, but a 12G index
> shouldn't need near 40G of JVM. In fact, due to MMapDirectory being
> used (see Uwe Schindler's blog here:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html)
> I'd guess you can get away with MUCH less memory, maybe as low as 8G
> or so. The wildcard here would be the size of your caches, especially
> your filterCache configured in solrconfig.xml. Like I mentioned, this
> isn't the root of your replication issue, just sayin'.
> 
> Observation 2: Hard commits (the <autoCommit> setting) are not very
> expensive operations with openSearcher=false. Again this isn't the root
> of your problem but consider removing the number of docs limitation
> and just making it time-based, say every minute. Long blog on the
> topic here: 
> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/.
> You might be accumulating pretty large transaction logs (assuming you
> haven't disabled them) to no good purpose. Given your observation that
> the actual transmission of the index takes 2 minutes, this is probably
> not something to worry about much, but is worth checking.
> 
> Question 1:
> 
> Solr should be doing nothing other than opening a new searcher, which
> should be roughly the "autowarm" time on master plus (perhaps)
> suggester build. Your observation that autowarming takes quite a bit
> of time (evidenced by much shorter times when you set the counts to
> zero) is a smoking gun that you're probably doing far too much
> autowarming. HOWEVER, during this interval the replica should be
> serving queries from the old searcher so something else is going on
> here. Autowarming is actually pretty simple, perhaps this will help
> you to keep in mind while tuning:
> 
> The queryResultCache and filterCache are essentially maps where the
> key is just the text of the clause (simplifying here). So for the
> queryResultCache the key is the entire search request. For the
> filterCache, the key is just the "fq" clause. autowarm count in each
> just means the number of keys that are replayed when a new searcher is
> opened. I usually start with a pretty small number, on the order of
> 10-20. The purpose of them is just to keep from experiencing a delay
> when the first few searches are performed after a searcher is opened.
> 
> My bet: you won't notice a measurable difference when dropping the
> autowarm counts drastically in terms of query response, but you will
> save the startup time. I also suspect you can reduce the size of the
> caches drastically, but don't know what you have them set to, it's a
> guess.
> 
> As to what's happening such that you serve queries with zero counts,
> my best guess at this point is that you are rebuilding
> autosuggesters. We shouldn't be serving queries from the new
> searcher during this interval, if confirmed we need to raise a JIRA.
> 
> Question 2: see above, autosuggester?
> 
> Question 3a: documents should become searchable on the slave when 1>
> all the segments are copied, 2> autowarm is completed. As above, the
> fact that you get 0-hit responses isn't what _should_ be happening.
> 
> Autocommit settings are pretty irrelevant on the slave.
> 
> Question 3b: soft commit on the master shouldn't affect the slave at all.

Re: Replication on startup takes a long time

2017-09-23 Thread Erick Erickson
First I'd like to say that I wish more people would take the time like
you have to fully describe the problem and your observations, it makes
it so much nicer than having half-a-dozen back and forths! Thanks!

Just so it doesn't get buried in the rest of the response, I do tend
to go on... I suspect you have a suggester configured. The
index-based suggesters read through your _entire_ index, all the
stored fields from all the documents and process them into an FST or
"sidecar" index. See:
https://lucidworks.com/2015/03/04/solr-suggester/. If this is true
they might be being built on the slaves whenever a replication
happens. Hmmm, if this is true, let us know. You can tell by removing
the suggester from the config and timing again. It seems like in the
master/slave config we should copy these down but don't know if it's
been tested.

If they are being built on the slaves, you might try commenting out
all of the buildOn bits on the slave configurations. Frankly I
don't know if building the suggester structures on the master would
propagate them to the slave correctly if the slave doesn't build them,
but it would certainly be a fat clue if it changed the load time on
the slaves and we could look some more at options.
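
(The "buildOn bits" referred to here are the build flags in the suggester definition in solrconfig.xml. A sketch, with a hypothetical suggester name and field:)

    <searchComponent name="suggest" class="solr.SuggestComponent">
      <lst name="suggester">
        <str name="name">mySuggester</str>
        <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">title</str>
        <!-- on the slaves, try setting these to false (or removing them) so a
             full suggester rebuild is not triggered after every replication -->
        <str name="buildOnCommit">false</str>
        <str name="buildOnStartup">false</str>
      </lst>
    </searchComponent>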

Observation 1: Allocating 40G of memory for an index only 12G seems
like overkill. This isn't the root of your problem, but a 12G index
shouldn't need near 40G of JVM. In fact, due to MMapDirectory being
used (see Uwe Schindler's blog here:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html)
I'd guess you can get away with MUCH less memory, maybe as low as 8G
or so. The wildcard here would be the size of your caches, especially
your filterCache configured in solrconfig.xml. Like I mentioned, this
isn't the root of your replication issue, just sayin'.

Observation 2: Hard commits (the <autoCommit> setting) are not very
expensive operations with openSearcher=false. Again this isn't the root
of your problem but consider removing the number of docs limitation
and just making it time-based, say every minute. Long blog on the
topic here: 
https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/.
You might be accumulating pretty large transaction logs (assuming you
haven't disabled them) to no good purpose. Given your observation that
the actual transmission of the index takes 2 minutes, this is probably
not something to worry about much, but is worth checking.

Question 1:

Solr should be doing nothing other than opening a new searcher, which
should be roughly the "autowarm" time on master plus (perhaps)
suggester build. Your observation that autowarming takes quite a bit
of time (evidenced by much shorter times when you set the counts to
zero) is a smoking gun that you're probably doing far too much
autowarming. HOWEVER, during this interval the replica should be
serving queries from the old searcher so something else is going on
here. Autowarming is actually pretty simple, perhaps this will help
you to keep in mind while tuning:

The queryResultCache and filterCache are essentially maps where the
key is just the text of the clause (simplifying here). So for the
queryResultCache the key is the entire search request. For the
filterCache, the key is just the "fq" clause. autowarm count in each
just means the number of keys that are replayed when a new searcher is
opened. I usually start with a pretty small number, on the order of
10-20. The purpose of them is just to keep from experiencing a delay
when the first few searches are performed after a searcher is opened.
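
(In solrconfig.xml terms, the knobs being discussed are the size and autowarmCount attributes on those caches; the values below are only illustrative, not recommendations:)

    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="16"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="16"/>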

My bet: you won't notice a measurable difference when dropping the
autowarm counts drastically in terms of query response, but you will
save the startup time. I also suspect you can reduce the size of the
caches drastically, but don't know what you have them set to, it's a
guess.

As to what's happening such that you serve queries with zero counts,
my best guess at this point is that you are rebuilding
autosuggesters. We shouldn't be serving queries from the new
searcher during this interval, if confirmed we need to raise a JIRA.

Question 2: see above, autosuggester?

Question 3a: documents should become searchable on the slave when 1>
all the segments are copied, 2> autowarm is completed. As above, the
fact that you get 0-hit responses isn't what _should_ be happening.

Autocommit settings are pretty irrelevant on the slave.

Question 3b: soft commit on the master shouldn't affect the slave at all.

The fact that you have 500 fields shouldn't matter that much in this
scenario. Again, the fact that removing your autowarm settings makes
such a difference indicates the counts are excessive, and I have a
secondary assumption that you probably have your cache settings far
higher than you need, but you'll have to test if you try to reduce
them. BTW, I often find the 512 default setting more than ample,
monitor via admin UI>>core>>plugins/stats to see the hit ratio...

As I 

Re: Replication Question

2017-08-04 Thread Shawn Heisey
On 8/2/2017 8:56 AM, Michael B. Klein wrote:
> SCALE DOWN
> 1) Call admin/collections?action=BACKUP for each collection to a
> shared NFS volume
> 2) Shut down all the nodes
>
> SCALE UP
> 1) Spin up 2 Zookeeper nodes and wait for them to stabilize
> 2) Spin up 3 Solr nodes and wait for them to show up under Zookeeper's
> live_nodes
> 3) Call admin/collections?action=RESTORE to put all the collections back
>
> This has been working very well, for the most part, with the following
> complications/observations:
>
> 1) If I don't optimize each collection right before BACKUP, the backup
> fails (see the attached solr_backup_error.json).

Sounds like you're being hit by this at backup time:

https://issues.apache.org/jira/browse/SOLR-9120

There's a patch in the issue which I have not verified and tested.  The
workaround of optimizing the collection is not one I would have thought of.

> 2) If I don't specify a replicationFactor during RESTORE, the admin
> interface's Cloud diagram only shows one active node per collection.
> Is this expected? Am I required to specify the replicationFactor
> unless I'm using a shared HDFS volume for solr data?

The documentation for RESTORE (looking at the 6.6 docs) says that the
restored collection will have the same number of shards and replicas as
the original collection.  Your experience says that either the
documentation is wrong or the version of Solr you're running doesn't
behave that way, and might have a bug.

> 3) If I don't specify maxShardsPerNode=1 during RESTORE, I get a
> warning message in the response, even though the restore seems to succeed.

I would like to see that warning, including whatever stacktrace is
present.  It might be expected, but I'd like to look into it.

> 4) Aside from the replicationFactor parameter on the CREATE/RESTORE, I
> do not currently have any replication stuff configured (as it seems I
> should not).

Correct, you don't need any replication configured.  It's not for cloud
mode.

> 5) At the time my "1-in-3 requests are failing" issue occurred, the
> Cloud diagram looked like the attached solr_admin_cloud_diagram.png.
> It seemed to think all replicas were live and synced and happy, and
> because I was accessing solr through a round-robin load balancer, I
> was never able to tell which node was out of sync.
>
> If it happens again, I'll make node-by-node requests and try to figure
> out what's different about the failing one. But the fact that this
> happened (and the way it happened) is making me wonder if/how I can
> automate this automated staging environment scaling reliably and with
> confidence that it will Just Work™.

That image didn't make it to the mailing list.  Your JSON showing errors
did, though.  Your description of the diagram is good -- sounds like it
was all green and looked exactly how you expected it to look.

What you've described sounds like there may be a problem in the RESTORE
action on the collections API, or possibly a problem with your shared
storage where you put the backups, so the restored data on one replica
isn't faithful to the backup.  I don't know very much about that code,
and what you've described makes me think that this is going to be a hard
one to track down.

Thanks,
Shawn



Re: Replication Question

2017-08-02 Thread Michael B. Klein
And the one that isn't getting the updates is the one marked in the cloud
diagram as the leader.

/me bangs head on desk

On Wed, Aug 2, 2017 at 10:31 AM, Michael B. Klein  wrote:

> Another observation: After bringing the cluster back up just now, the
> "1-in-3 nodes don't get the updates" issue persists, even with the cloud
> diagram showing 3 nodes, all green.
>
> On Wed, Aug 2, 2017 at 9:56 AM, Michael B. Klein 
> wrote:
>
>> Thanks for your responses, Shawn and Erick.
>>
>> Some clarification questions, but first a description of my
>> (non-standard) use case:
>>
>> My Zookeeper/SolrCloud cluster is running on Amazon AWS. Things are
>> working well so far on the production cluster (knock wood); it's the staging
>> cluster that's giving me fits. Here's why: In order to save money, I have
>> the AWS auto-scaler scale the cluster down to zero nodes when it's not in
>> use. Here's the (automated) procedure:
>>
>> SCALE DOWN
>> 1) Call admin/collections?action=BACKUP for each collection to a shared
>> NFS volume
>> 2) Shut down all the nodes
>>
>> SCALE UP
>> 1) Spin up 2 Zookeeper nodes and wait for them to stabilize
>> 2) Spin up 3 Solr nodes and wait for them to show up under Zookeeper's
>> live_nodes
>> 3) Call admin/collections?action=RESTORE to put all the collections back
>>
>> This has been working very well, for the most part, with the following
>> complications/observations:
>>
>> 1) If I don't optimize each collection right before BACKUP, the backup
>> fails (see the attached solr_backup_error.json).
>> 2) If I don't specify a replicationFactor during RESTORE, the admin
>> interface's Cloud diagram only shows one active node per collection. Is
>> this expected? Am I required to specify the replicationFactor unless I'm
>> using a shared HDFS volume for solr data?
>> 3) If I don't specify maxShardsPerNode=1 during RESTORE, I get a warning
>> message in the response, even though the restore seems to succeed.
>> 4) Aside from the replicationFactor parameter on the CREATE/RESTORE, I do
>> not currently have any replication stuff configured (as it seems I should
>> not).
>> 5) At the time my "1-in-3 requests are failing" issue occurred, the Cloud
>> diagram looked like the attached solr_admin_cloud_diagram.png. It seemed to
>> think all replicas were live and synced and happy, and because I was
>> accessing solr through a round-robin load balancer, I was never able to
>> tell which node was out of sync.
>>
>> If it happens again, I'll make node-by-node requests and try to figure
>> out what's different about the failing one. But the fact that this happened
>> (and the way it happened) is making me wonder if/how I can automate this
>> automated staging environment scaling reliably and with confidence that it
>> will Just Work™.
>>
>> Comments and suggestions would be GREATLY appreciated.
>>
>> Michael
>>
>>
>>
>> On Tue, Aug 1, 2017 at 8:14 PM, Erick Erickson 
>> wrote:
>>
>>> And please do not use optimize unless your index is
>>> totally static. I only recommend it when the pattern is
>>> to update the index periodically, like every day or
>>> something and not update any docs in between times.
>>>
>>> Implied in Shawn's e-mail was that you should undo
>>> anything you've done in terms of configuring replication,
>>> just go with the defaults.
>>>
>>> Finally, my bet is that your problematic Solr node is misconfigured.
>>>
>>> Best,
>>> Erick
>>>
>>> On Tue, Aug 1, 2017 at 2:36 PM, Shawn Heisey 
>>> wrote:
>>> > On 8/1/2017 12:09 PM, Michael B. Klein wrote:
>>> >> I have a 3-node solrcloud cluster orchestrated by zookeeper. Most
>>> stuff
>>> >> seems to be working OK, except that one of the nodes never seems to
>>> get its
>>> >> replica updated.
>>> >>
>>> >> Queries take place through a non-caching, round-robin load balancer.
>>> The
>>> >> collection looks fine, with one shard and a replicationFactor of 3.
>>> >> Everything in the cloud diagram is green.
>>> >>
>>> >> But if I (for example) select?q=id:hd76s004z, the results come up
>>> empty 1
>>> >> out of every 3 times.
>>> >>
>>> >> Even several minutes after a commit and optimize, one replica still
>>> isn’t
>>> >> returning the right info.
>>> >>
>>> >> Do I need to configure my `solrconfig.xml` with `replicateAfter`
>>> options on
>>> >> the `/replication` requestHandler, or is that a non-solrcloud,
>>> >> standalone-replication thing?
>>> >
>>> > This is one of the more confusing aspects of SolrCloud.
>>> >
>>> > When everything is working perfectly in a SolrCloud install, the
>>> feature
>>> > in Solr called "replication" is *never* used.  SolrCloud does require
>>> > the replication feature, though ... which is what makes this whole
>>> thing
>>> > very confusing.
>>> >
>>> > Replication is used to replicate an entire Lucene index (consisting of
>>> a
>>> > bunch of files on the disk) from a core on a master server to a core on
>>> > a slave server.  

Re: Replication Question

2017-08-02 Thread Michael B. Klein
Another observation: After bringing the cluster back up just now, the
"1-in-3 nodes don't get the updates" issue persists, even with the cloud
diagram showing 3 nodes, all green.

On Wed, Aug 2, 2017 at 9:56 AM, Michael B. Klein  wrote:

> Thanks for your responses, Shawn and Erick.
>
> Some clarification questions, but first a description of my (non-standard)
> use case:
>
> My Zookeeper/SolrCloud cluster is running on Amazon AWS. Things are
> working well so far on the production cluster (knock wood); it's the staging
> cluster that's giving me fits. Here's why: In order to save money, I have
> the AWS auto-scaler scale the cluster down to zero nodes when it's not in
> use. Here's the (automated) procedure:
>
> SCALE DOWN
> 1) Call admin/collections?action=BACKUP for each collection to a shared
> NFS volume
> 2) Shut down all the nodes
>
> SCALE UP
> 1) Spin up 2 Zookeeper nodes and wait for them to stabilize
> 2) Spin up 3 Solr nodes and wait for them to show up under Zookeeper's
> live_nodes
> 3) Call admin/collections?action=RESTORE to put all the collections back
>
> This has been working very well, for the most part, with the following
> complications/observations:
>
> 1) If I don't optimize each collection right before BACKUP, the backup
> fails (see the attached solr_backup_error.json).
> 2) If I don't specify a replicationFactor during RESTORE, the admin
> interface's Cloud diagram only shows one active node per collection. Is
> this expected? Am I required to specify the replicationFactor unless I'm
> using a shared HDFS volume for solr data?
> 3) If I don't specify maxShardsPerNode=1 during RESTORE, I get a warning
> message in the response, even though the restore seems to succeed.
> 4) Aside from the replicationFactor parameter on the CREATE/RESTORE, I do
> not currently have any replication stuff configured (as it seems I should
> not).
> 5) At the time my "1-in-3 requests are failing" issue occurred, the Cloud
> diagram looked like the attached solr_admin_cloud_diagram.png. It seemed to
> think all replicas were live and synced and happy, and because I was
> accessing solr through a round-robin load balancer, I was never able to
> tell which node was out of sync.
>
> If it happens again, I'll make node-by-node requests and try to figure out
> what's different about the failing one. But the fact that this happened
> (and the way it happened) is making me wonder if/how I can automate this
> automated staging environment scaling reliably and with confidence that it
> will Just Work™.
>
> Comments and suggestions would be GREATLY appreciated.
>
> Michael
>
>
>
> On Tue, Aug 1, 2017 at 8:14 PM, Erick Erickson 
> wrote:
>
>> And please do not use optimize unless your index is
>> totally static. I only recommend it when the pattern is
>> to update the index periodically, like every day or
>> something and not update any docs in between times.
>>
>> Implied in Shawn's e-mail was that you should undo
>> anything you've done in terms of configuring replication,
>> just go with the defaults.
>>
>> Finally, my bet is that your problematic Solr node is misconfigured.
>>
>> Best,
>> Erick
>>
>> On Tue, Aug 1, 2017 at 2:36 PM, Shawn Heisey  wrote:
>> > On 8/1/2017 12:09 PM, Michael B. Klein wrote:
>> >> I have a 3-node solrcloud cluster orchestrated by zookeeper. Most stuff
>> >> seems to be working OK, except that one of the nodes never seems to
>> get its
>> >> replica updated.
>> >>
>> >> Queries take place through a non-caching, round-robin load balancer.
>> The
>> >> collection looks fine, with one shard and a replicationFactor of 3.
>> >> Everything in the cloud diagram is green.
>> >>
>> >> But if I (for example) select?q=id:hd76s004z, the results come up
>> empty 1
>> >> out of every 3 times.
>> >>
>> >> Even several minutes after a commit and optimize, one replica still
>> isn’t
>> >> returning the right info.
>> >>
>> >> Do I need to configure my `solrconfig.xml` with `replicateAfter`
>> options on
>> >> the `/replication` requestHandler, or is that a non-solrcloud,
>> >> standalone-replication thing?
>> >
>> > This is one of the more confusing aspects of SolrCloud.
>> >
>> > When everything is working perfectly in a SolrCloud install, the feature
>> > in Solr called "replication" is *never* used.  SolrCloud does require
>> > the replication feature, though ... which is what makes this whole thing
>> > very confusing.
>> >
>> > Replication is used to replicate an entire Lucene index (consisting of a
>> > bunch of files on the disk) from a core on a master server to a core on
>> > a slave server.  This is how replication was done before SolrCloud was
>> > created.
>> >
>> > The way that SolrCloud keeps replicas in sync is *entirely* different.
>> > SolrCloud has no masters and no slaves.  When you index or delete a
>> > document in a SolrCloud collection, the request is forwarded to the
>> > leader of the correct shard for that document.

Re: Replication Question

2017-08-02 Thread Michael B. Klein
Thanks for your responses, Shawn and Erick.

Some clarification questions, but first a description of my (non-standard)
use case:

My Zookeeper/SolrCloud cluster is running on Amazon AWS. Things are working
well so far on the production cluster (knock wood); it's the staging cluster
that's giving me fits. Here's why: In order to save money, I have the AWS
auto-scaler scale the cluster down to zero nodes when it's not in use.
Here's the (automated) procedure:

SCALE DOWN
1) Call admin/collections?action=BACKUP for each collection to a shared NFS
volume
2) Shut down all the nodes

SCALE UP
1) Spin up 2 Zookeeper nodes and wait for them to stabilize
2) Spin up 3 Solr nodes and wait for them to show up under Zookeeper's
live_nodes
3) Call admin/collections?action=RESTORE to put all the collections back

This has been working very well, for the most part, with the following
complications/observations:

1) If I don't optimize each collection right before BACKUP, the backup
fails (see the attached solr_backup_error.json).
2) If I don't specify a replicationFactor during RESTORE, the admin
interface's Cloud diagram only shows one active node per collection. Is
this expected? Am I required to specify the replicationFactor unless I'm
using a shared HDFS volume for solr data?
3) If I don't specify maxShardsPerNode=1 during RESTORE, I get a warning
message in the response, even though the restore seems to succeed.
4) Aside from the replicationFactor parameter on the CREATE/RESTORE, I do
not currently have any replication stuff configured (as it seems I should
not).
5) At the time my "1-in-3 requests are failing" issue occurred, the Cloud
diagram looked like the attached solr_admin_cloud_diagram.png. It seemed to
think all replicas were live and synced and happy, and because I was
accessing solr through a round-robin load balancer, I was never able to
tell which node was out of sync.

If it happens again, I'll make node-by-node requests and try to figure out
what's different about the failing one. But the fact that this happened
(and the way it happened) is making me wonder if/how I can automate this
automated staging environment scaling reliably and with confidence that it
will Just Work™.

Comments and suggestions would be GREATLY appreciated.

Michael



On Tue, Aug 1, 2017 at 8:14 PM, Erick Erickson 
wrote:

> And please do not use optimize unless your index is
> totally static. I only recommend it when the pattern is
> to update the index periodically, like every day or
> something and not update any docs in between times.
>
> Implied in Shawn's e-mail was that you should undo
> anything you've done in terms of configuring replication,
> just go with the defaults.
>
> Finally, my bet is that your problematic Solr node is misconfigured.
>
> Best,
> Erick
>
> On Tue, Aug 1, 2017 at 2:36 PM, Shawn Heisey  wrote:
> > On 8/1/2017 12:09 PM, Michael B. Klein wrote:
> >> I have a 3-node solrcloud cluster orchestrated by zookeeper. Most stuff
> >> seems to be working OK, except that one of the nodes never seems to get
> its
> >> replica updated.
> >>
> >> Queries take place through a non-caching, round-robin load balancer. The
> >> collection looks fine, with one shard and a replicationFactor of 3.
> >> Everything in the cloud diagram is green.
> >>
> >> But if I (for example) select?q=id:hd76s004z, the results come up empty
> 1
> >> out of every 3 times.
> >>
> >> Even several minutes after a commit and optimize, one replica still
> isn’t
> >> returning the right info.
> >>
> >> Do I need to configure my `solrconfig.xml` with `replicateAfter`
> options on
> >> the `/replication` requestHandler, or is that a non-solrcloud,
> >> standalone-replication thing?
> >
> > This is one of the more confusing aspects of SolrCloud.
> >
> > When everything is working perfectly in a SolrCloud install, the feature
> > in Solr called "replication" is *never* used.  SolrCloud does require
> > the replication feature, though ... which is what makes this whole thing
> > very confusing.
> >
> > Replication is used to replicate an entire Lucene index (consisting of a
> > bunch of files on the disk) from a core on a master server to a core on
> > a slave server.  This is how replication was done before SolrCloud was
> > created.
> >
> > The way that SolrCloud keeps replicas in sync is *entirely* different.
> > SolrCloud has no masters and no slaves.  When you index or delete a
> > document in a SolrCloud collection, the request is forwarded to the
> > leader of the correct shard for that document.  The leader then sends a
> > copy of that request to all the other replicas, and each replica
> > (including the leader) independently handles the updates that are in the
> > request.  Since all replicas index the same content, they stay in sync.
> >
> > What SolrCloud does with the replication feature is index recovery.  In
> > some situations recovery can be done from the leader's transaction log,
> > 

Re: Replication Question

2017-08-01 Thread Erick Erickson
And please do not use optimize unless your index is
totally static. I only recommend it when the pattern is
to update the index periodically, like every day or
something and not update any docs in between times.

Implied in Shawn's e-mail was that you should undo
anything you've done in terms of configuring replication,
just go with the defaults.

Finally, my bet is that your problematic Solr node is misconfigured.

Best,
Erick

On Tue, Aug 1, 2017 at 2:36 PM, Shawn Heisey  wrote:
> On 8/1/2017 12:09 PM, Michael B. Klein wrote:
>> I have a 3-node solrcloud cluster orchestrated by zookeeper. Most stuff
>> seems to be working OK, except that one of the nodes never seems to get its
>> replica updated.
>>
>> Queries take place through a non-caching, round-robin load balancer. The
>> collection looks fine, with one shard and a replicationFactor of 3.
>> Everything in the cloud diagram is green.
>>
>> But if I (for example) select?q=id:hd76s004z, the results come up empty 1
>> out of every 3 times.
>>
>> Even several minutes after a commit and optimize, one replica still isn’t
>> returning the right info.
>>
>> Do I need to configure my `solrconfig.xml` with `replicateAfter` options on
>> the `/replication` requestHandler, or is that a non-solrcloud,
>> standalone-replication thing?
>
> This is one of the more confusing aspects of SolrCloud.
>
> When everything is working perfectly in a SolrCloud install, the feature
> in Solr called "replication" is *never* used.  SolrCloud does require
> the replication feature, though ... which is what makes this whole thing
> very confusing.
>
> Replication is used to replicate an entire Lucene index (consisting of a
> bunch of files on the disk) from a core on a master server to a core on
> a slave server.  This is how replication was done before SolrCloud was
> created.
>
> The way that SolrCloud keeps replicas in sync is *entirely* different.
> SolrCloud has no masters and no slaves.  When you index or delete a
> document in a SolrCloud collection, the request is forwarded to the
> leader of the correct shard for that document.  The leader then sends a
> copy of that request to all the other replicas, and each replica
> (including the leader) independently handles the updates that are in the
> request.  Since all replicas index the same content, they stay in sync.
>
> What SolrCloud does with the replication feature is index recovery.  In
> some situations recovery can be done from the leader's transaction log,
> but when a replica has gotten so far out of sync that the only option
> available is to completely replace the index on the bad replica,
> SolrCloud will fire up the replication feature and create an exact copy
> of the index from the replica that is currently elected as leader.
> SolrCloud temporarily designates the leader core as master and the bad
> replica as slave, then initiates a one-time replication.  This is all
> completely automated and requires no configuration or input from the
> administrator.
>
> The configuration elements you have asked about are for the old
> master-slave replication setup and do not apply to SolrCloud at all.
>
> What I would recommend that you do to solve your immediate issue:  Shut
> down the Solr instance that is having the problem, rename the "data"
> directory in the core that isn't working right to something else, and
> start Solr back up.  As long as you still have at least one good replica
> in the cloud, SolrCloud will see that the index data is gone and copy
> the index from the leader.  You could delete the data directory instead
> of renaming it, but that would leave you with no "undo" option.
>
> Thanks,
> Shawn
>


Re: Replication Question

2017-08-01 Thread Shawn Heisey
On 8/1/2017 12:09 PM, Michael B. Klein wrote:
> I have a 3-node solrcloud cluster orchestrated by zookeeper. Most stuff
> seems to be working OK, except that one of the nodes never seems to get its
> replica updated.
>
> Queries take place through a non-caching, round-robin load balancer. The
> collection looks fine, with one shard and a replicationFactor of 3.
> Everything in the cloud diagram is green.
>
> But if I (for example) select?q=id:hd76s004z, the results come up empty 1
> out of every 3 times.
>
> Even several minutes after a commit and optimize, one replica still isn’t
> returning the right info.
>
> Do I need to configure my `solrconfig.xml` with `replicateAfter` options on
> the `/replication` requestHandler, or is that a non-solrcloud,
> standalone-replication thing?

This is one of the more confusing aspects of SolrCloud.

When everything is working perfectly in a SolrCloud install, the feature
in Solr called "replication" is *never* used.  SolrCloud does require
the replication feature, though ... which is what makes this whole thing
very confusing.

Replication is used to replicate an entire Lucene index (consisting of a
bunch of files on the disk) from a core on a master server to a core on
a slave server.  This is how replication was done before SolrCloud was
created.

The way that SolrCloud keeps replicas in sync is *entirely* different. 
SolrCloud has no masters and no slaves.  When you index or delete a
document in a SolrCloud collection, the request is forwarded to the
leader of the correct shard for that document.  The leader then sends a
copy of that request to all the other replicas, and each replica
(including the leader) independently handles the updates that are in the
request.  Since all replicas index the same content, they stay in sync.

What SolrCloud does with the replication feature is index recovery.  In
some situations recovery can be done from the leader's transaction log,
but when a replica has gotten so far out of sync that the only option
available is to completely replace the index on the bad replica,
SolrCloud will fire up the replication feature and create an exact copy
of the index from the replica that is currently elected as leader. 
SolrCloud temporarily designates the leader core as master and the bad
replica as slave, then initiates a one-time replication.  This is all
completely automated and requires no configuration or input from the
administrator.

The configuration elements you have asked about are for the old
master-slave replication setup and do not apply to SolrCloud at all.

What I would recommend that you do to solve your immediate issue:  Shut
down the Solr instance that is having the problem, rename the "data"
directory in the core that isn't working right to something else, and
start Solr back up.  As long as you still have at least one good replica
in the cloud, SolrCloud will see that the index data is gone and copy
the index from the leader.  You could delete the data directory instead
of renaming it, but that would leave you with no "undo" option.

Thanks,
Shawn



Re: Replication Index fetch failed

2016-10-10 Thread Arkadi Colson

Hi

I could not find "Could not download file" in the logs. Should I 
increase the log level somewhere? Just let me know... so I can provide 
you more detailed logs...


Thx!
Arkadi


On 02-09-16 11:21, Arkadi Colson wrote:

Hi

I cannot find a string in the logs matching "Could not download file...".

This info is logged on the slave:

WARN  - 2016-09-02 09:28:36.923; [c:intradesk s:shard10 r:core_node23 
x:intradesk_shard10_replica1] 
org.apache.solr.handler.IndexFetcher$FileFetcher; Error in fetching 
file: _5qd6_ya.liv (downloaded 0 of 13692 bytes)

java.io.EOFException
at 
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:168)
at 
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:160)
at 
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1460)
at 
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1426)
at 
org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:852)
at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:428)
at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:251)
at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:388)
at 
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:156)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:408)
at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:221)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

And this on the master:

WARN  - 2016-09-02 09:28:36.936; [c:intradesk s:shard10 r:core_node13 
x:intradesk_shard10_replica2] 
org.apache.solr.handler.ReplicationHandler$DirectoryFileStream; 
Exception while writing response for params: 
generation=124148=/replication=_5qd6_ya.liv=true=filestream=filecontent

BPerSec=18.75
java.nio.file.NoSuchFileException: 
/var/solr/data/intradesk_shard10_replica2/data/index.20160816102332501/_5qd6_ya.liv
at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at 
sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)

at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:335)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238)
at 
org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:192)
at 
org.apache.solr.handler.ReplicationHandler$DirectoryFileStream.write(ReplicationHandler.java:1435)

at org.apache.solr.core.SolrCore$3.write(SolrCore.java:2154)
at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:49)
at 
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:731)
at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at 

Re: Replication Index fetch failed

2016-09-02 Thread Arkadi Colson

Hi

I cannot find a string in the logs matching "Could not download file...".

This info is logged on the slave:

WARN  - 2016-09-02 09:28:36.923; [c:intradesk s:shard10 r:core_node23 
x:intradesk_shard10_replica1] 
org.apache.solr.handler.IndexFetcher$FileFetcher; Error in fetching 
file: _5qd6_ya.liv (downloaded 0 of 13692 bytes)

java.io.EOFException
at 
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:168)
at 
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:160)
at 
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1460)
at 
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1426)
at 
org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:852)
at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:428)
at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:251)
at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:388)
at 
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:156)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:408)
at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:221)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

And this on the master:

WARN  - 2016-09-02 09:28:36.936; [c:intradesk s:shard10 r:core_node13 
x:intradesk_shard10_replica2] 
org.apache.solr.handler.ReplicationHandler$DirectoryFileStream; 
Exception while writing response for params: 
generation=124148=/replication=_5qd6_ya.liv=true=filestream=filecontent

BPerSec=18.75
java.nio.file.NoSuchFileException: 
/var/solr/data/intradesk_shard10_replica2/data/index.20160816102332501/_5qd6_ya.liv
at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at 
sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)

at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:335)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238)
at 
org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:192)
at 
org.apache.solr.handler.ReplicationHandler$DirectoryFileStream.write(ReplicationHandler.java:1435)

at org.apache.solr.core.SolrCore$3.write(SolrCore.java:2154)
at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:49)
at 
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:731)

at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)

at org.eclipse.jetty.server.Server.handle(Server.java:518)
at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
at 

Re: Replication Index fetch failed

2016-09-01 Thread Shalin Shekhar Mangar
On Thu, Sep 1, 2016 at 6:05 PM, Arkadi Colson  wrote:

> ERROR - 2016-09-01 14:30:43.653; [c:intradesk s:shard1 r:core_node5
> x:intradesk_shard1_replica1] org.apache.solr.common.SolrException; Index
> fetch failed :org.apache.solr.common.SolrException: Unable to download
> _6f46_cj.liv completely. Downloaded 0!=5596
> at org.apache.solr.handler.IndexFetcher$FileFetcher.cleanup(
> IndexFetcher.java:1554)
> at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(
> IndexFetcher.java:1437)
> at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(Inde
> xFetcher.java:852)
> at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexF
> etcher.java:428)
> at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexF
> etcher.java:251)
>


There should be another exception in the logs that looks like the following:
"Could not download file"...

That one will have a more useful stack trace. Can you please find it and
paste it in an email?

-- 
Regards,
Shalin Shekhar Mangar.


Re: Replication with managed resources?

2016-08-04 Thread rosbaldeston
Raised as https://issues.apache.org/jira/browse/SOLR-9382



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replication-with-managed-resources-tp4289880p4290386.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Replication with managed resources?

2016-08-03 Thread rosbaldeston
I was just running my own test and it seems it doesn't replicate or reload
the managed schema synonyms file. Not on a manual replication request after
a synonym change and not on an index change triggering an automatic
replication at least.

Used this as the slave's confFiles; not sure if this allows globs for the
language variants?

  solrconfig.xml,managed-schema,_schema_analysis_stopwords_english.json,_schema_analysis_synonyms_english.json
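
For reference, the confFiles list normally lives on the master's /replication
handler in solrconfig.xml; a minimal sketch using the same file list
(replicateAfter is shown only as an example):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <!-- replicate after every commit and ship the listed config files along with the index -->
      <str name="replicateAfter">commit</str>
      <str name="confFiles">solrconfig.xml,managed-schema,_schema_analysis_stopwords_english.json,_schema_analysis_synonyms_english.json</str>
    </lst>
  </requestHandler>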

This is with a Solr 5.5, new schemas for both master & slave and all on
Centos 6.5 with Java 7.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replication-with-managed-resources-tp4289880p4290248.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Replication with managed resources?

2016-08-03 Thread Erick Erickson
bq: I'm also guessing those _schema and managed_schema files are an
implementation detail for the missing zookeeper functionality. But if I did
add those to a conffiles option it might automate the slave core reloads for
me?

You're getting closer ;). There's nothing Cloud specific about the whole
managed schema functionality, although that is where it's gotten the most
exercise so far.

So if you're saying that you change the managed schema file on the
master and it is _not_ replicated automatically to the slave (you'll
have to have added docs I believe on the master, I don't think replication
happens just because of config changes) then I think that's worth a
JIRA, can you please confirm?

So if this sequence doesn't work:
1> change the managed schema
2> index some docs
3> wait for a replication
4> the managed schema file on the slave has _not_ been updated

then please raise a JIRA. Make sure you identify that this is stand-alone.
NOTE: I'm not sure what the right thing to do in this case is, but the JIRA
would allow a place to discuss what "the right thing" would be.

In the meantime, you should be able to work around that by explicitly listing
them in the conffiles section.

Best,
Erick

On Wed, Aug 3, 2016 at 8:58 AM, rosbaldeston  wrote:
> Erick Erickson wrote
>> It Depends. When running in Cloud mode then "yes". If you're running
>> stand-alone
>> then there is no Zookeeper running so the answer is "no".
>
> Ah that helps, so no zookeeper in my case. I did wonder if it wasn't just
> sharing the same config files between master and slave from sharing the same
> configset. So it would appear I'm not replicating any of the managed files
> and reloading the slave core probably just reread the shared synonyms file.
>
> I'm also guessing those _schema and managed_schema files are an
> implementation detail for the missing zookeeper functionality. But if I did
> add those to a conffiles option it might automate the slave core reloads for
> me?
>
>
>> If a replication involved downloading of at least one configuration file,
>> the ReplicationHandler issues a core-reload command instead of a commit
>> command.
>
> (from https://cwiki.apache.org/confluence/display/solr/Index+Replication)
>
> Currently I've no conffiles set on the slave and I know it didn't get
> reloaded after synonym changes to the master.
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Replication-with-managed-resources-tp4289880p4290242.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Replication with managed resources?

2016-08-03 Thread rosbaldeston
Erick Erickson wrote
> It Depends. When running in Cloud mode then "yes". If you're running
> stand-alone
> then there is no Zookeeper running so the answer is "no".

Ah that helps, so no zookeeper in my case. I did wonder if it wasn't just
sharing the same config files between master and slave from sharing the same
configset. So it would appear I'm not replicating any of the managed files
and reloading the slave core probably just reread the shared synonyms file.

I'm also guessing those _schema and managed_schema files are an
implementation detail for the missing zookeeper functionality. But if I did
add those to a conffiles option it might automate the slave core reloads for
me? 


> If a replication involved downloading of at least one configuration file,
> the ReplicationHandler issues a core-reload command instead of a commit
> command.

(from https://cwiki.apache.org/confluence/display/solr/Index+Replication)

Currently I've no conffiles set on the slave and I know it didn't get
reloaded after synonym changes to the master.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replication-with-managed-resources-tp4289880p4290242.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Replication with managed resources?

2016-08-03 Thread Erick Erickson
bq: Am I right in saying managed resources are handled by zookeeper rather than
files on the filesystem

It Depends. When running in Cloud mode then "yes". If you're running stand-alone
then there is no Zookeeper running so the answer is "no".

You can run Solr just like you always have in master/slave setups. In
that case you
need to manage your own configurations on every node just like you always have,
probably through replication.

In stand-alone mode, you should send all your managed schema API calls to the
master core and let the replication distribute the changes to the slaves.
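
For example, a managed synonym change would then be sent only to the master
core, along these lines (host, core and resource names are placeholders; the
slaves pick it up once the file has been replicated and the core reloaded):

  # add a synonym mapping on the master core via the managed resources REST API
  curl -X PUT -H 'Content-type:application/json' \
       --data-binary '{"mad":["angry","upset"]}' \
       "http://master-host:8983/solr/mycore/schema/analysis/synonyms/english"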

Best,
Erick

On Wed, Aug 3, 2016 at 4:48 AM, rosbaldeston  wrote:
> Am I right in saying managed resources are handled by zookeeper rather than
> files on the filesystem and I should ignore any files such as:
> managed-schema,   _rest_managed.json,
> _schema_analysis_stopwords_english.json,
> _schema_analysis_synonyms_english.json ...
>
> I should not try to copy any of these via the slaves confFiles option?
>
> What I was planning to do was have the master as the indexing source and all
> slaves as query sources. But they need the same synonyms & stopwords.
>
> One thing I am seeing is when I create my master and slave from a custom
> configset without any copying of configs is when the masters synonyms have
> been changed the synonyms on the slave don't reflect these changes even
> sometime after after replication?
>
> It appears I need to reload the slave core(s) before they show the same
> synonyms as the master? is this because they're sharing the same file? how
> do should I keep slaves in sync with managed resources? do I just have to
> keep reloading all slave cores ever so often?
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Replication-with-managed-resources-tp4289880p4290177.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Replication with managed resources?

2016-08-03 Thread rosbaldeston
Am I right in saying managed resources are handled by zookeeper rather than
files on the filesystem and I should ignore any files such as:   
managed-schema,   _rest_managed.json,
_schema_analysis_stopwords_english.json,
_schema_analysis_synonyms_english.json ...

I should not try to copy any of these via the slaves confFiles option?

What I was planning to do was have the master as the indexing source and all
slaves as query sources. But they need the same synonyms & stopwords.

One thing I am seeing is that when I create my master and slave from a custom
configset without any copying of configs, and the master's synonyms are
changed, the synonyms on the slave don't reflect those changes even
some time after replication.

It appears I need to reload the slave core(s) before they show the same
synonyms as the master? Is this because they're sharing the same file? How
should I keep slaves in sync with managed resources? Do I just have to
keep reloading all slave cores every so often?
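
For reference, a slave core can also be reloaded by hand through the CoreAdmin
API, e.g. (host and core name are placeholders):

  # reload a single core on a slave so it re-reads the replicated managed resources
  curl "http://slave-host:8983/solr/admin/cores?action=RELOAD&core=mycore"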




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replication-with-managed-resources-tp4289880p4290177.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Replication as backup in SolrCloud

2015-11-15 Thread KNitin
We built and open sourced haft precisely for such use cases.
https://github.com/bloomreach/solrcloud-haft

You can clone an entire
cluster or selected collections between clusters. It has only been tested
up to Solr 4.10.

Let me know if you run into issues
Nitin

On Monday, June 22, 2015, Erick Erickson  wrote:

> Currently, one is best off treating these as two separate clusters and
> having your client send the data to both, or reproducing your
> system-of-record and running your DCs completely separately.
>
> Hopefully soon, though, there'll be what you're asking for
> active/passive DCs, see:
> https://issues.apache.org/jira/browse/SOLR-6273
>
> Best,
> Erick
>
> On Mon, Jun 22, 2015 at 10:16 AM, StrW_dev  > wrote:
> > Hi,
> >
> > I have a SolrCloud cluster in one data center, but as backup I want to
> have
> > a second (replicated) cluster in another data center.
> >
> > What I want is to replicate to this second cluster, but I don't want my
> > queries to go to this cluster. Is this possible within SolrCloud? As now
> it
> > seems to replicate, but also distribute the query request to this
> replicated
> > server.
> >
> > Gr
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Replication-as-backup-in-SolrCloud-tp4213267.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Replication as backup in SolrCloud

2015-11-15 Thread Arcadius Ahouansou
Hello Gr.

We are in a similar situation to yours... and we are using
https://issues.apache.org/jira/browse/SOLR-8146

It is a small patch for the SolrJ client that can send all of your queries
to your main DC unless all nodes in the main DC are down.
Writes, updates, deletes and admin operations remain unchanged.

All you need is to come up with a regex describing the SolrCloud nodes in
your main DC and pass the regex to all of your SolrJ clients.

Hope this helps.


Arcadius.


On 15 November 2015 at 14:56, KNitin  wrote:

> We built and open sourced haft precisely for such use cases.
> https://github.com/bloomreach/solrcloud-haft
>
>  You can clone an entire
> cluster or selective collections between clusters. It has only been tested
> upto solr 4.10.
>
> Let me know if you run into issues
> Nitin
>
> On Monday, June 22, 2015, Erick Erickson  wrote:
>
> > Currently, one is best off treating these as two separate clusters and
> > having your client send the data to both, or reproducing your
> > system-of-record and running your DCs completely separately.
> >
> > Hopefully soon, though, there'll be what you're asking for
> > active/passive DCs, see:
> > https://issues.apache.org/jira/browse/SOLR-6273
> >
> > Best,
> > Erick
> >
> > On Mon, Jun 22, 2015 at 10:16 AM, StrW_dev  > > wrote:
> > > Hi,
> > >
> > > I have a SolrCloud cluster in one data center, but as backup I want to
> > have
> > > a second (replicated) cluster in another data center.
> > >
> > > What I want is to replicate to this second cluster, but I don't want my
> > > queries to go to this cluster. Is this possible within SolrCloud? As
> now
> > it
> > > seems to replicate, but also distribute the query request to this
> > replicated
> > > server.
> > >
> > > Gr
> > >
> > >
> > >
> > > --
> > > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Replication-as-backup-in-SolrCloud-tp4213267.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>



-- 
Arcadius Ahouansou
Menelic Ltd | Information is Power
M: 07908761999
W: www.menelic.com
---


Re: Replication and soft commits for NRT searches

2015-10-15 Thread MOIS Martin (MORPHO)
Hello,

the background for my question is that one of the requirements for our 
injection tool is that it should report that a new document has been 
successfully enrolled to the cluster only if it is available on all replicas. 
The automated integration test for this feature will submit a document to the 
cluster and afterwards check if it can be found with an appropriate query (that 
is why I have configured autoSoftCommit/maxDocs=1).

In this context the question appeared, what happens if the update request 
returns rf=1 and I submit a query to a cluster with replication factor of two 
directly after the update (maybe to the replica due to load balancing)? Will 
the automated integration test fail sometimes and sometimes not? Will I have to 
wait artificially between the update and the query and if yes, how long? And 
how can I implement the requirement that our injection tool should report 
successful only if the document has been replicated to all replicas?

Best Regards,
Martin Mois

>bq: If a timeout between shard leader and replica can
>lead to a smaller rf value (because replication has
>timed out), is it possible to increase this timeout in the configuration?
>
>Why do you care? If it timed out, then the follower will
>no longer be active and will not serve queries. The Cloud view
>should show it in "down", "recovery" or the like. Before it
>goes back to the "active" state, it will synchronize from
>the leader automatically without you having to do anything and
>any docs that were indexed to the leader will be faithfully
>reflected on the follower  _before_ the recovering
>follower serves any new queries. So practically it makes no
>difference whether there was an update timeout or not.
>
>This is feeling a lot like an "XY" problem. You're asking detailed
>questions about "X" (in this case timeouts, what rf means and the like)
>without telling us what the problem you're concerned about is ("Y").
>
>So please back up and tell us what your higher level concern is.
>Do you have any evidence of Bad Things Happening?
>
>And do, please, change your commit intervals to not happen after
>doc. That's a Really Bad Practice in Solr.
>
>Best,
>Erick
>
>On Tue, Oct 13, 2015 at 11:58 PM, MOIS Martin (MORPHO)
> wrote:
>> Hello,
>>
>> thank you for the detailed answer.
>>
>> If a timeout between shard leader and replica can lead to a smaller rf value 
>> (because
>replication has timed out), is it possible to increase this timeout in the 
>configuration?
>>
>> Best Regards,
>> Martin Mois
>>
>> Comments inline:
>>
>> On Mon, Oct 12, 2015 at 1:31 PM, MOIS Martin (MORPHO)
>>  wrote:
>>> Hello,
>>>
>>> I am running Solr 5.2.1 in a cluster with 6 nodes. My collections have been 
>>> created
>with
>> replicationFactor=2, i.e. I have one replica for each shard. Beyond that I 
>> am using autoCommit/maxDocs=1
>> and autoSoftCommits/maxDocs=1 in order to achieve near realtime search 
>> behavior.
>>>
>>> As far as I understand from section "Write Side Fault Tolerance" in the 
>>> documentation
>> (https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance),
>I
>> cannot enforce that an update gets replicated to all replicas, but I can 
>> only get the
>achieved
>> replication factor by requesting the return value rf.
>>>
>>> My question is now, what exactly does rf=2 mean? Does it only mean that the 
>>> replica
>has
>> written the update to its transaction log? Or has the replica also performed 
>> the soft
>commit
>> as configured with autoSoftCommits/maxDocs=1? The answer is important for 
>> me, as if the
>update
>> would only get written to the transaction log, I could not search for it 
>> reliable, as
>the
>> replica may not have added it to the searchable index.
>>
>> rf=2 means that the update was successfully replicated to and
>> acknowledged by two replicas (including the leader). The rf only deals
>> with the durability of the update and has no relation to visibility of
>> the update to searchers. The auto(soft)commit settings are applied
>> asynchronously and do not block an update request.
>>
>>>
>>> My second question is, does rf=1 mean that the update was definitely not 
>>> successful
>on
>> the replica or could it also represent a timeout of the replication request 
>> from the
>shard
>> leader? If it could also represent a timeout, then there would be a small 
>> chance that
>the
>> replication was successfully despite of the timeout.
>>
>> Well, rf=1 implies that the update was only applied on the leader's
>> index + tlog and either replicas weren't available or returned an
>> error or the request timed out. So yes, you are right that it can
>> represent a timeout and as such there is a chance that the replication
>> was indeed successful despite of the timeout.
>>
>>>
>>> Is there a way to retrieve the replication factor for a specific document 
>>> after the
>update
>> in order to check if replication was successful in 

Re: Replication and soft commits for NRT searches

2015-10-15 Thread Erick Erickson
bq: the background for my question is that one of the requirements for
our injection tool is that it should report that a new document has
been successfully enrolled to the cluster only if it is available on
all replicas

Frankly, this is the tail wagging the dog. SolrCloud is designed to
guarantee eventual consistency, and you're trying to force it to
satisfy other criteria, it'll be a difficult fit.

My guess is that either you're only in development at this point or
that your query rate is very low, because setting soft commits to 1
second is going to be a problem in production if indexing is happening
consistently and you have to serve a significant query rate.

Really, I recommend you revisit this with requirement and see if it
can be relaxed. For example, you could keep track of the number of
unique IDs you expect to be in Solr and periodically (when not
indexing and after the (longer) soft commit interval has expired)
query each replica directly (distrib=false) or get replica stats and check
that each replica for each shard has the same number of live documents
and that the sum across the shards is what you expect.
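
A per-replica count check along those lines can be a plain non-distributed
query against each core, e.g. (host and core names are placeholders):

  # ask a single replica for its own live document count, without distributing the query
  curl "http://replica-host:8983/solr/collection1_shard1_replica1/select?q=*:*&rows=0&distrib=false&wt=json"
  # compare numFound across the replicas of each shard, then sum the shards for the expected total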

Best,
Erick

On Wed, Oct 14, 2015 at 11:26 PM, MOIS Martin (MORPHO)
 wrote:
> Hello,
>
> the background for my question is that one of the requirements for our 
> injection tool is that it should report that a new document has been 
> successfully enrolled to the cluster only if it is available on all replicas. 
> The automated integration test for this feature will submit a document to the 
> cluster and afterwards check if it can be found with an appropriate query 
> (that is why I have configured autoSoftCommit/maxDocs=1).
>
> In this context the question appeared, what happens if the update request 
> returns rf=1 and I submit a query to a cluster with replication factor of two 
> directly after the update (maybe to the replica due to load balancing)? Will 
> the automated integration test fail sometimes and sometimes not? Will I have 
> to wait artificially between the update and the query and if yes, how long? 
> And how can I implement the requirement that our injection tool should report 
> successful only if the document has been replicated to all replicas?
>
> Best Regards,
> Martin Mois
>
>>bq: If a timeout between shard leader and replica can
>>lead to a smaller rf value (because replication has
>>timed out), is it possible to increase this timeout in the configuration?
>>
>>Why do you care? If it timed out, then the follower will
>>no longer be active and will not serve queries. The Cloud view
>>should show it in "down", "recovery" or the like. Before it
>>goes back to the "active" state, it will synchronize from
>>the leader automatically without you having to do anything and
>>any docs that were indexed to the leader will be faithfully
>>reflected on the follower  _before_ the recovering
>>follower serves any new queries. So practically it makes no
>>difference whether there was an update timeout or not.
>>
>>This is feeling a lot like an "XY" problem. You're asking detailed
>>questions about "X" (in this case timeouts, what rf means and the like)
>>without telling us what the problem you're concerned about is ("Y").
>>
>>So please back up and tell us what your higher level concern is.
>>Do you have any evidence of Bad Things Happening?
>>
>>And do, please, change your commit intervals to not happen after
>>doc. That's a Really Bad Practice in Solr.
>>
>>Best,
>>Erick
>>
>>On Tue, Oct 13, 2015 at 11:58 PM, MOIS Martin (MORPHO)
>> wrote:
>>> Hello,
>>>
>>> thank you for the detailed answer.
>>>
>>> If a timeout between shard leader and replica can lead to a smaller rf 
>>> value (because
>>replication has timed out), is it possible to increase this timeout in the 
>>configuration?
>>>
>>> Best Regards,
>>> Martin Mois
>>>
>>> Comments inline:
>>>
>>> On Mon, Oct 12, 2015 at 1:31 PM, MOIS Martin (MORPHO)
>>>  wrote:
 Hello,

 I am running Solr 5.2.1 in a cluster with 6 nodes. My collections have 
 been created
>>with
>>> replicationFactor=2, i.e. I have one replica for each shard. Beyond that I 
>>> am using autoCommit/maxDocs=1
>>> and autoSoftCommits/maxDocs=1 in order to achieve near realtime search 
>>> behavior.

 As far as I understand from section "Write Side Fault Tolerance" in the 
 documentation
>>> (https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance),
>>I
>>> cannot enforce that an update gets replicated to all replicas, but I can 
>>> only get the
>>achieved
>>> replication factor by requesting the return value rf.

 My question is now, what exactly does rf=2 mean? Does it only mean that 
 the replica
>>has
>>> written the update to its transaction log? Or has the replica also 
>>> performed the soft
>>commit
>>> as configured with autoSoftCommits/maxDocs=1? The answer is important for 
>>> me, as if the
>>update
>>> would only 

Re: Replication and soft commits for NRT searches

2015-10-14 Thread Erick Erickson
bq: If a timeout between shard leader and replica can
lead to a smaller rf value (because replication has
timed out), is it possible to increase this timeout in the configuration?

Why do you care? If it timed out, then the follower will
no longer be active and will not serve queries. The Cloud view
should show it in "down", "recovery" or the like. Before it
goes back to the "active" state, it will synchronize from
the leader automatically without you having to do anything and
any docs that were indexed to the leader will be faithfully
reflected on the follower  _before_ the recovering
follower serves any new queries. So practically it makes no
difference whether there was an update timeout or not.

This is feeling a lot like an "XY" problem. You're asking detailed
questions about "X" (in this case timeouts, what rf means and the like)
without telling us what the problem you're concerned about is ("Y").

So please back up and tell us what your higher level concern is.
Do you have any evidence of Bad Things Happening?

And do, please, change your commit intervals to not happen after every
doc. That's a Really Bad Practice in Solr.

Best,
Erick

On Tue, Oct 13, 2015 at 11:58 PM, MOIS Martin (MORPHO)
 wrote:
> Hello,
>
> thank you for the detailed answer.
>
> If a timeout between shard leader and replica can lead to a smaller rf value 
> (because replication has timed out), is it possible to increase this timeout 
> in the configuration?
>
> Best Regards,
> Martin Mois
>
> Comments inline:
>
> On Mon, Oct 12, 2015 at 1:31 PM, MOIS Martin (MORPHO)
>  wrote:
>> Hello,
>>
>> I am running Solr 5.2.1 in a cluster with 6 nodes. My collections have been 
>> created with
> replicationFactor=2, i.e. I have one replica for each shard. Beyond that I am 
> using autoCommit/maxDocs=1
> and autoSoftCommits/maxDocs=1 in order to achieve near realtime search 
> behavior.
>>
>> As far as I understand from section "Write Side Fault Tolerance" in the 
>> documentation
> (https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance),
>  I
> cannot enforce that an update gets replicated to all replicas, but I can only 
> get the achieved
> replication factor by requesting the return value rf.
>>
>> My question is now, what exactly does rf=2 mean? Does it only mean that the 
>> replica has
> written the update to its transaction log? Or has the replica also performed 
> the soft commit
> as configured with autoSoftCommits/maxDocs=1? The answer is important for me, 
> as if the update
> would only get written to the transaction log, I could not search for it 
> reliable, as the
> replica may not have added it to the searchable index.
>
> rf=2 means that the update was successfully replicated to and
> acknowledged by two replicas (including the leader). The rf only deals
> with the durability of the update and has no relation to visibility of
> the update to searchers. The auto(soft)commit settings are applied
> asynchronously and do not block an update request.
>
>>
>> My second question is, does rf=1 mean that the update was definitely not 
>> successful on
> the replica or could it also represent a timeout of the replication request 
> from the shard
> leader? If it could also represent a timeout, then there would be a small 
> chance that the
> replication was successfully despite of the timeout.
>
> Well, rf=1 implies that the update was only applied on the leader's
> index + tlog and either replicas weren't available or returned an
> error or the request timed out. So yes, you are right that it can
> represent a timeout and as such there is a chance that the replication
> was indeed successful despite of the timeout.
>
>>
>> Is there a way to retrieve the replication factor for a specific document 
>> after the update
> in order to check if replication was successful in the meantime?
>>
>
> No, there is no way to do that.
>
>> Thanks in advance.
>>
>> Best Regards,
>> Martin Mois
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

Re: Replication and soft commits for NRT searches

2015-10-14 Thread MOIS Martin (MORPHO)
Hello,

thank you for the detailed answer.

If a timeout between shard leader and replica can lead to a smaller rf value 
(because replication has timed out), is it possible to increase this timeout in 
the configuration?

Best Regards,
Martin Mois

Comments inline:

On Mon, Oct 12, 2015 at 1:31 PM, MOIS Martin (MORPHO)
 wrote:
> Hello,
>
> I am running Solr 5.2.1 in a cluster with 6 nodes. My collections have been 
> created with
replicationFactor=2, i.e. I have one replica for each shard. Beyond that I am 
using autoCommit/maxDocs=1
and autoSoftCommits/maxDocs=1 in order to achieve near realtime search behavior.
>
> As far as I understand from section "Write Side Fault Tolerance" in the 
> documentation
(https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance),
 I
cannot enforce that an update gets replicated to all replicas, but I can only 
get the achieved
replication factor by requesting the return value rf.
>
> My question is now, what exactly does rf=2 mean? Does it only mean that the 
> replica has
written the update to its transaction log? Or has the replica also performed 
the soft commit
as configured with autoSoftCommits/maxDocs=1? The answer is important for me, 
as if the update
would only get written to the transaction log, I could not search for it 
reliable, as the
replica may not have added it to the searchable index.

rf=2 means that the update was successfully replicated to and
acknowledged by two replicas (including the leader). The rf only deals
with the durability of the update and has no relation to visibility of
the update to searchers. The auto(soft)commit settings are applied
asynchronously and do not block an update request.

>
> My second question is, does rf=1 mean that the update was definitely not 
> successful on
the replica or could it also represent a timeout of the replication request 
from the shard
leader? If it could also represent a timeout, then there would be a small 
chance that the
replication was successfully despite of the timeout.

Well, rf=1 implies that the update was only applied on the leader's
index + tlog and either replicas weren't available or returned an
error or the request timed out. So yes, you are right that it can
represent a timeout and as such there is a chance that the replication
was indeed successful despite of the timeout.

>
> Is there a way to retrieve the replication factor for a specific document 
> after the update
in order to check if replication was successful in the meantime?
>

No, there is no way to do that.

> Thanks in advance.
>
> Best Regards,
> Martin Mois



--
Regards,
Shalin Shekhar Mangar.



Re: Replication and soft commits for NRT searches

2015-10-13 Thread Shalin Shekhar Mangar
Comments inline:

On Mon, Oct 12, 2015 at 1:31 PM, MOIS Martin (MORPHO)
 wrote:
> Hello,
>
> I am running Solr 5.2.1 in a cluster with 6 nodes. My collections have been 
> created with replicationFactor=2, i.e. I have one replica for each shard. 
> Beyond that I am using autoCommit/maxDocs=1 and autoSoftCommits/maxDocs=1 
> in order to achieve near realtime search behavior.
>
> As far as I understand from section "Write Side Fault Tolerance" in the 
> documentation 
> (https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance),
>  I cannot enforce that an update gets replicated to all replicas, but I can 
> only get the achieved replication factor by requesting the return value rf.
>
> My question is now, what exactly does rf=2 mean? Does it only mean that the 
> replica has written the update to its transaction log? Or has the replica 
> also performed the soft commit as configured with autoSoftCommits/maxDocs=1? 
> The answer is important for me, as if the update would only get written to 
> the transaction log, I could not search for it reliable, as the replica may 
> not have added it to the searchable index.

rf=2 means that the update was successfully replicated to and
acknowledged by two replicas (including the leader). The rf only deals
with the durability of the update and has no relation to visibility of
the update to searchers. The auto(soft)commit settings are applied
asynchronously and do not block an update request.
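
For illustration, the achieved rf comes back when the update request asks for
it with the min_rf parameter, roughly like this (host and collection name are
placeholders):

  # ask for a minimum replication factor of 2 and read the achieved "rf" from the response
  curl "http://host:8983/solr/collection1/update?min_rf=2&wt=json" \
       -H 'Content-type:application/json' \
       --data-binary '[{"id":"doc-1"}]'
  # the response reports the achieved replication factor, e.g. "rf":2 when the
  # leader and one replica both acknowledged the update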

>
> My second question is, does rf=1 mean that the update was definitely not 
> successful on the replica or could it also represent a timeout of the 
> replication request from the shard leader? If it could also represent a 
> timeout, then there would be a small chance that the replication was 
> successfully despite of the timeout.

Well, rf=1 implies that the update was only applied on the leader's
index + tlog and either replicas weren't available or returned an
error or the request timed out. So yes, you are right that it can
represent a timeout and as such there is a chance that the replication
was indeed successful despite of the timeout.

>
> Is there a way to retrieve the replication factor for a specific document 
> after the update in order to check if replication was successful in the 
> meantime?
>

No, there is no way to do that.

> Thanks in advance.
>
> Best Regards,
> Martin Mois



-- 
Regards,
Shalin Shekhar Mangar.


Re: Replication and soft commits for NRT searches

2015-10-12 Thread Erick Erickson
First of all, setting soft commit with maxDocs=1 is almost (but not
quite) guaranteed to lead to problems. For _every_ document you add to
Solr, all your top-level caches (i.e. the ones configured in
solrconfig.xml) will be thrown away, all autowarming will be performed
etc. Essentially assuming a constant indexing load none of your
top-level caches are doing you any good.

This might help:
https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
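
For comparison, a time-based commit policy in solrconfig.xml, along the lines
that article recommends, might look roughly like this (the intervals are
illustrative only):

  <autoCommit>
    <!-- hard commit: flush segments to disk regularly, but don't open a new searcher -->
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <!-- soft commit: make new documents visible at most every 5 seconds -->
    <maxTime>5000</maxTime>
  </autoSoftCommit>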

By the time an indexing request returns, the document(s) have all been
forwarded to all replicas and indexed to the in-memory structures
_and_ written to the tlog. The next expiration of the soft commit
interval will allow them to be searched, assuming that autowarming is
completed.

I'm going to guess that you'll see a bunch of warnings like
"overlapping ondeck searchers" and you'll be tempted to set
maxWarmingSearchers to some number greater than 2 in solrconfig.xml. I
recommend against this too, that setting is there for a reason.

Do you have any evidence of a problem or is this theoretical?

All that said, I would _strongly_ urge you to revisit the requirement
of having your soft commit maxDocs set to 1.

Best,
Erick

On Mon, Oct 12, 2015 at 1:01 AM, MOIS Martin (MORPHO)
 wrote:
> Hello,
>
> I am running Solr 5.2.1 in a cluster with 6 nodes. My collections have been 
> created with replicationFactor=2, i.e. I have one replica for each shard. 
> Beyond that I am using autoCommit/maxDocs=1 and autoSoftCommits/maxDocs=1 
> in order to achieve near realtime search behavior.
>
> As far as I understand from section "Write Side Fault Tolerance" in the 
> documentation 
> (https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance),
>  I cannot enforce that an update gets replicated to all replicas, but I can 
> only get the achieved replication factor by requesting the return value rf.
>
> My question is now, what exactly does rf=2 mean? Does it only mean that the 
> replica has written the update to its transaction log? Or has the replica 
> also performed the soft commit as configured with autoSoftCommits/maxDocs=1? 
> The answer is important for me, as if the update would only get written to 
> the transaction log, I could not search for it reliable, as the replica may 
> not have added it to the searchable index.
>
> My second question is, does rf=1 mean that the update was definitely not 
> successful on the replica or could it also represent a timeout of the 
> replication request from the shard leader? If it could also represent a 
> timeout, then there would be a small chance that the replication was 
> successfully despite of the timeout.
>
> Is there a way to retrieve the replication factor for a specific document 
> after the update in order to check if replication was successful in the 
> meantime?
>
> Thanks in advance.
>
> Best Regards,
> Martin Mois


Re: Replication Sync OR Async?

2015-09-09 Thread Shawn Heisey
On 9/8/2015 11:16 PM, Maulin Rathod wrote:
> When replicas are running it took around 900 seconds for indexing.
> After stopping replicas it took around 500 seconds for indexing.
> 
> Is the replication happens in Sync or Async?  If it is Sync, can we make it
> Async so that it will not affect indexing performance.

Running things in that way results in problems like SOLR-3284, which was
an issue long before SolrCloud arrived on the scene.  If you throw
processing to the background, it becomes very difficult to handle errors.

https://issues.apache.org/jira/browse/SOLR-3284

If you really don't care about knowing whether your indexing succeeded,
then something like this would be helpful.  Most users *do* want to know
whether their indexing succeeded.  Feel free to open an enhancement
issue in Jira, but please be aware of the disadvantages of what you're
asking for.

Handling errors correctly is not impossible with async operation, but it
is very challenging, prone to bugs, and may end up erasing any speed
advantage.

Thanks,
Shawn



RE: Replication Sync OR Async?

2015-09-09 Thread Maulin Rathod
Hi Shawn,

Thanks for the reply. If we keep replication async, can error handling not work
the same way as in the replica-down scenario?

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: 09 September 2015 19:40
To: solr-user@lucene.apache.org
Subject: Re: Replication Sync OR Async?

On 9/8/2015 11:16 PM, Maulin Rathod wrote:
> When replicas are running it took around 900 seconds for indexing.
> After stopping replicas it took around 500 seconds for indexing.
> 
> Is the replication happens in Sync or Async?  If it is Sync, can we 
> make it Async so that it will not affect indexing performance.

Running things in that way results in problems like SOLR-3284, which was an 
issue long before SolrCloud arrived on the scene.  If you throw processing to 
the background, it becomes very difficult to handle errors.

https://issues.apache.org/jira/browse/SOLR-3284

If you really don't care about knowing whether your indexing succeeded, then 
something like this would be helpful.  Most users *do* want to know whether 
their indexing succeeded.  Feel free to open an enhancement issue in Jira, but 
please be aware of the disadvantages of what you're asking for.

Handling errors correctly is not impossible with async operation, but it is 
very challenging, prone to bugs, and may end up erasing any speed advantage.

Thanks,
Shawn



Re: replication and HDFS

2015-08-31 Thread Joseph Obernberger
Thank you Erick.  What about cache size?  If we add replicas to our 
cluster and each replica has nGBytes of RAM allocated for HDFS caching, 
would that help performance?  Specifically the performance we want to 
increase is time to facet data, time to cluster data and search time.  
While we index a lot of data (~4 million docs per day), we do not 
perform that many searches of the data (~250 searches per day).
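
For context, that per-node cache is the HDFS block cache configured on the
HdfsDirectoryFactory in solrconfig.xml; a rough sketch (the path and sizes are
placeholders only):

  <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
    <!-- off-heap block cache that keeps hot index blocks out of HDFS round trips -->
    <bool name="solr.hdfs.blockcache.enabled">true</bool>
    <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
    <int name="solr.hdfs.blockcache.slab.count">4</int>
    <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  </directoryFactory>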


-Joe

On 8/20/2015 4:21 PM, Erick Erickson wrote:

Yes. Maybe. It Depends (tm).

Details matter (tm).

If you're firing just a few QPS at the system, then improved
throughput by adding replicas is unlikely. OTOH, if you're firing lots
of simultaneous queries at Solr and are pegging the processors, then
adding replication will increase aggregate QPS.

If your soft commit interval is very short and you're not doing proper
warming, it won't help at all in all probability.

Replication in Solr is about increasing the number of instances
available to serve queries. The two types of replication (HDFS or
Solr) are really orthogonal, the first is about data integrity and the
second is about increasing the number of Solr nodes available to
service queries.

Best,
Erick

On Thu, Aug 20, 2015 at 9:23 AM, Joseph Obernberger
 wrote:

Hi - we currently have a multi-shard setup running solr cloud without
replication running on top of HDFS.  Does it make sense to use replication
when using HDFS?  Will we expect to see a performance increase in searches?
Thank you!

-Joe




Re: replication and HDFS

2015-08-31 Thread Erick Erickson
Yes, No, Maybe.

bq: Specifically the performance we want to increase is time to facet
data, time to cluster data and search time

Well, that about covers everything ;)

You cannot talk about this without also taking about cache warming. Given your
setup, I'm guessing you have very few searches on the same Solr
searcher. Every time
you commit (hard with openSearcher=true or soft), you get a new searcher and
your top-level caches are  thrown away. The next request in will not
have any benefit
from the caches unless you've also done autowarming, look at the
counts for filterCache,
queryResultsCache and the newSearch and firstSearcher events.

So talking about significantly increasing cache size is premature until you know
you _use_ the caches.

And don't go wild with the autowarm counts for your caches, start
quite low in the
20-30 range IMO.

You'll particularly want to make newSearcher searches that exercise
your faceting and
reference all the fields you care about at least once.
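
A rough sketch of what that looks like in solrconfig.xml (cache sizes, autowarm
counts and the facet field are illustrative only):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="32"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <!-- warm each new searcher with a facet query that touches the fields you care about -->
      <lst>
        <str name="q">*:*</str>
        <str name="rows">0</str>
        <str name="facet">true</str>
        <str name="facet.field">category</str>
      </lst>
    </arr>
  </listener>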


Best,
Erick

On Mon, Aug 31, 2015 at 12:41 PM, Joseph Obernberger
 wrote:
> Thank you Erick.  What about cache size?  If we add replicas to our cluster
> and each replica has nGBytes of RAM allocated for HDFS caching, would that
> help performance?  Specifically the performance we want to increase is time
> to facet data, time to cluster data and search time.  While we index a lot
> of data (~4 million docs per day), we do not perform that many searches of
> the data (~250 searches per day).
>
> -Joe
>
> On 8/20/2015 4:21 PM, Erick Erickson wrote:
>>
>> Yes. Maybe. It Depends (tm).
>>
>> Details matter (tm).
>>
>> If you're firing just a few QPS at the system, then improved
>> throughput by adding replicas is unlikely. OTOH, if you're firing lots
>> of simultaneous queries at Solr and are pegging the processors, then
>> adding replication will increase aggregate QPS.
>>
>> If your soft commit interval is very short and you're not doing proper
>> warming, it won't help at all in all probability.
>>
>> Replication in Solr is about increasing the number of instances
>> available to serve queries. The two types of replication (HDFS or
>> Solr) are really orthogonal, the first is about data integrity and the
>> second is about increasing the number of Solr nodes available to
>> service queries.
>>
>> Best,
>> Erick
>>
>> On Thu, Aug 20, 2015 at 9:23 AM, Joseph Obernberger
>>  wrote:
>>>
>>> Hi - we currently have a multi-shard setup running solr cloud without
>>> replication running on top of HDFS.  Does it make sense to use
>>> replication
>>> when using HDFS?  Will we expect to see a performance increase in
>>> searches?
>>> Thank you!
>>>
>>> -Joe
>
>


Re: replication and HDFS

2015-08-20 Thread Erick Erickson
Yes. Maybe. It Depends (tm).

Details matter (tm).

If you're firing just a few QPS at the system, then improved
throughput by adding replicas is unlikely. OTOH, if you're firing lots
of simultaneous queries at Solr and are pegging the processors, then
adding replication will increase aggregate QPS.

If your soft commit interval is very short and you're not doing proper
warming, it won't help at all in all probability.

Replication in Solr is about increasing the number of instances
available to serve queries. The two types of replication (HDFS or
Solr) are really orthogonal, the first is about data integrity and the
second is about increasing the number of Solr nodes available to
service queries.

Best,
Erick

On Thu, Aug 20, 2015 at 9:23 AM, Joseph Obernberger
j...@lovehorsepower.com wrote:
 Hi - we currently have a multi-shard setup running solr cloud without
 replication running on top of HDFS.  Does it make sense to use replication
 when using HDFS?  Will we expect to see a performance increase in searches?
 Thank you!

 -Joe


Re: Replication as backup in SolrCloud

2015-06-22 Thread Erick Erickson
Currently, one is best off treating these as two separate clusters and
having your client send the data to both, or reproducing your
system-of-record and running your DCs completely separately.

Hopefully soon, though, there'll be what you're asking for
active/passive DCs, see:
https://issues.apache.org/jira/browse/SOLR-6273

Best,
Erick

On Mon, Jun 22, 2015 at 10:16 AM, StrW_dev r.j.bamb...@structweb.nl wrote:
 Hi,

 I have a SolrCloud cluster in one data center, but as backup I want to have
 a second (replicated) cluster in another data center.

 What I want is to replicate to this second cluster, but I don't want my
 queries to go to this cluster. Is this possible within SolrCloud? As now it
 seems to replicate, but also distribute the query request to this replicated
 server.

 Gr



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Replication-as-backup-in-SolrCloud-tp4213267.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Replication for SolrCloud

2015-04-19 Thread gengmao
Thanks for the suggestion, Erick. However, what we need here is not a patch
but a clarification from a practical perspective.

I think Solr replication is a great feature for scaling reads, and it somewhat
increases reliability. However, on HDFS it is not as useful as just
sharding. Sharding can scale both reads and writes at the same time, and
doesn't have the consistency concerns that come with replication. So I doubt
Solr replication on HDFS has real benefits.

I will try to reach out to Mark Miller and would appreciate it if he or anyone
else can provide more convincing points on this.

Thanks,
Mao

On Sat, Apr 18, 2015 at 4:44 PM Erick Erickson erickerick...@gmail.com
wrote:

 AFAIK, the HDFS replication of Solr indexes isn't something that was
 designed, it just came along for the ride given HDFS replication.
 Having a shard with 1 leader and two followers have 9 copies of the
 index around _is_ overkill, nobody argues that at all.

 I know the folks at Cloudera (who contributed the original HDFS
 implementation) have discussed various options around this. In the
 grand scheme of things, there have been other priorities without
 tearing into the guts of Solr and/or HDFS since disk space is
 relatively cheap.

 That said, I'm also sure that this will get some attention as
 priorities change. All patches welcome of course ;), But if you're
 inclined to work on this issue, I'd _really_ discuss it with Mark
 Miller  etc. before investing too much effort in it. I don't quite
 know the tradeoffs well enough to have an opinion on the right
 implementation.

 Best
 Erick

 On Sat, Apr 18, 2015 at 1:59 AM, Shalin Shekhar Mangar
 shalinman...@gmail.com wrote:
  Some comments inline:
 
  On Sat, Apr 18, 2015 at 2:12 PM, gengmao geng...@gmail.com wrote:
 
  On Sat, Apr 18, 2015 at 12:20 AM Jürgen Wagner (DVT) 
  juergen.wag...@devoteam.com wrote:
 
Replication on the storage layer will provide a reliable storage for
 the
   index and other data of Solr. In particular, this replication does not
   guarantee your index files are consistent at any time as there may be
   intermediate states that are only partially replicated. Replication is
  only
   a convergent process, not an instant, atomic operation. With frequent
   changes, this becomes an issue.
  
  Firstly thanks for your reply. However I can't agree with you on this.
  HDFS guarantees the consistency even with replicates - you always read
 what
  you write, no partially replicated state will be read, which is
 guaranteed
  by HDFS server and client. Hence HBase can rely on HDFS for consistency
 and
  availability, without implementing another replication mechanism - if I
  understand correctly.
 
 
  Lucene index is not one file but a collection of files which are written
  independently. So if you replicate them out of order, Lucene might
 consider
  the index as corrupted (because of missing files). I don't think HBase
  works in that way.
 
 
 
   Replication inside SolrCloud as an application will not only maintain
 the
   consistency of the search-level interfaces to your indexes, but also
  scale
   in the sense of the application (query throughput).
  
   Split one shard into two shards can increase the query throughput too.
 
 
   Imagine a database: if you change one record, this may also result in
 an
   index change. If the record and the index are stored in different
 storage
   blocks, one will get replicated first. However, the replication target
  will
   only be consistent again when both have been replicated. So, you would
  have
   to suspend all accesses until the entire replication has completed.
  That's
   undesirable. If you replicate on the application (database management
   system) level, the application will employ a more fine-grained
 approach
  to
   replication, guaranteeing application consistency.
  
  In HBase, a region only locates on single region server at any time,
 which
  guarantee its consistency. Because your read/write always drops in one
  region, you won't have concern of parallel writes happens on multiple
  replicates of same region.
  The replication of HDFS is totally transparent to HBase. When a HDFS
 write
  call returns, HBase know the data is written and replicated so losing
 one
  copy of the data won't impact HBase at all.
  So HDFS means consistency and reliability for HBase. However, HBase
 doesn't
  use replicates (either HBase itself or HDFS's) to scale reads. If one
  region's is too hot for reads or write, you split that region into two
  regions, so that the reads and writes of that region can be distributed
  into two region servers. Hence HBase scales.
  I think this is the simplicity and beauty of HBase. Again, I am curious
 if
  SolrCloud has better reason to use replication on HDFS? As I described,
  HDFS provided consistency and reliability, meanwhile scalability can be
  achieved via sharding, even without Solr replication.
 
 
  That's something that has been considered and may even be in the roadmap
  for the 

Re: Replication for SolrCloud

2015-04-19 Thread juergen.wag...@devoteam.com
In simple words:

HDFS is good for file-oriented replication. Solr is good for index replication.

Consequently, if atomic file update operations of an application (like Solr) 
are not atomic on a file level, HDFS is not adequate - like for Solr with live 
index updates. Running Solr on HDFS (as a file system) will pose limitations 
due to HDFS properties. Indexing, however, still won't use Hadoop.

If you produce indexes and distribute them as finalized, read-only structures 
(e.g., through Hadoop jobs), HDFS is fine. Solr does not need to be much aware 
of HDFS.

The third one in the picture is records-based replication to be handled by 
Hbase, Cassandra or Zookeeper, depending on requirements.

Cheers,
Jürgen

Re: Replication for SolrCloud

2015-04-19 Thread gengmao
Please see my response in line:

On Fri, Apr 17, 2015 at 10:59 PM Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 Some comments inline:

 On Sat, Apr 18, 2015 at 2:12 PM, gengmao geng...@gmail.com wrote:

  On Sat, Apr 18, 2015 at 12:20 AM Jürgen Wagner (DVT) 
  juergen.wag...@devoteam.com wrote:
 
Replication on the storage layer will provide a reliable storage for
 the
   index and other data of Solr. In particular, this replication does not
   guarantee your index files are consistent at any time as there may be
   intermediate states that are only partially replicated. Replication is
  only
   a convergent process, not an instant, atomic operation. With frequent
   changes, this becomes an issue.
  
  Firstly thanks for your reply. However I can't agree with you on this.
  HDFS guarantees the consistency even with replicates - you always read
 what
  you write, no partially replicated state will be read, which is
 guaranteed
  by HDFS server and client. Hence HBase can rely on HDFS for consistency
 and
  availability, without implementing another replication mechanism - if I
  understand correctly.
 
 
 Lucene index is not one file but a collection of files which are written
 independently. So if you replicate them out of order, Lucene might consider
 the index as corrupted (because of missing files). I don't think HBase
 works in that way.

Again, HDFS replication is transparent to HBase. You can set the HDFS
replication factor to 1 and HBase will still work, but it will lose the
fault tolerance to disk failure that HDFS replicas provide.
Also, HBase doesn't directly utilize HDFS replicas; increasing the HDFS
replication factor won't improve HBase's scalability. To achieve better
read/write throughput, splitting shards is the only approach.



 
   Replication inside SolrCloud as an application will not only maintain
 the
   consistency of the search-level interfaces to your indexes, but also
  scale
   in the sense of the application (query throughput).
  
   Split one shard into two shards can increase the query throughput too.
 
 
   Imagine a database: if you change one record, this may also result in
 an
   index change. If the record and the index are stored in different
 storage
   blocks, one will get replicated first. However, the replication target
  will
   only be consistent again when both have been replicated. So, you would
  have
   to suspend all accesses until the entire replication has completed.
  That's
   undesirable. If you replicate on the application (database management
   system) level, the application will employ a more fine-grained approach
  to
   replication, guaranteeing application consistency.
  
  In HBase, a region only locates on single region server at any time,
 which
  guarantee its consistency. Because your read/write always drops in one
  region, you won't have concern of parallel writes happens on multiple
  replicates of same region.
  The replication of HDFS is totally transparent to HBase. When a HDFS
 write
  call returns, HBase know the data is written and replicated so losing one
  copy of the data won't impact HBase at all.
  So HDFS means consistency and reliability for HBase. However, HBase
 doesn't
  use replicates (either HBase itself or HDFS's) to scale reads. If one
  region's is too hot for reads or write, you split that region into two
  regions, so that the reads and writes of that region can be distributed
  into two region servers. Hence HBase scales.
  I think this is the simplicity and beauty of HBase. Again, I am curious
 if
  SolrCloud has better reason to use replication on HDFS? As I described,
  HDFS provided consistency and reliability, meanwhile scalability can be
  achieved via sharding, even without Solr replication.
 
 
 That's something that has been considered and may even be in the roadmap
 for the Cloudera guys. See https://issues.apache.org/jira/browse/SOLR-6237

 But one problem that isn't solved by HDFS replication is of near-real-time
 indexing where you want the documents to be available for searchers as fast
 as possible. SolrCloud replication supports that by replicating documents
 as they come in and indexing them in several replicas. A new index searcher
 is opened on the flushed index files as well as on the internal data
 structures of the index writer. If we switch to relying on HDFS replication
 then this will be awfully expensive. However, as Jürgen mentioned, HDFS can
 certainly help with replicating static indexes

My understanding is that near-real-time indexing does not have to rely on
replication.
https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
just describes soft commits and doesn't mention replication. Cloudera
Search, which is Solr on HDFS, also claims near-real-time indexing without
mentioning replication. Quote from
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-search/v1-latest/Cloudera-Search-User-Guide/csug_introducing.html
:

Re: Replication for SolrCloud

2015-04-18 Thread Erick Erickson
AFAIK, the HDFS replication of Solr indexes isn't something that was
designed; it just came along for the ride with HDFS replication.
Having a shard with 1 leader and two followers keep 9 copies of the
index around _is_ overkill; nobody argues that at all.

I know the folks at Cloudera (who contributed the original HDFS
implementation) have discussed various options around this. In the
grand scheme of things there have been other priorities, so nobody has
torn into the guts of Solr and/or HDFS, since disk space is
relatively cheap.

That said, I'm also sure that this will get some attention as
priorities change. All patches welcome of course ;). But if you're
inclined to work on this issue, I'd _really_ discuss it with Mark
Miller etc. before investing too much effort in it. I don't quite
know the tradeoffs well enough to have an opinion on the right
implementation.

Best
Erick

On Sat, Apr 18, 2015 at 1:59 AM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 Some comments inline:

 On Sat, Apr 18, 2015 at 2:12 PM, gengmao geng...@gmail.com wrote:

 On Sat, Apr 18, 2015 at 12:20 AM Jürgen Wagner (DVT) 
 juergen.wag...@devoteam.com wrote:

   Replication on the storage layer will provide a reliable storage for the
  index and other data of Solr. In particular, this replication does not
  guarantee your index files are consistent at any time as there may be
  intermediate states that are only partially replicated. Replication is
 only
  a convergent process, not an instant, atomic operation. With frequent
  changes, this becomes an issue.
 
 Firstly thanks for your reply. However I can't agree with you on this.
 HDFS guarantees the consistency even with replicates - you always read what
 you write, no partially replicated state will be read, which is guaranteed
 by HDFS server and client. Hence HBase can rely on HDFS for consistency and
 availability, without implementing another replication mechanism - if I
 understand correctly.


 Lucene index is not one file but a collection of files which are written
 independently. So if you replicate them out of order, Lucene might consider
 the index as corrupted (because of missing files). I don't think HBase
 works in that way.



  Replication inside SolrCloud as an application will not only maintain the
  consistency of the search-level interfaces to your indexes, but also
 scale
  in the sense of the application (query throughput).
 
  Split one shard into two shards can increase the query throughput too.


  Imagine a database: if you change one record, this may also result in an
  index change. If the record and the index are stored in different storage
  blocks, one will get replicated first. However, the replication target
 will
  only be consistent again when both have been replicated. So, you would
 have
  to suspend all accesses until the entire replication has completed.
 That's
  undesirable. If you replicate on the application (database management
  system) level, the application will employ a more fine-grained approach
 to
  replication, guaranteeing application consistency.
 
 In HBase, a region only locates on single region server at any time, which
 guarantee its consistency. Because your read/write always drops in one
 region, you won't have concern of parallel writes happens on multiple
 replicates of same region.
 The replication of HDFS is totally transparent to HBase. When a HDFS write
 call returns, HBase know the data is written and replicated so losing one
 copy of the data won't impact HBase at all.
 So HDFS means consistency and reliability for HBase. However, HBase doesn't
 use replicates (either HBase itself or HDFS's) to scale reads. If one
 region's is too hot for reads or write, you split that region into two
 regions, so that the reads and writes of that region can be distributed
 into two region servers. Hence HBase scales.
 I think this is the simplicity and beauty of HBase. Again, I am curious if
 SolrCloud has better reason to use replication on HDFS? As I described,
 HDFS provided consistency and reliability, meanwhile scalability can be
 achieved via sharding, even without Solr replication.


 That's something that has been considered and may even be in the roadmap
 for the Cloudera guys. See https://issues.apache.org/jira/browse/SOLR-6237

 But one problem that isn't solved by HDFS replication is of near-real-time
 indexing where you want the documents to be available for searchers as fast
 as possible. SolrCloud replication supports that by replicating documents
 as they come in and indexing them in several replicas. A new index searcher
 is opened on the flushed index files as well as on the internal data
 structures of the index writer. If we switch to relying on HDFS replication
 then this will be awfully expensive. However, as Jürgen mentioned, HDFS can
 certainly help with replicating static indexes.



  Consequently, HDFS will allow you to scale storage and possibly even
  replicate static indexes that won't 

Re: Replication for SolrCloud

2015-04-18 Thread gengmao
On Sat, Apr 18, 2015 at 12:20 AM Jürgen Wagner (DVT) 
juergen.wag...@devoteam.com wrote:

  Replication on the storage layer will provide a reliable storage for the
 index and other data of Solr. In particular, this replication does not
 guarantee your index files are consistent at any time as there may be
 intermediate states that are only partially replicated. Replication is only
 a convergent process, not an instant, atomic operation. With frequent
 changes, this becomes an issue.

First, thanks for your reply. However, I can't agree with you on this.
HDFS guarantees consistency even with replicas - you always read what
you write, and no partially replicated state will ever be read; this is
guaranteed by the HDFS server and client. Hence HBase can rely on HDFS for
consistency and availability without implementing another replication
mechanism - if I understand correctly.


 Replication inside SolrCloud as an application will not only maintain the
 consistency of the search-level interfaces to your indexes, but also scale
 in the sense of the application (query throughput).

 Splitting one shard into two can increase the query throughput too.


 Imagine a database: if you change one record, this may also result in an
 index change. If the record and the index are stored in different storage
 blocks, one will get replicated first. However, the replication target will
 only be consistent again when both have been replicated. So, you would have
 to suspend all accesses until the entire replication has completed. That's
 undesirable. If you replicate on the application (database management
 system) level, the application will employ a more fine-grained approach to
 replication, guaranteeing application consistency.

In HBase, a region is located on only one region server at any time, which
guarantees its consistency. Because reads and writes always go to a single
region, there is no concern about parallel writes happening on multiple
replicas of the same region.
HDFS replication is totally transparent to HBase. When an HDFS write
call returns, HBase knows the data is written and replicated, so losing one
copy of the data won't impact HBase at all.
So HDFS provides consistency and reliability for HBase. However, HBase doesn't
use replicas (either its own or HDFS's) to scale reads. If one
region is too hot for reads or writes, you split that region into two
regions, so that the reads and writes of that region can be distributed
across two region servers. Hence HBase scales.
I think this is the simplicity and beauty of HBase. Again, I am curious whether
SolrCloud has a better reason to use its own replication on HDFS. As I described,
HDFS provides consistency and reliability, while scalability can be
achieved via sharding, even without Solr replication.


 Consequently, HDFS will allow you to scale storage and possibly even
 replicate static indexes that won't change, but it won't help much with
 live index replication. That's where SolrCloud jumps in.


 Cheers,
 --Jürgen


 On 18.04.2015 08:44, gengmao wrote:

 I wonder why need to use SolrCloud replication on HDFS at all, given HDFS
 already provides replication and availability? The way to optimize
 performance and scalability should be tweaking shards, just like tweaking
 regions on HBase - which doesn't provide region replication too, isn't
 it?

 I have this question for a while and I didn't find clear answer about it.
 Could some experts please explain a bit?

 Best regards,
 Mao Geng





 --

 Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
 уважением
 *i.A. Jürgen Wagner*
 Head of Competence Center Intelligence
  Senior Cloud Consultant

 Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
 Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
 E-Mail: juergen.wag...@devoteam.com, URL: www.devoteam.de
 --
 Managing Board: Jürgen Hatzipantelis (CEO)
 Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
 Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071





Re: Replication for SolrCloud

2015-04-18 Thread gengmao
I wonder why we need to use SolrCloud replication on HDFS at all, given that
HDFS already provides replication and availability. Shouldn't the way to
optimize performance and scalability be tweaking shards, just like tweaking
regions in HBase - which doesn't provide region replication either?

I have had this question for a while and haven't found a clear answer.
Could some experts please explain a bit?

Best regards,
Mao Geng

On Thu, Apr 9, 2015 at 8:41 AM Erick Erickson erickerick...@gmail.com
wrote:

 Yes. 3 replicas and an HDFS replication factor of 3 means 9 copies of
 the index are laying around. You can change your HDFS replication
 factor, but that affects other applications using HDFS, so that may
 not be an option.

 Best,
 Erick

 On Thu, Apr 9, 2015 at 2:31 AM, Vijaya Narayana Reddy Bhoomi Reddy
 vijaya.bhoomire...@whishworks.com wrote:
  Hi,
 
   Can anyone please tell me how shard replication works when the indexes
   are stored in HDFS? I.e., with HDFS, the default replication factor is 3.
   Now, for the Solr shards, if I set the replication factor to 3 again, does
   that mean the index data is internally replicated three times and then HDFS
   replication works on top of it again, duplicating the data across the HDFS
   cluster?
 
 
  Thanks  Regards
  Vijay
 
  --
  The contents of this e-mail are confidential and for the exclusive use of
  the intended recipient. If you receive this e-mail in error please delete
  it from your system immediately and notify us either by e-mail or
  telephone. You should not copy, forward or otherwise disclose the content
  of the e-mail. The views expressed in this communication may not
  necessarily be the view held by WHISHWORKS.



Re: Replication for SolrCloud

2015-04-18 Thread Shalin Shekhar Mangar
Some comments inline:

On Sat, Apr 18, 2015 at 2:12 PM, gengmao geng...@gmail.com wrote:

 On Sat, Apr 18, 2015 at 12:20 AM Jürgen Wagner (DVT) 
 juergen.wag...@devoteam.com wrote:

   Replication on the storage layer will provide a reliable storage for the
  index and other data of Solr. In particular, this replication does not
  guarantee your index files are consistent at any time as there may be
  intermediate states that are only partially replicated. Replication is
 only
  a convergent process, not an instant, atomic operation. With frequent
  changes, this becomes an issue.
 
 Firstly thanks for your reply. However I can't agree with you on this.
 HDFS guarantees the consistency even with replicates - you always read what
 you write, no partially replicated state will be read, which is guaranteed
 by HDFS server and client. Hence HBase can rely on HDFS for consistency and
 availability, without implementing another replication mechanism - if I
 understand correctly.


A Lucene index is not one file but a collection of files which are written
independently. So if you replicate them out of order, Lucene might consider
the index corrupted (because of missing files). I don't think HBase
works that way.



  Replication inside SolrCloud as an application will not only maintain the
  consistency of the search-level interfaces to your indexes, but also
 scale
  in the sense of the application (query throughput).
 
  Split one shard into two shards can increase the query throughput too.


  Imagine a database: if you change one record, this may also result in an
  index change. If the record and the index are stored in different storage
  blocks, one will get replicated first. However, the replication target
 will
  only be consistent again when both have been replicated. So, you would
 have
  to suspend all accesses until the entire replication has completed.
 That's
  undesirable. If you replicate on the application (database management
  system) level, the application will employ a more fine-grained approach
 to
  replication, guaranteeing application consistency.
 
 In HBase, a region only locates on single region server at any time, which
 guarantee its consistency. Because your read/write always drops in one
 region, you won't have concern of parallel writes happens on multiple
 replicates of same region.
 The replication of HDFS is totally transparent to HBase. When a HDFS write
 call returns, HBase know the data is written and replicated so losing one
 copy of the data won't impact HBase at all.
 So HDFS means consistency and reliability for HBase. However, HBase doesn't
 use replicates (either HBase itself or HDFS's) to scale reads. If one
 region's is too hot for reads or write, you split that region into two
 regions, so that the reads and writes of that region can be distributed
 into two region servers. Hence HBase scales.
 I think this is the simplicity and beauty of HBase. Again, I am curious if
 SolrCloud has better reason to use replication on HDFS? As I described,
 HDFS provided consistency and reliability, meanwhile scalability can be
 achieved via sharding, even without Solr replication.


That's something that has been considered and may even be in the roadmap
for the Cloudera guys. See https://issues.apache.org/jira/browse/SOLR-6237

But one problem that isn't solved by HDFS replication is near-real-time
indexing, where you want the documents to be available to searchers as fast
as possible. SolrCloud replication supports that by replicating documents
as they come in and indexing them in several replicas. A new index searcher
is opened on the flushed index files as well as on the internal data
structures of the index writer. If we switch to relying on HDFS replication
then this will be awfully expensive. However, as Jürgen mentioned, HDFS can
certainly help with replicating static indexes.
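
For reference, the near-real-time behavior described here is what frequent soft
commits give you. A minimal solrconfig.xml sketch, with placeholder intervals
(not a recommendation for any particular workload):

  <autoCommit>
    <maxTime>60000</maxTime>           <!-- hard commit: flush segments to disk every 60s -->
    <openSearcher>false</openSearcher> <!-- don't open a new searcher on hard commit -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>            <!-- soft commit: open a new NRT searcher every second -->
  </autoSoftCommit>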



  Consequently, HDFS will allow you to scale storage and possibly even
  replicate static indexes that won't change, but it won't help much with
  live index replication. That's where SolrCloud jumps in.
 

  Cheers,
  --Jürgen
 
 
  On 18.04.2015 08:44, gengmao wrote:
 
  I wonder why need to use SolrCloud replication on HDFS at all, given HDFS
  already provides replication and availability? The way to optimize
  performance and scalability should be tweaking shards, just like tweaking
  regions on HBase - which doesn't provide region replication too, isn't
  it?
 
  I have this question for a while and I didn't find clear answer about it.
  Could some experts please explain a bit?
 
  Best regards,
  Mao Geng
 
 
 
 
 
  --
 
  Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
  уважением
  *i.A. Jürgen Wagner*
  Head of Competence Center Intelligence
   Senior Cloud Consultant
 
  Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
  Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864
 1543
  E-Mail: juergen.wag...@devoteam.com, URL: 

Re: Replication for SolrCloud

2015-04-09 Thread Erick Erickson
Yes. 3 replicas and an HDFS replication factor of 3 means 9 copies of
the index are lying around. You can change your HDFS replication
factor, but that affects other applications using HDFS, so that may
not be an option.
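
 If lowering it only for the index data is acceptable, one hedged option
 (assuming the indexes live under a path like /solr, which is a placeholder)
 is to change the replication factor on just that path:

   # applies to files already under /solr; new files still use the client's dfs.replication
   hdfs dfs -setrep -w 1 /solr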

Best,
Erick

On Thu, Apr 9, 2015 at 2:31 AM, Vijaya Narayana Reddy Bhoomi Reddy
vijaya.bhoomire...@whishworks.com wrote:
 Hi,

 Can anyone please tell me how shard replication works when the indexes
 are stored in HDFS? I.e., with HDFS, the default replication factor is 3.
 Now, for the Solr shards, if I set the replication factor to 3 again, does
 that mean the index data is internally replicated three times and then HDFS
 replication works on top of it again, duplicating the data across the HDFS
 cluster?


 Thanks  Regards
 Vijay

 --
 The contents of this e-mail are confidential and for the exclusive use of
 the intended recipient. If you receive this e-mail in error please delete
 it from your system immediately and notify us either by e-mail or
 telephone. You should not copy, forward or otherwise disclose the content
 of the e-mail. The views expressed in this communication may not
 necessarily be the view held by WHISHWORKS.


Re: Replication of a corrupt master index

2014-12-02 Thread Erick Erickson
No. The master is the master and will always stay the master
unless you change it. This is one of the reasons I really like
to keep the original source around in case I ever have this
problem.

Best,
Erick

On Tue, Dec 2, 2014 at 2:34 AM, Charra, Johannes
johannes.charrahorstm...@haufe-lexware.com wrote:

 Hi,

 If I have a master/slave setup and the master index gets corrupted, will the 
 slaves realize they should not replicate from the master anymore, since the 
 master does not have a newer index version?

 I'm using Solr version 4.2.1.

 Regards,
 Johannes




Re: Replication of a corrupt master index

2014-12-02 Thread Erick Erickson
If nothing else, the disk underlying the index could have a bad spot...

There have been some corrupt index bugs in the past, but they always
get a super-high priority for fixing, so they don't hang around for long.

You can always take periodic backups. Perhaps the slickest way to do that
is to set up a slave that does nothing but poll once/day. Since you know
that index isn't changing between polls, you can make simple disk copies of it
and at least minimize your possible outage.
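
A sketch of what such a backup slave's replication handler might look like in
solrconfig.xml (the master URL, core name and interval are placeholders):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/core1/replication</str>
      <!-- poll only once a day; copy the index directory between polls -->
      <str name="pollInterval">24:00:00</str>
    </lst>
  </requestHandler>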

Now, all that said, you may want to consider SolrCloud. The advantage there is
that each node gets the raw input and very rarely does replication. Failover
is as simple in that scenario as killing the bad node and things just work.

Best,
Erick

On Tue, Dec 2, 2014 at 7:40 AM, Charra, Johannes
johannes.charrahorstm...@haufe-lexware.com wrote:
 Thanks for your response, Erick.

 Do you think it is possible to corrupt an index merely with HTTP requests? 
 I've been using the aforementioned m/s setup for years now and have never 
 seen a master failure.

 I'm trying to think of scenarios where this setup (1 master, 4 slaves) might 
 have a total outage. The master runs on a h/a cluster.

 Regards,
 Johannes

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Tuesday, 2 December 2014 15:54
 To: solr-user@lucene.apache.org
 Subject: Re: Replication of a corrupt master index

 No. The master is the master and will always stay the master unless you 
 change it. This is one of the reasons I really like to keep the original 
 source around in case I ever have this problem.

 Best,
 Erick

 On Tue, Dec 2, 2014 at 2:34 AM, Charra, Johannes 
 johannes.charrahorstm...@haufe-lexware.com wrote:

 Hi,

 If I have a master/slave setup and the master index gets corrupted, will the 
 slaves realize they should not replicate from the master anymore, since the 
 master does not have a newer index version?

 I'm using Solr version 4.2.1.

 Regards,
 Johannes




Re: Replication of full index to replica after merge index into leader not working

2014-08-19 Thread Mark Miller
I’d just file a JIRA. Merge, like optimize and a few other things, were never 
tested or considered in early SolrCloud days. It’s used in the HDFS stuff, but 
in that case, the index is merged to all replicas and no recovery is necessary.

If you want to make the local filesystem merge work well with SolrCloud, sounds 
like we should write a test and make it work.

--  
Mark Miller
about.me/markrmiller

On August 19, 2014 at 1:20:54 PM, Timothy Potter (thelabd...@gmail.com) wrote:
 Hi,
  
  I'm using the CoreAdmin mergeindexes command to merge an index into a
  leader (SolrCloud mode on 4.9.0), and the replica does not do a snap
  pull from the leader as I would have expected. The merge into the
  leader worked like a charm, except I had to send a hard commit after
  that (which makes sense).
  
 I'm guessing the replica would snap pull from the leader if I
 restarted it, but reloading the collection or core does not trigger
 the replica to pull from the leader. This seems like an oversight in
 the mergeindex interaction with SolrCloud. Seems like the simplest
 would be for the leader to send all replicas a request recovery
 command after performing the merge.
  
 Advice?
  
 Cheers,
 Tim
  



Re: Replication of full index to replica after merge index into leader not working

2014-08-19 Thread Mark Miller

On August 19, 2014 at 1:33:10 PM, Mark Miller (markrmil...@gmail.com) wrote:
  sounds like we should write a test and make it work.

Keeping in mind that when using a shared filesystem like HDFS or especially if 
using the MapReduce contrib, you probably won’t want this new behavior.

-- 
Mark Miller
about.me/markrmiller


Re: Replication of full index to replica after merge index into leader not working

2014-08-19 Thread Timothy Potter
Was able to get around it for now by sending the REQUESTRECOVERY command
to the replica. Will open an improvement JIRA but not sure if it's
worth it as the work-around is pretty clean (IMO).
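
For reference, the whole sequence looks roughly like this - the host, core and
replica names and the index path are placeholders, not the exact values used:

  # merge an on-disk index into the leader core
  curl "http://leader-host:8983/solr/admin/cores?action=mergeindexes&core=collection1_shard1_replica1&indexDir=/path/to/other/index"
  # make the merged documents visible on the leader
  curl "http://leader-host:8983/solr/collection1/update?commit=true"
  # work-around: tell the replica core to recover (replicate) from the leader
  curl "http://replica-host:8983/solr/admin/cores?action=REQUESTRECOVERY&core=collection1_shard1_replica2"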

Tim

On Tue, Aug 19, 2014 at 5:33 PM, Mark Miller markrmil...@gmail.com wrote:
 I’d just file a JIRA. Merge, like optimize and a few other things, were never 
 tested or considered in early SolrCloud days. It’s used in the HDFS stuff, 
 but in that case, the index is merged to all replicas and no recovery is 
 necessary.

 If you want to make the local filesystem merge work well with SolrCloud, 
 sounds like we should write a test and make it work.

 --
 Mark Miller
 about.me/markrmiller

 On August 19, 2014 at 1:20:54 PM, Timothy Potter (thelabd...@gmail.com) wrote:
 Hi,

 Using the coreAdmin mergeindexes command to merge an index into a
 leader (SolrCloud mode on 4.9.0) and the replica does not do a snap
 pull from the leader as I would have expected. The merge into the
 leader worked like a charm except I had to send a hard commit after
 that (which makes sense).

 I'm guessing the replica would snap pull from the leader if I
 restarted it, but reloading the collection or core does not trigger
 the replica to pull from the leader. This seems like an oversight in
 the mergeindex interaction with SolrCloud. Seems like the simplest
 would be for the leader to send all replicas a request recovery
 command after performing the merge.

 Advice?

 Cheers,
 Tim




RE: Replication Issue with Repeater Please help

2014-08-16 Thread waqas sarwar


 Date: Thu, 14 Aug 2014 06:51:02 -0600
 From: s...@elyograg.org
 To: solr-user@lucene.apache.org
 Subject: Re: Replication Issue with Repeater Please help
 
 On 8/14/2014 2:09 AM, waqas sarwar wrote:
  Thanks Shawn. What i got is Circular replication is totally impossible  
  Solr fails in distributed environment. Then why solr documentation says 
  that configure REPEATER for distributed architecture, because REPEATER 
  behave like master-slave at a time.
  Can i configure SolrCloud on LAN, or i've to configure zookeeper myself. 
  Please provide me any solution for LAN distributed servers. If zookeeper in 
  only solution then provide me any link to configure it that can help me  
  to avoid wrong direction.
 
 The repeater config is designed to avoid master overload from many
 slaves.  So instead of configuring ten slaves to replicate from one
 master, you configure two slaves to replicate directly from your master,
 and then you configure those as repeaters.  The other eight slaves are
 configured so that four of them replicate from each of the repeaters
 instead of the true master, reducing the load.
 
 SolrCloud is the easiest way to build a fully distributed and redundant
 solution.  It is designed for a LAN.  You configure three machines as
 your zookeeper ensemble, using the zookeeper download and instructions
 for a clustered setup:
 
 http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#sc_zkMulitServerSetup
 
 The way to start Solr in cloud mode is to give it a zkHost system
 property.  That informs Solr about all of your ZK servers.  If you have
 another way of setting that property, you can use that instead.  I
 strongly recommend using a chroot with the zkHost parameter, but that is
 not required.  Search the zookeeper page linked above for chroot to
 find a link to additional documentation about chroot.
 
 You can use the same servers for ZK as you do for Solr, but be aware
 that if Solr puts a large I/O load on the disks, you may want the ZK
 database to be on its own disks(s) so that it responds quickly. 
 Separate servers is even better, but not strictly required unless the
 servers are under extreme load.
 
 https://cwiki.apache.org/confluence/display/solr/SolrCloud
 
 You will find a Getting Started link on the page above.  Note that the
 Getting Started page talks about a zkRun option, which starts an
 embedded zookeeper as part of Solr.  I strongly recommend that you do
 NOT take this route, except for *initial* testing.  SolrCloud works much
 better if the Zookeeper ensemble is in its own process, separate from Solr.
 
 Thanks,
 Shawn
 
 Thank you so much. You helped a lot. One more question: can I use only
 one ZooKeeper server to manage 3 Solr servers, or do I have to configure 3
 ZooKeeper servers? And should the ZooKeeper servers be standalone, or is it
 better to use the same machines as the Solr servers?
 Best Regards,
 Waqas


Re: Replication Issue with Repeater Please help

2014-08-16 Thread Erick Erickson
It Depends (tm).

 One ZooKeeper is a single point of failure. It goes away and your SolrCloud 
 cluster is kinda hosed. OTOH, with only 3 servers, the chance that one of 
 them is going down is low anyway. How lucky do you feel?

 I would be cautious about running your ZK instances embedded, 
 super-especially if there's only one ZK instance. That couples your ZK 
 instances with your Solr instances. So if for any reason you want  to 
 stop/start Solr, you will stop/start ZK as well and it's easy to fall below a 
 quorum. It's perfectly viable to run them embedded, especially on a very 
 small cluster. You do have to think a bit more about sequencing Solr nodes 
 going up/down is all.

Best,
Erick

On Sat, Aug 16, 2014 at 7:11 AM, waqas sarwar waqassarwa...@hotmail.com wrote:


 Date: Thu, 14 Aug 2014 06:51:02 -0600
 From: s...@elyograg.org
 To: solr-user@lucene.apache.org
 Subject: Re: Replication Issue with Repeater Please help

 On 8/14/2014 2:09 AM, waqas sarwar wrote:
  Thanks Shawn. What i got is Circular replication is totally impossible  
  Solr fails in distributed environment. Then why solr documentation says 
  that configure REPEATER for distributed architecture, because REPEATER 
  behave like master-slave at a time.
  Can i configure SolrCloud on LAN, or i've to configure zookeeper myself. 
  Please provide me any solution for LAN distributed servers. If zookeeper 
  in only solution then provide me any link to configure it that can help me 
   to avoid wrong direction.

 The repeater config is designed to avoid master overload from many
 slaves.  So instead of configuring ten slaves to replicate from one
 master, you configure two slaves to replicate directly from your master,
 and then you configure those as repeaters.  The other eight slaves are
 configured so that four of them replicate from each of the repeaters
 instead of the true master, reducing the load.

 SolrCloud is the easiest way to build a fully distributed and redundant
 solution.  It is designed for a LAN.  You configure three machines as
 your zookeeper ensemble, using the zookeeper download and instructions
 for a clustered setup:

 http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#sc_zkMulitServerSetup

 The way to start Solr in cloud mode is to give it a zkHost system
 property.  That informs Solr about all of your ZK servers.  If you have
 another way of setting that property, you can use that instead.  I
 strongly recommend using a chroot with the zkHost parameter, but that is
 not required.  Search the zookeeper page linked above for chroot to
 find a link to additional documentation about chroot.

 You can use the same servers for ZK as you do for Solr, but be aware
 that if Solr puts a large I/O load on the disks, you may want the ZK
 database to be on its own disks(s) so that it responds quickly.
 Separate servers is even better, but not strictly required unless the
 servers are under extreme load.

 https://cwiki.apache.org/confluence/display/solr/SolrCloud

 You will find a Getting Started link on the page above.  Note that the
 Getting Started page talks about a zkRun option, which starts an
 embedded zookeeper as part of Solr.  I strongly recommend that you do
 NOT take this route, except for *initial* testing.  SolrCloud works much
 better if the Zookeeper ensemble is in its own process, separate from Solr.

 Thanks,
 Shawn

 Thank you so much. You helped a lot. One more question: can I use only
 one ZooKeeper server to manage 3 Solr servers, or do I have to configure 3
 ZooKeeper servers? And should the ZooKeeper servers be standalone, or is it
 better to use the same machines as the Solr servers? Best Regards, Waqas


Re: Replication Issue with Repeater Please help

2014-08-16 Thread Shawn Heisey
On 8/16/2014 8:11 AM, waqas sarwar wrote:
 Thank you so much. You helped a lot. One more question: can I use only
 one ZooKeeper server to manage 3 Solr servers, or do I have to configure 3
 ZooKeeper servers? And should the ZooKeeper servers be standalone, or is it
 better to use the same machines as the Solr servers? Best Regards, Waqas


I think Erick basically said the same thing as this, in a slightly
different way:

If you want zookeeper to be fault tolerant, you must have at least three
servers running it.  One zookeeper will work, but if it goes down,
SolrCloud doesn't function properly.  Three are needed for full
redundancy.  If one of the three goes down, the other two will still
function as a quorum.
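
A minimal zoo.cfg sketch for such a three-node ensemble (host names and the
data directory are placeholders); each server also needs a matching myid file
in its dataDir:

  tickTime=2000
  initLimit=10
  syncLimit=5
  dataDir=/var/lib/zookeeper
  clientPort=2181
  server.1=zk1.example.com:2888:3888
  server.2=zk2.example.com:2888:3888
  server.3=zk3.example.com:2888:3888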

You can use the same servers for Zookeeper and Solr.  This *can* be a
source of performance problems, but that will usually only be a problem
if you put a major load on your SolrCloud.  If you do put them on the
same server, I would recommend putting the zk database on a separate
disk or disks -- the CPU requirements for Zookeeper are very small, but
it relies on extremely responsive I/O to/from its database.

As Erick said, we strongly recommend that you don't use the embedded ZK
-- this starts up a zookeeper server in the same Java process as Solr.
If Solr is stopped or goes down, you also lose zookeeper.

Thanks,
Shawn



RE: Replication Issue with Repeater Please help

2014-08-14 Thread waqas sarwar


 Date: Wed, 13 Aug 2014 07:19:58 -0600
 From: s...@elyograg.org
 To: solr-user@lucene.apache.org
 Subject: Re: Replication Issue with Repeater Please help
 
 On 8/13/2014 12:49 AM, waqas sarwar wrote:
   Hi, I'm using Solr and need a little bit of assistance from you. I am a bit
   stuck with Solr replication; before discussing the issue, let me write a
   brief description.
   Scenario: I want to set up Solr in a distributed architecture, starting
   with the least number of nodes (suppose 3). How can I replicate the data of
   each node to the 2 others and vice versa?
   My solution: I set up a "REPEATER" on all nodes, so each node is a master
   to the others, and configured circular replication.
   Issue I'm facing: All nodes replicate data from the other nodes fine, but
   when node1 replicates data from node2, node1 loses its own data. I think
   node1 should at least not lose its own data and should merge the new data.
   I think the question is now pretty simple and clear: I want to set up Solr
   in a distributed architecture where each node is a replica of the others.
   How can I achieve it? Is there any other way, besides a repeater and
   circular replication using repeaters, to replicate the data of each node to
   all others?
   Environment: LAN, Solr (3.6 to 4.9), Redhat
 
 With master-slave replication, there must be a clear master, from which
 slaves replicate.  You can't set up fully circular replication, or the
 master will replicate from the empty slave and your data will be gone.
 This form of replication does not merge data -- it makes the slave index
 identical to the master by copying the actual files on disk for the index.
 
 I think you'll want to use SolrCloud.  You have three machines, so you
 have the minimum number for a redundant zookeeper ensemble.  SolrCloud
 relies on zookeeper to handle cluster functions.  SolrCloud is a true
 cluster -- no replication, no master.
 
 https://cwiki.apache.org/confluence/display/solr/SolrCloud
 
 Thanks,
 Shawn


Thanks Shawn. What I got is that circular replication is totally impossible and
that Solr fails for this kind of distributed setup. Then why does the Solr
documentation say to configure a REPEATER for a distributed architecture, given
that a REPEATER behaves like a master and a slave at the same time?
Can I configure SolrCloud on a LAN, or do I have to configure ZooKeeper myself?
Please provide me any solution for LAN-distributed servers. If ZooKeeper is the
only solution, then please provide a link to configure it that can help me
avoid going in the wrong direction.
Regards,
Waqas

Re: Replication Issue with Repeater Please help

2014-08-14 Thread Shawn Heisey
On 8/14/2014 2:09 AM, waqas sarwar wrote:
 Thanks Shawn. What i got is Circular replication is totally impossible  Solr 
 fails in distributed environment. Then why solr documentation says that 
 configure REPEATER for distributed architecture, because REPEATER behave 
 like master-slave at a time.
 Can i configure SolrCloud on LAN, or i've to configure zookeeper myself. 
 Please provide me any solution for LAN distributed servers. If zookeeper in 
 only solution then provide me any link to configure it that can help me  to 
 avoid wrong direction.

The repeater config is designed to avoid master overload from many
slaves.  So instead of configuring ten slaves to replicate from one
master, you configure two slaves to replicate directly from your master,
and then you configure those as repeaters.  The other eight slaves are
configured so that four of them replicate from each of the repeaters
instead of the true master, reducing the load.
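
A sketch of the corresponding solrconfig.xml on a repeater - it is simply both
a slave of the true master and a master for its own slaves (the URL and the
poll interval are placeholders):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
    </lst>
    <lst name="slave">
      <str name="masterUrl">http://true-master:8983/solr/core1/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>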

SolrCloud is the easiest way to build a fully distributed and redundant
solution.  It is designed for a LAN.  You configure three machines as
your zookeeper ensemble, using the zookeeper download and instructions
for a clustered setup:

http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#sc_zkMulitServerSetup

The way to start Solr in cloud mode is to give it a zkHost system
property.  That informs Solr about all of your ZK servers.  If you have
another way of setting that property, you can use that instead.  I
strongly recommend using a chroot with the zkHost parameter, but that is
not required.  Search the zookeeper page linked above for chroot to
find a link to additional documentation about chroot.
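
A sketch of what that looks like for a Solr 4.x node started with the bundled
Jetty (the ZooKeeper host names and the /solr chroot are placeholders):

  java -DzkHost=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr -jar start.jar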

You can use the same servers for ZK as you do for Solr, but be aware
that if Solr puts a large I/O load on the disks, you may want the ZK
database to be on its own disks(s) so that it responds quickly. 
Separate servers is even better, but not strictly required unless the
servers are under extreme load.

https://cwiki.apache.org/confluence/display/solr/SolrCloud

You will find a Getting Started link on the page above.  Note that the
Getting Started page talks about a zkRun option, which starts an
embedded zookeeper as part of Solr.  I strongly recommend that you do
NOT take this route, except for *initial* testing.  SolrCloud works much
better if the Zookeeper ensemble is in its own process, separate from Solr.

Thanks,
Shawn



Re: Replication Issue with Repeater Please help

2014-08-13 Thread Shawn Heisey
On 8/13/2014 12:49 AM, waqas sarwar wrote:
 Hi, I'm using Solr and need a little bit of assistance from you. I am a bit
 stuck with Solr replication; before discussing the issue, let me write a brief
 description.
 Scenario: I want to set up Solr in a distributed architecture, starting with
 the least number of nodes (suppose 3). How can I replicate the data of each
 node to the 2 others and vice versa?
 My solution: I set up a "REPEATER" on all nodes, so each node is a master to
 the others, and configured circular replication.
 Issue I'm facing: All nodes replicate data from the other nodes fine, but when
 node1 replicates data from node2, node1 loses its own data. I think node1
 should at least not lose its own data and should merge the new data. I think
 the question is now pretty simple and clear: I want to set up Solr in a
 distributed architecture where each node is a replica of the others. How can I
 achieve it? Is there any other way, besides a repeater and circular
 replication using repeaters, to replicate the data of each node to all others?
 Environment: LAN, Solr (3.6 to 4.9), Redhat

With master-slave replication, there must be a clear master, from which
slaves replicate.  You can't set up fully circular replication, or the
master will replicate from the empty slave and your data will be gone.
This form of replication does not merge data -- it makes the slave index
identical to the master by copying the actual files on disk for the index.

I think you'll want to use SolrCloud.  You have three machines, so you
have the minimum number for a redundant zookeeper ensemble.  SolrCloud
relies on zookeeper to handle cluster functions.  SolrCloud is a true
cluster -- no replication, no master.

https://cwiki.apache.org/confluence/display/solr/SolrCloud

Thanks,
Shawn



Re: Replication Problem from solr-3.6 to solr-4.0

2014-07-24 Thread Sree..
I did optimize the master and the slave started replicating the indices!





Re: Replication Problem from solr-3.6 to solr-4.0

2014-07-22 Thread askumar1444
Same with me too, in a multi-core Master/Slave.

11:17:30.476 [snapPuller-8-thread-1] INFO  o.a.s.h.SnapPuller - Master's
generation: 87
11:17:30.476 [snapPuller-8-thread-1] INFO  o.a.s.h.SnapPuller - Slave's
generation: 3
11:17:30.476 [snapPuller-8-thread-1] INFO  o.a.s.h.SnapPuller - Starting
replication process
11:17:30.713 [snapPuller-8-thread-1] ERROR o.a.s.h.SnapPuller - No files to
download for index generation: 87

Any solution/fix for it?





Re: Replication (Solr Cloud)

2014-03-25 Thread Shawn Heisey

On 3/25/2014 10:42 AM, Software Dev wrote:

I see that by default in SolrCloud that my collections are
replicating. Should this be disabled in SolrCloud as this is already
handled by it?

 From the documentation:

The Replication screen shows you the current replication state for
the named core you have specified. In Solr, replication is for the
index only. SolrCloud has supplanted much of this functionality, but
if you are still using index replication, you can use this screen to
see the replication state:

I just want to make sure before I disable it that if we send an update
to one server that the document will be correctly replicated across
all nodes. Thanks


The replication handler must be configured for SolrCloud to operate 
properly ... but not in the way that you might think. This is a source 
of major confusion for those who are new to SolrCloud, especially if 
they already understand master/slave replication.


During normal operation, SolrCloud does NOT use replication.  
Replication is ONLY used to recover indexes.  When everything is working 
well, recovery only happens when a Solr instance starts up.


Every Solr instance will be a master.  If that Solr instance has *EVER* 
(since the last instance start) replicated its index from a shard 
leader, it will *also* say that it is a slave. These are NOT indications 
that a replication is occurring, they are just the current configuration 
state of the replication handler.


You can ignore everything you see on the replication tab if you are 
running SolrCloud.  It only has meaning at the moment a replication is 
happening, and that is completely automated by SolrCloud.
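
If you do want to peek at that state anyway, the handler can be queried
directly (host and core name are placeholders):

  curl "http://solr-host:8983/solr/collection1/replication?command=details&wt=json"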


Thanks,
Shawn



Re: Replication (Solr Cloud)

2014-03-25 Thread Michael Della Bitta
No, don't disable replication!

The way shards ordinarily keep up with updates is by sending every document
to each member of the shard. However, if a replica goes offline for a period
of time and comes back, replication is used to catch that replica up. So
you really need it on.

If you created your collection with the collections API and the required
bits are in schema.xml and solrconfig.xml, you should be good to go. See
https://wiki.apache.org/solr/SolrCloud#Required_Config
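
Roughly, that page boils down to having a _version_ field in schema.xml and the
following pieces in solrconfig.xml (a sketch based on the stock example
configs, not a complete file):

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>

  <requestHandler name="/replication" class="solr.ReplicationHandler" />

  <requestHandler name="/get" class="solr.RealTimeGetHandler">
    <lst name="defaults">
      <str name="omitHeader">true</str>
    </lst>
  </requestHandler>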

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

The Science of Influence Marketing

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions (https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts)
w: appinions.com http://www.appinions.com/


On Tue, Mar 25, 2014 at 12:42 PM, Software Dev static.void@gmail.com wrote:

 I see that by default in SolrCloud that my collections are
 replicating. Should this be disabled in SolrCloud as this is already
 handled by it?

 From the documentation:

 The Replication screen shows you the current replication state for
 the named core you have specified. In Solr, replication is for the
 index only. SolrCloud has supplanted much of this functionality, but
 if you are still using index replication, you can use this screen to
 see the replication state:

 I just want to make sure before I disable it that if we send an update
 to one server that the document will be correctly replicated across
 all nodes. Thanks



Re: Replication (Solr Cloud)

2014-03-25 Thread Software Dev
Thanks for the reply. I'll make sure NOT to disable it.

