Replication error in SOLR-6.5.1

2018-09-25 Thread SOLR4189
Hi all,

I use Solr 6.5.1. A couple of weeks ago I started using the replication feature
in cloud mode, without overriding the default behavior of ReplicationHandler.

After deploying the replication feature to production, I hit these errors
almost every day:
SolrException: Unable to download  completely. Downloaded x!=y
OR
SolrException: Unable to download  completely. (Downloaded x of y
bytes) No space left on device
OR
Error deleting file: 
NoSuchFileException: /opt/solr//data/index./

I get all of these errors while a replica is recovering, sometimes after a
physical machine fails and sometimes after a simple Solr restart. Today my only
workaround is this: after the 5th unsuccessful recovery of a replica, I remove
the replica and add it anew.

All my Solr servers have 40% free disk space, and the hard/soft commit interval is 5 minutes.


What's wrong here, and what can be done to correct these errors?
Is it due to free space, the commitReserveDuration parameter, or something else?
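
(For reference, commitReserveDuration is the ReplicationHandler setting I mean.
A minimal sketch of setting it explicitly - the 5-minute value is only an
illustration, matching my commit interval, not a recommendation:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="defaults">
    <!-- how long a commit point stays reserved for replicas that are still
         downloading it; the stock default is only 10 seconds (00:00:10) -->
    <str name="commitReserveDuration">00:05:00</str>
  </lst>
</requestHandler>
)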




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Rule-based replication or sharing

2018-09-25 Thread Chuck Reynolds
Shawn,

Thanks for the info. We’ve been running this way for the past 4 years. 

We were running on very large hardware, 20 physical cores with 256 GB of RAM 
and 3 billion documents, and it was the only way we could take advantage of the 
hardware. 

Running 1 Solr instance per server never gave us the throughput we needed. 

So I somewhat disagree with your statement, because our tests proved otherwise. 

Thanks for the info. 

Sent from my iPhone

> On Sep 25, 2018, at 4:19 PM, Shawn Heisey  wrote:
> 
>> On 9/25/2018 9:21 AM, Chuck Reynolds wrote:
>> Each server has three instances of Solr running on it so every instance on 
>> the server has to be in the same replica set.
> 
> You should be running exactly one Solr instance per server.  When evaluating 
> rules for replica placement, SolrCloud will treat each instance as completely 
> separate from all others, including others on the same machine.  It will not 
> know that those three instances are on the same machine.  One Solr instance 
> can handle MANY indexes.
> 
> There is only ONE situation where it makes sense to run multiple instances 
> per machine, and in my strong opinion, even that situation should not be 
> handled with multiple instances. That situation is this:  When running one 
> instance would require a REALLY large heap.  Garbage collection pauses can 
> become extreme in that situation, so some people will run multiple instances 
> that each have a smaller heap, and divide their indexes between them. In my 
> opinion, when you have enough index data on an instance that it requires a 
> huge heap, instead of running two or more instances on one server, it's time 
> to add more servers.
> 
> Thanks,
> Shawn
> 


Re: Rule-based replication or sharing

2018-09-25 Thread Shawn Heisey

On 9/25/2018 9:21 AM, Chuck Reynolds wrote:

Each server has three instances of Solr running on it so every instance on the 
server has to be in the same replica set.


You should be running exactly one Solr instance per server.  When 
evaluating rules for replica placement, SolrCloud will treat each 
instance as completely separate from all others, including others on the 
same machine.  It will not know that those three instances are on the 
same machine.  One Solr instance can handle MANY indexes.


There is only ONE situation where it makes sense to run multiple 
instances per machine, and in my strong opinion, even that situation 
should not be handled with multiple instances. That situation is this:  
When running one instance would require a REALLY large heap.  Garbage 
collection pauses can become extreme in that situation, so some people 
will run multiple instances that each have a smaller heap, and divide 
their indexes between them. In my opinion, when you have enough index 
data on an instance that it requires a huge heap, instead of running two 
or more instances on one server, it's time to add more servers.


Thanks,
Shawn



Re: Faceting with a multi valued field

2018-09-25 Thread Alexandre Rafalovitch
What specifically do you control? Just the keyword (with the "Communities:"
part locked)? Or anything after q=, or anything that allows multiple
variables?

Because if you can isolate the search value, you could use, for example,
facet.prefix, set in solrconfig.xml as a default parameter and populated
from the same variable as the Communities search.

You may also want to set facet.mincount=1 in solrconfig.xml to avoid
0-value facets in general:
https://lucene.apache.org/solr/guide/7_4/faceting.html
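
A hypothetical solrconfig.xml sketch of that default (the /select handler
name is assumed to be the stock one):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- hide zero-count buckets such as "SUNALTA - SNA",0 -->
    <str name="facet.mincount">1</str>
  </lst>
</requestHandler>

facet.prefix would then ride along per request, populated from the same value
as the Communities search, e.g. &facet.prefix=BANFF TRAIL - BNF.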

Regards,
   Alex.


On 25 September 2018 at 16:50, John Blythe  wrote:
> you can update your filter query to be a facet query, this will apply the
> query to the resulting facet set instead of the Communities field itself.
>
> --
> John Blythe
>
>
> On Tue, Sep 25, 2018 at 4:15 PM Hanjan, Harinder wrote:
>
>> Hello!
>>
>> I am doing faceting on a field which has multiple values and it's yielding
>> expected but undesirable results. I need different behaviour but I'm not sure
>> how to formulate a query for it. Here is my current setup.
>>
>> = Data Set =
>>   {
>>     "Communities":["BANFF TRAIL - BNF", "PARKDALE - PKD"],
>>     "Document Type":"Engagement - What We Heard Report",
>>     "Navigation":"Livelink",
>>     "SolrId":"http://thesimpsons.com/one"
>>   }
>>   {
>>     "Communities":["BANFF TRAIL - BNF", "PARKDALE - PKD"],
>>     "Document Type":"Engagement - What We Heard Report",
>>     "Navigation":"Livelink",
>>     "Id":"http://thesimpsons.com/two"
>>   }
>>   {
>>     "Communities":["SUNALTA - SNA"],
>>     "Document Type":"Engagement - What We Heard Report",
>>     "Navigation":"Livelink",
>>     "Id":"http://thesimpsons.com/three"
>>   }
>>
>> = Query I run now =
>>
>> http://localhost:8984/solr/everything/select?q=*:*&facet=on&facet.field=Communities&fq=Communities:"BANFF TRAIL - BNF"
>>
>>
>> = Results I get now =
>> {
>>   ...
>>   "facet_counts":{
>> "facet_queries":{},
>> "facet_fields":{
>>   "Communities":[
>> "BANFF TRAIL - BNF",2,
>> "PARKDALE - PKD",2,
>> "SUNALTA - SNA",0]},
>>...
>>
>> Notice that the Communities facet has 2 non zero results. I understand
>> this is because I'm using fq to get only documents which contain BANFF
>> TRAIL but those documents also contain PARKDALE.
>>
>> Now, I am using facets to drive navigation on my page. The business case
>> is that user can select a community to get documents pertaining to that
>> specific community only. This works with the query I have above. However,
>> the facets results also contain other communities which then get displayed
>> to the user. For example, with the query above, user will see both BANFF
>> TRAIL and PARKDALE as selected values even though user only selected BANFF
>> TRAIL. It's worthwhile noting that I have no control over the data being
>> sent to Solr and can't change it.
>>
>> How can I formulate a query to ensure that when user selects BANFF TRAIL,
>> only BANFF TRAIL is returned under Solr facets?
>>
>> Thanks!
>> Harinder
>>
>> 
>> NOTICE -
>> This communication is intended ONLY for the use of the person or entity
>> named above and may contain information that is confidential or legally
>> privileged. If you are not the intended recipient named above or a person
>> responsible for delivering messages or communications to the intended
>> recipient, YOU ARE HEREBY NOTIFIED that any use, distribution, or copying
>> of this communication or any of the information contained in it is strictly
>> prohibited. If you have received this communication in error, please notify
>> us immediately by telephone and then destroy or delete this communication,
>> or return it to us by mail if requested by us. The City of Calgary thanks
>> you for your attention and co-operation.
>>


Re: Faceting with a multi valued field

2018-09-25 Thread John Blythe
You can update your filter query to be a facet query; this will apply the
query to the resulting facet set instead of to the Communities field itself.
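
(A sketch against the setup quoted below - the facet.query simply echoes the
selected filter, so the response's facet_queries section will contain only the
chosen community:

http://localhost:8984/solr/everything/select?q=*:*&facet=on&fq=Communities:"BANFF TRAIL - BNF"&facet.query=Communities:"BANFF TRAIL - BNF"
)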

--
John Blythe


On Tue, Sep 25, 2018 at 4:15 PM Hanjan, Harinder wrote:

> Hello!
>
> I am doing faceting on a field which has multiple values and it's yielding
> expected but undesirable results. I need different behaviour but I'm not sure
> how to formulate a query for it. Here is my current setup.
>
> = Data Set =
>   {
>     "Communities":["BANFF TRAIL - BNF", "PARKDALE - PKD"],
>     "Document Type":"Engagement - What We Heard Report",
>     "Navigation":"Livelink",
>     "SolrId":"http://thesimpsons.com/one"
>   }
>   {
>     "Communities":["BANFF TRAIL - BNF", "PARKDALE - PKD"],
>     "Document Type":"Engagement - What We Heard Report",
>     "Navigation":"Livelink",
>     "Id":"http://thesimpsons.com/two"
>   }
>   {
>     "Communities":["SUNALTA - SNA"],
>     "Document Type":"Engagement - What We Heard Report",
>     "Navigation":"Livelink",
>     "Id":"http://thesimpsons.com/three"
>   }
>
> = Query I run now =
>
> http://localhost:8984/solr/everything/select?q=*:*&facet=on&facet.field=Communities&fq=Communities:"BANFF TRAIL - BNF"
>
>
> = Results I get now =
> {
>   ...
>   "facet_counts":{
> "facet_queries":{},
> "facet_fields":{
>   "Communities":[
> "BANFF TRAIL - BNF",2,
> "PARKDALE - PKD",2,
> "SUNALTA - SNA",0]},
>...
>
> Notice that the Communities facet has 2 non zero results. I understand
> this is because I'm using fq to get only documents which contain BANFF
> TRAIL but those documents also contain PARKDALE.
>
> Now, I am using facets to drive navigation on my page. The business case
> is that user can select a community to get documents pertaining to that
> specific community only. This works with the query I have above. However,
> the facets results also contain other communities which then get displayed
> to the user. For example, with the query above, user will see both BANFF
> TRAIL and PARKDALE as selected values even though user only selected BANFF
> TRAIL. It's worthwhile noting that I have no control over the data being
> sent to Solr and can't change it.
>
> How can I formulate a query to ensure that when user selects BANFF TRAIL,
> only BANFF TRAIL is returned under Solr facets?
>
> Thanks!
> Harinder
>
> 
> NOTICE -
> This communication is intended ONLY for the use of the person or entity
> named above and may contain information that is confidential or legally
> privileged. If you are not the intended recipient named above or a person
> responsible for delivering messages or communications to the intended
> recipient, YOU ARE HEREBY NOTIFIED that any use, distribution, or copying
> of this communication or any of the information contained in it is strictly
> prohibited. If you have received this communication in error, please notify
> us immediately by telephone and then destroy or delete this communication,
> or return it to us by mail if requested by us. The City of Calgary thanks
> you for your attention and co-operation.
>


Faceting with a multi valued field

2018-09-25 Thread Hanjan, Harinder
Hello!

I am doing faceting on a field which has multiple values and it's yielding
expected but undesirable results. I need different behaviour but I'm not sure
how to formulate a query for it. Here is my current setup.

= Data Set =
  {
    "Communities":["BANFF TRAIL - BNF", "PARKDALE - PKD"],
    "Document Type":"Engagement - What We Heard Report",
    "Navigation":"Livelink",
    "SolrId":"http://thesimpsons.com/one"
  }
  {
    "Communities":["BANFF TRAIL - BNF", "PARKDALE - PKD"],
    "Document Type":"Engagement - What We Heard Report",
    "Navigation":"Livelink",
    "Id":"http://thesimpsons.com/two"
  }
  {
    "Communities":["SUNALTA - SNA"],
    "Document Type":"Engagement - What We Heard Report",
    "Navigation":"Livelink",
    "Id":"http://thesimpsons.com/three"
  }

= Query I run now =
http://localhost:8984/solr/everything/select?q=*:*&facet=on&facet.field=Communities&fq=Communities:"BANFF TRAIL - BNF"


= Results I get now =
{
  ...
  "facet_counts":{
"facet_queries":{},
"facet_fields":{
  "Communities":[
"BANFF TRAIL - BNF",2,
"PARKDALE - PKD",2,
"SUNALTA - SNA",0]},
   ...

Notice that the Communities facet has two non-zero results. I understand this is 
because I'm using fq to get only documents which contain BANFF TRAIL, but those 
documents also contain PARKDALE.

Now, I am using facets to drive navigation on my page. The business case is 
that user can select a community to get documents pertaining to that specific 
community only. This works with the query I have above. However, the facets 
results also contain other communities which then get displayed to the user. 
For example, with the query above, user will see both BANFF TRAIL and PARKDALE 
as selected values even though user only selected BANFF TRAIL. It's worthwhile 
noting that I have no control over the data being sent to Solr and can't change 
it.

How can I formulate a query to ensure that when user selects BANFF TRAIL, only 
BANFF TRAIL is returned under Solr facets?

Thanks!
Harinder


NOTICE -
This communication is intended ONLY for the use of the person or entity named 
above and may contain information that is confidential or legally privileged. 
If you are not the intended recipient named above or a person responsible for 
delivering messages or communications to the intended recipient, YOU ARE HEREBY 
NOTIFIED that any use, distribution, or copying of this communication or any of 
the information contained in it is strictly prohibited. If you have received 
this communication in error, please notify us immediately by telephone and then 
destroy or delete this communication, or return it to us by mail if requested 
by us. The City of Calgary thanks you for your attention and co-operation.


Re: Rule-based replication or sharing

2018-09-25 Thread Chuck Reynolds
Steve,

Sorry, I must have omitted it from a past response.

Here is what came back from the response.



(status 400, QTime 91; the same SolrException message appeared three times in
the response, one copy below:)

org.apache.solr.common.SolrException: Could not identify nodes matching the rules [{
  "shard":"*",
  "replica":"1",
  "sysprop.AWSAZ":"AZ1"}, {
  "shard":"*",
  "replica":"1",
  "sysprop.AWSAZ":"AZ2"}, {
  "shard":"*",
  "replica":"1",
  "sysprop.AWSAZ":"AZ3"}]
 tag values{
  "10.157.112.223:10002_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.120.207:10003_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.121.165:10002_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.116.190:10002_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.121.165:10003_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.116.190:10001_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.115.30:10003_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.121.165:10001_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.116.201:10002_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.120.207:10001_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.112.223:10003_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.115.30:10001_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.116.190:10003_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.112.223:10001_solr":{"sysprop.AWSAZ":"AZ1"},
  "10.157.120.207:10002_solr":{"sysprop.AWSAZ":"AZ3"},
  "10.157.116.201:10003_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.116.201:10001_solr":{"sysprop.AWSAZ":"AZ2"},
  "10.157.115.30:10002_solr":{"sysprop.AWSAZ":"AZ1"}}






On 9/25/18, 11:33 AM, "Steve Rowe"  wrote:

Chuck, see my responses inline below:

> On Sep 25, 2018, at 12:50 PM, Chuck Reynolds wrote:
> The bottom line is I guess I'm confused by the documentation and the 
reference to replicas. Normally when referring to replicas in the documentation 
it is referring to the number of times you want the data replicated. As in 
replication factor.  That's where the confusion was for me.

We can always use help improving Solr’s documentation, and your perspective 
is valuable.  Please see 
https://urldefense.proofpoint.com/v2/url?u=https-3A__wiki.apache.org_solr_HowToContribute=DwIFaQ=kKqjBR9KKWaWpMhASkPbOg=J-2s3b-3-OTA0o6bGDhJXAQlB5Y3s4rOUxlh_78DJl0=_ZWThlXl48Sa2f_pVyPzwxiCmVnOtDdddq8wfK6CVqM=XR2tqAPyyNaSPvrlnjhqm81DZj3WP3v7s8iEciH1xss=
 and open JIRA issues with the problems you find, and ideally with patches 
against the ref guide sources.

From the Solr 6.4 ref guide’s “Solr Glossary”:

  Replica: A Core that acts as a physical copy of a Shard in a SolrCloud Collection.

Re: Rule-based replication or sharing

2018-09-25 Thread Steve Rowe
Chuck, see my responses inline below:

> On Sep 25, 2018, at 12:50 PM, Chuck Reynolds  wrote:
> The bottom line is I guess I'm confused by the documentation and the 
> reference to replicas. Normally when referring to replicas in the 
> documentation it is referring to the number of times you want the data 
> replicated. As in replication factor.  That's where the confusion was for me.

We can always use help improving Solr’s documentation, and your perspective is 
valuable.  Please see https://wiki.apache.org/solr/HowToContribute and open 
JIRA issues with the problems you find, and ideally with patches against the 
ref guide sources.

From the Solr 6.4 ref guide’s “Solr Glossary”:

  Replica: A Core that acts as a physical copy of a Shard in a SolrCloud 
Collection.

As ^^ indicates, a “replica” is not a replication factor.  (Though “replica:1” 
in a rule-based replica placement rule is a condition on replica *count*, so I 
can see where that could be confusing.)

> If I want to create a rule that ensures that my replication factor of three
> correctly shards the data across three AZs, so that if I were to lose one or
> even two AZs in AWS, Solr would still have 1-2 copies of the data, how would
> that rule work?

I thought I already answered that exact question:

> If you mean “exactly one Solr instance in an AZ must host exactly one replica 
> of each shard of the collection”, then yes, that makes sense :).
> 
> Okay, one more try :) - here are the rules that should do the trick for you 
> (i.e., what I wrote in the previous sentence):
> 
> -
> rule=shard:*,replica:1,sysprop.AWSAZ:AZ1
> &rule=shard:*,replica:1,sysprop.AWSAZ:AZ2
> &rule=shard:*,replica:1,sysprop.AWSAZ:AZ3
> -

Have you tried ^^ ?

--
Steve
www.lucidworks.com



Re: Auto recovery of a failed Solr Cloud Node?

2018-09-25 Thread Erick Erickson
What does "Failed solr node" mean? How do you mean if fails? There's
lots of recovery built in for a replica that gets out-of-sync somehow
(is shut down while indexing is going on etc). All that relies on
having more than one replica per shard of course.

If the node completely dies due to hardware for instance, then yes the
best solution now is to spin up another Solr node. I'm not sure what
REPLACENODE does in this scenario.
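
(For reference, a REPLACENODE call looks roughly like this - the node names
below are placeholders, and whether it helps when the source node is already
dead is exactly the open question:

http://localhost:8983/solr/admin/collections?action=REPLACENODE&sourceNode=lostNode:8983_solr&targetNode=newNode:8983_solr
)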

If you're using HDFS there's an option to do this since the index is
replicated by HDFS.

Best,
Erick
On Tue, Sep 25, 2018 at 8:48 AM Kimber, Mike  wrote:
>
> Hi,
>
> Is there a recommended design pattern or best practice for auto recovery of a
> failed Solr node?
>
> Am I correct to assume there is nothing out of the box for this and we have
> to code our own solution?
>
> Thanks
>
> Michael Kimber
>
>
> This electronic message may contain proprietary and confidential information 
> of Verint Systems Inc., its affiliates and/or subsidiaries. The information 
> is intended to be for the use of the individual(s) or entity(ies) named 
> above. If you are not the intended recipient (or authorized to receive this 
> e-mail for the intended recipient), you may not use, copy, disclose or 
> distribute to anyone this message or any information contained in this 
> message. If you have received this electronic message in error, please notify 
> us by replying to this e-mail.


Re: Rule-based replication or sharing

2018-09-25 Thread Chuck Reynolds
Steve,

No doubt I confused you.  I'm confused myself.

When I said "replica set", what I was referring to was one of the three replicas 
of the data, each replica needing to be in a different AZ.

What is a "replica set”?  And why does each instance of Solr (referred to in 
the reference guide as a “node”, BTW) running on a server need to be in the 
same “replica set”?
What I should of said is each node on the server which there are three 
per server needs to be in the same AZ.

The bottom line is I guess I'm confused by the documentation and the reference 
to replicas. Normally when referring to replicas in the documentation it is 
referring to the number of times you want the data replicated. As in 
replication factor.  That's where the confusion was for me.

So let me ask this simple question.

If I want to create a rule that ensures that my replication factor of three
correctly shards the data across three AZs, so that if I were to lose one or
even two AZs in AWS, Solr would still have 1-2 copies of the data, how would
that rule work?



On 9/25/18, 10:17 AM, "Steve Rowe"  wrote:

Hi Chuck, see my replies inline below:

> On Sep 25, 2018, at 11:21 AM, Chuck Reynolds wrote:
> 
> So we have 90 servers in AWS, 30 servers per AZ.
> 90 shards for the cluster.
> Each server has three instances of Solr running on it so every instance 
on the server has to be in the same replica set.

You lost me here.  What is a "replica set”?  And why does each instance of 
Solr (referred to in the reference guide as a “node”, BTW) running on a server 
need to be in the same “replica set”?

(I’m guessing you have theorized that “replica:3” is a way of referring to 
"replica set #3”, but that’s incorrect; “replica:3” means that exactly 3 
replicas must be placed on the bucket of nodes you specify in the rule; more 
info below.)

> So for example shard 1 will have three replicas and each replica needs to 
be in a separate AZ.

Okay, I understand this part, but I fail to see how this is an example of 
your “replica set” assertion above.

> So does the rule of replica:>2 work?

I assume you did not mean ^^ literally, since you wrote “>” where I wrote 
“<“ in my previous response. 

I checked offline with Noble Paul, who wrote the rule-based replica 
placement feature, and he corrected a misunderstanding of mine:

> On 9/25/18, 9:08 AM, "Steve Rowe"  wrote:

> So you could specify “replica:<2”, which means that no node can host more 
than one replica, but it's acceptable for a node to host zero replicas.

But ^^ is incorrect. 

“replica:<2” means that either zero or one replica of each shard of the 
collection to be created may be hosted on the bucket of *all* of the nodes that 
have the specified AWSAZ sysprop value.  That is, when placing replicas, Solr 
will put either zero or one replica on one of the nodes in the bucket.  And 
AFAICT that’s not exactly what you want, since zero replicas of a shard on an AZ 
is not acceptable. 

> I just need all of the servers in an AZ to be in the same replica.  Does 
that make sense?

I’m not sure?  This sounds like something different from your above 
example: "shard 1 will have three replicas and each replica needs to be in a 
separate AZ.”

If you mean “exactly one Solr instance in an AZ must host exactly one 
replica of each shard of the collection”, then yes, that makes sense :).

Okay, one more try :) - here are the rules that should do the trick for you 
(i.e., what I wrote in the previous sentence):

-
 rule=shard:*,replica:1,sysprop.AWSAZ:AZ1
&rule=shard:*,replica:1,sysprop.AWSAZ:AZ2
&rule=shard:*,replica:1,sysprop.AWSAZ:AZ3
-

--
Steve

https://urldefense.proofpoint.com/v2/url?u=http-3A__www.lucidworks.com=DwIFaQ=kKqjBR9KKWaWpMhASkPbOg=J-2s3b-3-OTA0o6bGDhJXAQlB5Y3s4rOUxlh_78DJl0=e6E6P07QRR0Gn9oI2V1tk3sWdVDq5EF_tIgdoh4DxpE=Ddb5KOc_t4p64xyxt5rmqnWWwMcByecGQ2iJYv2BWiY=

> On 9/25/18, 9:08 AM, "Steve Rowe"  wrote:
> 
>Chuck,
> 
>The default Snitch is the one that’s used if you don’t specify one in 
a rule.  The sysprop.* tag is provided by the default Snitch.
> 
>The only thing that seems wrong to me in your rules is “replica:1”, 
“replica:2”, and “replica:3” - these say that exactly one, two, and three 
replicas of each shard, respectively, must be on each of the nodes that has the 
respective sysprop value.
> 
>Since these rules will apply to all nodes that match the sysprop 
value, you have to allow for the possibility that some nodes will have *zero* 
replicas of a shard.  So you could specify “replica:<2”, which means that no 
node can host more than one replica, but it's acceptable for a node to host 
zero replicas.
> 
>Did you set system property AWSAZ on each Solr node with an 
appropriate value?
> 
>--
 

Re: Rule-based replication or sharing

2018-09-25 Thread Steve Rowe
Hi Chuck, see my replies inline below:

> On Sep 25, 2018, at 11:21 AM, Chuck Reynolds  wrote:
> 
> So we have 90 servers in AWS, 30 servers per AZ.
> 90 shards for the cluster.
> Each server has three instances of Solr running on it so every instance on 
> the server has to be in the same replica set.

You lost me here.  What is a "replica set”?  And why does each instance of Solr 
(referred to in the reference guide as a “node”, BTW) running on a server need 
to be in the same “replica set”?

(I’m guessing you have theorized that “replica:3” is a way of referring to 
"replica set #3”, but that’s incorrect; “replica:3” means that exactly 3 
replicas must be placed on the bucket of nodes you specify in the rule; more 
info below.)

> So for example shard 1 will have three replicas and each replica needs to be 
> in a separate AZ.

Okay, I understand this part, but I fail to see how this is an example of your 
“replica set” assertion above.

> So does the rule of replica:>2 work?

I assume you did not mean ^^ literally, since you wrote “>” where I wrote “<“ 
in my previous response. 

I checked offline with Noble Paul, who wrote the rule-based replica placement 
feature, and he corrected a misunderstanding of mine:

> On 9/25/18, 9:08 AM, "Steve Rowe"  wrote:

> So you could specify “replica:<2”, which means that no node can host more 
> than one replica, but it's acceptable for a node to host zero replicas.

But ^^ is incorrect. 

“replica:<2” means that either zero or one replica of each shard of the 
collection to be created may be hosted on the bucket of *all* of the nodes that 
have the specified AWSAZ sysprop value.  That is, when placing replicas, Solr 
will put either zero or one replica on one of the nodes in the bucket.  And 
AFAICT that’s not exactly what you want, since zero replicas of a shard on an AZ 
is not acceptable. 

> I just need all of the servers in an AZ to be in the same replica.  Does that 
> make sense?

I’m not sure?  This sounds like something different from your above example: 
"shard 1 will have three replicas and each replica needs to be in a separate 
AZ.”

If you mean “exactly one Solr instance in an AZ must host exactly one replica 
of each shard of the collection”, then yes, that makes sense :).

Okay, one more try :) - here are the rules that should do the trick for you 
(i.e., what I wrote in the previous sentence):

-
 rule=shard:*,replica:1,sysprop.AWSAZ:AZ1
&rule=shard:*,replica:1,sysprop.AWSAZ:AZ2
&rule=shard:*,replica:1,sysprop.AWSAZ:AZ3
-
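
(For completeness, a sketch of a full CREATE call carrying those rules; the
collection name, counts, and config name below are placeholders, and each JVM
is assumed to have been started with -DAWSAZ=AZ1, AZ2, or AZ3 as appropriate:

http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=90&replicationFactor=3&collection.configName=myconfig&rule=shard:*,replica:1,sysprop.AWSAZ:AZ1&rule=shard:*,replica:1,sysprop.AWSAZ:AZ2&rule=shard:*,replica:1,sysprop.AWSAZ:AZ3
)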

--
Steve
www.lucidworks.com

> On 9/25/18, 9:08 AM, "Steve Rowe"  wrote:
> 
>Chuck,
> 
>The default Snitch is the one that’s used if you don’t specify one in a 
> rule.  The sysprop.* tag is provided by the default Snitch.
> 
>The only thing that seems wrong to me in your rules is “replica:1”, 
> “replica:2”, and “replica:3” - these say that exactly one, two, and three 
> replicas of each shard, respectively, must be on each of the nodes that has 
> the respective sysprop value.
> 
>Since these rules will apply to all nodes that match the sysprop value, 
> you have to allow for the possibility that some nodes will have *zero* 
> replicas of a shard.  So you could specify “replica:<2”, which means that no 
> node can host more than one replica, but it's acceptable for a node to host 
> zero replicas.
> 
>Did you set system property AWSAZ on each Solr node with an appropriate 
> value?
> 
>--
>Steve
>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.lucidworks.com=DwIFaQ=kKqjBR9KKWaWpMhASkPbOg=J-2s3b-3-OTA0o6bGDhJXAQlB5Y3s4rOUxlh_78DJl0=uG91WrgZB5UTKLOAB53AcrY5LyBsJ3VyBH8cN7xe2mU=9V6TJoE0h5NMjEWZ38ipa3zgYvoLJ1H9GHplSz1DJLU=
> 
>> On Sep 25, 2018, at 10:39 AM, Chuck Reynolds  wrote:
>> 
>> Steve,
>> 
>> I wasn't able to get the sysprop to work.  I think maybe there is a 
>> disconnect on my part.
>> 
>> From the documentation it looks like I can only use the sysprop tag if I'm 
>> using a Snitch.  Is that correct.
>> 
>> I can't find any example of anyone using the default Snitch.
>> 
>> Here is what I have for my rule:
>> rule=shard:*,replica:1,sysprop.AWSAZ:AZ1&rule=shard:*,replica:2,sysprop.AWSAZ:AZ2&rule=shard:*,replica:3,sysprop.AWSAZ:AZ3
>> 
>> I'm not specifying a snitch.  Is that my problem or is there a problem with 
>> my rule?
>> 
>> Thanks for your help.
>> On 9/21/18, 2:40 PM, "Steve Rowe"  wrote:
>> 
>>   Hi Chuck,
>> 
>>   One way to do it is to set a system property on the JVM running each Solr 
>> node, corresponding to the the AWS availability zone on which the node is 
>> hosted.
>> 
>>   For example, you could use sysprop “AWSAZ”, then use rules like:
>> 
>>  replica:<2,sysprop.AWSAZ:us-east-1
>>  replica:<2,sysprop.AWSAZ:us-west-1
>>  replica:<2,sysprop.AWSAZ:ca-central-1
>> 
>>   --
>>   Steve
>>   
>> 

Auto recovery of a failed Solr Cloud Node?

2018-09-25 Thread Kimber, Mike
Hi,

Is there a recommended design pattern or best practice for auto recovery of a 
failed Solr node?

Am I correct to assume there is nothing out of the box for this and we have to 
code our own solution?

Thanks

Michael Kimber


This electronic message may contain proprietary and confidential information of 
Verint Systems Inc., its affiliates and/or subsidiaries. The information is 
intended to be for the use of the individual(s) or entity(ies) named above. If 
you are not the intended recipient (or authorized to receive this e-mail for 
the intended recipient), you may not use, copy, disclose or distribute to 
anyone this message or any information contained in this message. If you have 
received this electronic message in error, please notify us by replying to this 
e-mail.


Re: Rule-based replication or sharing

2018-09-25 Thread Chuck Reynolds
Steve,

Yes, I set the system property AWSAZ, and I've checked the Java properties in 
Solr and I can see them.

It may be the way we are configuring Solr, so let me explain that first.

So we have 90 servers in AWS, 30 servers per AZ.
90 shards for the cluster.
Each server has three instances of Solr running on it, so every instance on the 
server has to be in the same replica set.
So for example, shard 1 will have three replicas, and each replica needs to be in 
a separate AZ.

So does the rule of replica:>2 work?

I just need all of the servers in an AZ to be in the same replica.  Does that 
make sense?

On 9/25/18, 9:08 AM, "Steve Rowe"  wrote:

Chuck,

The default Snitch is the one that’s used if you don’t specify one in a 
rule.  The sysprop.* tag is provided by the default Snitch.

The only thing that seems wrong to me in your rules is “replica:1”, 
“replica:2”, and “replica:3” - these say that exactly one, two, and three 
replicas of each shard, respectively, must be on each of the nodes that has the 
respective sysprop value.

Since these rules will apply to all nodes that match the sysprop value, you 
have to allow for the possibility that some nodes will have *zero* replicas of 
a shard.  So you could specify “replica:<2”, which means that no node can host 
more than one replica, but it's acceptable for a node to host zero replicas.

Did you set system property AWSAZ on each Solr node with an appropriate 
value?

--
Steve

https://urldefense.proofpoint.com/v2/url?u=http-3A__www.lucidworks.com=DwIFaQ=kKqjBR9KKWaWpMhASkPbOg=J-2s3b-3-OTA0o6bGDhJXAQlB5Y3s4rOUxlh_78DJl0=uG91WrgZB5UTKLOAB53AcrY5LyBsJ3VyBH8cN7xe2mU=9V6TJoE0h5NMjEWZ38ipa3zgYvoLJ1H9GHplSz1DJLU=

> On Sep 25, 2018, at 10:39 AM, Chuck Reynolds wrote:
> 
> Steve,
> 
> I wasn't able to get the sysprop to work.  I think maybe there is a 
disconnect on my part.
> 
> From the documentation it looks like I can only use the sysprop tag if 
I'm using a Snitch.  Is that correct.
> 
> I can't find any example of anyone using the default Snitch.
> 
> Here is what I have for my rule:
> 
rule=shard:*,replica:1,sysprop.AWSAZ:AZ1&rule=shard:*,replica:2,sysprop.AWSAZ:AZ2&rule=shard:*,replica:3,sysprop.AWSAZ:AZ3
> 
> I'm not specifying a snitch.  Is that my problem or is there a problem 
with my rule?
> 
> Thanks for your help.
> On 9/21/18, 2:40 PM, "Steve Rowe"  wrote:
> 
>Hi Chuck,
> 
>One way to do it is to set a system property on the JVM running each 
Solr node, corresponding to the the AWS availability zone on which the node is 
hosted.
> 
>For example, you could use sysprop “AWSAZ”, then use rules like:
> 
>   replica:<2,sysprop.AWSAZ:us-east-1
>   replica:<2,sysprop.AWSAZ:us-west-1
>   replica:<2,sysprop.AWSAZ:ca-central-1
> 
>--
>Steve
>
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.lucidworks.com=DwIFaQ=kKqjBR9KKWaWpMhASkPbOg=J-2s3b-3-OTA0o6bGDhJXAQlB5Y3s4rOUxlh_78DJl0=glt-Kw4TwOAGYMt6NB7R6qMysuNssE_CjJH46rL4tqo=6CzANqo-EwE1nnzHaTwr71MxQd7-im366kZUXznMKC8=
> 
>> On Sep 21, 2018, at 4:07 PM, Chuck Reynolds wrote:
>> 
>> I'm using Solr 6.6 and I want to create a 90 node cluster with a 
replication
>> factor of three.  I'm using AWS EC2 instances and I have a requirement to
>> replicate the data into 3 AWS availability zones.  
>> 
>> So 30 servers in each zone and I don't see a create collection rule that
>> will put one replica in each of the three zones.
>> 
>> What am I missing?
>> 
>> 
>> 
>> --
>> Sent from: 
https://urldefense.proofpoint.com/v2/url?u=http-3A__lucene.472066.n3.nabble.com_Solr-2DUser-2Df472068.html=DwIFaQ=kKqjBR9KKWaWpMhASkPbOg=J-2s3b-3-OTA0o6bGDhJXAQlB5Y3s4rOUxlh_78DJl0=glt-Kw4TwOAGYMt6NB7R6qMysuNssE_CjJH46rL4tqo=pnPq-r9xSpo7DZsgF-XgR0MyUIFNcaZpAI-xcX4HjCY=
> 
> 
> 





Re: Rule-based replication or sharing

2018-09-25 Thread Steve Rowe
Chuck,

The default Snitch is the one that’s used if you don’t specify one in a rule.  
The sysprop.* tag is provided by the default Snitch.

The only thing that seems wrong to me in your rules is “replica:1”, 
“replica:2”, and “replica:3” - these say that exactly one, two, and three 
replicas of each shard, respectively, must be on each of the nodes that has the 
respective sysprop value.

Since these rules will apply to all nodes that match the sysprop value, you 
have to allow for the possibility that some nodes will have *zero* replicas of 
a shard.  So you could specify “replica:<2”, which means that no node can host 
more than one replica, but it's acceptable for a node to host zero replicas.

Did you set system property AWSAZ on each Solr node with an appropriate value?

--
Steve
www.lucidworks.com

> On Sep 25, 2018, at 10:39 AM, Chuck Reynolds  wrote:
> 
> Steve,
> 
> I wasn't able to get the sysprop to work.  I think maybe there is a 
> disconnect on my part.
> 
> From the documentation it looks like I can only use the sysprop tag if I'm 
> using a Snitch.  Is that correct.
> 
> I can't find any example of anyone using the default Snitch.
> 
> Here is what I have for my rule:
> rule=shard:*,replica:1,sysprop.AWSAZ:AZ1&rule=shard:*,replica:2,sysprop.AWSAZ:AZ2&rule=shard:*,replica:3,sysprop.AWSAZ:AZ3
> 
> I'm not specifying a snitch.  Is that my problem or is there a problem with 
> my rule?
> 
> Thanks for your help.
> On 9/21/18, 2:40 PM, "Steve Rowe"  wrote:
> 
>Hi Chuck,
> 
>One way to do it is to set a system property on the JVM running each Solr 
> node, corresponding to the the AWS availability zone on which the node is 
> hosted.
> 
>For example, you could use sysprop “AWSAZ”, then use rules like:
> 
>   replica:<2,sysprop.AWSAZ:us-east-1
>   replica:<2,sysprop.AWSAZ:us-west-1
>   replica:<2,sysprop.AWSAZ:ca-central-1
> 
>--
>Steve
>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.lucidworks.com=DwIFaQ=kKqjBR9KKWaWpMhASkPbOg=J-2s3b-3-OTA0o6bGDhJXAQlB5Y3s4rOUxlh_78DJl0=glt-Kw4TwOAGYMt6NB7R6qMysuNssE_CjJH46rL4tqo=6CzANqo-EwE1nnzHaTwr71MxQd7-im366kZUXznMKC8=
> 
>> On Sep 21, 2018, at 4:07 PM, Chuck Reynolds  wrote:
>> 
>> I'm using Solr 6.6 and I want to create a 90 node cluster with a replication
>> factor of three.  I'm using AWS EC2 instances and I have a requirement to
>> replicate the data into 3 AWS availability zones.  
>> 
>> So 30 servers in each zone and I don't see a create collection rule that
>> will put one replica in each of the three zones.
>> 
>> What am I missing?
>> 
>> 
>> 
>> --
>> Sent from: 
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lucene.472066.n3.nabble.com_Solr-2DUser-2Df472068.html=DwIFaQ=kKqjBR9KKWaWpMhASkPbOg=J-2s3b-3-OTA0o6bGDhJXAQlB5Y3s4rOUxlh_78DJl0=glt-Kw4TwOAGYMt6NB7R6qMysuNssE_CjJH46rL4tqo=pnPq-r9xSpo7DZsgF-XgR0MyUIFNcaZpAI-xcX4HjCY=
> 
> 
> 



Querying with ConcurrentUpdateSolrClient

2018-09-25 Thread Jason Gerlowski
Hi all,

The Javadocs for ConcurrentUpdateSolrClient steer users away from
using it for query requests:

"Although any SolrClient request can be made with this implementation,
it is only recommended to use ConcurrentUpdateSolrClient with /update
requests. The class HttpSolrClient is better suited for the query
interface."

Looking at CUSC's code though, it immediately defers all non-update
requests to an internal HttpSolrClient.
(https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrClient.java#L477)
 I can't see how this would be any better or worse than using an
unwrapped HttpSolrClient instead.

Is there something I'm missing that changes how this internal
HttpSolrClient behaves?  Or is the advice in CUSC's javadocs maybe
outdated and ripe for removal?

Best,

Jason


Re: Rule-based replication or sharing

2018-09-25 Thread Chuck Reynolds
Steve,

I wasn't able to get the sysprop to work.  I think maybe there is a disconnect 
on my part.

From the documentation it looks like I can only use the sysprop tag if I'm 
using a Snitch.  Is that correct?

I can't find any example of anyone using the default Snitch.

Here is what I have for my rule:
rule=shard:*,replica:1,sysprop.AWSAZ:AZ1&rule=shard:*,replica:2,sysprop.AWSAZ:AZ2&rule=shard:*,replica:3,sysprop.AWSAZ:AZ3

I'm not specifying a snitch.  Is that my problem or is there a problem with my 
rule?

Thanks for your help.
On 9/21/18, 2:40 PM, "Steve Rowe"  wrote:

Hi Chuck,

One way to do it is to set a system property on the JVM running each Solr 
node, corresponding to the the AWS availability zone on which the node is 
hosted.

For example, you could use sysprop “AWSAZ”, then use rules like:

   replica:<2,sysprop.AWSAZ:us-east-1
   replica:<2,sysprop.AWSAZ:us-west-1
   replica:<2,sysprop.AWSAZ:ca-central-1

--
Steve

https://urldefense.proofpoint.com/v2/url?u=http-3A__www.lucidworks.com=DwIFaQ=kKqjBR9KKWaWpMhASkPbOg=J-2s3b-3-OTA0o6bGDhJXAQlB5Y3s4rOUxlh_78DJl0=glt-Kw4TwOAGYMt6NB7R6qMysuNssE_CjJH46rL4tqo=6CzANqo-EwE1nnzHaTwr71MxQd7-im366kZUXznMKC8=

> On Sep 21, 2018, at 4:07 PM, Chuck Reynolds wrote:
> 
> I'm using Solr 6.6 and I want to create a 90 node cluster with a 
replication
> factor of three.  I'm using AWS EC2 instances and I have a requirement to
> replicate the data into 3 AWS availability zones.  
> 
> So 30 servers in each zone and I don't see a create collection rule that
> will put one replica in each of the three zones.
> 
> What am I missing?
> 
> 
> 
> --
> Sent from: 
https://urldefense.proofpoint.com/v2/url?u=http-3A__lucene.472066.n3.nabble.com_Solr-2DUser-2Df472068.html=DwIFaQ=kKqjBR9KKWaWpMhASkPbOg=J-2s3b-3-OTA0o6bGDhJXAQlB5Y3s4rOUxlh_78DJl0=glt-Kw4TwOAGYMt6NB7R6qMysuNssE_CjJH46rL4tqo=pnPq-r9xSpo7DZsgF-XgR0MyUIFNcaZpAI-xcX4HjCY=





SPLITSHARD throwing OutOfMemory Error

2018-09-25 Thread Atita Arora
Hi,

I am working on a test setup with Solr 6.1.0 cloud, with 1 collection
sharded across 2 shards with no replication. When I trigger a SPLITSHARD
command it throws "java.lang.OutOfMemoryError: Java heap space" every time.
I tried this with multiple heap settings of 8, 12 and 20 GB, but every time
it does create the 2 sub-shards and then fails eventually.
I know the issue => https://jira.apache.org/jira/browse/SOLR-5214 has been
resolved but the trace looked very similar to this one.
Also, just to ensure that I do not run into the merge-related exceptions
reported in that ticket, I tried running optimize before proceeding with
splitting the shard.
I issued the following commands :

1.
http://localhost:8983/solr/admin/collections?collection=testcollection&shard=shard1&action=SPLITSHARD

This threw java.lang.OutOfMemoryError: Java heap space

2.
http://localhost:8983/solr/admin/collections?collection=testcollection&shard=shard1&action=SPLITSHARD&async=1000

Then I ran it with async=1000 and checked the status. Every time it creates
the sub-shards, but it does not split the index.
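
(By "checked the status" I mean polling the Collections API, e.g.:

http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=1000
)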

Is there something that I am not doing correctly?

Please guide.

Thanks,
Atita


Index Stats using Luke request in SOLR cloud

2018-09-25 Thread Sudip Mukherjee
Hi,

I am trying to get index stats using a Luke request and CloudSolrClient, but it 
is not returning index stats from all the shards of a collection.

Each request returns a different result from a different shard.

Is there a way I can get index statistics for a collection in SolrCloud?

For a Standalone server, we have /admin/cores API. Is there something similar 
to that?
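
(The Luke request I am sending is per-core, e.g. - the core name below is a
placeholder:

http://localhost:8983/solr/testcollection_shard1_replica1/admin/luke?numTerms=0
)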


Thanks & Regards,
Sudip Mukherjee
***Legal Disclaimer***
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**


Re: Solr index clearing

2018-09-25 Thread Jan Høydahl
Hi,

Solr does not do anything automatically, so I think this is a question for the 
Nutch community - http://nutch.apache.org/mailing_lists.html

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 24 Sep 2018, at 20:06, Bineesh wrote:
> 
> Team,
> 
> We use Solr 7.3.1 and Nutch 1.15.
> 
> I created two collections in Solr, and data was successfully indexed from Nutch
> after crawling. Upon indexing the third collection in Solr, I see that the first
> collection's indexed data automatically clears. Please suggest.
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Solr index clearing

2018-09-25 Thread Bineesh
Team,

We use Solr 7.3.1 and Nutch 1.15.

I created two collections in Solr, and data was successfully indexed from Nutch
after crawling. Upon indexing the third collection in Solr, I see that the first
collection's indexed data automatically clears. Please suggest.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html