Re: Something odd with async request status for BACKUP operation on Collections API

2018-10-14 Thread Shawn Heisey

On 10/14/2018 10:39 PM, Shalin Shekhar Mangar wrote:

The responses are collected by node so subsequent responses from the same
node overwrite previous responses. Definitely a bug. Please open an issue.


Done.

https://issues.apache.org/jira/browse/SOLR-12867

Thanks,
Shawn



Re: Something odd with async request status for BACKUP operation on Collections API

2018-10-14 Thread Shalin Shekhar Mangar
The responses are collected by node so subsequent responses from the same
node overwrite previous responses. Definitely a bug. Please open an issue.

On Mon, Oct 15, 2018 at 6:24 AM Shawn Heisey  wrote:

> On 10/14/2018 6:25 PM, dami...@gmail.com wrote:
> > I had an issue with async backup on solr 6.5.1 reporting that the backup
> > was complete when clearly it was not. I was using 12 shards across 6
> nodes.
> > I only noticed this issue when one shard was much larger than the others.
> > There were no answers here
> > http://lucene.472066.n3.nabble.com/async-backup-td4342776.html
>
> One detail I thought I had written but isn't there:  The backup did
> fully complete -- all 30 shards were in the backup location.  Not a lot
> in each shard backup -- the collection was empty.  It would be easy
> enough to add a few thousand documents to the collection before doing
> the backup.
>
> If the backup process reports that it's done before it's ACTUALLY done,
> that's a bad thing.  It's hard to say whether that problem is related to
> the problem I described.  Since I haven't dived into the code, I cannot
> say for sure, but it honestly would not surprise me to find they are
> connected.  Every time I try to understand Collections API code, I find
> it extremely difficult to follow.
>
> I'm sorry that you never got resolution on your problem.  Do you know
> whether that is still a problem in 7.x?  Setting up a reproduction where
> one shard is significantly larger than the others will take a little bit
> of work.
>
> > I was focusing on the STATUS returned from the REQUESTSTATUS command, but
> > looking again now I can see a response from only 6 shards, and each shard
> > is from a different node. So this fits with what you're seeing. I assume
> > your shards 1, 7, 9 are all on different nodes.
>
> I did not actually check, and the cloud example I was using isn't around
> any more, but each of the shards in the status response was PROBABLY on a
> separate node.  The cloud example was 3 nodes.  It's an easy enough
> scenario to replicate, and I provided enough details for anyone to do it.
>
> The person on IRC that reported this problem had a cluster of 15 nodes,
> and the status response had ten shards (out of 30) mentioned.  It was
> shards 1-9 and shard 20.  The suspicion is that there's something
> hard-coded that limits it to 10 responses ... because without that, I
> would expect the number of shards in the response to match the number of
> nodes.
>
> Thanks,
> Shawn
>
>

-- 
Regards,
Shalin Shekhar Mangar.


Re: Zookeeper external vs internal

2018-10-14 Thread Shawn Heisey

On 10/14/2018 9:31 PM, Sourav Moitra wrote:

My question is: does running a separate ZooKeeper ensemble on the same boxes
provide any advantage over using the Solr embedded ZooKeeper?


The major disadvantage to having ZK embedded in Solr is this:  If you 
stop or restart the Solr process, part of your ZK ensemble goes down 
too.  It is vastly preferable to have it running as a separate process, 
so that you can restart one of the services without causing disruption 
in the other service.
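
For illustration, a minimal sketch of what running against an external ensemble
looks like; the hostnames and chroot below are assumptions, not taken from this
thread:

# start each Solr node in cloud mode against the external ensemble
bin/solr start -c -z "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr"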


Thanks,
Shawn



Zookeeper external vs internal

2018-10-14 Thread Sourav Moitra
Hello,

As per the documentation, it is preferable to use an external ZooKeeper
service. I am provisioning 3 Solr servers, each running Solr 7.5 in cloud
mode with a separate ZooKeeper daemon process. The ZooKeeper instances on
these boxes are configured to form an ensemble among themselves.

My question is: does running a separate ZooKeeper ensemble on the same boxes
provide any advantage over using the Solr embedded ZooKeeper?
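
For reference, a minimal sketch of the ensemble configuration on each box
(server IDs, hostnames, and paths are assumptions, not from my actual setup):

# zoo.cfg fragment, identical on all three boxes; each box also needs a
# myid file (containing 1, 2, or 3) in dataDir
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=solr1.example.com:2888:3888
server.2=solr2.example.com:2888:3888
server.3=solr3.example.com:2888:3888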


Sourav Moitra
https://souravmoitra.com


Re: Something odd with async request status for BACKUP operation on Collections API

2018-10-14 Thread Shawn Heisey

On 10/14/2018 6:25 PM, dami...@gmail.com wrote:

I had an issue with async backup on solr 6.5.1 reporting that the backup
was complete when clearly it was not. I was using 12 shards across 6 nodes.
I only noticed this issue when one shard was much larger than the others.
There were no answers here
http://lucene.472066.n3.nabble.com/async-backup-td4342776.html


One detail I thought I had written but isn't there:  The backup did 
fully complete -- all 30 shards were in the backup location.  Not a lot 
in each shard backup -- the collection was empty.  It would be easy 
enough to add a few thousand documents to the collection before doing 
the backup.


If the backup process reports that it's done before it's ACTUALLY done, 
that's a bad thing.  It's hard to say whether that problem is related to 
the problem I described.  Since I haven't dived into the code, I cannot 
say for sure, but it honestly would not surprise me to find they are 
connected.  Every time I try to understand Collections API code, I find 
it extremely difficult to follow.


I'm sorry that you never got resolution on your problem.  Do you know 
whether that is still a problem in 7.x?  Setting up a reproduction where 
one shard is significantly larger than the others will take a little bit 
of work.



I was focusing on the STATUS returned from the REQUESTSTATUS command, but
looking again now I can see a response from only 6 shards, and each shard
is from a different node. So this fits with what you're seeing. I assume
your shards 1, 7, 9 are all on different nodes.


I did not actually check, and the cloud example I was using isn't around 
any more, but each of the shards in the status response was PROBABLY on a
separate node.  The cloud example was 3 nodes.  It's an easy enough
scenario to replicate, and I provided enough details for anyone to do it.


The person on IRC that reported this problem had a cluster of 15 nodes, 
and the status response had ten shards (out of 30) mentioned.  It was 
shards 1-9 and shard 20.  The suspicion is that there's something 
hard-coded that limits it to 10 responses ... because without that, I 
would expect the number of shards in the response to match the number of 
nodes.


Thanks,
Shawn



Re: Something odd with async request status for BACKUP operation on Collections API

2018-10-14 Thread damienk
Hi Shawn,

I had an issue with async backup on solr 6.5.1 reporting that the backup
was complete when clearly it was not. I was using 12 shards across 6 nodes.
I only noticed this issue when one shard was much larger than the others.
There were no answers here
http://lucene.472066.n3.nabble.com/async-backup-td4342776.html

I was focusing on the STATUS returned from the REQUESTSTATUS command, but
looking again now I can see a response from only 6 shards, and each shard
is from a different node. So this fits with what you're seeing. I assume
your shards 1, 7, 9 are all on different nodes.

HTH,
Damien.


On Sat, 13 Oct 2018 at 02:28, Shawn Heisey  wrote:

> I'm working on reproducing a problem reported via the IRC channel.
>
> Started a test cloud with 7.5.0. Initially with two nodes, then again
> with 3 nodes.  Did this on Windows 10.
>
> Command to create a collection:
>
> bin\solr create -c test2 -shards 30 -replicationFactor 2
>
> For these URLs, I dropped them into a browser, so URL encoding was
> handled automatically.  I'm sure the URL to start the backup wouldn't
> work as-is with curl because it includes characters that need encoding.
>
> Backup URL:
>
>
> http://localhost:8983/solr/admin/collections?action=BACKUP&name=test2.3&collection=test2&location=C:\Users\elyograg\Downloads\solrbackups&async=sometag
>
> Request status URL:
>
>
> http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=sometag
>
> Here's the raw JSON response from the status URL:
> {
>"responseHeader":{
>  "status":0,
>  "QTime":3},
>"success":{
>  "192.168.56.1:7574_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":0}},
>  "192.168.56.1:8983_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":2}},
>  "192.168.56.1:8983_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":0}},
>  "192.168.56.1:8983_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":2}},
>  "192.168.56.1:8983_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":0}},
>  "192.168.56.1:7574_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":0}},
>  "192.168.56.1:7574_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":1}},
>  "192.168.56.1:8983_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":35}},
>  "192.168.56.1:8983_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":0}},
>  "192.168.56.1:8983_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":1}},
>  "192.168.56.1:8983_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":1}},
>  "192.168.56.1:8983_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":0}},
>  "192.168.56.1:8983_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":33}},
>  "192.168.56.1:8983_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":34}},
>  "192.168.56.1:8983_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":0}},
>  "192.168.56.1:8983_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":40}},
>  "192.168.56.1:8984_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":2}},
>  "192.168.56.1:8984_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":2}},
>  "192.168.56.1:7574_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":0}},
>  "192.168.56.1:8983_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":0}},
>  "192.168.56.1:7574_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":0}},
>  "192.168.56.1:7574_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":0}},
>  "192.168.56.1:8983_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":0}},
>  "192.168.56.1:7574_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":0}},
>  "192.168.56.1:8984_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":0}},
>  "192.168.56.1:8984_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":0}},
>  "192.168.56.1:7574_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":0}},
>  "192.168.56.1:8983_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":0}},
>  "192.168.56.1:8983_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":0}},
>  "192.168.56.1:8983_solr":{
>"responseHeader":{
>  "status":0,
>  "QTime":1}}},
>"sometag135341573915254":{
>  "responseHeader":{
>"status":0,
>"QTime":0},
>  "STATUS":"completed",
>  "Response":"TaskId: 
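
Regarding the note above that the backup URL wouldn't work as-is with curl: a
minimal sketch of an equivalent call that lets curl do the URL encoding
(parameter values assumed to match the backup URL above):

curl -G "http://localhost:8983/solr/admin/collections" \
     --data-urlencode "action=BACKUP" \
     --data-urlencode "name=test2.3" \
     --data-urlencode "collection=test2" \
     --data-urlencode "location=C:\Users\elyograg\Downloads\solrbackups" \
     --data-urlencode "async=sometag"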

Solr Question

2018-10-14 Thread Joseph Costello - F&D Reports
I had a quick question regarding using Solr for doing fast geospatial
calculations against multiple locations.  For example, we have a product that
takes 2 to 10 companies at a time (e.g. McDonald's with 14,000 locations,
Subway with 20,000, Dunkin' Donuts with 5,000) and identifies and maps any
store overlap based on a radius anywhere between 0.1 and 20 miles.  As you're
probably aware, with this many locations, performing these calculations on the
fly just takes too long.  Our initial solution was to process all distance
calculations via a nightly process so the system just needs to retrieve them
from the database.  This has for the most part worked really well and returns
results almost immediately, no matter how large the dataset.

I know that Solr is very fast, especially for geospatial queries, but is
there any way it will be faster doing millions of on-the-fly geospatial
calculations than having the calculations already done and just retrieving
them from the database?
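
For what it's worth, a minimal sketch of the kind of radius filter I understand
Solr supports, assuming a spatial field named store_location; the field name
and point are placeholders, and d is in kilometers (~32.2 km is about 20 miles):

q=*:*&fq={!geofilt sfield=store_location pt=40.7829,-73.9654 d=32.2}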

Regards,

Joe


Joseph Costello
Chief Information Officer

F&D Reports | Creditntell | ARMS
===
Information Clearinghouse Inc. & Market Service Inc.
310 East Shore Road, Great Neck, NY 11023
email: jose...@fdreports.com | Tel: 800.789.0123 
ext 112 | Cell: 516.263.6555 | 
www.informationclearinghouseinc.com




Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

2018-10-14 Thread Shawn Heisey

On 10/14/2018 6:32 AM, yasoobhaider wrote:

Memory Analyzer output:

One instance of "org.apache.solr.uninverting.FieldCacheImpl" loaded by
"org.eclipse.jetty.webapp.WebAppClassLoader @ 0x7f60f7b38658" occupies
61,234,712,560 (91.86%) bytes. The memory is accumulated in one instance of
"java.util.HashMap$Node[]" loaded by "".



But I also noticed that the field cache shown in the Solr UI has the same
entries for all collections on that Solr instance.

Ques 1. Is the field cache reset on commit? If so, is it reset when any of
the collections are committed? Or is it not reset at all and am I missing
something here?


ALL caches are invalidated when a new searcher is opened. Some of the 
caches support autowarming.  The field cache isn't one of them.  The 
documentCache also cannot be warmed ... but warming queryResultCache 
will populate documentCache.
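
As a sketch of how that looks in solrconfig.xml (the sizes here are
placeholders, not recommendations):

<!-- queryResultCache supports autowarming; documentCache does not, but is
     repopulated indirectly when queryResultCache warming runs queries -->
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512"/>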



Ques 2. Is there a way to reset/delete this cache every x minutes (the
current autocommit duration) irrespective of whether documents were added or
not?


Not that I know of.  Opening a new searcher is generally required to 
clear out Solr's caches.  Opening a new searcher requires a change to 
the index and a commit.  One thing you could do is have your indexing 
software insert/update a dummy document (with a special value in the 
uniqueKey field and all non-required fields missing) on a regular basis.
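
A minimal sketch of that idea, assuming the uniqueKey field is named "id" and
the collection is named c1 (both names are assumptions here):

# index a dummy document and open a new searcher
curl "http://localhost:8983/solr/c1/update?commit=true" \
     -H "Content-Type: application/json" \
     -d '[{"id":"dummy-searcher-refresh"}]'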



Other than this, I think the reason for huge heap usage (as others have
pointed out) is that we are not using docValues for any of the fields, and
we use a large number of fields in sorting functions (between 15-20 over all
queries combined). As the next step on this front, I will add new fields
with docvalues true and reindex the entire collection. Hopefully that will
help.


Yes, if you facet, do group queries, or sort on fields that do not have 
docValues, then Solr must build an uninverted index to do those things, 
and I think it uses the field cache for that. The docValues structure is 
the same info as an uninverted index, so Solr can just read it directly, 
rather than generating it and using heap memory.


Adding docValues will make your index bigger.  For fields with high 
cardinality, the increase could be large.
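
As an illustration, the kind of schema change involved; the field name and
type are assumptions (plong is from the 7.x default schema, use the equivalent
Trie type on 6.x):

<!-- a sort-only field backed by docValues instead of the field cache -->
<field name="price_sort" type="plong" indexed="false" stored="false" docValues="true"/>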



We use quite a few dynamic fields in sorting. There is no mention of using
docvalues with dynamic fields in the official documentation
(https://lucene.apache.org/solr/guide/6_6/docvalues.html).

Ques 3. Do docvalues work with dynamic fields or not? If they do, anything
in particular that I should look out for, like the cardinality of the field
(ie number of different x's in example_dynamic_field_x)?


The ONLY functional difference between a dynamic field and a "standard" 
field is that dynamic fields are not explicitly named in the schema, but 
use wildcard naming.  This difference is only at the Solr level.  At the 
Lucene level, there is NO difference at all.  A dynamic field can have 
any of the same attributes as other fields, including docValues.
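
A minimal sketch of that, with an assumed naming pattern:

<!-- any field matching *_sort_l gets docValues, exactly as a named field would -->
<dynamicField name="*_sort_l" type="plong" indexed="false" stored="false" docValues="true"/>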



Shawn, I've uploaded my configuration files for the two collections here:
https://ufile.io/u6oe0 (tar -zxvf c1a_confs.tar.gz to decompress)

c1 collection is ~10GB when optimized, and has 2.5 million documents.
ca collection is ~2GB when optimized, and has 9.5 million documents.

Please let me know if you think there is something amiss in the
configuration that I should fix.


I think your autowarm counts, particularly on the filterCache, are 
probably too large.  But if commits that open a new searcher are 
happening quickly, you probably won't need to fiddle with that.  You can 
check the admin UI "plugins/stats" area to see how long it takes to open 
the last searcher, and the caches have information about how long it 
took to warm.
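
For reference, a sketch of a filterCache definition with a smaller autowarm
count; the numbers are placeholders to tune, not recommendations from this
thread:

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="16"/>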


You have increased ramBufferSizeMB. The default value is 100, and for 
most indexes, increasing it just consumes memory without making things 
work any faster.  Increasing it a little bit might make a difference on 
your c1 collection, since its documents are a bit larger than what I'd 
call typical.


Here's what I would recommend you use for autoCommit (removing maxDocs, 
lowering maxTime, and setting openSearcher to false):


    
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

Opening a new searcher on autoCommit isn't necessary if you configure 
autoSoftCommit, and disabling it will make those commits faster.  You 
*do* want autoCommit configured with a fairly short interval, so don't 
remove it.  Not configuring maxDocs makes the operation more predictable.


For autoSoftCommit, 30 seconds is pretty aggressive, but as long as 
commits are happening very quickly, shouldn't be a problem.  If commits 
are taking more than a few seconds, I would increase the interval.  The 
autoSoftCommit interval in the config of one of your collections is set to
an hour ... if you're not overriding that with the
solr.autoSoftCommit.maxTime property, you could decrease that one.
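
For completeness, a sketch of an autoSoftCommit block that honors that
property, with an assumed 60-second fallback:

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
</autoSoftCommit>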


Thanks,
Shawn



Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

2018-10-14 Thread yasoobhaider
After none of the JVM configuration options helped with GC, as Erick
suggested I took a heap dump of one of the misbehaving slaves and analysis
shows that fieldcache is using a large amount of the the total heap.

Memory Analyzer output:

One instance of "org.apache.solr.uninverting.FieldCacheImpl" loaded by
"org.eclipse.jetty.webapp.WebAppClassLoader @ 0x7f60f7b38658" occupies
61,234,712,560 (91.86%) bytes. The memory is accumulated in one instance of
"java.util.HashMap$Node[]" loaded by "".

Hypotheses:

Without regular indexing, commits are not happening, so the searcher is not
being reopened, and the field cache is not being reset. Since there is only one
instance of this field cache, it is a live object and is not being cleaned up
by GC.

But I also noticed that the field cache shown in the Solr UI has the same
entries for all collections on that Solr instance.

Ques 1. Is the field cache reset on commit? If so, is it reset when any of
the collections are committed? Or is it not reset at all and am I missing
something here?
Ques 2. Is there a way to reset/delete this cache every x minutes (the
current autocommit duration) irrespective of whether documents were added or
not?

Other than this, I think the reason for huge heap usage (as others have
pointed out) is that we are not using docValues for any of the fields, and
we use a large number of fields in sorting functions (between 15-20 over all
queries combined). As the next step on this front, I will add new fields
with docvalues true and reindex the entire collection. Hopefully that will
help.

We use quite a few dynamic fields in sorting. There is no mention of using
docvalues with dynamic fields in the official documentation
(https://lucene.apache.org/solr/guide/6_6/docvalues.html). 

Ques 3. Do docvalues work with dynamic fields or not? If they do, anything
in particular that I should look out for, like the cardinality of the field
(ie number of different x's in example_dynamic_field_x)?

Shawn, I've uploaded my configuration files for the two collections here:
https://ufile.io/u6oe0 (tar -zxvf c1a_confs.tar.gz to decompress)

c1 collection is ~10GB when optimized, and has 2.5 million documents.
ca collection is ~2GB when optimized, and has 9.5 million documents.

Please let me know if you think there is something amiss in the
configuration that I should fix.

Thanks
Yasoob



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html