Re: number of documents exceed 2147483519

2020-03-17 Thread Hongxu Ma
I was trying "SPLITSHARD" in my test env and ran into a strange behavior:
I created a 1M-doc collection with 8 shards, then split shard1.
After the split:

  *   SPLITSHARD returned success.
  *   Looks good:
     *   shard1 (became inactive) -> shard1_0 and shard1_1
     *   range 8000-9fff -> 8000-8fff and 9000-9fff
  *   But:
     *   the doc count increased: docnum of shard1_0 (70702) + docnum of shard1_1 (67980) = 138682 > docnum of shard1 (124818)

I tested many times and this issue happened every time. Why?
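For reference, here is roughly how I run the split and compare the per-shard doc counts (a minimal sketch in Python; the host, port, collection name, and replica core names below are placeholders for my test env, not the real ones):

    import json
    import urllib.request

    # Placeholders for my test env; real core names can be read from CLUSTERSTATUS.
    BASE = "http://localhost:8983/solr"
    COLLECTION = "test1m"

    def get_json(url):
        with urllib.request.urlopen(url) as resp:
            return json.loads(resp.read().decode("utf-8"))

    # 1. Split shard1 via the Collections API.
    print(get_json(f"{BASE}/admin/collections?action=SPLITSHARD"
                   f"&collection={COLLECTION}&shard=shard1&wt=json"))

    # 2. Count docs per shard by querying each replica core directly (distrib=false).
    for shard in ("shard1", "shard1_0", "shard1_1"):
        core = f"{COLLECTION}_{shard}_replica1"   # default naming; may differ
        result = get_json(f"{BASE}/{core}/select?q=*:*&rows=0&distrib=false&wt=json")
        print(shard, result["response"]["numFound"])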

Thanks.








________
From: Hongxu Ma 
Sent: Monday, March 16, 2020 16:46
To: solr-user@lucene.apache.org 
Subject: number of documents exceed 2147483519

Hi
I'm using SolrCloud (ver 6.6) and got an error:
org.apache.solr.common.SolrException: Exception writing document id (null) to 
the index; possible analysis error: number of documents in the index cannot 
exceed 2147483519

After googling it, I learned this number is the limit of a single Solr shard (Lucene index).
The collection has 64 shards, so I think the total limit is ~2B * 64 = ~128B.

My question is:
I don't want to recreate the index (and then split it into more shards), and I
also don't want to delete docs.
Can I use the "SPLITSHARD" API to fix this issue?
https://lucene.apache.org/solr/guide/6_6/collections-api.html#CollectionsAPI-splitshard

After splitting each shard (giving 128 shards), I think the total limit increases
to ~256B, right?
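For my own reference, the arithmetic behind these totals (the 2147483519 figure is Lucene's per-index cap, Integer.MAX_VALUE - 128, and it counts deleted-but-not-yet-merged docs too), sketched in Python:

    # Lucene's per-core/per-shard cap: IndexWriter.MAX_DOCS = Integer.MAX_VALUE - 128.
    MAX_DOCS_PER_SHARD = 2_147_483_519

    for shards in (64, 128):
        total = shards * MAX_DOCS_PER_SHARD
        print(f"{shards:3d} shards -> ~{total / 1e9:.0f} billion docs theoretical max")
    #  64 shards -> ~137 billion docs theoretical max
    # 128 shards -> ~275 billion docs theoretical max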

Thanks.






number of documents exceed 2147483519

2020-03-16 Thread Hongxu Ma
Hi
I'm using SolrCloud (ver 6.6) and got an error:
org.apache.solr.common.SolrException: Exception writing document id (null) to 
the index; possible analysis error: number of documents in the index cannot 
exceed 2147483519

After googling it, I learned this number is the limit of a single Solr shard (Lucene index).
The collection has 64 shards, so I think the total limit is ~2B * 64 = ~128B.

My question is:
I don't want to recreate the index (and then split it into more shards), and I
also don't want to delete docs.
Can I use the "SPLITSHARD" API to fix this issue?
https://lucene.apache.org/solr/guide/6_6/collections-api.html#CollectionsAPI-splitshard

After splitting each shard (giving 128 shards), I think the total limit increases
to ~256B, right?

Thanks.






Re: A question about solr filter cache

2020-02-17 Thread Hongxu Ma
@Vadim Ivanov

Thank you!

From: Vadim Ivanov 
Sent: Tuesday, February 18, 2020 15:27
To: solr-user@lucene.apache.org 
Subject: RE: A question about solr filter cache

Hi!
Yes, it may depend on the Solr version.
The Solr 8.3 Admin filterCache page stats look like:

stats:
CACHE.searcher.filterCache.cleanupThread:false
CACHE.searcher.filterCache.cumulative_evictions:0
CACHE.searcher.filterCache.cumulative_hitratio:0.94
CACHE.searcher.filterCache.cumulative_hits:198
CACHE.searcher.filterCache.cumulative_idleEvictions:0
CACHE.searcher.filterCache.cumulative_inserts:12
CACHE.searcher.filterCache.cumulative_lookups:210
CACHE.searcher.filterCache.evictions:0
CACHE.searcher.filterCache.hitratio:1
CACHE.searcher.filterCache.hits:84
CACHE.searcher.filterCache.idleEvictions:0
CACHE.searcher.filterCache.inserts:0
CACHE.searcher.filterCache.lookups:84
CACHE.searcher.filterCache.maxRamMB:-1
CACHE.searcher.filterCache.ramBytesUsed:70768
CACHE.searcher.filterCache.size:12
CACHE.searcher.filterCache.warmupTime:1
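If you want to script it, the same numbers (including ramBytesUsed) are also exposed through the Metrics API. A minimal sketch, assuming Solr 8.x and a placeholder host/port:

    import json
    import urllib.request

    # Placeholder host/port; one entry is returned per core on the node.
    URL = ("http://localhost:8983/solr/admin/metrics"
           "?group=core&prefix=CACHE.searcher.filterCache&wt=json")

    with urllib.request.urlopen(URL) as resp:
        metrics = json.loads(resp.read().decode("utf-8"))["metrics"]

    for core, stats in metrics.items():
        cache = stats.get("CACHE.searcher.filterCache", {})
        print(core,
              "ramBytesUsed:", cache.get("ramBytesUsed"),
              "size:", cache.get("size"),
              "hitratio:", cache.get("hitratio"))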

> -----Original Message-----
> From: Hongxu Ma [mailto:inte...@outlook.com]
> Sent: Tuesday, February 18, 2020 5:32 AM
> To: solr-user@lucene.apache.org
> Subject: Re: A question about solr filter cache
>
> @Erick Erickson<mailto:erickerick...@gmail.com> and @Mikhail Khludnev
>
> got it, the explanation is very clear.
>
> Thank you for your help.
> ____
> From: Hongxu Ma 
> Sent: Tuesday, February 18, 2020 10:22
> To: Vadim Ivanov ; solr-
> u...@lucene.apache.org 
> Subject: Re: A question about solr filter cache
>
> Thank you @Vadim Ivanov
> I know that admin page, but I cannot find the memory usage of the filter cache
> (it only has "CACHE.searcher.filterCache.size", which I think is the number of
> used slots in the filterCache).
>
> Here is my output (Solr version 7.3.1):
>
> filterCache
>
>   *   class: org.apache.solr.search.FastLRUCache
>   *   description: Concurrent LRU Cache(maxSize=512, initialSize=512, minSize=460, acceptableSize=486, cleanupThread=false)
>   *   stats:
>      *   CACHE.searcher.filterCache.cumulative_evictions: 0
>      *   CACHE.searcher.filterCache.cumulative_hitratio: 0.5
>      *   CACHE.searcher.filterCache.cumulative_hits: 1
>      *   CACHE.searcher.filterCache.cumulative_inserts: 1
>      *   CACHE.searcher.filterCache.cumulative_lookups: 2
>      *   CACHE.searcher.filterCache.evictions: 0
>      *   CACHE.searcher.filterCache.hitratio: 0.5
>      *   CACHE.searcher.filterCache.hits: 1
>      *   CACHE.searcher.filterCache.inserts: 1
>      *   CACHE.searcher.filterCache.lookups: 2
>      *   CACHE.searcher.filterCache.size: 1
>      *   CACHE.searcher.filterCache.warmupTime: 0
>
>
>
> 
> From: Vadim Ivanov 
> Sent: Monday, February 17, 2020 17:51
> To: solr-user@lucene.apache.org 
> Subject: RE: A question about solr filter cache
>
> You can easily check the amount of RAM used by a core's filterCache in the Admin UI:
> choose the core - Plugins/Stats - Cache - filterCache. It shows useful information
> on configuration, statistics and current RAM usage by the filter cache, as well as
> some examples of current filter caches in RAM. A core with, for example, 10 mln
> docs uses about 1.3 MB of RAM for each filterCache entry.
>
>
> > -----Original Message-----
> > From: Hongxu Ma [mailto:inte...@outlook.com]
> > Sent: Monday, February 17, 2020 12:13 PM
> > To: solr-user@lucene.apache.org
> > Subject: A question about solr filter cache
> >
> > Hi
> > I want to know the internals of the Solr filter cache, especially its memory
> > usage.
> >
> > I googled some pages:
> > https://teaspoon-consulting.com/articles/solr-cache-tuning.html
> > https://lucene.472066.n3.nabble.com/Solr-Filter-Cache-Size-td4120912.html
> > (Erick Erickson's answer)
> >
> > All of them said its structure is: fq => a bitmap (total doc number bits),
> > but I think it's not so simple. Reason:
> > Given a total doc number of 1 billion, each filter cache entry will use
> > nearly 125 MB (10^9 bits / 8); that's too big and could easily make Solr OOM
> > (yet I have a 1 billion doc cluster and it looks like it works well).
> >
> > I also checked the Solr code, but could not find the details (only saw that
> > it uses the DocSet structure).
> >
> > So far, I guess:
> >
> >   *   it degenerates into a doc id array/list when the bitmap is sparse
> >   *   it uses some compressed bitmap, e.g. roaring bitmaps
> >
> > Which one is correct? Or is there another answer? Thank you very much!
>




Re: A question about solr filter cache

2020-02-17 Thread Hongxu Ma
@Erick Erickson and @Mikhail Khludnev

got it, the explanation is very clear.

Thank you for your help.

From: Hongxu Ma 
Sent: Tuesday, February 18, 2020 10:22
To: Vadim Ivanov ; 
solr-user@lucene.apache.org 
Subject: Re: A question about solr filter cache

Thank you @Vadim Ivanov
I know that admin page, but I cannot find the memory usage of the filter cache
(it only has "CACHE.searcher.filterCache.size", which I think is the number of
used slots in the filterCache).

Here is my output (Solr version 7.3.1):

filterCache

  *   class: org.apache.solr.search.FastLRUCache
  *   description: Concurrent LRU Cache(maxSize=512, initialSize=512, minSize=460, acceptableSize=486, cleanupThread=false)
  *   stats:
     *   CACHE.searcher.filterCache.cumulative_evictions: 0
     *   CACHE.searcher.filterCache.cumulative_hitratio: 0.5
     *   CACHE.searcher.filterCache.cumulative_hits: 1
     *   CACHE.searcher.filterCache.cumulative_inserts: 1
     *   CACHE.searcher.filterCache.cumulative_lookups: 2
     *   CACHE.searcher.filterCache.evictions: 0
     *   CACHE.searcher.filterCache.hitratio: 0.5
     *   CACHE.searcher.filterCache.hits: 1
     *   CACHE.searcher.filterCache.inserts: 1
     *   CACHE.searcher.filterCache.lookups: 2
     *   CACHE.searcher.filterCache.size: 1
     *   CACHE.searcher.filterCache.warmupTime: 0




From: Vadim Ivanov 
Sent: Monday, February 17, 2020 17:51
To: solr-user@lucene.apache.org 
Subject: RE: A question about solr filter cache

You can easily check the amount of RAM used by a core's filterCache in the Admin UI:
choose the core - Plugins/Stats - Cache - filterCache.
It shows useful information on configuration, statistics and current RAM usage by
the filter cache, as well as some examples of current filter caches in RAM.
A core with, for example, 10 mln docs uses about 1.3 MB of RAM for each
filterCache entry.


> -----Original Message-----
> From: Hongxu Ma [mailto:inte...@outlook.com]
> Sent: Monday, February 17, 2020 12:13 PM
> To: solr-user@lucene.apache.org
> Subject: A question about solr filter cache
>
> Hi
> I want to know the internals of the Solr filter cache, especially its memory
> usage.
>
> I googled some pages:
> https://teaspoon-consulting.com/articles/solr-cache-tuning.html
> https://lucene.472066.n3.nabble.com/Solr-Filter-Cache-Size-td4120912.html
> (Erick Erickson's answer)
>
> All of them said its structure is: fq => a bitmap (total doc number bits),
> but I think it's not so simple. Reason:
> Given a total doc number of 1 billion, each filter cache entry will use
> nearly 125 MB (10^9 bits / 8); that's too big and could easily make Solr OOM
> (yet I have a 1 billion doc cluster and it looks like it works well).
>
> I also checked the Solr code, but could not find the details (only saw that
> it uses the DocSet structure).
>
> So far, I guess:
>
>   *   it degenerates into a doc id array/list when the bitmap is sparse
>   *   it uses some compressed bitmap, e.g. roaring bitmaps
>
> Which one is correct? Or is there another answer? Thank you very much!




Re: A question about solr filter cache

2020-02-17 Thread Hongxu Ma
Thank you @Vadim Ivanov
I know that admin page, but I cannot find the memory usage of the filter cache
(it only has "CACHE.searcher.filterCache.size", which I think is the number of
used slots in the filterCache).

Here is my output (Solr version 7.3.1):

filterCache

  *   class: org.apache.solr.search.FastLRUCache
  *   description: Concurrent LRU Cache(maxSize=512, initialSize=512, minSize=460, acceptableSize=486, cleanupThread=false)
  *   stats:
     *   CACHE.searcher.filterCache.cumulative_evictions: 0
     *   CACHE.searcher.filterCache.cumulative_hitratio: 0.5
     *   CACHE.searcher.filterCache.cumulative_hits: 1
     *   CACHE.searcher.filterCache.cumulative_inserts: 1
     *   CACHE.searcher.filterCache.cumulative_lookups: 2
     *   CACHE.searcher.filterCache.evictions: 0
     *   CACHE.searcher.filterCache.hitratio: 0.5
     *   CACHE.searcher.filterCache.hits: 1
     *   CACHE.searcher.filterCache.inserts: 1
     *   CACHE.searcher.filterCache.lookups: 2
     *   CACHE.searcher.filterCache.size: 1
     *   CACHE.searcher.filterCache.warmupTime: 0




From: Vadim Ivanov 
Sent: Monday, February 17, 2020 17:51
To: solr-user@lucene.apache.org 
Subject: RE: A question about solr filter cache

You can easily check the amount of RAM used by a core's filterCache in the Admin UI:
choose the core - Plugins/Stats - Cache - filterCache.
It shows useful information on configuration, statistics and current RAM usage by
the filter cache, as well as some examples of current filter caches in RAM.
A core with, for example, 10 mln docs uses about 1.3 MB of RAM for each
filterCache entry.


> -Original Message-----
> From: Hongxu Ma [mailto:inte...@outlook.com]
> Sent: Monday, February 17, 2020 12:13 PM
> To: solr-user@lucene.apache.org
> Subject: A question about solr filter cache
>
> Hi
> I want to know the internals of the Solr filter cache, especially its memory
> usage.
>
> I googled some pages:
> https://teaspoon-consulting.com/articles/solr-cache-tuning.html
> https://lucene.472066.n3.nabble.com/Solr-Filter-Cache-Size-td4120912.html
> (Erick Erickson's answer)
>
> All of them said its structure is: fq => a bitmap (total doc number bits),
> but I think it's not so simple. Reason:
> Given a total doc number of 1 billion, each filter cache entry will use
> nearly 125 MB (10^9 bits / 8); that's too big and could easily make Solr OOM
> (yet I have a 1 billion doc cluster and it looks like it works well).
>
> I also checked the Solr code, but could not find the details (only saw that
> it uses the DocSet structure).
>
> So far, I guess:
>
>   *   it degenerates into a doc id array/list when the bitmap is sparse
>   *   it uses some compressed bitmap, e.g. roaring bitmaps
>
> Which one is correct? Or is there another answer? Thank you very much!




A question about solr filter cache

2020-02-17 Thread Hongxu Ma
Hi
I want to know the internals of the Solr filter cache, especially its memory usage.

I googled some pages:
https://teaspoon-consulting.com/articles/solr-cache-tuning.html
https://lucene.472066.n3.nabble.com/Solr-Filter-Cache-Size-td4120912.html 
(Erick Erickson's answer)

All of them said its structure is: fq => a bitmap (total doc number bits), but
I think it's not so simple. Reason:
Given a total doc number of 1 billion, each filter cache entry will use nearly
125 MB (10^9 bits / 8); that's too big and could easily make Solr OOM (yet I have
a 1 billion doc cluster and it looks like it works well).

I also checked the Solr code, but could not find the details (only saw that it
uses the DocSet structure).

So far, I guess:

  *   it degenerates into a doc id array/list when the bitmap is sparse
  *   it uses some compressed bitmap, e.g. roaring bitmaps

Which one is correct? Or is there another answer? Thank you very much!
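For what it's worth, the rough per-entry sizes under the two representations I am guessing at (Solr does have BitDocSet and SortedIntDocSet classes; the exact cutoff between them is an implementation detail I am assuming, not quoting):

    # Back-of-the-envelope sizes for one filterCache entry under two representations.

    def bitmap_bytes(max_doc: int) -> int:
        """Dense bitmap (BitDocSet-style): one bit per document in the index."""
        return max_doc // 8

    def int_array_bytes(matching_docs: int) -> int:
        """Sparse sorted-int set (SortedIntDocSet-style): 4 bytes per matching doc."""
        return matching_docs * 4

    max_doc = 1_000_000_000
    print(f"dense bitmap over {max_doc:,} docs: ~{bitmap_bytes(max_doc) / 2**20:.0f} MB")
    print(f"sparse set of 100,000 matches: ~{int_array_bytes(100_000) / 2**10:.0f} KB")
    # dense bitmap over 1,000,000,000 docs: ~119 MB
    # sparse set of 100,000 matches: ~391 KB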



Question about the max num of solr node

2020-01-03 Thread Hongxu Ma
Hi community
I plan to set up a 128-host cluster: 2 solr nodes on each host.
But I have a little concern about whether solr can support so many nodes.

I searched on wiki and found:
https://cwiki.apache.org/confluence/display/SOLR/2019-11+Meeting+on+SolrCloud+and+project+health
"If you create thousands of collections, it’ll lock up and become inoperable.  
Scott reported that If you boot up a 100+ node cluster, SolrCloud won’t get to 
a happy state; currently you need to start them gradually."

I want to know:
Besides the quoted items, does Solr have known issues in a big cluster?
And does Solr have a hard limit on the maximum number of nodes?

Thanks.


Re: A question of solr recovery

2019-12-12 Thread Hongxu Ma
Thank you @Erick Erickson for your explanation! (although I don't fully
understand all the details).

I am using Solr 6.6, so I think there are only NRT replicas in this version, and
I now understand the whole recovery process.

Maybe I will upgrade to Solr 7+ in the future and try the new TLOG/PULL replica types.
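(For when I try it, a minimal sketch of creating such a collection via the Collections API on Solr 7+; the host, port, and names below are placeholders:)

    import json
    import urllib.request

    # Placeholders: adjust host/port/collection/config names.
    BASE = "http://localhost:8983/solr"
    params = (
        "action=CREATE&name=test_tlog"
        "&numShards=2"
        "&tlogReplicas=1"      # TLOG replicas: copy the index from the leader, keep a tlog
        "&pullReplicas=1"      # PULL replicas: only pull merged segments, no tlog
        "&collection.configName=_default"
        "&wt=json"
    )

    with urllib.request.urlopen(f"{BASE}/admin/collections?{params}") as resp:
        print(json.loads(resp.read().decode("utf-8")))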
Thanks.



From: Erick Erickson 
Sent: Thursday, December 12, 2019 22:49
To: Hongxu Ma 
Subject: Re: A question of solr recovery

If you’re using TLOG/PULL replica types, then only changed segments
are downloaded. That replication pattern has a very different
algorithm. The problem with NRT replicas is that segments on
different replicas may not contain the same documents (in fact,
almost all the time won’t). This is because the wall-clock time
that the autocommit interval expires at, which closes segments,
will be different due to network delays and the like. This was
a deliberate design choice to make indexing as fast as possible
in distributed situations. If the leader coordinated all the commits,
it’d introduce a delay, potentially quite long if, say, the leader
needed to wait for a timeout.

Even if commits were exactly synchronous over all replicas in a shard,
the leader indexes a document and forwards it to the replica. The
commit could expire on both while the doc was in-flight.

Best,
Erick

On Dec 12, 2019, at 5:37 AM, Hongxu Ma  wrote:

Thank you very much @Erick Erickson
It's very clear.

And I found my "full sync" log:
"IndexFetcher Total time taken for download 
(fullCopy=true,bytesDownloaded=178161685180) : 4377 secs (40704063 bytes/sec) 
to NIOFSDirectory@..."

One more question:
From the log, it looks like it downloaded all segment files (178GB), which is very
big and took a long time.
Is it possible to download only the segment files that contain the missing part?
Not all files would be needed, which might save time.

For example, here is my imagined algorithm (like a database does):
• recover from the local tlog as much as possible
• calculate the latest version
• only download the segment files which contain data > this version
Thanks.

From: Erick Erickson 
Sent: Wednesday, December 11, 2019 20:56
To: solr-user@lucene.apache.org 
Subject: Re: A question of solr recovery

Updates in this context are individual documents, either new ones
or a new version of an existing document. Long recoveries are
quite unlikely to be replaying a few documents from the tlog.

My bet is that you had to do a “full sync” (there should be messages
to that effect in the Solr log). This means that the replica had to
copy the entire index from the leader, and that varies with the size
of the index, network speed and contention, etc.

And to make it more complicated, and despite the comment about 100
docs and the tlog…. while that copy is going on, _new_ updates are
written to the tlog of the recovering replica and after the index
has been copied, those new updates are replayed locally. The 100
doc limit does _not_ apply in this case. So say the recovery starts
at time T and lasts for 60 seconds. All updates sent to the shard
leader over that 60 seconds are put in the local tlog and after the
copy is done, they’re replayed. And then, you guessed it, any
updates received by the leader over that 60 second period are written
to the recovering replica’s tlog and replayed… Under heavy
indexing loads, this can go on for quite a long time. Not certain
that’s what’s happening, but something to be aware of.

Best,
Erick

On Dec 10, 2019, at 10:39 PM, Hongxu Ma  wrote:

Hi all
In my cluster, a Solr node sometimes goes into a long recovery.
So I want to know more about recovery and have read a good blog:
https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

It mentioned in the recovery section:
"Replays the documents from its own tlog if < 100 new updates have been 
received by the leader. "

My question: what's the meaning of "updates"? commits? or documents?
I referred to the Solr code but am still not sure about it.

Hope you can help, thanks.




Re: A question of solr recovery

2019-12-12 Thread Hongxu Ma
Thank you very much @Erick Erickson
It's very clear.

And I found my "full sync" log:
"IndexFetcher Total time taken for download 
(fullCopy=true,bytesDownloaded=178161685180) : 4377 secs (40704063 bytes/sec) 
to NIOFSDirectory@..."
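(Sanity-checking the numbers in that log line:)

    bytes_downloaded = 178_161_685_180
    seconds = 4_377

    print(f"{bytes_downloaded / 1e9:.1f} GB in {seconds} s "
          f"= {bytes_downloaded / seconds / 1e6:.1f} MB/s")
    # 178.2 GB in 4377 s = 40.7 MB/s, matching the 40704063 bytes/sec in the log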

One more question:
From the log, it looks like it downloaded all segment files (178GB), which is very
big and took a long time.
Is it possible to download only the segment files that contain the missing part?
Not all files would be needed, which might save time.

For example, here is my imagined algorithm (like a database does):

  *   recover from the local tlog as much as possible
  *   calculate the latest version
  *   only download the segment files which contain data > this version

Thanks.


From: Erick Erickson 
Sent: Wednesday, December 11, 2019 20:56
To: solr-user@lucene.apache.org 
Subject: Re: A question of solr recovery

Updates in this context are individual documents, either new ones
or a new version of an existing document. Long recoveries are
quite unlikely to be replaying a few documents from the tlog.

My bet is that you had to do a “full sync” (there should be messages
to that effect in the Solr log). This means that the replica had to
copy the entire index from the leader, and that varies with the size
of the index, network speed and contention, etc.

And to make it more complicated, and despite the comment about 100
docs and the tlog…. while that copy is going on, _new_ updates are
written to the tlog of the recovering replica and after the index
has been copied, those new updates are replayed locally. The 100
doc limit does _not_ apply in this case. So say the recovery starts
at time T and lasts for 60 seconds. All updates sent to the shard
leader over that 60 seconds are put in the local tlog and after the
copy is done, they’re replayed. And then, you guessed it, any
updates received by the leader over that 60 second period are written
to the recovering replica’s tlog and replayed… Under heavy
indexing loads, this can go on for quite a long time. Not certain
that’s what’s happening, but something to be aware of.

Best,
Erick

> On Dec 10, 2019, at 10:39 PM, Hongxu Ma  wrote:
>
> Hi all
> In my cluster, a Solr node sometimes goes into a long recovery.
> So I want to know more about recovery and have read a good blog:
> https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> It mentioned in the recovery section:
> "Replays the documents from its own tlog if < 100 new updates have been 
> received by the leader. "
>
> My question: what's the meaning of "updates"? commits? or documents?
> I referred to the Solr code but am still not sure about it.
>
> Hope you can help, thanks.
>



A question of solr recovery

2019-12-10 Thread Hongxu Ma
Hi all
In my cluster, a Solr node sometimes goes into a long recovery.
So I want to know more about recovery and have read a good blog:
https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

It mentioned in the recovery section:
"Replays the documents from its own tlog if < 100 new updates have been 
received by the leader. "

My question: what's the meaning of "updates"? commits? or documents?
I referred to the Solr code but am still not sure about it.

Hope you can help, thanks.



Re: Question about startup memory usage

2019-11-14 Thread Hongxu Ma
Thank you @Shawn Heisey, you have helped me many times.

My -Xms is 1G.
When restarting Solr, I can see the memory usage increasing (from 1G to 9G, which
took nearly 10s).

I have a guess: maybe Solr is loading some needed files into heap memory, e.g.
*.tip (the term index file). What are your thoughts?
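(To separate JVM heap usage from what top reports as RES, I look at the JVM metrics; a minimal sketch, assuming Solr 6.4+ where the Metrics API exists, with a placeholder host/port:)

    import json
    import urllib.request

    # Placeholder host/port for the node being checked.
    URL = "http://localhost:8983/solr/admin/metrics?group=jvm&prefix=memory.heap&wt=json"

    with urllib.request.urlopen(URL) as resp:
        heap = json.loads(resp.read().decode("utf-8"))["metrics"]["solr.jvm"]

    # top's RES covers more than the heap: committed heap plus off-heap areas
    # (metaspace, thread stacks) and resident pages of memory-mapped index files.
    for key in ("memory.heap.init", "memory.heap.used",
                "memory.heap.committed", "memory.heap.max"):
        print(key, heap.get(key))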

thanks.



From: Shawn Heisey 
Sent: Thursday, November 14, 2019 1:15
To: solr-user@lucene.apache.org 
Subject: Re: Question about startup memory usage

On 11/13/2019 2:03 AM, Hongxu Ma wrote:
> I have a solr-cloud cluster with a big collection; after startup (without any
> search/index operations), its JVM memory usage is 9GB (via top: RES).
>
> Cluster and collection info:
> each host: 64G total mem, two solr nodes with -Xmx=15G
> collection: 9 billion docs in total (but each doc is very small: only some
> bytes), total size 3TB.
>
> My question is:
> Is the 9G mem usage after startup normal? If so, I am worried that follow-up
> index/search operations will cause an OOM error.
> And how can I reduce the memory usage? Maybe I should introduce more hosts
> with nodes, but besides this, is there any other solution?

With the "-Xmx=15G" option, you've told Java that it can use up to 15GB
for heap.  Its total resident memory usage is eventually going to reach
a little over 15GB and probably never go down.  This is how Java works.

The amount of memory that Java allocates immediately on program startup
is related to the -Xms setting.  Normally Solr uses the same number for
both -Xms and -Xmx, but that can be changed if you desire.  We recommend
using the same number.  If -Xms is smaller than -Xmx, Java may allocate
less memory as soon as it starts, then Solr is going to run through its
startup procedure.  We will not know exactly how much memory allocation
is going to occur when that happens ... but with billions of documents,
it's not going to be small.

Thanks,
Shawn


Question about startup memory usage

2019-11-13 Thread Hongxu Ma
Hi
I have a solr-cloud cluster with a big collection; after startup (without any
search/index operations), its JVM memory usage is 9GB (via top: RES).

Cluster and collection info:
each host: 64G total mem, two solr nodes with -Xmx=15G
collection: 9 billion docs in total (but each doc is very small: only some
bytes), total size 3TB.

My question is:
Is the 9G mem usage after startup normal? If so, I am worried that follow-up
index/search operations will cause an OOM error.
And how can I reduce the memory usage? Maybe I should introduce more hosts with
nodes, but besides this, is there any other solution?

Thanks.






Re: Question about "No registered leader" error

2019-09-19 Thread Hongxu Ma
@Shawn @Erick Thanks for your kind help!

There is no OOM log and I confirm that no OOM happened.

My ZK tickTime is set to 5000, so 5000*20 = 100s > 60s. I also checked the Solr
code: the leader waiting time of 4000ms is a constant and is not configurable.
(Why isn't it a configurable param?)
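(The arithmetic I am relying on, i.e. ZooKeeper's default session-timeout bounds of 2x to 20x tickTime:)

    tick_time_ms = 5000           # from zoo.cfg
    zk_client_timeout_ms = 60000  # what Solr requests as its ZK session timeout

    # By default ZooKeeper clamps the negotiated session timeout into
    # [2 * tickTime, 20 * tickTime] unless min/maxSessionTimeout are set explicitly.
    low, high = 2 * tick_time_ms, 20 * tick_time_ms
    negotiated = min(max(zk_client_timeout_ms, low), high)
    print(f"requested {zk_client_timeout_ms} ms -> negotiated {negotiated} ms "
          f"(bounds {low}-{high} ms)")
    # requested 60000 ms -> negotiated 60000 ms (bounds 10000-100000 ms)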

My Solr version is 7.3.1, -Xmx = 32GB (via the Solr UI, peak memory is 22GB).
I have already applied the CMS GC tuning (the params differ a little from your
wiki page).

I will try the following advice:

  *   lower heap size
  *   switch to G1 (with the same params as the wiki)
  *   try restarting one Solr node when this error happens.

Thanks again.


From: Shawn Heisey 
Sent: Wednesday, September 18, 2019 20:21
To: solr-user@lucene.apache.org 
Subject: Re: Question about "No registered leader" error

On 9/18/2019 6:11 AM, Shawn Heisey wrote:
> On 9/17/2019 9:35 PM, Hongxu Ma wrote:
>> My questions:
>>
>>*   Is this error possibly caused by a "long gc pause"? My solr
>> zkClientTimeout=60000
>
> It's possible.  I can't say for sure that this is the issue, but it
> might be.

A followup.  I was thinking about the interactions here.  It looks like
Solr only waits four seconds for the leader election, and both of the
pauses you mentioned are longer than that.

Four seconds is probably too short a time to wait, and I do not think
that timeout is configurable anywhere.

> What version of Solr do you have, and what is your max heap?  The CMS
> garbage collection that Solr 5.0 and later incorporate by default is
> pretty good.  My G1 settings might do slightly better, but the
> improvement won't be dramatic unless your existing commandline has
> absolutely no gc tuning at all.

That question will be important.  If you already have our CMS GC tuning,
switching to G1 probably is not going to solve this.  Lowering the max
heap might be the only viable solution in that case, and depending on
what you're dealing with, it will either be impossible or it will
require more servers.

Thanks,
Shawn


Question about "No registered leader" error

2019-09-17 Thread Hongxu Ma
Hi all
I got an error while doing an indexing operation:

"2019-09-18 02:35:44.427244 ... No registered leader was found after waiting 
for 4000ms , collection: foo slice: shard2"

Besides this, there is no other error in the Solr log.

Collection foo has 2 shards, so I checked their JVM GC logs:

  *   2019-09-18T02:34:08.252+: 150961.017: Total time for which 
application threads were stopped: 10.4617864 seconds, Stopping threads took: 
0.0005226 seconds

  *   2019-09-18T02:34:30.194+: 151014.108: Total time for which 
application threads were stopped: 44.4809415 seconds, Stopping threads took: 
0.0005976 seconds

I saw long GC pauses at around those time points.
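(A minimal sketch of how I scan the GC log for such pauses; the log path and threshold are just my own setup:)

    import re

    GC_LOG = "/path/to/solr_gc.log"   # placeholder for this node's GC log
    THRESHOLD_SECS = 5.0

    pause_re = re.compile(
        r"Total time for which application threads were stopped: ([\d.]+) seconds")

    with open(GC_LOG) as f:
        for line in f:
            m = pause_re.search(line)
            if m and float(m.group(1)) > THRESHOLD_SECS:
                print(line.rstrip())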

My questions:

  *   Is this error possibly caused by a "long gc pause"? My solr
zkClientTimeout=60000.
  *   If so, how can I prevent this error from happening? My thoughts: use the G1
collector (as in
https://cwiki.apache.org/confluence/display/SOLR/ShawnHeisey#ShawnHeisey-GCTuningforSolr)
 or enlarge zkClientTimeout again. What's your idea?


Thanks.



Re: Question: Solr perform well with thousands of replicas?

2019-09-04 Thread Hongxu Ma
Hi Erick
Thanks for your help.

Before visiting the wiki/mailing list, I already knew Solr is unstable with 1000+
collections and should be safe with 10~100 collections.
But in a specific env, what is the exact number at which Solr begins to become
unstable? I don't know.

So I am trying to deploy a test cluster to find that number and to push it higher
(to save cost).
That's my purpose: a quantitative analysis --> how many replicas can be supported
in my env?
After getting the number, I will adjust my application: (when it's near the max)
prevent the creation of too many indexes, or give a warning message to the user.


From: Erick Erickson 
Sent: Monday, September 2, 2019 21:20
To: solr-user@lucene.apache.org 
Subject: Re: Question: Solr perform well with thousands of replicas?

> why so many collections/replicas: it's our customer's needs, for example: each
> database table maps to a collection.

I always cringe when I see statements like this. What this means is that your 
customer doesn’t understand search and needs guidance in the proper use of any 
search technology, Solr included.

Solr is _not_ an RDBMS. Simply mapping the DB tables onto collections will 
almost certainly result in a poor experience. Next the customer will want to 
ask Solr to do the same thing a DB does, i.e. run a join across 10 tables etc., 
which will be abysmal. Solr isn’t designed for that. Some brilliant RDBMS 
people have spent many years making DBs to what they do and do it well.

That said, RDBMSs have poor search capabilities, they aren’t built to solve the 
search problem.

I suspect the time you spend making Solr load a thousand cores will be wasted. 
Once you do get them loaded, performance will be horrible. IMO you’d be far 
better off helping the customer define their problem so they properly model 
their search problem. This may mean that the result will be a hybrid where Solr 
is used for the free-text search and the RDBMS uses the results of the search 
to do something. Or vice versa.

FWIW
Erick

> On Sep 2, 2019, at 5:55 AM, Hongxu Ma  wrote:
>
> Thanks @Jörn and @Erick
> I enlarged my JVM memory and so far it's stable (but it uses a lot of memory).
> And I will check for lower-level errors according to your suggestion if an error
> happens.
>
> About my scenario:
>
>  *   why so many collections/replicas: it's our customer's needs, for example:
> each database table maps to a collection.
>  *   this env is just a test cluster: I want to verify the max number of
> collections Solr can support stably.
>
>
> 
> From: Erick Erickson 
> Sent: Friday, August 30, 2019 20:05
> To: solr-user@lucene.apache.org 
> Subject: Re: Question: Solr perform well with thousands of replicas?
>
> “no registered leader” is the effect of some problem usually, not the root 
> cause. In this case, for instance, you could be running out of file handles 
> and see other errors like “too many open files”. That’s just one example.
>
> One common problem is that Solr needs a lot of file handles and the system 
> defaults are too low. We usually recommend you start with 65K file handles 
> (ulimit) and bump up the number of processes to 65K too.
>
> So to throw some numbers out. With 1,000 replicas, and let’s say you have 50 
> segments in the index in each replica. Each segment consists of multiple 
> files (I’m skipping “compound files” here as an advanced topic), so each 
> segment has, let's say, 10 files. 1,000 * 50 * 10 would require 500,000 
> file handles on your system.
>
> Bottom line: look for other, lower-level errors in the log to try to 
> understand what limit you’re running into.
>
> All that said, there’ll be a number of “gotchas” when running that many 
> replicas on a particular node. I second Jörn's question...
>
> Best,
> Erick
>
>> On Aug 30, 2019, at 3:18 AM, Jörn Franke  wrote:
>>
>> What is the reason for this number of replicas? Solr should work fine, but
>> maybe it is worth consolidating some collections to also avoid administrative
>> overhead.
>>
>>> Am 29.08.2019 um 05:27 schrieb Hongxu Ma :
>>>
>>> Hi
>>> I have a solr-cloud cluster, but it's unstable when the collection count is
>>> big: 1000+ replicas/cores per solr node.
>>>
>>> To solve this issue, I have read the performance guide:
>>> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems
>>>
>>> I noted there is a sentence on solr-cloud section:
>>> "Recent Solr versions perform well with thousands of replicas."
>>>
>>> I want to know: does it mean a single solr node can handle thousands of
>>> replicas, or that a solr cluster can (and if so, what is the size of that
>>> cluster)?
>>>
>>> My solr version is 7.3.1 and 6.6.2 (they look the same in performance)
>>>
>>> Thanks for your help.
>>>
>



Re: Question: Solr perform well with thousands of replicas?

2019-09-02 Thread Hongxu Ma
Thanks @Jörn and @Erick
I enlarged my JVM memory and so far it's stable (but it uses a lot of memory).
And I will check for lower-level errors according to your suggestion if an error
happens.

About my scenario:

  *   why so many collections/replicas: it's our customer's needs, for example:
each database table maps to a collection.
  *   this env is just a test cluster: I want to verify the max number of
collections Solr can support stably.



From: Erick Erickson 
Sent: Friday, August 30, 2019 20:05
To: solr-user@lucene.apache.org 
Subject: Re: Question: Solr perform well with thousands of replicas?

“no registered leader” is the effect of some problem usually, not the root 
cause. In this case, for instance, you could be running out of file handles and 
see other errors like “too many open files”. That’s just one example.

One common problem is that Solr needs a lot of file handles and the system 
defaults are too low. We usually recommend you start with 65K file handles 
(ulimit) and bump up the number of processes to 65K too.

So to throw some numbers out. With 1,000 replicas, let's say you have 50
segments in the index in each replica. Each segment consists of multiple files
(I'm skipping "compound files" here as an advanced topic), so each segment has,
let's say, 10 files. 1,000 * 50 * 10 would require 500,000 file handles on
your system.
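(A quick way to see how close a node is to that limit is to count the Solr process's open descriptors; a minimal sketch, assuming Linux and a placeholder PID:)

    import os

    SOLR_PID = 12345  # placeholder: the Solr process id on this node

    open_fds = len(os.listdir(f"/proc/{SOLR_PID}/fd"))          # Linux-only
    with open(f"/proc/{SOLR_PID}/limits") as f:
        limit_line = next(l for l in f if l.startswith("Max open files"))

    # Rough estimate from the numbers above: replicas * segments * files per segment.
    estimated = 1_000 * 50 * 10

    print("open file descriptors:", open_fds)
    print(limit_line.strip())
    print("estimated need for 1,000 replicas:", estimated)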

Bottom line: look for other, lower-level errors in the log to try to understand 
what limit you’re running into.

All that said, there'll be a number of "gotchas" when running that many
replicas on a particular node. I second Jörn's question...

Best,
Erick

> On Aug 30, 2019, at 3:18 AM, Jörn Franke  wrote:
>
> What is the reason for this number of replicas? Solr should work fine, but
> maybe it is worth consolidating some collections to also avoid administrative
> overhead.
>
>> Am 29.08.2019 um 05:27 schrieb Hongxu Ma :
>>
>> Hi
>> I have a solr-cloud cluster, but it's unstable when the collection count is
>> big: 1000+ replicas/cores per solr node.
>>
>> To solve this issue, I have read the performance guide:
>> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems
>>
>> I noted there is a sentence on solr-cloud section:
>> "Recent Solr versions perform well with thousands of replicas."
>>
>> I want to know: does it mean a single solr node can handle thousands of
>> replicas, or that a solr cluster can (and if so, what is the size of that
>> cluster)?
>>
>> My solr version is 7.3.1 and 6.6.2 (they look the same in performance)
>>
>> Thanks for your help.
>>



Re: Question: Solr perform well with thousands of replicas?

2019-08-30 Thread Hongxu Ma
Hi guys
Thanks for your helpful replies!

More details about my env.
Cluster:
A cluster of 4 GCP (Google Cloud) hosts, each host: 16-core CPU, 60G mem, 2TB HDD.
I set up 2 solr nodes on each host and there are 1000+ replicas on each solr node.
(Sorry for forgetting this before: with 2 solr nodes on each host, there are 2000+
replicas per host...)
ZooKeeper has 3 instances, reusing the solr hosts (on a separate disk).
Workload:
just indexing tens of millions of records (total size near 100GB) into dozens
(near 100) of indexes, with 30 concurrent clients and no search operations at the
same time (I will do a search test later).
Error:
"unstable" means there are many solr errors in the log and solr requests fail,
e.g. "No registered leader was found after waiting for 4000ms , collection ..."

@ Hendrik
after seeing your reply, I noticed my replica count was too big, so I adjusted it
to 720 replicas on each host (reduced the shard count), and then all my index
requests succeeded. (happy)
But I saw the JVM peak mem usage is 24GB (via the solr web UI), which is big enough
to be risky in the future (my JVM -Xmx is 32GB).
So could you give me some guidance on reducing the memory usage? (like you
mentioned, "tuned a few caches down to a minimum")

@ Erick
I gave details above, please check.

@ Shawn
thanks for your info, it's a bad news...
hope solr-cloud can handle more collections in future.



From: Shawn Heisey 
Sent: Thursday, August 29, 2019 21:58
To: solr-user@lucene.apache.org 
Subject: Re: Question: Solr perform well with thousands of replicas?

On 8/28/2019 9:27 PM, Hongxu Ma wrote:
> I have a solr-cloud cluster, but it's unstable when the collection count is big:
> 1000+ replicas/cores per solr node.
>
> To solve this issue, I have read the performance guide:
> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems
>
> I noted there is a sentence on solr-cloud section:
> "Recent Solr versions perform well with thousands of replicas."

The SolrPerformanceProblems wiki page is my work.  I only wrote that
sentence because other devs working in SolrCloud code told me that was
the case.  Based on things said by people (including your comments on
this thread), I think newer versions probably aren't any better, and
that sentence needs to be removed from the wiki page.

See this issue that I created a few years ago:

https://issues.apache.org/jira/browse/SOLR-7191

This issue was closed with a 6.3 fix version ... but nothing was
committed with a tag for the issue, so I have no idea why it was closed.
  I think the problems described there are still there in recent Solr
versions, and MIGHT be even worse than they were in 4.x and 5.x.

> I want to know does it mean a single solr node can handle thousands of 
> replicas? or a solr cluster can (if so, what's the size of the cluster?)

A single standalone Solr instance can handle lots of indexes, but Solr
startup is probably going to be slow.

No matter how many nodes there are, SolrCloud has problems with
thousands of collections or replicas due to issues with the overseer
queue getting enormous.  When I created SOLR-7191, I found that
restarting a node in a cloud with thousands of replicas (cores) can
result in a performance death spiral.

I haven't ever administered a production setup with thousands of
indexes, I've only done some single machine testing for the issue I
created.  I need to repeat it with 8.x and see what happens.  But I have
very little free time these days.

Thanks,
Shawn


Question: Solr perform well with thousands of replicas?

2019-08-28 Thread Hongxu Ma
Hi
I have a solr-cloud cluster, but it's unstable when the collection count is big:
1000+ replicas/cores per solr node.

To solve this issue, I have read the performance guide:
https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems

I noted there is a sentence on solr-cloud section:
"Recent Solr versions perform well with thousands of replicas."

I want to know: does it mean a single solr node can handle thousands of
replicas, or that a solr cluster can (and if so, what is the size of that cluster)?

My solr version is 7.3.1 and 6.6.2 (they look the same in performance)
Thanks for your help.
Thanks for you help.