Re: Largest number of indexed documents used by Solr

2018-04-05 Thread Joe Obernberger

50 billion per day?  Wow!  How large are these documents?

We have a cluster with one large collection that contains 2.4 billion
documents spread across 40 machines, using HDFS for the index.  We store
our data in HBase, and to re-index we pull data from HBase and index it
with SolrCloud.  The most we can do is around 57 million documents per
day, usually limited by pulling data out of HBase, not by Solr.
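A rough back-of-the-envelope sketch of what those figures imply. It assumes the 57 million/day rate is spread evenly over 24 hours and that documents are spread evenly across the 40 machines; neither is stated above, and the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def docs_per_second(docs_per_day: float) -> float:
    """Average indexing rate if the daily volume is spread evenly over 24 hours."""
    return docs_per_day / SECONDS_PER_DAY


def days_to_reindex(total_docs: float, docs_per_day: float) -> float:
    """Days needed to re-index total_docs at a constant daily rate."""
    return total_docs / docs_per_day


total_docs = 2.4e9   # documents in the collection
machines = 40
daily_rate = 57e6    # best observed re-index rate (docs/day)

print(f"docs per machine: {total_docs / machines:,.0f}")               # 60,000,000
print(f"average rate:     {docs_per_second(daily_rate):,.0f} docs/s")  # 660
print(f"full re-index:    {days_to_reindex(total_docs, daily_rate):.1f} days")  # 42.1
```

At a constant 57 million/day, a full re-index of 2.4 billion documents would take about six weeks.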


-Joe


On 4/4/2018 10:57 PM, 苗海泉 wrote:
> With 49 shards per collection and more than 600 collections, Solr has
> serious performance problems, and I don't know how to deal with them. My
> advice to you is to minimize the number of collections.
> Our environment is 49 Solr server nodes, each with 32 CPUs / 128 GB RAM,
> and the data volume is about 50 billion per day.
>
> 2018-04-04 9:23 GMT+08:00 Yago Riveiro :
>
> > Hi,
> >
> > At my company we are running a 12-node cluster with 10 (American)
> > billion documents, 12 shards / 2 replicas.
> >
> > We mainly run faceting queries, with very reasonable performance.
> >
> > 36 million documents is not an issue; you can handle that volume of
> > documents with 2 nodes with SSDs and 32 GB of RAM.
> >
> > Regards.
> >
> > --
> >
> > Yago Riveiro
> >
> > On 4 Apr 2018 02:15 +0100, Abhi Basu <9000r...@gmail.com>, wrote:
> > > We have tested Solr 4.10 with 200 million docs with an average doc
> > > size of 250 KB.
> > > We saw no performance issues with 3 shards / 2 replicas.
> > >
> > > On Tue, Apr 3, 2018 at 8:12 PM, Steven White  wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I'm about to start a project that requires indexing 36 million
> > > > records using Solr 7.2.1.  Each record ranges from 500 KB to 0.25 MB,
> > > > and the average is 0.1 MB.
> > > >
> > > > Has anyone indexed this number of records?  What are the things I
> > > > should worry about?  And out of curiosity, what is the largest
> > > > publicly reported number of records that Solr has indexed?
> > > >
> > > > Thanks
> > > >
> > > > Steven
> > >
> > > --
> > > Abhi Basu







Re: Largest number of indexed documents used by Solr

2018-04-05 Thread Kelly, Frank
We have ~350M documents stored on r3.xlarge nodes with an 8 GB heap
and about 31 GB of RAM.

We are using Solr 5.3.1 in a SolrCloud setup (3 collections, each with 3
shards and 3 replicas).

For us, lots of RAM is not as important as CPU (the EBS disk we run on
is quite fast, and our memory hit rate is quite low).

Some things that helped:
1) Turned off the filter cache (it required too much heap).
2) Set a limit on replication bandwidth (when nodes are recovering they
can tie up a lot of CPU), in particular maxWriteMBPerSec=100.
3) Set a query timeout of 2 seconds to help kill "heavy" queries.
4) Set preferLocalShards=true to mitigate the impact when an EC2 node has
a "noisy neighbor".
5) Implemented our own CloudWatch-based monitoring so that when Solr VM
CPU is high (> 90%) we queue up indexing traffic rather than send it to be
indexed.
We found that if you peg Solr CPU for too long, replicas can't keep up and
go into recovery, which drives CPU even higher, and eventually the cluster
thinks the nodes are "down" when they repeatedly fail at recovery.
So we really try to manage Solr CPU load. (We'll probably look at switching
to compute-optimized nodes in the future.)
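A minimal sketch of items 3 and 4 expressed as per-request parameters, built with Python's standard library. Frank doesn't say how the 2-second timeout was set; we assume Solr's timeAllowed parameter (in milliseconds), which is a best-effort limit: when it is hit, Solr returns whatever partial results it has and sets partialResults in the response header. The host localhost:8983 and the collection mycollection are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace host and collection with your own.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def build_query_url(q: str, time_allowed_ms: int = 2000, prefer_local: bool = True) -> str:
    """Return a /select URL carrying a time limit and the local-replica preference."""
    params = {
        "q": q,
        # Best-effort time limit in milliseconds (2000 ms = the 2 s in item 3).
        "timeAllowed": time_allowed_ms,
        # Ask the receiving node to favour replicas it hosts itself (item 4).
        "preferLocalShards": "true" if prefer_local else "false",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


print(build_query_url("title:solr"))
```

Passing these per request keeps the limits visible at the call site; they can also be set as defaults on the request handler in solrconfig.xml.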
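Item 5 and the failure mode described after the list amount to back-pressure on the indexing side. The sketch below is a hypothetical, in-memory illustration of that idea, not Frank's implementation: batches are held in a queue whenever a CPU reading is above 90%, and released one at a time while it stays at or below the threshold. It does not call CloudWatch or Solr; the CPU reader and the send function are stand-ins you would supply, and the class name is made up.

```python
from collections import deque
from typing import Callable, Deque, List

Batch = List[dict]


class CpuGatedIndexer:
    """Hypothetical sketch: hold indexing batches while Solr CPU is above a threshold."""

    def __init__(self, send: Callable[[Batch], None], cpu_threshold: float = 90.0) -> None:
        self._send = send                  # stand-in for whatever posts a batch to Solr
        self._threshold = cpu_threshold    # percent; item 5 uses > 90%
        self._pending: Deque[Batch] = deque()

    @property
    def backlog(self) -> int:
        """Number of batches currently held back."""
        return len(self._pending)

    def submit(self, batch: Batch, read_cpu: Callable[[], float]) -> None:
        """Queue a batch, then forward as much of the backlog as CPU allows."""
        self._pending.append(batch)
        self.drain(read_cpu)

    def drain(self, read_cpu: Callable[[], float]) -> int:
        """Send queued batches in order, re-reading CPU before each one.

        Stops as soon as CPU exceeds the threshold; returns how many batches were sent.
        """
        sent = 0
        while self._pending and read_cpu() <= self._threshold:
            self._send(self._pending.popleft())
            sent += 1
        return sent


sent_batches: List[Batch] = []
indexer = CpuGatedIndexer(sent_batches.append)
indexer.submit([{"id": "1"}], read_cpu=lambda: 95.0)  # CPU high: held back
indexer.submit([{"id": "2"}], read_cpu=lambda: 95.0)  # still held back
print(indexer.backlog)                                # 2
print(indexer.drain(read_cpu=lambda: 40.0))           # 2: both released in order
```

Re-reading CPU before every batch avoids dumping the whole backlog onto a node that has only briefly dipped below the threshold. A production version would also need a durable queue and a cap on its size.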

Best

-Frank


On 4/3/18, 9:12 PM, "Steven White"  wrote:

>Hi everyone,
>
>I'm about to start a project that requires indexing 36 million records
>using Solr 7.2.1.  Each record ranges from 500 KB to 0.25 MB, and the
>average is 0.1 MB.
>
>Has anyone indexed this number of records?  What are the things I should
>worry about?  And out of curiosity, what is the largest publicly reported
>number of records that Solr has indexed?
>
>Thanks
>
>Steven



Re: Largest number of indexed documents used by Solr

2018-04-04 Thread 苗海泉
With 49 shards per collection and more than 600 collections, Solr has
serious performance problems, and I don't know how to deal with them. My
advice to you is to minimize the number of collections.
Our environment is 49 Solr server nodes, each with 32 CPUs / 128 GB RAM,
and the data volume is about 50 billion per day.
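Some arithmetic on the scale described above, as a sketch. The message does not give a replication factor, so this assumes one copy of each shard (raise replicas to see how quickly the core count grows), and it assumes ingest is spread evenly over the day and across nodes.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def shard_cores(collections: int, shards_per_collection: int, replicas: int) -> int:
    """Total number of Solr cores (shard replicas) the cluster must host."""
    return collections * shards_per_collection * replicas


nodes = 49
cores = shard_cores(collections=600, shards_per_collection=49, replicas=1)  # assumed: 1 replica
ingest = 50e9 / SECONDS_PER_DAY  # documents per second across the cluster

print(f"cores in cluster: {cores:,}")                     # 29,400
print(f"cores per node:   {cores / nodes:,.0f}")          # 600
print(f"ingest, cluster:  {ingest:,.0f} docs/s")          # 578,704
print(f"ingest, per node: {ingest / nodes:,.0f} docs/s")  # 11,810
```

Even with a single replica that is roughly 600 cores per node, and each core carries its own searcher and caches, which is consistent with the advice above to keep the collection count down.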




2018-04-04 9:23 GMT+08:00 Yago Riveiro :

> Hi,
>
> At my company we are running a 12-node cluster with 10 (American) billion
> documents, 12 shards / 2 replicas.
>
> We mainly run faceting queries, with very reasonable performance.
>
> 36 million documents is not an issue; you can handle that volume of
> documents with 2 nodes with SSDs and 32 GB of RAM.
>
> Regards.
>
> --
>
> Yago Riveiro
>
> On 4 Apr 2018 02:15 +0100, Abhi Basu <9000r...@gmail.com>, wrote:
> > We have tested Solr 4.10 with 200 million docs with an average doc size
> > of 250 KB.
> > We saw no performance issues with 3 shards / 2 replicas.
> >
> >
> >
> > On Tue, Apr 3, 2018 at 8:12 PM, Steven White  wrote:
> >
> > > Hi everyone,
> > >
> > > I'm about to start a project that requires indexing 36 million records
> > > using Solr 7.2.1.  Each record ranges from 500 KB to 0.25 MB, and the
> > > average is 0.1 MB.
> > >
> > > Has anyone indexed this number of records?  What are the things I should
> > > worry about?  And out of curiosity, what is the largest publicly reported
> > > number of records that Solr has indexed?
> > >
> > > Thanks
> > >
> > > Steven
> > >
> >
> >
> >
> > --
> > Abhi Basu
>



-- 
==
联创科技
知行如一 (Unity of knowledge and action)
==


Re: Largest number of indexed documents used by Solr

2018-04-03 Thread Yago Riveiro
Hi,

At my company we are running a 12-node cluster with 10 (American) billion
documents, 12 shards / 2 replicas.

We mainly run faceting queries, with very reasonable performance.

36 million documents is not an issue; you can handle that volume of documents
with 2 nodes with SSDs and 32 GB of RAM.
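One hard limit worth knowing when choosing a shard count: a single Lucene index, and therefore a single Solr core, can hold at most IndexWriter.MAX_DOCS documents, which is Integer.MAX_VALUE - 128 (about 2.1 billion). The sketch below checks the sizes mentioned in this thread against that ceiling, assuming documents hash evenly across shards. It is only a ceiling: the per-shard size at which query latency becomes unacceptable is usually far lower and depends on schema and queries, so treat min_shards as an absolute floor, not a recommendation.

```python
import math

# Lucene's per-index document ceiling: IndexWriter.MAX_DOCS = Integer.MAX_VALUE - 128.
LUCENE_MAX_DOCS = (2**31 - 1) - 128


def docs_per_shard(total_docs: float, shards: int) -> float:
    """Average documents per shard, assuming an even distribution across shards."""
    return total_docs / shards


def min_shards(total_docs: float, max_docs_per_shard: float = LUCENE_MAX_DOCS) -> int:
    """Smallest shard count that keeps every shard under max_docs_per_shard."""
    return math.ceil(total_docs / max_docs_per_shard)


print(f"{docs_per_shard(10e9, 12):,.0f}")  # 833,333,333 docs per shard with 12 shards
print(min_shards(10e9))                    # 5: hard floor for 10 billion docs
print(min_shards(36e6))                    # 1: 36 million docs fits in one shard
```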

Regards.

--

Yago Riveiro

On 4 Apr 2018 02:15 +0100, Abhi Basu <9000r...@gmail.com>, wrote:
> We have tested Solr 4.10 with 200 million docs with an average doc size of 250 KB.
> We saw no performance issues with 3 shards / 2 replicas.
>
>
>
> On Tue, Apr 3, 2018 at 8:12 PM, Steven White  wrote:
>
> > Hi everyone,
> >
> > I'm about to start a project that requires indexing 36 million records
> > using Solr 7.2.1.  Each record ranges from 500 KB to 0.25 MB, and the
> > average is 0.1 MB.
> >
> > Has anyone indexed this number of records?  What are the things I should
> > worry about?  And out of curiosity, what is the largest publicly reported
> > number of records that Solr has indexed?
> >
> > Thanks
> >
> > Steven
> >
>
>
>
> --
> Abhi Basu


Re: Largest number of indexed documents used by Solr

2018-04-03 Thread Walter Underwood
We have a 24 million document index. Our documents (homework problems) are
a bit smaller than yours.

HathiTrust probably has the record. They haven’t updated their blog for a 
while, but they were at 11 million books and billions of pages in 2014.

https://www.hathitrust.org/blogs/large-scale-search

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 3, 2018, at 6:12 PM, Steven White  wrote:
> 
> Hi everyone,
> 
> I'm about to start a project that requires indexing 36 million records
> using Solr 7.2.1.  Each record ranges from 500 KB to 0.25 MB, and the
> average is 0.1 MB.
> 
> Has anyone indexed this number of records?  What are the things I should
> worry about?  And out of curiosity, what is the largest publicly reported
> number of records that Solr has indexed?
> 
> Thanks
> 
> Steven



Re: Largest number of indexed documents used by Solr

2018-04-03 Thread Abhi Basu
We have tested Solr 4.10 with 200 million docs with an average doc size of 250 KB.
We saw no performance issues with 3 shards / 2 replicas.
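How those numbers translate into a SolrCloud layout, as a sketch. It assumes "2 replicas" means a replicationFactor of 2 (two copies of every shard), which the message does not spell out, and that documents hash evenly across shards; the function name is illustrative.

```python
def cloud_layout(total_docs: float, shards: int, replication_factor: int) -> dict:
    """Summarise a SolrCloud collection layout under an even-distribution assumption."""
    return {
        "cores": shards * replication_factor,            # shard replicas to place on nodes
        "docs_per_shard": total_docs / shards,           # documents held by each core
        "docs_stored": total_docs * replication_factor,  # documents indexed, counting every copy
    }


layout = cloud_layout(200e6, shards=3, replication_factor=2)
print(layout["cores"])                     # 6
print(f"{layout['docs_per_shard']:,.0f}")  # 66,666,667
print(f"{layout['docs_stored']:,.0f}")     # 400,000,000
```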



On Tue, Apr 3, 2018 at 8:12 PM, Steven White  wrote:

> Hi everyone,
>
> I'm about to start a project that requires indexing 36 million records
> using Solr 7.2.1.  Each record ranges from 500 KB to 0.25 MB, and the
> average is 0.1 MB.
>
> Has anyone indexed this number of records?  What are the things I should
> worry about?  And out of curiosity, what is the largest publicly reported
> number of records that Solr has indexed?
>
> Thanks
>
> Steven
>



-- 
Abhi Basu


Largest number of indexed documents used by Solr

2018-04-03 Thread Steven White
Hi everyone,

I'm about to start a project that requires indexing 36 million records
using Solr 7.2.1.  Each record ranges from 500 KB to 0.25 MB, and the
average is 0.1 MB.
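A quick estimate of the raw input volume implied by the stated 0.1 MB average, as a sketch in decimal units (1 TB = 1,000,000 MB). This is source data size only; the on-disk index can be larger or smaller depending on which fields are stored and how text is analyzed, so it is a starting point for capacity planning, not a prediction.

```python
MB_PER_TB = 1_000_000  # decimal units


def raw_volume_tb(num_records: float, avg_record_mb: float) -> float:
    """Total raw input size in TB for num_records records of avg_record_mb each."""
    return num_records * avg_record_mb / MB_PER_TB


print(f"{raw_volume_tb(36e6, 0.1):.1f} TB")  # 3.6 TB
```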

Has anyone indexed this number of records?  What are the things I should
worry about?  And out of curiosity, what is the largest publicly reported
number of records that Solr has indexed?

Thanks

Steven