Re: slow indexing when keys are verious

2017-04-12 Thread moscovig
Hi all

We have changed all the Solr configs and commit parameters that Shawn
mentioned, but still: when inserting the same 300 documents from 20 threads
we see no latency, and when inserting 300 different docs from 20 threads it
is very slow, while none of CPU, RAM, disk, or network shows high usage.

I am wondering if the problem might be related to the fact that when
inserting 300 different docs from each thread, the key is the only field
that varies, whilst the other fields are identical.
So maybe many identical values in the other fields under different keys
cause the latency?

As for latency related to doc routing, I don't see where it could
affect us. Is it ZooKeeper that might become a bottleneck?

Thanks!
Gilad




--
View this message in context: 
http://lucene.472066.n3.nabble.com/slow-indexing-when-keys-are-verious-tp4327681p4329451.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: slow indexing when keys are verious

2017-03-30 Thread moscovig

With high entropy we see the same latency even when working with 1 shard.

Assuming that even with 1 shard, Solr is still working hard to route the
documents,
what is the component that is responsible for the document routing?

Is it the zookeeper?

And how would you verify that that's the bottleneck?

I can monitor zookeeper when having high and low entropy 
to see if it has different network stats.

Thanks





Re: slow indexing when keys are verious

2017-03-30 Thread Alexandre Rafalovitch
Did you check the number of documents that end up on each shard in
these two scenarios?

My guess would be that, perhaps, a low-entropy key puts most of the
documents into one shard and a high-entropy key causes a lot more
routing traffic, with the delay coming from the network communication
and/or confirmation. Maybe even combined with the very low commit
values.
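
To make the "documents per shard" check concrete, here is a minimal sketch of hash-range routing with a per-shard count. It is only an illustration under stated assumptions: zlib.crc32 is a stand-in hash (not the hash Solr uses), and the key names, the shard count of 4, and the helpers shard_for and docs_per_shard are made up for the example.

```python
import zlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard by splitting the 32-bit hash space into equal ranges.

    zlib.crc32 is a stand-in chosen for illustration; it is NOT Solr's hash.
    """
    h = zlib.crc32(key.encode("utf-8"))  # unsigned 32-bit value in Python 3
    return (h * num_shards) >> 32        # always in range(num_shards)


def docs_per_shard(keys, num_shards: int) -> Counter:
    """Count how many of the given keys would be routed to each shard."""
    return Counter(shard_for(k, num_shards) for k in keys)


# Hypothetical key sets: one repeated key versus 6000 distinct keys.
low_entropy = ["doc-0"] * 6000
high_entropy = ["doc-%d" % i for i in range(6000)]

print("low entropy :", dict(docs_per_shard(low_entropy, 4)))
print("high entropy:", dict(docs_per_shard(high_entropy, 4)))
```

A repeated key always lands on a single shard, while distinct keys spread over the hash ranges, so comparing the real per-shard counts in the two scenarios would show whether routing differs between them.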

I am not a SolrCloud specialist. But that's one place I can see the
entropy of the key becoming a factor.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 30 March 2017 at 11:57, moscovig <mosco...@gmail.com> wrote:
> Hi
>
> Yes, it is SolrCloud. We saw the same behavior with 1, 2, and 4 shards. Each
> shard has 3 replicas.
>
> Each bulk contains 300 docs. We get approximately 800 docs inserted per
> second.
>
> ~6000 docs are sent in each iteration by all loading threads.
> We have 20 threads, each sending bulks of 300 docs.
>
> The loaders wait for the response,
> which comes back after ~10 seconds for each loader.
>
> Thanks!


Re: slow indexing when keys are verious

2017-03-30 Thread moscovig
Hi

Yes, it is SolrCloud. We saw the same behavior with 1, 2, and 4 shards. Each
shard has 3 replicas.

Each bulk contains 300 docs. We get approximately 800 docs inserted per
second.

~6000 docs are sent in each iteration by all loading threads.
We have 20 threads, each sending bulks of 300 docs.

The loaders wait for the response,
which comes back after ~10 seconds for each loader.
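
As a rough sanity check of these numbers, the arithmetic can be sketched as below. The helper name throughput is made up for the example, and the 10-second figure is the approximate response time quoted above, so the result is only a ballpark to compare against the ~800 docs per second we observe.

```python
def throughput(threads: int, docs_per_bulk: int, response_seconds: float) -> float:
    """Docs per second if every thread sends one bulk and waits for the response."""
    docs_per_iteration = threads * docs_per_bulk
    return docs_per_iteration / response_seconds


# 20 threads * 300 docs = 6000 docs per iteration, answered in roughly 10 s.
print(throughput(threads=20, docs_per_bulk=300, response_seconds=10))
```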

Thanks!







Re: slow indexing when keys are verious

2017-03-30 Thread Alexandre Rafalovitch
Are you by any chance using SolrCloud?

And to confirm, the total number of documents is the same within any
particular time period?

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 30 March 2017 at 10:50, moscovig <mosco...@gmail.com> wrote:
> Thanks Shawn.
>
> We do specify
>
> 
> 3
> 30
> false
> 
>
> but I guess that still, the commitWithin 300 ms is a bad idea.
>
> We will definitely try playing with the configs you suggested.
>
> I still don't get the reason for the fast inserts when sending sets with
> low key cardinality.
> But let's see what happens after the changes.


Re: slow indexing when keys are verious

2017-03-30 Thread moscovig
Thanks Shawn.

We do specify 


3 
30 
false 


but I guess that still, the commitWithin 300 ms is a bad idea.

We will definitely try playing with the configs you suggested.

I still don't get the reason for the fast inserts when sending sets with
low key cardinality.
But let's see what happens after the changes.





Re: slow indexing when keys are verious

2017-03-30 Thread Shawn Heisey
On 3/30/2017 7:36 AM, moscovig wrote:
> We are using solr 6.2.1 for server and solrj 6.2.0, with no explicit commits, 
> and -
>
> 3 
> 30 
> for autoCommit.
>
> Each request to solr contains 300 small documents with different keys, with
> a commitWithin of 300 ms.

I think the commitWithin is likely the problem here.  As long as you are
indexing, it will *try* to do a commit more than three times *every
second*.  Chances are that each commit is going to take longer than
300ms to complete, so the actual commit rate is probably lower, but
effectively this means that as long as you're indexing, Solr is
*constantly* doing commits that open a new Searcher.  This kind of
commit causes a large amount of disk I/O and CPU activity.  You do not
want to have an interval that low.  I would suggest a value for
commitWithin that's one or two minutes.
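
To put numbers on that, here is a small sketch of the arithmetic, plus one way a longer commitWithin could be passed as a request parameter on the update URL. The helper names, the base URL, and the collection name "mycollection" are assumptions for illustration only, and the rate is an upper bound that assumes documents keep arriving continuously.

```python
from urllib.parse import urlencode


def commit_attempts_per_second(commit_within_ms: int) -> float:
    """Upper bound on commit attempts per second during continuous indexing."""
    return 1000.0 / commit_within_ms


def update_url(base: str, collection: str, commit_within_ms: int) -> str:
    """Build an update URL carrying commitWithin as a request parameter."""
    query = urlencode({"commitWithin": commit_within_ms})
    return f"{base}/{collection}/update?{query}"


for ms in (300, 60_000, 120_000):
    print(ms, "ms ->", round(commit_attempts_per_second(ms), 4), "attempts/s")

# Hypothetical host and collection name.
print(update_url("http://localhost:8983/solr", "mycollection", 60_000))
```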

Your autoCommit doesn't appear to set openSearcher to false.  I
recommend doing that, setting its maxTime to 60 seconds, and removing
maxDocs.

I would also add autoSoftCommit with a three minute (180000) maxTime.
It sounds like every request includes commitWithin ... the
autoSoftCommit would just be there to catch anything that somehow didn't
include the commitWithin.  Very likely it would never be triggered as
long as commitWithin is being used.  You could choose to lower that time
to two minutes and not use commitWithin at all.
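
As a sketch of those settings, the snippet below builds a JSON body of the kind the Config API's set-property command accepts, as far as I understand it. It only constructs and prints the payload and does not contact a server. The function name is made up for the example, removing maxDocs from solrconfig.xml is not covered, and the property names should be checked against the reference guide for your Solr version.

```python
import json


def commit_settings_payload(hard_commit_ms: int = 60_000,
                            soft_commit_ms: int = 180_000) -> dict:
    """Build a set-property body for the suggested commit settings.

    Assumes these property names are editable through the Config API.
    """
    return {
        "set-property": {
            # Hard commit every 60 s without opening a new searcher.
            "updateHandler.autoCommit.maxTime": hard_commit_ms,
            "updateHandler.autoCommit.openSearcher": False,
            # Soft commit as a safety net every 3 minutes.
            "updateHandler.autoSoftCommit.maxTime": soft_commit_ms,
        }
    }


print(json.dumps(commit_settings_payload(), indent=2))
```

The same values can of course be written directly into the autoCommit and autoSoftCommit sections of solrconfig.xml instead.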

Thanks,
Shawn



slow indexing when keys are verious

2017-03-30 Thread moscovig
Hi

We are using Solr 6.2.1 for the server and SolrJ 6.2.0,
with no explicit commits, and -

3 
30 
for autoCommit.

Each request to Solr contains 300 small documents with different keys, with
a commitWithin of 300 ms.

We have lots of requests coming in.

The behavior is as follows:

Fast - When all threads are using the same key generator, meaning that Solr
gets lots of similar documents each second, we get high throughput and
very high CPU usage.

Slow - When each thread is using different keys, each iteration sends
~20 bulks of 300 docs each, meaning 6000 different keys. The throughput is
terrible, and we don't even see any unusual CPU or RAM usage.

What is the bottleneck in the slow scenario?
What is the reason for that? Does Solr have some kind of cache, so that when
we send lots of similar keys it immediately updates the matching doc with
no further operations?

Why is the fast scenario so light and fast?
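
One difference between the two scenarios that can be stated with certainty is the number of distinct documents that end up in the index, since a document sent with an existing unique key replaces the earlier one. The toy model below only illustrates that difference with a plain dict; it is not a diagnosis of the latency, and the key formats and the helper name index_size_after are made up for the example.

```python
def index_size_after(batches) -> int:
    """Number of distinct documents left after applying all batches.

    A dict stands in for the index: a repeated key overwrites the old doc.
    """
    index = {}
    for batch in batches:
        for key, doc in batch:
            index[key] = doc
    return len(index)


# Fast scenario: 20 threads all send the same 300 keys.
fast = [[("key-%d" % i, {"field": "same"}) for i in range(300)]
        for _ in range(20)]

# Slow scenario: 20 threads send 300 keys each, all different.
slow = [[("key-%d-%d" % (t, i), {"field": "same"}) for i in range(300)]
        for t in range(20)]

print("fast:", index_size_after(fast), "distinct docs")
print("slow:", index_size_after(slow), "distinct docs")
```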

Thanks!








