Re: Very high memory and CPU utilization.

2015-11-02 Thread Toke Eskildsen
On Mon, 2015-11-02 at 14:17 +0100, Toke Eskildsen wrote: > http://rosalind:52300/solr/collection1/select?q=%22der+se*%22=json=true=false=true=domain > > gets expanded to > > "parsedquery": "(+DisjunctionMaxQuery((content_text:\"kan svane\" | > author:kan svane* | text:\"kan svane\" |
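
A minimal sketch, not part of the thread, of pulling the parsed query out of the debug output with Python; the host, port and collection name are simply the ones that appear in the URL above:

    import json
    import urllib.parse
    import urllib.request

    # Host/collection copied from the URL in the mail above; adjust as needed.
    params = urllib.parse.urlencode({
        "q": '"der se*"',
        "wt": "json",
        "rows": 0,
        "debugQuery": "true",   # ask Solr to include the parsed query in the response
    })
    url = "http://rosalind:52300/solr/collection1/select?" + params

    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)

    # "parsedquery" shows how the phrase gets expanded across the queried fields.
    print(data["debug"]["parsedquery"])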

Re: Very high memory and CPU utilization.

2015-11-02 Thread Toke Eskildsen
On Mon, 2015-11-02 at 17:27 +0530, Modassar Ather wrote: > The query q=network se* is quick enough in our system too. It takes > around 3-4 seconds for around 8 million records. > > The problem is with the same query as phrase. q="network se*". I misunderstood your query then. I tried

Re: Very high memory and CPU utilization.

2015-11-02 Thread jim ferenczi
Well it seems that doing q="network se*" is working, but not in the way you expect. Doing q="network se*" would not trigger a prefix query, and the "*" character would be treated like any other character. I suspect that your query is in fact "network se" (assuming you're using a StandardTokenizer) and
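
One way to verify this, sketched under the assumption that the implicit /analysis/field handler is available and that the field is called "text": run "network se*" through the analysis chain and see whether the "*" survives tokenization.

    import json
    import urllib.parse
    import urllib.request

    # Assumed collection and field name; adjust to the actual schema.
    params = urllib.parse.urlencode({
        "analysis.fieldname": "text",
        "analysis.fieldvalue": "network se*",
        "wt": "json",
    })
    url = "http://localhost:8983/solr/collection1/analysis/field?" + params

    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)

    # If StandardTokenizer drops the "*", the final stage of the index
    # analysis chain will show the bare tokens "network" and "se".
    print(json.dumps(data["analysis"], indent=2))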

Re: Very high memory and CPU utilization.

2015-11-02 Thread Toke Eskildsen
On Tue, 2015-11-03 at 11:09 +0530, Modassar Ather wrote: > It is around 90GB of index (around 8 million documents) on one shard and > there are 12 such shards. As per my understanding the sharding is required > for this case. Please help me understand if it is not required. Except for an internal

Re: Very high memory and CPU utilization.

2015-11-02 Thread Walter Underwood
One rule of thumb for Solr is to shard after you reach 100 million documents. With large documents, you might want to shard sooner. We are running an unsharded index of 7 million documents (55GB) without problems. The EdgeNgramFilter generates a set of prefix terms for each term in the
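
For illustration only, a toy Python sketch of what an edge n-gram filter emits for a single term; this is why prefix matching becomes a plain term lookup instead of a wildcard expansion at query time:

    def edge_ngrams(term, min_gram=1, max_gram=10):
        """Emit the leading prefixes of a term, the way an edge n-gram filter does at index time."""
        upper = min(max_gram, len(term))
        return [term[:n] for n in range(min_gram, upper + 1)]

    # "network" is indexed as all of its prefixes, so a query for "net"
    # is an exact term match rather than a wildcard expansion.
    print(edge_ngrams("network"))
    # ['n', 'ne', 'net', 'netw', 'netwo', 'networ', 'network']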

Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
Thanks Walter for your response. It is around 90GB of index (around 8 million documents) on one shard and there are 12 such shards. As per my understanding the sharding is required for this case. Please help me understand if it is not required. We have requirements where we need full wildcard

Re: Very high memory and CPU utilization.

2015-11-02 Thread Toke Eskildsen
On Mon, 2015-11-02 at 12:00 +0530, Modassar Ather wrote: > I have a setup of 12 shard cluster started with 28gb memory each on a > single server. There are no replica. The size of index is around 90gb on > each shard. The Solr version is 5.2.1. That is 12 machines, running a shard each? What is

Re: Very high memory and CPU utilization.

2015-11-02 Thread Walter Underwood
To back up a bit, how many documents are in this 90GB index? You might not need to shard at all. Why are you sending a query with a trailing wildcard? Are you matching the prefix of words, for query completion? If so, look at the suggester, which is designed to solve exactly that. Or you can

Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
Hi Toke, Thanks for your response. My comments in-line. That is 12 machines, running a shard each? No! This is a single big machine with 12 shards on it. What is the total amount of physical memory on each machine? Around 370 gb on the single machine. Well, se* probably expands to a great deal
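
A rough way to see how big that expansion is, assuming a /terms handler backed by the TermsComponent is configured and the field is called "text": ask for every indexed term starting with "se".

    import json
    import urllib.parse
    import urllib.request

    # Assumed collection, handler path and field name.
    params = urllib.parse.urlencode({
        "terms.fl": "text",
        "terms.prefix": "se",
        "terms.limit": -1,   # return every matching term, not just the first few
        "wt": "json",
    })
    url = "http://localhost:8983/solr/collection1/terms?" + params

    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)

    # With the default flat layout the list alternates term and document frequency.
    terms = data["terms"]["text"]
    print("indexed terms matching se*:", len(terms) // 2)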

Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
Just to add one more point that one external Zookeeper instance is also running on this particular machine. Regards, Modassar On Mon, Nov 2, 2015 at 2:34 PM, Modassar Ather wrote: > Hi Toke, > Thanks for your response. My comments in-line. > > That is 12 machines,

Re: Very high memory and CPU utilization.

2015-11-02 Thread jim ferenczi
12 shards with 28GB for the heap and 90GB for each index means that you need at least 336GB for the heap (assuming you're using all of it, which may easily be the case considering how the GC handles memory) and ~1TB for the index. Let's say that you don't need your entire index in RAM,
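
The arithmetic behind that, as a small worked example using the numbers given earlier in the thread (370 GB of physical RAM on the box):

    shards = 12
    heap_per_shard_gb = 28
    index_per_shard_gb = 90
    physical_ram_gb = 370            # the single machine described earlier

    total_heap_gb = shards * heap_per_shard_gb        # 336 GB reserved by the JVMs
    total_index_gb = shards * index_per_shard_gb      # 1080 GB, roughly 1 TB on disk
    page_cache_gb = physical_ram_gb - total_heap_gb   # about 34 GB left for the OS

    print(f"heap: {total_heap_gb} GB, index: {total_index_gb} GB, "
          f"left for the OS page cache: {page_cache_gb} GB")
    # Only ~3% of the index can sit in the page cache, so most reads go to disk.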

Re: Very high memory and CPU utilization.

2015-11-02 Thread Toke Eskildsen
On Mon, 2015-11-02 at 14:34 +0530, Modassar Ather wrote: > No! This is a single big machine with 12 shards on it. > Around 370 gb on the single machine. Okay. I guess your observation of 400% for a single core is with top and looking at that core's entry? If so, the 400% can be explained by
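
If GC logging is enabled (for example with the standard -Xloggc:<file> and -XX:+PrintGCDetails JVM flags), a rough sketch of summarising the pauses from a Java 8-style log; the log path and line format here are assumptions:

    import re
    import sys

    # Assumed log file name; pass the real path as the first argument.
    log_path = sys.argv[1] if len(sys.argv) > 1 else "solr_gc.log"

    pause_re = re.compile(r"real=(\d+\.\d+) secs")
    total_pause_s, full_gcs = 0.0, 0

    with open(log_path) as gc_log:
        for line in gc_log:
            m = pause_re.search(line)
            if m:
                total_pause_s += float(m.group(1))
            if "Full GC" in line:
                full_gcs += 1

    print(f"total GC wall time: {total_pause_s:.1f} s, full GCs: {full_gcs}")
    # A large total or frequent full GCs would explain a sustained 400% in top.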

Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
Okay. I guess your observation of 400% for a single core is with top and looking at that core's entry? If so, the 400% can be explained by excessive garbage collection. You could turn GC-logging on to check that. With a bit of luck GC would be the cause of the slow down. Yes it is with top

Re: Very high memory and CPU utilization.

2015-11-02 Thread jim ferenczi
*if it correlates with the bad performance you're seeing. One important thing to notice is that a significant part of your index needs to be in RAM (especially if you're using SSDs) in order to achieve good performance.* Especially if you're not using SSDs, sorry ;) 2015-11-02 11:38 GMT+01:00

Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
Thanks Jim for your response. The remaining size after you removed the heap usage should be reserved for the index (not only the other system activities). I am not able to get the above point. So when I start Solr with 28g RAM, for all the activities related to Solr it should not go beyond 28g.

Re: Very high memory and CPU utilization.

2015-11-02 Thread Toke Eskildsen
On Mon, 2015-11-02 at 16:25 +0530, Modassar Ather wrote: > The remaining size after you removed the heap usage should be reserved for > the index (not only the other system activities). > I am not able to get the above point. So when I start Solr with 28g RAM, > for all the activities related to

Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
I monitored swap activity for the query using vmstat. The *so* and *si* columns show 0 until the query completes. Also, top showed 0 against swap, which means there was no scarcity of physical memory. Swap activity does not seem to be a bottleneck. Kindly note that I ran this on an 8 node cluster with 30
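
For reference, a small Linux-only sketch of watching those columns programmatically; it assumes the standard vmstat column order (si is the 7th column, so the 8th):

    import subprocess

    # Five samples at one-second intervals; the first data row is an average since boot.
    out = subprocess.run(["vmstat", "1", "5"],
                         capture_output=True, text=True, check=True)
    lines = out.stdout.splitlines()

    # Skip the two header lines, then read si (index 6) and so (index 7).
    for line in lines[2:]:
        cols = line.split()
        print(f"swap in: {cols[6]} kB/s, swap out: {cols[7]} kB/s")
    # Consistently zero si/so means the box is not swapping while the query runs.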

Re: Very high memory and CPU utilization.

2015-11-02 Thread jim ferenczi
*I am not able to get the above point. So when I start Solr with 28g RAM, for all the activities related to Solr it should not go beyond 28g. And the remaining heap will be used for activities other than Solr. Please help me understand.* Well those 28GB of heap are the memory "reserved" for your
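
One way to see that split on the machine itself, as a Linux-only sketch: compare the total RAM with what the kernel is currently holding in the page cache.

    # Read total and page-cache memory from /proc/meminfo (values are in kB).
    meminfo = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            meminfo[key] = int(value.split()[0])

    total_gb = meminfo["MemTotal"] / 1024 / 1024
    cached_gb = meminfo["Cached"] / 1024 / 1024
    print(f"RAM: {total_gb:.0f} GB, page cache holding index files: {cached_gb:.0f} GB")
    # Whatever the JVM heaps do not claim is what the OS can use to cache the index.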

Re: Very high memory and CPU utilization.

2015-11-02 Thread jim ferenczi
Oops, I did not read the thread carefully. *The problem is with the same query as phrase. q="network se*".* I was not aware that you could do that with Solr ;). I would say this is expected because in such a case, if the number of expansions for "se*" is big, then you would have to check the positions

Re: Very high memory and CPU utilization.

2015-11-02 Thread Modassar Ather
The problem is with the same query as a phrase: q="network se*". The last "." is the full stop of the sentence; the actual query is q=field:"network se*". Best, Modassar On Mon, Nov 2, 2015 at 6:10 PM, jim ferenczi wrote: > Oups I did not read the thread carrefully. > *The problem