Re: unable to get more throughput with more threads

2017-03-24 Thread Suresh Pendap
Hi Shawn,
It looks like you have probably pointed to the root cause of the issue.
I am using a Java client that calls the HttpClient library directly to
fire the HTTP GET queries; I am not using the SolrJ client.

The following is my code:

    CloseableHttpClient httpclient = HttpClients.createDefault();
    HttpGet httpGet = new HttpGet(url);
    CloseableHttpResponse response = null;
    try {
        response = httpclient.execute(httpGet);
        // ... read and consume the response entity here ...
    } finally {
        if (response != null) {
            response.close();  // releases the connection back to the pool
        }
    }

Just to confirm: is a route a host/port combination?

I will try setting maxConnPerRoute and maxConnTotal to larger values and
see whether that gives me some benefit.
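
For what it's worth, the effect of that per-route limit can be illustrated
without HttpClient at all. This is just a toy sketch (all names invented):
a Semaphore with 2 permits stands in for the default per-route connection
pool, and any extra client threads queue behind it exactly as Shawn
describes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RouteLimitDemo {
    // Simulate a connection pool: poolSize "connections" shared by clientThreads threads.
    static int peakConcurrency(int poolSize, int clientThreads) throws InterruptedException {
        Semaphore pool = new Semaphore(poolSize);   // stands in for maxConnPerRoute
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService clients = Executors.newFixedThreadPool(clientThreads);
        for (int i = 0; i < clientThreads; i++) {
            clients.submit(() -> {
                try {
                    pool.acquire();                 // extra threads queue here
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(50);               // pretend to run the query
                    inFlight.decrementAndGet();
                    pool.release();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        clients.shutdown();
        clients.awaitTermination(30, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        // With a pool of 2, ten client threads never run more than 2 queries at once;
        // the other 8 sit in a queue, which is exactly the latency growth observed.
        System.out.println("peak with pool=2:  " + peakConcurrency(2, 10));
        System.out.println("peak with pool=10: " + peakConcurrency(10, 10));
    }
}
```

Raising the pool size (as with setMaxConnPerRoute) is what lets the extra
threads actually run in parallel instead of waiting.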


Thanks
Suresh

On 3/24/17 6:05 AM, "Shawn Heisey"  wrote:

>On 3/23/2017 6:10 PM, Suresh Pendap wrote:
>> I performed the test with 1 thread, 10 client threads and 50 client
>> threads. I noticed that as I increased the number of threads, the
>> query latency kept increasing drastically which I was not expecting.
>
>What language and Solr library was the client using?  If it's Java and
>SolrJ, then the following will apply.  If the client is written in a
>language other than Java, you may find that there are similar default
>settings in the HTTP library:
>
>A dependency called HttpClient is used by SolrJ.  The default settings
>for HttpClient are only capable of making *two* simultaneous connections
>to a target server.  Further connections will wait until existing
>connections are complete.  Unless it is overridden, SolrJ creates the
>HttpClient object with default settings.
>
>If more threads are needed, the SolrClient object must be built with a
>custom HttpClient.  Here's some SolrJ code to create a
>multithread-capable client object (300 threads to a single server):
>
>  RequestConfig rc = RequestConfig.custom().setConnectTimeout(15000)
>      .setSocketTimeout(Const.SOCKET_TIMEOUT).build();
>  httpClient = HttpClients.custom().setDefaultRequestConfig(rc)
>      .setMaxConnPerRoute(300).setMaxConnTotal(5000)
>      .disableAutomaticRetries().build();
>  client = new HttpSolrClient(serverBaseUrl, httpClient);
>
>I have also placed this code at the following URL.  It will expire in
>one month:
>
>http://apaste.info/BpoWY
>
>A similar technique can be used with CloudSolrClient if needed.
>
>It's my opinion that SolrJ needs to create client objects by default
>that are capable of more threads, but I have not yet put any time into
>making it happen.
>
>Thanks,
>Shawn
>



Re: unable to get more throughput with more threads

2017-03-24 Thread Suresh Pendap
Erick,
- I think I checked that my queryResultCache and documentCache hit ratios
were close to 1. I will double-check that by repeating my test.
- Checking the QTimes in the log is a very good suggestion; I will also
check that the next time I run my test.
- That is not the case here, as the client is just a Java program which
fires the queries using an HTTP REST client API.

Is there any way to get SOLR to publish its thread pool statistics?

For example, Cassandra has a command, nodetool tpstats, which prints a
nice table of statistics for all the thread pools involved, including how
many jobs are pending.
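
As far as I know there is no exact nodetool tpstats equivalent, but the
admin UI has a Thread Dump screen (backed, I believe, by
/solr/admin/info/threads), and the same data is visible over JMX. As a
hypothetical local sketch of what to look for, the JVM's own ThreadMXBean
can group live threads by name prefix, which is roughly the view you'd
want of Jetty's "qtp" pool:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

public class ThreadPoolStats {
    // Group live JVM threads by name prefix; Jetty pool threads are
    // typically named "qtp<hash>-<n>", so they collapse to one "qtp" bucket.
    static Map<String, Integer> countByPrefix() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info == null) continue;  // thread may have died since the id snapshot
            String prefix = info.getThreadName().replaceAll("[-0-9].*$", "");
            counts.merge(prefix, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByPrefix().forEach((name, n) -> System.out.println(name + ": " + n));
    }
}
```

To inspect the Solr server itself you would attach a JMX client (or use
the admin UI thread dump) rather than run this inside your load client.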


Thanks
Suresh

On 3/23/17 9:33 PM, "Erick Erickson"  wrote:

>I'd check my I/O. Since you're firing the same query, I expect that
>you aren't I/O bound at all, since, as you say, the docs should
>already be in memory. This assumes that your document cache size is >
>0. You can check this. Go to the admin UI, select one of your cores
>(not collection) and go to plugins/stats. You should see the
>documentCache as one of the entries and you should be hitting an
>insane hit ratio close to 100% as your test runs.
>
>Also check your queryResultCache. That also should be near 100% in
>your test. Do note that these caches really never hit this "for real",
>but as you indicated this is a highly artificial test so such high hit
>ratios are what I'd expect.
>
>Assuming that those caches are being hit near 100%, Solr really isn't
>doing any work to speak of so there almost has to be some kind of
>queueing going on.
>
>The fact that your CPU is only running 8-10% is an indication that
>your requests are queued up somewhere, but where I have no clue. The
>Jetty thread pool is quite large.  What are the QTimes reported in the
>responses? My guess is that the QTime stays pretty constant (and very
>low) even as your response time increases, another indication that
>you're queueing.
>
>Hmmm, is it possible that the queueing is on the _client_ side?
>What aggregate throughput do you get if you fire up 10 _clients_ each
>with one thread rather than 1 client and 10 threads? That's a shot in
>the dark, but worth a try I suppose. And how does your client fire
>queries? SolrJ? Http? Jmeter or the like?
>
>But yeah, this is weird. Since you're firing the same query, Solr
>isn't really doing any work at all.
>
>Best,
>Erick
>
>On Thu, Mar 23, 2017 at 7:56 PM, Aman Deep Singh
> wrote:
>> You can play with the merge factor in the index config.
>> If there are no frequent updates, then make it 2; it will give you higher
>> throughput and lower latency.
>>
>> On 24-Mar-2017 8:22 AM, "Zheng Lin Edwin Yeo" 
>>wrote:
>>
>>> I also did find that beyond 10 threads for 8GB heap size , there isn't
>>>much
>>> improvement with the performance. But you can increase your heap size a
>>> little if your system allows it.
>>>
>>> By the way, which Solr version are you using?
>>>
>>> Regards,
>>> Edwin
>>>
>>>
>>> On 24 March 2017 at 09:21, Matt Magnusson  wrote:
>>>
>>> > Out of curiosity, what is your index size? I'm trying to do something
>>> > similar with maximizing output, I'm currently looking at streaming
>>> > expressions which I'm seeing some interesting results for, I'm also
>>> > finding that the direct mass query route seems to hit a wall for
>>> > performance. I'm also finding that about 10 threads seems to be an
>>> > optimum number.
>>> >
>>> > On Thu, Mar 23, 2017 at 8:10 PM, Suresh Pendap
>>>
>>> > wrote:
>>> > > Hi,
>>> > > I am new to SOLR search engine technology and I am trying to get
>>>some
>>> > performance numbers to get maximum throughput from the SOLR cluster
>>>of a
>>> > given size.
>>> > > I am currently doing only query load testing in which I randomly
>>>fire a
>>> > bunch of queries to the SOLR cluster to generate the query load.  I
>>> > understand that it is not the ideal workload as the
>>> > > ingestion and commits happening invalidate the Solr Caches, so it
>>>is
>>> > advisable to perform query load along with some documents being
>>>ingested.
>>> > >
>>> > > The SOLR cluster was made up of 2 shards and 2 replicas. So there
>>>were
>>> > total 4 replicas serving the queries. The SOLR nodes were running on
>>>an
>>> LXD
>>> > container with 12 cores and 88GB RAM.
>>> > > The heap size allocated was 8g min and 8g max. All the other SOLR
>>> > configurations were default.
>>> > >
>>> > > The client node was running on an 8 core VM.
>>> > >
>>> > > I performed the test with 1 thread, 10 client threads and 50 client
>>> > threads.  I noticed that as I increased the number of threads, the
>>>query
>>> > latency kept increasing drastically which I was not expecting.
>>> > >
>>> > > Since my initial test was randomly picking queries from a file, I
>>> > decided to keep things constant and ran the program which fired the
>>>same
>>> > query again and again. Since it is the same query, all the documents

Re: unable to get more throughput with more threads

2017-03-24 Thread Shawn Heisey
On 3/23/2017 6:10 PM, Suresh Pendap wrote:
> I performed the test with 1 thread, 10 client threads and 50 client
> threads. I noticed that as I increased the number of threads, the
> query latency kept increasing drastically which I was not expecting. 

What language and Solr library was the client using?  If it's Java and
SolrJ, then the following will apply.  If the client is written in a
language other than Java, you may find that there are similar default
settings in the HTTP library:

A dependency called HttpClient is used by SolrJ.  The default settings
for HttpClient are only capable of making *two* simultaneous connections
to a target server.  Further connections will wait until existing
connections are complete.  Unless it is overridden, SolrJ creates the
HttpClient object with default settings.

If more threads are needed, the SolrClient object must be built with a
custom HttpClient.  Here's some SolrJ code to create a
multithread-capable client object (300 threads to a single server):

  RequestConfig rc = RequestConfig.custom().setConnectTimeout(15000)
      .setSocketTimeout(Const.SOCKET_TIMEOUT).build();
  httpClient = HttpClients.custom().setDefaultRequestConfig(rc)
      .setMaxConnPerRoute(300).setMaxConnTotal(5000)
      .disableAutomaticRetries().build();
  client = new HttpSolrClient(serverBaseUrl, httpClient);

I have also placed this code at the following URL.  It will expire in
one month:

http://apaste.info/BpoWY

A similar technique can be used with CloudSolrClient if needed.

It's my opinion that SolrJ needs to create client objects by default
that are capable of more threads, but I have not yet put any time into
making it happen.

Thanks,
Shawn



Re: unable to get more throughput with more threads

2017-03-23 Thread Suresh Pendap
Edwin,
The heap was not being used much; only 1GB of heap was in use out of
8GB, so I do have space to allocate more to the heap.
I was reading in some SOLR performance blogs that it is better not to use
a large heap size; instead it is better to leave plenty of memory for the
operating system's disk cache, so that as many documents as possible stay
in memory in the buffer cache.

Regards
Suresh


On 3/23/17 7:52 PM, "Zheng Lin Edwin Yeo"  wrote:

>I also did find that beyond 10 threads for 8GB heap size , there isn't
>much
>improvement with the performance. But you can increase your heap size a
>little if your system allows it.
>
>By the way, which Solr version are you using?
>
>Regards,
>Edwin
>
>
>On 24 March 2017 at 09:21, Matt Magnusson  wrote:
>
>> Out of curiosity, what is your index size? I'm trying to do something
>> similar with maximizing output, I'm currently looking at streaming
>> expressions which I'm seeing some interesting results for, I'm also
>> finding that the direct mass query route seems to hit a wall for
>> performance. I'm also finding that about 10 threads seems to be an
>> optimum number.
>>
>> On Thu, Mar 23, 2017 at 8:10 PM, Suresh Pendap 
>> wrote:
>> > Hi,
>> > I am new to SOLR search engine technology and I am trying to get some
>> performance numbers to get maximum throughput from the SOLR cluster of a
>> given size.
>> > I am currently doing only query load testing in which I randomly fire
>>a
>> bunch of queries to the SOLR cluster to generate the query load.  I
>> understand that it is not the ideal workload as the
>> > ingestion and commits happening invalidate the Solr Caches, so it is
>> advisable to perform query load along with some documents being
>>ingested.
>> >
>> > The SOLR cluster was made up of 2 shards and 2 replicas. So there were
>> total 4 replicas serving the queries. The SOLR nodes were running on an
>>LXD
>> container with 12 cores and 88GB RAM.
>> > The heap size allocated was 8g min and 8g max. All the other SOLR
>> configurations were default.
>> >
>> > The client node was running on an 8 core VM.
>> >
>> > I performed the test with 1 thread, 10 client threads and 50 client
>> threads.  I noticed that as I increased the number of threads, the query
>> latency kept increasing drastically which I was not expecting.
>> >
>> > Since my initial test was randomly picking queries from a file, I
>> decided to keep things constant and ran the program which fired the same
>> query again and again. Since it is the same query, all the documents
>>will
>> > be in the Cache and the query response time should be very fast. I was
>> also expecting that with 10 or 50 client threads, the query latency
>>should
>> not be increasing.
>> >
>> > The throughput increased only up to 10 client threads but then it was
>> same for 50 threads, 100 threads and the latency of the query kept
>> increasing as I increased the number of threads.
>> > The query was returning 2 documents only.
>> >
>> > The table below summarizes the numbers that I was saying with a single
>> query.
>> >
>> >
>> >
>> >
>> >
>> > #No of Client Nodes
>> > #No of Threads  99 pct Latency  95 pct latency  throughput
>> CPU Utilization Server Configuration
>> >
>> > 1   1   9 ms7 ms180 reqs/sec8%
>> >
>> > Heap size: ms=8g, mx=8g
>> >
>> > default configuration
>> >
>> >
>> > 1   10  400 ms  360 ms  360 reqs/sec10%
>> >
>> > Heap size: ms=8g, mx=8g
>> >
>> > default configuration
>> >
>> >
>> >
>> >
>> > I also ran the client program on the SOLR server node in order to rule
>> out the network latency factor. On the server node also the response
>>time
>> was higher for 10 threads, but the amplification was smaller.
>> >
>> > I am getting an impression that probably my query requests are getting
>> queued up and limited due to probably some thread pool size on the
>>server
>> side.  However I saw that the default jetty.xml does
>> > have the thread pool of min size of 10 and  max of 1.
>> >
>> > Is there any other internal SOLR thread pool configuration which might
>> be limiting the query response time?
>> >
>> > I wanted to check with the community if what I am seeing is abnormal
>> behavior, what could be the issue?  Is there any configuration that I
>>can
>> tweak to get better query response times for more load?
>> >
>> > Regards
>> > Suresh
>> >
>>



Re: unable to get more throughput with more threads

2017-03-23 Thread Suresh Pendap
I am using version 6.3 of Solr

On 3/23/17 7:56 PM, "Aman Deep Singh"  wrote:




Re: unable to get more throughput with more threads

2017-03-23 Thread Erick Erickson
I'd check my I/O. Since you're firing the same query, I expect that
you aren't I/O bound at all, since, as you say, the docs should
already be in memory. This assumes that your document cache size is >
0. You can check this. Go to the admin UI, select one of your cores
(not collection) and go to plugins/stats. You should see the
documentCache as one of the entries and you should be hitting an
insane hit ratio close to 100% as your test runs.

Also check your queryResultCache. That also should be near 100% in
your test. Do note that these caches really never hit this "for real",
but as you indicated this is a highly artificial test so such high hit
ratios are what I'd expect.

Assuming that those caches are being hit near 100%, Solr really isn't
doing any work to speak of so there almost has to be some kind of
queueing going on.

The fact that your CPU is only running 8-10% is an indication that
your requests are queued up somewhere, but where I have no clue. The
Jetty thread pool is quite large.  What are the QTimes reported in the
responses? My guess is that the QTime stays pretty constant (and very
low) even as your response time increases, another indication that
you're queueing.
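
One way to check this QTime-vs-queueing theory on the client side is to
log both the wall-clock time around each HTTP call and the QTime reported
in the response header. A minimal sketch; the JSON below is a trimmed,
hypothetical Solr response, and the regex parse is just for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QTimeCheck {
    private static final Pattern QTIME = Pattern.compile("\"QTime\"\\s*:\\s*(\\d+)");

    // Pull QTime (ms) out of a Solr JSON response body; -1 if absent.
    static long parseQTime(String responseBody) {
        Matcher m = QTIME.matcher(responseBody);
        return m.find() ? Long.parseLong(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        // A trimmed-down Solr response header, for illustration only.
        String body = "{\"responseHeader\":{\"status\":0,\"QTime\":4},"
                + "\"response\":{\"numFound\":2}}";
        long wallClockMs = 400;            // measured around the HTTP call by the client
        long qtimeMs = parseQTime(body);   // time Solr spent inside the request handler
        System.out.println("QTime=" + qtimeMs + "ms, wall clock=" + wallClockMs
                + "ms, gap=" + (wallClockMs - qtimeMs) + "ms");
    }
}
```

If the gap grows with thread count while QTime stays flat, the time is
being spent waiting somewhere outside the request handler.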

Hmmm, is it possible that the queueing is on the _client_ side?
What aggregate throughput do you get if you fire up 10 _clients_ each
with one thread rather than 1 client and 10 threads? That's a shot in
the dark, but worth a try I suppose. And how does your client fire
queries? SolrJ? Http? Jmeter or the like?

But yeah, this is weird. Since you're firing the same query, Solr
isn't really doing any work at all.

Best,
Erick

On Thu, Mar 23, 2017 at 7:56 PM, Aman Deep Singh
 wrote:
> You can play with the merge factor in the index config.
> If there are no frequent updates, then make it 2; it will give you higher
> throughput and lower latency.
>
> On 24-Mar-2017 8:22 AM, "Zheng Lin Edwin Yeo"  wrote:
>
>> I also did find that beyond 10 threads for 8GB heap size , there isn't much
>> improvement with the performance. But you can increase your heap size a
>> little if your system allows it.
>>
>> By the way, which Solr version are you using?
>>
>> Regards,
>> Edwin
>>
>>
>> On 24 March 2017 at 09:21, Matt Magnusson  wrote:
>>
>> > Out of curiosity, what is your index size? I'm trying to do something
>> > similar with maximizing output, I'm currently looking at streaming
>> > expressions which I'm seeing some interesting results for, I'm also
>> > finding that the direct mass query route seems to hit a wall for
>> > performance. I'm also finding that about 10 threads seems to be an
>> > optimum number.
>> >
>> > On Thu, Mar 23, 2017 at 8:10 PM, Suresh Pendap 
>> > wrote:
>> > > Hi,
>> > > I am new to SOLR search engine technology and I am trying to get some
>> > performance numbers to get maximum throughput from the SOLR cluster of a
>> > given size.
>> > > I am currently doing only query load testing in which I randomly fire a
>> > bunch of queries to the SOLR cluster to generate the query load.  I
>> > understand that it is not the ideal workload as the
>> > > ingestion and commits happening invalidate the Solr Caches, so it is
>> > advisable to perform query load along with some documents being ingested.
>> > >
>> > > The SOLR cluster was made up of 2 shards and 2 replicas. So there were
>> > total 4 replicas serving the queries. The SOLR nodes were running on an
>> LXD
>> > container with 12 cores and 88GB RAM.
>> > > The heap size allocated was 8g min and 8g max. All the other SOLR
>> > configurations were default.
>> > >
>> > > The client node was running on an 8 core VM.
>> > >
>> > > I performed the test with 1 thread, 10 client threads and 50 client
>> > threads.  I noticed that as I increased the number of threads, the query
>> > latency kept increasing drastically which I was not expecting.
>> > >
>> > > Since my initial test was randomly picking queries from a file, I
>> > decided to keep things constant and ran the program which fired the same
>> > query again and again. Since it is the same query, all the documents will
>> > > be in the Cache and the query response time should be very fast. I was
>> > also expecting that with 10 or 50 client threads, the query latency
>> should
>> > not be increasing.
>> > >
>> > > The throughput increased only up to 10 client threads but then it was
>> > same for 50 threads, 100 threads and the latency of the query kept
>> > increasing as I increased the number of threads.
>> > > The query was returning 2 documents only.
>> > >
>> > > The table below summarizes the numbers that I was saying with a single
>> > query.
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > #No of Client Nodes
>> > > #No of Threads  99 pct Latency  95 pct latency  throughput
>> > CPU Utilization Server Configuration
>> > >
>> > > 1   1   9 ms7 ms180 reqs/sec8%
>> > >
>> > > Heap size: ms=8g, 

Re: unable to get more throughput with more threads

2017-03-23 Thread Aman Deep Singh
You can play with the merge factor in the index config.
If there are no frequent updates, then make it 2; it will give you higher
throughput and lower latency.
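
Assuming this refers to the tiered merge policy knobs (Solr 6.x replaced
the old mergeFactor with maxMergeAtOnce/segmentsPerTier), a hypothetical
solrconfig.xml fragment would look roughly like:

```xml
<!-- Sketch only: fewer, larger segments can help a read-mostly index,
     but aggressive merging hurts indexing throughput. -->
<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">2</int>
    <int name="segmentsPerTier">2</int>
  </mergePolicyFactory>
</indexConfig>
```

This trade-off only makes sense for an index that is rarely updated, as
noted above.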

On 24-Mar-2017 8:22 AM, "Zheng Lin Edwin Yeo"  wrote:

> I also did find that beyond 10 threads for 8GB heap size , there isn't much
> improvement with the performance. But you can increase your heap size a
> little if your system allows it.
>
> By the way, which Solr version are you using?
>
> Regards,
> Edwin
>
>
> On 24 March 2017 at 09:21, Matt Magnusson  wrote:
>
> > Out of curiosity, what is your index size? I'm trying to do something
> > similar with maximizing output, I'm currently looking at streaming
> > expressions which I'm seeing some interesting results for, I'm also
> > finding that the direct mass query route seems to hit a wall for
> > performance. I'm also finding that about 10 threads seems to be an
> > optimum number.
> >
> > On Thu, Mar 23, 2017 at 8:10 PM, Suresh Pendap 
> > wrote:
> > > Hi,
> > > I am new to SOLR search engine technology and I am trying to get some
> > performance numbers to get maximum throughput from the SOLR cluster of a
> > given size.
> > > I am currently doing only query load testing in which I randomly fire a
> > bunch of queries to the SOLR cluster to generate the query load.  I
> > understand that it is not the ideal workload as the
> > > ingestion and commits happening invalidate the Solr Caches, so it is
> > advisable to perform query load along with some documents being ingested.
> > >
> > > The SOLR cluster was made up of 2 shards and 2 replicas. So there were
> > total 4 replicas serving the queries. The SOLR nodes were running on an
> LXD
> > container with 12 cores and 88GB RAM.
> > > The heap size allocated was 8g min and 8g max. All the other SOLR
> > configurations were default.
> > >
> > > The client node was running on an 8 core VM.
> > >
> > > I performed the test with 1 thread, 10 client threads and 50 client
> > threads.  I noticed that as I increased the number of threads, the query
> > latency kept increasing drastically which I was not expecting.
> > >
> > > Since my initial test was randomly picking queries from a file, I
> > decided to keep things constant and ran the program which fired the same
> > query again and again. Since it is the same query, all the documents will
> > > be in the Cache and the query response time should be very fast. I was
> > also expecting that with 10 or 50 client threads, the query latency
> should
> > not be increasing.
> > >
> > > The throughput increased only up to 10 client threads but then it was
> > same for 50 threads, 100 threads and the latency of the query kept
> > increasing as I increased the number of threads.
> > > The query was returning 2 documents only.
> > >
> > > The table below summarizes the numbers that I was saying with a single
> > query.
> > >
> > >
> > >
> > >
> > >
> > > #No of Client Nodes
> > > #No of Threads  99 pct Latency  95 pct latency  throughput
> > CPU Utilization Server Configuration
> > >
> > > 1   1   9 ms7 ms180 reqs/sec8%
> > >
> > > Heap size: ms=8g, mx=8g
> > >
> > > default configuration
> > >
> > >
> > > 1   10  400 ms  360 ms  360 reqs/sec10%
> > >
> > > Heap size: ms=8g, mx=8g
> > >
> > > default configuration
> > >
> > >
> > >
> > >
> > > I also ran the client program on the SOLR server node in order to rule
> > out the network latency factor. On the server node also the response time
> > was higher for 10 threads, but the amplification was smaller.
> > >
> > > I am getting an impression that probably my query requests are getting
> > queued up and limited due to probably some thread pool size on the server
> > side.  However I saw that the default jetty.xml does
> > > have the thread pool of min size of 10 and  max of 1.
> > >
> > > Is there any other internal SOLR thread pool configuration which might
> > be limiting the query response time?
> > >
> > > I wanted to check with the community if what I am seeing is abnormal
> > behavior, what could be the issue?  Is there any configuration that I can
> > tweak to get better query response times for more load?
> > >
> > > Regards
> > > Suresh
> > >
> >
>


Re: unable to get more throughput with more threads

2017-03-23 Thread Zheng Lin Edwin Yeo
I also did find that beyond 10 threads for 8GB heap size , there isn't much
improvement with the performance. But you can increase your heap size a
little if your system allows it.

By the way, which Solr version are you using?

Regards,
Edwin


On 24 March 2017 at 09:21, Matt Magnusson  wrote:

> Out of curiosity, what is your index size? I'm trying to do something
> similar with maximizing output, I'm currently looking at streaming
> expressions which I'm seeing some interesting results for, I'm also
> finding that the direct mass query route seems to hit a wall for
> performance. I'm also finding that about 10 threads seems to be an
> optimum number.
>
> On Thu, Mar 23, 2017 at 8:10 PM, Suresh Pendap 
> wrote:
> > Hi,
> > I am new to SOLR search engine technology and I am trying to get some
> performance numbers to get maximum throughput from the SOLR cluster of a
> given size.
> > I am currently doing only query load testing in which I randomly fire a
> bunch of queries to the SOLR cluster to generate the query load.  I
> understand that it is not the ideal workload as the
> > ingestion and commits happening invalidate the Solr Caches, so it is
> advisable to perform query load along with some documents being ingested.
> >
> > The SOLR cluster was made up of 2 shards and 2 replicas. So there were
> total 4 replicas serving the queries. The SOLR nodes were running on an LXD
> container with 12 cores and 88GB RAM.
> > The heap size allocated was 8g min and 8g max. All the other SOLR
> configurations were default.
> >
> > The client node was running on an 8 core VM.
> >
> > I performed the test with 1 thread, 10 client threads and 50 client
> threads.  I noticed that as I increased the number of threads, the query
> latency kept increasing drastically which I was not expecting.
> >
> > Since my initial test was randomly picking queries from a file, I
> decided to keep things constant and ran the program which fired the same
> query again and again. Since it is the same query, all the documents will
> > be in the Cache and the query response time should be very fast. I was
> also expecting that with 10 or 50 client threads, the query latency should
> not be increasing.
> >
> > The throughput increased only up to 10 client threads but then it was
> same for 50 threads, 100 threads and the latency of the query kept
> increasing as I increased the number of threads.
> > The query was returning 2 documents only.
> >
> > The table below summarizes the numbers that I was saying with a single
> query.
> >
> >
> >
> >
> >
> > #No of Client Nodes
> > #No of Threads  99 pct Latency  95 pct latency  throughput
> CPU Utilization Server Configuration
> >
> > 1   1   9 ms7 ms180 reqs/sec8%
> >
> > Heap size: ms=8g, mx=8g
> >
> > default configuration
> >
> >
> > 1   10  400 ms  360 ms  360 reqs/sec10%
> >
> > Heap size: ms=8g, mx=8g
> >
> > default configuration
> >
> >
> >
> >
> > I also ran the client program on the SOLR server node in order to rule
> out the network latency factor. On the server node also the response time
> was higher for 10 threads, but the amplification was smaller.
> >
> > I am getting an impression that probably my query requests are getting
> queued up and limited due to probably some thread pool size on the server
> side.  However I saw that the default jetty.xml does
> > have the thread pool of min size of 10 and  max of 1.
> >
> > Is there any other internal SOLR thread pool configuration which might
> be limiting the query response time?
> >
> > I wanted to check with the community if what I am seeing is abnormal
> behavior, what could be the issue?  Is there any configuration that I can
> tweak to get better query response times for more load?
> >
> > Regards
> > Suresh
> >
>


Re: unable to get more throughput with more threads

2017-03-23 Thread Matt Magnusson
Out of curiosity, what is your index size? I'm trying to do something
similar with maximizing output, I'm currently looking at streaming
expressions which I'm seeing some interesting results for, I'm also
finding that the direct mass query route seems to hit a wall for
performance. I'm also finding that about 10 threads seems to be an
optimum number.

On Thu, Mar 23, 2017 at 8:10 PM, Suresh Pendap  wrote:
> Hi,
> I am new to SOLR search engine technology and I am trying to get some 
> performance numbers to get maximum throughput from the SOLR cluster of a 
> given size.
> I am currently doing only query load testing in which I randomly fire a bunch 
> of queries to the SOLR cluster to generate the query load.  I understand that 
> it is not the ideal workload as the
> ingestion and commits happening invalidate the Solr Caches, so it is 
> advisable to perform query load along with some documents being ingested.
>
> The SOLR cluster was made up of 2 shards and 2 replicas. So there were total 
> 4 replicas serving the queries. The SOLR nodes were running on an LXD 
> container with 12 cores and 88GB RAM.
> The heap size allocated was 8g min and 8g max. All the other SOLR 
> configurations were default.
>
> The client node was running on an 8 core VM.
>
> I performed the test with 1 thread, 10 client threads and 50 client threads.  
> I noticed that as I increased the number of threads, the query latency kept 
> increasing drastically which I was not expecting.
>
> Since my initial test was randomly picking queries from a file, I decided to 
> keep things constant and ran the program which fired the same query again and 
> again. Since it is the same query, all the documents will
> be in the Cache and the query response time should be very fast. I was also 
> expecting that with 10 or 50 client threads, the query latency should not be 
> increasing.
>
> The throughput increased only up to 10 client threads but then it was same 
> for 50 threads, 100 threads and the latency of the query kept increasing as I 
> increased the number of threads.
> The query was returning 2 documents only.
>
> The table below summarizes the numbers that I was saying with a single query.
>
>
>
>
>
> #No of Client Nodes
> #No of Threads  99 pct Latency  95 pct latency  throughput  CPU 
> Utilization Server Configuration
>
> 1   1   9 ms7 ms180 reqs/sec8%
>
> Heap size: ms=8g, mx=8g
>
> default configuration
>
>
> 1   10  400 ms  360 ms  360 reqs/sec10%
>
> Heap size: ms=8g, mx=8g
>
> default configuration
>
>
>
>
> I also ran the client program on the SOLR server node in order to rule out
> the network latency factor. On the server node also the response time was 
> higher for 10 threads, but the amplification was smaller.
>
> I am getting an impression that probably my query requests are getting queued 
> up and limited due to probably some thread pool size on the server side.  
> However I saw that the default jetty.xml does
> have the thread pool of min size of 10 and  max of 1.
>
> Is there any other internal SOLR thread pool configuration which might be 
> limiting the query response time?
>
> I wanted to check with the community if what I am seeing is abnormal 
> behavior, what could be the issue?  Is there any configuration that I can 
> tweak to get better query response times for more load?
>
> Regards
> Suresh
>


unable to get more throughput with more threads

2017-03-23 Thread Suresh Pendap
Hi,
I am new to SOLR search engine technology and I am trying to get some 
performance numbers to get maximum throughput from the SOLR cluster of a given 
size.
I am currently doing only query load testing, in which I randomly fire a
bunch of queries at the SOLR cluster to generate the query load. I
understand that this is not the ideal workload, since ingestion and
commits invalidate the Solr caches, so it is advisable to run the query
load while some documents are also being ingested.

The SOLR cluster was made up of 2 shards and 2 replicas. So there were total 4 
replicas serving the queries. The SOLR nodes were running on an LXD container 
with 12 cores and 88GB RAM.
The heap size allocated was 8g min and 8g max. All the other SOLR 
configurations were default.

The client node was running on an 8 core VM.

I performed the test with 1 thread, 10 client threads and 50 client threads.  I 
noticed that as I increased the number of threads, the query latency kept 
increasing drastically which I was not expecting.

Since my initial test was randomly picking queries from a file, I decided to 
keep things constant and ran the program which fired the same query again and 
again. Since it is the same query, all the documents will
be in the Cache and the query response time should be very fast. I was also 
expecting that with 10 or 50 client threads, the query latency should not be 
increasing.

The throughput increased only up to 10 client threads; it was then the
same for 50 threads and 100 threads, while the latency of the query kept
increasing as I increased the number of threads.
The query was returning 2 documents only.

The table below summarizes the numbers that I was saying with a single query.


#Client Nodes   #Threads   99 pct Latency   95 pct Latency   Throughput     CPU Utilization   Server Configuration
1               1          9 ms             7 ms             180 reqs/sec   8%                Heap size: ms=8g, mx=8g; default configuration
1               10         400 ms           360 ms           360 reqs/sec   10%               Heap size: ms=8g, mx=8g; default configuration




I also ran the client program on the SOLR server node in order to rule out
the network latency factor. On the server node the response time was also
higher for 10 threads, but the amplification was smaller.

I am getting the impression that my query requests are probably being
queued up and throttled by some thread pool size limit on the server side.
However, I saw that the default jetty.xml does have a thread pool with a
min size of 10 and a max of 1.

Is there any other internal SOLR thread pool configuration which might be 
limiting the query response time?

I wanted to check with the community: is what I am seeing abnormal
behavior, and if so, what could be the issue? Is there any configuration
that I can tweak to get better query response times under more load?

Regards
Suresh
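
One sanity check on the numbers in the table: in a closed-loop test, where
each thread waits for a response before sending the next request, Little's
law says throughput = threads / mean response time. A quick sketch using
the measured figures (the formulas are standard; the interpretation is
only a suggestion):

```java
public class LittleLaw {
    // Closed-loop model: each of N client threads waits for its response
    // before sending the next request, so throughput X = N / R.
    static double throughputPerSec(int threads, double meanResponseMs) {
        return threads / (meanResponseMs / 1000.0);
    }

    // Conversely, the mean response time implied by a measured throughput.
    static double impliedMeanResponseMs(int threads, double throughputPerSec) {
        return threads * 1000.0 / throughputPerSec;
    }

    public static void main(String[] args) {
        // 1 thread at 180 reqs/sec implies a mean response time of about 5.6 ms.
        System.out.printf("1 thread:   %.1f ms mean%n", impliedMeanResponseMs(1, 180));
        // 10 threads at 360 reqs/sec implies a mean of about 27.8 ms, well below
        // the measured 360-400 ms tail latencies.
        System.out.printf("10 threads: %.1f ms mean%n", impliedMeanResponseMs(10, 360));
    }
}
```

A mean that low alongside such high p95/p99 values means the latency
distribution is heavily skewed: a minority of requests waiting a long time
in a queue, rather than a uniformly slower server, which is consistent
with a small fixed concurrency limit somewhere in the path.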