Re: Some performance questions....

2018-03-25 Thread Walter Underwood
> On Mar 24, 2018, at 5:21 PM, Deepak Goel  wrote:
> 
> My first test was to test with static queries. Does Solr scale-up as we
> increase the load of same query?
> 
> The second test would be to check with 'Different Queries'.
> 
> And then finally check with 80% similar queries and 20% different queries.

You insulted me when I gave a clear explanation about how to run a meaningful 
benchmark.

Now you give results from a totally invalid benchmark. If you are getting slow 
responses from a one-query “benchmark”, you have serious system or 
configuration mistakes. That should be returning in a few milliseconds.

Also, 80/20 isn’t even close to a realistic query load.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Some performance questions....

2018-03-25 Thread Deepak Goel
Some observations:

*#* The CPU load on idxa1 mostly stays below the 91% mark, even if you
increase the load (by increasing the number of threads). This is similar to
my environment (I can never cross 90% on Linux even if I increase the load.
On Windows I can never cross 65% for some reason)

*#* Similarly the CPU Load on idxa2 never crosses 50% (I guess this follows
from the above point)

*#* Your system saturates at 10 threads (The qps hits the highest mark at
this load). Increasing the load further (number of threads - 20, 100, 200)
only worsens the response time, while the qps remains the same

*#* The Query-Time is anywhere between 25-100ms. For 200 threads, the
Query-Time is between 500-1400ms. This is for a load of 'Static-Query'.

A 'Dynamic-Query' load would only worsen the Query-Time (It will also
probably bring down the qps and max-cpu-utilisation)

*#* The author has a similar hardware configuration as yours (idxa1). The
author has not specified the OS though.

If it is Windows, then I would believe it might be a good idea to have 2
VM's on his box

If it is Linux, it might be a good idea to decide once someone does the
test with Dynamic-Query Load. If the author has a load of Static-Query,
then having one VM on his box should be fine as 90% of CPU resources can be
consumed (However he would lose Reliability and Availability as compared
to 2 VM's)
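
The saturation pattern in these observations (qps flat, response time growing
with the thread count) is what Little's law predicts for a closed-loop test in
which each thread waits for its response before sending the next query. A
minimal sketch, assuming throughput stays pinned at the 369 qps reported for
the 200 thread run; the function name is illustrative and is not part of the
test program:

```python
def mean_response_ms(threads: int, saturated_qps: float) -> float:
    """Little's law for a closed loop: requests in flight = throughput * latency.

    With every thread always waiting on exactly one request, the mean
    latency is threads / qps (converted here to milliseconds).
    """
    return threads / saturated_qps * 1000.0


# Assumption: throughput is flat at 369 qps once the server is saturated.
for threads in (20, 100, 200):
    print(threads, "threads ->", round(mean_response_ms(threads, 369.0)), "ms mean")
```

For 200 threads this model gives a mean of roughly 542 ms, which is in line
with the 500 ms elapsed median reported for that run.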

Some other points:

*@* I would have liked to have the vmstat information for 10,5,7,8 threads

*@* Also if you could run the test for 7 and 8 threads (Because at 10
threads system saturates and at 5 threads the load is less)

*@* Can you please also do a Load-Test for Dynamic-Queries with 5-10
threads (I am sorry for asking too much. You can please ignore these
demands if it is too much). I will do the same on my environment



Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Sun, Mar 25, 2018 at 9:45 PM, Shawn Heisey  wrote:

> On 3/25/2018 7:15 AM, Deepak Goel wrote:
>
>> $ Why is the 'qps' not increasing with increase in threads? (If I
>> understand the qps parameter right?)
>>
>
> Likely because I sent all these queries to a single copy of the index.  We
> only have two copies of the index in production, plus a third copy on a dev
> server running a newer version of Solr. I sent the queries from the test
> program to the production server pair that's designated "standby" -- not
> receiving queries unless the other pair is down.
>
> Our Solr servers do not handle a high query load.  It's usually less than
> two queries per second.
>
> Handling a very high query load requires load balancing to multiple copies
> of the index (replicas in SolrCloud terminology). We don't need that, so we
> don't have a bunch of copies.  The only reason we have two copies is so we
> can handle hardware failure gracefully.  I bypassed the load balancer for
> these tests.
>
> $ Is it possible to run with 10 & 5 & 2 threads?
>>
>
> Sure.
>
> I have updated the gist with those results.
>
> https://gist.github.com/elyograg/abedf4ae28467059e46781f7d474f379
>
> $ What were the server utilisation (CPU, Memory) when you ran the test?
>>
>
> I actually never looked when I was running the tests before.  I ran
> additional tests so I could gather that data.  The updated gist has vmstat
> information (while running a 20 thread test, and while running a 200 thread
> test) for the server side. The server named idxa1 has a higher CPU load
> because it is aggregating the shard data and replying to the query, in
> addition to serving three out of the seven shards.  The server named idxa2
> has four shards.  The extra shard on idxa2 is very small - a little over
> 321000 docs, a little over 500MB disk used.  This is where new docs are
> written.
>
> The CPU load on idxa2 is similar for both thread levels.  I think this is
> because all queries are served from cache.  But idxa1 shows a higher load,
> because even when the cache is used, that server must still aggregate the
> shard data (which was pulled from cache) and create responses.  The
> aggregation is not cached, because Solr has no way to know that what it is
> receiving from the shards is cached data.
>
> Here's the benchmark output from the 200 thread test when I was getting
> the CPU information:
>
> query count: 20
> elapsed count: 20
> query median: 488.0
> elapsed median: 500.0
> query 75th: 674.0
> elapsed 75th: 686.0
> query 95th: 1006.0
> elapsed 95th: 1018.0
> query 99th: 1283.01
> elapsed 99th: 1299.0
> total time in seconds: 542
> numThreads: 200
> queries per thread: 1000
> qps: 369
>
> $ The 'query median' increases from 35 to 470 as you increase threads from
>> 20 to 200 (You had mentioned earlier that QTime for Banjo query was 11
>> when
>> you had hit it the second time around)
>>
>
> When I got 11 ms, that was doing *one* 

Re: Some performance questions....

2018-03-25 Thread Shawn Heisey

On 3/25/2018 7:15 AM, Deepak Goel wrote:

$ Why is the 'qps' not increasing with increase in threads? (If I
understand the qps parameter right?)


Likely because I sent all these queries to a single copy of the index.  
We only have two copies of the index in production, plus a third copy on 
a dev server running a newer version of Solr. I sent the queries from 
the test program to the production server pair that's designated 
"standby" -- not receiving queries unless the other pair is down.


Our Solr servers do not handle a high query load.  It's usually less 
than two queries per second.


Handling a very high query load requires load balancing to multiple 
copies of the index (replicas in SolrCloud terminology). We don't need 
that, so we don't have a bunch of copies.  The only reason we have two 
copies is so we can handle hardware failure gracefully.  I bypassed the 
load balancer for these tests.



$ Is it possible to run with 10 & 5 & 2 threads?


Sure.

I have updated the gist with those results.

https://gist.github.com/elyograg/abedf4ae28467059e46781f7d474f379


$ What were the server utilisation (CPU, Memory) when you ran the test?


I actually never looked when I was running the tests before.  I ran 
additional tests so I could gather that data.  The updated gist has 
vmstat information (while running a 20 thread test, and while running a 
200 thread test) for the server side. The server named idxa1 has a 
higher CPU load because it is aggregating the shard data and replying to 
the query, in addition to serving three out of the seven shards.  The 
server named idxa2 has four shards.  The extra shard on idxa2 is very 
small - a little over 321000 docs, a little over 500MB disk used.  This 
is where new docs are written.


The CPU load on idxa2 is similar for both thread levels.  I think this is 
because all queries are served from cache.  But idxa1 shows a higher 
load, because even when the cache is used, that server must still 
aggregate the shard data (which was pulled from cache) and create 
responses.  The aggregation is not cached, because Solr has no way to 
know that what it is receiving from the shards is cached data.


Here's the benchmark output from the 200 thread test when I was getting 
the CPU information:


query count: 20
elapsed count: 20
query median: 488.0
elapsed median: 500.0
query 75th: 674.0
elapsed 75th: 686.0
query 95th: 1006.0
elapsed 95th: 1018.0
query 99th: 1283.01
elapsed 99th: 1299.0
total time in seconds: 542
numThreads: 200
queries per thread: 1000
qps: 369
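
As a check on how the figures above relate: qps is the total number of queries
divided by the wall-clock time. A minimal sketch of that arithmetic, together
with a nearest-rank percentile. The nearest-rank method and the sample timings
are assumptions for illustration; the test program may compute its percentiles
differently.

```python
import math


def qps(num_threads: int, queries_per_thread: int, total_seconds: float) -> float:
    """Throughput: all queries issued by all threads, per second of wall time."""
    return num_threads * queries_per_thread / total_seconds


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


# Figures from the 200 thread run above: 200 threads * 1000 queries in 542 s.
print(int(qps(200, 1000, 542)))

# Hypothetical timings in milliseconds, only to show the percentile helper.
timings_ms = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(percentile(timings_ms, 50), percentile(timings_ms, 95))
```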


$ The 'query median' increases from 35 to 470 as you increase threads from
20 to 200 (You had mentioned earlier that QTime for Banjo query was 11 when
you had hit it the second time around)


When I got 11 ms, that was doing *one* query.  This program does a lot 
of them, so I'm not surprised by the increase.  I did the one-off 
queries on the dev server, not the standby production servers that 
received the load test.  The hardware specs are similar, except that in 
dev, the entire index is on one server running Solr 6.6.2.  That server 
also contains other indexes not being handled by the production pair I 
used for the load test.



$ Can you please give Linux server configuration if possible?


What *exactly* are you looking for here?  I've got some information 
below, but I do not know if it's what you are after.


High level, first server (idxa1):
Dell PowerEdge 2950 III
Two 4-core CPUs.
model name  : Intel(R) Xeon(R) CPU   E5440  @ 2.83GHz
64GB memory
Solr is version 4.7.2, with an 8GB heap
About 140GB of index data
CentOS 6, kernel 2.6.32-431.11.2.el6.centos.plus.x86_64
Oracle Java:
java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)

Differences on the second server (idxa2):
model name  : Intel(R) Xeon(R) CPU   E5420  @ 2.50GHz
Slightly more (about 500MB) index data.
2.6.32-504.12.2.el6.centos.plus.x86_64.

The whole production index is in the ballpark of 280GB, and contains 
over 187 million docs.  The dev server has more than 188 million docs.  
I think the reason that the counts are different is because we very 
recently deleted a bunch of data from the database, but skipped the 
update of the Solr index for the deletion.  The production indexes have 
been rebuilt since the delete, but the dev index hasn't.


The network between the client running the test and the Solr servers 
includes a layer 3 switch, some layer 2 switches, and a firewall.  All 
network hardware is made by Cisco.  The entire path (including the 
firewall) is gigabit.


Thanks,
Shawn



Re: Some performance questions....

2018-03-25 Thread Deepak Goel
On Sun, Mar 25, 2018 at 2:24 PM, Shawn Heisey  wrote:

> On 3/25/2018 1:45 AM, Shawn Heisey wrote:
>
>> I have written a little test program that can pound the system harder,
>> need a little more time to gather what I learned with it.
>>
>
> Here's the code and three results with different threadcounts:
>
> https://gist.github.com/elyograg/abedf4ae28467059e46781f7d474f379
>
> I ran the program several times while writing it.  Once I had it finished,
> I did the 20 thread run first, then the 100 thread run, and then the 200
> thread run.  Gist re-ordered my files, wasn't expecting that.
>
>
$ Why is the 'qps' not increasing with increase in threads? (If I
understand the qps parameter right?)

$ Is it possible to run with 10 & 5 & 2 threads?

$ What were the server utilisation (CPU, Memory) when you ran the test?

$ The 'query median' increases from 35 to 470 as you increase threads from
20 to 200 (You had mentioned earlier that QTime for Banjo query was 11 when
you had hit it the second time around)

$ Can you please give Linux server configuration if possible?


> It was executed inside eclipse on a Windows 7 system.  The Solr servers
> are running Linux.  This is a distributed index with 7 total shards running
> on two servers.  The "shards" parameter is defined on the server side in
> the 'ncmain' core, which has an empty index.  The servers are NOT running
> in SolrCloud mode.
>
> As you can see in the code, I was using exactly the same query every time
> -- that "banjo" query that I tried earlier.
>
> I have to try and remember how to build a simple program like this on the
> commandline before I can try it in Linux.  I don't know if it would see a
> performance improvement running on Linux.
>
> Thanks,
> Shawn
>
>


Re: Some performance questions....

2018-03-25 Thread Shawn Heisey

On 3/25/2018 1:45 AM, Shawn Heisey wrote:
I have written a little test program that can pound the system harder, 
need a little more time to gather what I learned with it. 


Here's the code and three results with different threadcounts:

https://gist.github.com/elyograg/abedf4ae28467059e46781f7d474f379

I ran the program several times while writing it.  Once I had it 
finished, I did the 20 thread run first, then the 100 thread run, and 
then the 200 thread run.  Gist re-ordered my files, wasn't expecting that.


It was executed inside eclipse on a Windows 7 system.  The Solr servers 
are running Linux.  This is a distributed index with 7 total shards 
running on two servers.  The "shards" parameter is defined on the server 
side in the 'ncmain' core, which has an empty index.  The servers are 
NOT running in SolrCloud mode.


As you can see in the code, I was using exactly the same query every 
time -- that "banjo" query that I tried earlier.


I have to try and remember how to build a simple program like this on 
the commandline before I can try it in Linux.  I don't know if it would 
see a performance improvement running on Linux.


Thanks,
Shawn



Re: Some performance questions....

2018-03-25 Thread Shawn Heisey

On 3/24/2018 10:42 PM, Deepak Goel wrote:

I believe you ran this query with a 1 user load. Or was it a multi-user
load test? If it was multi-user load test, how many users did you test for?
And what were the utilisations and tps?


It was late Saturday night when I did that.  There's almost no load on 
the system.


I literally did just the four queries I mentioned, using the admin UI.

I have written a little test program that can pound the system harder, 
need a little more time to gather what I learned with it.


Thanks,
Shawn



Re: Some performance questions....

2018-03-24 Thread Deepak Goel
On 25 Mar 2018 6:49 am, "Shawn Heisey"  wrote:

On 3/24/2018 6:21 PM, Deepak Goel wrote:

> Do you have any documented proof of the same (1 to 5ms)? Or is it an
> educated guess
>

Just now, I did a test.  I did a "*:*" query (all docs), the QTime was 194
milliseconds, numFound was 188635489.  Then I did the exact same query
again.  QTime dropped to 39 milliseconds.

Next, I did a query for "banjo" ... something I don't think a lot of people
are searching for.  The QTime on this was 2395 milliseconds, numFound was
737280.  Running the same query again, QTime dropped to 11 milliseconds.


I believe you ran this query with a 1 user load. Or was it a multi-user
load test? If it was multi-user load test, how many users did you test for?
And what were the utilisations and tps?


My index is big and distributed.  Your index is very small, and likely
contained in one core, so it should have far better performance than my
index.


I don't think Tomcat and JMeter are a bottleneck. But I will bump up the
> heap size of them too
>

I was actually thinking that if these are run *without* a max heap setting,
that you might want to explicitly set the heap size so that it's not too
big.  Those programs probably don't need a very big heap at all.  If Java
were to choose a big default heap size, the server might start swapping,
and that would REALLY make performance bad, especially on Windows.


The problem I am facing: On Windows, the tps is 28 while on Linux, the tps
> is 564 (All the configuration and hardware is same). The other problem is,
> Even if there is plenty of hardware available, the Windows environment does
> not scale. And I wonder why is this so?
>

My first guess would be the 512MB heap, possibly causing even more problems
on Windows.

And then there's my general bias against Microsoft.  I have witnessed
deficiencies in their memory management, their filesystem performance, and
other things.  Linux just does a better job in almost every category that I
care about for a server.

Which version of Windows are you running it on?  You would only want to do
a test like this on a Server OS, and I'd hope that it's at least Server
2008.  The client operating systems do not handle server programs very
well.  And it should be a 64-bit OS, with 64-bit Java.

Thanks,
Shawn


Re: Some performance questions....

2018-03-24 Thread Shawn Heisey

On 3/24/2018 6:21 PM, Deepak Goel wrote:

Do you have any documented proof of the same (1 to 5ms)? Or is it an
educated guess


Just now, I did a test.  I did a "*:*" query (all docs), the QTime was 
194 milliseconds, numFound was 188635489.  Then I did the exact same 
query again.  QTime dropped to 39 milliseconds.


Next, I did a query for "banjo" ... something I don't think a lot of 
people are searching for.  The QTime on this was 2395 milliseconds, 
numFound was 737280.  Running the same query again, QTime dropped to 11 
milliseconds.


My index is big and distributed.  Your index is very small, and likely 
contained in one core, so it should have far better performance than my 
index.



I don't think Tomcat and JMeter are a bottleneck. But I will bump up the
heap size of them too


I was actually thinking that if these are run *without* a max heap 
setting, that you might want to explicitly set the heap size so that 
it's not too big.  Those programs probably don't need a very big heap at 
all.  If Java were to choose a big default heap size, the server might 
start swapping, and that would REALLY make performance bad, especially 
on Windows.



The problem I am facing: On Windows, the tps is 28 while on Linux, the tps
is 564 (All the configuration and hardware is same). The other problem is,
Even if there is plenty of hardware available, the Windows environment does
not scale. And I wonder why is this so?


My first guess would be the 512MB heap, possibly causing even more 
problems on Windows.


And then there's my general bias against Microsoft.  I have witnessed 
deficiencies in their memory management, their filesystem performance, 
and other things.  Linux just does a better job in almost every category 
that I care about for a server.


Which version of Windows are you running it on?  You would only want to 
do a test like this on a Server OS, and I'd hope that it's at least 
Server 2008.  The client operating systems do not handle server programs 
very well.  And it should be a 64-bit OS, with 64-bit Java.


Thanks,
Shawn



Re: Some performance questions....

2018-03-24 Thread Deepak Goel
On Sun, Mar 25, 2018 at 4:00 AM, Shawn Heisey  wrote:

> On 3/24/2018 1:25 PM, Deepak Goel wrote:
>
>> Please check the section *Questions from ‘Around the World’* in the
>> following doc for answers to your questions:
>>
>> https://docs.google.com/document/d/1ZwyveG-Zjy7tbsvh9xjMug4bnIMRqKnNax3jh4GJlzM/edit?usp=sharing
>>
>
> The document says that 80 percent of the time it's the same query and 20
> percent it's a different one.  But the code does not have any facility for
> changing the query, as far as I can see.  It appears to be always the same.
>
>
My first test was to test with static queries. Does Solr scale-up as we
increase the load of same query?

The second test would be to check with 'Different Queries'.

And then finally check with 80% similar queries and 20% different queries.


> If the query is always the same, or if it's the same 80 percent of the
> time, I would expect response time on the vast majority of the queries to
> be about one to five milliseconds


Do you have any documented proof of the same (1 to 5ms)? Or is it an
educated guess


> , no matter how big the index is, but your document says it's 280 on
> Linux, and 1426 on Windows.
>
>
At peak loads on Linux, the response-time is 172ms. If I decrease the load
by half, the response time is around 50ms


> If all settings such as heap are at their defaults, then I suspect you may
> be running Solr with a heap size that's FAR too small.  If this is what's
> happening, then the JVM is going to be spending a very large amount of time
> performing garbage collection, instead of running the application.
>
>
I don't think the Jvm heap is a problem. But I will bump it up and test
again


> The default heap size when starting Solr using the included scripts is 512
> megabytes.  This is VERY small, to ensure that Solr will successfully start
> on any system.  Nearly all users must increase the heap size before they go
> to production.  I would set it to 2GB for your index.  If starting Solr
> with the bin\solr or bin/solr command, add a "-m 2g" parameter to the start
> command. 2GB should be a lot more than Solr needs to handle that index, but
> it isn't a HUGE amount.  Be aware that you may need to adjust the heap size
> for your Tomcat installation, and possibly JMeter as well, to be sure that
> those processes are allocating reasonable amounts of memory.


I don't think Tomcat and JMeter are a bottleneck. But I will bump up the
heap size of them too


> I do not know what the recommended sizes for those programs will be, you
> would need to ask those communities.
>
>
The problem I am facing: On Windows, the tps is 28 while on Linux, the tps
is 564 (All the configuration and hardware is same). The other problem is,
Even if there is plenty of hardware available, the Windows environment does
not scale. And I wonder why is this so?


> Thanks,
> Shawn
>
>


Re: Some performance questions....

2018-03-24 Thread Shawn Heisey

On 3/24/2018 1:25 PM, Deepak Goel wrote:

Please check the section *Questions from ‘Around the World’* in the
following doc for answers to your questions:

https://docs.google.com/document/d/1ZwyveG-Zjy7tbsvh9xjMug4bnIMRqKnNax3jh4GJlzM/edit?usp=sharing


The document says that 80 percent of the time it's the same query and 20 
percent it's a different one.  But the code does not have any facility 
for changing the query, as far as I can see.  It appears to be always 
the same.
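
For reference, a facility for varying the query could look something like the
following. This is only a minimal sketch of the mechanics: the function name
and the list of alternative search terms are hypothetical, and nothing here
implies that an 80/20 mix resembles a realistic query load.

```python
import random


def make_query_picker(hot_query, other_queries, hot_fraction=0.8, seed=None):
    """Return a function that yields hot_query about hot_fraction of the time
    and a random entry from other_queries otherwise."""
    rng = random.Random(seed)

    def pick():
        if rng.random() < hot_fraction:
            return hot_query
        return rng.choice(other_queries)

    return pick


# Hypothetical terms; a real test would draw from a large set of logged queries.
pick = make_query_picker("banjo", ["guitar", "piano", "violin", "drums"], seed=42)
sample = [pick() for _ in range(10_000)]
print(sample.count("banjo") / len(sample))
```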


If the query is always the same, or if it's the same 80 percent of the 
time, I would expect response time on the vast majority of the queries 
to be about one to five milliseconds, no matter how big the index is, 
but your document says it's 280 on Linux, and 1426 on Windows.


If all settings such as heap are at their defaults, then I suspect you 
may be running Solr with a heap size that's FAR too small.  If this is 
what's happening, then the JVM is going to be spending a very large 
amount of time performing garbage collection, instead of running the 
application.


The default heap size when starting Solr using the included scripts is 
512 megabytes.  This is VERY small, to ensure that Solr will 
successfully start on any system.  Nearly all users must increase the 
heap size before they go to production.  I would set it to 2GB for your 
index.  If starting Solr with the bin\solr or bin/solr command, add a 
"-m 2g" parameter to the start command. 2GB should be a lot more than 
Solr needs to handle that index, but it isn't a HUGE amount.  Be aware 
that you may need to adjust the heap size for your Tomcat installation, 
and possibly JMeter as well, to be sure that those processes are 
allocating reasonable amounts of memory.  I do not know what the 
recommended sizes for those programs will be, you would need to ask 
those communities.


Thanks,
Shawn



Re: Some performance questions....

2018-03-24 Thread Deepak Goel
On Sat, Mar 24, 2018 at 6:03 AM, Rick Leir  wrote:

>
>
> Deep,
> What is the test, so I can try it?
>
>
*The test goal now according to me is to check:*

'How does Solr scale up on a single server (with varying OS if possible -
Linux, Windows) at 25%, 50%, 75%, 100% utilisation?'

*The original question from the Author was:*

Let's say I have a dual CPU with a total of 8 cores and 24 GB RAM for my
Solr and some other stuff.

Would it be more beneficial to only run 1 instance of Solr with the
collection stored on 4 HD's in RAID 0? Or have several Virtual
Machines each running off its own HD, i.e. have 4 VM's running Solr?


> 75 or 90 ms .. is that the JVM startup time?
>

This time is the time taken by my code to create a 'Client Object' in Solr
on Windows environment


> Cheers -- Rick
> >>
> >>
> >I have stated the numbers which I found during my test. The best way to
> >verify them is for someone else to run the same test. Otherwise I don't
> >see
> >how we can verify the results
>
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>


Re: Some performance questions....

2018-03-24 Thread Deepak Goel
On Sat, Mar 24, 2018 at 5:16 AM, Shawn Heisey  wrote:

> On 3/23/2018 11:31 AM, Deepak Goel wrote:
> > Do you have any specific questions about the benchmark setup?
>
> How many docs are in the Solr index?  How much disk space does it
> consume?  How much total memory is in the machine?  How much memory is
> allocated to Java heaps?  Is there any other software running besides
> the Solr server and the benchmark program?  If it's a virtual machine,
> do you know anything about how many virtual machines are on the physical
> hardware, and whether resources are oversubscribed on the physical
> hardware?
>
> > I have stated the numbers which I found during my test. The best way to
> > verify them is for someone else to run the same test. Otherwise I don't
> see
> > how we can verify the results
>
> You have provided a code fragment, not complete code that can be used to
> compile exactly what you're running.  There is no information about
> exactly what you're doing with JMeter.  There are no version numbers for
> any of the software that you're using.  When I look at what's available,
> I don't have enough information to replicate your test.
>
> Your code fragment has a hard-coded query in it.  Running the same query
> over and over won't provide meaningful results, and definitely shouldn't
> show an average query time of nearly 1.5 seconds.
>
>
Please check the section *Questions from ‘Around the World’* in the
following doc for answers to your questions:

https://docs.google.com/document/d/1ZwyveG-Zjy7tbsvh9xjMug4bnIMRqKnNax3jh4GJlzM/edit?usp=sharing


> Thanks,
> Shawn
>
>


Re: Some performance questions....

2018-03-24 Thread Deepak Goel
On Sat, Mar 24, 2018 at 6:18 AM, Shawn Heisey  wrote:

> On 3/23/2018 1:13 PM, Deepak Goel wrote:
> > Yes I am now creating a client object only once. On Linux it has superb
> > results (performance improves by around two times). However on Windows it
> > has no improvement
> >
> > *SoftwareThroughput (/sec)Response Time (msec)Utilization (%CPU)UnTuned
> > (Windows)27.8142665UnTuned (Linux)34528091Partially Tuned with Shawn's
> > suggestions (Linux)56417290Partially Tuned with Shawn's suggestions
> > (Windows)28.11.10560*
>
> This information is unreadable.  All the whitespace between the columns
> is missing.
>
Please check this document:
https://docs.google.com/document/d/1ZwyveG-Zjy7tbsvh9xjMug4bnIMRqKnNax3jh4GJlzM/edit?usp=sharing


> Thanks,
> Shawn
>
>


Re: Some performance questions....

2018-03-23 Thread Shawn Heisey
On 3/23/2018 1:13 PM, Deepak Goel wrote:
> Yes I am now creating a client object only once. On Linux it has superb
> results (performance improves by around two times). However on Windows it
> has no improvement
>
> *SoftwareThroughput (/sec)Response Time (msec)Utilization (%CPU)UnTuned
> (Windows)27.8142665UnTuned (Linux)34528091Partially Tuned with Shawn's
> suggestions (Linux)56417290Partially Tuned with Shawn's suggestions
> (Windows)28.11.10560*

This information is unreadable.  All the whitespace between the columns
is missing.

Thanks,
Shawn



Re: Some performance questions....

2018-03-23 Thread Rick Leir


Deep,
What is the test, so I can try it?

75 or 90 ms .. is that the JVM startup time?
Cheers -- Rick
>>
>>
>I have stated the numbers which I found during my test. The best way to
>verify them is for someone else to run the same test. Otherwise I don't
>see
>how we can verify the results


-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com


Re: Some performance questions....

2018-03-23 Thread Shawn Heisey
On 3/23/2018 11:31 AM, Deepak Goel wrote:
> Do you have any specific questions about the benchmark setup?

How many docs are in the Solr index?  How much disk space does it
consume?  How much total memory is in the machine?  How much memory is
allocated to Java heaps?  Is there any other software running besides
the Solr server and the benchmark program?  If it's a virtual machine,
do you know anything about how many virtual machines are on the physical
hardware, and whether resources are oversubscribed on the physical hardware?

> I have stated the numbers which I found during my test. The best way to
> verify them is for someone else to run the same test. Otherwise I don't see
> how we can verify the results

You have provided a code fragment, not complete code that can be used to
compile exactly what you're running.  There is no information about
exactly what you're doing with JMeter.  There are no version numbers for
any of the software that you're using.  When I look at what's available,
I don't have enough information to replicate your test.

Your code fragment has a hard-coded query in it.  Running the same query
over and over won't provide meaningful results, and definitely shouldn't
show an average query time of nearly 1.5 seconds.

Thanks,
Shawn



Re: Some performance questions....

2018-03-23 Thread Deepak Goel
On Fri, Mar 23, 2018 at 11:38 PM, Shawn Heisey  wrote:

> On 3/23/2018 11:21 AM, Deepak Goel wrote:
> >> I tried the above suggestion. The throughput and utilisation remain the
> >> same (they don't increase even if I increase the load). The response time
> >> comes down.
> >>
>
> Are you still creating a new client object for every query?  Changing
> how the client object is created won't improve anything if you're still
> making a new one every time.
>
> You're going to need to move the client creation somewhere else in your
> code that only gets run once at startup, and then use the already-built
> client object in the code that does the query.  The different way of
> creating the client object that I gave you will ensure that it is
> actually capable of running concurrently with many threads. (With some
> older versions, this is not guaranteed)
>
>
Yes, I am now creating the client object only once. On Linux this gives superb
results (performance improves by around two times). On Windows, however, there
is no improvement.


Software                                            Throughput (/sec)  Response Time (msec)  Utilization (%CPU)
UnTuned (Windows)                                   27.8               1426                  65
UnTuned (Linux)                                     345                280                   91
Partially Tuned with Shawn's suggestions (Linux)    564                172                   90
Partially Tuned with Shawn's suggestions (Windows)  28.1               1.105                 60





> Thanks,
> Shawn


Re: Some performance questions....

2018-03-23 Thread Shawn Heisey
On 3/23/2018 11:21 AM, Deepak Goel wrote:
>> I tried the above suggestion. The throughput and utilisation remain the
>> same (they dont increase even if I increase the load). The response time
>> comes down.
>>

Are you still creating a new client object for every query?  Changing
how the client object is created won't improve anything if you're still
making a new one every time.

You're going to need to move the client creation somewhere else in your
code that only gets run once at startup, and then use the already-built
client object in the code that does the query.  The different way of
creating the client object that I gave you will ensure that it is
actually capable of running concurrently with many threads. (With some
older versions, this is not guaranteed)
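As a rough illustration of that structure, here is a minimal sketch that uses
only the JDK. FakeSearchClient is a hypothetical stand-in for a thread-safe
client such as HttpSolrClient (it is not a SolrJ class), and the thread and
query counts are arbitrary. The only point is that the client is built once,
before the worker threads start, and is then shared by all of them:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a thread-safe search client (not a SolrJ class).
class FakeSearchClient {
    static final AtomicInteger CREATED = new AtomicInteger();
    private final AtomicLong served = new AtomicLong();

    FakeSearchClient() {
        CREATED.incrementAndGet(); // count how many clients get built
    }

    long query(String q) {
        return served.incrementAndGet(); // pretend to run a search
    }

    long served() {
        return served.get();
    }
}

class SharedClientBenchmark {
    // Builds ONE client up front, then lets every worker thread reuse it.
    static long run(int threads, int queriesPerThread) {
        FakeSearchClient client = new FakeSearchClient(); // startup step, done once
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < queriesPerThread; i++) {
                    client.query("q" + i); // no client creation inside the loop
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return client.served();
    }

    public static void main(String[] args) {
        long total = run(8, 1000);
        System.out.println("queries=" + total
            + " clientsCreated=" + FakeSearchClient.CREATED.get());
    }
}
```

With a real client the same shape applies: the constructor call moves out of
the per-query code path, and the worker threads only ever see the finished
object.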

Thanks,
Shawn





Re: Some performance questions....

2018-03-23 Thread Deepak Goel
On Tue, Mar 20, 2018 at 3:32 AM, Shawn Heisey  wrote:

> On 3/16/2018 4:24 PM, Deepak Goel wrote:
> > It is taking less than 100ms to create a HttpSolrClient Object
>
> "Less than 100ms" is vague.  Let's say by that you mean it takes at
> least 50 milliseconds.  This is a lot slower than I expected it to be,
> but if you've measured it, I'll accept that.
>
>
The results were a bit volatile from test to test. It used to take
sometimes 75ms and sometimes around 95ms. So I have stated the upper-bound
on the results (100ms)

(Sorry for being rude.) However, you don't need to accept my results. May I
suggest that you measure it yourself (or anyone else can do it)?


> If every single thread you're running has to spend 50 milliseconds or
> more creating a client before it can actually send a request, then the
> application is going to be spending a lot of time NOT sending requests,
> but creating and destroying clients.  (You didn't indicate how long the
> close() takes)
>

I did implement your solution. (On Windows it does not make a difference; on
Linux it improves performance by at least a factor of two.)


>
> Your numbers indicated a response time of 1426 milliseconds for Solr.
> If this is an average or a median, then that is not a fast query.  These
> numbers make me question the entire benchmark setup.


Do you have any specific questions about the benchmark setup?


> Based on the code
> provided, I don't see how the numbers can be that bad, even if we assume
> that up to 100 milliseconds is spent creating every client.
>
>
I have stated the numbers which I found during my test. The best way to
verify them is for someone else to run the same test. Otherwise I don't see
how we can verify the results


> Because the ES numbers are so much worse than the Solr numbers, I'm
> betting that creating an ES client is even less efficient than creating
> a Solr client.  If that's the case, I do not know why ... maybe that
> client runs through more startup checks than a Solr client does.
> Creation time for the client shouldn't matter, since it should only be
> done once for every benchmark run, and the time spent creating the
> client shouldn't be counted in the benchmark numbers.
>
>
I can check and optimise the ES code. However, that will take me a couple of
weeks.


> Thanks,
> Shawn
>
>


Re: Some performance questions....

2018-03-23 Thread Deepak Goel
On Thu, Mar 22, 2018 at 1:25 AM, Deepak Goel  wrote:

> On Sat, Mar 17, 2018 at 2:56 AM, Shawn Heisey  wrote:
>
>> On 3/16/2018 2:21 PM, Deepak Goel wrote:
>> > I wanted to test how many max connections can Solr handle concurrently.
>> > Also I would have to implement an 'connection pooling' of the
>> client-object
>> > connections rather than a single connection thread
>> >
>> > However a single client object with thousands of queries coming in would
>> > surely become a bottleneck. I can test this scenario too.
>>
>> Handling thousands of simultaneous queries is NOT something you can
>> expect a single Solr server to do.  It's not going to happen.  It
>> wouldn't happen with ES, either.  Handling that much load requires load
>> balancing to a LOT of servers.  The server would be much more of a
>> bottleneck than the client.
>>
>> > The problem is the max throughput which I can get on the machine is
>> around
>> > 28 tps, even though I increase the load further & only 65% CPU is
>> utilised
>> > (there is still 35% which is not being used). This clearly indicates the
>> > software is a problem as there is enough hardware resources.
>>
>> If your code is creating a client object before every single query, that
>> could be part of the issue.  The benchmark code should be using the same
>> client for all requests.  I really don't know how long it takes to
>> create HttpSolrClient objects, but I don't imagine that it's super-fast.
>>
>> What version of SolrJ were you using?
>>
>> Depending on the SolrJ version you may need to create the client with a
>> custom HttpClient object in order to allow it to handle plenty of
>> threads.  This is how I create client objects in my SolrJ code:
>>
>>   RequestConfig rc = RequestConfig.custom()
>>       .setConnectTimeout(2000).setSocketTimeout(6).build();
>>   CloseableHttpClient httpClient = HttpClients.custom()
>>       .setDefaultRequestConfig(rc)
>>       .setMaxConnPerRoute(1024).setMaxConnTotal(4096)
>>       .disableAutomaticRetries().build();
>>
>>   SolrClient sc = new HttpSolrClient.Builder()
>>       .withBaseSolrUrl(solrUrl).withHttpClient(httpClient).build();
>>
> I tried the above suggestion. The throughput and utilisation remain the
> same (they don't increase even if I increase the load). The response time
> comes down.
>
> Software                   Throughput (/sec)  Response Time (msec)  Utilization (%CPU)
> UnTuned (Windows)          27.8               1426                  65
> UnTuned (Linux)
> Partially Tuned (Linux)
> Partially Tuned (Windows)  28.1               1.105                 60
>
> I am going to give your suggestion a spin on Linux next (this might take a
> day or two).
>

This is what the Linux results look like:


Software                   Throughput (/sec)  Response Time (msec)  Utilization (%CPU)
UnTuned (Windows)          27.8               1426                  65
UnTuned (Linux)            345                280                   91
Partially Tuned (Linux)    564                172                   90
Partially Tuned (Windows)  28.1               1.105                 60



> Thanks,
>> Shawn
>>
>>


Re: Some performance questions....

2018-03-21 Thread Deepak Goel
On Mon, Mar 19, 2018 at 2:40 AM, Walter Underwood 
wrote:

> > On Mar 17, 2018, at 3:23 AM, Deepak Goel  wrote:
> >
> > Sorry for being rude. But the ' results ' please, not the ' road to the
> > results '
>
> We have 15 different search collections, all different sizes and all with
> different kinds of queries. Here are the two major ones.
>
> 22 million docs
> 32 server Solr Cloud cluster, EC2 c4.8xlarge instances (36 CPU, 59 GB RAM)
> Solr 6.6.2
> 4 shards
> 24,000 requests/minute
> 95th percentile query response time 5 to 7 seconds
>
> 250,000 docs
> 4 server Solr master/slave cluster, EC2 c4.4xlarge (16 CPU, 30 GB RAM)
> Solr 4.10.4
> 60,000 requests/minute
> 95th percentile 100 ms
>
This does not help at all. If you look at the author's question, I think
it is about a single server. You will have to post your results (25% CPU,
50% CPU, 75% CPU, 100% CPU) for a single server (how does the server scale
as the load increases?).


> That should make everything crystal clear.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>



Re: Some performance questions....

2018-03-21 Thread Deepak Goel
On Sat, Mar 17, 2018 at 2:56 AM, Shawn Heisey  wrote:

> On 3/16/2018 2:21 PM, Deepak Goel wrote:
> > I wanted to test how many max connections can Solr handle concurrently.
> > Also I would have to implement an 'connection pooling' of the
> client-object
> > connections rather than a single connection thread
> >
> > However a single client object with thousands of queries coming in would
> > surely become a bottleneck. I can test this scenario too.
>
> Handling thousands of simultaneous queries is NOT something you can
> expect a single Solr server to do.  It's not going to happen.  It
> wouldn't happen with ES, either.  Handling that much load requires load
> balancing to a LOT of servers.  The server would be much more of a
> bottleneck than the client.
>
> > The problem is the max throughput which I can get on the machine is
> around
> > 28 tps, even though I increase the load further & only 65% CPU is
> utilised
> > (there is still 35% which is not being used). This clearly indicates the
> > software is a problem as there is enough hardware resources.
>
> If your code is creating a client object before every single query, that
> could be part of the issue.  The benchmark code should be using the same
> client for all requests.  I really don't know how long it takes to
> create HttpSolrClient objects, but I don't imagine that it's super-fast.
>
> What version of SolrJ were you using?
>
> Depending on the SolrJ version you may need to create the client with a
> custom HttpClient object in order to allow it to handle plenty of
> threads.  This is how I create client objects in my SolrJ code:
>
>   RequestConfig rc = RequestConfig.custom()
>       .setConnectTimeout(2000).setSocketTimeout(6).build();
>   CloseableHttpClient httpClient = HttpClients.custom()
>       .setDefaultRequestConfig(rc)
>       .setMaxConnPerRoute(1024).setMaxConnTotal(4096)
>       .disableAutomaticRetries().build();
>
>   SolrClient sc = new HttpSolrClient.Builder()
>       .withBaseSolrUrl(solrUrl).withHttpClient(httpClient).build();
>
I tried the above suggestion. The throughput and utilisation remain the
same (they don't increase even if I increase the load). The response time
comes down.

Software                   Throughput (/sec)  Response Time (msec)  Utilization (%CPU)
UnTuned (Windows)          27.8               1426                  65
UnTuned (Linux)
Partially Tuned (Linux)
Partially Tuned (Windows)  28.1               1.105                 60

I am going to give your suggestion a spin on Linux next (this might take a
day or two).



> Thanks,
> Shawn
>
>



Re: Some performance questions....

2018-03-19 Thread Shawn Heisey
On 3/16/2018 4:24 PM, Deepak Goel wrote:
> It is taking less than 100ms to create a HttpSolrClient Object

"Less than 100ms" is vague.  Let's say by that you mean it takes at
least 50 milliseconds.  This is a lot slower than I expected it to be,
but if you've measured it, I'll accept that.

If every single thread you're running has to spend 50 milliseconds or
more creating a client before it can actually send a request, then the
application is going to be spending a lot of time NOT sending requests,
but creating and destroying clients.  (You didn't indicate how long the
close() takes)
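To put rough numbers on that: a thread that spends c milliseconds on client
setup before a query that takes q milliseconds can complete at most
1000 / (c + q) requests per second. The sketch below just evaluates that
bound. The 50 ms setup cost is the floor assumed above; the 20 ms query time
is purely illustrative and not a measurement from this thread.

```java
class ClientCreationOverhead {
    // Upper bound on requests/second for one thread that does setup and
    // query back to back, with no overlap between the two.
    static double maxRequestsPerSecond(double setupMs, double queryMs) {
        return 1000.0 / (setupMs + queryMs);
    }

    public static void main(String[] args) {
        double queryMs = 20.0; // illustrative query time, not a measured value
        System.out.printf("reuse client:       %.1f req/s per thread%n",
            maxRequestsPerSecond(0.0, queryMs));
        System.out.printf("new client per req: %.1f req/s per thread%n",
            maxRequestsPerSecond(50.0, queryMs));
    }
}
```

Under those assumptions the per-thread ceiling drops from 50 requests per
second to roughly 14, before the server has done anything slow at all.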

Your numbers indicated a response time of 1426 milliseconds for Solr. 
If this is an average or a median, then that is not a fast query.  These
numbers make me question the entire benchmark setup.  Based on the code
provided, I don't see how the numbers can be that bad, even if we assume
that up to 100 milliseconds is spent creating every client.

Because the ES numbers are so much worse than the Solr numbers, I'm
betting that creating an ES client is even less efficient than creating
a Solr client.  If that's the case, I do not know why ... maybe that
client runs through more startup checks than a Solr client does. 
Creation time for the client shouldn't matter, since it should only be
done once for every benchmark run, and the time spent creating the
client shouldn't be counted in the benchmark numbers.

Thanks,
Shawn



Re: Some performance questions....

2018-03-18 Thread Walter Underwood
> On Mar 17, 2018, at 3:23 AM, Deepak Goel  wrote:
> 
> Sorry for being rude. But the ' results ' please, not the ' road to the
> results '

We have 15 different search collections, all different sizes and all with 
different kinds of queries. Here are the two major ones.

22 million docs
32 server Solr Cloud cluster, EC2 c4.8xlarge instances (36 CPU, 59 GB RAM)
Solr 6.6.2
4 shards
24,000 requests/minute
95th percentile query response time 5 to 7 seconds

250,000 docs
4 server Solr master/slave cluster, EC2 c4.4xlarge (16 CPU, 30 GB RAM)
Solr 4.10.4
60,000 requests/minute
95th percentile 100 ms

That should make everything crystal clear.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Some performance questions....

2018-03-17 Thread BlackIce
Looks like I've opened up a very interesting can of worms.

Thank you to all who are posting to this thread; I'm learning a lot.
The way I see it now, a single Solr instance on this machine seems like
the most intelligent choice.
Then, as an upgrade path, I can add inexpensive machines. That gives me more
storage space and CPU power, and starts to build up a cluster for
parallelization and load balancing.

Greetz

On Sat, Mar 17, 2018 at 11:23 AM, Deepak Goel  wrote:

> On 17 Mar 2018 05:19, "Walter Underwood"  wrote:
>
> > On Mar 16, 2018, at 3:26 PM, Deepak Goel  wrote:
> >
> > Can you please post results of your test?
> >
> > Please tell us the tps at 25%, 50%, 75%, 100% of your CPU resource
>
>
> I could, but it probably would not be useful for your documents or your
> queries.
>
> We have 22 million homework problems. Our queries are often hundreds of
> words long, because students copy and paste entire problems. After
> pre-processing, the average query is still 25 words.
>
> For load benchmarking, I use access logs from production. I typically
> gather over a half-million lines of log. Using production logs means that
> queries have the same statistical distribution as prod, so the cache hit
> rates are reasonable.
>
> Before each benchmark, I restart all the Solr instances to clear the
> caches. Then the first part of the query log is used to warm the caches,
> typically about 4000 queries.
>
> After that, the measured benchmark run starts. This uses JMeter with
> 100-500 threads. Each thread is configured with a constant throughput
> timer so a constant load is offered. Tests run for one or two hours.
> Recently, I ran a test with a rate of 1000 requests/minute for one hour.
>
> During the benchmark, I monitor the CPU usage. Our systems are configured
> with enough RAM so that disk is not accessed for search indexes. If the
> CPU goes over 75-80%, there is congestion and queries will slow down.
> Also, if the run queue (load average) increases over the number of CPUs,
> there will be congestion.
>
> After the benchmark run, the JMeter log is analyzed to report response
> time percentiles for each Solr request handler.
>
>
> Sorry for being rude. But the ' results ' please, not the ' road to the
> results '
>
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>


Re: Some performance questions....

2018-03-17 Thread Deepak Goel
On 17 Mar 2018 05:19, "Walter Underwood"  wrote:

> > On Mar 16, 2018, at 3:26 PM, Deepak Goel  wrote:
> >
> > Can you please post results of your test?
> >
> > Please tell us the tps at 25%, 50%, 75%, 100% of your CPU resource
>
> I could, but it probably would not be useful for your documents or your
> queries.
>
> We have 22 million homework problems. Our queries are often hundreds of
> words long, because students copy and paste entire problems. After
> pre-processing, the average query is still 25 words.
>
> For load benchmarking, I use access logs from production. I typically
> gather over a half-million lines of log. Using production logs means that
> queries have the same statistical distribution as prod, so the cache hit
> rates are reasonable.
>
> Before each benchmark, I restart all the Solr instances to clear the
> caches. Then the first part of the query log is used to warm the caches,
> typically about 4000 queries.
>
> After that, the measured benchmark run starts. This uses JMeter with
> 100-500 threads. Each thread is configured with a constant throughput
> timer so a constant load is offered. Tests run for one or two hours.
> Recently, I ran a test with a rate of 1000 requests/minute for one hour.
>
> During the benchmark, I monitor the CPU usage. Our systems are configured
> with enough RAM so that disk is not accessed for search indexes. If the
> CPU goes over 75-80%, there is congestion and queries will slow down.
> Also, if the run queue (load average) increases over the number of CPUs,
> there will be congestion.
>
> After the benchmark run, the JMeter log is analyzed to report response
> time percentiles for each Solr request handler.

Sorry for being rude. But the ' results ' please, not the ' road to the
results '

> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)


Re: Some performance questions....

2018-03-16 Thread Walter Underwood
> On Mar 16, 2018, at 3:26 PM, Deepak Goel  wrote:
> 
> Can you please post results of your test?
> 
> Please tell us the tps at 25%, 50%, 75%, 100% of your CPU resource


I could, but it probably would not be useful for your documents or your queries.

We have 22 million homework problems. Our queries are often hundreds of words
long, because students copy and paste entire problems. After pre-processing,
the average query is still 25 words.

For load benchmarking, I use access logs from production. I typically gather
over a half-million lines of log. Using production logs means that queries
have the same statistical distribution as prod, so the cache hit rates are
reasonable.

Before each benchmark, I restart all the Solr instances to clear the caches.
Then the first part of the query log is used to warm the caches, typically
about 4000 queries.

After that, the measured benchmark run starts. This uses JMeter with 100-500
threads. Each thread is configured with a constant throughput timer so a
constant load is offered. Tests run for one or two hours. Recently, I ran a
test with a rate of 1000 requests/minute for one hour.
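For readers who want to see what "offering a constant load" means outside of
JMeter, here is a minimal sketch using only the JDK. It is not how JMeter
implements its timer; the class name, the scheduler pool size, and the
Runnable standing in for a search request are all assumptions made for the
example.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ConstantRateLoad {
    // Microseconds between request starts for a target rate in requests/minute.
    // The rate must be greater than zero.
    static long periodMicros(int requestsPerMinute) {
        return 60_000_000L / requestsPerMinute;
    }

    // Starts `request` at a fixed rate for roughly `durationMs` and returns
    // how many requests were started. A single scheduled task never overlaps
    // itself, so a request slower than the period delays the ones after it;
    // that is why real load tools spread the rate over many threads.
    static int offerLoad(int requestsPerMinute, long durationMs, Runnable request) {
        AtomicInteger started = new AtomicInteger();
        ScheduledExecutorService timer = Executors.newScheduledThreadPool(4);
        timer.scheduleAtFixedRate(() -> {
            started.incrementAndGet();
            request.run();
        }, 0, periodMicros(requestsPerMinute), TimeUnit.MICROSECONDS);
        try {
            Thread.sleep(durationMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        timer.shutdownNow();
        return started.get();
    }
}
```

At 1000 requests/minute the period works out to 60 ms between request starts,
regardless of how quickly the server answers. That is the difference from a
closed loop, where each thread fires again as soon as it gets a response.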

During the benchmark, I monitor the CPU usage. Our systems are configured with
enough RAM so that disk is not accessed for search indexes. If the CPU goes
over 75-80%, there is congestion and queries will slow down. Also, if the run
queue (load average) increases over the number of CPUs, there will be
congestion.
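The two rules of thumb above can be written down as a small check. This is a
sketch, not a monitoring tool: the 75% threshold is simply the lower edge of
the range mentioned above, and the main method reads the load average of
whatever machine the JVM runs on, so it would have to run on the Solr host
(not the load generator) to say anything about Solr.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

class CongestionCheck {
    // A run queue longer than the CPU count suggests work is waiting for a core.
    static boolean runQueueCongested(double loadAverage, int cpus) {
        return loadAverage > cpus;
    }

    // 75% is the lower edge of the 75-80% range mentioned above.
    static boolean cpuCongested(double cpuUtilisationPercent) {
        return cpuUtilisationPercent > 75.0;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // negative when unavailable
        int cpus = os.getAvailableProcessors();
        if (load < 0) {
            System.out.println("load average not available on this platform");
        } else {
            System.out.println("load=" + load + " cpus=" + cpus
                + " congested=" + runQueueCongested(load, cpus));
        }
    }
}
```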

After the benchmark run, the JMeter log is analyzed to report response time
percentiles for each Solr request handler.
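A minimal sketch of the percentile step, assuming the latencies have already
been pulled out of the log into an array of milliseconds. It uses the
nearest-rank definition, which is one of several common ones; grouping by
request handler and parsing the JMeter log format are left out.

```java
import java.util.Arrays;

class LatencyPercentiles {
    // Nearest-rank percentile: the smallest sample such that at least p percent
    // of all samples are less than or equal to it. The input must be non-empty.
    static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] sample = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}; // made-up data
        System.out.println("median = " + percentile(sample, 50) + " ms");
        System.out.println("95th   = " + percentile(sample, 95) + " ms");
    }
}
```

Reporting the 95th percentile rather than the mean keeps a handful of very
slow queries visible instead of averaging them away.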

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Some performance questions....

2018-03-16 Thread Deepak Goel
On Sat, Mar 17, 2018 at 3:11 AM, Walter Underwood 
wrote:

> > On Mar 16, 2018, at 1:21 PM, Deepak Goel  wrote:
> >
> > However a single client object with thousands of queries coming in would
> > surely become a bottleneck. I can test this scenario too.
>
> No, it would not. The single client object is thread-safe and manages a
> pool of connections.
>
> Your benchmark is probably the bottleneck. I have no problem driving 36
> CPUs to beyond 65% utilization with a benchmark.
>
>
Can you please post results of your test?

Please tell us the tps at 25%, 50%, 75%, 100% of your CPU resource


> Using one client object is not a scenario. It is how SolrJ was designed to
> be used.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>


Re: Some performance questions....

2018-03-16 Thread Deepak Goel
On Sat, Mar 17, 2018 at 2:56 AM, Shawn Heisey  wrote:

> On 3/16/2018 2:21 PM, Deepak Goel wrote:
> > I wanted to test how many max connections can Solr handle concurrently.
> > Also I would have to implement an 'connection pooling' of the
> client-object
> > connections rather than a single connection thread
> >
> > However a single client object with thousands of queries coming in would
> > surely become a bottleneck. I can test this scenario too.
>
> Handling thousands of simultaneous queries is NOT something you can
> expect a single Solr server to do.  It's not going to happen.  It
> wouldn't happen with ES, either.  Handling that much load requires load
> balancing to a LOT of servers.  The server would be much more of a
> bottleneck than the client.
>

The problem is not the server in my case. The server has spare hardware
resources. It's the software that is the problem.


>
> > The problem is the max throughput which I can get on the machine is
> around
> > 28 tps, even though I increase the load further & only 65% CPU is
> utilised
> > (there is still 35% which is not being used). This clearly indicates the
> > software is a problem as there is enough hardware resources.
>
> If your code is creating a client object before every single query, that
> could be part of the issue.  The benchmark code should be using the same
> client for all requests.  I really don't know how long it takes to
> create HttpSolrClient objects, but I don't imagine that it's super-fast.
>
>
It is taking less than 100ms to create a HttpSolrClient Object


> What version of SolrJ were you using?
>

Solr 7.2.0


> Depending on the SolrJ version you may need to create the client with a
> custom HttpClient object in order to allow it to handle plenty of
> threads.  This is how I create client objects in my SolrJ code:
>
>   RequestConfig rc = RequestConfig.custom()
>       .setConnectTimeout(2000).setSocketTimeout(6).build();
>   CloseableHttpClient httpClient = HttpClients.custom()
>       .setDefaultRequestConfig(rc)
>       .setMaxConnPerRoute(1024).setMaxConnTotal(4096)
>       .disableAutomaticRetries().build();
>
>   SolrClient sc = new HttpSolrClient.Builder()
>       .withBaseSolrUrl(solrUrl).withHttpClient(httpClient).build();
>
>
I can give the above configuration a spin and test if the results improve


> Thanks,
> Shawn
>
>


Re: Some performance questions....

2018-03-16 Thread Walter Underwood
> On Mar 16, 2018, at 1:21 PM, Deepak Goel  wrote:
> 
> However a single client object with thousands of queries coming in would
> surely become a bottleneck. I can test this scenario too.

No, it would not. The single client object is thread-safe and manages a pool
of connections.

Your benchmark is probably the bottleneck. I have no problem driving 36 CPUs
to beyond 65% utilization with a benchmark.

Using one client object is not a scenario. It is how SolrJ was designed to be
used.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Some performance questions....

2018-03-16 Thread Shawn Heisey
On 3/16/2018 2:21 PM, Deepak Goel wrote:
> I wanted to test how many max connections can Solr handle concurrently.
> Also I would have to implement an 'connection pooling' of the client-object
> connections rather than a single connection thread
>
> However a single client object with thousands of queries coming in would
> surely become a bottleneck. I can test this scenario too.

Handling thousands of simultaneous queries is NOT something you can
expect a single Solr server to do.  It's not going to happen.  It
wouldn't happen with ES, either.  Handling that much load requires load
balancing to a LOT of servers.  The server would be much more of a
bottleneck than the client.

> The problem is the max throughput which I can get on the machine is around
> 28 tps, even though I increase the load further & only 65% CPU is utilised
> (there is still 35% which is not being used). This clearly indicates the
> software is a problem as there is enough hardware resources.

If your code is creating a client object before every single query, that
could be part of the issue.  The benchmark code should be using the same
client for all requests.  I really don't know how long it takes to
create HttpSolrClient objects, but I don't imagine that it's super-fast.

What version of SolrJ were you using?

Depending on the SolrJ version you may need to create the client with a
custom HttpClient object in order to allow it to handle plenty of
threads.  This is how I create client objects in my SolrJ code:

  // Timeouts are given in milliseconds.
  RequestConfig rc = RequestConfig.custom()
      .setConnectTimeout(2000).setSocketTimeout(6).build();
  // Raise the connection limits so many threads can share this one client.
  CloseableHttpClient httpClient = HttpClients.custom()
      .setDefaultRequestConfig(rc)
      .setMaxConnPerRoute(1024).setMaxConnTotal(4096)
      .disableAutomaticRetries().build();

  // Build the SolrClient once and reuse it for every request.
  SolrClient sc = new HttpSolrClient.Builder()
      .withBaseSolrUrl(solrUrl).withHttpClient(httpClient).build();

Thanks,
Shawn



RE: Some performance questions....

2018-03-16 Thread Davis, Daniel (NIH/NLM) [C]
Deepak,

A better test of multi-user support might be to vary the queries and try to 
simulate a realistic 'working set' of search data.

I've made this same performance analysis mistake with the search index of 
www.indexengines.com, which I developed (in part).   It is somewhat different 
from Lucene inside, though.

What we cared a lot about were these things:

- If a query was done warm, e.g. with results cached in memory, response time 
should be very fast.
- If a query was done cold, e.g. with results from disk, response time should 
still be acceptable.
- If a lot of different queries were done that we think simulate the real 
behavior of N users, the memory usage of the cache should be acceptable, e.g. 
the cache should get warm and there should be few cache misses.

This last test was key - if we have designed our caching properly, then the 
queries of X users will fit in Y memory, and we will be able to develop a 
simple understanding of that, with our target users.
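One way to get a feel for that "X users in Y memory" relationship is to
replay a query log through a simulated cache and watch the hit rate as the
capacity changes. The sketch below assumes plain LRU eviction and counts
entries rather than bytes; a real search engine's caches are more involved,
so treat this as a thinking aid only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CacheHitRate {
    // Replays queries through a fixed-size LRU cache and returns the fraction
    // of queries that were already in the cache when they arrived.
    static double hitRate(List<String> queries, int capacity) {
        // accessOrder=true makes LinkedHashMap keep least-recently-used first.
        Map<String, Boolean> lru = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity; // evict once we exceed the capacity
            }
        };
        int hits = 0;
        for (String q : queries) {
            if (lru.get(q) != null) {
                hits++;                   // warm: served from cache
            } else {
                lru.put(q, Boolean.TRUE); // cold: load and remember it
            }
        }
        return queries.isEmpty() ? 0.0 : (double) hits / queries.size();
    }
}
```

If the working set fits, the hit rate climbs toward 1 after the first pass
through the queries; if the capacity is one entry too small for a cyclic
pattern, it can collapse to 0.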

Generating a realistic amount of query behavior for X users is hard.   Using 
real search logs from your previous search product is a good idea.   For 
instance, if you look at the top 1000 queries performed by your users over a 
particular period of time, you can then say that some percentage of user 
queries was covered by the top 1000 queries, e.g. 90%.   Then, maybe you 
measure your queries per second (QPS) over that same period.
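The coverage figure is easy to compute from a log. A minimal sketch, assuming
the log has already been reduced to one normalized query string per request
(the class and method names are made up for the example):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopQueryCoverage {
    // Fraction of all logged requests accounted for by the n most frequent
    // distinct queries.
    static double coverage(List<String> queryLog, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String q : queryLog) {
            counts.merge(q, 1, Integer::sum);
        }
        List<Integer> freq = new ArrayList<>(counts.values());
        freq.sort(Collections.reverseOrder()); // most frequent first
        long covered = 0;
        for (int i = 0; i < Math.min(n, freq.size()); i++) {
            covered += freq.get(i);
        }
        return queryLog.isEmpty() ? 0.0 : (double) covered / queryLog.size();
    }
}
```

Running this for n = 100, 1000, 10000 against a real log shows quickly
whether a top-1000 sample is closer to the 90% case or the 70% case.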

Now, you can say that if you randomly sample those top 1000 queries while 
generating the same QPS with an exponential distribution generator, you 
have covered 90% of your real traffic.   Your queries are much more randomly 
distributed, but that's OK, because what you want to know is whether it all 
fits in cache memory, and the effect of # of CPUs, amount of memory, number of 
cluster nodes, sharding, and replication on the response time and such.
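A minimal sketch of such a generator, using only the JDK. It draws the gap
before each request from an exponential distribution with mean 1000/QPS
milliseconds and picks the query uniformly at random from the supplied list.
The class name is hypothetical, and uniform sampling is an assumption; you
could just as well weight queries by their logged frequency.

```java
import java.util.List;
import java.util.Random;

class ExponentialLoadSampler {
    private final List<String> topQueries;
    private final double qps;
    private final Random rng;

    ExponentialLoadSampler(List<String> topQueries, double qps, long seed) {
        this.topQueries = topQueries;
        this.qps = qps;
        this.rng = new Random(seed); // seeded so a run can be repeated
    }

    // Exponentially distributed gap in milliseconds before the next request.
    // nextDouble() is in [0, 1), so 1 - u is in (0, 1] and the log is finite.
    double nextGapMs() {
        return -Math.log(1.0 - rng.nextDouble()) / qps * 1000.0;
    }

    // Uniform random pick from the top-query list.
    String nextQuery() {
        return topQueries.get(rng.nextInt(topQueries.size()));
    }
}
```

A driver loop would sleep for nextGapMs() and then send nextQuery(), which
gives bursty but statistically steady traffic at the target QPS.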

Depending on your user community, top 1000 queries may not be enough to hit 
90%, it may only hit 70%.   Maybe you also need to look at the rate of 
"advanced search" and "search", or account for queries that drive business 
intelligence reports.   It really depends on your use case.   I wish I'd had 
the cloud available to test performance with - we were really naïve and did all 
this testing with our metal because, well, we thought our stuff relied on that.

I recommend you read the first couple of chapters of Raj Jain's The Art of 
Computer Systems Performance Analysis.   It’s a great book even if you totally 
skip the later chapters on queueing system analysis, and just think about what 
and how to test.

Hope this helps,

-Dan 






Re: Some performance questions....

2018-03-16 Thread Deepak Goel
On Sat, Mar 17, 2018 at 1:06 AM, Shawn Heisey  wrote:

> On 3/16/2018 7:38 AM, Deepak Goel wrote:
> > I did a performance study of Solr a while back. And I found that it does
> > not scale beyond a particular point on a single machine (could be due to
> > the way its coded). Hence multiple instances might make sense.
> >
> > https://docs.google.com/document/d/1kUqEcZl3NhOo6SLklo5Icg3fMnn9O
> tLY_lwnc6wbXus/edit?usp=sharing
>
> How did you *use* that code that you've shown?  That is not apparent (at
> least to me) from the document.
>
> If every usage of the SolrJ code went through ALL of the code you've
> shown, then it's not done well.  It appears that you're creating and
> closing a client object with every query.  This will be VERY inefficient.
>
> The client object should be created during an initialization step, and
> then passed to the benchmark step to be used there.  One client object
> can be used by many threads.


I wanted to test how many concurrent connections Solr can handle.
I would also have to implement 'connection pooling' of the client-object
connections rather than a single connection thread.

However, a single client object with thousands of queries coming in would
surely become a bottleneck. I can test this scenario too.

> Very likely the ES client works the same,
> but you'd need to ask them to be sure.
>
> That code seems to be doing an identical query on every run.  If that's
> what's happening, it's not a good indicator of performance.  Running the
> same query over and over will show better performance than you can
> expect from a real-world query load

What evidence do you see that Solr isn't scaling like you expect?
>
> The problem is the max throughput which I can get on the machine is around
28 tps, even though I increase the load further & only 65% CPU is utilised
(there is still 35% which is not being used). This clearly indicates the
software is a problem as there is enough hardware resources.

Also very soon I would have a Linux environment with me, so I can conduct
the test in the document on Linux too (for the users interested in Linux
and not Windows)


> Thanks,
> Shawn
>
>


Re: Some performance questions....

2018-03-16 Thread Shawn Heisey
On 3/16/2018 7:38 AM, Deepak Goel wrote:
> I did a performance study of Solr a while back. And I found that it does
> not scale beyond a particular point on a single machine (could be due to
> the way its coded). Hence multiple instances might make sense.
>
> https://docs.google.com/document/d/1kUqEcZl3NhOo6SLklo5Icg3fMnn9OtLY_lwnc6wbXus/edit?usp=sharing

How did you *use* that code that you've shown?  That is not apparent (at
least to me) from the document.

If every usage of the SolrJ code went through ALL of the code you've
shown, then it's not done well.  It appears that you're creating and
closing a client object with every query.  This will be VERY inefficient.

The client object should be created during an initialization step, and
then passed to the benchmark step to be used there.  One client object
can be used by many threads.  Very likely the ES client works the same,
but you'd need to ask them to be sure.

That code seems to be doing an identical query on every run.  If that's
what's happening, it's not a good indicator of performance.  Running the
same query over and over will show better performance than you can
expect from a real-world query load.
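
A minimal sketch of the benchmark structure described above, written in
Python rather than SolrJ. FakeSearchClient is a hypothetical stand-in and
not a Solr or SolrJ API; the term list and thread count are also
assumptions chosen for illustration. The point is only the shape: build
the client once, share it across worker threads, and send varied queries
instead of one repeated query.

```python
import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeSearchClient:
    """Hypothetical stand-in for a thread-safe search client (not a Solr API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.queries_served = 0

    def query(self, q):
        # A real client would send an HTTP request here.
        with self._lock:
            self.queries_served += 1
        return {"q": q, "numFound": len(q)}


def make_queries(count, seed=42):
    """Build a varied query list so the run is not one query repeated."""
    rng = random.Random(seed)
    terms = ["solr", "lucene", "index", "cache", "heap", "mmap", "shard", "query"]
    return [" ".join(rng.sample(terms, 2)) for _ in range(count)]


def run_benchmark(client, queries, threads):
    """Share ONE client across all worker threads and time the whole run."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(client.query, queries))
    elapsed = time.perf_counter() - start
    return len(results), elapsed


# Initialization step: the client is created once, outside the timed section.
client = FakeSearchClient()
queries = make_queries(1000)
completed, elapsed = run_benchmark(client, queries, threads=8)
print(f"{completed} queries in {elapsed:.3f}s using one shared client")
```

Creating and closing the client inside run_benchmark for every query would
put connection setup into the measured time, which is the inefficiency
pointed out above.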

What evidence do you see that Solr isn't scaling like you expect?

Thanks,
Shawn



Re: Some performance questions....

2018-03-16 Thread Deepak Goel
> That benchmark is on Windows, so not interesting for most of us.

I guess I must have missed this in the author's question. Did he describe
his OS?

Also, other applications scale well on Windows. Why would Solr be different?
The Solr page does not mention any performance limits on Windows
(shouldn't they say that upfront in that case?)

https://lucene.apache.org/solr/guide/6_6/installing-solr.html#got-java
(You can install Solr in any system where a suitable Java Runtime
Environment (JRE) is available, as detailed below. Currently this includes
Linux, OS X, and Microsoft Windows.)

> Windows has very different handling for threads, memory, and files
> compared to Unix. I had to do a lot of Windows-specific tuning for
> Ultraseek Server to get decent performance. For example, merge speed was
> terrible unless I opened files with a Windows-specific caching hint.





Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"


Re: Some performance questions....

2018-03-16 Thread Deepak Goel
> > On Mar 16, 2018, at 6:26 AM, Deepak Goel  wrote:
> >
> > I would try multiple Solr instances rather a single Solr instance (it
> > definitely will give a performance boost)
>
> I would avoid multiple Solr instances on single machine. I can use all 36
> cores on our servers with one Solr process.

Is your load scaling linearly? Can you please post the results?






Re: Some performance questions....

2018-03-16 Thread Walter Underwood
On Mar 16, 2018, at 6:38 AM, Deepak Goel  wrote:
> 
> I did a performance study of Solr a while back. And I found that it does
> not scale beyond a particular point on a single machine (could be due to
> the way its coded). Hence multiple instances might make sense.
> 
> https://docs.google.com/document/d/1kUqEcZl3NhOo6SLklo5Icg3fMnn9OtLY_lwnc6wbXus/edit?usp=sharing
>  
> 
> 
> ***Deepak***

That benchmark is on Windows, so not interesting for most of us.

Windows has very different handling for threads, memory, and files compared to 
Unix. I had to do a lot of Windows-specific tuning for Ultraseek Server to get 
decent performance. For example, merge speed was terrible unless I opened files 
with a Windows-specific caching hint.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Some performance questions....

2018-03-16 Thread Walter Underwood
> On Mar 16, 2018, at 6:26 AM, Deepak Goel  wrote:
> 
> I would try multiple Solr instances rather a single Solr instance (it
> definitely will give a performance boost)


I would avoid multiple Solr instances on single machine. I can use all 36 cores 
on our servers with one Solr process.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Some performance questions....

2018-03-16 Thread Deepak Goel
On Fri, Mar 16, 2018 at 6:03 PM, Shawn Heisey  wrote:

> On 3/15/2018 6:34 AM, BlackIce wrote:
>
>> However the main app that will be
>> running is more or less a single threated app which takes advantage when
>> run under several instances, ie: parallelism, so I thought, since I'm at
>> it
>> I may give solr a few instances as well
>>
>
***Deepak***

I did a performance study of Solr a while back, and I found that it does
not scale beyond a particular point on a single machine (possibly due to
the way it's coded). Hence multiple instances might make sense.

https://docs.google.com/document/d/1kUqEcZl3NhOo6SLklo5Icg3fMnn9OtLY_lwnc6wbXus/edit?usp=sharing

***Deepak***



> Solr is a fully threaded app, capable of doing LOTS of things at the same
> time, without multiple instances.
>
>> Thnx for the Heap pointer.. I've read, from some Professor.. that Solr
>> actually is more efficient with a very small Heap and to have everything
>> mapped to virtual memory... Which brings me to the next question.. is the
>> Virtual memory mapping done by the OS or Solar? Does the Virtual memory
>> reside on the OS HDD? Or on the Solr HDD?.. and if the Virtual memory
>> mapping is done on the OS HDD, wouldn't it be beneficial to run the OS off
>> a SSD?
>>
>
***Deepak***
If you have little RAM (I am assuming that is what you mean by a small
heap), then the OS will swap or demand-page to meet your memory
requirements. An SSD will help. However, it might be better to have more
RAM than to rely on an SSD.
***Deepak***

> There appears to be some confusion here.
>
> The virtual memory doesn't reside on ANY hard drive, unless you've REALLY
> configured the system badly and the system starts using swap space.  If the
> system starts using swap, performance is going to be terrible, no matter
> how fast the disk where swap resides is.
>
> The "mapping to virtual memory" feature is something the operating system
> does.  Lucene/Solr utilizes MMAP code in Java, which then turns around and
> uses MMAP functionality provided by the OS.
>
> At that point, that file can be accessed by the application as if it were
> a very large block of memory.  Mapping the file doesn't immediately use any
> memory at all.  The OS manages the access to the file.  If the part of the
> file that is being accessed has not been accessed before, then the OS will
> read the data off the disk, place it into the OS disk cache, and provide it
> to whatever requested it.  If it has been accessed before and is still in
> the disk cache, then it won't read the disk, it will just provide the data
> from the cache.  Getting most data from cache is *required* for good Solr
> performance.
>
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Running with your indexes on SSD might indeed help performance, and
> regardless of anything that's going on, WILL help performance in the short
> term, when you first turn the machine on.  But if it also helps with
> long-term query performance, then chances are that the machine doesn't have
> enough memory.  When Solr servers are sized correctly, running on SSD is
> typically not going to make a big difference, unless the machine does a lot
> more indexing than querying.
>
>> For now.. my FEELING is to run one Solr instance on this particular
>> machine.. by the time the RAM is outgrown add another machine and so
>> forth...
>>
>
> Any plans you have for a growth strategy with multiple Solr instances are
> extremely likely to still be possible with only one instance, with very
> little change.
>
> Thanks,
> Shawn








Re: Some performance questions....

2018-03-16 Thread Deepak Goel
> I think there is no benefit in having multiple Solr instances on a single
> server, unless the heap memory required by the JVM is too big.

***Deepak***
I would try multiple Solr instances rather than a single Solr instance (it
definitely will give a performance boost)
***Deepak***

> And remember that this has relatively to do with the index size ( inverted
> index is memory mapped OFF heap and docValues as well).
> On the other hand of course Apache Solr uses plenty of JVM heap memory as
> well ( caches, temporary data structures during indexing, ect ect)
>
> > Deepak:
> >
> > Well its kinda a given that when running ANYTHING under a VM you have an
> > overhead..
>
> ***Deepak***
> You mean you are assuming without any facts (performance benchmark with n
> without VM)
> ***Deepak***
> I think Shawn detailed this quite extensively, I am no sys admin or OS
> expert, but there is no need of benchmarks and I don't even understand your
> doubts.
> In Information technology anytime you add additional layers of software you
> need adapters which means additional instructions executed.
> It is obvious  that having :
> metal -> OS -> APP is cheaper instruction wise then
> metal -> OS -> VM -> APP
> The APP will execute instruction in the VM that will be responsible to
> translate those instructions for the underlining OS.

***Deepak***
I have past experience with VMs. They absolutely do not add any overhead.
Since we have conflicting opinions, it is best to benchmark it yourself.
***Deepak***

> Going direct you skip one passage.
> you can think about this when you emulate different OS, is it cheaper to
> run
> windows on a machine directly to execute windows applications or run a
> Windows VM on top of another OS to execute windows applications ?











Re: Some performance questions....

2018-03-16 Thread Shawn Heisey

On 3/15/2018 6:34 AM, BlackIce wrote:

> However the main app that will be
> running is more or less a single threated app which takes advantage when
> run under several instances, ie: parallelism, so I thought, since I'm at it
> I may give solr a few instances as well


Solr is a fully threaded app, capable of doing LOTS of things at the 
same time, without multiple instances.



> Thnx for the Heap pointer.. I've read, from some Professor.. that Solr
> actually is more efficient with a very small Heap and to have everything
> mapped to virtual memory... Which brings me to the next question.. is the
> Virtual memory mapping done by the OS or Solar? Does the Virtual memory
> reside on the OS HDD? Or on the Solr HDD?.. and if the Virtual memory
> mapping is done on the OS HDD, wouldn't it be beneficial to run the OS off
> a SSD?


There appears to be some confusion here.

The virtual memory doesn't reside on ANY hard drive, unless you've 
REALLY configured the system badly and the system starts using swap 
space.  If the system starts using swap, performance is going to be 
terrible, no matter how fast the disk where swap resides is.


The "mapping to virtual memory" feature is something the operating 
system does.  Lucene/Solr utilizes MMAP code in Java, which then turns 
around and uses MMAP functionality provided by the OS.


At that point, that file can be accessed by the application as if it 
were a very large block of memory.  Mapping the file doesn't immediately 
use any memory at all.  The OS manages the access to the file.  If the 
part of the file that is being accessed has not been accessed before, 
then the OS will read the data off the disk, place it into the OS disk 
cache, and provide it to whatever requested it.  If it has been accessed 
before and is still in the disk cache, then it won't read the disk, it 
will just provide the data from the cache.  Getting most data from cache 
is *required* for good Solr performance.
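
A small sketch of the same OS feature, using Python's standard mmap module
instead of Lucene's Java code. The temporary file stands in for an index
file, and its contents and sizes are made up for illustration. Mapping the
file does not copy it into the process; only the byte range that is
actually sliced has to be brought in by the OS, and repeated reads of that
range can be served from the OS disk cache.

```python
import mmap
import os
import tempfile

# Write a small file to stand in for an index segment (illustrative only).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"A" * 4096 + b"hello from page two" + b"B" * 4096)
    path = f.name

with open(path, "rb") as f:
    # Length 0 maps the whole file; the mapping itself copies no file data.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        mapped_size = len(mapped)
        # Slicing touches only the pages backing this byte range. The OS
        # supplies them, from disk or from its cache, when they are accessed.
        snippet = mapped[4096:4096 + 19]

os.remove(path)
print(mapped_size, snippet)
```

The application code never issues an explicit read call for the file data;
it treats the mapping as a block of memory and leaves caching to the OS.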


http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Running with your indexes on SSD might indeed help performance, and 
regardless of anything that's going on, WILL help performance in the 
short term, when you first turn the machine on.  But if it also helps 
with long-term query performance, then chances are that the machine 
doesn't have enough memory.  When Solr servers are sized correctly,
running on SSD is typically not going to make a big difference, unless 
the machine does a lot more indexing than querying.



> For now.. my FEELING is to run one Solr instance on this particular
> machine.. by the time the RAM is outgrown add another machine and so
> forth...


Any plans you have for a growth strategy with multiple Solr instances 
are extremely likely to still be possible with only one instance, with 
very little change.


Thanks,
Shawn



Re: Some performance questions....

2018-03-15 Thread Alessandro Benedetti
*Single Solr Instance VS Multiple Solr instances on Single Server*

I think there is no benefit in having multiple Solr instances on a single
server, unless the heap memory required by the JVM is too big.
And remember that this has relatively little to do with the index size (the
inverted index is memory mapped OFF heap, and docValues as well).
On the other hand, of course, Apache Solr uses plenty of JVM heap memory as
well (caches, temporary data structures during indexing, etc.)

> Deepak: 
> 
> Well its kinda a given that when running ANYTHING under a VM you have an 
> overhead..

> ***Deepak***
> You mean you are assuming without any facts (performance benchmark with n
> without VM)
> ***Deepak***

I think Shawn detailed this quite extensively. I am no sysadmin or OS
expert, but there is no need for benchmarks, and I don't even understand your
doubts.
In information technology, any time you add additional layers of software you
need adapters, which means additional instructions executed.
It is obvious that having:
metal -> OS -> APP is cheaper instruction-wise than
metal -> OS -> VM -> APP
The APP will execute instructions in the VM, which is responsible for
translating those instructions for the underlying OS.
Going direct, you skip one step.
Think about emulating a different OS: is it cheaper to run
Windows on a machine directly to execute Windows applications, or to run a
Windows VM on top of another OS to execute Windows applications?



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Some performance questions....

2018-03-15 Thread Deepak Goel
Please see inline...




On Thu, Mar 15, 2018 at 6:04 PM, BlackIce  wrote:

> Shawn:
> well the idea was to utilize system resources more efficiently.. this is
> not due so much to Solr, as I sayd I don't know that much about Solr,
> except Shema.xml and Solarconfig.xml - However the main app that will be
> running is more or less a single threated app which takes advantage when
> run under several instances, ie: parallelism, so I thought, since I'm at it
> I may give solr a few instances as well... but the more I read, the more
> confused I get.. I've read about some guy running 8 Solr instances on his
> dual Xeon 26xx series, each VM with 12 GB ram..
>
> Deepak:
>
> Well its kinda a given that when running ANYTHING under a VM you have an
> overhead..

***Deepak***
You mean you are assuming this without any facts (a performance benchmark
with and without a VM)?
***Deepak***

> so since I control the hardware, ie: not sharing space on some
> hosted VM by some ISP... why not skip the whole VM thing entirely?
>
> Thnx for the Heap pointer.. I've read, from some Professor.. that Solr
> actually is more efficient with a very small Heap and to have everything
> mapped to virtual memory... Which brings me to the next question.. is the
> Virtual memory mapping done by the OS or Solar? Does the Virtual memory
> reside on the OS HDD? Or on the Solr HDD?.. and if the Virtual memory
> mapping is done on the OS HDD, wouldn't it be beneficial to run the OS off
> a SSD?
>
***Deepak***
The OS itself does the mapping to virtual memory (at least Unix does). However,
I am not sure of Solr's internal mechanism.
***Deepak***


> For now.. my FEELING is to run one Solr instance on this particular
> machine.. by the time the RAM is outgrown add another machine and so
> forth...

***Deepak***
I wonder if there are any performance benchmarks showing how Solr scales at
higher loads on a single machine (is it linear or non-linear?). Most
software doesn't scale linearly at higher loads.
***Deepak***

> I've had a small set-back: due to the chasis configuration I could
> only fit in Half of the HDD's I intented.. the rest collide with the CPU
> heatsinks (Don't ask)
>  so my entire initial set-up has changed and with it my initial "growth
> strategy"
>
> On Wed, Mar 14, 2018 at 4:15 PM, Shawn Heisey  wrote:
>
> > On 3/14/2018 5:49 AM, BlackIce wrote:
> >
> >> I was just thinking Do I really need separate VM's in order to run
> >> multiple Solr instances? Doesn't it suffice to have each instance in its
> >> own user account?
> >>
> >
> > You can run multiple instances all under the same account on one machine.
> > But for a single machine, why do you need multiple Solr instances at all?
> > One instance can handle many indexes, and will probably do it more
> > efficiently than multiple instances.
> >
> > The only time I would *ever* recommend multiple Solr instances is when a
> > single instance would need an ENORMOUS Java heap -- something much larger
> > than 32GB.  If something like that can be split into multiple instances
> > where each one has a heap that's 31GB heap or less, then memory usage
> will
> > be more efficient and Java's garbage collection will work better.
> >
> > FYI -- Running Java with a 32GB heap actually has LESS memory available
> > than running it with a 31GB heap.  This is because when the heap reaches
> > 32GB, Java must switch to 64-bit pointers, so every little allocation
> > requires a little bit more memory.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: Some performance questions....

2018-03-15 Thread BlackIce
Shawn:
well the idea was to utilize system resources more efficiently.. this is
not due so much to Solr, as I said I don't know that much about Solr,
except schema.xml and solrconfig.xml - However the main app that will be
running is more or less a single-threaded app which benefits when
run as several instances, ie: parallelism, so I thought, since I'm at it
I may give Solr a few instances as well... but the more I read, the more
confused I get.. I've read about some guy running 8 Solr instances on his
dual Xeon 26xx series, each VM with 12 GB RAM..

Deepak:

Well it's kinda a given that when running ANYTHING under a VM you have an
overhead.. so since I control the hardware, ie: not sharing space on some
hosted VM by some ISP... why not skip the whole VM thing entirely?

Thnx for the Heap pointer.. I've read, from some Professor.. that Solr
actually is more efficient with a very small Heap and to have everything
mapped to virtual memory... Which brings me to the next question.. is the
Virtual memory mapping done by the OS or Solr? Does the Virtual memory
reside on the OS HDD? Or on the Solr HDD?.. and if the Virtual memory
mapping is done on the OS HDD, wouldn't it be beneficial to run the OS off
an SSD?

For now.. my FEELING is to run one Solr instance on this particular
machine.. by the time the RAM is outgrown add another machine and so
forth... I've had a small set-back: due to the chassis configuration I could
only fit in half of the HDDs I intended.. the rest collide with the CPU
heatsinks (Don't ask)
so my entire initial set-up has changed and with it my initial "growth
strategy"

On Wed, Mar 14, 2018 at 4:15 PM, Shawn Heisey  wrote:

> On 3/14/2018 5:49 AM, BlackIce wrote:
>
>> I was just thinking Do I really need separate VM's in order to run
>> multiple Solr instances? Doesn't it suffice to have each instance in its
>> own user account?
>>
>
> You can run multiple instances all under the same account on one machine.
> But for a single machine, why do you need multiple Solr instances at all?
> One instance can handle many indexes, and will probably do it more
> efficiently than multiple instances.
>
> The only time I would *ever* recommend multiple Solr instances is when a
> single instance would need an ENORMOUS Java heap -- something much larger
> than 32GB.  If something like that can be split into multiple instances
> where each one has a heap that's 31GB heap or less, then memory usage will
> be more efficient and Java's garbage collection will work better.
>
> FYI -- Running Java with a 32GB heap actually has LESS memory available
> than running it with a 31GB heap.  This is because when the heap reaches
> 32GB, Java must switch to 64-bit pointers, so every little allocation
> requires a little bit more memory.
>
> Thanks,
> Shawn
>
>


Re: Some performance questions....

2018-03-14 Thread Shawn Heisey

On 3/14/2018 5:49 AM, BlackIce wrote:

> I was just thinking Do I really need separate VM's in order to run
> multiple Solr instances? Doesn't it suffice to have each instance in its
> own user account?


You can run multiple instances all under the same account on one 
machine.  But for a single machine, why do you need multiple Solr 
instances at all?  One instance can handle many indexes, and will 
probably do it more efficiently than multiple instances.


The only time I would *ever* recommend multiple Solr instances is when a
single instance would need an ENORMOUS Java heap -- something much
larger than 32GB.  If something like that can be split into multiple
instances where each one has a heap that's 31GB or less, then
memory usage will be more efficient and Java's garbage collection will
work better.


FYI -- Running Java with a 32GB heap actually has LESS memory available 
than running it with a 31GB heap.  This is because when the heap reaches 
32GB, Java must switch to 64-bit pointers, so every little allocation 
requires a little bit more memory.
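
The 32GB figure can be checked with simple arithmetic. This sketch assumes
the JVM is using 32-bit compressed object references with 8-byte object
alignment, which the message above does not spell out, so treat both
numbers as assumptions about the JVM configuration. The function names are
made up for illustration.

```python
GIB = 2 ** 30


def max_compressed_heap_bytes(reference_bits=32, object_alignment=8):
    """Largest heap addressable when a reference counts aligned slots."""
    return (2 ** reference_bits) * object_alignment


def reference_bytes(heap_bytes):
    """Bytes per object reference under the assumptions stated above."""
    return 4 if heap_bytes < max_compressed_heap_bytes() else 8


limit = max_compressed_heap_bytes()
print("addressable with 32-bit references:", limit // GIB, "GiB")
for heap_gib in (31, 32):
    print(heap_gib, "GiB heap ->", reference_bytes(heap_gib * GIB), "bytes per reference")
```

Under these assumptions every reference doubles in size once the heap
reaches the limit, which is why a 32GB heap can hold less than a 31GB one.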


Thanks,
Shawn



Re: Some performance questions....

2018-03-14 Thread Deepak Goel
Have you ever measured the overhead of a VM? Or did you read about it somewhere?

On 14 Mar 2018 18:10, "BlackIce"  wrote:

> but it should be possible, without the overhead of VM's
>
> On Wed, Mar 14, 2018 at 1:30 PM, Deepak Goel  wrote:
>
> > The OS resources would be shared in that case
> >
> > On 14 Mar 2018 17:19, "BlackIce"  wrote:
> >
> > > I was just thinking Do I really need separate VM's in order to run
> > > multiple Solr instances? Doesn't it suffice to have each instance in
> its
> > > own user account?
> > >
> > > Greetz
> > >
> > > On Mon, Mar 12, 2018 at 7:41 PM, BlackIce 
> wrote:
> > >
> > > > I don't have any production logs and this all sounds to
> > complicated.
> > > >
> > > > So, I'll just trow the system together in a way it makes the most
> sense
> > > > for now.. collect some logs and then do some testing further down the
> > > road.
> > > > For now just get the sucker up and running.
> > > >
> > > > Thanks all
> > > >
> > > > On Mon, Mar 12, 2018 at 7:23 PM, Deepak Goel 
> > wrote:
> > > >
> > > >> I am not sure if I understand your question
> > > >>
> > > >> *"How do I test this?"*
> > > >> You have to run test (benchmark test) of transactions (queries)
> which
> > > are
> > > >> most representative of your system (requirement).
> > > >>
> > > >> You can use a performance testing tool like JMeter (along with
> PerfMon
> > > >> configured for utilisation metrics)
> > > >>
> > > >>
> > > >>
> > > >> Deepak
> > > >> "Please stop cruelty to Animals, help by becoming a Vegan"
> > > >> +91 73500 12833
> > > >> deic...@gmail.com
> > > >>
> > > >> Facebook: https://www.facebook.com/deicool
> > > >> LinkedIn: www.linkedin.com/in/deicool
> > > >>
> > > >> "Plant a Tree, Go Green"
> > > >>
> > > >> On Mon, Mar 12, 2018 at 10:57 PM, BlackIce 
> > > wrote:
> > > >>
> > > >> > So Im thinking following scenarios :
> > > >> > Single instance with drives in raid 0, raid 10 and raid 5.
> > > >> >
> > > >> > And then having 3 Vms and 4 Solr instances each with its own HD.
> > > >> >
> > > >> > How do I test this?
> > > >> >
> > > >> >
> > > >> > Greetz
> > > >> >
> > > >> > On Mar 12, 2018 1:16 PM, "BlackIce" 
> wrote:
> > > >> >
> > > >> > > OK, so we're gone nowhere,  since I've already lost lots of
> > > time...  A
> > > >> > few
> > > >> > > days more or less won't make a difference  I'd be willing to
> > > >> > benchmark
> > > >> > > if some tells me how to.
> > > >> > >
> > > >> > >
> > > >> > > Greetz
> > > >> > >
> > > >> > > On Mar 12, 2018 7:17 AM, "Deepak Goel" 
> wrote:
> > > >> > >
> > > >> > >> Now you are mixing your original question about performance
> with
> > > >> > >> reliability
> > > >> > >>
> > > >> > >> On 12 Mar 2018 02:29, "BlackIce" 
> wrote:
> > > >> > >>
> > > >> > >> > Second to this wouldn't 4 Solr instances each with its own HD
> > be
> > > >> fault
> > > >> > >> > tolerant? vs. one solr instance with 4 HD's in RAID 0? Plus
> to
> > > his
> > > >> > comes
> > > >> > >> > the storage capacity, I need the capacity of those 4
> drives...
> > > the
> > > >> > more
> > > >> > >> I
> > > >> > >> > read.. the more questions
> > > >> > >> >
> > > >> > >> > On Sun, Mar 11, 2018 at 9:43 PM, BlackIce <
> > blackice...@gmail.com
> > > >
> > > >> > >> wrote:
> > > >> > >> >
> > > >> > >> > > Thnx for the pointers.
> > > >> > >> > >
> > > >> > >> > > I haven't given much thought to Solr, asides shemal.xml and
> > > >> > >> > solrconfig.xml
> > > >> > >> > > and I'm just diving into a bit more deeper stuff!
> > > >> > >> > >
> > > >> > >> > > Greetz
> > > >> > >> > >
> > > >> > >> > > RRK
> > > >> > >> > >
> > > >> > >> > > On Sun, Mar 11, 2018 at 8:58 PM, Deepak Goel <
> > > deic...@gmail.com>
> > > >> > >> wrote:
> > > >> > >> > >
> > > >> > >> > >> To rephrase your Question
> > > >> > >> > >>
> > > >> > >> > >> "Does Solr do well with Scale-up or Scale-out?"
> > > >> > >> > >>
> > > >> > >> > >> Are there any Performance Benchmarks for the same out
> there
> > > >> > >> supporting
> > > >> > >> > the
> > > >> > >> > >> claim?
> > > >> > >> > >>
> > > >> > >> > >> On 11 Mar 2018 23:05, "BlackIce" 
> > > wrote:
> > > >> > >> > >>
> > > >> > >> > >> > Hi,
> > > >> > >> > >> >
> > > >> > >> > >> > I have some questions regarding performance.
> > > >> > >> > >> >
> > > >> > >> > >> > Lets says I have a dual CPU with a total of 8 cores and
> 24
> > > GB
> > > >> RAM
> > > >> > >> for
> > > >> > >> > my
> > > >> > >> > >> > Solr and some other stuff.
> > > >> > >> > >> >
> > > >> > >> > >> > Would it be more beneficial to only run 1 instance of
> Solr
> > > >> with
> > > >> > the
> > > >> > >> > >> > collection stored on 4 HD's in RAID 0?? Or Have
> > several
> > > >> > Virtual
> > > >> > >> > >> > Machines each running of its own HD, ie: Have 4 VM's
> > 

Re: Some performance questions....

2018-03-14 Thread BlackIce
but it should be possible, without the overhead of VM's

On Wed, Mar 14, 2018 at 1:30 PM, Deepak Goel  wrote:

> The OS resources would be shared in that case
>
> On 14 Mar 2018 17:19, "BlackIce"  wrote:
>
> > I was just thinking Do I really need separate VM's in order to run
> > multiple Solr instances? Doesn't it suffice to have each instance in its
> > own user account?
> >
> > Greetz
> >
> > On Mon, Mar 12, 2018 at 7:41 PM, BlackIce  wrote:
> >
> > > I don't have any production logs and this all sounds to
> complicated.
> > >
> > > So, I'll just trow the system together in a way it makes the most sense
> > > for now.. collect some logs and then do some testing further down the
> > road.
> > > For now just get the sucker up and running.
> > >
> > > Thanks all
> > >
> > > On Mon, Mar 12, 2018 at 7:23 PM, Deepak Goel 
> wrote:
> > >
> > >> I am not sure if I understand your question
> > >>
> > >> *"How do I test this?"*
> > >> You have to run test (benchmark test) of transactions (queries) which
> > are
> > >> most representative of your system (requirement).
> > >>
> > >> You can use a performance testing tool like JMeter (along with PerfMon
> > >> configured for utilisation metrics)
> > >>
> > >>
> > >>
> > >> Deepak
> > >> "Please stop cruelty to Animals, help by becoming a Vegan"
> > >> +91 73500 12833
> > >> deic...@gmail.com
> > >>
> > >> Facebook: https://www.facebook.com/deicool
> > >> LinkedIn: www.linkedin.com/in/deicool
> > >>
> > >> "Plant a Tree, Go Green"
> > >>
> > >> On Mon, Mar 12, 2018 at 10:57 PM, BlackIce 
> > wrote:
> > >>
> > >> > So Im thinking following scenarios :
> > >> > Single instance with drives in raid 0, raid 10 and raid 5.
> > >> >
> > >> > And then having 3 Vms and 4 Solr instances each with its own HD.
> > >> >
> > >> > How do I test this?
> > >> >
> > >> >
> > >> > Greetz
> > >> >
> > >> > On Mar 12, 2018 1:16 PM, "BlackIce"  wrote:
> > >> >
> > >> > > OK, so we're gone nowhere,  since I've already lost lots of
> > time...  A
> > >> > few
> > >> > > days more or less won't make a difference  I'd be willing to
> > >> > benchmark
> > >> > > if some tells me how to.
> > >> > >
> > >> > >
> > >> > > Greetz
> > >> > >
> > >> > > On Mar 12, 2018 7:17 AM, "Deepak Goel"  wrote:
> > >> > >
> > >> > >> Now you are mixing your original question about performance with
> > >> > >> reliability
> > >> > >>
> > >> > >> On 12 Mar 2018 02:29, "BlackIce"  wrote:
> > >> > >>
> > >> > >> > Second to this wouldn't 4 Solr instances each with its own HD
> be
> > >> fault
> > >> > >> > tolerant? vs. one solr instance with 4 HD's in RAID 0? Plus to
> > his
> > >> > comes
> > >> > >> > the storage capacity, I need the capacity of those 4 drives...
> > the
> > >> > more
> > >> > >> I
> > >> > >> > read.. the more questions
> > >> > >> >
> > >> > >> > On Sun, Mar 11, 2018 at 9:43 PM, BlackIce <
> blackice...@gmail.com
> > >
> > >> > >> wrote:
> > >> > >> >
> > >> > >> > > Thnx for the pointers.
> > >> > >> > >
> > >> > >> > > I haven't given much thought to Solr, asides shemal.xml and
> > >> > >> > solrconfig.xml
> > >> > >> > > and I'm just diving into a bit more deeper stuff!
> > >> > >> > >
> > >> > >> > > Greetz
> > >> > >> > >
> > >> > >> > > RRK
> > >> > >> > >
> > >> > >> > > On Sun, Mar 11, 2018 at 8:58 PM, Deepak Goel <
> > deic...@gmail.com>
> > >> > >> wrote:
> > >> > >> > >
> > >> > >> > >> To rephrase your Question
> > >> > >> > >>
> > >> > >> > >> "Does Solr do well with Scale-up or Scale-out?"
> > >> > >> > >>
> > >> > >> > >> Are there any Performance Benchmarks for the same out there
> > >> > >> supporting
> > >> > >> > the
> > >> > >> > >> claim?
> > >> > >> > >>
> > >> > >> > >> On 11 Mar 2018 23:05, "BlackIce" 
> > wrote:
> > >> > >> > >>
> > >> > >> > >> > Hi,
> > >> > >> > >> >
> > >> > >> > >> > I have some questions regarding performance.
> > >> > >> > >> >
> > >> > >> > >> > Lets says I have a dual CPU with a total of 8 cores and 24
> > GB
> > >> RAM
> > >> > >> for
> > >> > >> > my
> > >> > >> > >> > Solr and some other stuff.
> > >> > >> > >> >
> > >> > >> > >> > Would it be more beneficial to only run 1 instance of Solr
> > >> with
> > >> > the
> > >> > >> > >> > collection stored on 4 HD's in RAID 0?? Or Have
> several
> > >> > Virtual
> > >> > >> > >> > Machines each running of its own HD, ie: Have 4 VM's
> running
> > >> > Solr?
> > >> > >> > >> >
> > >> > >> > >> > Any Thoughts?
> > >> > >> > >> >
> > >> > >> > >> > Thank you!
> > >> > >> > >> >
> > >> > >> > >> > RRK
> > >> > >> > >> >
> > >> > >> > >>
> > >> > >> > >
> > >> > >> > >
> > >> > >> >
> > >> > >>
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>


Re: Some performance questions....

2018-03-14 Thread Deepak Goel
The OS resources would be shared in that case

On 14 Mar 2018 17:19, "BlackIce"  wrote:

> I was just thinking Do I really need separate VM's in order to run
> multiple Solr instances? Doesn't it suffice to have each instance in its
> own user account?
>
> Greetz
>
> On Mon, Mar 12, 2018 at 7:41 PM, BlackIce  wrote:
>
> > I don't have any production logs and this all sounds to complicated.
> >
> > So, I'll just trow the system together in a way it makes the most sense
> > for now.. collect some logs and then do some testing further down the
> road.
> > For now just get the sucker up and running.
> >
> > Thanks all
> >
> > On Mon, Mar 12, 2018 at 7:23 PM, Deepak Goel  wrote:
> >
> >> I am not sure if I understand your question
> >>
> >> *"How do I test this?"*
> >> You have to run test (benchmark test) of transactions (queries) which
> are
> >> most representative of your system (requirement).
> >>
> >> You can use a performance testing tool like JMeter (along with PerfMon
> >> configured for utilisation metrics)
> >>
> >>
> >>
> >> Deepak
> >> "Please stop cruelty to Animals, help by becoming a Vegan"
> >> +91 73500 12833
> >> deic...@gmail.com
> >>
> >> Facebook: https://www.facebook.com/deicool
> >> LinkedIn: www.linkedin.com/in/deicool
> >>
> >> "Plant a Tree, Go Green"
> >>
> >> On Mon, Mar 12, 2018 at 10:57 PM, BlackIce 
> wrote:
> >>
> >> > So Im thinking following scenarios :
> >> > Single instance with drives in raid 0, raid 10 and raid 5.
> >> >
> >> > And then having 3 Vms and 4 Solr instances each with its own HD.
> >> >
> >> > How do I test this?
> >> >
> >> >
> >> > Greetz
> >> >
> >> > On Mar 12, 2018 1:16 PM, "BlackIce"  wrote:
> >> >
> >> > > OK, so we're gone nowhere,  since I've already lost lots of
> time...  A
> >> > few
> >> > > days more or less won't make a difference  I'd be willing to
> >> > benchmark
> >> > > if some tells me how to.
> >> > >
> >> > >
> >> > > Greetz
> >> > >
> >> > > On Mar 12, 2018 7:17 AM, "Deepak Goel"  wrote:
> >> > >
> >> > >> Now you are mixing your original question about performance with
> >> > >> reliability
> >> > >>
> >> > >> On 12 Mar 2018 02:29, "BlackIce"  wrote:
> >> > >>
> >> > >> > Second to this wouldn't 4 Solr instances each with its own HD be
> >> fault
> >> > >> > tolerant? vs. one solr instance with 4 HD's in RAID 0? Plus to
> his
> >> > comes
> >> > >> > the storage capacity, I need the capacity of those 4 drives...
> the
> >> > more
> >> > >> I
> >> > >> > read.. the more questions
> >> > >> >
> >> > >> > On Sun, Mar 11, 2018 at 9:43 PM, BlackIce  >
> >> > >> wrote:
> >> > >> >
> >> > >> > > Thnx for the pointers.
> >> > >> > >
> >> > >> > > I haven't given much thought to Solr, asides shemal.xml and
> >> > >> > solrconfig.xml
> >> > >> > > and I'm just diving into a bit more deeper stuff!
> >> > >> > >
> >> > >> > > Greetz
> >> > >> > >
> >> > >> > > RRK
> >> > >> > >
> >> > >> > > On Sun, Mar 11, 2018 at 8:58 PM, Deepak Goel <
> deic...@gmail.com>
> >> > >> wrote:
> >> > >> > >
> >> > >> > >> To rephrase your Question
> >> > >> > >>
> >> > >> > >> "Does Solr do well with Scale-up or Scale-out?"
> >> > >> > >>
> >> > >> > >> Are there any Performance Benchmarks for the same out there
> >> > >> supporting
> >> > >> > the
> >> > >> > >> claim?
> >> > >> > >>
> >> > >> > >> On 11 Mar 2018 23:05, "BlackIce" 
> wrote:
> >> > >> > >>
> >> > >> > >> > Hi,
> >> > >> > >> >
> >> > >> > >> > I have some questions regarding performance.
> >> > >> > >> >
> >> > >> > >> > Lets says I have a dual CPU with a total of 8 cores and 24
> GB
> >> RAM
> >> > >> for
> >> > >> > my
> >> > >> > >> > Solr and some other stuff.
> >> > >> > >> >
> >> > >> > >> > Would it be more beneficial to only run 1 instance of Solr
> >> with
> >> > the
> >> > >> > >> > collection stored on 4 HD's in RAID 0?? Or Have several
> >> > Virtual
> >> > >> > >> > Machines each running of its own HD, ie: Have 4 VM's running
> >> > Solr?
> >> > >> > >> >
> >> > >> > >> > Any Thoughts?
> >> > >> > >> >
> >> > >> > >> > Thank you!
> >> > >> > >> >
> >> > >> > >> > RRK
> >> > >> > >> >
> >> > >> > >>
> >> > >> > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >
> >> >
> >>
> >
> >
>


Re: Some performance questions....

2018-03-14 Thread BlackIce
I was just thinking: do I really need separate VMs in order to run
multiple Solr instances? Doesn't it suffice to have each instance in its
own user account?

Greetz

On Mon, Mar 12, 2018 at 7:41 PM, BlackIce  wrote:

> I don't have any production logs and this all sounds to complicated.
>
> So, I'll just trow the system together in a way it makes the most sense
> for now.. collect some logs and then do some testing further down the road.
> For now just get the sucker up and running.
>
> Thanks all
>
> On Mon, Mar 12, 2018 at 7:23 PM, Deepak Goel  wrote:
>
>> I am not sure if I understand your question
>>
>> *"How do I test this?"*
>> You have to run test (benchmark test) of transactions (queries) which are
>> most representative of your system (requirement).
>>
>> You can use a performance testing tool like JMeter (along with PerfMon
>> configured for utilisation metrics)
>>
>>
>>
>> Deepak
>> "Please stop cruelty to Animals, help by becoming a Vegan"
>> +91 73500 12833
>> deic...@gmail.com
>>
>> Facebook: https://www.facebook.com/deicool
>> LinkedIn: www.linkedin.com/in/deicool
>>
>> "Plant a Tree, Go Green"
>>
>> On Mon, Mar 12, 2018 at 10:57 PM, BlackIce  wrote:
>>
>> > So Im thinking following scenarios :
>> > Single instance with drives in raid 0, raid 10 and raid 5.
>> >
>> > And then having 3 Vms and 4 Solr instances each with its own HD.
>> >
>> > How do I test this?
>> >
>> >
>> > Greetz
>> >
>> > On Mar 12, 2018 1:16 PM, "BlackIce"  wrote:
>> >
>> > > OK, so we're gone nowhere,  since I've already lost lots of time...  A
>> > few
>> > > days more or less won't make a difference  I'd be willing to
>> > benchmark
>> > > if some tells me how to.
>> > >
>> > >
>> > > Greetz
>> > >
>> > > On Mar 12, 2018 7:17 AM, "Deepak Goel"  wrote:
>> > >
>> > >> Now you are mixing your original question about performance with
>> > >> reliability
>> > >>
>> > >> On 12 Mar 2018 02:29, "BlackIce"  wrote:
>> > >>
>> > >> > Second to this wouldn't 4 Solr instances each with its own HD be
>> fault
>> > >> > tolerant? vs. one solr instance with 4 HD's in RAID 0? Plus to his
>> > comes
>> > >> > the storage capacity, I need the capacity of those 4 drives... the
>> > more
>> > >> I
>> > >> > read.. the more questions
>> > >> >
>> > >> > On Sun, Mar 11, 2018 at 9:43 PM, BlackIce 
>> > >> wrote:
>> > >> >
>> > >> > > Thnx for the pointers.
>> > >> > >
>> > >> > > I haven't given much thought to Solr, asides shemal.xml and
>> > >> > solrconfig.xml
>> > >> > > and I'm just diving into a bit more deeper stuff!
>> > >> > >
>> > >> > > Greetz
>> > >> > >
>> > >> > > RRK
>> > >> > >
>> > >> > > On Sun, Mar 11, 2018 at 8:58 PM, Deepak Goel 
>> > >> wrote:
>> > >> > >
>> > >> > >> To rephrase your Question
>> > >> > >>
>> > >> > >> "Does Solr do well with Scale-up or Scale-out?"
>> > >> > >>
>> > >> > >> Are there any Performance Benchmarks for the same out there
>> > >> supporting
>> > >> > the
>> > >> > >> claim?
>> > >> > >>
>> > >> > >> On 11 Mar 2018 23:05, "BlackIce"  wrote:
>> > >> > >>
>> > >> > >> > Hi,
>> > >> > >> >
>> > >> > >> > I have some questions regarding performance.
>> > >> > >> >
>> > >> > >> > Lets says I have a dual CPU with a total of 8 cores and 24 GB
>> RAM
>> > >> for
>> > >> > my
>> > >> > >> > Solr and some other stuff.
>> > >> > >> >
>> > >> > >> > Would it be more beneficial to only run 1 instance of Solr
>> with
>> > the
>> > >> > >> > collection stored on 4 HD's in RAID 0?? Or Have several
>> > Virtual
>> > >> > >> > Machines each running of its own HD, ie: Have 4 VM's running
>> > Solr?
>> > >> > >> >
>> > >> > >> > Any Thoughts?
>> > >> > >> >
>> > >> > >> > Thank you!
>> > >> > >> >
>> > >> > >> > RRK
>> > >> > >> >
>> > >> > >>
>> > >> > >
>> > >> > >
>> > >> >
>> > >>
>> > >
>> >
>>
>
>


Re: Some performance questions....

2018-03-12 Thread BlackIce
I don't have any production logs, and this all sounds too complicated.

So I'll just throw the system together in the way that makes the most sense
for now, collect some logs, and then do some testing further down the road.
For now I just want to get the sucker up and running.

Thanks all

On Mon, Mar 12, 2018 at 7:23 PM, Deepak Goel  wrote:

> I am not sure if I understand your question
>
> *"How do I test this?"*
> You have to run test (benchmark test) of transactions (queries) which are
> most representative of your system (requirement).
>
> You can use a performance testing tool like JMeter (along with PerfMon
> configured for utilisation metrics)
>
>
>
> Deepak
> "Please stop cruelty to Animals, help by becoming a Vegan"
> +91 73500 12833
> deic...@gmail.com
>
> Facebook: https://www.facebook.com/deicool
> LinkedIn: www.linkedin.com/in/deicool
>
> "Plant a Tree, Go Green"
>
> On Mon, Mar 12, 2018 at 10:57 PM, BlackIce  wrote:
>
> > So Im thinking following scenarios :
> > Single instance with drives in raid 0, raid 10 and raid 5.
> >
> > And then having 3 Vms and 4 Solr instances each with its own HD.
> >
> > How do I test this?
> >
> >
> > Greetz
> >
> > On Mar 12, 2018 1:16 PM, "BlackIce"  wrote:
> >
> > > OK, so we're gone nowhere,  since I've already lost lots of time...  A
> > few
> > > days more or less won't make a difference  I'd be willing to
> > benchmark
> > > if some tells me how to.
> > >
> > >
> > > Greetz
> > >
> > > On Mar 12, 2018 7:17 AM, "Deepak Goel"  wrote:
> > >
> > >> Now you are mixing your original question about performance with
> > >> reliability
> > >>
> > >> On 12 Mar 2018 02:29, "BlackIce"  wrote:
> > >>
> > >> > Second to this wouldn't 4 Solr instances each with its own HD be
> fault
> > >> > tolerant? vs. one solr instance with 4 HD's in RAID 0? Plus to his
> > comes
> > >> > the storage capacity, I need the capacity of those 4 drives... the
> > more
> > >> I
> > >> > read.. the more questions
> > >> >
> > >> > On Sun, Mar 11, 2018 at 9:43 PM, BlackIce 
> > >> wrote:
> > >> >
> > >> > > Thnx for the pointers.
> > >> > >
> > >> > > I haven't given much thought to Solr, asides shemal.xml and
> > >> > solrconfig.xml
> > >> > > and I'm just diving into a bit more deeper stuff!
> > >> > >
> > >> > > Greetz
> > >> > >
> > >> > > RRK
> > >> > >
> > >> > > On Sun, Mar 11, 2018 at 8:58 PM, Deepak Goel 
> > >> wrote:
> > >> > >
> > >> > >> To rephrase your Question
> > >> > >>
> > >> > >> "Does Solr do well with Scale-up or Scale-out?"
> > >> > >>
> > >> > >> Are there any Performance Benchmarks for the same out there
> > >> supporting
> > >> > the
> > >> > >> claim?
> > >> > >>
> > >> > >> On 11 Mar 2018 23:05, "BlackIce"  wrote:
> > >> > >>
> > >> > >> > Hi,
> > >> > >> >
> > >> > >> > I have some questions regarding performance.
> > >> > >> >
> > >> > >> > Lets says I have a dual CPU with a total of 8 cores and 24 GB
> RAM
> > >> for
> > >> > my
> > >> > >> > Solr and some other stuff.
> > >> > >> >
> > >> > >> > Would it be more beneficial to only run 1 instance of Solr with
> > the
> > >> > >> > collection stored on 4 HD's in RAID 0?? Or Have several
> > Virtual
> > >> > >> > Machines each running of its own HD, ie: Have 4 VM's running
> > Solr?
> > >> > >> >
> > >> > >> > Any Thoughts?
> > >> > >> >
> > >> > >> > Thank you!
> > >> > >> >
> > >> > >> > RRK
> > >> > >> >
> > >> > >>
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> >
>


Re: Some performance questions....

2018-03-12 Thread Deepak Goel
I am not sure I understand your question.

*"How do I test this?"*
You have to run a benchmark test of the transactions (queries) that are
most representative of your system's requirements.

You can use a performance testing tool like JMeter (along with PerfMon
configured for utilisation metrics).



Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Mon, Mar 12, 2018 at 10:57 PM, BlackIce  wrote:

> So Im thinking following scenarios :
> Single instance with drives in raid 0, raid 10 and raid 5.
>
> And then having 3 Vms and 4 Solr instances each with its own HD.
>
> How do I test this?
>
>
> Greetz
>
> On Mar 12, 2018 1:16 PM, "BlackIce"  wrote:
>
> > OK, so we're gone nowhere,  since I've already lost lots of time...  A
> few
> > days more or less won't make a difference  I'd be willing to
> benchmark
> > if some tells me how to.
> >
> >
> > Greetz
> >
> > On Mar 12, 2018 7:17 AM, "Deepak Goel"  wrote:
> >
> >> Now you are mixing your original question about performance with
> >> reliability
> >>
> >> On 12 Mar 2018 02:29, "BlackIce"  wrote:
> >>
> >> > Second to this wouldn't 4 Solr instances each with its own HD be fault
> >> > tolerant? vs. one solr instance with 4 HD's in RAID 0? Plus to his
> comes
> >> > the storage capacity, I need the capacity of those 4 drives... the
> more
> >> I
> >> > read.. the more questions
> >> >
> >> > On Sun, Mar 11, 2018 at 9:43 PM, BlackIce 
> >> wrote:
> >> >
> >> > > Thnx for the pointers.
> >> > >
> >> > > I haven't given much thought to Solr, asides shemal.xml and
> >> > solrconfig.xml
> >> > > and I'm just diving into a bit more deeper stuff!
> >> > >
> >> > > Greetz
> >> > >
> >> > > RRK
> >> > >
> >> > > On Sun, Mar 11, 2018 at 8:58 PM, Deepak Goel 
> >> wrote:
> >> > >
> >> > >> To rephrase your Question
> >> > >>
> >> > >> "Does Solr do well with Scale-up or Scale-out?"
> >> > >>
> >> > >> Are there any Performance Benchmarks for the same out there
> >> supporting
> >> > the
> >> > >> claim?
> >> > >>
> >> > >> On 11 Mar 2018 23:05, "BlackIce"  wrote:
> >> > >>
> >> > >> > Hi,
> >> > >> >
> >> > >> > I have some questions regarding performance.
> >> > >> >
> >> > >> > Lets says I have a dual CPU with a total of 8 cores and 24 GB RAM
> >> for
> >> > my
> >> > >> > Solr and some other stuff.
> >> > >> >
> >> > >> > Would it be more beneficial to only run 1 instance of Solr with
> the
> >> > >> > collection stored on 4 HD's in RAID 0?? Or Have several
> Virtual
> >> > >> > Machines each running of its own HD, ie: Have 4 VM's running
> Solr?
> >> > >> >
> >> > >> > Any Thoughts?
> >> > >> >
> >> > >> > Thank you!
> >> > >> >
> >> > >> > RRK
> >> > >> >
> >> > >>
> >> > >
> >> > >
> >> >
> >>
> >
>


Re: Some performance questions....

2018-03-12 Thread BlackIce
So I'm thinking of the following scenarios:
A single instance with the drives in RAID 0, RAID 10, and RAID 5.

And then having 3 VMs and 4 Solr instances, each with its own HD.

How do I test this?


Greetz

On Mar 12, 2018 1:16 PM, "BlackIce"  wrote:

> OK, so we're gone nowhere,  since I've already lost lots of time...  A few
> days more or less won't make a difference  I'd be willing to benchmark
> if some tells me how to.
>
>
> Greetz
>
> On Mar 12, 2018 7:17 AM, "Deepak Goel"  wrote:
>
>> Now you are mixing your original question about performance with
>> reliability
>>
>> On 12 Mar 2018 02:29, "BlackIce"  wrote:
>>
>> > Second to this wouldn't 4 Solr instances each with its own HD be fault
>> > tolerant? vs. one solr instance with 4 HD's in RAID 0? Plus to his comes
>> > the storage capacity, I need the capacity of those 4 drives... the more
>> I
>> > read.. the more questions
>> >
>> > On Sun, Mar 11, 2018 at 9:43 PM, BlackIce 
>> wrote:
>> >
>> > > Thnx for the pointers.
>> > >
>> > > I haven't given much thought to Solr, asides shemal.xml and
>> > solrconfig.xml
>> > > and I'm just diving into a bit more deeper stuff!
>> > >
>> > > Greetz
>> > >
>> > > RRK
>> > >
>> > > On Sun, Mar 11, 2018 at 8:58 PM, Deepak Goel 
>> wrote:
>> > >
>> > >> To rephrase your Question
>> > >>
>> > >> "Does Solr do well with Scale-up or Scale-out?"
>> > >>
>> > >> Are there any Performance Benchmarks for the same out there
>> supporting
>> > the
>> > >> claim?
>> > >>
>> > >> On 11 Mar 2018 23:05, "BlackIce"  wrote:
>> > >>
>> > >> > Hi,
>> > >> >
>> > >> > I have some questions regarding performance.
>> > >> >
>> > >> > Lets says I have a dual CPU with a total of 8 cores and 24 GB RAM
>> for
>> > my
>> > >> > Solr and some other stuff.
>> > >> >
>> > >> > Would it be more beneficial to only run 1 instance of Solr with the
>> > >> > collection stored on 4 HD's in RAID 0?? Or Have several Virtual
>> > >> > Machines each running of its own HD, ie: Have 4 VM's running Solr?
>> > >> >
>> > >> > Any Thoughts?
>> > >> >
>> > >> > Thank you!
>> > >> >
>> > >> > RRK
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>


Re: Some performance questions....

2018-03-12 Thread Walter Underwood
Benchmark with production logs. Replay them at a constant request rate. Measure 
the response time and look at the median and 90th or 95th percentile. Do not 
use the average response time, because that will be thrown off by outliers.
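
As an illustration of why the median and upper percentiles are preferred over
the average, here is a minimal Python sketch. The function name `summarize` and
the sample latencies are made up for the example; they do not come from any
real benchmark in this thread.

```python
import statistics


def summarize(latencies_ms):
    """Return median, 90th/95th percentile, and mean of response times (ms)."""
    # quantiles(n=100) returns the 99 cut points between percentiles 1..99.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "median": statistics.median(latencies_ms),
        "p90": cuts[89],
        "p95": cuts[94],
        "mean": statistics.fmean(latencies_ms),
    }


# Hypothetical sample: 98 fast responses plus two slow outliers.
sample = [10.0] * 98 + [2000.0, 5000.0]
stats = summarize(sample)
print(stats)
```

With this made-up sample the median and the 95th percentile both stay at 10 ms,
while the two outliers pull the mean up to roughly 80 ms, which describes
almost none of the actual requests.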

It is best to run a few thousand warming queries before starting the measured 
benchmark run. That will load some results into the caches and will also load 
the index files into OS file buffers.
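
A rough sketch of a constant-rate replay with an unmeasured warming phase could
look like the following. Everything here is an assumption for illustration:
`send` stands for whatever function issues one logged query against Solr and
blocks until the response arrives, and the `clock` and `sleep` parameters exist
only so the loop can be exercised without a real server.

```python
import time


def replay(queries, send, rate_per_sec, warmup=0,
           clock=time.monotonic, sleep=time.sleep):
    """Replay queries at a fixed request rate.

    Returns the response times (seconds) of the measured queries only.
    The first `warmup` queries are sent but not recorded.
    """
    interval = 1.0 / rate_per_sec
    latencies = []
    start = clock()
    for i, query in enumerate(queries):
        # Schedule each request against the start time so that timing
        # errors do not accumulate from one request to the next.
        delay = (start + i * interval) - clock()
        if delay > 0:
            sleep(delay)
        t0 = clock()
        send(query)  # hypothetical: issue one query and wait for the reply
        if i >= warmup:
            latencies.append(clock() - t0)
    return latencies
```

This loop is single-threaded, so it only holds the target rate while each
response returns faster than the interval between requests. A real load
generator would issue requests concurrently to keep the rate constant even when
responses are slow.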

Restart the processes between benchmark runs to clear the caches.

My benchmark runs are about an hour long, not counting warming and analysis.

I always configure Solr with enough RAM that all the indexes fit into OS file 
buffers. With that, disk speed only matters for startup and indexing.
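
One way to sanity-check that sizing is to compare the on-disk index size with
the RAM left over once the JVM heap and other processes are accounted for. The
sketch below is only a back-of-the-envelope check; the function names and the
heap and "other" figures are assumptions, and only the 24 GB total comes from
this thread.

```python
import os

GIB = 1024 ** 3


def index_size_bytes(index_dir):
    """Sum the sizes of all files under an index directory."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def fits_in_file_cache(index_bytes, total_ram_bytes, jvm_heap_bytes,
                       other_bytes=0):
    """True if the index fits in the RAM not used by the heap or other processes."""
    return index_bytes <= total_ram_bytes - jvm_heap_bytes - other_bytes


# Hypothetical figures: 24 GiB machine, 8 GiB heap, 2 GiB for everything else.
print(fits_in_file_cache(12 * GIB, 24 * GIB, 8 * GIB, 2 * GIB))
```

With these made-up numbers, 14 GiB remain for OS file buffers, so a 12 GiB
index fits and a 20 GiB index would not.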

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 12, 2018, at 10:16 AM, BlackIce  wrote:
> 
> OK, so we're gone nowhere,  since I've already lost lots of time...  A few
> days more or less won't make a difference  I'd be willing to benchmark
> if some tells me how to.
> 
> 
> Greetz
> 
> On Mar 12, 2018 7:17 AM, "Deepak Goel"  wrote:
> 
>> Now you are mixing your original question about performance with
>> reliability
>> 
>> On 12 Mar 2018 02:29, "BlackIce"  wrote:
>> 
>>> Second to this wouldn't 4 Solr instances each with its own HD be fault
>>> tolerant? vs. one solr instance with 4 HD's in RAID 0? Plus to his comes
>>> the storage capacity, I need the capacity of those 4 drives... the more I
>>> read.. the more questions
>>> 
>>> On Sun, Mar 11, 2018 at 9:43 PM, BlackIce  wrote:
>>> 
>>>> Thnx for the pointers.
>>>> 
>>>> I haven't given much thought to Solr, asides shemal.xml and
>>> solrconfig.xml
>>>> and I'm just diving into a bit more deeper stuff!
>>>> 
>>>> Greetz
>>>> 
>>>> RRK
>>>> 
>>>> On Sun, Mar 11, 2018 at 8:58 PM, Deepak Goel 
>> wrote:
>>>> 
>>>>> To rephrase your Question
>>>>> 
>>>>> "Does Solr do well with Scale-up or Scale-out?"
>>>>> 
>>>>> Are there any Performance Benchmarks for the same out there supporting
>>> the
>>>>> claim?
>>>>> 
>>>>> On 11 Mar 2018 23:05, "BlackIce"  wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I have some questions regarding performance.
>>>>>> 
>>>>>> Lets says I have a dual CPU with a total of 8 cores and 24 GB RAM
>> for
>>> my
>>>>>> Solr and some other stuff.
>>>>>> 
>>>>>> Would it be more beneficial to only run 1 instance of Solr with the
>>>>>> collection stored on 4 HD's in RAID 0?? Or Have several Virtual
>>>>>> Machines each running of its own HD, ie: Have 4 VM's running Solr?
>>>>>> 
>>>>>> Any Thoughts?
>>>>>> 
>>>>>> Thank you!
>>>>>> 
>>>>>> RRK
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 



Re: Some performance questions....

2018-03-12 Thread BlackIce
OK, so we've gone nowhere. Since I've already lost lots of time, a few
days more or less won't make a difference. I'd be willing to benchmark
if someone tells me how to.


Greetz

On Mar 12, 2018 7:17 AM, "Deepak Goel"  wrote:

> Now you are mixing your original question about performance with
> reliability
>
> On 12 Mar 2018 02:29, "BlackIce"  wrote:
>
> > Second to this wouldn't 4 Solr instances each with its own HD be fault
> > tolerant? vs. one solr instance with 4 HD's in RAID 0? Plus to his comes
> > the storage capacity, I need the capacity of those 4 drives... the more I
> > read.. the more questions
> >
> > On Sun, Mar 11, 2018 at 9:43 PM, BlackIce  wrote:
> >
> > > Thnx for the pointers.
> > >
> > > I haven't given much thought to Solr, asides shemal.xml and
> > solrconfig.xml
> > > and I'm just diving into a bit more deeper stuff!
> > >
> > > Greetz
> > >
> > > RRK
> > >
> > > On Sun, Mar 11, 2018 at 8:58 PM, Deepak Goel 
> wrote:
> > >
> > >> To rephrase your Question
> > >>
> > >> "Does Solr do well with Scale-up or Scale-out?"
> > >>
> > >> Are there any Performance Benchmarks for the same out there supporting
> > the
> > >> claim?
> > >>
> > >> On 11 Mar 2018 23:05, "BlackIce"  wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I have some questions regarding performance.
> > >> >
> > >> > Lets says I have a dual CPU with a total of 8 cores and 24 GB RAM
> for
> > my
> > >> > Solr and some other stuff.
> > >> >
> > >> > Would it be more beneficial to only run 1 instance of Solr with the
> > >> > collection stored on 4 HD's in RAID 0?? Or Have several Virtual
> > >> > Machines each running of its own HD, ie: Have 4 VM's running Solr?
> > >> >
> > >> > Any Thoughts?
> > >> >
> > >> > Thank you!
> > >> >
> > >> > RRK
> > >> >
> > >>
> > >
> > >
> >
>


Re: Some performance questions....

2018-03-12 Thread Shawn Heisey

On 3/12/2018 3:22 AM, Deepak Goel wrote:

> A single OS and JVM does not scale linearly for higher loads. If you have
> separate OS and Java, the load is distributed across multiple instances
> (with each instance only required to support a smaller load and hence
> would scale nicely)
>
> I had found this for running multiple apache servers on multiple VMs as
> compared to a single instance (not Solr). But i am pretty sure it would be
> same for Solr too


I think this is the last thing I'm going to say on the subject.  You 
disagree with a fundamental hardware concept that I've learned through 
experience, so I might never convince you of anything.  If that's the 
case, I'm done trying after this, and I wish you the best of luck with 
your efforts.


If the physical hosts you put the VMs on are far more powerful than you 
would ever use for bare metal, and/or you split virtual machines between 
different physical hosts, then that installation might scale better than 
a single bare metal host. The decision makers in most companies are 
usually a lot more willing to buy really expensive hardware if you tell 
them it's for virtualization than they are for a single-purpose machine.


But if the bare metal environment has the same number of physical 
servers with the same specifications, then a well-tuned bare metal setup 
is going to perform better than virtual machines.  There's nothing wrong 
with VMs.  They can perform very well if everything's sized appropriately.


Anytime a virtualized environment performs better than bare metal, it's 
usually going to be that way because the virtualized environment has 
different hardware than the bare metal environment.  That hardware will 
probably be much more expensive, and/or newer hardware that just works 
better.  It might also happen because the software installation was not 
set up correctly to fully utilize all the hardware.


Solr works best with a lot more memory installed than people usually 
install, *especially* with virtual machines, where RAM may be an even 
more precious commodity than it is in bare metal servers.


Thanks,
Shawn



Re: Some performance questions....

2018-03-12 Thread Deepak Goel
Now you are mixing your original question about performance with reliability

On 12 Mar 2018 02:29, "BlackIce"  wrote:

> Second to this wouldn't 4 Solr instances each with its own HD be fault
> tolerant? vs. one solr instance with 4 HD's in RAID 0? Plus to his comes
> the storage capacity, I need the capacity of those 4 drives... the more I
> read.. the more questions
>
> On Sun, Mar 11, 2018 at 9:43 PM, BlackIce  wrote:
>
> > Thnx for the pointers.
> >
> > I haven't given much thought to Solr, asides shemal.xml and
> solrconfig.xml
> > and I'm just diving into a bit more deeper stuff!
> >
> > Greetz
> >
> > RRK
> >
> > On Sun, Mar 11, 2018 at 8:58 PM, Deepak Goel  wrote:
> >
> >> To rephrase your Question
> >>
> >> "Does Solr do well with Scale-up or Scale-out?"
> >>
> >> Are there any Performance Benchmarks for the same out there supporting
> the
> >> claim?
> >>
> >> On 11 Mar 2018 23:05, "BlackIce"  wrote:
> >>
> >> > Hi,
> >> >
> >> > I have some questions regarding performance.
> >> >
> >> > Lets says I have a dual CPU with a total of 8 cores and 24 GB RAM for
> my
> >> > Solr and some other stuff.
> >> >
> >> > Would it be more beneficial to only run 1 instance of Solr with the
> >> > collection stored on 4 HD's in RAID 0?? Or Have several Virtual
> >> > Machines each running of its own HD, ie: Have 4 VM's running Solr?
> >> >
> >> > Any Thoughts?
> >> >
> >> > Thank you!
> >> >
> >> > RRK
> >> >
> >>
> >
> >
>


Re: Some performance questions....

2018-03-12 Thread Deepak Goel
We need benchmarks or data to support the claim.

A single OS and JVM do not scale linearly under higher loads. If you have
separate OS and Java instances, the load is distributed across them
(each instance is only required to support a smaller load and hence
would scale nicely).

I found this when running multiple Apache servers on multiple VMs as
compared to a single instance (not Solr), but I am pretty sure it would be
the same for Solr too.

On 12 Mar 2018 12:42, "Shawn Heisey"  wrote:

> On 3/11/2018 7:39 PM, Deepak Goel wrote:
>
>> I doubt this. It would be great if someone can subtantiate this with hard
>> facts
>>
>
> This seems to be in response to my claim that virtualization always has
> overhead.  I don't see how this statement can be at all controversial.
>
> Virtualization isn't free, even if the hardware and software in use are
> extremely efficient at it.  Translating what a virtual machine does into a
> corresponding action on the real hardware is going to take time and
> resources beyond whatever the action itself is.
>
> Plus there's the application-level overhead.  You have the overhead of
> multiple operating systems, multiple copies of Java running, multiple
> servlet containers (probably Jetty), and multiple copies of Solr.  And each
> of them is running inside a limited subset of the hardware installed in the
> physical server.
>
> Let's say you start with VMs on a server, and benchmark Solr's
> performance.  Then you completely erase the server, install one operating
> system, install Solr onto the OS, and then install all of the indexes that
> were running on the VMs into that one Solr instance.  Assuming that things
> are set up correctly and that you give that Solr instance the correct
> amount of heap memory, it's almost guaranteed to be faster than the VMs.  I
> can't tell you whether the improvement will be half a percent or 50
> percent, only that it will be faster.
>
> Thanks,
> Shawn
>
>


Re: Some performance questions....

2018-03-12 Thread Shawn Heisey

On 3/11/2018 7:39 PM, Deepak Goel wrote:

I doubt this. It would be great if someone can substantiate this with hard
facts


This seems to be in response to my claim that virtualization always has 
overhead.  I don't see how this statement can be at all controversial.


Virtualization isn't free, even if the hardware and software in use are 
extremely efficient at it.  Translating what a virtual machine does into 
a corresponding action on the real hardware is going to take time and 
resources beyond whatever the action itself is.


Plus there's the application-level overhead.  You have the overhead of 
multiple operating systems, multiple copies of Java running, multiple 
servlet containers (probably Jetty), and multiple copies of Solr.  And 
each of them is running inside a limited subset of the hardware 
installed in the physical server.


Let's say you start with VMs on a server, and benchmark Solr's 
performance.  Then you completely erase the server, install one 
operating system, install Solr onto the OS, and then install all of the 
indexes that were running on the VMs into that one Solr instance.  
Assuming that things are set up correctly and that you give that Solr 
instance the correct amount of heap memory, it's almost guaranteed to be 
faster than the VMs.  I can't tell you whether the improvement will be 
half a percent or 50 percent, only that it will be faster.
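
A rough way to run the comparison described above is to drive both setups
with the same query list and the same thread counts, then compare throughput
and latency. The following is only a sketch: `run_load` and `fake_query` are
hypothetical names, and `fake_query` is a stand-in for whatever HTTP call you
would make against your own Solr endpoint.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def run_load(query_fn, queries, threads):
    """Run each query once across `threads` workers.

    Returns (queries_per_second, median_latency_seconds).
    """
    def timed(query):
        # Time a single call to the query function.
        start = time.perf_counter()
        query_fn(query)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        latencies = list(pool.map(timed, queries))
    wall = time.perf_counter() - wall_start
    return len(latencies) / wall, median(latencies)


def fake_query(query):
    # Hypothetical stand-in: replace with a real request to your Solr instance.
    time.sleep(0.001)
```

Swap `fake_query` for a function that sends the query to the bare-metal
instance, then to the VMs, and feed both the same list of many distinct
queries so that caches do not dominate the result.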


Thanks,
Shawn



Re: Some performance questions....

2018-03-11 Thread Deepak Goel
On 12 Mar 2018 05:51, "Shawn Heisey"  wrote:

On 3/11/2018 11:35 AM, BlackIce wrote:

> I have some questions regarding performance.
>
> Let's say I have a dual-CPU machine with a total of 8 cores and 24 GB RAM
> for Solr and some other stuff.
>
> Would it be more beneficial to run only 1 instance of Solr with the
> collection stored on 4 HDs in RAID 0? Or to have several virtual
> machines, each running off its own HD, i.e. 4 VMs running Solr?
>

Performance is always going to be better on bare metal than on virtual
machines.  Virtualization in modern times is really good, so the difference
*might* be minimal, but there is ALWAYS overhead.

*Deepak*

I doubt this. It would be great if someone can substantiate this with hard
facts
*Deepak*


I used to create virtual machines on my hardware for Solr. Initially with
VMware ESXi, then later natively in Linux with KVM.  At that time, I was
running one index core per VM.  Just for some testing, I took a similar
machine and set up one Solr instance handling all the same cores on bare
metal.  I do not remember HOW much faster it was, but it was definitely
faster. One big thing I like about bare metal is that there's only one
"machine", IP address, and Solr instance to administer.

Unless you're willing to completely rebuild the whole thing in the event of
drive failure, don't use RAID0.  If one drive dies (and every hard drive IS
eventually going to die if it's used long enough), then *all* of the data
on the whole RAID volume is gone.

You could do RAID5, which has decent redundancy and good space efficiency,
but if you're not familiar with the RAID5 write penalty, do some research
on it, and you'll probably come out of it not wanting to EVER use it.  If
you like, I can explain exactly why you should avoid any RAID level that
incorporates 5 or 6.

Overall, the best level is RAID10 ... but it has a glaring disadvantage
from a cost perspective -- you lose half of your raw capacity.  Since
drives are relatively cheap, I always build my servers with RAID10, using a
1MB stripe size and a battery-backed caching controller.  For the typical
hardware I'm using, that means that I'm going to end up with 6 to 12TB of
usable space instead of 10 to 20TB (raid5), but the volume is FAST.

Thanks,
Shawn


Re: Some performance questions....

2018-03-11 Thread Shawn Heisey

On 3/11/2018 11:35 AM, BlackIce wrote:

I have some questions regarding performance.

Let's say I have a dual-CPU machine with a total of 8 cores and 24 GB RAM
for Solr and some other stuff.

Would it be more beneficial to run only 1 instance of Solr with the
collection stored on 4 HDs in RAID 0? Or to have several virtual
machines, each running off its own HD, i.e. 4 VMs running Solr?


Performance is always going to be better on bare metal than on virtual 
machines.  Virtualization in modern times is really good, so the 
difference *might* be minimal, but there is ALWAYS overhead.


I used to create virtual machines on my hardware for Solr. Initially
with VMware ESXi, then later natively in Linux with KVM.  At that time,
I was running one index core per VM.  Just for some testing, I took a 
similar machine and set up one Solr instance handling all the same cores 
on bare metal.  I do not remember HOW much faster it was, but it was 
definitely faster. One big thing I like about bare metal is that there's 
only one "machine", IP address, and Solr instance to administer.


Unless you're willing to completely rebuild the whole thing in the event 
of drive failure, don't use RAID0.  If one drive dies (and every hard 
drive IS eventually going to die if it's used long enough), then *all* 
of the data on the whole RAID volume is gone.
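
To put a rough number on that risk: if each drive fails independently with
probability p over some period, a RAID0 volume of n drives survives only if
every drive survives, so the chance of losing the volume is 1 - (1 - p)^n.
A minimal sketch, where the function name is hypothetical and the 5%
per-drive figure is purely an illustrative assumption, not a measured
failure rate:

```python
def raid0_loss_probability(drives: int, per_drive_failure_prob: float) -> float:
    """Probability that at least one of `drives` independent drives fails.

    In RAID0 any single drive failure destroys the whole volume, so this is
    also the probability of losing the volume.
    """
    if drives < 1:
        raise ValueError("need at least one drive")
    if not 0.0 <= per_drive_failure_prob <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return 1.0 - (1.0 - per_drive_failure_prob) ** drives


# Assumed 5% failure chance per drive over the period of interest.
single = raid0_loss_probability(1, 0.05)
four_way = raid0_loss_probability(4, 0.05)
```

With four drives at the assumed 5% each, this gives roughly 18.5%, nearly
four times the single-drive risk.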


You could do RAID5, which has decent redundancy and good space 
efficiency, but if you're not familiar with the RAID5 write penalty, do 
some research on it, and you'll probably come out of it not wanting to 
EVER use it.  If you like, I can explain exactly why you should avoid 
any RAID level that incorporates 5 or 6.
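
For reference, the usual rule of thumb behind the RAID5 write penalty is that
a small random write costs four disk I/Os (read old data, read old parity,
write new data, write new parity), versus two for RAID10 and six for RAID6.
Below is a simplified sketch of what that does to random-write throughput.
It ignores controller caches and full-stripe writes, the names are
hypothetical, and the 150 IOPS per drive is an assumed figure for
illustration only:

```python
# Disk I/Os needed per small random write (common rule-of-thumb values).
WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4, "raid6": 6}


def effective_write_iops(level: str, drives: int, iops_per_drive: float) -> float:
    """Approximate random-write IOPS of an array under the simple penalty model."""
    if level not in WRITE_PENALTY:
        raise ValueError(f"unknown RAID level: {level}")
    return drives * iops_per_drive / WRITE_PENALTY[level]


# Six drives at an assumed 150 IOPS each.
raid10_writes = effective_write_iops("raid10", 6, 150)
raid5_writes = effective_write_iops("raid5", 6, 150)
```

Under this model the same six drives deliver half as many random writes in
RAID5 as in RAID10.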


Overall, the best level is RAID10 ... but it has a glaring disadvantage 
from a cost perspective -- you lose half of your raw capacity.  Since 
drives are relatively cheap, I always build my servers with RAID10, 
using a 1MB stripe size and a battery-backed caching controller.  For 
the typical hardware I'm using, that means that I'm going to end up with 
6 to 12TB of usable space instead of 10 to 20TB (raid5), but the volume 
is FAST.
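
The capacity numbers above are consistent with, for example, six drives of
2TB or 4TB each. That drive count is an assumption inferred from the figures,
not something stated above. A small sketch of the arithmetic, with a
hypothetical helper name:

```python
def usable_capacity_tb(level: str, drives: int, drive_tb: float) -> float:
    """Usable capacity in TB for a few common RAID levels."""
    if level == "raid0":
        return drives * drive_tb            # no redundancy, all space usable
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID10 needs an even number of drives, at least 4")
        return drives * drive_tb / 2        # every block is mirrored
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID5 needs at least 3 drives")
        return (drives - 1) * drive_tb      # one drive's worth of parity
    if level == "raid6":
        if drives < 4:
            raise ValueError("RAID6 needs at least 4 drives")
        return (drives - 2) * drive_tb      # two drives' worth of parity
    raise ValueError(f"unknown RAID level: {level}")


# Assumed configuration: six drives of 2TB or 4TB each.
for size_tb in (2, 4):
    print(size_tb, usable_capacity_tb("raid10", 6, size_tb),
          usable_capacity_tb("raid5", 6, size_tb))
```

For six 2TB drives this works out to 6TB (RAID10) against 10TB (RAID5), and
for six 4TB drives to 12TB against 20TB.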


Thanks,
Shawn



Re: Some performance questions....

2018-03-11 Thread BlackIce
Following up on this, wouldn't 4 Solr instances, each with its own HD, be
more fault tolerant than one Solr instance with 4 HDs in RAID 0? Added to
this is the storage capacity: I need the capacity of all 4 drives. The more
I read, the more questions I have.

On Sun, Mar 11, 2018 at 9:43 PM, BlackIce  wrote:

> Thnx for the pointers.
>
> I haven't given much thought to Solr, aside from schema.xml and
> solrconfig.xml, and I'm just starting to dive into the deeper stuff!
>
> Greetz
>
> RRK
>
> On Sun, Mar 11, 2018 at 8:58 PM, Deepak Goel  wrote:
>
>> To rephrase your Question
>>
>> "Does Solr do well with Scale-up or Scale-out?"
>>
>> Are there any Performance Benchmarks for the same out there supporting the
>> claim?
>>
>> On 11 Mar 2018 23:05, "BlackIce"  wrote:
>>
>> > Hi,
>> >
>> > I have some questions regarding performance.
>> >
>> > Let's say I have a dual-CPU machine with a total of 8 cores and 24 GB RAM
>> > for Solr and some other stuff.
>> >
>> > Would it be more beneficial to run only 1 instance of Solr with the
>> > collection stored on 4 HDs in RAID 0? Or to have several virtual
>> > machines, each running off its own HD, i.e. 4 VMs running Solr?
>> >
>> > Any Thoughts?
>> >
>> > Thank you!
>> >
>> > RRK
>> >
>>
>
>


Re: Some performance questions....

2018-03-11 Thread BlackIce
Thnx for the pointers.

I haven't given much thought to Solr, aside from schema.xml and
solrconfig.xml, and I'm just starting to dive into the deeper stuff!

Greetz

RRK

On Sun, Mar 11, 2018 at 8:58 PM, Deepak Goel  wrote:

> To rephrase your Question
>
> "Does Solr do well with Scale-up or Scale-out?"
>
> Are there any Performance Benchmarks for the same out there supporting the
> claim?
>
> On 11 Mar 2018 23:05, "BlackIce"  wrote:
>
> > Hi,
> >
> > I have some questions regarding performance.
> >
> > Let's say I have a dual-CPU machine with a total of 8 cores and 24 GB RAM
> > for Solr and some other stuff.
> >
> > Would it be more beneficial to run only 1 instance of Solr with the
> > collection stored on 4 HDs in RAID 0? Or to have several virtual
> > machines, each running off its own HD, i.e. 4 VMs running Solr?
> >
> > Any Thoughts?
> >
> > Thank you!
> >
> > RRK
> >
>


Re: Some performance questions....

2018-03-11 Thread Deepak Goel
To rephrase your Question

"Does Solr do well with Scale-up or Scale-out?"

Are there any Performance Benchmarks for the same out there supporting the
claim?

On 11 Mar 2018 23:05, "BlackIce"  wrote:

> Hi,
>
> I have some questions regarding performance.
>
> Let's say I have a dual-CPU machine with a total of 8 cores and 24 GB RAM
> for Solr and some other stuff.
>
> Would it be more beneficial to run only 1 instance of Solr with the
> collection stored on 4 HDs in RAID 0? Or to have several virtual
> machines, each running off its own HD, i.e. 4 VMs running Solr?
>
> Any Thoughts?
>
> Thank you!
>
> RRK
>