Re: Performance if there is a large number of fields

2018-05-10 Thread Deepak Goel
Deepak
"The greatness of a nation can be judged by the way its animals are
treated. Please stop cruelty to Animals, become a Vegan"

+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

Make In India : http://www.makeinindia.com/home

On Thu, May 10, 2018 at 10:50 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 5/10/2018 10:58 AM, Deepak Goel wrote:
>
>> I wonder what Solr stores in the document for fields which are not
>> being used, and whether the queries have a performance difference.
>> https://lucene.apache.org/solr/guide/6_6/defining-fields.html
>> (A default value that will be added automatically to any document that
>> does
>> not have a value in this field when it is indexed. If this property is not
>> specified, there is no default)
>>
>
> If a field is missing from a document, the Lucene index doesn't contain
> anything for that field.  That is why there is no storage disadvantage to
> having fields that are not being used.
>
> Lucene does not have the concept of a schema.  That is part of Solr.  Solr
> uses the information in the schema to control its interaction with Lucene.
> When there is a default value specified in the schema, the field is never
> missing from the document.
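For illustration (the field name and default value are hypothetical), this is how a default is declared on a field definition in schema.xml; any document indexed without that field is then stored as if it contained the default:

```xml
<!-- Hypothetical schema.xml fragment: documents indexed without a
     "category" value get category="unknown" added at index time. -->
<field name="category" type="string" indexed="true" stored="true"
       default="unknown"/>
```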
>
Sorry, but I am unclear about "What if there is no default value and the
field does not contain anything?" What does Solr pass on to Lucene? Or is
the field itself omitted from the document?

What if I want to query for documents where the field is not used? Is that
possible?

Thanks,
> Shawn
>
>


Re: Performance if there is a large number of fields

2018-05-10 Thread Deepak Goel
I wonder what Solr stores in the document for fields which are not
being used, and whether the queries have a performance difference.
https://lucene.apache.org/solr/guide/6_6/defining-fields.html
(A default value that will be added automatically to any document that does
not have a value in this field when it is indexed. If this property is not
specified, there is no default)






On Thu, May 10, 2018 at 9:10 PM, Shawn Heisey  wrote:

> On 5/10/2018 7:51 AM, Issei Nishigata wrote:
>
>> I am designing a schema.
>>
>> As a trial, I calculated the number of necessary fields, and found that I
>> need more than 35,000.
>> I do not use all these fields in one document.
>> I use at most 300 fields per document, and do not use the remaining 34,700
>> fields.
>>
>> Does this usage pattern affect performance, such as retrieving and
>> sorting?
>> If it does, what alternatives do we have?
>>
>
> There are no storage efficiency degradations from having fields defined
> that aren't used in particular documents.
>
> It is likely that having so many fields is going to result in extremely
> large and complex queries.  That is the potential performance problem.
>
> The efficiency of each clause of the query will not be affected by having
> several thousand fields unused in each document, but if your queries
> include clauses for searching thousands of fields, then the query will run
> slowly.  If you are constructing relatively simple queries that only touch
> a small number of fields, then that won't be a worry.
>
> Thanks,
> Shawn
>
>


Re: Performance if there is a large number of fields

2018-05-11 Thread Deepak Goel

On Fri, May 11, 2018 at 8:15 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 5/10/2018 2:22 PM, Deepak Goel wrote:
>
>> Are there any benchmarks for this approach? If not, I can give it a spin.
>> Also wondering if there is any alternative approach (I guess Lucene
>> stores data in an inverted index format).
>>
>
> Here is the only other query I know of that can find documents missing a
> field:
>
> q=*:* -field:*
>
> The potential problem with this query is that it uses a wildcard.  On
> non-point fields with very low cardinality, the performance might be
> similar.  But if the field is a Point type, or has a large number of unique
> values, then performance would be a lot worse than the range query I
> mentioned before.  The range query is the best general purpose option.
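To make the two query forms concrete, here is a small sketch (the field name "price" is hypothetical) that URL-encodes each of them as parameters for Solr's select handler:

```python
from urllib.parse import urlencode

# Range form (preferred): all docs, minus those with any value in the field.
range_params = urlencode({"q": "*:* -price:[* TO *]"})

# Wildcard form: can be much slower on Point fields or fields with many
# unique values, since the wildcard has to enumerate terms.
wildcard_params = urlencode({"q": "*:* -price:*"})

print(range_params)     # append to /solr/<core>/select?
print(wildcard_params)
```

Either string is appended to the select handler URL; the range form is the one recommended above.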
>
>
I wonder if giving a default value would help. Since Lucene stores all the
document IDs that contain the default value (unchanged by the user) in a
single postings list (inverted index format), these could be retrieved much
faster.


> The *:* query, despite appearances, does not use wildcards.  It is special
> query syntax.
>
> Thanks,
> Shawn
>
>


Re: Performance if there is a large number of fields

2018-05-10 Thread Deepak Goel
On Fri, 11 May 2018, 01:15 Shawn Heisey, <apa...@elyograg.org> wrote:

> On 5/10/2018 11:49 AM, Deepak Goel wrote:
> > Sorry but I am unclear about - "What if there is no default value and the
> > field does not contain anything"? What does Solr pass on to Lucene? Or is
> > the field itself omitted from the document?
>
> If there is no default value and the field doesn't exist in what's
> indexed, then nothing is sent to Lucene for that field. The Lucene index
> will have nothing in it for that field.  Pro tip: The empty string is
> not the same thing as no value.
>
> > What if I want to query for documents where the field is not used? Is
> that
> > possible?
>
> This is the best performing approach for finding documents where a field
> doesn't exist:
>
> q=*:* -field:[* TO *]
>

Are there any benchmarks for this approach? If not, I can give it a spin.
Also wondering if there is any alternative approach (I guess Lucene stores
data in an inverted index format).

>
> Summary: all documents, minus those where the field value is in an
> all-inclusive range.
>
> Thanks,
> Shawn
>
>


Re: Re[2]: Solr CPU usage

2018-05-16 Thread Deepak Goel
1. Are you using two VMs on the same machine?

2. Why are the CPU usage graphs different (during the same time interval,
15.40-16.00)? The master and slave are on the same computer, right?

3. The CPU utilization in the method graph is very low compared to the
time interval shown in the CPU usage graph. Are there any other processes
running on the computer?


I have not used VisualVM, so it's a bit confusing for me (I will have to
download and try it out).




On Wed, May 16, 2018 at 8:28 PM, Александр Шестак <
shestakalexa...@mail.ru.invalid> wrote:

>
> Master/slave are working on a single computer with an Intel Core i5 3.2GHz
> (4 cores).
>
> It is the standard VisualVM UI with CPU usage. I think it shows overall
> CPU usage (for all cores).
> >Wednesday, 16 May 2018, 17:42 +03:00 from Deepak Goel <deic...@gmail.com>:
> >
> >How many CPUs do you have in master/slave?
> >
> >Are the graphs you showed for a single CPU or for all CPUs?
> >
> >
> >
> >
> >On Wed, May 16, 2018 at 6:41 PM, Александр Шестак  <
> shestakalexa...@mail.ru.invalid > wrote:
> >>Hi, I have a question about unpredictable CPU usage by Solr.
> >>We have recently migrated our application from Solr 4.6.1 to Solr 7.1.0.
> We use a master/slave approach, and we have noticed that the CPU usage of
> master/slave in a passive state (no requests to Solr are performed) is
> non-zero. With Solr 4.6.1 deployed on Tomcat, CPU usage was almost 0. With
> Solr 7.1.0, CPU usage varies from 0% to 40% (it jumps constantly).
> >>
> >>Is it normal behavior for Solr and jetty?
> >>I have tried to analyze this situation by Java VisualVM.
> >>Solr Master CPU usage looks as follows
> >>
> >>The master spends most of its time in some Jetty method
> >>
> >>Solr Slave CPU usage looks as follows
> >>
> >>
> >>
> >>
> >>
> >>All these screenshots were made when there was no activity on Solr
> (Solr had just started and no requests to it had been performed)
> >
>
>
>
>


Re: Solr CPU usage

2018-05-16 Thread Deepak Goel
How many CPUs do you have in master/slave?

Are the graphs you showed for a single CPU or for all CPUs?




On Wed, May 16, 2018 at 6:41 PM, Александр Шестак <
shestakalexa...@mail.ru.invalid> wrote:

> Hi, I have a question about unpredictable CPU usage by Solr.
>
> We have recently migrated our application from Solr 4.6.1 to Solr 7.1.0.
> We use a master/slave approach, and we have noticed that the CPU usage of
> master/slave in a passive state (no requests to Solr are performed) is
> non-zero. With Solr 4.6.1 deployed on Tomcat, CPU usage was almost 0. With
> Solr 7.1.0, CPU usage varies from 0% to 40% (it jumps constantly).
>
>
> Is it normal behavior for Solr and jetty?
>
> I have tried to analyze this situation by Java VisualVM.
>
> Solr Master CPU usage looks as follows
>
>
> The master spends most of its time in some Jetty method
>
>
> Solr Slave CPU usage looks as follows
>
>
>
>
>
>
> All these screenshots were made when there was no activity on Solr
> (Solr had just started and no requests to it had been performed)
>
>


Re: Thoughts on scaling strategy for Solr deployed on AWS EC2 instances - Scale up / out and which instance type?

2018-05-21 Thread Deepak Goel
On Mon, May 21, 2018 at 7:55 PM, Kelly, Frank  wrote:

> Using Solr 5.3.1 - index
>
> We have an indexing heavy workload (we do more indexing than searching)
> and for those searches we do perform we have very few cache hits (25% of
> our index is in memory and the hit rate is < 0.1%)
>
> We are currently using r3.xlarge (memory optimized instances as we
> originally thought we’d have a higher cache hit rate) with EBS optimization
> to IOPs configurable EBS drives.
> Our EBS traffic bandwidth seems to work great so searches on disk are
> pretty fast.
> Now, though, we seem CPU-bound, and if/when the Solr CPU gets pegged for
> too long, replication falls behind and then starts to recover, which causes
> more usage, and eventually shards go “Down”.
>
CPU bound - what does your hardware configuration look like?

"Down" - what exactly happens? Can you please give a bit more detail about
this?


> Our key question: Scale up (fewer instances to manage) or Scale out (more
> instances to manage) and
> do we switch to compute optimized instances (the answer given our usage I
> assume is probably)
>
>
Is the load scaling linearly (25%, 50%, 75%, 100% CPU) on your current
machine? If it is, then scale-up would be a good choice. However, if it is
not, I would go for scale-out.


> Appreciate any thoughts folks have on this?
>
> Thanks!
>
> -Frank
>





Re: Navigating through Solr Source Code

2018-05-21 Thread Deepak Goel
If you can find out how Solr evolved over the years, you can perhaps follow
that same path

On Mon, 21 May 2018, 18:35 Erick Erickson,  wrote:

> Another useful trick is the class hierarchy displays most modern IDE's
> have available to get a sense of what class is where. And I second
> Emir's comment about picking some feature. _Nobody_ knows all the Solr
> code, and that's not even including Lucene. It's big, very big. So
> pick a feature you want to understand and/or improve and stick to that
> or you'll go nuts.
>
> And a great way to get a sense of how a feature works is to find the
> unit test that exercises it and just step through it in the debugger.
> And if there's no unit test, another great way to do things would be
> to _create_ a unit test. Or fix some of the BadApple tests, but those
> will be pretty hairy
>
> Best,
> Erick
>
> On Mon, May 21, 2018 at 7:18 AM, Emir Arnautović
>  wrote:
> > Hi,
> > I would start from the feature/concept that I find documentation to be
> vague. If you think that everything is like that, I would not start with
> code just yet and would focus on understanding high level concepts first.
> Also, you need to figure out if some feature is Solr or Lucene and if it is
> Solr if cloud mode is involved or not. I would suggest that you start
> simple to get familiar with Solr concepts. Set up a local dev env, put some
> breakpoints and start following it.
> >
> > Good luck,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> >> On 21 May 2018, at 12:35, Greenhorn Techie 
> wrote:
> >>
> >> Hi,
> >>
> >> As the documentation around Solr is limited, I am thinking of going
> >> through the source code to understand the various bits and pieces.
> >> However, I am a bit confused about where to start, as my developing
> >> skills are a bit limited.
> >>
> >> Any thoughts on how best to start / where to start looking into Solr
> source
> >> code?
> >>
> >> Thanks
> >
>


Re: Getting more documents from resultsSet

2018-05-18 Thread Deepak Goel
I wonder if an in-memory filesystem would help...

On Sat, 19 May 2018, 01:03 Erick Erickson,  wrote:

> If you only return fields that are docValues=true, that'll largely
> eliminate the disk seeks. 30 seconds does seem kind of excessive even
> with disk seeks, though.
>
> Here's a reference:
> https://lucene.apache.org/solr/guide/6_6/docvalues.html
>
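As a sketch (the field name is hypothetical), this is roughly how such a field would be declared in schema.xml; useDocValuesAsStored lets the value be returned in results from the columnar docValues structures rather than from stored fields:

```xml
<!-- Hypothetical schema.xml fragment: the value is read back from
     docValues, avoiding a stored-field disk seek per document. -->
<field name="title" type="string" indexed="true" stored="false"
       docValues="true" useDocValuesAsStored="true"/>
```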
> Whenever I see anything like "...our business requirement is...", I
> cringe. _Why_ is that a requirement? What is being done _for the user_
> that requires 2000 documents? There may be legitimate reasons, but
> there also may be better ways to get what you need. This may very well
> be an XY problem.
>
> For instance, if you want to take the top 2,000 docs from query X and
> score just those, see:
> https://lucene.apache.org/solr/guide/6_6/query-re-ranking.html,
> specifically: ReRankQParserPlugin.
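A minimal sketch of such a re-rank request (the query strings and weights are hypothetical; the rq syntax follows the ReRankQParserPlugin documentation):

```python
from urllib.parse import urlencode

# Hypothetical example: run a cheap main query for recall, then re-score
# only its top 2000 hits with a more expensive query, weighted by 3.
params = urlencode({
    "q": "category:books",                                  # cheap main query
    "rq": "{!rerank reRankQuery=$rqq reRankDocs=2000 reRankWeight=3}",
    "rqq": "title:solr^5",                                  # expensive rerank query
    "rows": 10,
})
print(params)  # append to /solr/<core>/select?
```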
>
> Best,
> Erick
>
> On Fri, May 18, 2018 at 11:09 AM, root23  wrote:
> > Hi all,
> > I am working on Solr 6. Our business requirement is that we need to
> return
> > 2000 docs for every query we execute.
> > Normally, if I execute the same query with start=0 and rows=10, it
> > returns very fast (even for our most complex queries, in less than 3
> > seconds).
> > However, the moment I use start=0 and rows=2000, the response time is
> > around 30 seconds.
> >
> > I understand that Solr probably has to do disk seeks to get the documents,
> > which might be the bottleneck in this case.
> >
> > Is there a way I can optimize around this, knowing that I might have to
> > get 2000 results in one go, and then might have to paginate further,
> > showing 2000 results on each page? We could go as deep as 50 pages.
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
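One technique not raised in the thread, but relevant to paging 50 pages deep: Solr's cursorMark deep paging avoids the cost of large start offsets when walking page after page. A sketch (uniqueKey field assumed to be "id"):

```python
from urllib.parse import urlencode

def page_params(cursor="*", rows=2000):
    """Build select-handler params for one cursorMark page.

    Deep paging requires a sort with a tiebreak on the uniqueKey field
    (assumed here to be "id"); the nextCursorMark from each response is
    passed back as cursorMark for the following page.
    """
    return urlencode({
        "q": "*:*",
        "rows": rows,
        "sort": "score desc, id asc",
        "cursorMark": cursor,
    })

first_page = page_params()  # cursorMark=* starts the walk
# next_page = page_params(cursor=next_cursor_from_response)
print(first_page)
```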


Re: Hardware-Aware Solr Coud Sharding?

2018-06-12 Thread Deepak Goel
What does your base hardware configuration look like?

You could have several VMs on machines with a higher configuration.




On Tue, Jun 12, 2018 at 8:42 PM, Michael Braun  wrote:

> We have a case of a Solr Cloud cluster with different kinds of nodes - some
> may have significant differences in hardware specs (50-100% more
> HD/RAM/CPU, etc). Ideally nodes with increased resources could take on more
> shard replicas.
>
> It looks like the Collections API (
> https://lucene.apache.org/solr/guide/6_6/collections-api.html) supports
> only even splitting of shards when using compositeId routing.
>
> The way to handle this right now looks to be running additional Solr
> instances on nodes with increased resources to balance the load (so if the
> machines are 1x, 1.5x, and 2x, run 2 instances, 3 instances, and 4
> instances, respectively). Has anyone looked into other ways of handling
> this that don't require the additional Solr instance deployments?
>
> -Michael
>


Re: 7.3.1 creates thousands of threads after start up

2018-06-08 Thread Deepak Goel
Do these machines have a firewall in-between?

On Fri, 8 Jun 2018, 20:29 Markus Jelsma,  wrote:

> Hello Shawn,
>
> The logs appear useless, they are littered with these:
>
> 2018-06-08 14:02:47.382 ERROR (qtp1458849419-1263) [   ]
> o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Error
> trying to proxy request for url: http://idx2:8983/solr/
> search/admin/ping 
> at
> org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:647)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:501)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
> ..
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
> at
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.eclipse.jetty.io.EofException
> at
> org.eclipse.jetty.server.HttpConnection$SendCallback.reset(HttpConnection.java:704)
> ..
> at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:509)
>
> Regarding the versions, it is a bit hard to recall, but I do not think I
> have seen this on 7.2, most certainly not on 7.1.
>
> We operate three distinct types of Solr collections; they only share the
> same ZooKeeper quorum. The other two collections do not seem to have this
> problem, but I don't restart those as often as I restart this collection,
> as I am STILL trying to REPRODUCE the dreaded memory leak I reported having
> on 7.3 about two weeks ago. Sorry, but it drives me nuts!
>
> Thanks,
> Markus
>
> -Original message-
> > From:Shawn Heisey 
> > Sent: Friday 8th June 2018 16:47
> > To: solr-user@lucene.apache.org
> > Subject: Re: 7.3.1 creates thousands of threads after start up
> >
> > On 6/8/2018 8:17 AM, Markus Jelsma wrote:
> > > Our local test environment mini cluster goes nuts right after start
> up. It is a two node/shard/replica collection that starts up normally if
> only one node starts up.  But as soon as the second node attempts to join
> the cluster, both nodes go crazy, creating thousands of threads with
> identical stack traces.
> > >
> > > "qtp1458849419-4738" - Thread t@4738
> > >java.lang.Thread.State: TIMED_WAITING
> > > at sun.misc.Unsafe.park(Native Method)
> > > - parking to wait for <6ee32168> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > > at
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> > > at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> > > at
> org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)
> > > at
> org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:600)
> > > at
> org.eclipse.jetty.util.thread.QueuedThreadPool.access$800(QueuedThreadPool.java:49)
> > > at
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:663)
> > > at java.lang.Thread.run(Thread.java:748)
> > >
> > >Locked ownable synchronizers:
> > > - None
> > >
> > > It does not always happen, but most of the time I am unable to boot
> the cluster normally. Sometimes, apparently right now for the first time,
> the GUI is still accessible.
> > >
> > > Is this a known issue?
> >
> > It's not a problem that I've heard of.  There are no Solr classes in the
> > stacktrace, only Jetty and Java classes.  I won't try to tell you that a
> > bug in Solr can't be the root cause, because it definitely can.  The
> > threads appear to be created by Jetty, but the supplied info doesn't
> > indicate WHY it's happening.
> >
> > Presumably there's a previous version you've used where this problem did
> > NOT happen.  What version would that be?
> >
> > Can you share the solr.log file from both nodes when this happens?
> > There might be a clue there.
> >
> > It sounds like you probably have a small number of collections in the
> > dev cluster.  Can you confirm that?
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: Slower queries with 7.3.1?

2018-05-26 Thread Deepak Goel
Is it possible to profile the code to find the exact points which are
taking more time comparatively?

On Sun, 27 May 2018, 06:02 Will Currie,  wrote:

> I raised https://issues.apache.org/jira/browse/SOLR-12407. In case anybody
> else sees a similar slowdown with boosts.
>
> On Sat, May 26, 2018 at 4:10 PM, Will Currie  wrote:
>
> > I did some more (micro)benchmarking with a single query. Setting the
> query
> > cache size to zero I see 400ms response time on 7.2 and 600ms on 7.3.
> > Running curl in a loop on my laptop. ~4M docs. ~3G index. 1M total hits
> > for the query.. Yup. I'm reluctant to post the query. It has multiple
> 300+
> > character streams of if,product,map calls in multiple boost parameters.
> >
> > I realise my query is likely ridiculous (inefficient, better done another
> > way, etc) but LUCENE-8099 mentions:
> > "Re performance: there shouldn't be any reason for things to be slower
> ...
> > It might be useful to add some examples of these queries to the benchmark
> > tests though."
> >
> > Maybe I have such a benchmark. Grasping at straws, I noticed 7.2
> > sticks with floats, while 7.3 does a few frames of math with doubles
> > before returning to floats.
> >
> > jstack from 7.2:
> >
> > "qtp2136344592-24" #24 prio=5 os_prio=31 tid=0x7f80630e5000
> nid=0x7103
> > runnable [0x749bb000]
> >java.lang.Thread.State: RUNNABLE
> > at org.apache.lucene.queries.function.valuesource.
> > ProductFloatFunction.func(ProductFloatFunction.java:41)
> > at org.apache.lucene.queries.function.valuesource.
> > MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
> > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(
> > IfFunction.java:64)
> > at org.apache.lucene.queries.function.valuesource.
> > ProductFloatFunction.func(ProductFloatFunction.java:41)
> > at org.apache.lucene.queries.function.valuesource.
> > MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
> > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(
> > IfFunction.java:64)
> > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(
> > IfFunction.java:64)
> > at org.apache.lucene.queries.function.valuesource.
> > ProductFloatFunction.func(ProductFloatFunction.java:41)
> > at org.apache.lucene.queries.function.valuesource.
> > MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
> > * at
> >
> org.apache.lucene.queries.function.BoostedQuery$CustomScorer.score(BoostedQuery.java:124)*
> > at org.apache.lucene.search.TopScoreDocCollector$
> > SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:64)
> > at org.apache.lucene.search.Weight$DefaultBulkScorer.
> > scoreAll(Weight.java:233)
> > at org.apache.lucene.search.Weight$DefaultBulkScorer.
> > score(Weight.java:184)
> > at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
> > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:660)
> > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:462)
> > at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(
> > SolrIndexSearcher.java:215)
> >
> > jstack from 7.3.1:
> >
> > "qtp559670971-25" #25 prio=5 os_prio=31 tid=0x7fe23fa0c000 nid=0x7303
> > runnable [0x7b024000]
> >java.lang.Thread.State: RUNNABLE
> > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(
> > IfFunction.java:64)
> > at org.apache.lucene.queries.function.valuesource.
> > ProductFloatFunction.func(ProductFloatFunction.java:41)
> > at org.apache.lucene.queries.function.valuesource.
> > MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
> > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(
> > IfFunction.java:64)
> > at org.apache.lucene.queries.function.valuesource.
> > ProductFloatFunction.func(ProductFloatFunction.java:41)
> > at org.apache.lucene.queries.function.valuesource.
> > MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
> > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(
> > IfFunction.java:64)
> > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(
> > IfFunction.java:64)
> > at org.apache.lucene.queries.function.valuesource.
> > ProductFloatFunction.func(ProductFloatFunction.java:41)
> > at org.apache.lucene.queries.function.valuesource.
> > MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
> > * at
> >
> org.apache.lucene.queries.function.docvalues.FloatDocValues.doubleVal(FloatDocValues.java:67)*
> > * at
> >
> org.apache.lucene.queries.function.ValueSource$WrappedDoubleValuesSource$1.doubleValue(ValueSource.java:217)*
> > * at
> >
> org.apache.lucene.search.DoubleValues$1.doubleValue(DoubleValues.java:48)*
> > * at
> >
> org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource$1.doubleValue(FunctionScoreQuery.java:199)*
> > * at
> >
> 

Re: Windows monitoring software for Solr recommendation

2018-06-06 Thread Deepak Goel
It would be a bit involved, but it would be interesting to know whether a
similar error/situation occurs on Linux too (there are kernel-level
debugging tools available for this open-source OS).

On Wed, 6 Jun 2018, 10:59 Shawn Heisey,  wrote:

> On 6/5/2018 10:26 PM, TK Solr wrote:
> > I visualized the GC log with GCMV (GCVM?) and the graph shows Solr was
> > using less than half of the heap space at the peak.
> > This Solr doesn't get much query traffic and no indexing was running.
> > It's really a sudden death of JVM with no trace.
> >
>
> If you aren't concerned about what you see in a GC analysis, then the
> heap may not be an issue.  FYI, this is where I would have sent the log
> once I got it:
>
> http://gceasy.io/
>
> This website does a VERY good job of detecting possible problems with
> the heap and GC.
>
> > The only concern I have is that the Solr config files are from Solr
> > 5.x, and they just upgraded to Solr 6.6. But I understand Solr 6
> > supports a Solr 5 compatibility mode. Have there been any issues with
> > the compatibility mode?
>
> If the config was actually *designed* for 5.x, then it should have
> little problem working in 6.x.  If it was designed for an earlier
> version and just happened to work in 5.x, then I would be less
> optimistic about it working in 6.x.  That said ... it is very unlikely
> that anything in the index config files would cause crashes, even if
> there is a compatibility problem.
>
> The simple truth is that most Java software, including Solr, just
> doesn't ever crash unless there's something VERY wrong.
>
> Actual crashes do happen in the wild, they're just very rare.  Extremely
> severe memory starvation at the OS level can cause problems where
> processes die without any logging, or the OS kills them explicitly.  If
> the java heap is properly sized for the system, that shouldn't be
> possible.  Since you're running Solr 6, you're running Java 8 minimum.
> PermGen is gone in Java 8.  Similar issues to what used to happen with
> PermGen can still happen with the new piece called Metaspace, but if the
> overall system config is good, that shouldn't be a problem.
>
> Thanks,
> Shawn
>
>


Re: SolrCloud Heterogenous Hardware setup

2018-05-01 Thread Deepak Goel
I had a similar problem some time back. Although it might not be the best
way, I used cron to move data from a high-end-spec machine to a
lower-end-spec one. It worked beautifully.




On Tue, May 1, 2018 at 10:02 PM, Greenhorn Techie  wrote:

> Thanks Erick. This information is very helpful. Will explore further on the
> node placement rules within Collections API.
>
> Many Thanks
>
>
> On 1 May 2018 at 16:26:34, Erick Erickson (erickerick...@gmail.com) wrote:
>
> "Is it possible to configure a collection such that the collection
> data is only stored on few nodes in the SolrCloud setup?"
>
> Yes. There are "node placement rules", but also you can create a
> collection with a createNodeSet that specifies the nodes that the
> replicas are placed on.
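For illustration (collection, config, and node names are all hypothetical), a Collections API CREATE call pinning a collection to specific nodes might be built like this:

```python
from urllib.parse import urlencode

# Hypothetical Collections API CREATE call: the new monthly collection is
# placed only on the two "fast" nodes via createNodeSet.
params = urlencode({
    "action": "CREATE",
    "name": "logs_2018_05",
    "numShards": 2,
    "replicationFactor": 2,
    "collection.configName": "logs_conf",
    "createNodeSet": "fast-node1:8983_solr,fast-node2:8983_solr",
})
print(params)  # append to /solr/admin/collections?
```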
>
> " If this is possible, at the end of each month, what is the approach
> to be taken to “move” the latest collection from higher-spec hardware
> machines to the lower-spec ones?"
>
> There are a bunch of ways, in order of how long they've been around
> (check your version). All of these are COLLECTIONS API calls.
> - ADDREPLICA/DELETEREPLICA
> - MOVEREPLICA
> - REPLACENODE
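As an example of the middle option (collection, replica, and node names are hypothetical), moving one replica from a high-spec node to a lower-spec node with MOVEREPLICA could look like:

```python
from urllib.parse import urlencode

# Hypothetical MOVEREPLICA call: shift a replica of last month's collection
# off the fast hardware onto a cheaper node.
params = urlencode({
    "action": "MOVEREPLICA",
    "collection": "logs_2018_04",
    "replica": "core_node3",
    "targetNode": "cheap-node1:8983_solr",
})
print(params)  # append to /solr/admin/collections?
```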
>
> The other thing you may wan to look at is that David Smiley has been
> working on timeseries support in Solr, but that's quite recent so may
> not be available in whatever version you're using. Nor do I know
> enough details a about it to know how (or if) it it supported the
> heterogeneous setup you're talking about. Check CHANGES.txt.
>
> Best,
> Erick
>
> On Tue, May 1, 2018 at 7:59 AM, Greenhorn Techie
>  wrote:
> > Hi,
> >
> > We are building a SolrCloud setup, which will index time-series data.
> Being
> > time-series data with write-once semantics, we are planning to have
> > multiple collections i.e. one collection per month. As per our use case,
> > end users should be able to query across last 12 months worth of data,
> > which means 12 collections (with one collection per month). To achieve
> > this, we are planning to leverage Solr collection aliasing such that the
> > search_alias collection will point to the 12 collections and indexing
> will
> > always happen to the latest collection.
> >
> > As it's write-once data, the question I have is whether it is
> > possible to have two different hardware profiles within the SolrCloud
> > cluster such that all the older collections (being read-only) will be
> > stored on the lower hardware spec, while the latest collection (being
> write
> > heavy) will be stored only on the higher hardware profile machines.
> >
> > - Is it possible to configure a collection such that the collection data
> > is only stored on a few nodes in the SolrCloud setup?
> > - If this is possible, at the end of each month, what is the approach to
> > be taken to “move” the latest collection from the higher-spec hardware
> > machines to the lower-spec ones?
> >
> > TIA.
>


Re: Shard size variation

2018-04-30 Thread Deepak Goel
Could you please also give the machine details of the two clouds you are
running?



Deepak
"The greatness of a nation can be judged by the way its animals are
treated. Please stop cruelty to Animals, become a Vegan"

+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

Make In India : http://www.makeinindia.com/home

On Mon, Apr 30, 2018 at 9:51 PM, Antony A  wrote:

> Hi Shawn,
>
> The cloud is running version 6.2.1. with ClassicIndexSchemaFactory
>
> The sum of size from admin UI on all the shards is around 265 G vs 224 G
> between the two clouds.
>
> I created the collection using "numShards" so compositeId router.
>
> If you need more information, please let me know.
>
> Thanks
> AA
>
> On Mon, Apr 30, 2018 at 10:04 AM, Shawn Heisey 
> wrote:
>
> > On 4/30/2018 9:51 AM, Antony A wrote:
> >
> >> I am running two separate solr clouds. I have 8 shards in each with a
> >> total
> >> of 300 million documents. Both the clouds are indexing the document from
> >> the same source/configuration.
> >>
> >> I am noticing there is a difference in the size of the collection
> between
> >> them. I am planning to add more shards to see if that helps solve the
> >> issue. Has anyone come across similar issue?
> >>
> >
> > There's no information here about exactly what you are seeing, what you
> > are expecting to see, and why you believe that what you are seeing is
> wrong.
> >
> > You did say that there is "a difference in size".  That is a very vague
> > problem description.
> >
> > FYI, unless a SolrCloud collection is using the implicit router, you
> > cannot add shards.  And if it *IS* using the implicit router, then you
> are
> > 100% in control of document routing -- Solr cannot influence that at all.
> >
> > Thanks,
> > Shawn
> >
> >
>


Newbie Question

2018-01-08 Thread Deepak Goel
Hello

*I am trying to search for documents in my collection (Shakespeare). The
code is as follows:*

SolrClient client = new HttpSolrClient.Builder("
http://localhost:8983/solr/shakespeare").build();

SolrDocument doc = client.getById("2");
*However this does not return any document. What mistake am I making?*

Thank You
Deepak

Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"




Re: Newbie Question

2018-01-08 Thread Deepak Goel
*Is this right?*

SolrClient client = new HttpSolrClient.Builder("
http://localhost:8983/solr/shakespeare/select").build();

SolrQuery query = new SolrQuery();
query.setQuery("henry");
query.setFields("text_entry");
query.setStart(0);

queryResponse = client.query(query);

*This is still returning NULL*





Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Mon, Jan 8, 2018 at 10:55 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> I think you are missing /query handler endpoint in the URL. Plus actual
> search parameters.
>
> You may try using the admin UI to build your queries first.
>
> Regards,
> Alex
>
> On Jan 8, 2018 12:23 PM, "Deepak Goel" <deic...@gmail.com> wrote:
>
> > Hello
> >
> > *I am trying to search for documents in my collection (Shakespeare). The
> > code is as follows:*
> >
> > SolrClient client = new HttpSolrClient.Builder("
> > http://localhost:8983/solr/shakespeare").build();
> >
> > SolrDocument doc = client.getById("2");
> > *However this does not return any document. What mistake am I making?*
> >
> > Thank You
> > Deepak
> >
> > Deepak
> > "Please stop cruelty to Animals, help by becoming a Vegan"
> > +91 73500 12833
> > deic...@gmail.com
> >
> > Facebook: https://www.facebook.com/deicool
> > LinkedIn: www.linkedin.com/in/deicool
> >
> > "Plant a Tree, Go Green"
> >
> >
>


Re: Newbie Question

2018-01-08 Thread Deepak Goel
Got it . Thank You for your help



Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Mon, Jan 8, 2018 at 11:48 PM, Deepak Goel <deic...@gmail.com> wrote:

> *Is this right?*
>
> SolrClient client = new HttpSolrClient.Builder("http:/
> /localhost:8983/solr/shakespeare/select").build();
>
> SolrQuery query = new SolrQuery();
> query.setQuery("henry");
> query.setFields("text_entry");
> query.setStart(0);
>
> queryResponse = client.query(query);
>
> *This is still returning NULL*
>
>
>
>
>
>
> Deepak
> "Please stop cruelty to Animals, help by becoming a Vegan"
> +91 73500 12833
> deic...@gmail.com
>
> Facebook: https://www.facebook.com/deicool
> LinkedIn: www.linkedin.com/in/deicool
>
> "Plant a Tree, Go Green"
>
> On Mon, Jan 8, 2018 at 10:55 PM, Alexandre Rafalovitch <arafa...@gmail.com
> > wrote:
>
>> I think you are missing /query handler endpoint in the URL. Plus actual
>> search parameters.
>>
>> You may try using the admin UI to build your queries first.
>>
>> Regards,
>> Alex
>>
>> On Jan 8, 2018 12:23 PM, "Deepak Goel" <deic...@gmail.com> wrote:
>>
>> > Hello
>> >
>> > *I am trying to search for documents in my collection (Shakespeare). The
>> > code is as follows:*
>> >
>> > SolrClient client = new HttpSolrClient.Builder("
>> > http://localhost:8983/solr/shakespeare").build();
>> >
>> > SolrDocument doc = client.getById("2");
>> > *However this does not return any document. What mistake am I making?*
>> >
>> > Thank You
>> > Deepak
>> >
>> > Deepak
>> > "Please stop cruelty to Animals, help by becoming a Vegan"
>> > +91 73500 12833
>> > deic...@gmail.com
>> >
>> > Facebook: https://www.facebook.com/deicool
>> > LinkedIn: www.linkedin.com/in/deicool
>> >
>> > "Plant a Tree, Go Green"
>> >
>> >
>>
>
>


Re: Newbie Question

2018-01-09 Thread Deepak Goel
*Hello*

*The code which worked for me:*

SolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/shakespeare").build();

SolrQuery query = new SolrQuery();
query.setRequestHandler("/select");
query.setQuery("text_entry:henry");
query.setFields("text_entry");

QueryResponse queryResponse = null;
try {
    queryResponse = client.query(query);
} catch (Exception e) {
    // Log the failure instead of swallowing it silently.
    e.printStackTrace();
}

if (queryResponse != null && queryResponse.getResponse().size() > 0) {
    System.out.println("Query Response: " + queryResponse);
    SolrDocumentList results = queryResponse.getResults();
    for (int i = 0; i < results.size(); ++i) {
        SolrDocument document = results.get(i);
        System.out.println("The result is: " + document);
        System.out.println("The Document field names are: "
                + document.getFieldNames());
    }
}

*The data:*

{"index":{"_index":"shakespeare","_id":0}}
{"type":"act","line_id":1,"play_name":"Henry IV",
"speech_number":"","line_number":"","speaker":"","text_entry":"ACT I"}
{"index":{"_index":"shakespeare","_id":1}}
{"type":"scene","line_id":2,"play_name":"Henry
IV","speech_number":"","line_number":"","speaker":"","text_entry":"SCENE I.
London. The palace."}

*Deepak*





Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Tue, Jan 9, 2018 at 8:09 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 1/8/2018 10:23 AM, Deepak Goel wrote:
> > *I am trying to search for documents in my collection (Shakespeare). The
> > code is as follows:*
> >
> > SolrClient client = new HttpSolrClient.Builder("
> > http://localhost:8983/solr/shakespeare").build();
> >
> > SolrDocument doc = client.getById("2");
> > *However this does not return any document. What mistake am I making?*
>
> The getById method accesses the handler named "/get", normally defined
> with the RealTimeGetHandler class.  In recent Solr versions, the /get
> handler is defined implicitly and does not have to be configured, but in
> older versions (not sure which ones) you do need to have it in
> solrconfig.xml.
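The /get handler can also be hit directly over HTTP, which is a quick way to separate SolrJ problems from Solr-side problems (a sketch using the techproducts example discussed in this message; assumes the default port):

```shell
# Real-time get by uniqueKey; returns the document even before a commit.
curl "http://localhost:8983/solr/techproducts/get?id=SP2514N"

# Several ids at once:
curl "http://localhost:8983/solr/techproducts/get?ids=SP2514N,6H500F0"
```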
>
> I didn't expect your code to work because getById method returns a
> SolrDocumentList and you have SolrDocument, but apparently this actually
> does work.  I have tried code very similar to yours against the
> techproducts example in version 7.1, and it works perfectly.  I will
> share the exact code I tried and what results I got below.
>
> What code have you tried after the code you've shared?  How are you
> determining that no document is returned?  Are there any error messages
> logged by the client code or Solr?  If there are, can you share them?
>
> Do you have a document in the shakespeare index that has the value "2"
> in whatever field is the uniqueKey?  Does the schema have a uniqueKey
> defined?
>
> Can you find the entry in solr.log that logs the query and share that
> entire log entry?
>
> Code:
>
> public static void main(String[] args) throws SolrServerException,
> IOException
> {
>   String baseUrl = "http://localhost:8983/solr/techproducts";
>   SolrClient client = new HttpSolrClient.Builder(baseUrl).build();
>   SolrDocument doc = client.getById("SP2514N");
>   System.out.println(doc.getFieldValue("name"));
> }
>
> Console log from that code:
>
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for
> further details.
> Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133
>
>
> Including the collection/core name in the URL is an older way of writing
> SolrJ code.  It works well, but multiple collections can be accessed
> through one client object if you change it and your SolrJ version is new
> enough.
>
> Thanks,
> Shawn
>
>


Solr Exception: Undefined Field

2018-01-17 Thread Deepak Goel
*Hello*

*In Solr Admin: I type the q parameter as - *

*text_entry:**

*It gives the following exception (In the schema I do see a field as
text_entry):*

{ "responseHeader":{ "zkConnected":true, "status":400, "QTime":2, "params":{
"q":"text_entry:*", "_":"1516190134181"}}, "error":{ "metadata":[
"error-class","org.apache.solr.common.SolrException", "root-error-class",
"org.apache.solr.common.SolrException"], "msg":"undefined field text_entry",
"code":400}}


*However when I type the q parameter as -*

*{!term f=text_entry}henry*

*This does give out the output as foll:*

{ "responseHeader":{ "zkConnected":true, "status":0, "QTime":0, "params":{ "
q":"{!term f=text_entry}henry", "_":"1516190134181"}}, "response":{"numFound
":262,"start":0,"docs":[ { "type":"line", "line_id":"80075",
"play_name":"Richard
II", "speech_number":"13", "line_number":"3.3.37", "speaker":"HENRY
BOLINGBROKE", "text_entry":"Henry Bolingbroke", "id":
"9428c765-a4e8-4116-937a-9b70e8a8e2de", "_version_":1588569205789163522, "
speaker_str":["HENRY BOLINGBROKE"], "text_entry_str":["Henry Bolingbroke"],
"line_number_str":["3.3.37"], "type_str":["line"], "play_name_str":["Richard
II"]}, {
**

Any ideas what is going wrong in the first q?
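One way to check whether the schema actually defines text_entry, or only the *_str copies that show up in the results, is the Schema API (a hedged sketch; assumes the default port and a core named shakespeare):

```shell
# List the explicitly defined fields in the schema.
curl "http://localhost:8983/solr/shakespeare/schema/fields"

# List dynamic field rules, which can explain fields like text_entry_str.
curl "http://localhost:8983/solr/shakespeare/schema/dynamicfields"
```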

Thank You

Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"




Re: Solr Exception: Undefined Field

2018-01-18 Thread Deepak Goel
Hello

In Solr Admin: I type the q parameter as -

text_entry:*

It gives the following exception (In the schema I do see a field as text_entry):

{
"responseHeader":{
"zkConnected":true,
"status":400,
"QTime":2,
"params":{
"q":"text_entry:*",
"_":"1516190134181"}},
"error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.common.SolrException"],
"msg":"undefined field text_entry",
"code":400}}


However when I type the q parameter as -

{!term f=text_entry}henry

This does give out the output as foll:

{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":0,
"params":{
"q":"{!term f=text_entry}henry",
"_":"1516190134181"}},
"response":{"numFound":262,"start":0,"docs":[
{
"type":"line",
"line_id":"80075",
"play_name":"Richard II",
"speech_number":"13",
"line_number":"3.3.37",
"speaker":"HENRY BOLINGBROKE",
"text_entry":"Henry Bolingbroke",
"id":"9428c765-a4e8-4116-937a-9b70e8a8e2de",
"_version_":1588569205789163522,
"speaker_str":["HENRY BOLINGBROKE"],
"text_entry_str":["Henry Bolingbroke"],
"line_number_str":["3.3.37"],
"type_str":["line"],
"play_name_str":["Richard II"]},
{


Any ideas what is going wrong in the first q?

Thank You

On 1/18/18, Rick Leir <rl...@leirtech.com> wrote:
> Deepak
> Would you like to write your post again without asterisks? Include the
> asterisks which are necessary to the query of course.
> Rick
>
> On January 17, 2018 1:10:28 PM EST, Deepak Goel <deic...@gmail.com> wrote:
>>*Hello*
>>
>>*In Solr Admin: I type the q parameter as - *
>>
>>*text_entry:**
>>
>>*It gives the following exception (In the schema I do see a field as
>>text_entry):*
>>
>>{ "responseHeader":{ "zkConnected":true, "status":400, "QTime":2,
>>"params":{
>>"q":"text_entry:*", "_":"1516190134181"}}, "error":{ "metadata":[
>>"error-class","org.apache.solr.common.SolrException",
>>"root-error-class",
>>"org.apache.solr.common.SolrException"], "msg":"undefined field
>>text_entry",
>>"code":400}}
>>
>>
>>*However when i type the q paramter as -*
>>
>>*{!term f=text_entry}henry*
>>
>>*This does give out the output as foll:*
>>
>>{ "responseHeader":{ "zkConnected":true, "status":0, "QTime":0,
>>"params":{ "
>>q":"{!term f=text_entry}henry", "_":"1516190134181"}},
>>"response":{"numFound
>>":262,"start":0,"docs":[ { "type":"line", "line_id":"80075",
>>"play_name":"Richard
>>II", "speech_number":"13", "line_number":"3.3.37", "speaker":"HENRY
>>BOLINGBROKE", "text_entry":"Henry Bolingbroke", "id":
>>"9428c765-a4e8-4116-937a-9b70e8a8e2de",
>>"_version_":1588569205789163522, "
>>speaker_str":["HENRY BOLINGBROKE"], "text_entry_str":["Henry
>>Bolingbroke"],
>>"line_number_str":["3.3.37"], "type_str":["line"],
>>"play_name_str":["Richard
>>II"]}, {
>>**
>>
>>Any ideas what is going wrong in the first q?
>>
>>Thank You
>>
>>Deepak
>>"Please stop cruelty to Animals, help by becoming a Vegan"
>>+91 73500 12833
>>deic...@gmail.com
>>
>>Facebook: https://www.facebook.com/deicool
>>LinkedIn: www.linkedin.com/in/deicool
>>
>>"Plant a Tree, Go Green"
>>
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com


-- 


Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"


Re: query response time is too high

2018-01-29 Thread Deepak Goel
FYI. I recently did a study on 'Performance of Solr'

https://www.linkedin.com/pulse/performance-comparison-solr-elasticsearch-deepak-goel/?trackingId=N2j9xWvVEQQaZYa%2BoEsy%2Bw%3D%3D



Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Mon, Jan 29, 2018 at 4:56 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Aashish,
> Can you tell us a bit more about the size of your index and if you are
> running updates at the same time, types of queries, tests (is it some
> randomized query or some predefined), how many test threads do you use?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 29 Jan 2018, at 11:17, Aashish Agarwal <aaashi...@gmail.com> wrote:
> >
> > Hi,
> >
> > Solr query time for a request comes in around 10-12 ms. But when I am
> > hitting the queries in parallel, the qtime rises to 900 ms, yet there is
> > no significant increase in CPU load. I am using Solr with default memory
> > settings. How can I optimize to get a lower query time?
> >
> > Thanks in advance.
> >
> >
> > Aashish Agarwal
> > Computer Science
> > Birla Institute of Technology and Science,Pilani
> >
> >
> >
>
>


Re: Solr4 To Solr6 CPU load issues

2018-02-12 Thread Deepak Goel
I would suggest keeping the load the same for both solr4 and solr6, and then
testing. Also, please post the exact concurrent hits.

On 12 Feb 2018 12:48, "~$alpha`"  wrote:

When both solr4 and solr6 have concurrent hits:
1. 30 to 40 :
Avg response time 470ms vs 380ms
Load 6 vs 10

2. 80 to 90:
Avg response time 500ms vs 620ms (solr6 performing bad on peak hours)
Load 11 vs 25




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr4 To Solr6 CPU load issues

2018-02-12 Thread Deepak Goel
This would then mean that solr6 is reaching some kind of saturation (number
of threads, etc.) at a load of about 60 hits, which then drives its
performance to be very bad!



Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Mon, Feb 12, 2018 at 8:38 PM, ~$alpha`  wrote:

> I cant test on more as performance is already degraded.
> Its a 32core system and load 25 means 2500% cpu
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Solr4 To Solr6 CPU load issues

2018-02-12 Thread Deepak Goel
Please test with a higher number of hits until the CPU load reaches 100%

On 12 Feb 2018 19:44, "~$alpha`"  wrote:

> Hits 41:
> Avg response time: 470ms vs 380ms
> CPU load reaches: 6 vs 10
>
> Hits 82:
> Avg response time: 500ms vs 620ms (solr6 performing bad on peak hours)
> CPU load reaches: 11 vs 25
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Solr4 To Solr6 CPU load issues

2018-02-12 Thread Deepak Goel
One more idea is to have multiple VMs (8 CPUs each) on your server and load
balance across them. That would help solr6 scale nicely.

On 12 Feb 2018 23:05, "Deepak Goel" <deic...@gmail.com> wrote:

> If the community cannot help, the only way i can think is either to
> profile Solr (java) under a load test to find the problem. You could also
> use an APM.
>
> On 12 Feb 2018 23:00, "~$alpha`" <lavesh.ra...@gmail.com> wrote:
>
>> Yes, but how to move ahead  now.
>> Its strange solr4 is better behaving than solr6
>>
>>
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>
>


Re: Solr4 To Solr6 CPU load issues

2018-02-12 Thread Deepak Goel
If the community cannot help, the only way I can think of is to profile
Solr (Java) under a load test to find the problem. You could also use an
APM.
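For the profiling route, one low-overhead option is Java Flight Recorder driven by jcmd (a hedged sketch; the way the Solr pid is located here is an assumption, and on Oracle JDK 8 JFR may additionally need -XX:+UnlockCommercialFeatures on the Solr JVM):

```shell
# Locate the Solr JVM (assumes Solr was started via start.jar; adjust the pattern).
SOLR_PID=$(pgrep -f start.jar | head -n 1)

# Record five minutes of profiling data while the load test runs.
jcmd "$SOLR_PID" JFR.start duration=300s filename=/tmp/solr-load.jfr
```

The resulting .jfr file can be opened in Java Mission Control to look for hot methods and lock contention.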

On 12 Feb 2018 23:00, "~$alpha`"  wrote:

> Yes, but how to move ahead  now.
> Its strange solr4 is better behaving than solr6
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Solr4 To Solr6 CPU load issues

2018-02-11 Thread Deepak Goel
Two things trouble me:

1. It is a shared resource, so results are unreliable

2. Since cache results have increased, memory access will increase and it
will result in an increase in cpu usage. However response times will also
improve

To support more load you will have to increase server capacity or add
servers.




On 11 Feb 2018 23:28, "~$alpha`"  wrote:

Config : 64GB RAM 32 CORE CPU
but i have given 20Gb to solr JVM.. Also its a shared resource



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr4 To Solr6 CPU load issues

2018-02-11 Thread Deepak Goel
Can you please give the configuration of your server?

On 11 Feb 2018 19:17, "~$alpha`"  wrote:

> I have upgraded Solr4.0 Beta to Solr6.6. The Cache results look Awesome but
> overall the CPU load on solr6.6 is double the load on solr4.0 and hence I
> am
> not able to roll solr6.6 to 100% of my traffic.
>
> *Some Key Stats In Performance of Sol6 Vs Solr4*
> Document cache usage increased from .98 from .14
> Query Result cache usage increased from .10 from .24
> Filter cache same as .94
> Field Value cache was 0.99 in solr4 but n/a in solr6 (i guess because field
> multivalued concept was changed from solr4 to solr6)
>
> *Please Help Note: I have given document cache 3 times memory for
> doc.cache.*
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Solr4 To Solr6 CPU load issues

2018-02-11 Thread Deepak Goel
Also can you please post the throughputs for both of your tests

On 12 Feb 2018 00:35, "Deepak Goel" <deic...@gmail.com> wrote:

> Yup. Improvement of response time would hurt the cpu usage. The other
> thing is more memory usage (cache) which gets included into the cpu usage.
>
>
>
> Deepak
> "Please stop cruelty to Animals, help by becoming a Vegan"
> +91 73500 12833
> deic...@gmail.com
>
> Facebook: https://www.facebook.com/deicool
> LinkedIn: www.linkedin.com/in/deicool
>
> "Plant a Tree, Go Green"
>
> On Mon, Feb 12, 2018 at 12:24 AM, ~$alpha` <lavesh.ra...@gmail.com> wrote:
>
>> Other resource is not using the cpu.
>> Its true that response is better to 300ms from 350ms but cpu usage almost
>> doubled?
>>
>>
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>
>
>


Re: Solr4 To Solr6 CPU load issues

2018-02-11 Thread Deepak Goel
Yup. The improvement in response time would hurt the CPU usage. The other
factor is more memory usage (cache), which gets counted into the CPU usage.



Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Mon, Feb 12, 2018 at 12:24 AM, ~$alpha`  wrote:

> Other resource is not using the cpu.
> Its true that response is better to 300ms from 350ms but cpu usage almost
> doubled?
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Indexing timeout issues with SolrCloud 7.1

2018-02-23 Thread Deepak Goel
Can you please post all the errors? The current error is only for the node
'solr2-d'

On 23 Feb 2018 09:42, "Tom Peters"  wrote:

I'm trying to debug why indexing in SolrCloud 7.1 is having so many issues.
It will hang most of the time, and timeout the rest.

Here's an example:

time curl -s 'myhost:8080/solr/mycollection/update/json/docs' -d
'{"solr_id":"test_001", "data_type":"test"}'|jq .
{
  "responseHeader": {
"status": 0,
"QTime": 5004
  }
}
curl -s 'myhost:8080/solr/mycollection/update/json/docs' -d   0.00s
user 0.00s system 0% cpu 5.025 total
jq .  0.01s user 0.00s system 0% cpu 5.025 total

Here's some of the timeout errors I'm seeing:

2018-02-23 03:55:02.903 ERROR (qtp1595212853-3607) [c:mycollection
s:shard1 r:core_node12 x:mycollection_shard1_replica_n11]
o.a.s.h.RequestHandlerBase java.io.IOException:
java.util.concurrent.TimeoutException:
Idle timeout expired: 12/12 ms
2018-02-23 03:55:02.903 ERROR (qtp1595212853-3607) [c:mycollection
s:shard1 r:core_node12 x:mycollection_shard1_replica_n11]
o.a.s.s.HttpSolrCall null:java.io.IOException:
java.util.concurrent.TimeoutException:
Idle timeout expired: 12/12 ms
2018-02-23 03:55:36.517 ERROR (recoveryExecutor-3-thread-4-
processing-n:solr2-d.myhost:8080_solr x:mycollection_shard1_replica_n11
s:shard1 c:mycollection r:core_node12) [c:mycollection s:shard1
r:core_node12 x:mycollection_shard1_replica_n11] o.a.s.h.ReplicationHandler
Index fetch failed :org.apache.solr.common.SolrException: Index fetch
failed :
2018-02-23 03:55:36.517 ERROR (recoveryExecutor-3-thread-4-
processing-n:solr2-d.myhost:8080_solr x:mycollection_shard1_replica_n11
s:shard1 c:mycollection r:core_node12) [c:mycollection s:shard1
r:core_node12 x:mycollection_shard1_replica_n11] o.a.s.c.RecoveryStrategy
Error while trying to recover:org.apache.solr.common.SolrException:
Replication for recovery failed.


We currently have two separate Solr clusters. Our current in-production
cluster which runs on Solr 3.4 and a new ring that I'm trying to bring up
which runs on SolrCloud 7.1. I have the exact same code that is indexing to
both clusters. The Solr 3.4 indexes fine, but I'm running into lots of
issues with SolrCloud 7.1.


Some additional details about the setup:

* 5 nodes solr2-a through solr2-e.
* 5 replicas
* 1 shard
* The servers have 48G of RAM with -Xmx and -Xms set to 16G
* I currently have soft commits at 10m intervals and hard commits (with
openSearcher=false) at 1m intervals. I also tried 5m (soft) and 15s (hard)
as well.
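For reference, the commit intervals above can be changed at runtime through the Config API rather than by editing solrconfig.xml (a hedged sketch; the values match the 1m hard / 10m soft intervals mentioned, and the port matches the setup described):

```shell
# 60s hard commits without opening a searcher, 10m soft commits.
curl "http://localhost:8080/solr/mycollection/config" \
  -H 'Content-Type: application/json' \
  -d '{"set-property": {
        "updateHandler.autoCommit.maxTime": 60000,
        "updateHandler.autoCommit.openSearcher": false,
        "updateHandler.autoSoftCommit.maxTime": 600000}}'
```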

Any help or pointers would be greatly appreciated. Thanks!


This message and any attachment may contain information that is
confidential and/or proprietary. Any use, disclosure, copying, storing, or
distribution of this e-mail or any attached file by anyone other than the
intended recipient is strictly prohibited. If you have received this
message in error, please notify the sender by reply email and delete the
message and any attachments. Thank you.


Re: Indexing timeout issues with SolrCloud 7.1

2018-02-24 Thread Deepak Goel
shard1 r:core_node7 x:mycollection_shard1_replica_n4]
o.a.s.h.IndexFetcher Error deleting file: tlog.0046787.
1593163366289899520
2018-02-23 04:12:22.405 ERROR
(recoveryExecutor-3-thread-6-processing-n:solr2-e:8080_solr
x:mycollection_shard1_replica_n4 s:shard1 c:mycollection r:core_node7)
[c:mycollection s:shard1 r:core_node7 x:mycollection_shard1_replica_n4]
o.a.s.c.RecoveryStrategy Error while trying to
recover:org.apache.solr.common.SolrException:
Replication for recovery failed.
2018-02-23 04:12:22.405 ERROR
(recoveryExecutor-3-thread-6-processing-n:solr2-e:8080_solr
x:mycollection_shard1_replica_n4 s:shard1 c:mycollection r:core_node7)
[c:mycollection s:shard1 r:core_node7 x:mycollection_shard1_replica_n4]
o.a.s.c.RecoveryStrategy Recovery failed - trying again... (1)
2018-02-23 04:12:22.405 ERROR
(recoveryExecutor-3-thread-6-processing-n:solr2-e:8080_solr
x:mycollection_shard1_replica_n4 s:shard1 c:mycollection r:core_node7)
[c:mycollection s:shard1 r:core_node7 x:mycollection_shard1_replica_n4]
o.a.s.h.ReplicationHandler Index fetch failed
:org.apache.solr.common.SolrException:
Unable to download tlog.0046787.1593163366289899520 completely.
Downloaded 0!=179060


> On Feb 23, 2018, at 4:15 PM, Deepak Goel <deic...@gmail.com> wrote:
>
> Can you please post all the errors? The current error is only for the node
> 'solr-2d'
>
> On 23 Feb 2018 09:42, "Tom Peters" <tpet...@synacor.com> wrote:
>
> I'm trying to debug why indexing in SolrCloud 7.1 is having so many
issues.
> It will hang most of the time, and timeout the rest.
>
> Here's an example:
>
>time curl -s 'myhost:8080/solr/mycollection/update/json/docs' -d
> '{"solr_id":"test_001", "data_type":"test"}'|jq .
>{
>  "responseHeader": {
>"status": 0,
>"QTime": 5004
>  }
>}
>curl -s 'myhost:8080/solr/mycollection/update/json/docs' -d   0.00s
> user 0.00s system 0% cpu 5.025 total
>jq .  0.01s user 0.00s system 0% cpu 5.025 total
>
> Here's some of the timeout errors I'm seeing:
>
>2018-02-23 03:55:02.903 ERROR (qtp1595212853-3607) [c:mycollection
> s:shard1 r:core_node12 x:mycollection_shard1_replica_n11]
> o.a.s.h.RequestHandlerBase java.io.IOException:
> java.util.concurrent.TimeoutException:
> Idle timeout expired: 12/12 ms
>2018-02-23 03:55:02.903 ERROR (qtp1595212853-3607) [c:mycollection
> s:shard1 r:core_node12 x:mycollection_shard1_replica_n11]
> o.a.s.s.HttpSolrCall null:java.io.IOException:
> java.util.concurrent.TimeoutException:
> Idle timeout expired: 12/12 ms
>2018-02-23 03:55:36.517 ERROR (recoveryExecutor-3-thread-4-
> processing-n:solr2-d.myhost:8080_solr x:mycollection_shard1_replica_n11
> s:shard1 c:mycollection r:core_node12) [c:mycollection s:shard1
> r:core_node12 x:mycollection_shard1_replica_n11]
o.a.s.h.ReplicationHandler
> Index fetch failed :org.apache.solr.common.SolrException: Index fetch
> failed :
>2018-02-23 03:55:36.517 ERROR (recoveryExecutor-3-thread-4-
> processing-n:solr2-d.myhost:8080_solr x:mycollection_shard1_replica_n11
> s:shard1 c:mycollection r:core_node12) [c:mycollection s:shard1
> r:core_node12 x:mycollection_shard1_replica_n11] o.a.s.c.RecoveryStrategy
> Error while trying to recover:org.apache.solr.common.SolrException:
> Replication for recovery failed.
>
>
> We currently have two separate Solr clusters. Our current in-production
> cluster which runs on Solr 3.4 and a new ring that I'm trying to bring up
> which runs on SolrCloud 7.1. I have the exact same code that is indexing
to
> both clusters. The Solr 3.4 indexes fine, but I'm running into lots of
> issues with SolrCloud 7.1.
>
>
> Some additional details about the setup:
>
> * 5 nodes solr2-a through solr2-e.
> * 5 replicas
> * 1 shard
> * The servers have 48G of RAM with -Xmx and -Xms set to 16G
> * I currently have soft commits at 10m intervals and hard commits (with
> openSearcher=false) at 1m intervals. I also tried 5m (soft) and 15s (hard)
> as well.
>
> Any help or pointers would be greatly appreciated. Thanks!
>
>
> This message and any attachment may contain information that is
> confidential and/or proprietary. Any use, disclosure, copying, storing, or
> distribution of this e-mail or any attached file by anyone other than the
> intended recipient is strictly prohibited. If you have received this
> message in error, please notify the sender by reply email and delete the
> message and any attachments. Thank you.
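For reference, the commit intervals described in the message above (soft commits every 10 minutes, hard commits with openSearcher=false every minute) correspond to a solrconfig.xml fragment roughly like the following; the element names follow the stock Solr configuration, and the values are the poster's:

```xml
<!-- Hard commit: flush to stable storage every 60s, but do not open a
     new searcher (openSearcher=false keeps hard commits cheap). -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: make newly indexed documents visible every 10 minutes. -->
<autoSoftCommit>
  <maxTime>600000</maxTime>
</autoSoftCommit>
```

These elements live inside the `<updateHandler>` section of solrconfig.xml.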





Re: Some performance questions....

2018-03-11 Thread Deepak Goel
To rephrase your Question

"Does Solr do well with Scale-up or Scale-out?"

Are there any Performance Benchmarks for the same out there supporting the
claim?

On 11 Mar 2018 23:05, "BlackIce"  wrote:

> Hi,
>
> I have some questions regarding performance.
>
> Lets says I have a dual CPU with a total of 8 cores and 24 GB RAM for my
> Solr and some other stuff.
>
> Would it be more beneficial to only run 1 instance of Solr with the
> collection stored on 4 HD's in RAID 0?? Or Have several Virtual
> Machines each running of its own HD, ie: Have 4 VM's running Solr?
>
> Any Thoughts?
>
> Thank you!
>
> RRK
>


Re: Some performance questions....

2018-03-11 Thread Deepak Goel
On 12 Mar 2018 05:51, "Shawn Heisey"  wrote:

On 3/11/2018 11:35 AM, BlackIce wrote:

> I have some questions regarding performance.
>
> Lets says I have a dual CPU with a total of 8 cores and 24 GB RAM for my
> Solr and some other stuff.
>
> Would it be more beneficial to only run 1 instance of Solr with the
> collection stored on 4 HD's in RAID 0?? Or Have several Virtual
> Machines each running of its own HD, ie: Have 4 VM's running Solr?
>

Performance is always going to be better on bare metal than on virtual
machines.  Virtualization in modern times is really good, so the difference
*might* be minimal, but there is ALWAYS overhead.

*Deepak*

I doubt this. It would be great if someone could substantiate this with hard
facts.
*Deepak*


I used to create virtual machines in my hardware for Solr. Initially with
vmware esxi, then later natively in Linux with KVM.  At that time, I was
running one index core per VM.  Just for some testing, I took a similar
machine and set up one Solr instance handling all the same cores on bare
metal.  I do not remember HOW much faster it was, but it was definitely
faster. One big thing I like about bare metal is that there's only one
"machine", IP address, and Solr instance to administer.

Unless you're willing to completely rebuild the whole thing in the event of
drive failure, don't use RAID0.  If one drive dies (and every hard drive IS
eventually going to die if it's used long enough), then *all* of the data
on the whole RAID volume is gone.

You could do RAID5, which has decent redundancy and good space efficiency,
but if you're not familiar with the RAID5 write penalty, do some research
on it, and you'll probably come out of it not wanting to EVER use it.  If
you like, I can explain exactly why you should avoid any RAID level that
incorporates 5 or 6.

Overall, the best level is RAID10 ... but it has a glaring disadvantage
from a cost perspective -- you lose half of your raw capacity.  Since
drives are relatively cheap, I always build my servers with RAID10, using a
1MB stripe size and a battery-backed caching controller.  For the typical
hardware I'm using, that means that I'm going to end up with 6 to 12TB of
usable space instead of 10 to 20TB (raid5), but the volume is FAST.
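To make the capacity trade-off concrete, here is a small illustrative sketch in plain Python of usable space for the RAID levels discussed, assuming n identical drives (the 6-drive example roughly matches the low end of the figures above):

```python
def usable_capacity(level: str, drives: int, size_tb: float) -> float:
    """Approximate usable capacity in TB for common RAID levels."""
    if level == "raid0":   # striping only: no redundancy, full raw capacity
        return drives * size_tb
    if level == "raid5":   # one drive's worth of parity
        return (drives - 1) * size_tb
    if level == "raid6":   # two drives' worth of parity
        return (drives - 2) * size_tb
    if level == "raid10":  # mirrored pairs: half of raw capacity
        return (drives // 2) * size_tb
    raise ValueError(f"unknown RAID level: {level}")

# 6 x 2TB drives: RAID10 loses half the raw space, RAID5 loses one drive.
print(usable_capacity("raid10", 6, 2.0))  # 6.0 TB usable
print(usable_capacity("raid5", 6, 2.0))   # 10.0 TB usable
```

The same arithmetic explains the "6 to 12TB instead of 10 to 20TB" range: it is simply raw capacity halved (RAID10) versus raw capacity minus one drive (RAID5).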

Thanks,
Shawn


Re: Some performance questions....

2018-03-12 Thread Deepak Goel
We need benchmarks or data to support the claim.

A single OS and JVM does not scale linearly at higher loads. If you have
separate OS and Java instances, the load is distributed across multiple
instances (with each instance only required to support a smaller load, and
hence it would scale nicely).

I had found this when running multiple Apache servers on multiple VMs as
compared to a single instance (not Solr). But I am pretty sure it would be
the same for Solr too.

On 12 Mar 2018 12:42, "Shawn Heisey" <apa...@elyograg.org> wrote:

> On 3/11/2018 7:39 PM, Deepak Goel wrote:
>
>> I doubt this. It would be great if someone could substantiate this with hard
>> facts.
>>
>
> This seems to be in response to my claim that virtualization always has
> overhead.  I don't see how this statement can be at all controversial.
>
> Virtualization isn't free, even if the hardware and software in use are
> extremely efficient at it.  Translating what a virtual machine does into a
> corresponding action on the real hardware is going to take time and
> resources beyond whatever the action itself is.
>
> Plus there's the application-level overhead.  You have the overhead of
> multiple operating systems, multiple copies of Java running, multiple
> servlet containers (probably Jetty), and multiple copies of Solr.  And each
> of them is running inside a limited subset of the hardware installed in the
> physical server.
>
> Let's say you start with VMs on a server, and benchmark Solr's
> performance.  Then you completely erase the server, install one operating
> system, install Solr onto the OS, and then install all of the indexes that
> were running on the VMs into that one Solr instance.  Assuming that things
> are set up correctly and that you give that Solr instance the correct
> amount of heap memory, it's almost guaranteed to be faster than the VMs.  I
> can't tell you whether the improvement will be half a percent or 50
> percent, only that it will be faster.
>
> Thanks,
> Shawn
>
>


Re: Some performance questions....

2018-03-12 Thread Deepak Goel
Now you are mixing your original question about performance with reliability

On 12 Mar 2018 02:29, "BlackIce" <blackice...@gmail.com> wrote:

> Second to this wouldn't 4 Solr instances each with its own HD be fault
> > tolerant? vs. one Solr instance with 4 HD's in RAID 0? Plus to this comes
> the storage capacity, I need the capacity of those 4 drives... the more I
> read.. the more questions
>
> On Sun, Mar 11, 2018 at 9:43 PM, BlackIce <blackice...@gmail.com> wrote:
>
> > Thnx for the pointers.
> >
> > > I haven't given much thought to Solr, aside from schema.xml and
> solrconfig.xml
> > and I'm just diving into a bit more deeper stuff!
> >
> > Greetz
> >
> > RRK
> >
> > On Sun, Mar 11, 2018 at 8:58 PM, Deepak Goel <deic...@gmail.com> wrote:
> >
> >> To rephrase your Question
> >>
> >> "Does Solr do well with Scale-up or Scale-out?"
> >>
> >> Are there any Performance Benchmarks for the same out there supporting
> the
> >> claim?
> >>
> >> On 11 Mar 2018 23:05, "BlackIce" <blackice...@gmail.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > I have some questions regarding performance.
> >> >
> >> > Lets says I have a dual CPU with a total of 8 cores and 24 GB RAM for
> my
> >> > Solr and some other stuff.
> >> >
> >> > Would it be more beneficial to only run 1 instance of Solr with the
> >> > collection stored on 4 HD's in RAID 0?? Or Have several Virtual
> >> > Machines each running of its own HD, ie: Have 4 VM's running Solr?
> >> >
> >> > Any Thoughts?
> >> >
> >> > Thank you!
> >> >
> >> > RRK
> >> >
> >>
> >
> >
>


Re: Some performance questions....

2018-03-12 Thread Deepak Goel
I am not sure if I understand your question

*"How do I test this?"*
You have to run a benchmark test of transactions (queries) that are
most representative of your system (requirements).

You can use a performance testing tool like JMeter (along with PerfMon
configured for utilisation metrics)
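Whichever tool generates the load, the raw response times need summarising before two configurations can be compared. A minimal sketch in plain Python (the sample latencies are made up, and the nearest-rank percentile method is one common choice, not something JMeter-specific):

```python
def summarize(latencies_ms):
    """Return median, 95th-percentile, and max latency (nearest-rank method)."""
    data = sorted(latencies_ms)

    def pct(p):
        # nearest-rank: the ceiling of p% of n observations, as a 0-based index
        idx = max(0, -(-p * len(data) // 100) - 1)
        return data[idx]

    return {"p50": pct(50), "p95": pct(95), "max": data[-1]}

# Hypothetical per-query response times in milliseconds:
samples = [12, 15, 11, 300, 14, 13, 16, 12, 18, 15]
print(summarize(samples))
```

Comparing p50 alone can hide tail-latency regressions, which is why the p95 and max figures matter when judging a version upgrade.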



Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Mon, Mar 12, 2018 at 10:57 PM, BlackIce <blackice...@gmail.com> wrote:

> So Im thinking following scenarios :
> Single instance with drives in raid 0, raid 10 and raid 5.
>
> And then having 3 Vms and 4 Solr instances each with its own HD.
>
> How do I test this?
>
>
> Greetz
>
> On Mar 12, 2018 1:16 PM, "BlackIce" <blackice...@gmail.com> wrote:
>
> > OK, so we've gone nowhere, since I've already lost lots of time... A few
> > days more or less won't make a difference. I'd be willing to benchmark
> > if someone tells me how to.
> >
> >
> > Greetz
> >
> > On Mar 12, 2018 7:17 AM, "Deepak Goel" <deic...@gmail.com> wrote:
> >
> >> Now you are mixing your original question about performance with
> >> reliability
> >>
> >> On 12 Mar 2018 02:29, "BlackIce" <blackice...@gmail.com> wrote:
> >>
> >> > Second to this wouldn't 4 Solr instances each with its own HD be fault
> >> > tolerant? vs. one solr instance with 4 HD's in RAID 0? Plus to his
> comes
> >> > the storage capacity, I need the capacity of those 4 drives... the
> more
> >> I
> >> > read.. the more questions
> >> >
> >> > On Sun, Mar 11, 2018 at 9:43 PM, BlackIce <blackice...@gmail.com>
> >> wrote:
> >> >
> >> > > Thnx for the pointers.
> >> > >
> >> > > I haven't given much thought to Solr, asides shemal.xml and
> >> > solrconfig.xml
> >> > > and I'm just diving into a bit more deeper stuff!
> >> > >
> >> > > Greetz
> >> > >
> >> > > RRK
> >> > >
> >> > > On Sun, Mar 11, 2018 at 8:58 PM, Deepak Goel <deic...@gmail.com>
> >> wrote:
> >> > >
> >> > >> To rephrase your Question
> >> > >>
> >> > >> "Does Solr do well with Scale-up or Scale-out?"
> >> > >>
> >> > >> Are there any Performance Benchmarks for the same out there
> >> supporting
> >> > the
> >> > >> claim?
> >> > >>
> >> > >> On 11 Mar 2018 23:05, "BlackIce" <blackice...@gmail.com> wrote:
> >> > >>
> >> > >> > Hi,
> >> > >> >
> >> > >> > I have some questions regarding performance.
> >> > >> >
> >> > >> > Lets says I have a dual CPU with a total of 8 cores and 24 GB RAM
> >> for
> >> > my
> >> > >> > Solr and some other stuff.
> >> > >> >
> >> > >> > Would it be more beneficial to only run 1 instance of Solr with
> the
> >> > >> > collection stored on 4 HD's in RAID 0?? Or Have several
> Virtual
> >> > >> > Machines each running of its own HD, ie: Have 4 VM's running
> Solr?
> >> > >> >
> >> > >> > Any Thoughts?
> >> > >> >
> >> > >> > Thank you!
> >> > >> >
> >> > >> > RRK
> >> > >> >
> >> > >>
> >> > >
> >> > >
> >> >
> >>
> >
>


Re: Some performance questions....

2018-03-14 Thread Deepak Goel
The OS resources would be shared in that case

On 14 Mar 2018 17:19, "BlackIce" <blackice...@gmail.com> wrote:

> I was just thinking Do I really need separate VM's in order to run
> multiple Solr instances? Doesn't it suffice to have each instance in its
> own user account?
>
> Greetz
>
> On Mon, Mar 12, 2018 at 7:41 PM, BlackIce <blackice...@gmail.com> wrote:
>
> > I don't have any production logs and this all sounds too complicated.
> >
> > So, I'll just throw the system together in a way that makes the most sense
> > for now.. collect some logs and then do some testing further down the
> road.
> > For now just get the sucker up and running.
> >
> > Thanks all
> >
> > On Mon, Mar 12, 2018 at 7:23 PM, Deepak Goel <deic...@gmail.com> wrote:
> >
> >> I am not sure if I understand your question
> >>
> >> *"How do I test this?"*
> >> You have to run test (benchmark test) of transactions (queries) which
> are
> >> most representative of your system (requirement).
> >>
> >> You can use a performance testing tool like JMeter (along with PerfMon
> >> configured for utilisation metrics)
> >>
> >>
> >>
> >> Deepak
> >> "Please stop cruelty to Animals, help by becoming a Vegan"
> >> +91 73500 12833
> >> deic...@gmail.com
> >>
> >> Facebook: https://www.facebook.com/deicool
> >> LinkedIn: www.linkedin.com/in/deicool
> >>
> >> "Plant a Tree, Go Green"
> >>
> >> On Mon, Mar 12, 2018 at 10:57 PM, BlackIce <blackice...@gmail.com>
> wrote:
> >>
> >> > So Im thinking following scenarios :
> >> > Single instance with drives in raid 0, raid 10 and raid 5.
> >> >
> >> > And then having 3 Vms and 4 Solr instances each with its own HD.
> >> >
> >> > How do I test this?
> >> >
> >> >
> >> > Greetz
> >> >
> >> > On Mar 12, 2018 1:16 PM, "BlackIce" <blackice...@gmail.com> wrote:
> >> >
> >> > > OK, so we're gone nowhere,  since I've already lost lots of
> time...  A
> >> > few
> >> > > days more or less won't make a difference  I'd be willing to
> >> > benchmark
> >> > > if some tells me how to.
> >> > >
> >> > >
> >> > > Greetz
> >> > >
> >> > > On Mar 12, 2018 7:17 AM, "Deepak Goel" <deic...@gmail.com> wrote:
> >> > >
> >> > >> Now you are mixing your original question about performance with
> >> > >> reliability
> >> > >>
> >> > >> On 12 Mar 2018 02:29, "BlackIce" <blackice...@gmail.com> wrote:
> >> > >>
> >> > >> > Second to this wouldn't 4 Solr instances each with its own HD be
> >> fault
> >> > >> > tolerant? vs. one solr instance with 4 HD's in RAID 0? Plus to
> his
> >> > comes
> >> > >> > the storage capacity, I need the capacity of those 4 drives...
> the
> >> > more
> >> > >> I
> >> > >> > read.. the more questions
> >> > >> >
> >> > >> > On Sun, Mar 11, 2018 at 9:43 PM, BlackIce <blackice...@gmail.com
> >
> >> > >> wrote:
> >> > >> >
> >> > >> > > Thnx for the pointers.
> >> > >> > >
> >> > >> > > I haven't given much thought to Solr, asides shemal.xml and
> >> > >> > solrconfig.xml
> >> > >> > > and I'm just diving into a bit more deeper stuff!
> >> > >> > >
> >> > >> > > Greetz
> >> > >> > >
> >> > >> > > RRK
> >> > >> > >
> >> > >> > > On Sun, Mar 11, 2018 at 8:58 PM, Deepak Goel <
> deic...@gmail.com>
> >> > >> wrote:
> >> > >> > >
> >> > >> > >> To rephrase your Question
> >> > >> > >>
> >> > >> > >> "Does Solr do well with Scale-up or Scale-out?"
> >> > >> > >>
> >> > >> > >> Are there any Performance Benchmarks for the same out there
> >> > >> supporting
> >> > >> > the
> >> > >> > >> claim?
> >> > >> > >>
> >> > >> > >> On 11 Mar 2018 23:05, "BlackIce" <blackice...@gmail.com>
> wrote:
> >> > >> > >>
> >> > >> > >> > Hi,
> >> > >> > >> >
> >> > >> > >> > I have some questions regarding performance.
> >> > >> > >> >
> >> > >> > >> > Lets says I have a dual CPU with a total of 8 cores and 24
> GB
> >> RAM
> >> > >> for
> >> > >> > my
> >> > >> > >> > Solr and some other stuff.
> >> > >> > >> >
> >> > >> > >> > Would it be more beneficial to only run 1 instance of Solr
> >> with
> >> > the
> >> > >> > >> > collection stored on 4 HD's in RAID 0?? Or Have several
> >> > Virtual
> >> > >> > >> > Machines each running of its own HD, ie: Have 4 VM's running
> >> > Solr?
> >> > >> > >> >
> >> > >> > >> > Any Thoughts?
> >> > >> > >> >
> >> > >> > >> > Thank you!
> >> > >> > >> >
> >> > >> > >> > RRK
> >> > >> > >> >
> >> > >> > >>
> >> > >> > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >
> >> >
> >>
> >
> >
>


Re: Some performance questions....

2018-03-15 Thread Deepak Goel
Please see inline...



Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Thu, Mar 15, 2018 at 6:04 PM, BlackIce  wrote:

> Shawn:
> well the idea was to utilize system resources more efficiently.. this is
> not due so much to Solr, as I said I don't know that much about Solr,
> except schema.xml and solrconfig.xml - However the main app that will be
> running is more or less a single-threaded app which takes advantage when
> run under several instances, ie: parallelism, so I thought, since I'm at it
> I may give Solr a few instances as well... but the more I read, the more
> confused I get.. I've read about some guy running 8 Solr instances on his
> dual Xeon 26xx series, each VM with 12 GB ram..
>
> Deepak:
>
> Well its kinda a given that when running ANYTHING under a VM you have an
> overhead..

***Deepak***
You mean you are assuming this without any facts (a performance benchmark
with and without a VM)?
 ***Deepak***

> so since I control the hardware, ie: not sharing space on some
> hosted VM by some ISP... why not skip the whole VM thing entirely?
>
> Thnx for the Heap pointer.. I've read, from some Professor.. that Solr
> actually is more efficient with a very small Heap and to have everything
> mapped to virtual memory... Which brings me to the next question.. is the
> Virtual memory mapping done by the OS or Solr? Does the Virtual memory
> reside on the OS HDD? Or on the Solr HDD?.. and if the Virtual memory
> mapping is done on the OS HDD, wouldn't it be beneficial to run the OS off
> an SSD?
>
> ***Deepak***
The OS does the virtual-memory mapping itself (at least Unix does). However,
I am not sure of the internal mechanism of Solr.
***Deepak***


> For now.. my FEELING is to run one Solr instance on this particular
> machine.. by the time the RAM is outgrown add another machine and so
> forth...

***Deepak***
I wonder if there are any performance benchmarks showing how Solr scales at
higher loads on a single machine (is it linear or non-linear?). Most
software doesn't scale linearly at higher loads.
 ***Deepak***

> I've had a small set-back: due to the chassis configuration I could
> only fit in half of the HDD's I intended.. the rest collide with the CPU
> heatsinks (Don't ask)
>  so my entire initial set-up has changed and with it my initial "growth
> strategy"
>
> On Wed, Mar 14, 2018 at 4:15 PM, Shawn Heisey  wrote:
>
> > On 3/14/2018 5:49 AM, BlackIce wrote:
> >
> >> I was just thinking Do I really need separate VM's in order to run
> >> multiple Solr instances? Doesn't it suffice to have each instance in its
> >> own user account?
> >>
> >
> > You can run multiple instances all under the same account on one machine.
> > But for a single machine, why do you need multiple Solr instances at all?
> > One instance can handle many indexes, and will probably do it more
> > efficiently than multiple instances.
> >
> > The only time I would *ever* recommend multiple Solr instances is when a
> > single instance would need an ENORMOUS Java heap -- something much larger
> > than 32GB.  If something like that can be split into multiple instances
> > where each one has a heap that's 31GB heap or less, then memory usage
> will
> > be more efficient and Java's garbage collection will work better.
> >
> > FYI -- Running Java with a 32GB heap actually has LESS memory available
> > than running it with a 31GB heap.  This is because when the heap reaches
> > 32GB, Java must switch to 64-bit pointers, so every little allocation
> > requires a little bit more memory.
> >
> > Thanks,
> > Shawn
> >
> >
>
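The ~32 GB threshold Shawn describes comes from HotSpot's compressed ordinary object pointers (oops): below roughly 32 GB of heap the JVM can use 4-byte object references, and above it every reference needs 8 bytes. A back-of-envelope sketch in plain Python (the numbers are illustrative; the exact cutover depends on JVM settings):

```python
COMPRESSED_OOPS_LIMIT_GB = 32  # above this, HotSpot cannot compress references

def reference_bytes(heap_gb: float) -> int:
    """Per-reference size HotSpot typically uses for a given heap size."""
    return 4 if heap_gb < COMPRESSED_OOPS_LIMIT_GB else 8

# Rough cost of holding 100 million object references in reference-heavy
# structures (field tables, caches, collections):
refs = 100_000_000
for heap in (31, 32):
    mib = reference_bytes(heap) * refs / 2**20
    print(f"{heap} GB heap: ~{mib:.0f} MiB of pointer overhead")
```

This is why a 31 GB heap can hold more live objects than a 32 GB one: crossing the limit doubles the size of every reference, which eats into the extra gigabyte and then some.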


Re: Some performance questions....

2018-03-14 Thread Deepak Goel
Have you ever measured the overhead of a VM yourself, or did you read it
somewhere?

On 14 Mar 2018 18:10, "BlackIce" <blackice...@gmail.com> wrote:

> but it should be possible, without the overhead of VM's
>
> On Wed, Mar 14, 2018 at 1:30 PM, Deepak Goel <deic...@gmail.com> wrote:
>
> > The OS resources would be shared in that case
> >
> > On 14 Mar 2018 17:19, "BlackIce" <blackice...@gmail.com> wrote:
> >
> > > I was just thinking Do I really need separate VM's in order to run
> > > multiple Solr instances? Doesn't it suffice to have each instance in
> its
> > > own user account?
> > >
> > > Greetz
> > >
> > > On Mon, Mar 12, 2018 at 7:41 PM, BlackIce <blackice...@gmail.com>
> wrote:
> > >
> > > > I don't have any production logs and this all sounds to
> > complicated.
> > > >
> > > > So, I'll just trow the system together in a way it makes the most
> sense
> > > > for now.. collect some logs and then do some testing further down the
> > > road.
> > > > For now just get the sucker up and running.
> > > >
> > > > Thanks all
> > > >
> > > > On Mon, Mar 12, 2018 at 7:23 PM, Deepak Goel <deic...@gmail.com>
> > wrote:
> > > >
> > > >> I am not sure if I understand your question
> > > >>
> > > >> *"How do I test this?"*
> > > >> You have to run test (benchmark test) of transactions (queries)
> which
> > > are
> > > >> most representative of your system (requirement).
> > > >>
> > > >> You can use a performance testing tool like JMeter (along with
> PerfMon
> > > >> configured for utilisation metrics)
> > > >>
> > > >>
> > > >>
> > > >> Deepak
> > > >> "Please stop cruelty to Animals, help by becoming a Vegan"
> > > >> +91 73500 12833
> > > >> deic...@gmail.com
> > > >>
> > > >> Facebook: https://www.facebook.com/deicool
> > > >> LinkedIn: www.linkedin.com/in/deicool
> > > >>
> > > >> "Plant a Tree, Go Green"
> > > >>
> > > >> On Mon, Mar 12, 2018 at 10:57 PM, BlackIce <blackice...@gmail.com>
> > > wrote:
> > > >>
> > > >> > So Im thinking following scenarios :
> > > >> > Single instance with drives in raid 0, raid 10 and raid 5.
> > > >> >
> > > >> > And then having 3 Vms and 4 Solr instances each with its own HD.
> > > >> >
> > > >> > How do I test this?
> > > >> >
> > > >> >
> > > >> > Greetz
> > > >> >
> > > >> > On Mar 12, 2018 1:16 PM, "BlackIce" <blackice...@gmail.com>
> wrote:
> > > >> >
> > > >> > > OK, so we're gone nowhere,  since I've already lost lots of
> > > time...  A
> > > >> > few
> > > >> > > days more or less won't make a difference  I'd be willing to
> > > >> > benchmark
> > > >> > > if some tells me how to.
> > > >> > >
> > > >> > >
> > > >> > > Greetz
> > > >> > >
> > > >> > > On Mar 12, 2018 7:17 AM, "Deepak Goel" <deic...@gmail.com>
> wrote:
> > > >> > >
> > > >> > >> Now you are mixing your original question about performance
> with
> > > >> > >> reliability
> > > >> > >>
> > > >> > >> On 12 Mar 2018 02:29, "BlackIce" <blackice...@gmail.com>
> wrote:
> > > >> > >>
> > > >> > >> > Second to this wouldn't 4 Solr instances each with its own HD
> > be
> > > >> fault
> > > >> > >> > tolerant? vs. one solr instance with 4 HD's in RAID 0? Plus
> to
> > > his
> > > >> > comes
> > > >> > >> > the storage capacity, I need the capacity of those 4
> drives...
> > > the
> > > >> > more
> > > >> > >> I
> > > >> > >> > read.. the more questions
> > > >> > >> >
> > > >> > >> > On Sun, Mar 11, 2018 at 9:43 PM

Re: Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Deepak Goel
A few observations:

1. The Old Gen heap is about 6 GB occupied on 9th April, and it climbs to
9+ GB by 10th April (it steadily increases throughout the day).
2. The Old Gen GC is never able to reclaim any free memory.
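One way to confirm the second observation from a GC log is to track old-generation occupancy after each collection; if it only ever rises, the collector has nothing left to reclaim. A rough sketch in plain Python (the log lines are invented for illustration; real GC log formats vary by JVM version and flags):

```python
import re

# Hypothetical CMS-style fragments: "before->after(capacity)" in KB.
log_lines = [
    "2018-04-09T10:00:01: [CMS: 6144000K->6143800K(9437184K)]",
    "2018-04-09T18:00:01: [CMS: 7340032K->7339900K(9437184K)]",
    "2018-04-10T09:00:01: [CMS: 9227468K->9227400K(9437184K)]",
]

def old_gen_after_gc(lines):
    """Extract old-gen occupancy (KB) remaining after each collection."""
    pat = re.compile(r"CMS: \d+K->(\d+)K")
    return [int(m.group(1)) for line in lines if (m := pat.search(line))]

after = old_gen_after_gc(log_lines)
# If occupancy after each old-gen GC keeps rising, the heap is genuinely
# full of live objects (a leak, oversized caches, or an undersized heap),
# not merely slow to collect.
print(after == sorted(after) and after[-1] > after[0])  # True here
```

A steadily rising post-GC occupancy matches the symptoms described above and points at heap sizing or cache configuration rather than GC tuning.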



Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Wed, Apr 11, 2018 at 8:53 PM, Adam Harrison-Fuller <
aharrison-ful...@mintel.com> wrote:

> In addition, here is the GC log leading up to the crash.
>
> https://www.dropbox.com/s/sq09d6hbss9b5ov/solr_gc_log_
> 20180410_1009.zip?dl=0
>
> Thanks!
>
> Adam
>
> On 11 April 2018 at 16:18, Adam Harrison-Fuller <
> aharrison-ful...@mintel.com
> > wrote:
>
> > Thanks for the advice so far.
> >
> > The directoryFactory is set to ${solr.directoryFactory:solr.
> NRTCachingDirectoryFactory}.
> >
> >
> > The servers workload is predominantly queries with updates taking place
> > once a day.  It seems the servers are more likely to go down whilst the
> > servers are indexing but not exclusively so.
> >
> > I'm having issues locating the actual out of memory exception.  I can
> tell
> > that it has ran out of memory as its called the oom_killer script which
> as
> > left a log file in the logs directory.  I cannot find the actual
> exception
> > in the solr.log or our solr_gc.log, any suggestions?
> >
> > Cheers,
> > Adam
> >
> >
> > On 11 April 2018 at 15:49, Walter Underwood 
> wrote:
> >
> >> For readability, I’d use -Xmx12G instead of -XX:MaxHeapSize=12884901888.
> >> Also, I always use a start size the same as the max size, since servers
> >> will eventually grow to the max size. So:
> >>
> >> -Xmx12G -Xms12G
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >> > On Apr 11, 2018, at 6:29 AM, Sujay Bawaskar 
> >> wrote:
> >> >
> >> > What is directory factory defined in solrconfig.xml? Your JVM heap
> >> should
> >> > be tuned up with respect to that.
> >> > How solr is being use,  is it more updates and less query or less
> >> updates
> >> > more queries?
> >> > What is OOM error? Is it frequent GC or Error 12?
> >> >
> >> > On Wed, Apr 11, 2018 at 6:05 PM, Adam Harrison-Fuller <
> >> > aharrison-ful...@mintel.com> wrote:
> >> >
> >> >> Hey Jesus,
> >> >>
> >> >> Thanks for the suggestions.  The Solr nodes have 4 CPUs assigned to
> >> them.
> >> >>
> >> >> Cheers!
> >> >> Adam
> >> >>
> >> >> On 11 April 2018 at 11:22, Jesus Olivan 
> >> wrote:
> >> >>
> >> >>> Hi Adam,
> >> >>>
> >> >>> IMHO you could try increasing the heap to 20 GB (with 46 GB of
> >> >>> physical RAM, your JVM can afford a larger heap without penalties
> >> >>> due to a lack of RAM outside the heap).
> >> >>>
> >> >>> Another good one would be to increase
> >> >>> -XX:CMSInitiatingOccupancyFraction from 50 to 75. I think the CMS
> >> >>> collector works better when the Old generation space is more
> >> >>> populated.
> >> >>>
> >> >>> I usually set Survivor spaces to a smaller size. If you try setting
> >> >>> SurvivorRatio to 6, I think performance would be improved.
> >> >>>
> >> >>> Another good practice for me would be to set a static NewSize instead
> >> >>> of -XX:NewRatio=3.
> >> >>> You could try to set -XX:NewSize=7000m and -XX:MaxNewSize=7000m (one
> >> >>> third of total heap space is recommended).
> >> >>>
> >> >>> Finally, my best results after deep JVM R&D related to Solr came from
> >> >>> removing the ScavengeBeforeRemark flag and applying a new one:
> >> >>> +ParGCCardsPerStrideChunk.
> >> >>>
> >> >>> However, it would be good to set ParallelGCThreads and
> >> >>> *ConcGCThreads* to their optimal values, and we need your system's
> >> >>> CPU count to know them. Can you provide this data, please?
> >> >>>
> >> >>> Regards
> >> >>>
> >> >>>
> >> >>> 2018-04-11 12:01 GMT+02:00 Adam Harrison-Fuller <
> >> >>> aharrison-ful...@mintel.com
> >>  :
> >> >>>
> >>  Hey all,
> >> 
> >>  I was wondering if I could get some JVM/GC tuning advice to resolve
> >> an
> >>  issue that we are experiencing.
> >> 
> >>  Full disclaimer, I am in no way a JVM/Solr expert so any advice you
> >> can
> >>  render would be greatly appreciated.
> >> 
> >>  Our Solr cloud nodes are having issues throwing OOM exceptions
> under
> >> >>> load.
> >>  This issue has only started manifesting itself over the last few
> >> months
> >>  during which time the only change I can discern is an increase in
> >> index
> >>  size.  They are running Solr 5.5.2 on OpenJDK version "1.8.0_101".
> >> The
> >>  index is currently 58G and the server has 46G of physical RAM and
> >> runs
> >>  nothing other than the Solr node.
> >> 
> >>  The JVM is invoked with the following JVM options:
> >>  

Re: Performance & CPU Usage of 6.2.1 vs 6.5.1 & above

2018-04-17 Thread Deepak Goel
Please post the exact results. High CPU utilisation can often be a boon if
it improves query response times.

On Tue, 17 Apr 2018, 13:55 mganeshs,  wrote:

> Regarding query times, we couldn't see big improvements. Both are more or
> less same.
>
> Our main worry is that, why CPU usage is so high in 6.5.1 and above ?
> What's
> going wrong ?
>
> Is any one else facing this sort of issue ? If yes, how to bring down the
> CPU usage? Is there any settings which we need to set ( not default one )
> in
> 6.5.1 ?
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Performance & CPU Usage of 6.2.1 vs 6.5.1 & above

2018-04-16 Thread Deepak Goel
Do you see a performance improvement in your 'Query Times'  (6.2.1
vis-a-vis 6.5.1)?



Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Mon, Apr 16, 2018 at 3:15 PM, Bernd Fehling <
bernd.fehl...@uni-bielefeld.de> wrote:

> It would help if you can trace it down to a version change.
> Do you have a test system and start with 6.3.0 as next version above 6.2.1
> to see which version change is making you trouble?
> You can then try 6.4.0 and 6.5.0 next. And after that go into subversions.
>
> Regards, Bernd
>
>
> Am 16.04.2018 um 09:39 schrieb mganeshs:
> > Hi Bernd,
> >
> > We didn't change any default settings.
> >
> > Both 6.2.1 and 6.5.1 is running with same settings, same volume of data,
> > same code, which means indexing rate is also same.
> >
> > In Case of 6.2.1 CPU is around 60 to 70%. But in 6.5.1 it's always around
> > 95%. The CPU % in 6.5.1 is alarming for us and we keep getting alerts as
> > it's always more than 95%.
> >
> > Basically, my question is why is that in 6.2.1 CPU is low and for 6.5.1
> it's
> > very high ? I though only I am facing this issue, but one more in the
> forum
> > also raised this issue, but nothing concluded so far.
> >
> > In another thread Shawn also suggested changes wrt merge policy numbers.
> But
> > CPU % didn't come down. But in 6.2.1 with default settings itself, it
> works
> > fine and CPU is also normal. So created new thread to discuss wrt CPU
> > utilization between old version (6.2.1 ) and new version (6.5.1+)
> >
> > Regards,
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >
>


Re: Performance & CPU Usage of 6.2.1 vs 6.5.1 & above

2018-04-19 Thread Deepak Goel
Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Thu, Apr 19, 2018 at 9:23 AM, mganeshs <mgane...@live.in> wrote:

> Hello Deepak,
>
> We are not querying when indexing is going on. Whatever CPU graph I shared
> for 6.2.1 and 6.5.1 was only while we do batch indexing. During that time
> we
> don't query and no queries are getting executed.
>
> We index in a batch with a rate of around 100 documents / sec.


Is the batch rate the same in both cases (6.2.1 & 6.5.1)?


> And it's not
> so high too. But same piece of code and same config, with 6.2.1 CPU is
> normal and in 6.5.1 it always stays above 90% or 95%.
>
> @Solr Experts,
>
> From one of the thread by " Yasoob
> <http://lucene.472066.n3.nabble.com/CommitScheduler-Thread-blocked-due-to-
> excessive-number-of-Merging-Threads-tp4353964p4354334.html>
> " it's mentioned as
>
> /I compared the source code for the two versions and found that different
> merge functions were being used to merge the postings. In 5.4, the default
> merge method of FieldsConsumer class was being used. While in 6.6, the
> PerFieldPostingsFormat's merge method is being used. I checked and it
> looks
> like this change went in Solr 6.3. So I replaced the 6.6 instance with
> 6.2.1
> and re-indexed all the data, and it is working very well, even with the
> settings I had initially used. /
>
> Is anyone else facing this issue or any fixes got released in future build
> for this ?
>
> Keep us posted
>
>
> Deepak Goel wrote
> > Please post the exact results. Many a times the high cpu utilisation may
> > be
> > a boon as it improves query response times
> >
> > On Tue, 17 Apr 2018, 13:55 mganeshs, 
>
> > mganeshs@
>
> >  wrote:
> >
> >> Regarding query times, we couldn't see big improvements. Both are more
> or
> >> less same.
> >>
> >> Our main worry is that, why CPU usage is so high in 6.5.1 and above ?
> >> What's
> >> going wrong ?
> >>
> >> Is any one else facing this sort of issue ? If yes, how to bring down
> the
> >> CPU usage? Is there any settings which we need to set ( not default one
> )
> >> in
> >> 6.5.1 ?
> >>
> >>
> >>
> >> --
> >> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >>
>
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Some performance questions....

2018-03-24 Thread Deepak Goel
Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Sat, Mar 24, 2018 at 6:03 AM, Rick Leir  wrote:

>
>
> Deep,
> What is the test so I can try it.
>
>
*The test goal now, according to me, is to check:*

'How does Solr scale up on a single server (with varying OS if possible -
Linux, Windows) at 25%, 50%, 75%, 100% utilisation?'

*The original question from the Author was:*

Let's say I have a dual CPU with a total of 8 cores and 24 GB RAM for my
Solr and some other stuff.

Would it be more beneficial to only run 1 instance of Solr with the
collection stored on 4 HD's in RAID 0?? Or Have several Virtual
Machines each running of its own HD, ie: Have 4 VM's running Solr?


> 75 or 90 ms .. is that the JVM startup time?
>

This is the time taken by my code to create a 'Client Object' in Solr
in the Windows environment


> Cheers -- Rick
> >>
> >>
> >I have stated the numbers which I found during my test. The best way to
> >verify them is for someone else to run the same test. Otherwise I don't
> >see
> >how we can verify the results
>
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>


Re: Some performance questions....

2018-03-24 Thread Deepak Goel
Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Sat, Mar 24, 2018 at 5:16 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 3/23/2018 11:31 AM, Deepak Goel wrote:
> > Do you have any specific questions about the benchmark setup?
>
> How many docs are in the Solr index?  How much disk space does it
> consume?  How much total memory is in the machine?  How much memory is
> allocated to Java heaps?  Is there any other software running besides
> the Solr server and the benchmark program?  If it's a virtual machine,
> do you know anything about how many virtual machines are on the physical
> hardware, and whether resources are oversubscribed on the physical
> hardware?
>
> > I have stated the numbers which I found during my test. The best way to
> > verify them is for someone else to run the same test. Otherwise I don't
> see
> > how we can verify the results
>
> You have provided a code fragment, not complete code that can be used to
> compile exactly what you're running.  There is no information about
> exactly what you're doing with JMeter.  There are no version numbers for
> any of the software that you're using.  When I look at what's available,
> I don't have enough information to replicate your test.
>
> Your code fragment has a hard-coded query in it.  Running the same query
> over and over won't provide meaningful results, and definitely shouldn't
> show an average query time of nearly 1.5 seconds.
>
>
Please check the section *Questions from ‘Around the World’* in the
following doc for answers to your questions:

https://docs.google.com/document/d/1ZwyveG-Zjy7tbsvh9xjMug4bnIMRqKnNax3jh4GJlzM/edit?usp=sharing


Thanks,
> Shawn
>
>


Re: Some performance questions....

2018-03-24 Thread Deepak Goel
Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Sat, Mar 24, 2018 at 6:18 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 3/23/2018 1:13 PM, Deepak Goel wrote:
> > Yes I am now creating a client object only once. On Linux it has superb
> > results (performance improves by around two times). However on Windows it
> > has no improvement
> >
> > *SoftwareThroughput (/sec)Response Time (msec)Utilization (%CPU)UnTuned
> > (Windows)27.8142665UnTuned (Linux)34528091Partially Tuned with Shawn's
> > suggestions (Linux)56417290Partially Tuned with Shawn's suggestions
> > (Windows)28.11.10560*
>
> This information is unreadable.  All the whitespace between the columns
> is missing.
>
> Please check this document
https://docs.google.com/document/d/1ZwyveG-Zjy7tbsvh9xjMug4bnIMRqKnNax3jh4GJlzM/edit?usp=sharing


> Thanks,
> Shawn
>
>


Re: Some performance questions....

2018-03-24 Thread Deepak Goel
Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Sun, Mar 25, 2018 at 4:00 AM, Shawn Heisey <elyog...@elyograg.org> wrote:

> On 3/24/2018 1:25 PM, Deepak Goel wrote:
>
>> Please check the section *Questions from ‘Around the World’* in the
>> following doc for answers to your questions:
>>
>> *https://docs.google.com/document/d/1ZwyveG-Zjy7tbsvh9xjMug4
>> bnIMRqKnNax3jh4GJlzM/edit?usp=sharing
>>
>
> The document says that 80 percent of the time it's the same query and 20
> percent it's a different one.  But the code does not have any facility for
> changing the query, as far as I can see.  It appears to be always the same.
>
>
My first test was with static queries: does Solr scale up as we increase
the load of the same query?

The second test would be to check with 'Different Queries'.

And then finally check with 80% similar queries and 20% different queries.
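The 80/20 mix can be generated with a couple of lines; a sketch of the idea (the query terms are made-up placeholders, not terms from my test index):

```java
import java.util.Random;

// Sketch of the 80/20 query mix described above: 80% of requests reuse
// one hot query and 20% draw from a pool of distinct terms, so caches
// are exercised without every request being identical. The terms are
// made-up placeholders.
public class QueryMixSketch {

    static String nextQuery(Random rnd, String hot, String[] pool) {
        // 80% hot query, 20% a random "cold" term from the pool
        return rnd.nextDouble() < 0.8 ? hot : pool[rnd.nextInt(pool.length)];
    }

    public static void main(String[] args) {
        String[] pool = {"guitar", "fiddle", "mandolin", "dulcimer"};
        Random rnd = new Random();
        for (int i = 0; i < 10; i++) {
            System.out.println(nextQuery(rnd, "banjo", pool));
        }
    }
}
```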


> If the query is always the same, or if it's the same 80 pecent of the
> time, I would expect response time on the vast majority of the queries to
> be about one to five milliseconds


Do you have any documented proof of this (1 to 5 ms), or is it an
educated guess?


> , no matter how big the index is, but your document says it's 280 on
> Linux, and 1426 on Windows.
>
>
At peak load on Linux, the response time is 172 ms. If I halve the load,
the response time is around 50 ms


> If all settings such as heap are at their defaults, then I suspect you may
> be running Solr with a heap size that's FAR too small.  If this is what's
> happening, then the JVM is going to be spending a very large amount of time
> performing garbage collection, instead of running the application.
>
>
I don't think the JVM heap is a problem, but I will bump it up and test
again


> The default heap size when starting Solr using the included scripts is 512
> megabytes.  This is VERY small, to ensure that Solr will successfully start
> on any system.  Nearly all users must increase the heap size before they go
> to production.  I would set it to 2GB for your index.  If starting Solr
> with the bin\solr or bin/solr command, add a "-m 2g" parameter to the start
> command. 2GB should be a lot more than Solr needs to handle that index, but
> it isn't a HUGE amount.  Be aware that you may need to adjust the heap size
> for your Tomcat installation, and possibly JMeter as well, to be sure that
> those processes are allocating reasonable amounts of memory.


I don't think Tomcat and JMeter are a bottleneck, but I will bump up
their heap sizes too
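One way to rule the heap out is to check what the JVM actually received, since an -Xmx (or bin/solr -m) flag that did not take effect looks exactly like a tuning change that did not help. A generic stdlib check, not Solr-specific:

```java
// Quick stdlib check of the heap the JVM actually got; useful after
// changing heap flags, because a setting that silently failed to apply
// is indistinguishable from a tuning change that made no difference.
public class HeapCheck {

    static long maxHeapMb() {
        // maxMemory() reports the -Xmx ceiling the JVM is running with
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("max heap MB: " + maxHeapMb());
    }
}
```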


> I do not know what the recommended sizes for those programs will be, you
> would need to ask those communities.
>
>
The problem I am facing: on Windows the tps is 28, while on Linux it is
564 (the configuration and hardware are the same). The other problem is
that even when there is plenty of hardware available, the Windows
environment does not scale, and I wonder why that is.


> Thanks,
> Shawn
>
>


Re: Some performance questions....

2018-03-24 Thread Deepak Goel
On 25 Mar 2018 6:49 am, "Shawn Heisey" <apa...@elyograg.org> wrote:

On 3/24/2018 6:21 PM, Deepak Goel wrote:

> Do you have any documented proof of the same (1 to 5ms)? Or is it an
> educated guess
>

Just now, I did a test.  I did a "*:*" query (all docs), the QTime was 194
milliseconds, numFound was 188635489.  Then I did the exact same query
again.  QTime dropped to 39 milliseconds.

Next, I did a query for "banjo" ... something I don't think a lot of people
are searching for.  The QTime on this was 2395 milliseconds, numFound was
737280.  Running the same query again, QTime dropped to 11 milliseconds.
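The two-step pattern above (2395 ms cold, 11 ms warm) is query result caching at work. A minimal LRU cache sketch of the idea using a stdlib LinkedHashMap; this is an illustration only, not Solr's actual queryResultCache implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch illustrating why a repeated query is cheap:
// the first lookup misses and pays the full search cost, the second
// hits and returns the cached result. Illustration only, not Solr's
// actual queryResultCache implementation.
public class QueryCacheSketch {
    static final int MAX_ENTRIES = 512;

    // access-order LinkedHashMap evicts the least recently used entry
    static final Map<String, String> cache =
        new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> e) {
                return size() > MAX_ENTRIES;
            }
        };

    static String search(String q) {
        String hit = cache.get(q);
        if (hit != null) {
            return hit;                       // warm: served from memory
        }
        String result = "docs-for:" + q;      // cold: would run the real query
        cache.put(q, result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(search("banjo"));  // miss: pays full query cost
        System.out.println(search("banjo"));  // hit: answered from the cache
    }
}
```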


I believe you ran this query with a single-user load, or was it a
multi-user load test? If it was a multi-user load test, how many users
did you test with, and what were the utilisations and tps?


My index is big and distributed.  Your index is very small, and likely
contained in one core, so it should have far better performance than my
index.


I dont think Tomcat and Jmeter are a bottleneck. But I will bump up the
> heap size of them too
>

I was actually thinking that if these are run *without* a max heap setting,
that you might want to explicitly set the heap size so that it's not too
big.  Those programs probably don't need a very big heap at all.  If Java
were to choose a big default heap size, the server might start swapping,
and that would REALLY make performance bad, especially on Windows.


The problem I am facing: On Windows, the tps is 28 while on Linix, the tps
> is 564 (All the configuration and hardware is same). The other problem is,
> Even if there is plenty of hardware available, the Windows environment does
> not scale. And I wonder why is this so?
>

My first guess would be the 512MB heap, possibly causing even more problems
on Windows.

And then there's my general bias against Microsoft.  I have witnessed
deficiencies in their memory management, their filesystem performance, and
other things.  Linux just does a better job in almost every category that I
care about for a server.

Which version of Windows are you running it on?  You would only want to do
a test like this on a Server OS, and I'd hope that it's at least Server
2008.  The client operating systems do not handle server programs very
well.  And it should be a 64-bit OS, with 64-bit Java.

Thanks,
Shawn


Re: Some performance questions....

2018-03-25 Thread Deepak Goel
Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Sun, Mar 25, 2018 at 2:24 PM, Shawn Heisey  wrote:

> On 3/25/2018 1:45 AM, Shawn Heisey wrote:
>
>> I have written a little test program that can pound the system harder,
>> need a little more time to gather what I learned with it.
>>
>
> Here's the code and three results with different threadcounts:
>
> https://gist.github.com/elyograg/abedf4ae28467059e46781f7d474f379
>
> I ran the program several times while writing it.  Once I had it finished,
> I did the 20 thread run first, then the 100 thread run, and then the 200
> thread run.  Gist re-ordered my files, wasn't expecting that.
>
>
$ Why is the 'qps' not increasing as the number of threads increases? (If
I understand the qps parameter correctly.)

$ Is it possible to run with 10 & 5 & 2 threads?

$ What was the server utilisation (CPU, memory) when you ran the test?

$ The 'query median' increases from 35 to 470 as you increase threads from
20 to 200 (you had mentioned earlier that the QTime for the 'banjo' query
was 11 ms when you hit it the second time around)

$ Can you please give the Linux server configuration, if possible?


> It was executed inside eclipse on a Windows 7 system.  The Solr servers
> are running Linux.  This is a distributed index with 7 total shards running
> on two servers.  The "shards" parameter is defined on the server side in
> the 'ncmain' core, which has an empty index.  The servers are NOT running
> in SolrCloud mode.
>
> As you can see in the code, I was using exactly the same query every time
> -- that "banjo" query that I tried earlier.
>
> I have to try and remember how to build a simple program like this on the
> commandline before I can try it in Linux.  I don't know if it would see a
> performance improvement running on Linux.
>
> Thanks,
> Shawn
>
>


Re: Some performance questions....

2018-03-25 Thread Deepak Goel
Some observations:

*#* The CPU load on idxa1 mostly never crosses the 91% mark even if you
increase the load (by increasing the number of threads). This matches my
environment: I can never cross 90% on Linux even if I increase the load,
and on Windows I can never cross 65% for some reason

*#* Similarly the CPU Load on idxa2 never crosses 50% (I guess this follows
from the above point)

*#* Your system saturates at 10 threads (The qps hits the highest mark at
this load). Increasing the load further (number of threads - 20, 100, 200)
only worsens the response time, while the qps remains the same

*#* The Query-Time is anywhere between 25-100ms. For 200 threads, the
Query-Time is between 500-1400ms. This is for a load of 'Static-Query'.

A 'Dynamic-Query' load would only worsen the Query-Time (It will also
probably bring down the qps and max-cpu-utilisation)

*#* The author has a similar hardware configuration as yours (idxa1). The
author has not specified the OS though.

If it is Windows, then I believe it might be a good idea to have 2 VMs
on his box

If it is Linux, it might be better to decide once someone runs the test
with a Dynamic-Query load. If the author's load is Static-Query, then
one VM on his box should be fine, as 90% of CPU resources can be
consumed (however, he would lose some reliability and availability
compared to 2 VMs)

Some other points:

*@* I would have liked to see the vmstat information for 10, 5, 7, and 8
threads

*@* Also, could you run the test for 7 and 8 threads? (Because at 10
threads the system saturates and at 5 threads the load is light)

*@* Can you please also do a load test for Dynamic-Queries with 5-10
threads? (I am sorry for asking for too much; please ignore these
requests if they are too much.) I will do the same in my environment
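The saturation pattern in these observations (qps flat beyond 10 threads while response time grows) is what Little's law predicts. A small sketch with hypothetical numbers; the 25 ms service time and 400 qps ceiling are stand-ins chosen to mirror the shape of the results, not measurements from this thread:

```java
// Little's law sketch of the saturation pattern above: below the
// server's ceiling, adding threads raises qps at constant latency; at
// the ceiling, qps stays flat and latency = threads / qps grows
// linearly. The service time and ceiling are hypothetical stand-ins.
public class SaturationSketch {

    static double qps(int threads, double serviceSec, double capacityQps) {
        // each thread can issue 1/serviceSec queries per second,
        // until the server's ceiling is reached
        return Math.min(threads / serviceSec, capacityQps);
    }

    static double latencyMs(int threads, double serviceSec, double capacityQps) {
        // Little's law: concurrency = throughput * latency
        return 1000.0 * threads / qps(threads, serviceSec, capacityQps);
    }

    public static void main(String[] args) {
        double serviceSec = 0.025; // hypothetical 25 ms per query
        double ceiling = 400.0;    // hypothetical server ceiling in qps
        for (int threads : new int[] {5, 10, 20, 100, 200}) {
            System.out.printf("threads=%3d  qps=%.0f  latency=%.0f ms%n",
                    threads, qps(threads, serviceSec, ceiling),
                    latencyMs(threads, serviceSec, ceiling));
        }
    }
}
```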



Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Sun, Mar 25, 2018 at 9:45 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 3/25/2018 7:15 AM, Deepak Goel wrote:
>
>> $ Why is the 'qps' not increasing with increase in threads? (If I
>> understand the qps parameter right?)
>>
>
> Likely because I sent all these queries to a single copy of the index.  We
> only have two copies of the index in production, plus a third copy on a dev
> server running a newer version of Solr. I sent the queries from the test
> program to the production server pair that's designated "standby" -- not
> receiving queries unless the other pair is down.
>
> Our Solr servers do not handle a high query load.  It's usually less than
> two queries per second.
>
> Handling a very high query load requires load balancing to multiple copies
> of the index (replicas in SolrCloud terminology). We don't need that, so we
> don't have a bunch of copies.  The only reason we have two copies is so we
> can handle hardware failure gracefully.  I bypassed the load balancer for
> these tests.
>
> $ Is it possible to run with 10 & 5 & 2 threads?
>>
>
> Sure.
>
> I have updated the gist with those results.
>
> https://gist.github.com/elyograg/abedf4ae28467059e46781f7d474f379
>
> $ What were the server utilisation (CPU, Memory) when you ran the test?
>>
>
> I actually never looked when I was running the tests before.  I ran
> additional tests so I could gather that data.  The updated gist has vmstat
> information (while running a 20 thread test, and while running a 200 thread
> test) for the server side. The server named idxa1 has a higher CPU load
> because it is aggregating the shard data and replying to the query, in
> addition to serving three out of the seven shards.  The server named idxa2
> has four shards.  The extra shard on idxa2 is very small - a little over
> 321000 docs, a little over 500MB disk used.  This is where new docs are
> written.
>
> The CPU load on idxa2 is similar for both thread levels.  I this is
> because all queries are served from cache.  But idxa1 shows a higher load,
> because even when the cache is used, that server must still aggregate the
> shard data (which was pulled from cache) and create responses.  The
> aggregation is not cached, because Solr has no way to know that what it is
> receiving from the shards is cached data.
>
> Here's the benchmark output from the 200 thread test when I was getting
> the CPU information:
>
> query count: 20
> elapsed count: 20
> query median: 488.0
> elapsed median: 500.0
> query 75th: 674.0
> elapsed 75th: 686.0
> query 95th: 1006.0
> elapsed 95th: 1018.0
> query 99th: 1283.01
> elapsed 99th: 1299.0
> total time in seconds: 542
> numThreads: 200
> queries per thread: 1000
&
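Percentile summaries like the "query median / 75th / 95th / 99th" lines above are a simple reduction over the recorded per-query latencies. A nearest-rank sketch; the sample latencies are made up for illustration:

```java
import java.util.Arrays;

// Nearest-rank percentile sketch: the kind of reduction a benchmark
// client applies to its recorded per-query latencies before printing
// "median / 75th / 95th / 99th" summaries like the output above.
public class Percentiles {

    static double percentile(long[] sorted, double p) {
        // nearest-rank method: the ceil(p/100 * N)-th value, 1-indexed
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long[] latenciesMs = {12, 15, 18, 20, 25, 40, 55, 90, 300, 1200};
        Arrays.sort(latenciesMs);
        System.out.println("median: " + percentile(latenciesMs, 50));
        System.out.println("95th:   " + percentile(latenciesMs, 95));
    }
}
```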

Re: Some performance questions....

2018-03-16 Thread Deepak Goel
> That benchmark is on Windows, so not interesting for most of us.

I guess I must have missed this in the author's question. Did he describe
his OS?

Also, other applications scale well on Windows; why would Solr be
different? The Solr documentation does not mention any performance
limits on Windows (shouldn't it say so upfront if there were any?)

https://lucene.apache.org/solr/guide/6_6/installing-solr.html#got-java
(You can install Solr in any system where a suitable Java Runtime
Environment (JRE) is available, as detailed below. Currently this includes
Linux, OS X, and Microsoft Windows.)

> Windows has very different handling for threads, memory, and files
compared to Unix. I had to do a lot of Windows-specific tuning for
Ultraseek Server to get decent performance. For example, merge speed was
terrible unless I opened files with a Windows-specific caching hint.





Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"
On Fri, Mar 16, 2018 at 9:43 PM, Walter Underwood <wun...@wunderwood.org>
wrote:

> On Mar 16, 2018, at 6:38 AM, Deepak Goel <deic...@gmail.com> wrote:
> >
> > I did a performance study of Solr a while back. And I found that it does
> > not scale beyond a particular point on a single machine (could be due to
> > the way its coded). Hence multiple instances might make sense.
> >
> > https://docs.google.com/document/d/1kUqEcZl3NhOo6SLklo5Icg3fMnn9O
> tLY_lwnc6wbXus/edit?usp=sharing <https://docs.google.com/document/d/
> 1kUqEcZl3NhOo6SLklo5Icg3fMnn9OtLY_lwnc6wbXus/edit?usp=sharing>
> >
> > ***Deepak***
>
> That benchmark is on Windows, so not interesting for most of us.
>
> Windows has very different handling for threads, memory, and files
> compared to Unix. I had to do a lot of Windows-specific tuning for
> Ultraseek Server to get decent performance. For example, merge speed was
> terrible unless I opened files with a Windows-specific caching hint.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>


Re: Some performance questions....

2018-03-16 Thread Deepak Goel
> On Mar 16, 2018, at 6:26 AM, Deepak Goel <deic...@gmail.com> wrote:
>
> I would try multiple Solr instances rather a single Solr instance (it
> definitely will give a performance boost)
> I would avoid multiple Solr instances on single machine. I can use all 36
cores on our servers with one Solr process.

Is your load scaling linearly? Can you please post the results?




Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Fri, Mar 16, 2018 at 9:39 PM, Walter Underwood <wun...@wunderwood.org>
wrote:

> > On Mar 16, 2018, at 6:26 AM, Deepak Goel <deic...@gmail.com> wrote:
> >
> > I would try multiple Solr instances rather a single Solr instance (it
> > definitely will give a performance boost)
>
>
> I would avoid multiple Solr instances on single machine. I can use all 36
> cores on our servers with one Solr process.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>


Re: Some performance questions....

2018-03-16 Thread Deepak Goel
On Fri, Mar 16, 2018 at 6:03 PM, Shawn Heisey  wrote:

> On 3/15/2018 6:34 AM, BlackIce wrote:
>
>> However the main app that will be
>> running is more or less a single threated app which takes advantage when
>> run under several instances, ie: parallelism, so I thought, since I'm at
>> it
>> I may give solr a few instances as well
>>
>
> ***Deepak***

I did a performance study of Solr a while back and found that it does
not scale beyond a particular point on a single machine (possibly due to
the way it is coded). Hence multiple instances might make sense.

https://docs.google.com/document/d/1kUqEcZl3NhOo6SLklo5Icg3fMnn9OtLY_lwnc6wbXus/edit?usp=sharing

***Deepak***



> Solr is a fully threaded app, capable of doing LOTS of things at the same
> time, without multiple instances.
>
> Thnx for the Heap pointer.. I've read, from some Professor.. that Solr
>> actually is more efficient with a very small Heap and to have everything
>> mapped to virtual memory... Which brings me to the next question.. is the
>> Virtual memory mapping done by the OS or Solar? Does the Virtual memory
>> reside on the OS HDD? Or on the Solr HDD?.. and if the Virtual memory
>> mapping is done on the OS HDD, wouldn't it be beneficial to run the OS off
>> a SSD?
>>
>
> ***Deepak***
If you have little RAM (I am assuming that is what you mean by a small
heap), then the OS will do swapping or demand paging to manage your
memory requirements. An SSD will help; however, it is better to have
more RAM than to rely on an SSD.
***Deepak***

> There appears to be some confusion here.
>
> The virtual memory doesn't reside on ANY hard drive, unless you've REALLY
> configured the system badly and the system starts using swap space.  If the
> system starts using swap, performance is going to be terrible, no matter
> how fast the disk where swap resides is.
>
> The "mapping to virtual memory" feature is something the operating system
> does.  Lucene/Solr utilizes MMAP code in Java, which then turns around and
> uses MMAP functionality provided by the OS.
>
> At that point, that file can be accessed by the application as if it were
> a very large block of memory.  Mapping the file doesn't immediately use any
> memory at all.  The OS manages the access to the file.  If the part of the
> file that is being accessed has not been accessed before, then the OS will
> read the data off the disk, place it into the OS disk cache, and provide it
> to whatever requested it.  If it has been accessed before and is still in
> the disk cache, then it won't read the disk, it will just provide the data
> from the cache.  Getting most data from cache is *required* for good Solr
> performance.
>
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
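The mapping behaviour described here can be seen with plain JDK APIs. A small sketch using FileChannel.map, the same OS mechanism that Lucene's MMapDirectory builds on, though this is not Lucene code:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Stdlib sketch of the memory mapping described above: the file is
// mapped into the process address space and read as if it were a byte
// array, with the OS page cache (not the JVM heap) holding the data.
// This shows the mechanism only; it is not Lucene's MMapDirectory.
public class MmapDemo {

    static String demo() throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".bin");
        try {
            Files.write(file,
                    "hello from the page cache".getBytes(StandardCharsets.UTF_8));
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // Mapping consumes address space, not heap; the OS loads
                // pages lazily on first access.
                MappedByteBuffer buf =
                        ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                byte[] bytes = new byte[buf.remaining()];
                buf.get(bytes);
                return new String(bytes, StandardCharsets.UTF_8);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```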
> Running with your indexes on SSD might indeed help performance, and
> regardless of anything that's going on, WILL help performance in the short
> term, when you first turn the machine on.  But if it also helps with
> long-term query performance, then chances are that the machine doesn't have
> enough memory.When Solr servers are sized correctly, running on SSD is
> typically not going to make a big difference, unless the machine does a lot
> more indexing than querying.
>
> For now.. my FEELING is to run one Solr instance on this particular
>> machine.. by the time the RAM is outgrown add another machine and so
>> forth...
>>
>
> Any plans you have for a growth strategy with multiple Solr instances are
> extremely likely to still be possible with only one instance, with very
> little change.
>
> Thanks,
> Shawn







Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"


Re: Some performance questions....

2018-03-16 Thread Deepak Goel
On Sat, Mar 17, 2018 at 1:06 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 3/16/2018 7:38 AM, Deepak Goel wrote:
> > I did a performance study of Solr a while back. And I found that it does
> > not scale beyond a particular point on a single machine (could be due to
> > the way its coded). Hence multiple instances might make sense.
> >
> > https://docs.google.com/document/d/1kUqEcZl3NhOo6SLklo5Icg3fMnn9O
> tLY_lwnc6wbXus/edit?usp=sharing
>
> How did you *use* that code that you've shown?  That is not apparent (at
> least to me) from the document.
>
> If every usage of the SolrJ code went through ALL of the code you've
> shown, then it's not done well.  It appears that you're creating and
> closing a client object with every query.  This will be VERY inefficient.
>
> The client object should be created during an initialization step, and
> then passed to the benchmark step to be used there.  One client object
> can be used by many threads.


I wanted to test the maximum number of connections Solr can handle
concurrently. Also, I would have to implement 'connection pooling' of
client objects rather than use a single connection thread.

However, a single client object with thousands of queries coming in would
surely become a bottleneck. I can test this scenario too.
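On the single-client question: a shared client is the intended pattern, since it is thread-safe and pools connections internally. A runnable sketch of that pattern, using the stdlib java.net.http.HttpClient as a stand-in for SolrJ's HttpSolrClient and a tiny local HTTP server standing in for Solr:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// One shared, thread-safe client serving many worker threads: the
// pattern SolrJ's HttpSolrClient is designed for. The stdlib
// java.net.http.HttpClient stands in for SolrJ, and a tiny local HTTP
// server stands in for Solr, so the sketch runs on its own.
public class SharedClientDemo {

    static int runQueries(int queries, int threads) throws Exception {
        // Local stand-in for a Solr /select endpoint.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/solr/select", exchange -> {
            byte[] body = "{\"numFound\":0}".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        URI uri = URI.create("http://127.0.0.1:"
                + server.getAddress().getPort() + "/solr/select");

        // Created once and shared: the client manages its own connection
        // pool, so creating a client per request is unnecessary overhead.
        HttpClient client = HttpClient.newHttpClient();
        AtomicInteger ok = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int i = 0; i < queries; i++) {
                pool.submit(() -> {
                    HttpRequest req = HttpRequest.newBuilder(uri).build();
                    HttpResponse<String> rsp = client.send(
                            req, HttpResponse.BodyHandlers.ofString());
                    if (rsp.statusCode() == 200) {
                        ok.incrementAndGet();
                    }
                    return null;
                });
            }
            pool.shutdown();
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } finally {
            server.stop(0);
        }
        return ok.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("successful queries: " + runQueries(100, 8));
    }
}
```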

Very likely the ES client works the same,
> but you'd need to ask them to be sure.
>
> That code seems to be doing an identical query on every run.  If that's
> what's happening, it's not a good indicator of performance.  Running the
> same query over and over will show better performance than you can
> expect from a real-world query load

> What evidence do you see that Solr isn't scaling like you expect?

The problem is that the maximum throughput I can get on the machine is
around 28 tps even when I increase the load further, and only 65% of the
CPU is utilised (35% is still unused). This suggests the software is the
bottleneck, as there are spare hardware resources.

Also, I will soon have a Linux environment, so I can run the test from
the document on Linux too (for users interested in Linux rather than
Windows)


> Thanks,
> Shawn
>
>


Re: Some performance questions....

2018-03-16 Thread Deepak Goel
On Sat, Mar 17, 2018 at 3:11 AM, Walter Underwood <wun...@wunderwood.org>
wrote:

> > On Mar 16, 2018, at 1:21 PM, Deepak Goel <deic...@gmail.com> wrote:
> >
> > However a single client object with thousands of queries coming in would
> > surely become a bottleneck. I can test this scenario too.
>
> No it isn’t. The single client object is thread-safe and manages a pool of
> connections.
>
> Your benchmark is probably the bottleneck. I have no problem driving 36
> CPUs to beyond
> 65% utilization with a benchmark.
>
>
Can you please post results of your test?

Please tell us the tps at 25%, 50%, 75%, 100% of your CPU resource


> Using one client object is not a scenario. It is how SolrJ was designed to
> be used.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>


Re: Some performance questions....

2018-03-16 Thread Deepak Goel
On Sat, Mar 17, 2018 at 2:56 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 3/16/2018 2:21 PM, Deepak Goel wrote:
> > I wanted to test how many max connections can Solr handle concurrently.
> > Also I would have to implement an 'connection pooling' of the
> client-object
> > connections rather than a single connection thread
> >
> > However a single client object with thousands of queries coming in would
> > surely become a bottleneck. I can test this scenario too.
>
> Handling thousands of simultaneous queries is NOT something you can
> expect a single Solr server to do.  It's not going to happen.  It
> wouldn't happen with ES, either.  Handling that much load requires load
> balancing to a LOT of servers.  The server would much more of a
> bottleneck than the client.
>

The problem is not the server in my case; the server has spare hardware
resources. It's the software that is the problem.


>
> > The problem is the max throughput which I can get on the machine is
> around
> > 28 tps, even though I increase the load further & only 65% CPU is
> utilised
> > (there is still 35% which is not being used). This clearly indicates the
> > software is a problem as there is enough hardware resources.
>
> If your code is creating a client object before every single query, that
> could be part of the issue.  The benchmark code should be using the same
> client for all requests.  I really don't know how long it takes to
> create HttpSolrClient objects, but I don't imagine that it's super-fast.
>
>
It takes less than 100 ms to create an HttpSolrClient object


> What version of SolrJ were you using?
>

Solr 7.2.0


> Depending on the SolrJ version you may need to create the client with a
> custom HttpClient object in order to allow it to handle plenty of
> threads.  This is how I create client objects in my SolrJ code:
>
>   RequestConfig rc = RequestConfig.custom().setConnectTimeout(2000)
> .setSocketTimeout(6).build();
>   CloseableHttpClient httpClient =
> HttpClients.custom().setDefaultRequestConfig(rc).setMaxConnPerRoute(1024)
> .setMaxConnTotal(4096).disableAutomaticRetries().build();
>
>   SolrClient sc = new HttpSolrClient.Builder().withBaseSolrUrl(solrUrl)
> .withHttpClient(httpClient).build();
>
>
I can give the above configuration a spin and test if the results improve


> Thanks,
> Shawn
>
>


Re: Some performance questions....

2018-03-21 Thread Deepak Goel
Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Mon, Mar 19, 2018 at 2:40 AM, Walter Underwood <wun...@wunderwood.org>
wrote:

> > On Mar 17, 2018, at 3:23 AM, Deepak Goel <deic...@gmail.com> wrote:
> >
> > Sorry for being rude. But the ' results ' please, not the ' road to the
> > results '
>
> We have 15 different search collections, all different sizes and all with
> different kinds of queries. Here are the two major ones.
>
> 22 million docs
> 32 server Solr Cloud cluster, EC2 c4.8xlarge instances (36 CPU, 59 GB RAM)
> Solr 6.6.2
> 4 shards
> 24,000 requests/minute
> 95th percentile query response time 5 to 7 seconds
>
> 250,000 docs
> 4 server Solr master/slave cluster, EC2 c4.4xlarge (16 CPU, 30 GB RAM)
> Solr 4.10.4
> 60,000 requests/minute
> 95th percentile 100 ms
>
This does not help at all. If you look at the author's question, I think it
is about a single server. You would have to post your results for a single
server at 25%, 50%, 75%, and 100% CPU (that is, how the server scales with
increasing load)


> That should make everything crystal clear.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>



Re: Some performance questions....

2018-03-21 Thread Deepak Goel

On Sat, Mar 17, 2018 at 2:56 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 3/16/2018 2:21 PM, Deepak Goel wrote:
> > I wanted to test how many max connections can Solr handle concurrently.
> > Also I would have to implement an 'connection pooling' of the
> client-object
> > connections rather than a single connection thread
> >
> > However a single client object with thousands of queries coming in would
> > surely become a bottleneck. I can test this scenario too.
>
> Handling thousands of simultaneous queries is NOT something you can
> expect a single Solr server to do.  It's not going to happen.  It
> wouldn't happen with ES, either.  Handling that much load requires load
> balancing to a LOT of servers.  The server would be much more of a
> bottleneck than the client.
>
> > The problem is the max throughput which I can get on the machine is
> around
> > 28 tps, even though I increase the load further & only 65% CPU is
> utilised
> > (there is still 35% which is not being used). This clearly indicates the
> > software is a problem as there are enough hardware resources.
>
> If your code is creating a client object before every single query, that
> could be part of the issue.  The benchmark code should be using the same
> client for all requests.  I really don't know how long it takes to
> create HttpSolrClient objects, but I don't imagine that it's super-fast.
>
> What version of SolrJ were you using?
>
> Depending on the SolrJ version you may need to create the client with a
> custom HttpClient object in order to allow it to handle plenty of
> threads.  This is how I create client objects in my SolrJ code:
>
>   RequestConfig rc = RequestConfig.custom().setConnectTimeout(2000)
> .setSocketTimeout(60000).build();
>   CloseableHttpClient httpClient =
> HttpClients.custom().setDefaultRequestConfig(rc).setMaxConnPerRoute(1024)
> .setMaxConnTotal(4096).disableAutomaticRetries().build();
>
>   SolrClient sc = new HttpSolrClient.Builder().withBaseSolrUrl(solrUrl)
> .withHttpClient(httpClient).build();
>
I tried the above suggestion. The throughput and utilisation remain the
same (they don't increase even if I increase the load). The response time
comes down.







Software                    Throughput (/sec)   Response Time (msec)   Utilization (%CPU)
UnTuned (Windows)           27.8                1426                   65
UnTuned (Linux)             -                   -                      -
Partially Tuned (Linux)     -                   -                      -
Partially Tuned (Windows)   28.1                1105                   60

I am going to give your suggestion a spin on Linux next (this might take a
day or two)



> Thanks,
> Shawn
>
>



Re: Some performance questions....

2018-03-23 Thread Deepak Goel

On Fri, Mar 23, 2018 at 11:38 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 3/23/2018 11:21 AM, Deepak Goel wrote:
> >> I tried the above suggestion. The throughput and utilisation remain the
> >> same (they don't increase even if I increase the load). The response time
> >> comes down.
> >>
>
> Are you still creating a new client object for every query?  Changing
> how the client object is created won't improve anything if you're still
> making a new one every time.
>
> You're going to need to move the client creation somewhere else in your
> code that only gets run once at startup, and then use the already-built
> client object in the code that does the query.  The different way of
> creating the client object that I gave you will ensure that it is
> actually capable of running concurrently with many threads. (With some
> older versions, this is not guaranteed)
>
>
Yes, I am now creating a client object only once. On Linux this gives
superb results (performance improves by around two times). However, on
Windows there is no improvement.


Software                                              Throughput (/sec)   Response Time (msec)   Utilization (%CPU)
UnTuned (Windows)                                     27.8                1426                   65
UnTuned (Linux)                                       345                 280                    91
Partially Tuned with Shawn's suggestions (Linux)      564                 172                    90
Partially Tuned with Shawn's suggestions (Windows)    28.1                1105                   60





Thanks,
> Shawn
>
>
>
>


Re: Some performance questions....

2018-03-16 Thread Deepak Goel
>I think there is no benefit in having multiple Solr instances on a single
>server, unless the heap memory required by the JVM is too big.
Deepak***
I would try multiple Solr instances rather than a single Solr instance (it
will definitely give a performance boost)
Deepak***
>And remember that this has relatively little to do with the index size (the
>inverted index is memory-mapped OFF heap, and docValues as well).
>On the other hand, of course, Apache Solr uses plenty of JVM heap memory as
>well (caches, temporary data structures during indexing, etc.)

> Deepak:
>
> Well its kinda a given that when running ANYTHING under a VM you have an
> overhead..

>***Deepak***
>You mean you are assuming without any facts (a performance benchmark with
>and without a VM)
 >***Deepak***
>I think Shawn detailed this quite extensively, I am no sys admin or OS
>expert, but there is no need of benchmarks and I don't even understand your
>doubts.
>In Information technology anytime you add additional layers of software you
>need adapters which means additional instructions executed.
>It is obvious that having:
>metal -> OS -> APP is cheaper instruction-wise than
>metal -> OS -> VM -> APP
>The APP will execute instructions in the VM, which will be responsible for
>translating those instructions for the underlying OS.
Deepak***
I have had past experience with VMs. They absolutely do not add any
overhead. Since we have conflicting opinions, it is best to benchmark it
yourself.
Deepak***
>Going direct you skip one passage.
>you can think about this when you emulate different OS, is it cheaper to
run
>windows on a machine directly to execute windows applications or run a
>Windows VM on top of another OS to execute windows applications ?










On Thu, Mar 15, 2018 at 9:43 PM, Alessandro Benedetti 
wrote:

> *Single Solr Instance VS Multiple Solr instances on Single Server
> *
>
> I think there is no benefit in having multiple Solr instances on a single
> server, unless the heap memory required by the JVM is too big.
> And remember that this has relatively little to do with the index size (the
> inverted index is memory-mapped OFF heap, and docValues as well).
> On the other hand, of course, Apache Solr uses plenty of JVM heap memory as
> well (caches, temporary data structures during indexing, etc.)
>
> > Deepak:
> >
> > Well its kinda a given that when running ANYTHING under a VM you have an
> > overhead..
>
> ***Deepak***
> You mean you are assuming without any facts (a performance benchmark with
> and without a VM)
>  ***Deepak***
> I think Shawn detailed this quite extensively, I am no sys admin or OS
> expert, but there is no need of benchmarks and I don't even understand your
> doubts.
> In Information technology anytime you add additional layers of software you
> need adapters which means additional instructions executed.
> It is obvious that having:
> metal -> OS -> APP is cheaper instruction-wise than
> metal -> OS -> VM -> APP
> The APP will execute instructions in the VM, which will be responsible for
> translating those instructions for the underlying OS.
> Going direct you skip one passage.
> you can think about this when you emulate different OS, is it cheaper to
> run
> windows on a machine directly to execute windows applications or run a
> Windows VM on top of another OS to execute windows applications ?
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Some performance questions....

2018-03-17 Thread Deepak Goel
On 17 Mar 2018 05:19, "Walter Underwood" <wun...@wunderwood.org> wrote:

> On Mar 16, 2018, at 3:26 PM, Deepak Goel <deic...@gmail.com> wrote:
>
> Can you please post results of your test?
>
> Please tell us the tps at 25%, 50%, 75%, 100% of your CPU resource


I could, but it probably would not be useful for your documents or your
queries.

We have 22 million homework problems. Our queries are often hundreds of
words long,
because students copy and paste entire problems. After pre-processing, the
average query
is still 25 words.

For load benchmarking, I use access logs from production. I typically
gather over a half-million
lines of log. Using production logs means that queries have the same
statistical distribution
as prod, so the cache hit rates are reasonable.

Before each benchmark, I restart all the Solr instances to clear the
caches. Then the first part
of the query log is used to warm the caches, typically about 4000 queries.

After that, the measured benchmark run starts. This uses JMeter with
100-500 threads. Each
thread is configured with a constant throughput timer so a constant load is
offered. Tests run for one or two hours. Recently, I ran a test at a rate of 1000
requests/minute for one hour.

During the benchmark, I monitor the CPU usage. Our systems are configured
with enough RAM
so that disk is not accessed for search indexes. If the CPU goes over
75-80%, there is congestion
and queries will slow down. Also, if the run queue (load average) increases
over the number of
CPUs, there will be congestion.

After the benchmark run, the JMeter log is analyzed to report response time
percentiles for
each Solr request handler.
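The constant-offered-load idea described above can be sketched outside JMeter as well. A rough stand-in (class and method names are made up, and the query body is left empty) that fires requests at a fixed interval regardless of how long each response takes, like JMeter's constant throughput timer:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConstantRateLoad {
    static final AtomicInteger fired = new AtomicInteger();

    // Fires one "query" every intervalMillis, independent of response latency,
    // so the offered load stays constant even when the server slows down.
    static ScheduledExecutorService start(long intervalMillis, Runnable query) {
        ScheduledExecutorService ses = Executors.newScheduledThreadPool(4);
        ses.scheduleAtFixedRate(() -> { fired.incrementAndGet(); query.run(); },
                0, intervalMillis, TimeUnit.MILLISECONDS);
        return ses;
    }

    public static void main(String[] args) throws InterruptedException {
        // e.g. one request every 60 ms is roughly 1000 requests/minute
        ScheduledExecutorService ses = start(60, () -> { /* send query here */ });
        Thread.sleep(300);
        ses.shutdownNow();
        System.out.println("fired " + fired.get() + " requests");
    }
}
```

Driving load at a fixed rate (rather than as fast as threads can loop) is what makes congestion visible: when the server cannot keep up, latency grows while the offered rate stays the same.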


Sorry for being rude. But the ' results ' please, not the ' road to the
results '


wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


Re: Some performance questions....

2018-03-23 Thread Deepak Goel

On Thu, Mar 22, 2018 at 1:25 AM, Deepak Goel <deic...@gmail.com> wrote:

>
>
>
>
> Deepak
> "Please stop cruelty to Animals, help by becoming a Vegan"
> +91 73500 12833
> deic...@gmail.com
>
> Facebook: https://www.facebook.com/deicool
> LinkedIn: www.linkedin.com/in/deicool
>
> "Plant a Tree, Go Green"
>
> On Sat, Mar 17, 2018 at 2:56 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 3/16/2018 2:21 PM, Deepak Goel wrote:
>> > I wanted to test how many max connections can Solr handle concurrently.
>> > Also I would have to implement an 'connection pooling' of the
>> client-object
>> > connections rather than a single connection thread
>> >
>> > However a single client object with thousands of queries coming in would
>> > surely become a bottleneck. I can test this scenario too.
>>
>> Handling thousands of simultaneous queries is NOT something you can
>> expect a single Solr server to do.  It's not going to happen.  It
>> wouldn't happen with ES, either.  Handling that much load requires load
>> balancing to a LOT of servers.  The server would be much more of a
>> bottleneck than the client.
>>
>> > The problem is the max throughput which I can get on the machine is
>> around
>> > 28 tps, even though I increase the load further & only 65% CPU is
>> utilised
>> > (there is still 35% which is not being used). This clearly indicates the
>> > software is a problem as there are enough hardware resources.
>>
>> If your code is creating a client object before every single query, that
>> could be part of the issue.  The benchmark code should be using the same
>> client for all requests.  I really don't know how long it takes to
>> create HttpSolrClient objects, but I don't imagine that it's super-fast.
>>
>> What version of SolrJ were you using?
>>
>> Depending on the SolrJ version you may need to create the client with a
>> custom HttpClient object in order to allow it to handle plenty of
>> threads.  This is how I create client objects in my SolrJ code:
>>
>>   RequestConfig rc = RequestConfig.custom().setConnectTimeout(2000)
>> .setSocketTimeout(60000).build();
>>   CloseableHttpClient httpClient =
>> HttpClients.custom().setDefaultRequestConfig(rc).setMaxConnPerRoute(1024)
>> .setMaxConnTotal(4096).disableAutomaticRetries().build();
>>
>>   SolrClient sc = new HttpSolrClient.Builder().withBaseSolrUrl(solrUrl)
>> .withHttpClient(httpClient).build();
>>
>> I tried the above suggestion. The throughput and utilisation remain the
> same (they don't increase even if I increase the load). The response time
> comes down.
>
>
>
>
>
>
>
> Software                    Throughput (/sec)   Response Time (msec)   Utilization (%CPU)
> UnTuned (Windows)           27.8                1426                   65
> UnTuned (Linux)             -                   -                      -
> Partially Tuned (Linux)     -                   -                      -
> Partially Tuned (Windows)   28.1                1105                   60
>
> I am going to give your suggestion a spin on Linux next (this might take a
> day or two)
>
>
>

This is how the Linux results look:

Software                    Throughput (/sec)   Response Time (msec)   Utilization (%CPU)
UnTuned (Windows)           27.8                1426                   65
UnTuned (Linux)             345                 280                    91
Partially Tuned (Linux)     564                 172                    90
Partially Tuned (Windows)   28.1                1105                   60



> Thanks,
>> Shawn
>>
>>
>
>
>


Re: Some performance questions....

2018-03-23 Thread Deepak Goel

On Tue, Mar 20, 2018 at 3:32 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 3/16/2018 4:24 PM, Deepak Goel wrote:
> > It is taking less than 100ms to create a HttpSolrClient Object
>
> "Less than 100ms" is vague.  Let's say by that you mean it takes at
> least 50 milliseconds.  This is a lot slower than I expected it to be,
> but if you've measured it, I'll accept that.
>
>
The results were a bit volatile from test to test: sometimes around 75 ms,
sometimes around 95 ms. So I stated the upper bound (100 ms).

(Sorry for being rude.) However, you don't need to accept my results. May I
suggest that you measure it yourself (or anyone else can also do it).
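Measuring it is easy to reproduce. A generic timing harness (timing a plain StringBuilder here as a placeholder, since the same loop works for `new HttpSolrClient.Builder(url).build()` when SolrJ is on the classpath — the class and method names below are made up for illustration):

```java
public class CreationCost {
    // Returns the average time, in microseconds, of running 'construct'
    // 'iterations' times. Pass any constructor call as the Runnable.
    static double averageMicros(int iterations, Runnable construct) {
        construct.run();                       // one warm-up run for class loading
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            construct.run();
        }
        return (System.nanoTime() - start) / (iterations * 1000.0);
    }

    public static void main(String[] args) {
        double avg = averageMicros(1000, () -> new StringBuilder(64).append("client"));
        System.out.printf("average construction cost: %.3f us%n", avg);
    }
}
```

Note this only measures construction cost; a single warm-up run is not enough to fully exclude JIT effects, so numbers from a loop like this are indicative rather than precise.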


> If every single thread you're running has to spend 50 milliseconds or
> more creating a client before it can actually send a request, then the
> application is going to be spending a lot of time NOT sending requests,
> but creating and destroying clients.  (You didn't indicate how long the
> close() takes)
>

I did implement your solution. (On Windows it makes no difference; on Linux
it improves things by at least a factor of two.)


>
> Your numbers indicated a response time of 1426 milliseconds for Solr.
> If this is an average or a median, then that is not a fast query.  These
> numbers make me question the entire benchmark setup.


Do you have any specific questions about the benchmark setup?


> Based on the code
> provided, I don't see how the numbers can be that bad, even if we assume
> that up to 100 milliseconds is spent creating every client.
>
>
I have stated the numbers I found during my test. The best way to verify
them is for someone else to run the same test; otherwise I don't see how we
can verify the results.


> Because the ES numbers are so much worse than the Solr numbers, I'm
> betting that creating an ES client is even less efficient than creating
> a Solr client.  If that's the case, I do not know why ... maybe that
> client runs through more startup checks than a Solr client does.
> Creation time for the client shouldn't matter, since it should only be
> done once for every benchmark run, and the time spent creating the
> client shouldn't be counted in the benchmark numbers.
>
>
I can check and optimise the ES code. However, it will take me a couple of
weeks.


> Thanks,
> Shawn
>
>


Re: Slow Response for less volume

2018-10-24 Thread Deepak Goel
Are you getting errors in JMeter?

On Wed, 24 Oct 2018, 21:49 Amjad Khan,  wrote:

> Hi,
>
> We recently moved to Solr Cloud (Google) with 4 nodes and have very
> limited number of data.
>
> We are facing a very weird issue here: the Solr cluster's query response
> time is high when request volume is low, but the moment we run our test to
> hit the Solr cluster hard, we see better responses of around 10 ms.
>
> Any clue will be appreciated.
>
> Thanks


Re: searching is slow while adding document each time

2018-10-28 Thread Deepak Goel
What are your hardware utilisations (CPU, memory, disk, network)?

I think you might have to tune Lucene too.

On Wed, 26 Sep 2018, 14:33 Mugeesh Husain,  wrote:

> Hi,
>
> We are running a 3-node SolrCloud (4.4) in our production infrastructure. We
> recently moved our Solr servers from SoftLayer to Digital Ocean with the
> same configuration as production.
>
> Now we are seeing slowness in searches while we index documents; when we
> stop indexing, searches are fine. One Solr server does the indexing and the
> other two serve search requests.
>
> I am just wondering why searches become slow while indexing, even though we
> are using the same configuration as we had in prod?
>
> We push 500 documents at a time, and this process runs continuously (adding
> & deleting).
>
> these are the indexing logs
>
> 65497339 [http-apr-8980-exec-45] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/=FROMLEADER=javabin=2=dedupe
> }
> {add=[E4751FCCE977BAC7 (1612655281518411776), 8E712AD1BE76AB63
> (1612655281527848960), 789AA5D0FB149A37 (1612655281538334720),
> B4F3AA526506F6B7 (1612655281553014784), A9F29F556F6CD1C8
> (1612655281566646272), 8D15813305BF7417 (1612655281584472064),
> DD13CFA12973E85B (1612655281596006400), 3C93BDBA5DFDE3B3
> (1612655281613832192), 96981A0785BFC9BF (1612655281625366528),
> D1E52788A466E484 (1612655281636900864)]} 0 9
> 65497459 [http-apr-8980-exec-22] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/=FROMLEADER=javabin=2=dedupe
> }
> {add=[D8AA2E196967D241 (1612655281649483776), E73420772E3235B7
> (1612655281666260992), DFDCF1F8325A3EF6 (1612655281680941056),
> 1B10EF90E7C3695F (1612655281689329664), 51CBD7F59644A718
> (1612655281699815424), 1D31EF403AF13E04 (1612655281714495488),
> 68E1DC3A614B7269 (1612655281723932672), F9BF6A3CF89D74FB
> (1612655281737564160), 419E017E1F360EB6 (1612655281749098496),
> 50EF977E5E873065 (1612655281759584256)]} 0 9
> 65497572 [http-apr-8980-exec-40] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/=FROMLEADER=javabin=2=dedupe
> }
> {add=[B63AD0671A5E57B9 (1612655281772167168), 00B8A4CCFABFA1AC
> (1612655281784750080), 9C89A1516C9166E6 (1612655281798381568),
> 9322E17ECEAADE66 (1612655281803624448), C6DDB4BF8E94DE6B
> (1612655281814110208), DAA49178A5E74285 (1612655281830887424),
> 829C2AE38A3E78E4 (1612655281845567488), 4C7B19756D8E4208
> (1612655281859198976), BE0F7354DC30164C (1612655281869684736),
> 59C4A764BB50B13B (1612655281880170496)]} 0 9
> 65497724 [http-apr-8980-exec-31] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/=FROMLEADER=javabin=2=dedupe
> }
> {add=[1F694F99367D7CE1 (1612655281895899136), 2AEAAF67A6893ABE
> (1612655281911627776), 81E72DC36C7A9EBC (1612655281926307840),
> AA71BD9B23548E6D (1612655281939939328), 359E8C4C6EC72AFA
> (1612655281954619392), 7FEB6C65A3E23311 (1612655281972445184),
> 9B5ED0BE7AFDD1D0 (1612655281991319552), 99FE8958F6ED8B91
> (1612655282009145344), 2BDC61DC4038E19F (1612655282023825408),
> 5131AEC4B87FBFE9 (1612655282037456896)]} 0 10
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: SolrCloud scaling/optimization for high request rate

2018-10-30 Thread Deepak Goel
Please see inline...




On Tue, Oct 30, 2018 at 5:21 PM Sofiya Strochyk 
wrote:

> My swappiness is set to 10, swap is almost not used (used space is on
> scale of a few MB) and there is no swap IO.
>
> There is disk IO like this, though:
>
> https://upload.cc/i1/2018/10/30/43lGfj.png
> https://upload.cc/i1/2018/10/30/T3u9oY.png
>
**
The time window for the data is too short. Can you provide data for larger
timeframes?
**

>
> However CPU iowait is still zero, so not sure if the disk io is
> introducing any kind of delay...
>
> **
Can you provide graphs for CPU iowait too? (For larger timeframes)
**

> On 30.10.18 10:21, Deepak Goel wrote:
>
> Yes. Swapping from disk to memory & vice versa
>
>
> Deepak
> "The greatness of a nation can be judged by the way its animals are
> treated. Please consider stopping the cruelty by becoming a Vegan"
>
> +91 73500 12833
> deic...@gmail.com
>
> Facebook: https://www.facebook.com/deicool
> LinkedIn: www.linkedin.com/in/deicool
>
> "Plant a Tree, Go Green"
>
> Make In India : http://www.makeinindia.com/home
>
>
> On Mon, Oct 29, 2018 at 11:24 PM Sofiya Strochyk 
> wrote:
>
>> Could you please clarify what is memory disk layer? Do you mean swapping
>> from memory to disk, reading from disk to memory, or something else?
>>
>> On 29.10.18 17:20, Deepak Goel wrote:
>>
>> I would then suspect performance is choking in the memory-disk layer. Can
>> you please check its performance?
>>
>> On Mon, 29 Oct 2018, 20:30 Sofiya Strochyk,  wrote:
>>
>>> Hi Deepak and thanks for your reply,
>>>
>>> On 27.10.18 10:35, Deepak Goel wrote:
>>>
>>>
>>> Last, what is the nature of your request. Are the queries the same? Or
>>> they are very random? Random queries would need more tuning than if the
>>> queries the same.
>>>
>>> The search term (q) is different for each query, and filter query terms
>>> (fq) are repeated very often. (so we have very little cache hit ratio for
>>> query result cache, and very high hit ratio for filter cache)
>>>
>>> --
>>>
>>> *Sofiia Strochyk *
>>>
>>>
>>> s...@interlogic.com.ua
>>> [image: InterLogic]
>>> www.interlogic.com.ua
>>>
>>> [image: Facebook icon] <https://www.facebook.com/InterLogicOfficial>   
>>> [image:
>>> LinkedIn icon] <https://www.linkedin.com/company/interlogic>
>>>
>>
>> --
>>
>> *Sofiia Strochyk *
>>
>>
>> s...@interlogic.com.ua
>> [image: InterLogic]
>> www.interlogic.com.ua
>>
>> [image: Facebook icon] <https://www.facebook.com/InterLogicOfficial>   
>> [image:
>> LinkedIn icon] <https://www.linkedin.com/company/interlogic>
>>
>
> --
>
> *Sofiia Strochyk *
>
>
> s...@interlogic.com.ua
> [image: InterLogic]
> www.interlogic.com.ua
>
> [image: Facebook icon] <https://www.facebook.com/InterLogicOfficial>   [image:
> LinkedIn icon] <https://www.linkedin.com/company/interlogic>
>


Re: SolrCloud scaling/optimization for high request rate

2018-10-30 Thread Deepak Goel
Yes. Swapping from disk to memory & vice versa




On Mon, Oct 29, 2018 at 11:24 PM Sofiya Strochyk 
wrote:

> Could you please clarify what is memory disk layer? Do you mean swapping
> from memory to disk, reading from disk to memory, or something else?
>
> On 29.10.18 17:20, Deepak Goel wrote:
>
> I would then suspect performance is choking in the memory-disk layer. Can
> you please check its performance?
>
> On Mon, 29 Oct 2018, 20:30 Sofiya Strochyk,  wrote:
>
>> Hi Deepak and thanks for your reply,
>>
>> On 27.10.18 10:35, Deepak Goel wrote:
>>
>>
>> Last, what is the nature of your request. Are the queries the same? Or
>> they are very random? Random queries would need more tuning than if the
>> queries the same.
>>
>> The search term (q) is different for each query, and filter query terms
>> (fq) are repeated very often. (so we have very little cache hit ratio for
>> query result cache, and very high hit ratio for filter cache)
>>
>> --
>>
>> *Sofiia Strochyk *
>>
>>
>> s...@interlogic.com.ua
>> [image: InterLogic]
>> www.interlogic.com.ua
>>
>> [image: Facebook icon] <https://www.facebook.com/InterLogicOfficial>   
>> [image:
>> LinkedIn icon] <https://www.linkedin.com/company/interlogic>
>>
>
> --
>
> *Sofiia Strochyk *
>
>
> s...@interlogic.com.ua
> [image: InterLogic]
> www.interlogic.com.ua
>
> [image: Facebook icon] <https://www.facebook.com/InterLogicOfficial>   [image:
> LinkedIn icon] <https://www.linkedin.com/company/interlogic>
>


Re: SolrCloud scaling/optimization for high request rate

2018-10-27 Thread Deepak Goel
On Fri, Oct 26, 2018 at 9:25 PM Sofiya Strochyk 
wrote:

> Hi everyone,
>
> We have a SolrCloud setup with the following configuration:
>
>- 4 nodes (3x128GB RAM Intel Xeon E5-1650v2, 1x64GB RAM Intel Xeon
>E5-1650v2, 12 cores, with SSDs)
>- One collection, 4 shards, each has only a single replica (so 4
>replicas in total), using compositeId router
>- Total index size is about 150M documents/320GB, so about 40M/80GB
>per node
>- Zookeeper is on a separate server
>- Documents consist of about 20 fields (most of them are both stored
>and indexed), average document size is about 2kB
>- Queries are mostly 2-3 words in the q field, with 2 fq parameters,
>with complex sort expression (containing IF functions)
>- We don't use faceting due to performance reasons but need to add it
>in the future
>- Majority of the documents are reindexed 2 times/day, as fast as the
>SOLR allows, in batches of 1000-1 docs. Some of the documents are also
>deleted (by id, not by query)
>- autoCommit is set to maxTime of 1 minute with openSearcher=false and
>autoSoftCommit maxTime is 30 minutes with openSearcher=true. Commits from
>clients are ignored.
>- Heap size is set to 8GB.
>
> Target query rate is up to 500 qps, maybe 300, and we need to keep
> response time at <200ms. But at the moment we only see very good search
> performance with up to 100 requests per second. Whenever it grows to about
> 200, average response time abruptly increases to 0.5-1 second. (Also it
> seems that request rate reported by SOLR in admin metrics is 2x higher than
> the real one, because for every query, every shard receives 2 requests: one
> to obtain IDs and a second one to get data by IDs; so the target rate for SOLR
> metrics would be 1000 qps).
>
> During high request load, CPU usage increases dramatically on the SOLR
> nodes. It doesn't reach 100% but averages at 50-70% on 3 servers and about
> 93% on 1 server (random server each time, not the smallest one).
>
> The documentation mentions replication to spread the load between the
> servers. We tested replicating to smaller servers (32GB RAM, Intel Core
> i7-4770). However, when we tested it, the replicas were going out of sync
> all the time (possibly during commits) and reported errors like "PeerSync
> Recovery was not successful - trying replication." Then they proceed with
> replication which takes hours and the leader handles all requests
> singlehandedly during that time. Also both leaders and replicas started
> encountering OOM errors (heap space) for unknown reason. Heap dump analysis
> shows that most of the memory is consumed by [J (array of long) type, my
> best guess would be that it is "_version_" field, but it's still unclear
> why it happens. Also, even though with replication request rate and CPU
> usage drop 2 times, it doesn't seem to affect mean_ms, stddev_ms or p95_ms
> numbers (p75_ms is much smaller on nodes with replication, but still not as
> low as under load of <100 requests/s).
>
> Garbage collection is much more active during high load as well. Full GC
> happens almost exclusively during those times. We have tried tuning GC
> options like suggested here
> 
> and it didn't change things though.
>
> My questions are
>
>- How do we increase throughput? Is replication the only solution?
>
1. Increase the CPU speed
2. Increase the heap size (and tune the GC)
3. Replication
4. Have one more node on the same hardware server (if CPU is not reaching 100%)
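For point 2, heap size and GC options for Solr are normally set in solr.in.sh rather than by hand-editing start scripts. A purely illustrative fragment (the values are placeholders to be tuned for the workload, not recommendations):

```shell
# solr.in.sh -- read by bin/solr at startup
SOLR_HEAP="16g"                                    # sets both -Xms and -Xmx for the Solr JVM
GC_TUNE="-XX:+UseG1GC -XX:MaxGCPauseMillis=250"    # replaces the default GC flags
```

Any change here needs to be validated with GC logs under realistic load; a larger heap can make full GC pauses worse rather than better.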

>
>-
>- if yes - then why doesn't it affect response times, considering that
>CPU is not 100% used and index fits into memory?
>- How to deal with OOM and replicas going into recovery?
>
> 1. Increase the heap size
2. Memory debug to check for memory leaks (rare)

>
>- Is memory or CPU the main problem? (When searching on the internet,
>i never see CPU as main bottleneck for SOLR, but our case might be
>different)
>
> 1. Could be both

>
>-
>- Or do we need smaller shards? Could segments merging be a problem?
>- How to add faceting without search queries slowing down too much?
>- How to diagnose these problems and narrow down to the real reason in
>hardware or setup?
>
1. I would first tune all the software (OS, JVM, Solr) and benchmark the
current hardware setup
2. Then I would play around with the hardware to check the performance benefits

>
>
> Any help would be much appreciated.
>

An increase in response time to around 1 second when you bump up the load
indicates queueing in your setup. (Since the CPU is not 100% utilised, it
most likely points to a memory, disk, network, or software problem.)

Lastly, what is the nature of your requests? Are the queries the same, or
are they very random? Random queries need more tuning than identical ones.
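As a concrete illustration of the "same vs. random" distinction in Solr terms: the volatile user terms go in q, while the stable, repeated constraints go in fq, since repeated fq values are served from the filter cache. A small sketch of assembling such a request (the q and fq parameter names are Solr's; host, collection, and field names are made up):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class QueryBuilderSketch {
    // Put the user's changing search terms in q and the stable, repeated
    // constraints in fq so Solr can cache each filter's document set.
    static String buildSelectUrl(String base, String q, String... fqs) {
        StringBuilder url = new StringBuilder(base).append("/select?q=")
                .append(URLEncoder.encode(q, StandardCharsets.UTF_8));
        for (String fq : fqs) {
            url.append("&fq=").append(URLEncoder.encode(fq, StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildSelectUrl("http://localhost:8983/solr/products",
                "red shoes", "category:footwear", "inStock:true"));
    }
}
```

Because each fq is cached independently, a workload with random q terms but a small set of recurring fq clauses can still get substantial cache benefit even when the query result cache hit rate is near zero.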

> Thanks!
> --
>
> *Sofiia Strochyk *
>
>
> s...@interlogic.com.ua

Re: SolrCloud performance

2018-11-02 Thread Deepak Goel
Please see inline for my thoughts




On Sat, Nov 3, 2018 at 1:08 AM Chuming Chen  wrote:

> Hi All,
>
> I am running a Solr cloud 7.4 with 4 shards and 4 nodes (JVM "-Xms20g
> -Xmx40g”), each shard has 32 million documents and 32Gbytes in size.
>
> For a given query (I use complexphrase query), typically, the first time
> it took a couple of seconds to return the first 20 docs. However, for the
> following page, or sorting by a field, even run the same query again took a
> lot longer to return results. I can see my 4 solr nodes running crazy with
> more than 100%CPU.
>
I think the first time the query is being answered by Lucene (whose results are
already sorted due to the inverted index format). The second time around the
query is satisfied by Solr (which is taking longer).


> My understanding is that Solr has query cache, run same query should be
> faster.
>
> What could be wrong here? How do I debug? I checked solr.log in all nodes
> and didn’t see anything unusual. Most frequent log entry looks like this.
>
> INFO  - 2018-11-02 19:32:55.189; [   ]
> org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null
> path=/admin/metrics
> params={wt=javabin=2=solr.core.patternmatch.shard3.replica_n8:UPDATE./update.requests=solr.core.patternmatch.shard3.replica_n8:INDEX.sizeInBytes=solr.core.patternmatch.shard1.replica_n1:QUERY./select.requests=solr.core.patternmatch.shard1.replica_n1:INDEX.sizeInBytes=solr.core.patternmatch.shard1.replica_n1:UPDATE./update.requests=solr.core.patternmatch.shard3.replica_n8:QUERY./select.requests}
> status=0 QTime=7
> INFO  - 2018-11-02 19:32:55.192; [   ]
> org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null
> path=/admin/metrics
> params={wt=javabin=2=solr.jvm:os.processCpuLoad=solr.node:CONTAINER.fs.coreRoot.usableSpace=solr.jvm:os.systemLoadAverage=solr.jvm:memory.heap.used}
> status=0 QTime=1
>
> Thank you for your kind help.
>
> Chuming
>
>
>
>


Re: Index optimization takes too long

2018-11-03 Thread Deepak Goel
I would start by monitoring the hardware (CPU, memory, disk) and software
(heap, threads) utilization to see where the bottlenecks are, or what is
getting utilized the most, and then tune that parameter.

I would also look at profiling the software.


Deepak
"The greatness of a nation can be judged by the way its animals are
treated. Please consider stopping the cruelty by becoming a Vegan"

+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

Make In India : http://www.makeinindia.com/home


On Sat, Nov 3, 2018 at 4:30 AM Wei  wrote:

> Hello,
>
> After a recent schema change,  it takes almost 40 minutes to optimize the
> index.  The schema change is to enable docValues for all sort/facet fields,
> which increase the index size from 12G to 14G. Before the change it only
> takes 5 minutes to do the optimization.
>
> I have tried to increase maxMergeAtOnceExplicit because the default 30
> could be too low:
>
> <int name="maxMergeAtOnceExplicit">100</int>
>
> But it doesn't seem to help. Any suggestions?
>
> Thanks,
> Wei
>


Re: Slow import from MsSQL and down cluster during process

2018-10-24 Thread Deepak Goel
Please check if there is a deadlock happening by taking heap dumps
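Besides inspecting dumps by hand (e.g. jstack <pid>), the JVM can report monitor deadlocks programmatically. A small self-contained sketch (the two-lock demo is illustrative, not Solr code):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {
    // Returns the IDs of deadlocked threads, or null if none are detected.
    static long[] findDeadlocks() {
        return ManagementFactory.getThreadMXBean().findDeadlockedThreads();
    }

    // Demo: two threads acquiring the same pair of locks in opposite order.
    static long[] demoDeadlock() {
        final Object a = new Object(), b = new Object();
        Thread t1 = new Thread(() -> { synchronized (a) { pause(); synchronized (b) { } } });
        Thread t2 = new Thread(() -> { synchronized (b) { pause(); synchronized (a) { } } });
        t1.setDaemon(true); t2.setDaemon(true); // let the JVM exit despite the deadlock
        t1.start(); t2.start();
        pause(); pause(); pause(); // give both threads time to block on each other
        return findDeadlocks();
    }

    static void pause() {
        try { Thread.sleep(200); } catch (InterruptedException ignored) { }
    }

    public static void main(String[] args) {
        long[] ids = demoDeadlock();
        if (ids == null) {
            System.out.println("No deadlock detected");
        } else {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            for (ThreadInfo ti : mx.getThreadInfo(ids)) {
                System.out.println("Deadlocked: " + ti.getThreadName()
                        + " waiting on " + ti.getLockName());
            }
        }
    }
}
```

The same findDeadlockedThreads() call could be run periodically (or via JMX) against a live instance.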


Deepak
"The greatness of a nation can be judged by the way its animals are
treated. Please consider stopping the cruelty by becoming a Vegan"

+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

Make In India : http://www.makeinindia.com/home


On Wed, Oct 24, 2018 at 11:12 AM Daniel Carrasco 
wrote:

> Thanks for all, I'll try later ;)
>
> Greetings!!.
>
> On Wed, Oct 24, 2018 at 7:13, Walter Underwood ( >)
> wrote:
>
> > We handle request rates at a few thousand requests/minute with an 8 GB
> > heap. 95th percentile response time is 200 ms. Median (cached) is 4 ms.
> >
> > An oversized heap will hurt your query performance because everything
> > stops for the huge GC.
> >
> > RAM is still a thousand times faster than SSD, so you want a lot of RAM
> > available for file system buffers managed by the OS.
> >
> > I recommend trying an 8 GB heap with the latest version of Java 8 and the
> > G1 collector.
> >
> > We have this in our solr.in.sh:
> >
> > SOLR_HEAP=8g
> > # Use G1 GC  -- wunder 2017-01-23
> > # Settings from https://wiki.apache.org/solr/ShawnHeisey
> > GC_TUNE=" \
> > -XX:+UseG1GC \
> > -XX:+ParallelRefProcEnabled \
> > -XX:G1HeapRegionSize=8m \
> > -XX:MaxGCPauseMillis=200 \
> > -XX:+UseLargePages \
> > -XX:+AggressiveOpts \
> > "
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> > > On Oct 23, 2018, at 9:51 PM, Daniel Carrasco 
> > wrote:
> > >
> > > Hello,
> > >
> > > I've set that heap size because the solr receives a lot of queries
> every
> > > second and I want to cache as much as possible. Also I'm not sure about
> > the
> > > number of documents in the collection, but the webpage have a lot of
> > > products.
> > >
> > > About store the index data in RAM is just an expression. The data is
> > stored
> > > on SSD disks with XFS (faster than EXT4).
> > >
> > > I'll take a look to the links tomorrow at work.
> > >
> > > Thanks!!
> > > Greetings!!
> > >
> > >
> > > On Tue, Oct 23, 2018 at 23:48, Shawn Heisey 
> > wrote:
> > >
> > >> On 10/23/2018 7:15 AM, Daniel Carrasco wrote:
> > >>> Hello,
> > >>>
> > >>> Thanks for your response.
> > >>>
> > >>> We've already thought about that and doubled the instances. Just now
> > for
> > >>> every Solr instance we've 60GB of RAM (40GB configured on Solr), and
> a
> > 16
> > >>> Cores CPU. The entire Data can be stored on RAM and will not fill the
> > RAM
> > >>> (of course talking about raw data, not procesed data).
> > >>
> > >> Why are you making the heap so large?  I've set up servers that can
> > >> handle hundreds of millions of Solr documents in a much smaller
> heap.  A
> > >> 40GB heap would be something you might do if you're handling billions
> of
> > >> documents on one server.
> > >>
> > >> When you say the entire data can be stored in RAM ... are you counting
> > >> that 40GB you gave to Solr?  Because you can't count that -- that's
> for
> > >> Solr, NOT the index data.
> > >>
> > >> The heap size should never be dictated by the amount of memory in the
> > >> server.  It should be made as large as it needs to be for the job, and
> > >> no larger.
> > >>
> > >> https://wiki.apache.org/solr/SolrPerformanceProblems#RAM
> > >>
> > >>> About the usage, I've checked the RAM and CPU usage and are not fully
> > >> used.
> > >>
> > >> What exactly are you looking at?  I've had people swear that they
> can't
> > >> see a problem with their systems when Solr is REALLY struggling to
> keep
> > >> up with what it has been asked to do.
> > >>
> > >> Further down on the page I linked above is a section about asking for
> > >> help.  If you can provide the screenshot it mentions there, that would
> > >> be helpful.  Here's a direct link to that section:
> > >>
> > >>
> > >>
> >
> https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue
> > >>
> > >> Thanks,
> > >> Shawn
> > >>
> > >>
> >
> >
>
> --
> _
>
>   Daniel Carrasco Marín
>   Ingeniería para la Innovación i2TIC, S.L.
>   Tlf:  +34 911 12 32 84 Ext: 223
>   www.i2tic.com
> _
>


Re: SolrCloud scaling/optimization for high request rate

2018-10-29 Thread Deepak Goel
I would then suspect that performance is choking in the memory/disk layer. Can
you please check the performance there?

On Mon, 29 Oct 2018, 20:30 Sofiya Strochyk,  wrote:

> Hi Deepak and thanks for your reply,
>
> On 27.10.18 10:35, Deepak Goel wrote:
>
>
> Last, what is the nature of your request. Are the queries the same? Or
> they are very random? Random queries would need more tuning than if the
> queries the same.
>
> The search term (q) is different for each query, and filter query terms
> (fq) are repeated very often. (so we have very little cache hit ratio for
> query result cache, and very high hit ratio for filter cache)
>
> --
>
> *Sofiia Strochyk *
>
>
> s...@interlogic.com.ua
> www.interlogic.com.ua
>
> Facebook: https://www.facebook.com/InterLogicOfficial
> LinkedIn: https://www.linkedin.com/company/interlogic
>


Re: one node too busy

2018-11-27 Thread Deepak Goel
You might have to use a APM tool (AppDynamics) to debug the busy solr
instance


Deepak
"The greatness of a nation can be judged by the way its animals are
treated. Please consider stopping the cruelty by becoming a Vegan"

+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

Make In India : http://www.makeinindia.com/home


On Tue, Nov 27, 2018 at 10:25 PM Kudrettin Güleryüz 
wrote:

> Hi,
>
> How can I debug what is causing occasional hiccup of our Solr cloud
> instance? When this issue happens, I can see that one of the nodes is too
> busy and the others are just doing fine. We use 6 nodes, 6 shards (1 shard
> per node), 1 replica for each collection.
>
> Can you please suggest tools to debug what may be causing this possibly a
> bottleneck situation?
>
> Thank you
>


Re: Solrcloud TimeoutException: Idle timeout expired

2019-01-29 Thread Deepak Goel
The document is not being passed; it has zero content.

It could be due to no free memory in the heap. To check for this, please look
at the GC logs.
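As a rough sketch of what to look for, something like this can flag long Full GC pauses in a Java 8 GC log (the sample line assumes the -XX:+PrintGCDetails format; adjust the pattern to your own log):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLogScan {
    // Trailing ", N.NNNNNNN secs]" of a Java 8 -XX:+PrintGCDetails entry.
    private static final Pattern PAUSE = Pattern.compile(", ([0-9.]+) secs\\]");

    // Returns the pause in seconds for a Full GC line, or -1 for anything else.
    static double fullGcPauseSecs(String line) {
        if (!line.contains("Full GC")) return -1;
        Matcher m = PAUSE.matcher(line);
        double last = -1;
        while (m.find()) last = Double.parseDouble(m.group(1)); // last match is the total pause
        return last;
    }

    public static void main(String[] args) {
        String sample = "2019-01-28T10:00:00.000+0000: 12.345: "
                + "[Full GC (Allocation Failure) 9000M->3000M(10240M), 4.5678901 secs]";
        double pause = fullGcPauseSecs(sample);
        if (pause > 1.0) {
            System.out.printf("Long Full GC pause: %.2f s%n", pause);
        }
    }
}
```

Frequent Full GC entries with multi-second pauses in the log would point at heap pressure as the cause of the idle timeouts.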

On Tue, 29 Jan 2019, 08:54 Schaum Mallik wrote:

> I am seeing this error in our logs. Our Solr heap is set to more than 10G.
> Any clues which anyone can provide will be very helpful.
>
> Thank you
>
> null:java.io.IOException: java.util.concurrent.TimeoutException: Idle
> timeout expired: 12/12 ms
> at
> org.eclipse.jetty.server.HttpInput$ErrorState.noContent(HttpInput.java:1075)
> at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:313)
> at
> org.apache.solr.servlet.ServletInputStreamWrapper.read(ServletInputStreamWrapper.java:74)
> at
> org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:100)
> at
> org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
> at
> org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:88)
> at
> org.apache.solr.common.util.FastInputStream.peek(FastInputStream.java:60)
> at
> org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
> at
> org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)
> at
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2539)
> at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> at org.eclipse.jetty.server.Server.handle(Server.java:531)
> at
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
> at
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
> at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
> at
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
> at
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
> at
> 

Re: Throughput does not increase in spite of low CPU usage

2019-09-30 Thread Deepak Goel
Hello

Can you please try increasing 'new size' and 'max new size' to 1GB+?
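For example, in the GC_TUNE section of solr.in.sh (a sketch; the 1g value is just the suggestion above, tune it to your heap and pause goals):

```
# Grow the young generation so short-lived per-query objects die in
# minor collections instead of being promoted (illustrative values):
-XX:NewSize=1g
-XX:MaxNewSize=1g
```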

Deepak

On Mon, 30 Sep 2019, 13:35 Yasufumi Mizoguchi, 
wrote:

> Hi, Deepak.
> Thank you for replying me.
>
> JVM settings from solr.in.sh file are as follows. (Sorry, I could not show
> all due to our policy)
>
> -verbose:gc
> -XX:+PrintHeapAtGC
> -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps
> -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.port=18983
> -XX:OnOutOfMemoryError=/home/solr/solr-6.2.1/bin/oom_solr.sh
> -XX:NewSize=128m
> -XX:MaxNewSize=128m
> -XX:+UseG1GC
> -XX:+PerfDisableSharedMem
> -XX:+ParallelRefProcEnabled
> -XX:G1HeapRegionSize=8m
> -XX:MaxGCPauseMillis=250
> -XX:InitiatingHeapOccupancyPercent=75
> -XX:+UseLargePages
> -XX:+AggressiveOpts
> -Xmx32G
> -Xms32G
> -Xss256k
>
>
> Thanks & Regards
> Yasufumi.
>
> 2019年9月30日(月) 16:12 Deepak Goel :
>
> > Hello
> >
> > Can you please share the JVM heap settings in detail?
> >
> > Deepak
> >
> > On Mon, 30 Sep 2019, 11:15 Yasufumi Mizoguchi, 
> > wrote:
> >
> > > Hi,
> > >
> > > I am trying some tests to confirm if single Solr instance can perform
> > over
> > > 1000 queries per second(!).
> > >
> > > But now, although CPU usage is 40% or so and iowait is almost 0%,
> > > throughput does not increase over 60 queries per second.
> > >
> > > I think there are some bottlenecks around Kernel, JVM, or Solr
> settings.
> > >
> > > The values we already checked and configured are followings.
> > >
> > > * Kernel:
> > > file descriptor
> > > net.ipv4.tcp_max_syn_backlog
> > > net.ipv4.tcp_syncookies
> > > net.core.somaxconn
> > > net.core.rmem_max
> > > net.core.wmem_max
> > > net.ipv4.tcp_rmem
> > > net.ipv4.tcp_wmem
> > >
> > > * JVM:
> > > Heap [ -> 32GB]
> > > G1GC settings
> > >
> > > * Solr:
> > > (Jetty) MaxThreads [ -> 2]
> > >
> > >
> > > And the other info is as follows.
> > >
> > > CPU : 16 cores
> > > RAM : 128 GB
> > > Disk : SSD 500GB
> > > NIC : 10Gbps(maybe)
> > > OS : Ubuntu 14.04
> > > JVM : OpenJDK 1.8.0u191
> > > Solr : 6.2.1
> > > Index size : about 60GB
> > >
> > > Any insights will be appreciated.
> > >
> > > Thanks and regards,
> > > Yasufumi.
> > >
> >
>


Re: Throughput does not increase in spite of low CPU usage

2019-09-30 Thread Deepak Goel
Hello

Can you please share the JVM heap settings in detail?

Deepak

On Mon, 30 Sep 2019, 11:15 Yasufumi Mizoguchi, 
wrote:

> Hi,
>
> I am trying some tests to confirm if single Solr instance can perform over
> 1000 queries per second(!).
>
> But now, although CPU usage is 40% or so and iowait is almost 0%,
> throughput does not increase over 60 queries per second.
>
> I think there are some bottlenecks around Kernel, JVM, or Solr settings.
>
> The values we already checked and configured are followings.
>
> * Kernel:
> file descriptor
> net.ipv4.tcp_max_syn_backlog
> net.ipv4.tcp_syncookies
> net.core.somaxconn
> net.core.rmem_max
> net.core.wmem_max
> net.ipv4.tcp_rmem
> net.ipv4.tcp_wmem
>
> * JVM:
> Heap [ -> 32GB]
> G1GC settings
>
> * Solr:
> (Jetty) MaxThreads [ -> 2]
>
>
> And the other info is as follows.
>
> CPU : 16 cores
> RAM : 128 GB
> Disk : SSD 500GB
> NIC : 10Gbps(maybe)
> OS : Ubuntu 14.04
> JVM : OpenJDK 1.8.0u191
> Solr : 6.2.1
> Index size : about 60GB
>
> Any insights will be appreciated.
>
> Thanks and regards,
> Yasufumi.
>