Re: Performance issue with Solr 8.6.1 Unified Highlighter does not occur on Solr 6.

2021-02-01 Thread Kerwin
Hi David,

Thanks for filing this issue. The classic non-weightMatcher mode works well
for us right now. Yes, we are using the POSTINGS offset source for most of
the fields, although setting it explicitly gives an error since not all
fields are indexed with offsets, so I guess the highlighter is picking the
right choice for each field. Here is the test with hl.offsetSource=ANALYSIS
and hl.weightMatches=false that you requested:

hl.offsetSource=ANALYSIS&hl.weightMatches=false (340 ms)

The above is thus better than the original highlighter. I'll also try and
create that PR soon.


Re: Performance issue with Solr 8.6.1 Unified Highlighter does not occur on Solr 6.

2021-01-29 Thread David Smiley
https://issues.apache.org/jira/browse/SOLR-10321 -- near the end my opinion
is we should just omit the field if there is no highlight, which would
address your need to do this work-around.  Glob or no glob.  PR welcome!

It's satisfying seeing that the Unified Highlighter is so much faster than
the original.  I aim to make UH the default in 9.0.  SOLR-12901


It's kinda depressing that the weightMatcher mode is slow when there are
many fields because I was hoping this choice might eventually be permanent
in order to obsolete lots of code in the highlighter.  I can guess why it's
slow -- and I filed an issue --
https://issues.apache.org/jira/browse/LUCENE-9712 -- a tough one!  Don't
expect anything from me there for the foreseeable future.  It'd take either
some ugly hack that has some limited qualifications, or a substantial
rewrite of much of the UH.  At least there's the classic non-weightMatcher
mode, which works faithfully, albeit with some of its own gotchas around
obscure/custom query compatibility.

You said the original highlighter performs at ~1.5 seconds.  For the UH, I
suspect your offset source is postings from the index to get such fantastic
numbers that you get with it; right?  For curiosity's sake, can you please
set hl.offsetSource=ANALYSIS and tell me what speed you get?  Set
hl.weightMatches=false as well.  My hope is that it's still substantially
better than the original highlighter.

Just because hl.requireFieldMatch=false is the default doesn't mean it's
the _right_ choice for everyone's app :-).  I tend to think Solr should
flip this in 9.0 for both accuracy's and performance's sake.  And unset
hl.maxAnalyzedChars -- mostly an obsolete safety with the UH being so much
faster.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Jan 29, 2021 at 2:46 AM Kerwin  wrote:

> On another note, since response time is in question, I have been using a
> custom highlighter since Solr 6 just to override the method encodeSnippets()
> in the UnifiedSolrHighlighter class, because Solr sends back a blank array
> (ZERO_LEN_STR_ARRAY) in the response payload for fields that do not match.
> Here is the code before:
> if (snippet == null) {
>   //TODO reuse logic of DefaultSolrHighlighter.alternateField
>   summary.add(field, ZERO_LEN_STR_ARRAY);
> } 
>
> So I had removed this clause and made the following change:
>
> if (snippet != null) {
>   // we used a special snippet separator char and we can now split on it.
>   summary.add(field, snippet.split(SNIPPET_SEPARATOR));
> }
>
> This has not changed in Solr 8 either, which for 76 fields gives a very large
> payload. So I will keep this custom code for now.
>
> On Fri, Jan 29, 2021 at 12:28 PM Kerwin  wrote:
>
>> Hi David,
>>
>> Thanks so much for your reply.
>> hl.weightMatches was indeed the culprit. After setting it to false, I am
>> now getting the same sub-second response as Solr 6. I am using Solr 8.6.1.
>>
>> Here are the tests I carried out:
>> hl.requireFieldMatch=true&hl.weightMatches=true  (2458 ms)
>> hl.requireFieldMatch=false&hl.weightMatches=true (3964 ms)
>> hl.requireFieldMatch=true&hl.weightMatches=false (158 ms)
>> hl.requireFieldMatch=false&hl.weightMatches=false (169 ms) (CHOSEN since
>> this is consistent with our earlier setting).
>>
>> Thanks again. I will also inform our other teams doing the Solr
>> upgrade to check the CHANGES.txt doc related to this.
>>
>


Re: Performance issue with Solr 8.6.1 Unified Highlighter does not occur on Solr 6.

2021-01-28 Thread Kerwin
On another note, since response time is in question, I have been using a
custom highlighter since Solr 6 just to override the method encodeSnippets()
in the UnifiedSolrHighlighter class, because Solr sends back a blank array
(ZERO_LEN_STR_ARRAY) in the response payload for fields that do not match.
Here is the code before:
if (snippet == null) {
  //TODO reuse logic of DefaultSolrHighlighter.alternateField
  summary.add(field, ZERO_LEN_STR_ARRAY);
} 

So I had removed this clause and made the following change:

if (snippet != null) {
  // we used a special snippet separator char and we can now split on it.
  summary.add(field, snippet.split(SNIPPET_SEPARATOR));
}

This has not changed in Solr 8 either, which for 76 fields gives a very large
payload. So I will keep this custom code for now.
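For reference, the effect of that override can be sketched standalone. This is illustrative only: the class name, the input shape, and the separator value used here are assumptions for the sketch, not the actual UnifiedSolrHighlighter internals (where SNIPPET_SEPARATOR is a private constant):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Standalone sketch of the "omit non-matching fields" logic described above.
public class SnippetEncoderSketch {
    // Assumed separator value, for illustration only.
    public static final String SNIPPET_SEPARATOR = "\u0000";

    // Given per-field concatenated snippets (null = no highlight for that
    // field), emit only the fields that actually matched, split into arrays.
    public static Map<String, String[]> encode(Map<String, String> perField) {
        Map<String, String[]> summary = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : perField.entrySet()) {
            String snippet = e.getValue();
            if (snippet != null) {
                // split on the special separator, as in the custom code above
                summary.put(e.getKey(), snippet.split(SNIPPET_SEPARATOR));
            }
            // null snippets are skipped instead of adding ZERO_LEN_STR_ARRAY,
            // which keeps the payload small when many fields do not match
        }
        return summary;
    }

    public static void main(String[] args) {
        Map<String, String> in = new LinkedHashMap<>();
        in.put("title", "a bird toy" + SNIPPET_SEPARATOR + "second snippet");
        in.put("smalldesc", null); // no match: omitted from output
        Map<String, String[]> out = encode(in);
        System.out.println(out.size());              // 1
        System.out.println(out.get("title").length); // 2
    }
}
```

With 76 fields and few matches per document, skipping the empty arrays is what shrinks the payload.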

On Fri, Jan 29, 2021 at 12:28 PM Kerwin  wrote:

> Hi David,
>
> Thanks so much for your reply.
> hl.weightMatches was indeed the culprit. After setting it to false, I am
> now getting the same sub-second response as Solr 6. I am using Solr 8.6.1.
>
> Here are the tests I carried out:
> hl.requireFieldMatch=true&hl.weightMatches=true  (2458 ms)
> hl.requireFieldMatch=false&hl.weightMatches=true (3964 ms)
> hl.requireFieldMatch=true&hl.weightMatches=false (158 ms)
> hl.requireFieldMatch=false&hl.weightMatches=false (169 ms) (CHOSEN since
> this is consistent with our earlier setting).
>
> Thanks again. I will also inform our other teams doing the Solr upgrade
> to check the CHANGES.txt doc related to this.
>


Re: Performance issue with Solr 8.6.1 Unified Highlighter does not occur on Solr 6.

2021-01-28 Thread Kerwin
Hi David,

Thanks so much for your reply.
hl.weightMatches was indeed the culprit. After setting it to false, I am
now getting the same sub-second response as Solr 6. I am using Solr 8.6.1.

Here are the tests I carried out:
hl.requireFieldMatch=true&hl.weightMatches=true  (2458 ms)
hl.requireFieldMatch=false&hl.weightMatches=true (3964 ms)
hl.requireFieldMatch=true&hl.weightMatches=false (158 ms)
hl.requireFieldMatch=false&hl.weightMatches=false (169 ms) (CHOSEN since
this is consistent with our earlier setting).

Thanks again. I will also inform our other teams doing the Solr upgrade
to check the CHANGES.txt doc related to this.


Re: Performance issue with Solr 8.6.1 Unified Highlighter does not occur on Solr 6.

2021-01-28 Thread David Smiley
Hello Kerwin,

Firstly, hopefully you've seen the upgrade notes:
https://lucene.apache.org/solr/guide/8_7/solr-upgrade-notes.html
8.6 fixes a performance regression found in 8.5; perhaps you are using 8.5?

Missing from the upgrade notes but found in the CHANGES.txt for 8.0
is hl.weightMatches=true is now the default.  Try setting it to false.
Does that help performance much?  It's documented on the highlighting page
of the ref guide:
https://lucene.apache.org/solr/guide/8_7/highlighting.html#the-unified-highlighter

You might want to try toggling hl.requireFieldMatch=true (defaults to
false).  For a query with dismax, it makes no semantic difference since all
clauses target all fields, unless users know how to query only specific
fields and do that.  It may impact performance significantly when there are
many fields.  Try a matrix of toggling this and hl.weightMatches (2x2=4
tests).

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Jan 27, 2021 at 2:20 AM Kerwin  wrote:

> Hi,
>
> While upgrading from Solr 6 to 8, the Unified Highlighter begins to have
> performance issues, going from approximately 100 ms to more than 4 seconds
> with 76 fields in the hl.q and hl.fl parameters. So I played with
> different options and found that the hl.q parameter needs to have just one
> field for the performance issue to vanish. I do not know why this would be
> so. Could you check if this is a bug or something else? This is not the
> case if I use the original highlighter, which has the same performance on
> Solr 6 and Solr 8 of ~1.5 seconds. The highlighting payload is also mostly
> the same in all the cases.
>
> Prior Solr 8 configuration, with bad performance of > 4 sec:
> hl.q = {!edismax qf="field1 field2 ..field76" v=$qq}
> hl.fl = field1 field2 ..field76
>
> Solr 8 configuration with original Solr 6 performance of ~100 ms:
> hl.q = {!edismax qf="field1" v=$qq}
> hl.fl = field1 field2 ..field76
>
> Other highlighting parameters
> true
> unified
> 200
> WORD
> en
> 10
>
> If I remove the hl.q parameter altogether, the performance time shoots up
> to 6-7 seconds; I suspect this is because our user query is quite large,
> with more fields, and is more complicated.
>


Performance issue with Solr 8.6.1 Unified Highlighter does not occur on Solr 6.

2021-01-26 Thread Kerwin
Hi,

While upgrading from Solr 6 to 8, the Unified Highlighter begins to have
performance issues, going from approximately 100 ms to more than 4 seconds
with 76 fields in the hl.q and hl.fl parameters. So I played with
different options and found that the hl.q parameter needs to have just one
field for the performance issue to vanish. I do not know why this would be
so. Could you check if this is a bug or something else? This is not the
case if I use the original highlighter, which has the same performance on
Solr 6 and Solr 8 of ~1.5 seconds. The highlighting payload is also mostly
the same in all the cases.

Prior Solr 8 configuration, with bad performance of > 4 sec:
hl.q = {!edismax qf="field1 field2 ..field76" v=$qq}
hl.fl = field1 field2 ..field76

Solr 8 configuration with original Solr 6 performance of ~100 ms:
hl.q = {!edismax qf="field1" v=$qq}
hl.fl = field1 field2 ..field76

Other highlighting parameters
true
unified
200
WORD
en
10

If I remove the hl.q parameter altogether, the performance time shoots up
to 6-7 seconds; I suspect this is because our user query is quite large,
with more fields, and is more complicated.


Re: Cursor Performance Issue

2021-01-14 Thread Ajay Sharma
Hi Mike,

Thanks for your reply.

I remember docValues being enabled by default since Solr 6.

If it is not, and I reindex the data with docValues=true for the id field,
how much will my index size increase? Currently my index size is 90 GB.


On Wed, 13 Jan, 2021, 9:14 pm Mike Drob,  wrote:

> You should be using docvalues on your id, but note that switching this
> would require a reindex.
>
> On Wed, Jan 13, 2021 at 6:04 AM Ajay Sharma 
> wrote:
>
> > Hi All,
> >
> > I have used cursors to search and export documents in solr according to
> >
> >
> https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors
> >
> > Solr version: 6.5.0
> > No of Documents: 10 crore (100 million)
> >
> > Before implementing cursor, I was using the start and rows parameter to
> > fetch records
> > Service response time used to be 2 sec
> >
> > *Before implementing Cursor Solr URL:*
> > http://localhost:8080/solr/search/select?q=bird
> > toy=mapping=3=25=100
> >
> > Request handler Looks like this: fl contains approx 20 fields
> > 
> > 
> > edismax
> > on
> > 0.01
> > 
> > 
> > id,refid,title,smalldesc:""
> > 
> >
> > none
> > json
> > 25
> > 15000
> > smalldesc
> > title_text
> > titlews^3
> > sdescnisq
> > 1
> > 
> > 2-1 470%
> > 
> > 
> >
> > Sharing the response with echoParams=all -> QTime is 6:
> > responseHeader: {
> > status: 0,
> > QTime: 6,
> > params: {
> > ps: "3",
> > echoParams: "all",
> > indent: "on",
> > fl: "id,refid,title,smalldesc:"",
> > tie: "0.01",
> > defType: "edismax",
> > qf: "customphonetic",
> > wt: "json",
> >qs: "1",
> >qt: "mapping",
> >rows: "25",
> >q: "bird toy",
> >timeAllowed: "15000"
> > }
> > },
> > response: {
> > numFound: 17,
> > start: 0,
> > maxScore: 26.616478,
> > docs: [
> >   {
> > id: "22347708097",
> > refid: "152585558",
> > title: "Round BIRD COLOURFUL SWINGING CIRCULAR SITTING TOY",
> > smalldesc: "",
> > score: 26.616478
> >  }
> > ]
> > }
> >
> > I am facing a performance issue now after implementing the cursor.
> > Service response time has increased 3 to 4 times, i.e. 8 sec in some cases.
> >
> > *After implementing Cursor query is-*
> > localhost:8080/solr/search/select?q=bird
> > toy=cursor=3=1000=100=score desc,id asc=*
> >
> > Just added &sort=score desc,id asc&cursorMark=* to the previous query;
> > rows to be fetched is 1000 now and fl contains just a single field.
> >
> > Request handler remains same as before just changed the name and made fl
> > change and added df in defaults
> >
> > 
> >
> >   edismax
> >   on
> >   0.01
> >
> >
> >   refid
> >
> >
> >   none
> >   json
> >   1000
> >   smalldesc
> >   title_text
> >   titlews^3
> >   sdescnisq
> >   1
> >   2-1 470%
> >   product_titles
> >
> > 
> >
> > Response with cursor and echoParams=all -> *QTime is now 17*, i.e. approx
> > 3 times the previous QTime:
> > responseHeader: {
> > status: 0,
> > QTime: 17,
> > params: {
> > df: "product_titles",
> > ps: "3",
> > echoParams: "all",
> > indent: "on",
> > fl: "refid",
> > tie: "0.01",
> > defType: "edismax",
> > qf: "customphonetic",
> > qs: "1",
> > qt: "cursor",
> > sort: "score desc,id asc",
> > rows: "1000",
> > q: "bird toy",
> > cursorMark: "*",
> > }
> > },
> > response: {
> > numFound: 17,
> > start: 0,
> > docs: [
> > {
> > refid: "152585558"
> > },
> > {
> > refid: "157276077"
> > }
> > ]
> > }
> >
> >
> > When I curl http://localhost:8080/solr/search/select?q=bird
> > toy=mapping=3=25=100, I can get results in 3 seconds.
> > When I curl localhost:8080/solr/search/select?q=bird
> > toy=cursor=3=1000=100=score desc,id asc=*, it
> > consumed 8 seconds to return a result even if the result count is 0.
> >
> > BTW, the id field schema definition used in the sort is:
> >  required="true"
> > omitNorms="true" multiValued="false"/>
> >
> > Is it due to the sort I have applied, or have I implemented it in the
> > wrong way?
> > Please help or provide direction to solve this issue.
> >
> >
> > Thanks in advance
> >
> > --
> > Thanks & Regards,
> > Ajay Sharma
> > Product Search
> > Indiamart Intermesh Ltd.
> >
> > --
> >
> >
>

-- 



Re: Cursor Performance Issue

2021-01-13 Thread Mike Drob
You should be using docvalues on your id, but note that switching this
would require a reindex.
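A minimal sketch of what that schema change might look like. The attribute set below is assumed (only part of the original field definition survives in this archive), so treat it as illustrative rather than the poster's exact definition:

```xml
<!-- hypothetical schema.xml entry: enabling docValues on the sort field -->
<field name="id" type="string" indexed="true" stored="true"
       required="true" omitNorms="true" multiValued="false"
       docValues="true"/>
```

After changing this, a full reindex is needed, since docValues are written at index time.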

On Wed, Jan 13, 2021 at 6:04 AM Ajay Sharma 
wrote:

> Hi All,
>
> I have used cursors to search and export documents in solr according to
>
> https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors
>
> Solr version: 6.5.0
> No of Documents: 10 crore (100 million)
>
> Before implementing cursor, I was using the start and rows parameter to
> fetch records
> Service response time used to be 2 sec
>
> *Before implementing Cursor Solr URL:*
> http://localhost:8080/solr/search/select?q=bird
> toy=mapping=3=25=100
>
> Request handler Looks like this: fl contains approx 20 fields
> 
> 
> edismax
> on
> 0.01
> 
> 
> id,refid,title,smalldesc:""
> 
>
> none
> json
> 25
> 15000
> smalldesc
> title_text
> titlews^3
> sdescnisq
> 1
> 
> 2-1 470%
> 
> 
>
> Sharing the response with echoParams=all -> QTime is 6:
> responseHeader: {
> status: 0,
> QTime: 6,
> params: {
> ps: "3",
> echoParams: "all",
> indent: "on",
> fl: "id,refid,title,smalldesc:"",
> tie: "0.01",
> defType: "edismax",
> qf: "customphonetic",
> wt: "json",
>qs: "1",
>qt: "mapping",
>rows: "25",
>q: "bird toy",
>    timeAllowed: "15000"
> }
> },
> response: {
> numFound: 17,
> start: 0,
> maxScore: 26.616478,
> docs: [
>   {
> id: "22347708097",
> refid: "152585558",
> title: "Round BIRD COLOURFUL SWINGING CIRCULAR SITTING TOY",
> smalldesc: "",
> score: 26.616478
>  }
> ]
> }
>
> I am facing a performance issue now after implementing the cursor. Service
> response time has increased 3 to 4 times, i.e. 8 sec in some cases.
>
> *After implementing Cursor query is-*
> localhost:8080/solr/search/select?q=bird
> toy=cursor=3=1000=100=score desc,id asc=*
>
> Just added &sort=score desc,id asc&cursorMark=* to the previous query;
> rows to be fetched is 1000 now and fl contains just a single field.
>
> Request handler remains same as before just changed the name and made fl
> change and added df in defaults
>
> 
>
>   edismax
>   on
>   0.01
>
>
>   refid
>
>
>   none
>   json
>   1000
>   smalldesc
>   title_text
>   titlews^3
>   sdescnisq
>   1
>   2-1 470%
>   product_titles
>
> 
>
> Response with cursor and echoParams=all -> *QTime is now 17*, i.e. approx 3
> times the previous QTime:
> responseHeader: {
> status: 0,
> QTime: 17,
> params: {
> df: "product_titles",
> ps: "3",
> echoParams: "all",
> indent: "on",
> fl: "refid",
> tie: "0.01",
> defType: "edismax",
> qf: "customphonetic",
> qs: "1",
> qt: "cursor",
> sort: "score desc,id asc",
> rows: "1000",
> q: "bird toy",
> cursorMark: "*",
> }
> },
> response: {
> numFound: 17,
> start: 0,
> docs: [
> {
> refid: "152585558"
> },
> {
> refid: "157276077"
> }
> ]
> }
>
>
> When I curl http://localhost:8080/solr/search/select?q=bird
> toy=mapping=3=25=100, I can get results in 3 seconds.
> When I curl localhost:8080/solr/search/select?q=bird
> toy=cursor=3=1000=100=score desc,id asc=*, it
> consumed 8 seconds to return a result even if the result count is 0.
>
> BTW, the id field schema definition used in the sort is:
>  omitNorms="true" multiValued="false"/>
>
> Is it due to the sort I have applied, or have I implemented it in the
> wrong way?
> Please help or provide direction to solve this issue.
>
>
> Thanks in advance
>
> --
> Thanks & Regards,
> Ajay Sharma
> Product Search
> Indiamart Intermesh Ltd.
>
> --
>
>


Cursor Performance Issue

2021-01-13 Thread Ajay Sharma
Hi All,

I have used cursors to search and export documents in solr according to
https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors

Solr version: 6.5.0
No of Documents: 10 crore (100 million)

Before implementing cursor, I was using the start and rows parameter to
fetch records
Service response time used to be 2 sec

*Before implementing Cursor Solr URL:*
http://localhost:8080/solr/search/select?q=bird
toy=mapping=3=25=100

Request handler Looks like this: fl contains approx 20 fields


edismax
on
0.01


id,refid,title,smalldesc:""

   
none
json
25
15000
smalldesc
title_text
titlews^3
sdescnisq
1

2-1 470%



Sharing the response with echoParams=all -> QTime is 6:
responseHeader: {
status: 0,
QTime: 6,
params: {
ps: "3",
echoParams: "all",
indent: "on",
fl: "id,refid,title,smalldesc:"",
tie: "0.01",
defType: "edismax",
qf: "customphonetic",
wt: "json",
   qs: "1",
   qt: "mapping",
   rows: "25",
   q: "bird toy",
   timeAllowed: "15000"
}
},
response: {
numFound: 17,
start: 0,
maxScore: 26.616478,
docs: [
  {
id: "22347708097",
refid: "152585558",
title: "Round BIRD COLOURFUL SWINGING CIRCULAR SITTING TOY",
smalldesc: "",
score: 26.616478
 }
]
}

I am facing a performance issue now after implementing the cursor. Service
response time has increased 3 to 4 times, i.e. 8 sec in some cases.

*After implementing Cursor query is-*
localhost:8080/solr/search/select?q=bird
toy=cursor=3=1000=100=score desc,id asc=*

Just added &sort=score desc,id asc&cursorMark=* to the previous query;
rows to be fetched is 1000 now and fl contains just a single field.
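On the usage side, the cursor pattern from that guide page is to re-issue the query with the returned nextCursorMark until it stops changing, rather than issuing a single large request. A SolrJ sketch of that loop follows; the base URL, query, sort, and fl values are taken from this thread, but this is a sketch that assumes a running Solr, not a drop-in tool:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorExportSketch {
    public static void main(String[] args) throws Exception {
        // Base URL assumed from the thread; adjust for your deployment.
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8080/solr/search").build()) {
            SolrQuery q = new SolrQuery("bird toy");
            q.setRows(1000);
            q.setFields("refid");
            // A cursor requires a total sort order ending on the unique key.
            q.setSort(SolrQuery.SortClause.desc("score"));
            q.addSort(SolrQuery.SortClause.asc("id"));

            String cursorMark = CursorMarkParams.CURSOR_MARK_START; // "*"
            while (true) {
                q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
                QueryResponse rsp = client.query(q);
                for (SolrDocument doc : rsp.getResults()) {
                    System.out.println(doc.getFieldValue("refid"));
                }
                String next = rsp.getNextCursorMark();
                if (cursorMark.equals(next)) break; // no more results
                cursorMark = next;
            }
        }
    }
}
```

Each iteration sends the previous nextCursorMark back, so the server can resume after the last returned document instead of re-sorting from the start.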

Request handler remains same as before just changed the name and made fl
change and added df in defaults


   
  edismax
  on
  0.01
   
   
  refid
   
   
  none
  json
  1000
  smalldesc
  title_text
  titlews^3
  sdescnisq
  1
  2-1 470%
  product_titles
   


Response with cursor and echoParams=all -> *QTime is now 17*, i.e. approx 3
times the previous QTime:
responseHeader: {
status: 0,
QTime: 17,
params: {
df: "product_titles",
ps: "3",
echoParams: "all",
indent: "on",
fl: "refid",
tie: "0.01",
defType: "edismax",
qf: "customphonetic",
qs: "1",
qt: "cursor",
sort: "score desc,id asc",
rows: "1000",
q: "bird toy",
cursorMark: "*",
}
},
response: {
numFound: 17,
start: 0,
docs: [
{
refid: "152585558"
},
{
refid: "157276077"
}
]
}


When I curl http://localhost:8080/solr/search/select?q=bird
toy=mapping=3=25=100, I can get results in 3 seconds.
When I curl localhost:8080/solr/search/select?q=bird
toy=cursor=3=1000=100=score desc,id asc=*, it
consumed 8 seconds to return a result even if the result count is 0.

BTW, the id field's schema definition is used in the sort.


Is it due to the sort I have applied, or have I implemented it in the
wrong way?
Please help or provide direction to solve this issue.


Thanks in advance

-- 
Thanks & Regards,
Ajay Sharma
Product Search
Indiamart Intermesh Ltd.

-- 



Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

2020-05-18 Thread vishal patel
Is anyone looking into my issue? Due to this issue I cannot upgrade to Solr 8.3.0.

regards,
Vishal Patel

From: vishal patel 
Sent: Sunday, May 17, 2020 11:49 AM
To: solr-user 
Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

Solr 6.1.0 : 1881

Here is my thread dump stack trace and log for Solr 6.1.0. It may be helpful
to you.
My threads: qtp557041912-245356 and qtp557041912-245342.
https://drive.google.com/file/d/1owtotYEnJacMiEZyuGLk3AHQ9kQG5rww/view?usp=sharing

Regards
Vishal Patel



From: vishal patel 
Sent: Sunday, May 17, 2020 11:04 AM
To: solr-user 
Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

Thanks for your reply.

I know the query field value is large, but the same thing works fine in Solr
6.1.0, where the query executes within 300 milliseconds. Schema.xml and
solrconfig.xml are the same. Why is it taking so long to execute in Solr 8.3.0?

Are there any changes in Solr 8.3.0?

Regards,
Vishal Patel

From: Mikhail Khludnev 
Sent: Saturday, May 16, 2020 6:55 PM
To: solr-user 
Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

It seems this thread is doing heavy work, mind the bottom line.

202.8013ms
124.8008ms
qtp153245266-156 (156)
org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer.<init>(BM25Similarity.java:219)
org.apache.lucene.search.similarities.BM25Similarity.scorer(BM25Similarity.java:192)
org.apache.lucene.search.similarities.PerFieldSimilarityWrapper.scorer(PerFieldSimilarityWrapper.java:47)
org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java:74)
org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:205)
org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:726)
org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:63)
org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:231)
org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:726)
org.apache.lucene.search.TopFieldCollector.populateScores(TopFieldCollector.java:531)
org.apache.solr.search.grouping.distributed.command.TopGroupsFieldCommand.postCollect(TopGroupsFieldCommand.java:178)
org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:168)
org.apache.solr.handler.component.QueryComponent.doProcessGroupedDistributedSearchSecondPhase(QueryComponent.java:1403)
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:387)
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)


It seems like it ranks groups by query score, which is a doubtful thing to do.

From the log, here's how to recognize the query running 25 sec: "QTime=25063"


The query itself, q=+msg_id:(10519539+10519540+10523575+10523576+ ..., is
not what search engines are made for. They are purposed for short
queries.

You may

1. leverage {!terms} query parser which might handle such long terms
list more efficiently

2. make sure you don't enable unnecessary grouping features, e.g. group
ranking in the stack above makes no sense for this kind of query


It's worth revamping the overall approach in favor of query-time
{!join} or index-time join; see {!parent}/nested docs.
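If it helps, the {!terms} suggestion would look something like this (the field name is taken from the quoted query; the ID list is truncated here for illustration):

```text
q={!terms f=msg_id}10519539,10519540,10523575,10523576
```

The terms query parser takes a comma-separated value list and skips per-term scoring and parsing overhead, which is why it handles long ID lists more efficiently than a large boolean OR.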



On Sat, May 16, 2020 at 1:46 PM vishal patel 
wrote:

> Thanks for reply.
>
> I have taken a thread dump at the time of query execution. I do not know
> the thread name, so I am sending all the threads. I have also sent the logs
> so you can get an idea.
>
> Thread Dump All Stack Trace:
> https://drive.google.com/file/d/1N4rVXJoaAwNvPIY2aw57gKA9mb4vRTMR/view
> Solr 8.3 shard 1 log:
> https://drive.google.com/file/d/1h5d_eZfQvYET7JKzbNKZwhZ_RmaX7hWf/view
> Solr 8.3 shard 2 log:
> https://drive.google.com/file/d/19CRflzQ7n5BZBNaaC7EFszgzKKlPfIVl/view
>
> I have some questions regarding the thread dump:
> - How can I know my thread name from the thread dump? Can I get it from
> the log?
> - When should I take a thread dump? During query execution or after the
> query finishes?
>
> Note: I got a thread name from the log and checked the thread dump both
> during query execution and after the query executed. The stack traces were
> different each time.
>
> If anything else is required, let me know and I will send it.
>
> Regards,
> Vishal Patel
> 
> From: Mikhail Khludnev 
> Sent: Saturday, May 16, 2020 2:23 PM
> To: solr-user 
> Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1
>
> Can you check Thread Dump in Solr Admin while Solr 8.3 crunches query for
> 34 seconds? Please share the deepest thread stack. This might give a clue
> what's going on there.
>
> On Sat, May 16, 2020 at 11:46 AM vishal patel <
> vishalpa

Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

2020-05-17 Thread vishal patel
Solr 6.1.0 : 1881

Here is my thread dump stack trace and log for Solr 6.1.0. It may be helpful
to you.
My threads: qtp557041912-245356 and qtp557041912-245342.
https://drive.google.com/file/d/1owtotYEnJacMiEZyuGLk3AHQ9kQG5rww/view?usp=sharing

Regards
Vishal Patel



From: vishal patel 
Sent: Sunday, May 17, 2020 11:04 AM
To: solr-user 
Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

Thanks for your reply.

I know the query field value is large, but the same thing works fine in Solr
6.1.0, where the query executes within 300 milliseconds. Schema.xml and
solrconfig.xml are the same. Why is it taking so long to execute in Solr 8.3.0?

Are there any changes in Solr 8.3.0?

Regards,
Vishal Patel

From: Mikhail Khludnev 
Sent: Saturday, May 16, 2020 6:55 PM
To: solr-user 
Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

It seems this thread is doing heavy work, mind the bottom line.

202.8013ms
124.8008ms
qtp153245266-156 (156)
org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer.<init>(BM25Similarity.java:219)
org.apache.lucene.search.similarities.BM25Similarity.scorer(BM25Similarity.java:192)
org.apache.lucene.search.similarities.PerFieldSimilarityWrapper.scorer(PerFieldSimilarityWrapper.java:47)
org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java:74)
org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:205)
org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:726)
org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:63)
org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:231)
org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:726)
org.apache.lucene.search.TopFieldCollector.populateScores(TopFieldCollector.java:531)
org.apache.solr.search.grouping.distributed.command.TopGroupsFieldCommand.postCollect(TopGroupsFieldCommand.java:178)
org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:168)
org.apache.solr.handler.component.QueryComponent.doProcessGroupedDistributedSearchSecondPhase(QueryComponent.java:1403)
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:387)
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)


It seems like it ranks groups by query score, which is a doubtful thing to do.

From the log, here's how to recognize the query running 25 sec: "QTime=25063"


The query itself, q=+msg_id:(10519539+10519540+10523575+10523576+ ..., is
not what search engines are made for. They are purposed for short
queries.

You may

1. leverage {!terms} query parser which might handle such long terms
list more efficiently

2. make sure you don't enable unnecessary grouping features, e.g. group
ranking in the stack above makes no sense for this kind of query


It's worth revamping the overall approach in favor of query-time
{!join} or index-time join; see {!parent}/nested docs.



On Sat, May 16, 2020 at 1:46 PM vishal patel 
wrote:

> Thanks for reply.
>
> I have taken a thread dump at the time of query execution. I do not know
> the thread name, so I am sending all the threads. I have also sent the logs
> so you can get an idea.
>
> Thread Dump All Stack Trace:
> https://drive.google.com/file/d/1N4rVXJoaAwNvPIY2aw57gKA9mb4vRTMR/view
> Solr 8.3 shard 1 log:
> https://drive.google.com/file/d/1h5d_eZfQvYET7JKzbNKZwhZ_RmaX7hWf/view
> Solr 8.3 shard 2 log:
> https://drive.google.com/file/d/19CRflzQ7n5BZBNaaC7EFszgzKKlPfIVl/view
>
> I have some questions regarding the thread dump:
> - How can I know my thread name from the thread dump? Can I get it from
> the log?
> - When should I take a thread dump? During query execution or after the
> query finishes?
>
> Note: I got a thread name from the log and checked the thread dump both
> during query execution and after the query executed. The stack traces were
> different each time.
>
> If anything else is required, let me know and I will send it.
>
> Regards,
> Vishal Patel
> 
> From: Mikhail Khludnev 
> Sent: Saturday, May 16, 2020 2:23 PM
> To: solr-user 
> Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1
>
> Can you check Thread Dump in Solr Admin while Solr 8.3 crunches query for
> 34 seconds? Please share the deepest thread stack. This might give a clue
> what's going on there.
>
> On Sat, May 16, 2020 at 11:46 AM vishal patel <
> vishalpatel200...@outlook.com>
> wrote:
>
> > Is anyone looking into my issue? Please help me.
> >
> > Sent from Outlook<http://aka.ms/weboutlook>
> > 
> > From: vishal patel 
> > Sent: Friday, May 15, 2020 3:06 PM
>

Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

2020-05-16 Thread vishal patel
Thanks for your reply.

I know the query field value is large, but the same thing works fine in Solr
6.1.0, where the query executes within 300 milliseconds. Schema.xml and
solrconfig.xml are the same. Why is it taking so long to execute in Solr 8.3.0?

Are there any changes in Solr 8.3.0?

Regards,
Vishal Patel

From: Mikhail Khludnev 
Sent: Saturday, May 16, 2020 6:55 PM
To: solr-user 
Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

It seems this thread is doing heavy work, mind the bottom line.

202.8013ms
124.8008ms
qtp153245266-156 (156)
org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer.<init>(BM25Similarity.java:219)
org.apache.lucene.search.similarities.BM25Similarity.scorer(BM25Similarity.java:192)
org.apache.lucene.search.similarities.PerFieldSimilarityWrapper.scorer(PerFieldSimilarityWrapper.java:47)
org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java:74)
org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:205)
org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:726)
org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:63)
org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:231)
org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:726)
org.apache.lucene.search.TopFieldCollector.populateScores(TopFieldCollector.java:531)
org.apache.solr.search.grouping.distributed.command.TopGroupsFieldCommand.postCollect(TopGroupsFieldCommand.java:178)
org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:168)
org.apache.solr.handler.component.QueryComponent.doProcessGroupedDistributedSearchSecondPhase(QueryComponent.java:1403)
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:387)
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)


It seems like it ranks groups by query score, that doubtful thing to do.

From the log. Here's how to recognize query running 25 sec "QTime=25063"


Query itself q=+msg_id:(10519539+10519540+10523575+10523576+ ... is
not what search engines are made for. They are purposed for short
query.

You may

1. leverage {!terms} query parser which might handle such long terms
list more efficiently

2. make sure you don't enable unnecessary grouping features, eg group
ranking in the stack above makes no sense for this kind of query


It's worth to revamp an overall approach in favor of query time
{!join} or index time join see {!parent}/nested docs.



On Sat, May 16, 2020 at 1:46 PM vishal patel 
wrote:

> Thanks for reply.
>
> I have taken a thread dump at the time of query execution. I do not know
> the thread name so send the All threads. I have also send the logs so you
> can get idea.
>
> Thread Dump All Stack Trace:
> https://drive.google.com/file/d/1N4rVXJoaAwNvPIY2aw57gKA9mb4vRTMR/view
> Solr 8.3 shard 1 log:
> https://drive.google.com/file/d/1h5d_eZfQvYET7JKzbNKZwhZ_RmaX7hWf/view
> Solr 8.3 shard 2 log:
> https://drive.google.com/file/d/19CRflzQ7n5BZBNaaC7EFszgzKKlPfIVl/view
>
> I have some questions regarding the thread dump
> - How can I know the my thread name from thread dump? can I get from the
> log?
> - When do I take a thread dump? on query execution or after query
> execution?
>
> Note: I got a thread name from log and checked in thread dump on query
> execution time and after query executed. Both time thread stack trace got
> different.
>
> If any other things are required then let me know I will send.
>
> Regards,
> Vishal Patel
> 
> From: Mikhail Khludnev 
> Sent: Saturday, May 16, 2020 2:23 PM
> To: solr-user 
> Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1
>
> Can you check Thread Dump in Solr Admin while Solr 8.3 crunches query for
> 34 seconds? Please share the deepest thread stack. This might give a clue
> what's going on there.
>
> On Sat, May 16, 2020 at 11:46 AM vishal patel <
> vishalpatel200...@outlook.com>
> wrote:
>
> > Any one is looking my issue? Please help me.
> >
> > Sent from Outlook<http://aka.ms/weboutlook>
> > 
> > From: vishal patel 
> > Sent: Friday, May 15, 2020 3:06 PM
> > To: solr-user@lucene.apache.org 
> > Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1
> >
> > I have result of query debug for both version so It will helpful.
> >
> > Solr 6.1 query debug URL
> > https://drive.google.com/file/d/1ixqpgAXsVLDZA-aUobJLrMOOefZX2NL1/view
> > Solr 8.3.1 query debug URL
> > https://drive.google.com/file/d/1MOKVE-iPZFuzRnDZhY9V6OsAKFT38U5r/

Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

2020-05-16 Thread Mikhail Khludnev
It seems this thread is doing heavy work, mind the bottom line.

202.8013ms
124.8008ms
qtp153245266-156 (156)
org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer.<init>(BM25Similarity.java:219)
org.apache.lucene.search.similarities.BM25Similarity.scorer(BM25Similarity.java:192)
org.apache.lucene.search.similarities.PerFieldSimilarityWrapper.scorer(PerFieldSimilarityWrapper.java:47)
org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java:74)
org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:205)
org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:726)
org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:63)
org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:231)
org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:726)
org.apache.lucene.search.TopFieldCollector.populateScores(TopFieldCollector.java:531)
org.apache.solr.search.grouping.distributed.command.TopGroupsFieldCommand.postCollect(TopGroupsFieldCommand.java:178)
org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:168)
org.apache.solr.handler.component.QueryComponent.doProcessGroupedDistributedSearchSecondPhase(QueryComponent.java:1403)
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:387)
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)


It seems like it ranks groups by query score, which is a doubtful thing to do.

From the log, here's how to recognize a query running 25 sec: "QTime=25063".
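That QTime token makes slow requests easy to pull out of the log automatically. A minimal sketch; the only assumption about the log format is the QTime=NNN token shown in the quoted line:

```python
import re

QTIME = re.compile(r"QTime=(\d+)")

def slow_queries(log_lines, threshold_ms=10_000):
    """Return (qtime_ms, line) pairs for requests slower than the threshold."""
    slow = []
    for line in log_lines:
        m = QTIME.search(line)
        if m and int(m.group(1)) > threshold_ms:
            slow.append((int(m.group(1)), line))
    # Slowest first, so the worst offenders surface immediately.
    return sorted(slow, reverse=True)

log = [
    "... q=+msg_id:(10519539+10519540) ... QTime=25063",
    "... q=id:1 ... QTime=12",
]
print(slow_queries(log))
```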


The query itself, q=+msg_id:(10519539+10519540+10523575+10523576+ ..., is
not what search engines are made for. They are designed for short queries.

You may

1. Leverage the {!terms} query parser, which might handle such a long term
list more efficiently.

2. Make sure you don't enable unnecessary grouping features; e.g., the group
ranking in the stack above makes no sense for this kind of query.


It's worth revamping the overall approach in favor of a query-time
{!join} or an index-time join; see {!parent}/nested docs.
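The {!terms} rewrite suggested above can be sketched in a few lines of Python. The field name msg_id comes from the query in this thread; the helper names and the choice of the q parameter are illustrative, not from the original mails:

```python
from urllib.parse import urlencode

def boolean_id_query(field, ids):
    # The problematic shape: one scoring term clause per id inside a
    # boolean query, e.g. +msg_id:(10519539 10519540 ...)
    return "+{}:({})".format(field, " ".join(str(i) for i in ids))

def terms_id_query(field, ids):
    # The {!terms} shape: a single comma-separated terms list, which Solr
    # evaluates as one set-membership filter instead of many clauses.
    return "{{!terms f={}}}{}".format(field, ",".join(str(i) for i in ids))

ids = [10519539, 10519540, 10523575, 10523576]
params = urlencode({"q": terms_id_query("msg_id", ids), "rows": 10})
print(params)
```

Moving such an ID list into fq rather than q additionally skips scoring and makes the filter cacheable.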



On Sat, May 16, 2020 at 1:46 PM vishal patel 
wrote:

> Thanks for reply.
>
> I have taken a thread dump at the time of query execution. I do not know
> the thread name so send the All threads. I have also send the logs so you
> can get idea.
>
> Thread Dump All Stack Trace:
> https://drive.google.com/file/d/1N4rVXJoaAwNvPIY2aw57gKA9mb4vRTMR/view
> Solr 8.3 shard 1 log:
> https://drive.google.com/file/d/1h5d_eZfQvYET7JKzbNKZwhZ_RmaX7hWf/view
> Solr 8.3 shard 2 log:
> https://drive.google.com/file/d/19CRflzQ7n5BZBNaaC7EFszgzKKlPfIVl/view
>
> I have some questions regarding the thread dump
> - How can I know the my thread name from thread dump? can I get from the
> log?
> - When do I take a thread dump? on query execution or after query
> execution?
>
> Note: I got a thread name from log and checked in thread dump on query
> execution time and after query executed. Both time thread stack trace got
> different.
>
> If any other things are required then let me know I will send.
>
> Regards,
> Vishal Patel
> 
> From: Mikhail Khludnev 
> Sent: Saturday, May 16, 2020 2:23 PM
> To: solr-user 
> Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1
>
> Can you check Thread Dump in Solr Admin while Solr 8.3 crunches query for
> 34 seconds? Please share the deepest thread stack. This might give a clue
> what's going on there.
>
> On Sat, May 16, 2020 at 11:46 AM vishal patel <
> vishalpatel200...@outlook.com>
> wrote:
>
> > Any one is looking my issue? Please help me.
> >
> > Sent from Outlook<http://aka.ms/weboutlook>
> > 
> > From: vishal patel 
> > Sent: Friday, May 15, 2020 3:06 PM
> > To: solr-user@lucene.apache.org 
> > Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1
> >
> > I have result of query debug for both version so It will helpful.
> >
> > Solr 6.1 query debug URL
> > https://drive.google.com/file/d/1ixqpgAXsVLDZA-aUobJLrMOOefZX2NL1/view
> > Solr 8.3.1 query debug URL
> > https://drive.google.com/file/d/1MOKVE-iPZFuzRnDZhY9V6OsAKFT38U5r/view
> >
> > I indexed same data in both version.
> >
> > I found score=1.0 in result of Solr 8.3.0 and score=0.016147947 in result
> > of Solr 8.6.1. Is there any impact of score in query execution? why is
> > score=1.0 in result of Solr 8.3.0?
> >
> > Regards,
> > Vishal Patel
> > ____
> > From: vishal patel 
> > Sent: Thursday, May 14, 2020 7:39 PM
> > To: solr-user@lucene.a

Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

2020-05-16 Thread vishal patel
Thanks for the reply.

I have taken a thread dump at the time of query execution. I do not know the 
thread name, so I am sending all threads. I have also sent the logs so you can 
get an idea.

Thread Dump All Stack Trace:
https://drive.google.com/file/d/1N4rVXJoaAwNvPIY2aw57gKA9mb4vRTMR/view
Solr 8.3 shard 1 log:
https://drive.google.com/file/d/1h5d_eZfQvYET7JKzbNKZwhZ_RmaX7hWf/view
Solr 8.3 shard 2 log:
https://drive.google.com/file/d/19CRflzQ7n5BZBNaaC7EFszgzKKlPfIVl/view

I have some questions regarding the thread dump:
- How can I find my thread name in the thread dump? Can I get it from the log?
- When should I take a thread dump? During query execution or after?

Note: I got a thread name from the log and checked it in the thread dump both 
during and after query execution. The stack traces were different each time.

If anything else is required, let me know and I will send it.

Regards,
Vishal Patel

From: Mikhail Khludnev 
Sent: Saturday, May 16, 2020 2:23 PM
To: solr-user 
Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

Can you check Thread Dump in Solr Admin while Solr 8.3 crunches query for
34 seconds? Please share the deepest thread stack. This might give a clue
what's going on there.

On Sat, May 16, 2020 at 11:46 AM vishal patel 
wrote:

> Any one is looking my issue? Please help me.
>
> Sent from Outlook<http://aka.ms/weboutlook>
> 
> From: vishal patel 
> Sent: Friday, May 15, 2020 3:06 PM
> To: solr-user@lucene.apache.org 
> Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1
>
> I have result of query debug for both version so It will helpful.
>
> Solr 6.1 query debug URL
> https://drive.google.com/file/d/1ixqpgAXsVLDZA-aUobJLrMOOefZX2NL1/view
> Solr 8.3.1 query debug URL
> https://drive.google.com/file/d/1MOKVE-iPZFuzRnDZhY9V6OsAKFT38U5r/view
>
> I indexed same data in both version.
>
> I found score=1.0 in result of Solr 8.3.0 and score=0.016147947 in result
> of Solr 8.6.1. Is there any impact of score in query execution? why is
> score=1.0 in result of Solr 8.3.0?
>
> Regards,
> Vishal Patel
> 
> From: vishal patel 
> Sent: Thursday, May 14, 2020 7:39 PM
> To: solr-user@lucene.apache.org 
> Subject: Performance issue in Query execution in Solr 8.3.0 and 8.5.1
>
> I am upgrading Solr 6.1.0 to Solr 8.3.0 or Solr 8.5.1.
>
> I get performance issue for query execution in Solr 8.3.0 or Solr 8.5.1
> when values of one field is large in query and group field is apply.
>
> My Solr URL :
> https://drive.google.com/file/d/1UqFE8I6M451Z1wWAu5_C1dzqYEOGjuH2/view
> My Solr config and schema :
> https://drive.google.com/drive/folders/1pJBxL0OOwAJSEC5uK_87ikaHEVGdDEEn
>
> It takes 34 seconds in Solr 8.3.0 or Solr 8.5.1. Same URL takes 1.5
> seconds in Solr 6.1.0.
>
> Is there any changes or issue related to grouping in Solr 8.3.0 or 8.5.1?
>
>
> Regards,
> Vishal Patel
>
>

--
Sincerely yours
Mikhail Khludnev


Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

2020-05-16 Thread Mikhail Khludnev
Can you check the Thread Dump in the Solr Admin UI while Solr 8.3 crunches the
query for 34 seconds? Please share the deepest thread stack; this might give a
clue as to what's going on there.
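Picking the deepest stack out of a dump can also be automated. A small sketch; the blank-line-separated thread blocks and the "at " frame prefix are assumptions about standard jstack-style output, and the sample dump text is made up:

```python
def deepest_stack(dump_text):
    """Return the thread block (list of lines) with the most 'at ...' frames."""
    blocks = [b.splitlines() for b in dump_text.split("\n\n") if b.strip()]
    return max(blocks, key=lambda b: sum(1 for ln in b if ln.strip().startswith("at ")))

dump = (
    '"qtp153245266-156" #156 runnable\n'
    "   at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:205)\n"
    "   at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:726)\n"
    "\n"
    '"commitScheduler-11" #201 waiting\n'
    "   at java.lang.Object.wait(Native Method)\n"
)
print(deepest_stack(dump)[0])  # header line of the busiest thread
```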

On Sat, May 16, 2020 at 11:46 AM vishal patel 
wrote:

> Any one is looking my issue? Please help me.
>
> Sent from Outlook<http://aka.ms/weboutlook>
> 
> From: vishal patel 
> Sent: Friday, May 15, 2020 3:06 PM
> To: solr-user@lucene.apache.org 
> Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1
>
> I have result of query debug for both version so It will helpful.
>
> Solr 6.1 query debug URL
> https://drive.google.com/file/d/1ixqpgAXsVLDZA-aUobJLrMOOefZX2NL1/view
> Solr 8.3.1 query debug URL
> https://drive.google.com/file/d/1MOKVE-iPZFuzRnDZhY9V6OsAKFT38U5r/view
>
> I indexed same data in both version.
>
> I found score=1.0 in result of Solr 8.3.0 and score=0.016147947 in result
> of Solr 8.6.1. Is there any impact of score in query execution? why is
> score=1.0 in result of Solr 8.3.0?
>
> Regards,
> Vishal Patel
> 
> From: vishal patel 
> Sent: Thursday, May 14, 2020 7:39 PM
> To: solr-user@lucene.apache.org 
> Subject: Performance issue in Query execution in Solr 8.3.0 and 8.5.1
>
> I am upgrading Solr 6.1.0 to Solr 8.3.0 or Solr 8.5.1.
>
> I get performance issue for query execution in Solr 8.3.0 or Solr 8.5.1
> when values of one field is large in query and group field is apply.
>
> My Solr URL :
> https://drive.google.com/file/d/1UqFE8I6M451Z1wWAu5_C1dzqYEOGjuH2/view
> My Solr config and schema :
> https://drive.google.com/drive/folders/1pJBxL0OOwAJSEC5uK_87ikaHEVGdDEEn
>
> It takes 34 seconds in Solr 8.3.0 or Solr 8.5.1. Same URL takes 1.5
> seconds in Solr 6.1.0.
>
> Is there any changes or issue related to grouping in Solr 8.3.0 or 8.5.1?
>
>
> Regards,
> Vishal Patel
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

2020-05-16 Thread vishal patel
Is anyone looking at my issue? Please help me.

Sent from Outlook<http://aka.ms/weboutlook>

From: vishal patel 
Sent: Friday, May 15, 2020 3:06 PM
To: solr-user@lucene.apache.org 
Subject: Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

I have result of query debug for both version so It will helpful.

Solr 6.1 query debug URL
https://drive.google.com/file/d/1ixqpgAXsVLDZA-aUobJLrMOOefZX2NL1/view
Solr 8.3.1 query debug URL
https://drive.google.com/file/d/1MOKVE-iPZFuzRnDZhY9V6OsAKFT38U5r/view

I indexed same data in both version.

I found score=1.0 in result of Solr 8.3.0 and score=0.016147947 in result of 
Solr 8.6.1. Is there any impact of score in query execution? why is score=1.0 
in result of Solr 8.3.0?

Regards,
Vishal Patel

From: vishal patel 
Sent: Thursday, May 14, 2020 7:39 PM
To: solr-user@lucene.apache.org 
Subject: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

I am upgrading Solr 6.1.0 to Solr 8.3.0 or Solr 8.5.1.

I get performance issue for query execution in Solr 8.3.0 or Solr 8.5.1 when 
values of one field is large in query and group field is apply.

My Solr URL : 
https://drive.google.com/file/d/1UqFE8I6M451Z1wWAu5_C1dzqYEOGjuH2/view
My Solr config and schema : 
https://drive.google.com/drive/folders/1pJBxL0OOwAJSEC5uK_87ikaHEVGdDEEn

It takes 34 seconds in Solr 8.3.0 or Solr 8.5.1. Same URL takes 1.5 seconds in 
Solr 6.1.0.

Is there any changes or issue related to grouping in Solr 8.3.0 or 8.5.1?


Regards,
Vishal Patel



Re: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

2020-05-15 Thread vishal patel
I have the query debug results for both versions, so they should be helpful.

Solr 6.1 query debug URL
https://drive.google.com/file/d/1ixqpgAXsVLDZA-aUobJLrMOOefZX2NL1/view
Solr 8.3.1 query debug URL
https://drive.google.com/file/d/1MOKVE-iPZFuzRnDZhY9V6OsAKFT38U5r/view

I indexed the same data in both versions.

I found score=1.0 in the Solr 8.3.0 result and score=0.016147947 in the Solr 
8.6.1 result. Does the score have any impact on query execution? Why is 
score=1.0 in the Solr 8.3.0 result?

Regards,
Vishal Patel

From: vishal patel 
Sent: Thursday, May 14, 2020 7:39 PM
To: solr-user@lucene.apache.org 
Subject: Performance issue in Query execution in Solr 8.3.0 and 8.5.1

I am upgrading Solr 6.1.0 to Solr 8.3.0 or Solr 8.5.1.

I get performance issue for query execution in Solr 8.3.0 or Solr 8.5.1 when 
values of one field is large in query and group field is apply.

My Solr URL : 
https://drive.google.com/file/d/1UqFE8I6M451Z1wWAu5_C1dzqYEOGjuH2/view
My Solr config and schema : 
https://drive.google.com/drive/folders/1pJBxL0OOwAJSEC5uK_87ikaHEVGdDEEn

It takes 34 seconds in Solr 8.3.0 or Solr 8.5.1. Same URL takes 1.5 seconds in 
Solr 6.1.0.

Is there any changes or issue related to grouping in Solr 8.3.0 or 8.5.1?


Regards,
Vishal Patel



Performance issue in Query execution in Solr 8.3.0 and 8.5.1

2020-05-14 Thread vishal patel
I am upgrading from Solr 6.1.0 to Solr 8.3.0 or Solr 8.5.1.

I get a query-execution performance issue in Solr 8.3.0 and Solr 8.5.1 when one 
field has many values in the query and a group field is applied.

My Solr URL : 
https://drive.google.com/file/d/1UqFE8I6M451Z1wWAu5_C1dzqYEOGjuH2/view
My Solr config and schema : 
https://drive.google.com/drive/folders/1pJBxL0OOwAJSEC5uK_87ikaHEVGdDEEn

The query takes 34 seconds in Solr 8.3.0 or Solr 8.5.1. The same URL takes 1.5 
seconds in Solr 6.1.0.

Are there any changes or issues related to grouping in Solr 8.3.0 or 8.5.1?


Regards,
Vishal Patel



RE: Possible performance issue in my environment setup

2020-02-11 Thread Rudenko, Artur
Thanks for helping; I will keep investigating.

Just a note: we stopped indexing and we did not see any significant changes.

Artur Rudenko
Analytics Developer
Customer Engagement Solutions, VERINT
T +972.74.747.2536 | M +972.52.425.4686

-Original Message-
From: Erick Erickson 
Sent: Tuesday, February 11, 2020 4:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Possible performance issue in my environment setup

My first bit of advice would be to fix your autocommit intervals. There’s not 
much point in having openSearcher set to true _and_ having your soft commit 
times also set, all soft commit does is open a searcher and your autocommit 
does that.

I’d also reduce the time for autoCommit. You’re _probably_ being saved by the 
maxDoc entry,

Fix here is set openSearcher=false in autoCommit, and reduce the time. And let 
soft commit handle opening searchers. Here’s more than you want to know about 
how all this works:

https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Given your observation that you see a new searcher being opened 65K times, my 
bet is that you’re somehow committing far, far too often. What’s the rate of 
opening new searchers? Do those 65K entries span an hour? 10 days? Either 
you’re sending 50K docs very frequently or your client is sending commits.

So here’s what I’d do as a quick-n-dirty triage of where to look first:

- first turn off indexing. Does your query performance improve? If so, consider 
autowarming and tuning your commit interval.

- next, add debug=timing to some of your queries. That’ll tell you if a 
particular _component_ is taking a long time, something like faceting say.

- If nothing jumps out, throw a profiler at Solr to see where it’s spending 
it’s time.

Best,
Erick

> On Feb 11, 2020, at 6:17 AM, Rudenko, Artur  wrote:
>
> I'm am currently investigating a performance issue in our environment (20M 
> large PARENT documents and 800M nested small CHILD documents). The system 
> inserts about 400K PARENT documents and 16M CHILD documents per day.
> This is a solr cloud 8.3 environment with 7 servers (64 VCPU 128 GB RAM each, 
> 24GB allocated to Solr) with single collection (32 shards and replication 
> factor 2).
>
> Solr config related info :
>
> 
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:360}</maxTime>
>   <maxDocs>${solr.autoCommit.maxDocs:5}</maxDocs>
>   <openSearcher>true</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:30}</maxTime>
> </autoSoftCommit>
>
> I found in the solr log the following log line:
>
> [2020-02-10T00:01:00.522] INFO [qtp1686100174-100525]
> org.apache.solr.search.SolrIndexSearcher Opening
> [Searcher@37c9205b[0_shard29_replica_n112] realtime]
>
> From a log with 100K records, the above log record appears 65K times.
>
> We are experiencing extremely slow query time while the indexing time is fast 
> and sufficient.
>
> Is this a possible direction to keep investigating? If so, any advices?
>
>
> Thanks,
> Artur Rudenko
>
>
> This electronic message may contain proprietary and confidential information 
> of Verint Systems Inc., its affiliates and/or subsidiaries. The information 
> is intended to be for the use of the individual(s) or entity(ies) named 
> above. If you are not the intended recipient (or authorized to receive this 
> e-mail for the intended recipient), you may not use, copy, disclose or 
> distribute to anyone this message or any information contained in this 
> message. If you have received this electronic message in error, please notify 
> us by replying to this e-mail.



This electronic message may contain proprietary and confidential information of 
Verint Systems Inc., its affiliates and/or subsidiaries. The information is 
intended to be for the use of the individual(s) or entity(ies) named above. If 
you are not the intended recipient (or authorized to receive this e-mail for 
the intended recipient), you may not use, copy, disclose or distribute to 
anyone this message or any information contained in this message. If you have 
received this electronic message in error, please notify us by replying to this 
e-mail.


Re: Possible performance issue in my environment setup

2020-02-11 Thread Erick Erickson
My first bit of advice would be to fix your autocommit intervals. There’s not 
much point in having openSearcher set to true _and_ having your soft commit 
times also set; all a soft commit does is open a searcher, and your autocommit 
already does that.

I’d also reduce the time for autoCommit. You’re _probably_ being saved by the 
maxDocs entry.

The fix here is to set openSearcher=false in autoCommit and reduce the time, 
and let soft commit handle opening searchers. Here’s more than you want to 
know about how all this works:

https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
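A sketch of what that change could look like in solrconfig.xml. The element names follow the stock updateHandler section; the interval values are illustrative placeholders, not taken from the original message:

```xml
<!-- Hard commits flush the transaction log but do not open searchers. -->
<autoCommit>
  <maxTime>60000</maxTime>           <!-- illustrative: 1 minute -->
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commits control document visibility by opening searchers. -->
<autoSoftCommit>
  <maxTime>300000</maxTime>          <!-- illustrative: 5 minutes -->
</autoSoftCommit>
```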

Given your observation that you see a new searcher being opened
65K times, my bet is that you’re somehow committing far, far too
often. What’s the rate of opening new searchers? Do those 65K
entries span an hour? 10 days? Either you’re sending 50K docs very
frequently or your client is sending commits.

So here’s what I’d do as a quick-n-dirty triage of where to look first:

- first turn off indexing. Does your query performance improve? If so, consider 
autowarming and tuning your commit interval.

- next, add debug=timing to some of your queries. That’ll tell you if a
particular _component_ is taking a long time, something like faceting say.

- If nothing jumps out, throw a profiler at Solr to see where it’s spending
its time.
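For the debug timing step, a minimal sketch of pulling the per-component times out of the response. The nested timing/process layout is assumed from Solr's standard debug output, and the sample numbers are made up:

```python
def component_times(debug_section):
    """Map each search component to its 'process' time in ms."""
    process = debug_section["timing"]["process"]
    # The "time" key at this level is the total, not a component.
    return {name: entry["time"] for name, entry in process.items() if name != "time"}

# Made-up response fragment with the shape Solr's debug output typically has.
debug = {
    "timing": {
        "time": 3400.0,
        "process": {
            "time": 3395.0,
            "query": {"time": 120.0},
            "facet": {"time": 3240.0},
            "highlight": {"time": 35.0},
        },
    }
}
times = component_times(debug)
print(max(times, key=times.get))  # the component to investigate first
```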

Best,
Erick

> On Feb 11, 2020, at 6:17 AM, Rudenko, Artur  wrote:
> 
> I'm am currently investigating a performance issue in our environment (20M 
> large PARENT documents and 800M nested small CHILD documents). The system 
> inserts about 400K PARENT documents and 16M CHILD documents per day.
> This is a solr cloud 8.3 environment with 7 servers (64 VCPU 128 GB RAM each, 
> 24GB allocated to Solr) with single collection (32 shards and replication 
> factor 2).
> 
> Solr config related info :
> 
> 
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:360}</maxTime>
>   <maxDocs>${solr.autoCommit.maxDocs:5}</maxDocs>
>   <openSearcher>true</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:30}</maxTime>
> </autoSoftCommit>
> 
> I found in the solr log the following log line:
> 
> [2020-02-10T00:01:00.522] INFO [qtp1686100174-100525] 
> org.apache.solr.search.SolrIndexSearcher Opening 
> [Searcher@37c9205b[0_shard29_replica_n112] realtime]
> 
> From a log with 100K records, the above log record appears 65K times.
> 
> We are experiencing extremely slow query time while the indexing time is fast 
> and sufficient.
> 
> Is this a possible direction to keep investigating? If so, any advices?
> 
> 
> Thanks,
> Artur Rudenko
> 
> 
> This electronic message may contain proprietary and confidential information 
> of Verint Systems Inc., its affiliates and/or subsidiaries. The information 
> is intended to be for the use of the individual(s) or entity(ies) named 
> above. If you are not the intended recipient (or authorized to receive this 
> e-mail for the intended recipient), you may not use, copy, disclose or 
> distribute to anyone this message or any information contained in this 
> message. If you have received this electronic message in error, please notify 
> us by replying to this e-mail.



Possible performance issue in my environment setup

2020-02-11 Thread Rudenko, Artur
I am currently investigating a performance issue in our environment (20M 
large PARENT documents and 800M nested small CHILD documents). The system 
inserts about 400K PARENT documents and 16M CHILD documents per day.
This is a solr cloud 8.3 environment with 7 servers (64 VCPU 128 GB RAM each, 
24GB allocated to Solr) with single collection (32 shards and replication 
factor 2).

Solr config related info :


<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:360}</maxTime>
  <maxDocs>${solr.autoCommit.maxDocs:5}</maxDocs>
  <openSearcher>true</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:30}</maxTime>
</autoSoftCommit>

I found in the solr log the following log line:

[2020-02-10T00:01:00.522] INFO [qtp1686100174-100525] 
org.apache.solr.search.SolrIndexSearcher Opening 
[Searcher@37c9205b[0_shard29_replica_n112] realtime]

From a log with 100K records, the above log record appears 65K times.
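The rate of searcher openings, which the replies above ask about, can be measured directly from the log. A minimal sketch; the bracketed ISO timestamp and the "SolrIndexSearcher Opening" marker are taken from the log line quoted above, while the sample lines are made up:

```python
from datetime import datetime

def searcher_opens_per_minute(lines):
    """Rate of 'SolrIndexSearcher Opening' log records, per minute."""
    stamps = []
    for line in lines:
        if "SolrIndexSearcher Opening" in line:
            ts = line.split("]", 1)[0].lstrip("[")
            stamps.append(datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f"))
    if len(stamps) < 2:
        return 0.0
    span_min = (max(stamps) - min(stamps)).total_seconds() / 60.0
    return len(stamps) / span_min if span_min else float("inf")

lines = [
    "[2020-02-10T00:00:00.000] INFO [qtp-1] org.apache.solr.search.SolrIndexSearcher Opening [Searcher@1[core] realtime]",
    "[2020-02-10T00:01:00.000] INFO [qtp-2] org.apache.solr.search.SolrIndexSearcher Opening [Searcher@2[core] realtime]",
]
print(searcher_opens_per_minute(lines))
```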

We are experiencing extremely slow query time while the indexing time is fast 
and sufficient.

Is this a possible direction to keep investigating? If so, do you have any advice?


Thanks,
Artur Rudenko


This electronic message may contain proprietary and confidential information of 
Verint Systems Inc., its affiliates and/or subsidiaries. The information is 
intended to be for the use of the individual(s) or entity(ies) named above. If 
you are not the intended recipient (or authorized to receive this e-mail for 
the intended recipient), you may not use, copy, disclose or distribute to 
anyone this message or any information contained in this message. If you have 
received this electronic message in error, please notify us by replying to this 
e-mail.


Re: Performance Issue since Solr 7.7 with wt=javabin

2020-01-30 Thread Karl Stoney
Sorry, to be specific: we already build 7.7 from source, but I don’t have 
confidence in backporting the fix myself, so it would be awesome if someone 
would help out other 7.7 users who aren’t able to upgrade to 8.4 yet with this 
important fix :(

Get Outlook for iOS<https://aka.ms/o0ukef>

From: Karl Stoney 
Sent: Thursday, January 30, 2020 3:56:31 PM
To: solr-user@lucene.apache.org 
Subject: Re: Performance Issue since Solr 7.7 with wt=javabin

I don’t have confidence in my ability to do that, I was hoping someone could 
help out as moving to 8.4 is too much of a jump for me right now!

Would really appreciate it..

Get Outlook for 
iOS<https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Faka.ms%2Fo0ukefdata=02%7C01%7Ckarl.stoney%40autotrader.co.uk%7C31d1a342a9d14a76c2f908d7a59d039b%7C926f3743f3d24b8a816818cfcbe776fe%7C0%7C0%7C637159966090301280sdata=uxA25dkVY4afCu9M7EB6cVDmd731oK10tlbqXrquZ7Q%3Dreserved=0>

From: Jan Høydahl 
Sent: Thursday, January 30, 2020 2:23:40 PM
To: solr-user 
Subject: Re: Performance Issue since Solr 7.7 with wt=javabin

No further releases are planned for 7.x, so your best bet is to patch 
branch_7_7 yourself and build a custom Solr version.

Jan

> 29. jan. 2020 kl. 20:54 skrev Karl Stoney 
> :
>
> Could anyone produce a patch for 7.7 please?
> 
> From: Florent Sithi 
> Sent: 29 January 2020 14:34
> To: solr-user@lucene.apache.org 
> Subject: Re: Performance Issue since Solr 7.7 with wt=javabin
>
> yes thanks so much, fixed in 8.4.0
>
>
>
> --
> Sent from: 
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Flucene.472066.n3.nabble.com%2FSolr-User-f472068.htmldata=02%7C01%7Ckarl.stoney%40autotrader.co.uk%7C31d1a342a9d14a76c2f908d7a59d039b%7C926f3743f3d24b8a816818cfcbe776fe%7C0%7C0%7C637159966090301280sdata=MZ70%2BqLfibjYwveSBnQyL9ME0dmHLgok6ci4qdOpqH8%3Dreserved=0
>
> This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 
> Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 
> 9439967). This email and any files transmitted with it are confidential and 
> may be legally privileged, and intended solely for the use of the individual 
> or entity to whom they are addressed. If you have received this email in 
> error please notify the sender. This email message has been swept for the 
> presence of computer viruses.

This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 
Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 
9439967). This email and any files transmitted with it are confidential and may 
be legally privileged, and intended solely for the use of the individual or 
entity to whom they are addressed. If you have received this email in error 
please notify the sender. This email message has been swept for the presence of 
computer viruses.
This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 
Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 
9439967). This email and any files transmitted with it are confidential and may 
be legally privileged, and intended solely for the use of the individual or 
entity to whom they are addressed. If you have received this email in error 
please notify the sender. This email message has been swept for the presence of 
computer viruses.


Re: Performance Issue since Solr 7.7 with wt=javabin

2020-01-30 Thread Karl Stoney
I don’t have confidence in my ability to do that; I was hoping someone could 
help out, as moving to 8.4 is too much of a jump for me right now!

Would really appreciate it..

Get Outlook for iOS<https://aka.ms/o0ukef>

From: Jan Høydahl 
Sent: Thursday, January 30, 2020 2:23:40 PM
To: solr-user 
Subject: Re: Performance Issue since Solr 7.7 with wt=javabin

No further releases are planned for 7.x, so your best bet is to patch 
branch_7_7 yourself and build a custom Solr version.

Jan

> 29. jan. 2020 kl. 20:54 skrev Karl Stoney 
> :
>
> Could anyone produce a patch for 7.7 please?
> 
> From: Florent Sithi 
> Sent: 29 January 2020 14:34
> To: solr-user@lucene.apache.org 
> Subject: Re: Performance Issue since Solr 7.7 with wt=javabin
>
> yes thanks so much, fixed in 8.4.0
>
>
>
> --
> Sent from: 
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Flucene.472066.n3.nabble.com%2FSolr-User-f472068.htmldata=02%7C01%7Ckarl.stoney%40autotrader.co.uk%7Ce2428e6206154c5bec6e08d7a5900731%7C926f3743f3d24b8a816818cfcbe776fe%7C0%7C0%7C637159910319192220sdata=VZ19SKPemqvUcvrdJEX%2FIZ7JEypez8lvG6U6aYYudjM%3Dreserved=0
>
> This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 
> Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 
> 9439967). This email and any files transmitted with it are confidential and 
> may be legally privileged, and intended solely for the use of the individual 
> or entity to whom they are addressed. If you have received this email in 
> error please notify the sender. This email message has been swept for the 
> presence of computer viruses.

This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 
Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 
9439967). This email and any files transmitted with it are confidential and may 
be legally privileged, and intended solely for the use of the individual or 
entity to whom they are addressed. If you have received this email in error 
please notify the sender. This email message has been swept for the presence of 
computer viruses.


Re: Performance Issue since Solr 7.7 with wt=javabin

2020-01-30 Thread Jan Høydahl
No further releases are planned for 7.x, so your best bet is to patch 
branch_7_7 yourself and build a custom Solr version.

Jan

> 29. jan. 2020 kl. 20:54 skrev Karl Stoney 
> :
> 
> Could anyone produce a patch for 7.7 please?
> 
> From: Florent Sithi 
> Sent: 29 January 2020 14:34
> To: solr-user@lucene.apache.org 
> Subject: Re: Performance Issue since Solr 7.7 with wt=javabin
> 
> yes thanks so much, fixed in 8.4.0
> 
> 
> 
> --
> Sent from: 
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Flucene.472066.n3.nabble.com%2FSolr-User-f472068.htmldata=02%7C01%7Ckarl.stoney%40autotrader.co.uk%7C908476b216bd4c6d8cb008d7a4d5af4d%7C926f3743f3d24b8a816818cfcbe776fe%7C0%7C1%7C637159109977374057sdata=tcRNMCd5JOMFnx9ukCqikpVUUB%2FTOCwmsrZsalNUc4I%3Dreserved=0
> 



Re: Performance Issue since Solr 7.7 with wt=javabin

2020-01-29 Thread Karl Stoney
Could anyone produce a patch for 7.7 please?

From: Florent Sithi 
Sent: 29 January 2020 14:34
To: solr-user@lucene.apache.org 
Subject: Re: Performance Issue since Solr 7.7 with wt=javabin

yes thanks so much, fixed in 8.4.0



--
Sent from: 
https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Performance Issue since Solr 7.7 with wt=javabin

2020-01-29 Thread Florent Sithi
yes thanks so much, fixed in 8.4.0



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Performance Issue since Solr 7.7 with wt=javabin

2020-01-29 Thread Jan Høydahl
Check out SOLR-14013, which I believe is what you are looking for.

Jan

> On 29 Jan 2020, at 11:46, Florent Sithi wrote:
> 
> Hi Paras,
> 
> Thanks for your answer and your ideas ;)
> 
> I have the exact same issue as Andy: "wt=javabin&version=2" has really
> poor performance compared to wt=json.
> I'm using : 
> - solr 7.7.2 
> - OpenJDK8U-jdk_x64_linux_hotspot_8u222b10 or jdk-8u241-linux-x64 (same
> behaviour)
> 
> The server has plenty of RAM and GC is not triggered during the test.
> I alternately run stress tests with wt=javabin and then wt=json without
> restarting Solr, so I presume warmup is not an issue there.
> 
> What do you mean by "rebuild the performance matrix"?
> 
> Thanks
> Florent
> 
> 
> 
> 
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Performance Issue since Solr 7.7 with wt=javabin

2020-01-29 Thread Florent Sithi
Hi Paras,

Thanks for your answer and your ideas ;)

I have the exact same issue as Andy: "wt=javabin&version=2" has really
poor performance compared to wt=json.
I'm using : 
- solr 7.7.2 
- OpenJDK8U-jdk_x64_linux_hotspot_8u222b10 or jdk-8u241-linux-x64 (same
behaviour)

The server has plenty of RAM and GC is not triggered during the test.
I alternately run stress tests with wt=javabin and then wt=json without
restarting Solr, so I presume warmup is not an issue there.

What do you mean by "rebuild the performance matrix"?

Thanks
Florent







--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Performance Issue since Solr 7.7 with wt=javabin

2019-10-18 Thread Paras Lehana
Hi Andy,

Have you run performance benchmarking for some time and made sure that
Solr caching and GC don't impact the performance? I recommend that you
rebuild the performance matrix after a few warmups and requests. Have
you ruled this out?

On Fri, 18 Oct 2019 at 12:35, Jan Høydahl  wrote:

> Hi,
>
> Did you find a solution to your performance problem?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 17 Jun 2019, at 17:17, Andy Reek wrote:
> >
> > Hi Solr team,
> >
> > we are using Solr in version 7.1 as search engine in our online shop
> (SAP Hybris). And as a task I needed to migrate to the most recent Solr in
> version 7 (7.7). Doing this I faced extreme performance issues. After
> debugging and testing different setups I found out that they were caused
> by the parameter wt=javabin. These issues began to arise in version 7.7;
> in 7.6 it is still as fast as in 7.1.
> >
> > Just an example: a simple query for *:* with wt=javabin takes 0.2
> seconds in 7.6 and 34 seconds in 7.7!
> >
> > The configuration of the schema.xml and solrconfig.xml are equal in both
> versions. Version 8.1 has the same effect as 7.7. Using something other
> than wt=javabin (e.g. wt=xml) will work fast in every version - which is
> our current workaround.
> >
> >
> > To reproduce this issue I have attached my used configsets folder plus
> some test data. This all can be tested with docker and wget:
> >
> > Solr 7.6:
> > docker run -d --name solr7.6 -p 8983:8983 --rm -v
> $PWD/configsets/default:/opt/solr/server/solr/configsets/myconfig:ro
> solr:7.6-slim solr-create -c mycore -d
> /opt/solr/server/solr/configsets/myconfig
> > docker cp $PWD/data.json solr7.6:/opt/solr/data.json
> > docker exec -it --user solr solr7.6 bin/post -c mycore data.json
> > wget "http://localhost:8983/solr/mycore/select?q=*:*&wt=javabin"
> > (0.2s)
> >
> > Solr 7.7:
> > docker run -d --name solr7.7 -p 18983:8983 --rm -v
> $PWD/configsets/default:/opt/solr/server/solr/configsets/myconfig:ro
> solr:7.7-slim solr-create -c mycore -d
> /opt/solr/server/solr/configsets/myconfig
> > docker cp $PWD/data.json solr7.7:/opt/solr/data.json
> > docker exec -it --user solr solr7.7 bin/post -c mycore data.json
> > (34s)
> >
> > For me it seems like a bug. But if not, then please let me know what I
> did wrong ;-)
> >
> >
> > Best Regards,
> >
> > Andy Reek
> > Principal Software Developer
> >
> > diva-e Jena
> > Mälzerstraße 3, 07745 Jena, Deutschland
> >
> > T:   +49 (3641) 3678 (223)
> > F:   +49 (3641) 3678 101
> > andy.r...@diva-e.com 
> >
> > www.diva-e.com  follow us: facebook <
> https://www.facebook.com/digital.value.enterprise/?ref=hl>, twitter <
> https://twitter.com/diva_enterprise>, LinkedIn <
> https://www.linkedin.com/company/diva-e-digital-value-enterprise-gmbh>,
> Xing 
> > 
> >
> > diva-e AGETO GmbH
> > Handelsregister: HRB 210399 Amtsgericht Jena
> > Geschäftsführung: Sascha Sauer, Sirko Schneppe, Axel Jahn
> > 
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Software Programmer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*

-- 
IMPORTANT: 
NEVER share your IndiaMART OTP/ Password with anyone.


Re: Performance Issue since Solr 7.7 with wt=javabin

2019-10-18 Thread Jan Høydahl
Hi,

Did you find a solution to your performance problem?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 17 Jun 2019, at 17:17, Andy Reek wrote:
> 
> Hi Solr team,
> 
> we are using Solr in version 7.1 as search engine in our online shop (SAP 
> Hybris). And as a task I needed to migrate to the most recent Solr in version 
> 7 (7.7). Doing this I faced extreme performance issues. After debugging and 
> testing different setups I found out that they were caused by the parameter 
> wt=javabin. These issues began to arise in version 7.7; in 7.6 it is still 
> as fast as in 7.1.
> 
> Just an example: a simple query for *:* with wt=javabin takes 0.2 
> seconds in 7.6 and 34 seconds in 7.7!
> 
> The configuration of the schema.xml and solrconfig.xml are equal in both 
> versions. Version 8.1 has the same effect as 7.7. Using something other than 
> wt=javabin (e.g. wt=xml) will work fast in every version - which is our 
> current workaround.
> 
> 
> To reproduce this issue I have attached my used configsets folder plus some 
> test data. This all can be tested with docker and wget:
> 
> Solr 7.6:
> docker run -d --name solr7.6 -p 8983:8983 --rm -v 
> $PWD/configsets/default:/opt/solr/server/solr/configsets/myconfig:ro 
> solr:7.6-slim solr-create -c mycore -d 
> /opt/solr/server/solr/configsets/myconfig
> docker cp $PWD/data.json solr7.6:/opt/solr/data.json
> docker exec -it --user solr solr7.6 bin/post -c mycore data.json
> wget "http://localhost:8983/solr/mycore/select?q=*:*&wt=javabin"
> (0.2s)
> 
> Solr 7.7:
> docker run -d --name solr7.7 -p 18983:8983 --rm -v 
> $PWD/configsets/default:/opt/solr/server/solr/configsets/myconfig:ro 
> solr:7.7-slim solr-create -c mycore -d 
> /opt/solr/server/solr/configsets/myconfig
> docker cp $PWD/data.json solr7.7:/opt/solr/data.json
> docker exec -it --user solr solr7.7 bin/post -c mycore data.json
> (34s)
> 
> For me it seems like a bug. But if not, then please let me know what I did 
> wrong ;-)
> 
> 
> Best Regards,
>  
> Andy Reek
> Principal Software Developer
>  
> 



Re: Performance Issue since Solr 7.7 with wt=javabin

2019-10-12 Thread Noble Paul
How are you consuming the output? Are you using solrj?

On Tue, Jun 18, 2019, 1:27 AM Andy Reek  wrote:

> Hi Solr team,
>
>
> we are using Solr in version 7.1 as search engine in our online shop (SAP
> Hybris). And as a task I needed to migrate to the most recent Solr in
> version 7 (7.7). Doing this I faced extreme performance issues. After
> debugging and testing different setups I found out that they were caused
> by the parameter wt=javabin. These issues began to arise in version 7.7;
> in 7.6 it is still as fast as in 7.1.
>
>
> Just an example: a simple query for *:* with wt=javabin takes 0.2
> seconds in 7.6 and 34 seconds in 7.7!
>
>
> The configuration of the schema.xml and solrconfig.xml are equal in both
> versions. Version 8.1 has the same effect as 7.7. Using something other
> than wt=javabin (e.g. wt=xml) will work fast in every version - which is
> our current workaround.
>
>
>
> To reproduce this issue I have attached my used configsets folder plus
> some test data. This all can be tested with docker and wget:
>
>
> Solr 7.6:
>
> docker run -d --name solr7.6 -p 8983:8983 --rm -v
> $PWD/configsets/default:/opt/solr/server/solr/configsets/myconfig:ro
> solr:7.6-slim solr-create -c mycore -d
> /opt/solr/server/solr/configsets/myconfig
> docker cp $PWD/data.json solr7.6:/opt/solr/data.json
> docker exec -it --user solr solr7.6 bin/post -c mycore data.json
> wget "http://localhost:8983/solr/mycore/select?q=*:*&wt=javabin"
> (0.2s)
>
> Solr 7.7:
> docker run -d --name solr7.7 -p 18983:8983 --rm -v
> $PWD/configsets/default:/opt/solr/server/solr/configsets/myconfig:ro
> solr:7.7-slim solr-create -c mycore -d
> /opt/solr/server/solr/configsets/myconfig
> docker cp $PWD/data.json solr7.7:/opt/solr/data.json
> docker exec -it --user solr solr7.7 bin/post -c mycore data.json
> (34s)
>
> For me it seems like a bug. But if not, then please let me know what I did
> wrong ;-)
>
>
>
> Best Regards,
>
>
>
> *Andy Reek*
>
> Principal Software Developer
>
>
>
>
>


Re: Solr 7.2.1 Collection Backup Performance issue

2018-09-21 Thread Walter Underwood
I don’t know how well it worked, but for a while, I did this to warm up the 
file buffers.
It should be OK if RAM is bigger than data. Though “cat” probably opens the 
files with
the hint that it will never re-read the data.

find /solr-data-dir -type f | xargs cat > /dev/null

Basically, read every file with the “cat” program, which should load it into OS 
file buffers.
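A slightly more defensive version of that one-liner, null-delimited so filenames containing spaces are handled (the example path is illustrative):

```shell
#!/bin/sh
# Warm the OS page cache by reading every file under a directory once;
# the output is discarded. -print0/-0 make it safe for odd filenames.
warm_cache() {
  find "$1" -type f -print0 | xargs -0 cat > /dev/null
}

# Example usage (path is illustrative):
# warm_cache /var/solr/data
```

As Walter notes, this only helps if RAM is larger than the data being read.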

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 21, 2018, at 12:53 PM, Ganesh Sethuraman  
> wrote:
> 
> Our full index does not fit in memory, but read/query performance is still
> acceptable for now. With BACKUP, however, we are seeing an increase in OS
> memory usage. I am sure many systems run with less memory than their index
> size but still perform well enough for their application; BACKUP changes
> that equation. Are there any best practices, such as running backups during
> off-peak hours or doing some kind of warm-up? (If so, how?)
> 
> 
> On Tue, Sep 18, 2018 at 5:48 PM Ganesh Sethuraman 
> wrote:
> 
>> Thanks for the information. I thought backup would mostly be disk
>> activity, but I understand now that RAM is involved here as well. We
>> indeed did NOT have enough memory in this box: it is a 64GB box with a
>> 72GB index being backed up. Read (real-time GET) performance was better
>> without BACKUP, probably because there was minimal disk access; with a
>> backup running, reads (GET) probably hit the disk for every request.
>> 
>> Thanks,
>> Ganesh
>> 
>> On Tue, Sep 18, 2018 at 3:43 PM Shawn Heisey  wrote:
>> 
>>> On 9/18/2018 11:00 AM, Ganesh Sethuraman wrote:
 We are using Solr 7.2.1 with SolrCloud with 35 collections with 1 node
>>> ZK
 ensemble (in lower environment, we will have 3 nodes ensemble) in AWS.
>>> We
 are testing to see if we have Async Solr Cloud backup  (
 https://lucene.apache.org/solr/guide/7_2/collections-api.html#backup)
>>> done
 every time we are create a new collection or update an existing
>>> collection.
 There are 1 replica and 8 shards per collection. Two Solr nodes.
 
 For the largest collection (index size of 80GB), we see that BACKUP to
>>> the
 EFS drive takes about ~10 mins. We are doing lot of /get (real time get)
 option from the application. We are seeing that that the performance
 significantly (2x) degrades on the read (get) performance when we
>>> BACK-UP
 is going on in parallel.
>>> 
>>> My best guess here is that you do not have enough memory. For good
>>> performance, Solr is extremely reliant on having certain parts of the
>>> index data sitting in memory, so that it doesn't have to actually read
>>> the disk to discover matches for a query.  When all is working well,
>>> that data will be read from memory instead of the disk.  Memory is MUCH
>>> MUCH faster than a disk.
>>> 
>>> Making a backup is going to read ALL of the index data.  So if you do
>>> not have enough spare memory to cache the entire index, reading the
>>> index to make the backup is going to push the important parts of the
>>> index out of the cache, and then Solr will have to actually go and read
>>> the disk in order to satisfy a query.
>>> 
>>> https://wiki.apache.org/solr/SolrPerformanceProblems#RAM
>>> 
>>> Can you gather a screenshot of your process list and put it on a file
>>> sharing website?  You'll find instructions on how to do this here:
>>> 
>>> 
>>> https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue
>>> 
>>> Thanks,
>>> Shawn
>>> 
>>> 



Re: Solr 7.2.1 Collection Backup Performance issue

2018-09-21 Thread Ganesh Sethuraman
Our full index does not fit in memory, but read/query performance is still
acceptable for now. With BACKUP, however, we are seeing an increase in OS
memory usage. I am sure many systems run with less memory than their index
size but still perform well enough for their application; BACKUP changes
that equation. Are there any best practices, such as running backups during
off-peak hours or doing some kind of warm-up? (If so, how?)


On Tue, Sep 18, 2018 at 5:48 PM Ganesh Sethuraman 
wrote:

> Thanks for the information. I thought backup would mostly be disk
> activity, but I understand now that RAM is involved here as well. We
> indeed did NOT have enough memory in this box: it is a 64GB box with a
> 72GB index being backed up. Read (real-time GET) performance was better
> without BACKUP, probably because there was minimal disk access; with a
> backup running, reads (GET) probably hit the disk for every request.
>
> Thanks,
> Ganesh
>
> On Tue, Sep 18, 2018 at 3:43 PM Shawn Heisey  wrote:
>
>> On 9/18/2018 11:00 AM, Ganesh Sethuraman wrote:
>> > We are using Solr 7.2.1 with SolrCloud with 35 collections with 1 node
>> ZK
>> > ensemble (in lower environment, we will have 3 nodes ensemble) in AWS.
>> We
>> > are testing to see if we have Async Solr Cloud backup  (
>> > https://lucene.apache.org/solr/guide/7_2/collections-api.html#backup)
>> done
>> > every time we are create a new collection or update an existing
>> collection.
>> > There are 1 replica and 8 shards per collection. Two Solr nodes.
>> >
>> > For the largest collection (index size of 80GB), we see that BACKUP to
>> the
>> > EFS drive takes about ~10 mins. We are doing lot of /get (real time get)
>> > option from the application. We are seeing that that the performance
>> > significantly (2x) degrades on the read (get) performance when we
>> BACK-UP
>> > is going on in parallel.
>>
>> My best guess here is that you do not have enough memory. For good
>> performance, Solr is extremely reliant on having certain parts of the
>> index data sitting in memory, so that it doesn't have to actually read
>> the disk to discover matches for a query.  When all is working well,
>> that data will be read from memory instead of the disk.  Memory is MUCH
>> MUCH faster than a disk.
>>
>> Making a backup is going to read ALL of the index data.  So if you do
>> not have enough spare memory to cache the entire index, reading the
>> index to make the backup is going to push the important parts of the
>> index out of the cache, and then Solr will have to actually go and read
>> the disk in order to satisfy a query.
>>
>> https://wiki.apache.org/solr/SolrPerformanceProblems#RAM
>>
>> Can you gather a screenshot of your process list and put it on a file
>> sharing website?  You'll find instructions on how to do this here:
>>
>>
>> https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue
>>
>> Thanks,
>> Shawn
>>
>>


Re: Solr 7.2.1 Collection Backup Performance issue

2018-09-18 Thread Ganesh Sethuraman
Thanks for the information. I thought backup would mostly be disk
activity, but I understand now that RAM is involved here as well. We
indeed did NOT have enough memory in this box: it is a 64GB box with a
72GB index being backed up. Read (real-time GET) performance was better
without BACKUP, probably because there was minimal disk access; with a
backup running, reads (GET) probably hit the disk for every request.

Thanks,
Ganesh

On Tue, Sep 18, 2018 at 3:43 PM Shawn Heisey  wrote:

> On 9/18/2018 11:00 AM, Ganesh Sethuraman wrote:
> > We are using Solr 7.2.1 with SolrCloud with 35 collections with 1 node ZK
> > ensemble (in lower environment, we will have 3 nodes ensemble) in AWS. We
> > are testing to see if we have Async Solr Cloud backup  (
> > https://lucene.apache.org/solr/guide/7_2/collections-api.html#backup)
> done
> > every time we are create a new collection or update an existing
> collection.
> > There are 1 replica and 8 shards per collection. Two Solr nodes.
> >
> > For the largest collection (index size of 80GB), we see that BACKUP to
> the
> > EFS drive takes about ~10 mins. We are doing lot of /get (real time get)
> > option from the application. We are seeing that that the performance
> > significantly (2x) degrades on the read (get) performance when we BACK-UP
> > is going on in parallel.
>
> My best guess here is that you do not have enough memory. For good
> performance, Solr is extremely reliant on having certain parts of the
> index data sitting in memory, so that it doesn't have to actually read
> the disk to discover matches for a query.  When all is working well,
> that data will be read from memory instead of the disk.  Memory is MUCH
> MUCH faster than a disk.
>
> Making a backup is going to read ALL of the index data.  So if you do
> not have enough spare memory to cache the entire index, reading the
> index to make the backup is going to push the important parts of the
> index out of the cache, and then Solr will have to actually go and read
> the disk in order to satisfy a query.
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#RAM
>
> Can you gather a screenshot of your process list and put it on a file
> sharing website?  You'll find instructions on how to do this here:
>
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue
>
> Thanks,
> Shawn
>
>


Re: Solr 7.2.1 Collection Backup Performance issue

2018-09-18 Thread Shawn Heisey

On 9/18/2018 11:00 AM, Ganesh Sethuraman wrote:

We are using Solr 7.2.1 with SolrCloud, 35 collections, and a 1-node ZK
ensemble (in lower environments; we will have a 3-node ensemble) in AWS. We
are testing running an async SolrCloud backup (
https://lucene.apache.org/solr/guide/7_2/collections-api.html#backup)
every time we create a new collection or update an existing collection.
There is 1 replica and 8 shards per collection, across two Solr nodes.

For the largest collection (index size of 80GB), we see that a BACKUP to
the EFS drive takes about 10 minutes. We do a lot of /get (real-time get)
requests from the application, and we are seeing read (get) performance
degrade significantly (2x) while a BACKUP is running in parallel.


My best guess here is that you do not have enough memory. For good 
performance, Solr is extremely reliant on having certain parts of the 
index data sitting in memory, so that it doesn't have to actually read 
the disk to discover matches for a query.  When all is working well, 
that data will be read from memory instead of the disk.  Memory is MUCH 
MUCH faster than a disk.


Making a backup is going to read ALL of the index data.  So if you do 
not have enough spare memory to cache the entire index, reading the 
index to make the backup is going to push the important parts of the 
index out of the cache, and then Solr will have to actually go and read 
the disk in order to satisfy a query.


https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

Can you gather a screenshot of your process list and put it on a file 
sharing website?  You'll find instructions on how to do this here:


https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue

Thanks,
Shawn



Solr 7.2.1 Collection Backup Performance issue

2018-09-18 Thread Ganesh Sethuraman
Hi

We are using Solr 7.2.1 with SolrCloud, 35 collections, and a 1-node ZK
ensemble (in lower environments; we will have a 3-node ensemble) in AWS. We
are testing running an async SolrCloud backup (
https://lucene.apache.org/solr/guide/7_2/collections-api.html#backup)
every time we create a new collection or update an existing collection.
There is 1 replica and 8 shards per collection, across two Solr nodes.

For the largest collection (index size of 80GB), we see that a BACKUP to
the EFS drive takes about 10 minutes. We do a lot of /get (real-time get)
requests from the application, and we are seeing read (get) performance
degrade significantly (2x) while a BACKUP is running in parallel.

Is there any way to tune the system so that reads do not suffer?

Any other best practices? For example, should we run backups during off-peak load?

Is there a way to keep track of which collections are already backed up?
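For the last question, the Collections API lets you tag each backup with an async request id and poll its status, which gives you a way to track which collection backups have completed. A sketch (host, collection name, and location are examples):

```shell
# Kick off an async backup; the request id is chosen by the caller,
# so a collection-plus-date id doubles as a tracking record.
curl "http://localhost:8983/solr/admin/collections?action=BACKUP&name=coll1-20180918&collection=coll1&location=/mnt/efs/backups&async=coll1-20180918"

# Poll that request id; the response reports submitted/running/completed/failed.
curl "http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=coll1-20180918"
```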


Re: Suggestions for debugging performance issue

2018-06-27 Thread Susheel Kumar
Did you try to see which component (query, facet, highlight, ...) is
taking time, via debugQuery=on, when performance is slow? Just to rule
out that some other component is the culprit...
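For reference, a sketch of such a request (URL, collection, and row count are examples); the timing section of the debug output breaks QTime down per search component:

```shell
# debug=timing returns only the per-component prepare/process timings,
# without the full explain output.
curl "http://localhost:8983/solr/mycoll/select?q=*:*&rows=200&debug=timing"
```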

Thnx

On Mon, Jun 25, 2018 at 2:06 PM, Chris Troullis 
wrote:

> FYI to all, just as an update, we rebuilt the index in question from
> scratch for a second time this weekend and the problem went away on 1 node,
> but we were still seeing it on the other node. After restarting the
> problematic node, the problem went away. Still makes me a little uneasy as
> we weren't able to determine the cause, but at least we are back to normal
> query times now.
>
> Chris
>
> On Fri, Jun 15, 2018 at 8:06 AM, Chris Troullis 
> wrote:
>
> > Thanks Shawn,
> >
> > As mentioned previously, we are hard committing every 60 seconds, which
> we
> > have been doing for years, and have had no issues until enabling CDCR. We
> > have never seen large tlog sizes before, and even manually issuing a hard
> > commit to the collection does not reduce the size of the tlogs. I believe
> > this is because when using the CDCRUpdateLog the tlogs are not purged
> until
> > the docs have been replicated over. Anyway, since we manually purged the
> > tlogs they seem to now be staying at an acceptable size, so I don't think
> > that is the cause. The documents are not abnormally large, maybe ~20
> > string/numeric fields with simple whitespace tokenization.
> >
> > To answer your questions:
> >
> > -Solr version: 7.2.1
> > -What OS vendor and version Solr is running on: CentOS 6
> > -Total document count on the server (counting all index cores): 13
> > collections totaling ~60 million docs
> > -Total index size on the server (counting all cores): ~60GB
> > -What the total of all Solr heaps on the server is - 16GB heap (we had to
> > increase for CDCR because it was using a lot more heap).
> > -Whether there is software other than Solr on the server - No
> > -How much total memory the server has installed - 64 GB
> >
> > All of this has been consistent for multiple years across multiple Solr
> > versions and we have only started seeing this issue once we started using
> > the CDCRUpdateLog and CDCR, hence why that is the only real thing we can
> > point to. And again, the issue is only affecting 1 of the 13 collections
> on
> > the server, so if it was hardware/heap/GC related then I would think we
> > would be seeing it for every collection, not just one, as they all share
> > the same resources.
> >
> > I will take a look at the GC logs, but I don't think that is the cause.
> > The consistent nature of the slow performance doesn't really point to GC
> > issues, and we have profiling set up in New Relic and it does not show
> any
> > long/frequent GC pauses.
> >
> > We are going to try and rebuild the collection from scratch again this
> > weekend as that has solved the issue in some lower environments, although
> > it's not really consistent. At this point it's all we can think of to do.
> >
> > Thanks,
> >
> > Chris
> >
> >
> > On Thu, Jun 14, 2018 at 6:23 PM, Shawn Heisey 
> wrote:
> >
> >> On 6/12/2018 12:06 PM, Chris Troullis wrote:
> >> > The issue we are seeing is with 1 collection in particular, after we
> >> set up
> >> > CDCR, we are getting extremely slow response times when retrieving
> >> > documents. Debugging the query shows QTime is almost nothing, but the
> >> > overall responseTime is like 5x what it should be. The problem is
> >> > exacerbated by larger result sizes. IE retrieving 25 results is almost
> >> > normal, but 200 results is way slower than normal. I can run the exact
> >> same
> >> > query multiple times in a row (so everything should be cached), and I
> >> still
> >> > see response times way higher than another environment that is not
> using
> >> > CDCR. It doesn't seem to matter if CDCR is enabled or disabled, just
> >> that
> >> > we are using the CDCRUpdateLog. The problem started happening even
> >> before
> >> > we enabled CDCR.
> >> >
> >> > In a lower environment we noticed that the transaction logs were huge
> >> > (multiple gigs), so we tried stopping solr and deleting the tlogs then
> >> > restarting, and that seemed to fix the performance issue. We tried the
> >> same
> >> > thing in production the other day but it had no effect, so now I don't
> >> know
> >> > if it was a coincidence or not.
>

Re: Suggestions for debugging performance issue

2018-06-25 Thread Chris Troullis
FYI to all, just as an update, we rebuilt the index in question from
scratch for a second time this weekend and the problem went away on 1 node,
but we were still seeing it on the other node. After restarting the
problematic node, the problem went away. Still makes me a little uneasy as
we weren't able to determine the cause, but at least we are back to normal
query times now.

Chris

On Fri, Jun 15, 2018 at 8:06 AM, Chris Troullis 
wrote:

> Thanks Shawn,
>
> As mentioned previously, we are hard committing every 60 seconds, which we
> have been doing for years, and have had no issues until enabling CDCR. We
> have never seen large tlog sizes before, and even manually issuing a hard
> commit to the collection does not reduce the size of the tlogs. I believe
> this is because when using the CDCRUpdateLog the tlogs are not purged until
> the docs have been replicated over. Anyway, since we manually purged the
> tlogs they seem to now be staying at an acceptable size, so I don't think
> that is the cause. The documents are not abnormally large, maybe ~20
> string/numeric fields with simple whitespace tokenization.
>
> To answer your questions:
>
> -Solr version: 7.2.1
> -What OS vendor and version Solr is running on: CentOS 6
> -Total document count on the server (counting all index cores): 13
> collections totaling ~60 million docs
> -Total index size on the server (counting all cores): ~60GB

Re: Suggestions for debugging performance issue

2018-06-15 Thread Chris Troullis
Thanks Shawn,

As mentioned previously, we are hard committing every 60 seconds, which we
have been doing for years, and have had no issues until enabling CDCR. We
have never seen large tlog sizes before, and even manually issuing a hard
commit to the collection does not reduce the size of the tlogs. I believe
this is because when using the CDCRUpdateLog the tlogs are not purged until
the docs have been replicated over. Anyway, since we manually purged the
tlogs they seem to now be staying at an acceptable size, so I don't think
that is the cause. The documents are not abnormally large, maybe ~20
string/numeric fields with simple whitespace tokenization.
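Since the CDCRUpdateLog retains tlog entries until they have replicated, the tlog directories are worth monitoring directly. A minimal sketch, assuming Solr's default data layout (the path is a placeholder, not the poster's actual installation):

```shell
# Placeholder SOLR_HOME data directory; point this at your real installation.
SOLR_DATA=/var/solr/data

# Report the tlog size for every core, largest first.
du -sh "$SOLR_DATA"/*/data/tlog 2>/dev/null | sort -rh
```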

To answer your questions:

-Solr version: 7.2.1
-What OS vendor and version Solr is running on: CentOS 6
-Total document count on the server (counting all index cores): 13
collections totaling ~60 million docs
-Total index size on the server (counting all cores): ~60GB
-What the total of all Solr heaps on the server is - 16GB heap (we had to
increase for CDCR because it was using a lot more heap).
-Whether there is software other than Solr on the server - No
-How much total memory the server has installed - 64 GB

All of this has been consistent for multiple years across multiple Solr
versions and we have only started seeing this issue once we started using
the CDCRUpdateLog and CDCR, hence why that is the only real thing we can
point to. And again, the issue is only affecting 1 of the 13 collections on
the server, so if it was hardware/heap/GC related then I would think we
would be seeing it for every collection, not just one, as they all share
the same resources.

I will take a look at the GC logs, but I don't think that is the cause. The
consistent nature of the slow performance doesn't really point to GC
issues, and we have profiling set up in New Relic and it does not show any
long/frequent GC pauses.

We are going to try and rebuild the collection from scratch again this
weekend as that has solved the issue in some lower environments, although
it's not really consistent. At this point it's all we can think of to do.

Thanks,

Chris


On Thu, Jun 14, 2018 at 6:23 PM, Shawn Heisey  wrote:

> On 6/12/2018 12:06 PM, Chris Troullis wrote:
> > The issue we are seeing is with 1 collection in particular, after we set
> up
> > CDCR, we are getting extremely slow response times when retrieving
> > documents. Debugging the query shows QTime is almost nothing, but the
> > overall responseTime is like 5x what it should be. The problem is
> > exacerbated by larger result sizes. IE retrieving 25 results is almost
> > normal, but 200 results is way slower than normal. I can run the exact
> same
> > query multiple times in a row (so everything should be cached), and I
> still
> > see response times way higher than another environment that is not using
> > CDCR. It doesn't seem to matter if CDCR is enabled or disabled, just that
> > we are using the CDCRUpdateLog. The problem started happening even before
> > we enabled CDCR.
> >
> > In a lower environment we noticed that the transaction logs were huge
> > (multiple gigs), so we tried stopping solr and deleting the tlogs then
> > restarting, and that seemed to fix the performance issue. We tried the
> same
> > thing in production the other day but it had no effect, so now I don't
> know
> > if it was a coincidence or not.
>
> There is one other cause besides CDCR buffering that I know of for huge
> transaction logs, and it has nothing to do with CDCR:  A lack of hard
> commits.  It is strongly recommended to have autoCommit set to a
> reasonably short interval (about a minute in my opinion, but 15 seconds
> is VERY common).  Most of the time openSearcher should be set to false
> in the autoCommit config, and other mechanisms (which might include
> autoSoftCommit) should be used for change visibility.  The example
> autoCommit settings might seem superfluous because they don't affect
> what's searchable, but it is actually a very important configuration to
> keep.
>
> Are the docs in this collection really big, by chance?
>
> As I went through previous threads you've started on the mailing list, I
> have noticed that none of your messages provided some details that would
> be useful for looking into performance problems:
>
>  * What OS vendor and version Solr is running on.
>  * Total document count on the server (counting all index cores).
>  * Total index size on the server (counting all cores).
>  * What the total of all Solr heaps on the server is.
>  * Whether there is software other than Solr on the server.
>  * How much total memory the server has installed.
>
> If you name the OS, I can use that information to help you gather some
> additional info which will actually show me most of that list.  Total
> document count is something that I cannot get from the info I would help
> you gather.

Re: Suggestions for debugging performance issue

2018-06-14 Thread Shawn Heisey
On 6/12/2018 12:06 PM, Chris Troullis wrote:
> The issue we are seeing is with 1 collection in particular, after we set up
> CDCR, we are getting extremely slow response times when retrieving
> documents. Debugging the query shows QTime is almost nothing, but the
> overall responseTime is like 5x what it should be. The problem is
> exacerbated by larger result sizes. IE retrieving 25 results is almost
> normal, but 200 results is way slower than normal. I can run the exact same
> query multiple times in a row (so everything should be cached), and I still
> see response times way higher than another environment that is not using
> CDCR. It doesn't seem to matter if CDCR is enabled or disabled, just that
> we are using the CDCRUpdateLog. The problem started happening even before
> we enabled CDCR.
>
> In a lower environment we noticed that the transaction logs were huge
> (multiple gigs), so we tried stopping solr and deleting the tlogs then
> restarting, and that seemed to fix the performance issue. We tried the same
> thing in production the other day but it had no effect, so now I don't know
> if it was a coincidence or not.

There is one other cause besides CDCR buffering that I know of for huge
transaction logs, and it has nothing to do with CDCR:  A lack of hard
commits.  It is strongly recommended to have autoCommit set to a
reasonably short interval (about a minute in my opinion, but 15 seconds
is VERY common).  Most of the time openSearcher should be set to false
in the autoCommit config, and other mechanisms (which might include
autoSoftCommit) should be used for change visibility.  The example
autoCommit settings might seem superfluous because they don't affect
what's searchable, but it is actually a very important configuration to
keep.
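Shawn's recommendation corresponds to an updateHandler block along these lines in solrconfig.xml. This is a sketch with illustrative intervals, not anyone's actual config:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit roughly every minute; flushes and rolls the tlog but does
       not open a new searcher, so it does not affect what is searchable. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Visibility of changes comes from soft commits instead. -->
  <autoSoftCommit>
    <maxTime>15000</maxTime>
  </autoSoftCommit>
</updateHandler>
```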

Are the docs in this collection really big, by chance?

As I went through previous threads you've started on the mailing list, I
have noticed that none of your messages provided some details that would
be useful for looking into performance problems:

 * What OS vendor and version Solr is running on.
 * Total document count on the server (counting all index cores).
 * Total index size on the server (counting all cores).
 * What the total of all Solr heaps on the server is.
 * Whether there is software other than Solr on the server.
 * How much total memory the server has installed.

If you name the OS, I can use that information to help you gather some
additional info which will actually show me most of that list.  Total
document count is something that I cannot get from the info I would help
you gather.

Something else that can cause performance issues is GC pauses.  If you
provide a GC log (The script that starts Solr logs this by default), we
can analyze it to see if that's a problem.

Attachments to messages on the mailing list typically do not make it to
the list, so a file sharing website is a better way to share large
logfiles.  A paste website is good for log data that's smaller.

Thanks,
Shawn



Re: Suggestions for debugging performance issue

2018-06-13 Thread Chris Troullis
> > > >> decompression isn't usually an issue, I'd expect you to see very
> > > >> large
> > > >> CPU spikes and/or I/O contention if that was the case.
> > > >>
> > > >> CDCR shouldn't really be that much of a hit, mostly I/O. Solr will
> > > >> have to look in the tlogs to get you the very most recent copy, so
> the
> > > >> first place I'd look is keeping the tlogs under control first.
> > > >>
> > > >> The other possibility (again unrelated to CDCR) is if your spikes are
> > > >> coincident with soft commits or hard-commits-with-opensearcher-true.
> > > >>
> > > >> In all, though, none of the usual suspects seems to make sense here
> > > >> since you say that absent configuring CDCR things seem to run fine.
> So
> > > >> I'd look at the tlogs and my commit intervals. Once the tlogs are
> > > >> under control then move on to other possibilities if the problem
> > > >> persists...
> > > >>
> > > >> Best,
> > > >> Erick
> > > >>
> > > >>
> > > >> On Tue, Jun 12, 2018 at 11:06 AM, Chris Troullis <
> > cptroul...@gmail.com>
> > > >> wrote:
> > > >> > Hi all,
> > > >> >
> > > >> > Recently we have gone live using CDCR on our 2 node solr cloud
> > cluster
> > > >> > (7.2.1). From a CDCR perspective, everything seems to be working
> > > >> > fine...collections are staying in sync across the cluster,
> > everything
> > > >> looks
> > > >> > good.
> > > >> >
> > > >> > The issue we are seeing is with 1 collection in particular, after
> we
> > > set
> > > >> up
> > > >> > CDCR, we are getting extremely slow response times when retrieving
> > > >> > documents. Debugging the query shows QTime is almost nothing, but
> > the
> > > >> > overall responseTime is like 5x what it should be. The problem is
> > > >> > exacerbated by larger result sizes. IE retrieving 25 results is
> > almost
> > > >> > normal, but 200 results is way slower than normal. I can run the
> > exact
> > > >> same
> > > >> > query multiple times in a row (so everything should be cached),
> and
> > I
> > > >> still
> > > >> > see response times way higher than another environment that is not
> > > using
> > > >> > CDCR. It doesn't seem to matter if CDCR is enabled or disabled,
> just
> > > that
> > > >> > we are using the CDCRUpdateLog. The problem started happening even
> > > before
> > > >> > we enabled CDCR.
> > > >> >
> > > >> > In a lower environment we noticed that the transaction logs were
> > huge
> > > >> > (multiple gigs), so we tried stopping solr and deleting the tlogs
> > then
> > > >> > restarting, and that seemed to fix the performance issue. We tried
> > the
> > > >> same
> > > >> > thing in production the other day but it had no effect, so now I
> > don't
> > > >> know
> > > >> > if it was a coincidence or not.
> > > >> >
> > > >> > Things that we have tried:
> > > >> >
> > > >> > -Completely deleting the collection and rebuilding from scratch
> > > >> > -Running the query directly from solr admin to eliminate other
> > causes
> > > >> > -Doing a tcpdump on the solr node to eliminate a network issue
> > > >> >
> > > >> > None of these things have yielded any results. It seems very
> > > >> inconsistent.
> > > >> > Some environments we can reproduce it in, others we can't.
> > > >> > Hardware/configuration/network is exactly the same between all
> > > >> > environments. The only thing that we have narrowed it down to is
> we
> > > are
> > > >> > pretty sure it has something to do with CDCR, as the issue only
> > > started
> > > >> > when we started using it.
> > > >> >
> > > >> > I'm wondering if any of this sparks any ideas from anyone, or if
> > > people
> > > >> > have suggestions as to how I can figure out what is causing this
> > long
> > > >> query
> > > >> > response time? The debug flag on the query seems more geared
> towards
> > > >> seeing
> > > >> > where time is spent in the actual query, which is nothing in my
> > case.
> > > The
> > > >> > time is spent retrieving the results, which I don't have much
> > > information
> > > >> > on. I have tried increasing the log level but nothing jumps out at
> > me
> > > in
> > > >> > the solr logs. Is there something I can look for specifically to
> > help
> > > >> debug
> > > >> > this?
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > Chris
> > > >>
> > >
> >
>


Re: Suggestions for debugging performance issue

2018-06-13 Thread Susheel Kumar
> > >> > Recently we have gone live using CDCR on our 2 node solr cloud
> cluster
> > >> > (7.2.1). From a CDCR perspective, everything seems to be working
> > >> > fine...collections are staying in sync across the cluster,
> everything
> > >> looks
> > >> > good.
> > >> >
> > >> > The issue we are seeing is with 1 collection in particular, after we
> > set
> > >> up
> > >> > CDCR, we are getting extremely slow response times when retrieving
> > >> > documents. Debugging the query shows QTime is almost nothing, but
> the
> > >> > overall responseTime is like 5x what it should be. The problem is
> > >> > exacerbated by larger result sizes. IE retrieving 25 results is
> almost
> > >> > normal, but 200 results is way slower than normal. I can run the
> exact
> > >> same
> > >> > query multiple times in a row (so everything should be cached), and
> I
> > >> still
> > >> > see response times way higher than another environment that is not
> > using
> > >> > CDCR. It doesn't seem to matter if CDCR is enabled or disabled, just
> > that
> > >> > we are using the CDCRUpdateLog. The problem started happening even
> > before
> > >> > we enabled CDCR.
> > >> >
> > >> > In a lower environment we noticed that the transaction logs were
> huge
> > >> > (multiple gigs), so we tried stopping solr and deleting the tlogs
> then
> > >> > restarting, and that seemed to fix the performance issue. We tried
> the
> > >> same
> > >> > thing in production the other day but it had no effect, so now I
> don't
> > >> know
> > >> > if it was a coincidence or not.
> > >> >
> > >> > Things that we have tried:
> > >> >
> > >> > -Completely deleting the collection and rebuilding from scratch
> > >> > -Running the query directly from solr admin to eliminate other
> causes
> > >> > -Doing a tcpdump on the solr node to eliminate a network issue
> > >> >
> > >> > None of these things have yielded any results. It seems very
> > >> inconsistent.
> > >> > Some environments we can reproduce it in, others we can't.
> > >> > Hardware/configuration/network is exactly the same between all
> > >> > environments. The only thing that we have narrowed it down to is we
> > are
> > >> > pretty sure it has something to do with CDCR, as the issue only
> > started
> > >> > when we started using it.
> > >> >
> > >> > I'm wondering if any of this sparks any ideas from anyone, or if
> > people
> > >> > have suggestions as to how I can figure out what is causing this
> long
> > >> query
> > >> > response time? The debug flag on the query seems more geared towards
> > >> seeing
> > >> > where time is spent in the actual query, which is nothing in my
> case.
> > The
> > >> > time is spent retrieving the results, which I don't have much
> > information
> > >> > on. I have tried increasing the log level but nothing jumps out at
> me
> > in
> > >> > the solr logs. Is there something I can look for specifically to
> help
> > >> debug
> > >> > this?
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Chris
> > >>
> >
>


Re: Suggestions for debugging performance issue

2018-06-13 Thread Chris Troullis
> > We don't see any CPU/IO spikes when running these queries, our load is
> > pretty much flat on all accounts.
> >
> > I know it seems odd that CDCR would be the culprit, but it's really the
> > only thing we've changed, and we have other environments running the
> exact
> > same setup with no issues, so it is really making us tear our hair out.
> And
> > when we cleaned up the huge tlogs it didn't seem to make any difference
> in
> > the query time (I was originally thinking it was somehow searching
> through
> > the tlogs for documents, and that's why it was taking so long to retrieve
> > the results, but I don't know if that is actually how it works).
> >
> > Are you aware of any logger settings we could increase to potentially
> get a
> > better idea of where the time is being spent? I took the eventual query
> > response and just hosted it as a static file on the same machine via nginx
> > and
> > it downloaded lightning fast (I was trying to rule out network as the
> > culprit), so it seems like the time is being spent somewhere in solr.
> >
> > Thanks,
> > Chris
> >
> > On Tue, Jun 12, 2018 at 2:45 PM, Erick Erickson  >
> > wrote:
> >
> >> Having the tlogs be huge is a red flag. Do you have buffering enabled
> >> in CDCR? This was something of a legacy option that's going to be
> >> removed, it's been made obsolete by the ability of CDCR to bootstrap
> >> the entire index. Buffering should be disabled always.
> >>
> >> Another reason tlogs can grow is if you have very long times between
> >> hard commits. I doubt that's your issue, but just in case.
> >>
> >> And the final reason tlogs can grow is that the connection between
> >> source and target clusters is broken, but that doesn't sound like what
> >> you're seeing either since you say the target cluster is keeping up.
> >>
> >> The process of assembling the response can be long. If you have any
> >> stored fields (and not docValues-enabled), Solr will
> >> 1> seek the stored data on disk
> >> 2> decompress (min 16K blocks)
> >> 3> transmit the thing back to your client
> >>
> >> The decompressed version of the doc will be held in the
> >> documentResultCache configured in solrconfig.xml, so it may or may not
> >> be cached in memory. That said, this stuff is all MemMapped and the
> >> decompression isn't usually an issue, I'd expect you to see very large
> >> CPU spikes and/or I/O contention if that was the case.
> >>
> >> CDCR shouldn't really be that much of a hit, mostly I/O. Solr will
> >> have to look in the tlogs to get you the very most recent copy, so the
> >> first place I'd look is keeping the tlogs under control first.
> >>
> >> The other possibility (again unrelated to CDCR) is if your spikes are
> >> coincident with soft commits or hard-commits-with-opensearcher-true.
> >>
> >> In all, though, none of the usual suspects seems to make sense here
> >> since you say that absent configuring CDCR things seem to run fine. So
> >> I'd look at the tlogs and my commit intervals. Once the tlogs are
> >> under control then move on to other possibilities if the problem
> >> persists...
> >>
> >> Best,
> >> Erick
> >>
> >>
> >> On Tue, Jun 12, 2018 at 11:06 AM, Chris Troullis 
> >> wrote:
> >> > Hi all,
> >> >
> >> > Recently we have gone live using CDCR on our 2 node solr cloud cluster
> >> > (7.2.1). From a CDCR perspective, everything seems to be working
> >> > fine...collections are staying in sync across the cluster, everything
> >> looks
> >> > good.
> >> >
> >> > The issue we are seeing is with 1 collection in particular, after we
> set
> >> up
> >> > CDCR, we are getting extremely slow response times when retrieving
> >> > documents. Debugging the query shows QTime is almost nothing, but the
> >> > overall responseTime is like 5x what it should be. The problem is
> >> > exacerbated by larger result sizes. IE retrieving 25 results is almost
> >> > normal, but 200 results is way slower than normal. I can run the exact
> >> same
> >> > query multiple times in a row (so everything should be cached), and I
> >> still
> >> > see response times way higher than another environment that is not
> using
> >> > CDCR. It doesn't seem to matter if CDCR is enabled or disabled, just
> that
> >> > we are using the CDCR

Re: Suggestions for debugging performance issue

2018-06-13 Thread Erick Erickson
>> decompression isn't usually an issue, I'd expect you to see very large
>> CPU spikes and/or I/O contention if that was the case.
>>
>> CDCR shouldn't really be that much of a hit, mostly I/O. Solr will
>> have to look in the tlogs to get you the very most recent copy, so the
>> first place I'd look is keeping the tlogs under control first.
>>
>> The other possibility (again unrelated to CDCR) is if your spikes are
>> coincident with soft commits or hard-commits-with-opensearcher-true.
>>
>> In all, though, none of the usual suspects seems to make sense here
>> since you say that absent configuring CDCR things seem to run fine. So
>> I'd look at the tlogs and my commit intervals. Once the tlogs are
>> under control then move on to other possibilities if the problem
>> persists...
>>
>> Best,
>> Erick
>>
>>
>> On Tue, Jun 12, 2018 at 11:06 AM, Chris Troullis 
>> wrote:
>> > Hi all,
>> >
>> > Recently we have gone live using CDCR on our 2 node solr cloud cluster
>> > (7.2.1). From a CDCR perspective, everything seems to be working
>> > fine...collections are staying in sync across the cluster, everything
>> looks
>> > good.
>> >
>> > The issue we are seeing is with 1 collection in particular, after we set
>> up
>> > CDCR, we are getting extremely slow response times when retrieving
>> > documents. Debugging the query shows QTime is almost nothing, but the
>> > overall responseTime is like 5x what it should be. The problem is
>> > exacerbated by larger result sizes. IE retrieving 25 results is almost
>> > normal, but 200 results is way slower than normal. I can run the exact
>> same
>> > query multiple times in a row (so everything should be cached), and I
>> still
>> > see response times way higher than another environment that is not using
>> > CDCR. It doesn't seem to matter if CDCR is enabled or disabled, just that
>> > we are using the CDCRUpdateLog. The problem started happening even before
>> > we enabled CDCR.
>> >
>> > In a lower environment we noticed that the transaction logs were huge
>> > (multiple gigs), so we tried stopping solr and deleting the tlogs then
>> > restarting, and that seemed to fix the performance issue. We tried the
>> same
>> > thing in production the other day but it had no effect, so now I don't
>> know
>> > if it was a coincidence or not.
>> >
>> > Things that we have tried:
>> >
>> > -Completely deleting the collection and rebuilding from scratch
>> > -Running the query directly from solr admin to eliminate other causes
>> > -Doing a tcpdump on the solr node to eliminate a network issue
>> >
>> > None of these things have yielded any results. It seems very
>> inconsistent.
>> > Some environments we can reproduce it in, others we can't.
>> > Hardware/configuration/network is exactly the same between all
>> > environments. The only thing that we have narrowed it down to is we are
>> > pretty sure it has something to do with CDCR, as the issue only started
>> > when we started using it.
>> >
>> > I'm wondering if any of this sparks any ideas from anyone, or if people
>> > have suggestions as to how I can figure out what is causing this long
>> query
>> > response time? The debug flag on the query seems more geared towards
>> seeing
>> > where time is spent in the actual query, which is nothing in my case. The
>> > time is spent retrieving the results, which I don't have much information
>> > on. I have tried increasing the log level but nothing jumps out at me in
>> > the solr logs. Is there something I can look for specifically to help
>> debug
>> > this?
>> >
>> > Thanks,
>> >
>> > Chris
>>


Re: Suggestions for debugging performance issue

2018-06-13 Thread Chris Troullis
Thanks Erick. A little more info:

-We do have buffering disabled everywhere, as I had read multiple posts on
the mailing list regarding the issue you described.
-We soft commit (with opensearcher=true) pretty frequently (15 seconds) as
we have some NRT requirements. We hard commit every 60 seconds. We never
commit manually, only via the autocommit timers. We have been using these
settings for a long time and have never had any issues until recently. And
all of our other indexes are fine (some larger than this one).
-We do have documentResultCache enabled, although it's not very big. But I
can literally spam the same query over and over again with no other queries
hitting the box, so all the results should be cached.
-We don't see any CPU/IO spikes when running these queries, our load is
pretty much flat on all accounts.

I know it seems odd that CDCR would be the culprit, but it's really the
only thing we've changed, and we have other environments running the exact
same setup with no issues, so it is really making us tear our hair out. And
when we cleaned up the huge tlogs it didn't seem to make any difference in
the query time (I was originally thinking it was somehow searching through
the tlogs for documents, and that's why it was taking so long to retrieve
the results, but I don't know if that is actually how it works).

Are you aware of any logger settings we could increase to potentially get a
better idea of where the time is being spent? I took the eventual query
response and just hosted it as a static file on the same machine via nginx and
it downloaded lightning fast (I was trying to rule out network as the
culprit), so it seems like the time is being spent somewhere in solr.
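The gap Chris describes, a tiny QTime next to a large wall-clock time, can be quantified directly, since QTime only covers query execution inside Solr and excludes fetching and serializing the stored fields. A minimal sketch (the response body here is canned for illustration):

```python
import json

def retrieval_overhead_ms(wall_ms: float, response_body: str) -> float:
    """Wall-clock time minus the QTime Solr reports in the responseHeader.

    QTime measures query execution inside Solr; time spent reading stored
    fields, decompressing them, and writing the response falls outside it.
    """
    qtime = json.loads(response_body)["responseHeader"]["QTime"]
    return wall_ms - qtime

# Canned response: Solr says the query itself took 3 ms...
body = '{"responseHeader":{"status":0,"QTime":3},"response":{"numFound":200,"docs":[]}}'

# ...but if the round trip took 1500 ms, ~1497 ms went to assembling and
# transferring the result, which matches the symptom described in this thread.
print(retrieval_overhead_ms(1500.0, body))  # 1497.0
```

Tracking this difference separately for rows=25 versus rows=200 would confirm whether the overhead scales with result size, as reported above.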

Thanks,
Chris

On Tue, Jun 12, 2018 at 2:45 PM, Erick Erickson 
wrote:

> Having the tlogs be huge is a red flag. Do you have buffering enabled
> in CDCR? This was something of a legacy option that's going to be
> removed, it's been made obsolete by the ability of CDCR to bootstrap
> the entire index. Buffering should be disabled always.
>
> Another reason tlogs can grow is if you have very long times between
> hard commits. I doubt that's your issue, but just in case.
>
> And the final reason tlogs can grow is that the connection between
> source and target clusters is broken, but that doesn't sound like what
> you're seeing either since you say the target cluster is keeping up.
>
> The process of assembling the response can be long. If you have any
> stored fields (and not docValues-enabled), Solr will
> 1> seek the stored data on disk
> 2> decompress (min 16K blocks)
> 3> transmit the thing back to your client
>
> The decompressed version of the doc will be held in the
> documentResultCache configured in solrconfig.xml, so it may or may not
> be cached in memory. That said, this stuff is all MemMapped and the
> decompression isn't usually an issue, I'd expect you to see very large
> CPU spikes and/or I/O contention if that was the case.
>
> CDCR shouldn't really be that much of a hit, mostly I/O. Solr will
> have to look in the tlogs to get you the very most recent copy, so the
> first place I'd look is keeping the tlogs under control first.
>
> The other possibility (again unrelated to CDCR) is if your spikes are
> coincident with soft commits or hard-commits-with-opensearcher-true.
>
> In all, though, none of the usual suspects seems to make sense here
> since you say that absent configuring CDCR things seem to run fine. So
> I'd look at the tlogs and my commit intervals. Once the tlogs are
> under control then move on to other possibilities if the problem
> persists...
>
> Best,
> Erick
>
>
> On Tue, Jun 12, 2018 at 11:06 AM, Chris Troullis 
> wrote:
> > Hi all,
> >
> > Recently we have gone live using CDCR on our 2 node solr cloud cluster
> > (7.2.1). From a CDCR perspective, everything seems to be working
> > fine...collections are staying in sync across the cluster, everything
> looks
> > good.
> >
> > The issue we are seeing is with 1 collection in particular, after we set
> up
> > CDCR, we are getting extremely slow response times when retrieving
> > documents. Debugging the query shows QTime is almost nothing, but the
> > overall responseTime is like 5x what it should be. The problem is
> > exacerbated by larger result sizes. IE retrieving 25 results is almost
> > normal, but 200 results is way slower than normal. I can run the exact
> same
> > query multiple times in a row (so everything should be cached), and I
> still
> > see response times way higher than another environment that is not using
> > CDCR. It doesn't seem to matter if CDCR is enabled or disabled, just that
> > we are using the CDCRUpdateLog. The problem started happening even before
> > we enabled CDCR.

Re: Suggestions for debugging performance issue

2018-06-12 Thread Erick Erickson
Having the tlogs be huge is a red flag. Do you have buffering enabled
in CDCR? This was something of a legacy option that's going to be
removed, it's been made obsolete by the ability of CDCR to bootstrap
the entire index. Buffering should be disabled always.
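Buffering can be checked and toggled at runtime through the CDCR API; a sketch, with host and collection name as placeholders:

```shell
# Disable update-log buffering on the source cluster (collection name and
# host are placeholders).
curl 'http://localhost:8983/solr/mycollection/cdcr?action=DISABLEBUFFER'

# Confirm: the status section should report the buffer as "disabled".
curl 'http://localhost:8983/solr/mycollection/cdcr?action=STATUS'
```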

Another reason tlogs can grow is if you have very long times between
hard commits. I doubt that's your issue, but just in case.

And the final reason tlogs can grow is that the connection between
source and target clusters is broken, but that doesn't sound like what
you're seeing either since you say the target cluster is keeping up.

The process of assembling the response can be long. If you have any
stored fields (and not docValues-enabled), Solr will
1> seek the stored data on disk
2> decompress (min 16K blocks)
3> transmit the thing back to your client

The decompressed version of the doc will be held in the
documentResultCache configured in solrconfig.xml, so it may or may not
be cached in memory. That said, this stuff is all MemMapped and the
decompression isn't usually an issue, I'd expect you to see very large
CPU spikes and/or I/O contention if that was the case.
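The cache referred to here is configured as `<documentCache>` in solrconfig.xml. A sketch with illustrative sizes (note that this cache is not autowarmed, since internal document IDs change between searchers):

```xml
<documentCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>
```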

CDCR shouldn't really be that much of a hit, mostly I/O. Solr will
have to look in the tlogs to get you the very most recent copy, so the
first place I'd look is keeping the tlogs under control first.

The other possibility (again unrelated to CDCR) is if your spikes are
coincident with soft commits or hard-commits-with-opensearcher-true.

In all, though, none of the usual suspects seems to make sense here
since you say that absent configuring CDCR things seem to run fine. So
I'd look at the tlogs and my commit intervals. Once the tlogs are
under control then move on to other possibilities if the problem
persists...

Best,
Erick


On Tue, Jun 12, 2018 at 11:06 AM, Chris Troullis  wrote:
> Hi all,
>
> Recently we have gone live using CDCR on our 2 node solr cloud cluster
> (7.2.1). From a CDCR perspective, everything seems to be working
> fine...collections are staying in sync across the cluster, everything looks
> good.
>
> The issue we are seeing is with 1 collection in particular: after we set up
> CDCR, we are getting extremely slow response times when retrieving
> documents. Debugging the query shows QTime is almost nothing, but the
> overall responseTime is like 5x what it should be. The problem is
> exacerbated by larger result sizes, i.e. retrieving 25 results is almost
> normal, but 200 results is way slower than normal. I can run the exact same
> query multiple times in a row (so everything should be cached), and I still
> see response times way higher than another environment that is not using
> CDCR. It doesn't seem to matter if CDCR is enabled or disabled, just that
> we are using the CDCRUpdateLog. The problem started happening even before
> we enabled CDCR.
>
> In a lower environment we noticed that the transaction logs were huge
> (multiple gigs), so we tried stopping solr and deleting the tlogs then
> restarting, and that seemed to fix the performance issue. We tried the same
> thing in production the other day but it had no effect, so now I don't know
> if it was a coincidence or not.
>
> Things that we have tried:
>
> -Completely deleting the collection and rebuilding from scratch
> -Running the query directly from solr admin to eliminate other causes
> -Doing a tcpdump on the solr node to eliminate a network issue
>
> None of these things has yielded any results. It seems very inconsistent:
> some environments we can reproduce it in, others we can't.
> Hardware/configuration/network is exactly the same between all
> environments. The only thing we have narrowed it down to is that we are
> pretty sure it has something to do with CDCR, as the issue only started
> when we started using it.
>
> I'm wondering if any of this sparks any ideas from anyone, or if people
> have suggestions as to how I can figure out what is causing this long query
> response time? The debug flag on the query seems more geared towards seeing
> where time is spent in the actual query, which is nothing in my case. The
> time is spent retrieving the results, which I don't have much information
> on. I have tried increasing the log level but nothing jumps out at me in
> the solr logs. Is there something I can look for specifically to help debug
> this?
>
> Thanks,
>
> Chris


Suggestions for debugging performance issue

2018-06-12 Thread Chris Troullis
Hi all,

Recently we have gone live using CDCR on our 2-node SolrCloud cluster
(7.2.1). From a CDCR perspective, everything seems to be working
fine: collections are staying in sync across the cluster, everything looks
good.

The issue we are seeing is with 1 collection in particular: after we set up
CDCR, we are getting extremely slow response times when retrieving
documents. Debugging the query shows QTime is almost nothing, but the
overall responseTime is like 5x what it should be. The problem is
exacerbated by larger result sizes, i.e. retrieving 25 results is almost
normal, but 200 results is way slower than normal. I can run the exact same
query multiple times in a row (so everything should be cached), and I still
see response times way higher than another environment that is not using
CDCR. It doesn't seem to matter if CDCR is enabled or disabled, just that
we are using the CDCRUpdateLog. The problem started happening even before
we enabled CDCR.

In a lower environment we noticed that the transaction logs were huge
(multiple gigs), so we tried stopping solr and deleting the tlogs then
restarting, and that seemed to fix the performance issue. We tried the same
thing in production the other day but it had no effect, so now I don't know
if it was a coincidence or not.

Things that we have tried:

-Completely deleting the collection and rebuilding from scratch
-Running the query directly from solr admin to eliminate other causes
-Doing a tcpdump on the solr node to eliminate a network issue

None of these things has yielded any results. It seems very inconsistent:
some environments we can reproduce it in, others we can't.
Hardware/configuration/network is exactly the same between all
environments. The only thing we have narrowed it down to is that we are
pretty sure it has something to do with CDCR, as the issue only started
when we started using it.

I'm wondering if any of this sparks any ideas from anyone, or if people
have suggestions as to how I can figure out what is causing this long query
response time? The debug flag on the query seems more geared towards seeing
where time is spent in the actual query, which is nothing in my case. The
time is spent retrieving the results, which I don't have much information
on. I have tried increasing the log level but nothing jumps out at me in
the solr logs. Is there something I can look for specifically to help debug
this?

Thanks,

Chris


Re: Solr performance issue

2018-02-15 Thread Shawn Heisey
On 2/15/2018 2:00 AM, Srinivas Kashyap wrote:
> I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the
> child entities in data-config.xml, and I'm using it for full-import
> only. At the beginning of my implementation, I had written a delta-import
> query to index the modified changes. But my requirement grew, and I now have 17
> child entities for a single parent entity. When doing delta-import for
> huge data, the number of requests made to the datasource (database) grew,
> and CPU utilization hit 100% when concurrent users started modifying the
> data. So instead of calling delta-import, which imports based on last
> index time, I did a full-import ('SortedMapBackedCache') based on last index
> time.
>
> Though the parent entity query would return only records that are modified,
> the child entity queries pull all the data from the database, and the indexing
> happens 'in-memory', which is causing the JVM to run out of memory.

Can you provide your DIH config file (with passwords redacted) and the
precise URL you are using to initiate dataimport?  Also, I would like to
know what field you have defined as your uniqueKey.  I may have more
questions about the data in your system, depending on what I see.

That cache implementation should only cache entries from the database
that are actually requested.  If your query is correctly defined, it
should not pull all records from the DB table.

> Is there a way, in full-import mode, to specify that the child entity query
> should pull only the records related to the parent entity?

If I am understanding your question correctly, this is one of the fairly
basic things that DIH does.  Look at this config example in the
reference guide:

https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#configuring-the-dih-configuration-file

In the entity named feature in that example config, the query string
uses ${item.ID} to reference the ID column from the parent entity, which
is item.

I should warn you that a cached entity does not always improve
performance.  This is particularly true if the lookup into the cache is
the information that goes to your uniqueKey field.  When the lookup is
by uniqueKey, every single row requested from the database will be used
exactly once, so there's not really any point to caching it.
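For reference, a minimal sketch of the pattern described above, with hypothetical table and column names: the child entity is cached with SortedMapBackedCache and looked up by the parent's ID rather than being re-queried for every parent row:

```xml
<entity name="parent"
        query="SELECT ID, NAME FROM PARENT
               WHERE LAST_MODIFIED &gt; '${dataimporter.last_index_time}'">
  <!-- child rows are cached keyed on PARENT_ID and looked up via the
       parent's ID column instead of issuing one SQL query per parent -->
  <entity name="child"
          processor="SqlEntityProcessor"
          cacheImpl="SortedMapBackedCache"
          cacheKey="PARENT_ID"
          cacheLookup="parent.ID"
          query="SELECT PARENT_ID, DETAIL FROM CHILD"/>
</entity>
```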

Thanks,
Shawn



Re: Solr performance issue

2018-02-15 Thread Erick Erickson
Srinivas:

Not an answer to your question, but when DIH starts getting this
complicated, I start to seriously think about SolrJ, see:
https://lucidworks.com/2012/02/14/indexing-with-solrj/

In particular, it moves the heavy lifting of acquiring the data from a
Solr node (which I'm assuming also has to index docs) to "some
client". It also lets you play some tricks with the code to make
things faster.
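The client-side pattern Erick describes can be sketched as below. The sketch is stdlib-only; the SolrJ calls are shown only as comments (assumed, not verified against any particular SolrJ version). The point is the shape: acquire rows in a separate client process, batch them, and send each batch to Solr in one request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Minimal sketch of client-side batched indexing. In a real SolrJ client,
// each row would become a SolrInputDocument and each full batch would be
// sent with client.add(collection, batch).
public class BatchIndexer {
    static final int BATCH_SIZE = 1000;

    // Walks the rows, flushing every BATCH_SIZE documents; returns the
    // number of batches that would have been sent to Solr.
    static int indexAll(List<Map<String, Object>> rows) {
        List<Map<String, Object>> batch = new ArrayList<>();
        int batchesSent = 0;
        for (Map<String, Object> row : rows) {
            batch.add(row); // SolrJ: batch.add(toSolrInputDocument(row))
            if (batch.size() >= BATCH_SIZE) {
                // SolrJ: client.add("collection", batch);
                batch.clear();
                batchesSent++;
            }
        }
        if (!batch.isEmpty()) {
            // flush the remainder; SolrJ: client.add(...) then client.commit(...)
            batchesSent++;
        }
        return batchesSent;
    }
}
```

Sending batches (rather than one add per document) is the main trick that tends to make such a client faster than running DIH on the Solr node itself.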

Best,
Erick

On Thu, Feb 15, 2018 at 1:00 AM, Srinivas Kashyap
 wrote:
> Hi,
>
> I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the
> child entities in data-config.xml, and I'm using it for full-import
> only. At the beginning of my implementation, I had written a delta-import
> query to index the modified changes. But my requirement grew, and I now have 17
> child entities for a single parent entity. When doing delta-import for
> huge data, the number of requests made to the datasource (database) grew,
> and CPU utilization hit 100% when concurrent users started modifying the
> data. So instead of calling delta-import, which imports based on last
> index time, I did a full-import ('SortedMapBackedCache') based on last index
> time.
>
> Though the parent entity query would return only records that are modified,
> the child entity queries pull all the data from the database, and the indexing
> happens 'in-memory', which is causing the JVM to run out of memory.
>
> Is there a way, in full-import mode, to specify that the child entity query
> should pull only the records related to the parent entity?
>
> Thanks and Regards,
> Srinivas Kashyap
>
> DISCLAIMER:
> E-mails and attachments from TradeStone Software, Inc. are confidential.
> If you are not the intended recipient, please notify the sender immediately by
> replying to the e-mail, and then delete it without making copies or using it
> in any way. No representation is made that this email or any attachments are
> free of viruses. Virus scanning is recommended and is the responsibility of
> the recipient.


Solr performance issue

2018-02-15 Thread Srinivas Kashyap
Hi,

I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the
child entities in data-config.xml, and I'm using it for full-import only.
At the beginning of my implementation, I had written a delta-import query to
index the modified changes. But my requirement grew, and I now have 17 child
entities for a single parent entity. When doing delta-import for huge data,
the number of requests made to the datasource (database) grew, and CPU
utilization hit 100% when concurrent users started modifying the data. So
instead of calling delta-import, which imports based on last index time, I did
a full-import ('SortedMapBackedCache') based on last index time.

Though the parent entity query would return only records that are modified, the
child entity queries pull all the data from the database, and the indexing
happens 'in-memory', which is causing the JVM to run out of memory.

Is there a way, in full-import mode, to specify that the child entity query
should pull only the records related to the parent entity?

Thanks and Regards,
Srinivas Kashyap


Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-30 Thread sasarun
Hi Erick, 

As suggested, I did try a non-HDFS Solr cloud instance and its response looks
much better. On the configuration side too, I am mostly using default
configurations, with block.cache.direct.memory.allocation as false. On
analysis of the HDFS cache, evictions seem to be on the higher side.

Thanks, 
Arun



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-27 Thread Emir Arnautović
Hi Arun,
It is hard to measure something without affecting it, but we could use debug 
results and combine them with QTime without debug: if we ignore merging results, it 
seems that the majority of time is spent retrieving docs (~500 ms). You should 
consider reducing the number of rows if you want better response time (you can ask 
for rows=0 to see the best possible time). Also, as Erick suggested, reducing the number 
of shards (1 if you don't plan on many more docs) will trim some overhead of merging 
results.

Thanks,
Emir

I noticed that you removed bq - is time with bq acceptable as well?
> On 27 Sep 2017, at 12:34, sasarun  wrote:
> 
> Hi Emir, 
> 
> Please find the response without the bq parameter and debugQuery set to true. 
> Also, it was noted that QTime comes down drastically, to about 700-800,
> without the debug parameter. 
> 
> 
> true
> 0
> 3446
> 
> 
> ("hybrid electric powerplant" "hybrid electric powerplants" "Electric"
> "Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid
> Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid"
> "hybrid electric" "electric powerplant")
> 
> edismax
> on
> 
> host
> title
> url
> customContent
> contentSpecificSearch
> 
> 
> id
> contentOntologyTagsCount
> 
> 0
> OR
> 3985d7e2-3e54-48d8-8336-229e85f5d9de
> 600
> true
> 
> 
>  maxScore="56.74194">...
> 
> 
> 
> solr-prd-cluster-m-GooglePatent_shard4_replica2-1506504238282-20
> 
> 
> 
> 35
> 159
> GET_TOP_IDS
> 41294
> ...
> 
> 
> 29
> 165
> GET_TOP_IDS
> 40980
> ...
> 
> 
> 31
> 200
> GET_TOP_IDS
> 41006
> ...
> 
> 
> 43
> 208
> GET_TOP_IDS
> 41040
> ...
> 
> 
> 181
> 466
> GET_TOP_IDS
> 41138
> ...
> 
> 
> 
> 
> 1518
> 1523
> GET_FIELDS,GET_DEBUG
> 110
> ...
> 
> 
> 1562
> 1573
> GET_FIELDS,GET_DEBUG
> 115
> ...
> 
> 
> 1793
> 1800
> GET_FIELDS,GET_DEBUG
> 120
> ...
> 
> 
> 2153
> 2161
> GET_FIELDS,GET_DEBUG
> 125
> ...
> 
> 
> 2957
> 2970
> GET_FIELDS,GET_DEBUG
> 130
> ...
> 
> 
> 
> 
> 10302.0
> 
> 2.0
> 
> 2.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 
> 10288.0
> 
> 661.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 9627.0
> 
> 
> 
> 
> ("hybrid electric powerplant" "hybrid electric powerplants" "Electric"
> "Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid
> Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid"
> "hybrid electric" "electric powerplant")
> 
> 
> ("hybrid electric powerplant" "hybrid electric powerplants" "Electric"
> "Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid
> Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid"
> "hybrid electric" "electric powerplant")
> 
> 
> (+(DisjunctionMaxQuery((host:hybrid electric powerplant |
> contentSpecificSearch:"hybrid electric powerplant" | customContent:"hybrid
> electric powerplant" | title:hybrid electric powerplant | url:hybrid
> electric powerplant)) DisjunctionMaxQuery((host:hybrid electric powerplants
> | contentSpecificSearch:"hybrid electric powerplants" |
> customContent:"hybrid electric powerplants" | title:hybrid electric
> powerplants | url:hybrid electric powerplants))
> DisjunctionMaxQuery((host:Electric | contentSpecificSearch:electric |
> customContent:electric | title:Electric | url:Electric))
> DisjunctionMaxQuery((host:Electrical | contentSpecificSearch:electrical |
> customContent:electrical | title:Electrical | url:Electrical))
> DisjunctionMaxQuery((host:Electricity | contentSpecificSearch:electricity |
> customContent:electricity | title:Electricity | url:Electricity))
> DisjunctionMaxQuery((host:Engine | contentSpecificSearch:engine |
> customContent:engine | title:Engine | url:Engine))
> DisjunctionMaxQuery((host:fuel economy | contentSpecificSearch:"fuel
> economy" | customContent:"fuel economy" | title:fuel economy | url:fuel
> economy)) DisjunctionMaxQuery((host:fuel efficiency |
> contentSpecificSearch:"fuel efficiency" | customContent:"fuel efficiency" |
> title:fuel efficiency | url:fuel efficiency))
> DisjunctionMaxQuery((host:Hybrid Electric Propulsion |
> contentSpecificSearch:"hybrid electric propulsion" | customContent:"hybrid
> electric propulsion" | title:Hybrid Electric Propulsion | url:Hybrid
> Electric Propulsion)) DisjunctionMaxQuery((host:Power Systems |
> contentSpecificSearch:"power systems" | customContent:"power systems" |
> title:Power Systems | url:Power Systems))
> DisjunctionMaxQuery((host:Powerplant | contentSpecificSearch:powerplant |
> customContent:powerplant | title:Powerplant | url:Powerplant))
> DisjunctionMaxQuery((host:Propulsion | contentSpecificSearch:propulsion |
> customContent:propulsion | title:Propulsion | url:Propulsion))
> DisjunctionMaxQuery((host:hybrid | contentSpecificSearch:hybrid |
> customContent:hybrid | title:hybrid | url:hybrid))
> DisjunctionMaxQuery((host:hybrid electric | contentSpecificSearch:"hybrid
> electric" | customContent:"hybrid 

Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-27 Thread sasarun
Hi Emir, 

Please find the response without the bq parameter and debugQuery set to true. 
Also, it was noted that QTime comes down drastically, to about 700-800,
without the debug parameter. 


true
0
3446


("hybrid electric powerplant" "hybrid electric powerplants" "Electric"
"Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid
Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid"
"hybrid electric" "electric powerplant")

edismax
on

host
title
url
customContent
contentSpecificSearch


id
contentOntologyTagsCount

0
OR
3985d7e2-3e54-48d8-8336-229e85f5d9de
600
true


...



solr-prd-cluster-m-GooglePatent_shard4_replica2-1506504238282-20



35
159
GET_TOP_IDS
41294
...


29
165
GET_TOP_IDS
40980
...


31
200
GET_TOP_IDS
41006
...


43
208
GET_TOP_IDS
41040
...


181
466
GET_TOP_IDS
41138
...




1518
1523
GET_FIELDS,GET_DEBUG
110
...


1562
1573
GET_FIELDS,GET_DEBUG
115
...


1793
1800
GET_FIELDS,GET_DEBUG
120
...


2153
2161
GET_FIELDS,GET_DEBUG
125
...


2957
2970
GET_FIELDS,GET_DEBUG
130
...




10302.0

2.0

2.0


0.0


0.0


0.0


0.0


0.0


0.0


0.0


0.0



10288.0

661.0


0.0


0.0


0.0


0.0


0.0


0.0


0.0


9627.0




("hybrid electric powerplant" "hybrid electric powerplants" "Electric"
"Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid
Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid"
"hybrid electric" "electric powerplant")


("hybrid electric powerplant" "hybrid electric powerplants" "Electric"
"Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid
Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid"
"hybrid electric" "electric powerplant")


(+(DisjunctionMaxQuery((host:hybrid electric powerplant |
contentSpecificSearch:"hybrid electric powerplant" | customContent:"hybrid
electric powerplant" | title:hybrid electric powerplant | url:hybrid
electric powerplant)) DisjunctionMaxQuery((host:hybrid electric powerplants
| contentSpecificSearch:"hybrid electric powerplants" |
customContent:"hybrid electric powerplants" | title:hybrid electric
powerplants | url:hybrid electric powerplants))
DisjunctionMaxQuery((host:Electric | contentSpecificSearch:electric |
customContent:electric | title:Electric | url:Electric))
DisjunctionMaxQuery((host:Electrical | contentSpecificSearch:electrical |
customContent:electrical | title:Electrical | url:Electrical))
DisjunctionMaxQuery((host:Electricity | contentSpecificSearch:electricity |
customContent:electricity | title:Electricity | url:Electricity))
DisjunctionMaxQuery((host:Engine | contentSpecificSearch:engine |
customContent:engine | title:Engine | url:Engine))
DisjunctionMaxQuery((host:fuel economy | contentSpecificSearch:"fuel
economy" | customContent:"fuel economy" | title:fuel economy | url:fuel
economy)) DisjunctionMaxQuery((host:fuel efficiency |
contentSpecificSearch:"fuel efficiency" | customContent:"fuel efficiency" |
title:fuel efficiency | url:fuel efficiency))
DisjunctionMaxQuery((host:Hybrid Electric Propulsion |
contentSpecificSearch:"hybrid electric propulsion" | customContent:"hybrid
electric propulsion" | title:Hybrid Electric Propulsion | url:Hybrid
Electric Propulsion)) DisjunctionMaxQuery((host:Power Systems |
contentSpecificSearch:"power systems" | customContent:"power systems" |
title:Power Systems | url:Power Systems))
DisjunctionMaxQuery((host:Powerplant | contentSpecificSearch:powerplant |
customContent:powerplant | title:Powerplant | url:Powerplant))
DisjunctionMaxQuery((host:Propulsion | contentSpecificSearch:propulsion |
customContent:propulsion | title:Propulsion | url:Propulsion))
DisjunctionMaxQuery((host:hybrid | contentSpecificSearch:hybrid |
customContent:hybrid | title:hybrid | url:hybrid))
DisjunctionMaxQuery((host:hybrid electric | contentSpecificSearch:"hybrid
electric" | customContent:"hybrid electric" | title:hybrid electric |
url:hybrid electric)) DisjunctionMaxQuery((host:electric powerplant |
contentSpecificSearch:"electric powerplant" | customContent:"electric
powerplant" | title:electric powerplant | url:electric
powerplant/no_coord


+((host:hybrid electric powerplant | contentSpecificSearch:"hybrid electric
powerplant" | customContent:"hybrid electric powerplant" | title:hybrid
electric powerplant | url:hybrid electric powerplant) (host:hybrid electric
powerplants | contentSpecificSearch:"hybrid electric powerplants" |
customContent:"hybrid electric powerplants" | title:hybrid electric
powerplants | url:hybrid electric powerplants) (host:Electric |
contentSpecificSearch:electric | customContent:electric | title:Electric |
url:Electric) (host:Electrical | contentSpecificSearch:electrical |
customContent:electrical | title:Electrical | url:Electrical)
(host:Electricity | contentSpecificSearch:electricity |
customContent:electricity | title:Electricity | url:Electricity)
(host:Engine | contentSpecificSearch:engine | customContent:engine |
title:Engine | url:Engine) (host:fuel 

Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-27 Thread sasarun
Hi Erick, 

QTime comes down with rows set to 1. It was also noted that QTime comes down
to about 900 when the debug parameter is not added to the query.

Thanks, 
Arun 





Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-27 Thread Toke Eskildsen
On Tue, 2017-09-26 at 07:43 -0700, sasarun wrote:
> Allocated heap size for the young generation is about 8 GB and the old 
> generation is about 24 GB. And GC analysis showed peak
> size utilisation is really low compared to these values.

That does not come as a surprise. Your collections would normally be
considered small, if not tiny, looking only at their size measured in
bytes. Again, if you expect them to grow significantly (more than 10x),
your allocation might make sense. If you do not expect such a growth in
the near future, you will be better off with a much smaller heap: The
peak heap utilization that you have logged (or twice that to err on the
cautious side) seems a good starting point.

And whatever you do, don't set Xmx to 32GB. Use <31GB or significantly
more than 32GB:
https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
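In Solr terms, the heap is usually set in solr.in.sh; a sketch that stays under the compressed-oops threshold (the 30g value is purely illustrative — size it from your measured peak utilisation):

```sh
# solr.in.sh (sketch)
# Stay below ~31GB so the JVM keeps compressed ordinary object pointers;
# a 32GB heap effectively holds less than a ~31GB one.
SOLR_HEAP="30g"
# equivalently: SOLR_JAVA_MEM="-Xms30g -Xmx30g"
```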


Are you indexing while you search? If so, you need to set auto-warm or
state a few explicit warmup-queries. If not, your measuring will not be
representative as it will be on first-searches, which are always slower
than warmed-searches.
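Explicit warmup queries go into solrconfig.xml as searcher listeners; a sketch (the query values are placeholders and should mirror your real traffic):

```xml
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- a representative query so first-searches are pre-warmed -->
    <lst><str name="q">hybrid electric</str><str name="rows">10</str></lst>
  </arr>
</listener>
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="rows">0</str></lst>
  </arr>
</listener>
```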


- Toke Eskildsen, Royal Danish Library



Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-27 Thread Emir Arnautović
Hi Arun,
This is not the simplest query either - a dozen phrase queries on several 
fields, plus the same query as bq. Can you provide debugQuery info?
I did not look much into the debug times and what includes what, but one thing that 
is strange to me is that QTime is 4s while the query time in debug is 1.3s. Can you try 
running without bq? Can you include the boost factors in the main query?

Thanks,
Emir

> On 26 Sep 2017, at 16:43, sasarun  wrote:
> 
> Hi All, 
> I have been using Solr for some time now, but mostly in standalone mode. Now
> my current project is using Solr 6.5.1 hosted on Hadoop. My solrconfig.xml
> has the following configuration. In the prod environment, the performance on
> querying seems really slow. Can anyone help me with a few pointers on
> how to improve it?
> 
> 
>${solr.hdfs.home:}
> name="solr.hdfs.blockcache.enabled">${solr.hdfs.blockcache.enabled:true}
> name="solr.hdfs.blockcache.slab.count">${solr.hdfs.blockcache.slab.count:1}
> name="solr.hdfs.blockcache.direct.memory.allocation">${solr.hdfs.blockcache.direct.memory.allocation:false}
> name="solr.hdfs.blockcache.blocksperbank">${solr.hdfs.blockcache.blocksperbank:16384}
> name="solr.hdfs.blockcache.read.enabled">${solr.hdfs.blockcache.read.enabled:true}
> name="solr.hdfs.blockcache.write.enabled">${solr.hdfs.blockcache.write.enabled:false}
> name="solr.hdfs.nrtcachingdirectory.enable">${solr.hdfs.nrtcachingdirectory.enable:true}
> name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">${solr.hdfs.nrtcachingdirectory.maxmergesizemb:16}
> name="solr.hdfs.nrtcachingdirectory.maxcachedmb">${solr.hdfs.nrtcachingdirectory.maxcachedmb:192}
> 
>hdfs
> It has 6 collections of following size 
> Collection 1 -->6.41 MB
> Collection 2 -->634.51 KB 
> Collection 3 -->4.59 MB 
> Collection 4 -->1,020.56 MB 
> Collection 5 --> 607.26 MB
> Collection 6 -->102.4 kb
> Each collection has 5 shards. The allocated heap size for the young generation
> is about 8 GB and the old generation is about 24 GB, and GC analysis showed peak
> size utilisation is really low compared to these values.
> But querying Collection 4 and Collection 5 gives really slow responses,
> even though we are not using any complex queries. The output of debug queries run
> with debug=timing is given below for reference. Can anyone suggest a way to
> improve the performance?
> 
> Response to query
> 
> 
> true
> 0
> 3962
> 
> 
> ("hybrid electric powerplant" "hybrid electric powerplants" "Electric"
> "Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid
> Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid"
> "hybrid electric" "electric powerplant")
> 
> edismax
> true
> on
> 
> host
> title
> url
> customContent
> contentSpecificSearch
> 
> 
> id
> contentTagsCount
> 
> 0
> OR
> OR
> 3985d7e2-3e54-48d8-8336-229e85f5d9de
> 600
> 
> ("hybrid electric powerplant"^100.0 "hybrid electric powerplants"^100.0
> "Electric"^50.0 "Electrical"^50.0 "Electricity"^50.0 "Engine"^50.0 "fuel
> economy"^50.0 "fuel efficiency"^50.0 "Hybrid Electric Propulsion"^50.0
> "Power Systems"^50.0 "Powerplant"^50.0 "Propulsion"^50.0 "hybrid"^15.0
> "hybrid electric"^15.0 "electric powerplant"^15.0)
> 
> 
> 
> 
> 
> 15374.0
> 
> 2.0
> 
> 2.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 
> 15363.0
> 
> 1313.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 14048.0
> 
> 
> 
> 
> 
> Thanks,
> Arun
> 
> 
> 



Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-26 Thread Erick Erickson
Well, 15 second responses are not what I'd expect either. But two
things (just looked again)

1> note that the time to assemble the debug information is a large
majority of your total time (14 of 15.3 seconds).
2> you're specifying 600 rows which is quite a lot as each one
requires that a 16K block of data be read from disk and decompressed
to assemble the "fl" list.

so one quick test would be to set rows=1 or something. All that said,
the QTime value returned does _not_ include <1> or <2> above and even
4 seconds seems excessive.
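That quick test amounts to re-running the same request while varying only rows (sketch; host, collection, and the full q are elided here):

```text
# fast path: match + rank only, essentially no stored-doc reads
.../solr/<collection>/select?q=<same query>&rows=1&fl=id

# slow path: 600 stored-doc block reads + decompression for the fl list
.../solr/<collection>/select?q=<same query>&rows=600&fl=id,contentOntologyTagsCount
```

A large gap between the two points at stored-field retrieval, not query matching.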

Best,
Erick

On Tue, Sep 26, 2017 at 10:54 AM, sasarun  wrote:
> Hi Erick,
>
> Thank you for the quick response. Query time was relatively fast once the data
> was read from memory, but personally I always felt the response time could be far
> better. As suggested, we will try to set up a non-HDFS environment and
> update on the results.
>
> Thanks,
> Arun
>
>
>


Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-26 Thread sasarun
Hi Erick, 

Thank you for the quick response. Query time was relatively fast once the data
was read from memory, but personally I always felt the response time could be far
better. As suggested, we will try to set up a non-HDFS environment and
update on the results.

Thanks, 
Arun





Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-26 Thread Erick Erickson
Does the query time _stay_ low? Once the data is read from HDFS it
should pretty much stay in memory. So my question is whether, once
Solr warms up you see this kind of query response time.

Have you tried this on a non-HDFS system? That would be useful to help
figure out where to look.

And given the sizes of your collections, unless you expect them to get
much larger, there's no reason to shard any of them. Sharding should
only really be used when the collections are too big for a single
shard as distributed searches inevitably have increased overhead. I
expect _at least_ 20M documents/shard, and have seen 200M docs/shard.
YMMV of course.

Best,
Erick

On Tue, Sep 26, 2017 at 7:43 AM, sasarun  wrote:
> Hi All,
> I have been using Solr for some time now, but mostly in standalone mode. Now
> my current project is using Solr 6.5.1 hosted on Hadoop. My solrconfig.xml
> has the following configuration. In the prod environment, the performance on
> querying seems really slow. Can anyone help me with a few pointers on
> how to improve it?
>
> 
> ${solr.hdfs.home:}
>  name="solr.hdfs.blockcache.enabled">${solr.hdfs.blockcache.enabled:true}
>  name="solr.hdfs.blockcache.slab.count">${solr.hdfs.blockcache.slab.count:1}
>  name="solr.hdfs.blockcache.direct.memory.allocation">${solr.hdfs.blockcache.direct.memory.allocation:false}
>  name="solr.hdfs.blockcache.blocksperbank">${solr.hdfs.blockcache.blocksperbank:16384}
>  name="solr.hdfs.blockcache.read.enabled">${solr.hdfs.blockcache.read.enabled:true}
>  name="solr.hdfs.blockcache.write.enabled">${solr.hdfs.blockcache.write.enabled:false}
>  name="solr.hdfs.nrtcachingdirectory.enable">${solr.hdfs.nrtcachingdirectory.enable:true}
>  name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">${solr.hdfs.nrtcachingdirectory.maxmergesizemb:16}
>  name="solr.hdfs.nrtcachingdirectory.maxcachedmb">${solr.hdfs.nrtcachingdirectory.maxcachedmb:192}
> 
> hdfs
> It has 6 collections of following size
> Collection 1 -->6.41 MB
> Collection 2 -->634.51 KB
> Collection 3 -->4.59 MB
> Collection 4 -->1,020.56 MB
> Collection 5 --> 607.26 MB
> Collection 6 -->102.4 kb
> Each collection has 5 shards. The allocated heap size for the young generation
> is about 8 GB and the old generation is about 24 GB, and GC analysis showed peak
> size utilisation is really low compared to these values.
> But querying Collection 4 and Collection 5 gives really slow responses,
> even though we are not using any complex queries. The output of debug queries run
> with debug=timing is given below for reference. Can anyone suggest a way to
> improve the performance?
>
> Response to query
> 
> 
> true
> 0
> 3962
> 
> 
> ("hybrid electric powerplant" "hybrid electric powerplants" "Electric"
> "Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid
> Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid"
> "hybrid electric" "electric powerplant")
> 
> edismax
> true
> on
> 
> host
> title
> url
> customContent
> contentSpecificSearch
> 
> 
> id
> contentTagsCount
> 
> 0
> OR
> OR
> 3985d7e2-3e54-48d8-8336-229e85f5d9de
> 600
> 
> ("hybrid electric powerplant"^100.0 "hybrid electric powerplants"^100.0
> "Electric"^50.0 "Electrical"^50.0 "Electricity"^50.0 "Engine"^50.0 "fuel
> economy"^50.0 "fuel efficiency"^50.0 "Hybrid Electric Propulsion"^50.0
> "Power Systems"^50.0 "Powerplant"^50.0 "Propulsion"^50.0 "hybrid"^15.0
> "hybrid electric"^15.0 "electric powerplant"^15.0)
> 
> 
> 
> 
> 
> 15374.0
> 
> 2.0
> 
> 2.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 
> 15363.0
> 
> 1313.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 0.0
> 
> 
> 14048.0
> 
> 
> 
>
>
> Thanks,
> Arun
>
>
>


Solr performance issue on querying --> Solr 6.5.1

2017-09-26 Thread sasarun
Hi All, 
I have been using Solr for some time now, but mostly in standalone mode. Now
my current project is using Solr 6.5.1 hosted on Hadoop. My solrconfig.xml
has the following configuration. In the prod environment, the performance on
querying seems really slow. Can anyone help me with a few pointers on
how to improve it?


<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">${solr.hdfs.home:}</str>
  <bool name="solr.hdfs.blockcache.enabled">${solr.hdfs.blockcache.enabled:true}</bool>
  <int name="solr.hdfs.blockcache.slab.count">${solr.hdfs.blockcache.slab.count:1}</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">${solr.hdfs.blockcache.direct.memory.allocation:false}</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">${solr.hdfs.blockcache.blocksperbank:16384}</int>
  <bool name="solr.hdfs.blockcache.read.enabled">${solr.hdfs.blockcache.read.enabled:true}</bool>
  <bool name="solr.hdfs.blockcache.write.enabled">${solr.hdfs.blockcache.write.enabled:false}</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">${solr.hdfs.nrtcachingdirectory.enable:true}</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">${solr.hdfs.nrtcachingdirectory.maxmergesizemb:16}</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">${solr.hdfs.nrtcachingdirectory.maxcachedmb:192}</int>
</directoryFactory>
<lockType>hdfs</lockType>
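A quick sanity check on the block cache settings above: assuming the common HdfsDirectoryFactory cache block size of 8 KB (an assumption; verify against your Solr version's defaults), blocksperbank=16384 means each slab pins about 128 MB of off-heap memory, so slab.count=1 keeps the cache quite small relative to a 1 GB collection. A minimal sketch of the arithmetic:

```python
# Sketch: estimate HDFS block cache memory from the settings above.
# The 8 KB block size is an assumption based on common HdfsDirectoryFactory
# defaults; verify it against your Solr version before relying on it.

BLOCK_SIZE_BYTES = 8 * 1024          # assumed cache block size
BLOCKS_PER_BANK = 16384              # solr.hdfs.blockcache.blocksperbank
SLAB_COUNT = 1                       # solr.hdfs.blockcache.slab.count

def blockcache_bytes(slabs: int, blocks_per_bank: int, block_size: int) -> int:
    """Total memory pinned by the block cache for the given settings."""
    return slabs * blocks_per_bank * block_size

total = blockcache_bytes(SLAB_COUNT, BLOCKS_PER_BANK, BLOCK_SIZE_BYTES)
print(total // (1024 * 1024), "MB")  # 128 MB with one slab
```

Under this assumption, growing slab.count is the knob that scales the cache toward the size of the hot index data.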
It has 6 collections of the following sizes:
Collection 1 --> 6.41 MB
Collection 2 --> 634.51 KB
Collection 3 --> 4.59 MB
Collection 4 --> 1,020.56 MB
Collection 5 --> 607.26 MB
Collection 6 --> 102.4 KB
Each collection has 5 shards. The allocated heap size for the young generation
is about 8 GB and for the old generation about 24 GB, and GC analysis showed that peak
size utilisation is really low compared to these values.
But querying collection 4 and collection 5 gives a really slow response
even though we are not using any complex queries. The output of debug queries run
with debug=timing is given below for reference. Can anyone suggest a way to improve the
performance?

Response to query


true
0
3962


("hybrid electric powerplant" "hybrid electric powerplants" "Electric"
"Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid
Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid"
"hybrid electric" "electric powerplant")

edismax
true
on

host
title
url
customContent
contentSpecificSearch


id
contentTagsCount

0
OR
OR
3985d7e2-3e54-48d8-8336-229e85f5d9de
600

("hybrid electric powerplant"^100.0 "hybrid electric powerplants"^100.0
"Electric"^50.0 "Electrical"^50.0 "Electricity"^50.0 "Engine"^50.0 "fuel
economy"^50.0 "fuel efficiency"^50.0 "Hybrid Electric Propulsion"^50.0
"Power Systems"^50.0 "Powerplant"^50.0 "Propulsion"^50.0 "hybrid"^15.0
"hybrid electric"^15.0 "electric powerplant"^15.0)





(debug=timing output; the XML element names were stripped by the archive, so
only the values remain)
timing: time = 15374.0
  prepare: time = 2.0 (query component 2.0; all other components 0.0)
  process: time = 15363.0 (query component 1313.0; debug component 14048.0;
  all other components 0.0)





Thanks,
Arun



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
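The timing numbers in the post are easier to reason about in structured form. A minimal sketch (the dict keys are reconstructed assumptions, since the archive stripped the XML element names from the debug=timing output) that flags the dominant component:

```python
# Sketch: find the dominant component in Solr debug=timing output.
# The dict mirrors the numbers in the post above; the key names are
# reconstructed assumptions because the archive stripped the XML tags.

timing = {
    "time": 15374.0,
    "prepare": {"time": 2.0, "query": 2.0},
    "process": {"time": 15363.0, "query": 1313.0, "debug": 14048.0},
}

def dominant_component(timing: dict):
    """Return (phase/component, millis) for the slowest leaf entry."""
    leaves = []
    for phase, entries in timing.items():
        if isinstance(entries, dict):
            for comp, ms in entries.items():
                if comp != "time":
                    leaves.append((f"{phase}/{comp}", ms))
    return max(leaves, key=lambda kv: kv[1])

print(dominant_component(timing))  # ('process/debug', 14048.0)
```

Read this way, the numbers suggest the query component itself took only 1313 ms; most of the 15374 ms total was spent in the debug component, which is only active because debug=timing was requested.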


Re: Performance Issue in Streaming Expressions

2017-06-02 Thread Joel Bernstein
Once you've scaled up the export from collection4 you can test the
performance of the join by moving the NullStream around the join.

parallel(null(innerJoin(collection 3, collection4)))

Again you'll want to test with different numbers of workers and replicas to
see where you max out performance of the join.


Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Jun 2, 2017 at 10:25 AM, Joel Bernstein  wrote:

> innerJoin(intersect(innerJoin(collection1, collection2),
>innerJoin(collection 3, collection4)),
> collection5)
>
> Let's focus on:
>
> innerJoin(collection 3, collection4))
>
> The first thing to focus on is how fast the export from collection4 is.
> You can test this with the NullStream using the following construct:
>
> null(search(collection4))
>
> The null stream will eat all the tuples and report back timing
> information. This will isolate the performance of the export from
> collection4.
>
> Once you have a baseline for how fast you can export from a single node,
> you can test with parallel export from a single node:
>
> parallel(null(search(collection4)))
>
> Then you can add replicas for collection4 and increase workers.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Jun 1, 2017 at 11:51 PM, Susmit Shukla 
> wrote:
>
>> Hi,
>>
>> Which version of Solr are you on?
>> Increasing memory may not be useful, as the streaming API does not keep
>> stuff in memory (except maybe hash joins).
>> Increasing replicas (not sharding) and pushing the join computation onto a
>> worker Solr cluster with #workers > 1 would definitely make things faster.
>> Are you limiting your results at some cutoff? If yes, then SOLR-10698
>> can be a useful fix. Also, the
>> binary response format for streaming would be faster (available in 6.5,
>> probably).
>>
>>
>>
>> On Thu, Jun 1, 2017 at 3:04 PM, thiaga rajan <
>> ecethiagu2...@yahoo.co.in.invalid> wrote:
>>
>> > We are working on a proposal and feel that the streaming API along with
>> > the export handler will best fit our use cases. We already have a
>> > structure in Solr in which we are using graph queries to produce a
>> > hierarchical structure. Now from that structure we need to join a couple
>> > more collections. We have 5 different collections:
>> > Collection 1 - 800k records
>> > Collection 2 - 200k records
>> > Collection 3 - 7k records
>> > Collection 4 - 6 million records
>> > Collection 5 - 150k records
>> > We are using the below strategy:
>> > innerJoin( intersect( innerJoin(collection 1, collection 2),
>> > innerJoin(collection 3, collection 4)), collection 5)
>> > We are seeing that performance is too slow once we start including
>> > collection 4. With just collections 1, 2 and 5 the results come back in
>> > 2 seconds. The moment I include collection 4 in the query I see a
>> > performance impact. I believe exporting large results from collection 4
>> > is causing the issue. Currently I am using single-sharded collections
>> > with no replicas. I am thinking of increasing the memory as a first
>> > option to improve performance, since processing docValues needs more
>> > memory. If that does not work, I can try parallel streams / sharding.
>> > Kindly advise if there is anything else I am missing.
>> > Sent from Yahoo Mail on Android
>>
>
>


Re: Performance Issue in Streaming Expressions

2017-06-02 Thread Joel Bernstein
innerJoin(intersect(innerJoin(collection1, collection2),
   innerJoin(collection 3, collection4)),
collection5)

Let's focus on:

innerJoin(collection 3, collection4))

The first thing to focus on is how fast the export from collection4 is. You
can test this with the NullStream using the following construct:

null(search(collection4))

The null stream will eat all the tuples and report back timing information.
This will isolate the performance of the export from collection4.

Once you have a baseline for how fast you can export from a single node,
you can test with parallel export from a single node:

parallel(null(search(collection4)))

Then you can add replicas for collection4 and increase workers.
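The NullStream idea — consume every tuple, keep nothing, and report counts plus timing so that export throughput is isolated from any downstream work — can be mimicked locally to make the benchmarking approach concrete. This is a sketch of the concept, not Solr's implementation:

```python
import time

def null_stream(tuples):
    """Consume all tuples, discard them, and report count plus elapsed time,
    mimicking what Solr's null() decorator reports for the stream it wraps."""
    start = time.perf_counter()
    count = 0
    for _ in tuples:
        count += 1
    return {"nullCount": count, "timer": time.perf_counter() - start}

# Simulate an export of 1M small tuples from "collection4".
export = ({"id": i} for i in range(1_000_000))
result = null_stream(export)
print(result["nullCount"])  # 1000000
```

The point of the pattern is that the reported time reflects only how fast the wrapped stream can produce tuples, which is exactly what you want when deciding whether the export or the join is the bottleneck.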













Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 1, 2017 at 11:51 PM, Susmit Shukla 
wrote:

> Hi,
>
> Which version of Solr are you on?
> Increasing memory may not be useful, as the streaming API does not keep
> stuff in memory (except maybe hash joins).
> Increasing replicas (not sharding) and pushing the join computation onto a
> worker Solr cluster with #workers > 1 would definitely make things faster.
> Are you limiting your results at some cutoff? If yes, then SOLR-10698
> can be a useful fix. Also, the binary response format for streaming would
> be faster (available in 6.5, probably).
>
>
>
> On Thu, Jun 1, 2017 at 3:04 PM, thiaga rajan <
> ecethiagu2...@yahoo.co.in.invalid> wrote:
>
> > We are working on a proposal and feel that the streaming API along with
> > the export handler will best fit our use cases. We already have a
> > structure in Solr in which we are using graph queries to produce a
> > hierarchical structure. Now from that structure we need to join a couple
> > more collections. We have 5 different collections:
> > Collection 1 - 800k records
> > Collection 2 - 200k records
> > Collection 3 - 7k records
> > Collection 4 - 6 million records
> > Collection 5 - 150k records
> > We are using the below strategy:
> > innerJoin( intersect( innerJoin(collection 1, collection 2),
> > innerJoin(collection 3, collection 4)), collection 5)
> > We are seeing that performance is too slow once we start including
> > collection 4. With just collections 1, 2 and 5 the results come back in
> > 2 seconds. The moment I include collection 4 in the query I see a
> > performance impact. I believe exporting large results from collection 4
> > is causing the issue. Currently I am using single-sharded collections
> > with no replicas. I am thinking of increasing the memory as a first
> > option to improve performance, since processing docValues needs more
> > memory. If that does not work, I can try parallel streams / sharding.
> > Kindly advise if there is anything else I am missing.
> > Sent from Yahoo Mail on Android
>


Re: Performance Issue in Streaming Expressions

2017-06-01 Thread Susmit Shukla
Hi,

Which version of Solr are you on?
Increasing memory may not be useful, as the streaming API does not keep
stuff in memory (except maybe hash joins).
Increasing replicas (not sharding) and pushing the join computation onto a
worker Solr cluster with #workers > 1 would definitely make things faster.
Are you limiting your results at some cutoff? If yes, then SOLR-10698
can be a useful fix. Also, the binary response format for streaming would
be faster (available in 6.5, probably).



On Thu, Jun 1, 2017 at 3:04 PM, thiaga rajan <
ecethiagu2...@yahoo.co.in.invalid> wrote:

> We are working on a proposal and feel that the streaming API along with
> the export handler will best fit our use cases. We already have a
> structure in Solr in which we are using graph queries to produce a
> hierarchical structure. Now from that structure we need to join a couple
> more collections. We have 5 different collections:
> Collection 1 - 800k records
> Collection 2 - 200k records
> Collection 3 - 7k records
> Collection 4 - 6 million records
> Collection 5 - 150k records
> We are using the below strategy:
> innerJoin( intersect( innerJoin(collection 1, collection 2),
> innerJoin(collection 3, collection 4)), collection 5)
> We are seeing that performance is too slow once we start including
> collection 4. With just collections 1, 2 and 5 the results come back in
> 2 seconds. The moment I include collection 4 in the query I see a
> performance impact. I believe exporting large results from collection 4
> is causing the issue. Currently I am using single-sharded collections
> with no replicas. I am thinking of increasing the memory as a first
> option to improve performance, since processing docValues needs more
> memory. If that does not work, I can try parallel streams / sharding.
> Kindly advise if there is anything else I am missing.
> Sent from Yahoo Mail on Android


Performance Issue in Streaming Expressions

2017-06-01 Thread thiaga rajan
We are working on a proposal and feel that the streaming API along with the
export handler will best fit our use cases. We already have a structure in
Solr in which we are using graph queries to produce a hierarchical
structure. Now from that structure we need to join a couple more
collections. We have 5 different collections:
Collection 1 - 800k records
Collection 2 - 200k records
Collection 3 - 7k records
Collection 4 - 6 million records
Collection 5 - 150k records
We are using the below strategy:
innerJoin( intersect( innerJoin(collection 1, collection 2),
innerJoin(collection 3, collection 4)), collection 5)
We are seeing that performance is too slow once we start including
collection 4. With just collections 1, 2 and 5 the results come back in 2
seconds. The moment I include collection 4 in the query I see a performance
impact. I believe exporting large results from collection 4 is causing the
issue. Currently I am using single-sharded collections with no replicas. I
am thinking of increasing the memory as a first option to improve
performance, since processing docValues needs more memory. If that does not
work, I can try parallel streams / sharding. Kindly advise if there is
anything else I am missing.
Sent from Yahoo Mail on Android
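Streaming innerJoin works by merging two streams that are each sorted on the join key, which is why the export speed of the largest stream (collection 4 here) tends to dominate the cost of the whole expression. A toy sketch of that merge, assuming unique keys per stream for brevity:

```python
def inner_join(left, right, key):
    """Merge-join two iterators of dicts, each sorted ascending on `key`.
    Assumes keys are unique per stream (the real join handles duplicates)."""
    left, right = iter(left), iter(right)
    l, r = next(left, None), next(right, None)
    while l is not None and r is not None:
        if l[key] < r[key]:
            l = next(left, None)      # advance the smaller side
        elif l[key] > r[key]:
            r = next(right, None)
        else:
            yield {**l, **r}          # matching key: emit merged tuple
            l, r = next(left, None), next(right, None)

a = [{"id": 1, "x": "a"}, {"id": 3, "x": "c"}, {"id": 4, "x": "d"}]
b = [{"id": 3, "y": "q"}, {"id": 4, "y": "r"}, {"id": 9, "y": "z"}]
print(list(inner_join(a, b, "id")))
# [{'id': 3, 'x': 'c', 'y': 'q'}, {'id': 4, 'x': 'd', 'y': 'r'}]
```

Because the merge is a single pass over both inputs, the join itself is cheap; the time goes into reading (exporting) every tuple of the 6-million-record stream, which is what the replies below suggest benchmarking first.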

RE: Solr performance issue on indexing

2017-04-04 Thread Allison, Timothy B.
>  Also we will try to decouple tika to solr.
+1


-Original Message-
From: tstusr [mailto:ulfrhe...@gmail.com] 
Sent: Friday, March 31, 2017 4:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr performance issue on indexing

Hi, thanks for the feedback.

Yes, it is about OOMs; indeed, the Solr instance even becomes unavailable.
As I was saying, I can't find any more relevant information in the logs.

We are able to increase the JVM amount, so that is the first thing we'll do.

As far as I know, all documents are bound to that amount (14K); just the
processing could change. We are making some tests on indexing, and it seems
it works without concurrent threads. We will also try to decouple Tika from
Solr.

By the way, would making it available with SolrCloud improve performance?
Or would there be no perceptible improvement?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-performance-issue-on-indexing-tp4327886p4327914.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr performance issue on indexing

2017-03-31 Thread Erick Erickson
If, by chance, the docs you're sending get routed to different Solr
nodes, then all the processing is in parallel. I don't know if there's
a good way to ensure that the docs get sent to different replicas on
different Solr instances. You could try addressing specific Solr
replicas, something like "blah
blah/solr/collection1_shard1_replica1/export", but I'm not totally sure
that'll do what you want either.

 But that still doesn't decouple Tika from the Solr instances running
those replicas. So if Tika has a problem it has the potential to bring
the Solr node down.

Best,
Erick

On Fri, Mar 31, 2017 at 1:31 PM, tstusr <ulfrhe...@gmail.com> wrote:
> Hi, thanks for the feedback.
>
> Yes, it is about OOMs; indeed, the Solr instance even becomes unavailable.
> As I was saying, I can't find any more relevant information in the logs.
>
> We are able to increase the JVM amount, so that is the first thing we'll do.
>
> As far as I know, all documents are bound to that amount (14K); just the
> processing could change. We are making some tests on indexing, and it seems
> it works without concurrent threads. We will also try to decouple Tika from
> Solr.
>
> By the way, would making it available with SolrCloud improve performance?
> Or would there be no perceptible improvement?
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-performance-issue-on-indexing-tp4327886p4327914.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr performance issue on indexing

2017-03-31 Thread tstusr
Hi, thanks for the feedback.

Yes, it is about OOMs; indeed, the Solr instance even becomes unavailable.
As I was saying, I can't find any more relevant information in the logs.

We are able to increase the JVM amount, so that is the first thing we'll do.

As far as I know, all documents are bound to that amount (14K); just the
processing could change. We are making some tests on indexing, and it seems
it works without concurrent threads. We will also try to decouple Tika from
Solr.

By the way, would making it available with SolrCloud improve performance?
Or would there be no perceptible improvement?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-performance-issue-on-indexing-tp4327886p4327914.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr performance issue on indexing

2017-03-31 Thread Erick Erickson
First, running multiple threads with PDF files to a Solr running 4G of
JVM is...ambitious. You say it crashes; how? OOMs?

Second, while the extracting request handler is a fine way to get up
and running, any problems with Tika will affect Solr. Tika does a
great job of extraction, but there are so many variants of so many
file formats that this scenario isn't recommended for production.
Consider extracting the PDF on a client and sending the docs to Solr.
Tika can also run as a server, so you aren't coupling Solr and Tika.

For a sample SolrJ program, see:
https://lucidworks.com/2012/02/14/indexing-with-solrj/

Best,
Erick

On Fri, Mar 31, 2017 at 10:44 AM, tstusr <ulfrhe...@gmail.com> wrote:
> Hi there.
>
> We are currently indexing some PDF files; the main handler for indexing is
> /extract, where we perform simple processing (extract relevant fields and
> store them in some fields).
>
> The PDF files are about 10 MB to 100 MB in size, and we need the extracted
> text to be available. Everything works correctly in the test stages, but
> when we try to index all 14K files (around 120 GB) from a client
> application that only sends HTTP requests via curl through 3-4 concurrent
> threads to the /extract handler, it crashes. I can't find any relevant
> information in the Solr logs (we checked in server/logs and in
> core_dir/tlog).
>
> My question is about performance. I think it is a small amount of data we
> are processing; the deployment scenario is a Docker container with 4 GB of
> JVM memory and ~50 GB of physical memory (reported through the dashboard),
> and we are using a single instance.
>
> I don't think it is normal behaviour for the handler to crash. So, what are
> some general tips for improving performance in this scenario?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-performance-issue-on-indexing-tp4327886.html
> Sent from the Solr - User mailing list archive at Nabble.com.
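Decoupling extraction, as suggested above, means the client does the Tika work and sends Solr a plain document instead of the raw PDF. A minimal sketch of shaping such a document for a JSON update (the field names are illustrative assumptions, and `extract_text` stands in for whatever Tika client or server call you use):

```python
def build_solr_doc(doc_id: str, path: str, text: str) -> dict:
    """Shape a client-side-extracted PDF into a plain Solr JSON document,
    so the standard /update handler can be used instead of in-Solr /extract.
    Field names below are illustrative, not from the original thread."""
    return {
        "id": doc_id,
        "filename_s": path.rsplit("/", 1)[-1],  # just the file name
        "content_txt": text,
    }

# extract_text(path) would call Tika (library or tika-server) on the client.
doc = build_solr_doc("doc-1", "/data/pdfs/report.pdf", "extracted body text")
print(doc["filename_s"])  # report.pdf
# The doc would then be POSTed to /solr/<collection>/update as JSON.
```

With this split, a malformed PDF can only crash the client-side extractor, not the Solr node serving queries.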


Solr performance issue on indexing

2017-03-31 Thread tstusr
Hi there.

We are currently indexing some PDF files; the main handler for indexing is
/extract, where we perform simple processing (extract relevant fields and
store them in some fields).

The PDF files are about 10 MB to 100 MB in size, and we need the extracted
text to be available. Everything works correctly in the test stages, but
when we try to index all 14K files (around 120 GB) from a client
application that only sends HTTP requests via curl through 3-4 concurrent
threads to the /extract handler, it crashes. I can't find any relevant
information in the Solr logs (we checked in server/logs and in
core_dir/tlog).

My question is about performance. I think it is a small amount of data we
are processing; the deployment scenario is a Docker container with 4 GB of
JVM memory and ~50 GB of physical memory (reported through the dashboard),
and we are using a single instance.

I don't think it is normal behaviour for the handler to crash. So, what are
some general tips for improving performance in this scenario?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-performance-issue-on-indexing-tp4327886.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Heads up: SOLR-10130, Performance issue in Solr 6.4.1

2017-02-13 Thread Andrzej Białecki

> On 13 Feb 2017, at 13:46, Ere Maijala  wrote:
> 
> Hi all,
> 
> this is just a quick heads-up that we've stumbled on serious performance 
> issues after upgrading to Solr 6.4.1 apparently due to the new metrics 
> collection causing a major slowdown. I've filed an issue 
> (https://issues.apache.org/jira/browse/SOLR-10130) about it, but decided to 
> post this just so that anyone else doesn't need to encounter this unprepared. 
> It seems to me that metrics would need to be explicitly disabled altogether 
> in the index config to avoid the issue.
> 
> --Ere


Unfortunately this bug is present in both 6.4.0 and 6.4.1, and needs a patch,
i.e. config changes won’t solve it.

It’s a pity that Solr doesn’t have a continuous performance benchmark setup, 
like Lucene does.

--
Best regards,
Andrzej Bialecki

--=# http://www.lucidworks.com #=--



Re: Heads up: SOLR-10130, Performance issue in Solr 6.4.1

2017-02-13 Thread Walter Underwood
I’m seeing similar problems here. With 6.4.0, we were handling 6000 
requests/minute. With 6.4.1 it is 1000 rpm with median response times around 
2.5 seconds. I also switched to the G1 collector. I’m going to back that out 
and retest today to see if the performance comes back.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Feb 13, 2017, at 4:46 AM, Ere Maijala  wrote:
> 
> Hi all,
> 
> this is just a quick heads-up that we've stumbled on serious performance 
> issues after upgrading to Solr 6.4.1 apparently due to the new metrics 
> collection causing a major slowdown. I've filed an issue 
> (https://issues.apache.org/jira/browse/SOLR-10130) about it, but decided to 
> post this just so that anyone else doesn't need to encounter this unprepared. 
> It seems to me that metrics would need to be explicitly disabled altogether 
> in the index config to avoid the issue.
> 
> --Ere



Heads up: SOLR-10130, Performance issue in Solr 6.4.1

2017-02-13 Thread Ere Maijala

Hi all,

this is just a quick heads-up that we've stumbled on serious performance 
issues after upgrading to Solr 6.4.1 apparently due to the new metrics 
collection causing a major slowdown. I've filed an issue 
(https://issues.apache.org/jira/browse/SOLR-10130) about it, but decided 
to post this just so that anyone else doesn't need to encounter this 
unprepared. It seems to me that metrics would need to be explicitly 
disabled altogether in the index config to avoid the issue.


--Ere


Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-24 Thread Alexandre Rafalovitch
Yes, swap will switch which core the name points to, for a non-Cloud setup.

Just remember that your directory does not get renamed when you delete the
old one; only the core name in the core.properties file changes.

Regards,
   Alex

On 24 Sep 2016 10:28 AM, "slee" <sleed...@gmail.com> wrote:

Erick / Alex,

I want to thank you both. Your hints helped me understand Solr a bit better.
I ended up with the reverse wildcard filter, and it speeds up performance a
lot. That's what I was expecting from Solr. I also no longer experience the
huge memory hog.

The only downside I can think of is that you need to re-index when you change
the schema. But I can live with that, since I'll have 2 machines where one
is for reading and the other is for indexing. I'll swap when the indexing
is done. I presume that's what the swap in the Admin UI is for, right?



--
View this message in context: http://lucene.472066.n3.
nabble.com/Performance-Issue-when-querying-Multivalued-fields-SOLR-6-1-0-
tp4297255p4297821.html
Sent from the Solr - User mailing list archive at Nabble.com.
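The Admin UI swap discussed above maps to the CoreAdmin SWAP action, which atomically exchanges the names of two cores. A sketch of building that request directly (host, port, and core names are placeholders):

```python
from urllib.parse import urlencode

def swap_url(base: str, live_core: str, rebuilt_core: str) -> str:
    """Build the CoreAdmin SWAP request that atomically exchanges the names
    of two cores, so `live_core` starts pointing at the freshly built index.
    Host and core names here are placeholders."""
    params = {"action": "SWAP", "core": live_core, "other": rebuilt_core}
    return f"{base}/solr/admin/cores?{urlencode(params)}"

url = swap_url("http://localhost:8983", "products", "products_rebuild")
print(url)
# http://localhost:8983/solr/admin/cores?action=SWAP&core=products&other=products_rebuild
```

After the swap, queries against the "products" name hit the rebuilt index, and the old index remains reachable under the other name until you delete it.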


Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-23 Thread slee
Erick / Alex,

I want to thank you both. Your hints helped me understand Solr a bit better.
I ended up with the reverse wildcard filter, and it speeds up performance a
lot. That's what I was expecting from Solr. I also no longer experience the
huge memory hog.

The only downside I can think of is that you need to re-index when you change
the schema. But I can live with that, since I'll have 2 machines where one
is for reading and the other is for indexing. I'll swap when the indexing
is done. I presume that's what the swap in the Admin UI is for, right?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Performance-Issue-when-querying-Multivalued-fields-SOLR-6-1-0-tp4297255p4297821.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-23 Thread Alexandre Rafalovitch
But if "SEF" and "OFF" are known to be searched for and especially if
they are well-delimited, they could just be pulled-out into a separate
field and just checked with an FQ.

In the end, there may be no need for either EdgeNGram or wildcards.
Just twisting the data during _indexing_ to represent the business
domain requirements.

Regards,
   Alex.

On 23 September 2016 at 10:39, Erick Erickson  wrote:
> If you can break these up into tokens somehow, that's clearly best. But from 
> the
> patterns you show it's not likely. WordDelimiterFactory won't quite
> work since it
> wouldn't be able to separate ASEF into the token SEF.


Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread Erick Erickson
If you can break these up into tokens somehow, that's clearly best. But from the
patterns you show it's not likely. WordDelimiterFactory won't quite
work since it
wouldn't be able to separate ASEF into the token SEF.

You'll have a _lot_ fewer terms if you don't use edgengram. Try just
using bigrams (i.e. NGramFilterFactory) with both mingram and maxgram set
to 2.

Now you do phrase searches (also automatic) on pairs. So in your example
some of the pairs are:
#o
of
ff
f-

To find off, you search for the _phrase_ "of ff". There'll be some
fiddling here to
make it all work.

Best,
Erick

On Thu, Sep 22, 2016 at 11:49 AM, slee <sleed...@gmail.com> wrote:
> Alex,
>
> You do have a point with EdgeNGramFilterFactory. As requested, I've attached
> a sample screenshot for your review.
> <http://lucene.472066.n3.nabble.com/file/n4297542/sample.png>
>
> Erick,
>
> Here's my use case. Assume I have the following terms stored in global_Value:
> - executionvenuetype#*OFF*-FACILITY
> - partyid#B2A*SEF*9AJP5P9OLL1190
>
> Now, I want to retrieve any document matching a term in global_Value that
> contains the keyword "off" or "sef". With regards to the leading wildcard,
> that's intentional, not a mail issue. These fields typically contain GUIDs
> and some financial terms (e.g., bonds, swaps, etc.). If I don't use any
> wildcard, then it's an exact match. But my use case dictates that it
> should retrieve documents on a partial match.
>
> So what's my best bet for an analyzer in such cases?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Performance-Issue-when-querying-Multivalued-fields-SOLR-6-1-0-tp4297255p4297542.html
> Sent from the Solr - User mailing list archive at Nabble.com.
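Erick's bigram idea above can be made concrete: index every 2-gram of the keyword token, then find a substring like "off" by searching for its own bigrams as a phrase (i.e., at consecutive positions). A toy sketch of the matching logic, for query strings of at least two characters:

```python
def bigrams(s: str) -> list[str]:
    """All 2-grams of s, what NGramFilterFactory with min=max=2 would emit."""
    return [s[i:i + 2] for i in range(len(s) - 1)]

def phrase_match(field_value: str, query: str) -> bool:
    """True if the query's bigrams occur consecutively in the field's bigrams,
    which is equivalent to the query being a substring of the keyword token."""
    grams, needle = bigrams(field_value), bigrams(query)
    return any(grams[i:i + len(needle)] == needle
               for i in range(len(grams) - len(needle) + 1))

print(bigrams("off"))                                          # ['of', 'ff']
print(phrase_match("executionvenuetype#off-facility", "off"))  # True
print(phrase_match("executionvenuetype#off-facility", "sef"))  # False
```

This is the trade Erick describes: a small, bounded term dictionary (at most a few thousand distinct bigrams) in exchange for phrase queries at search time.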


Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread Alexandre Rafalovitch
It's still not fully clear, but perhaps you need several fields, at least one
of which just contains your SEF and OFF values, serving effectively as binary
switches (FQ matches). And then maybe you strip the leading IDs that you
are not matching on.

Remember your Solr data shape does not need to match your original data
shape. Especially with extra fields that you could get through copyField
commands or through UpdateRequestProcessor duplicates. And you don't need
to store those duplicates, just index them for most effective search.

And yes, reversing filter and edge ngram together mean you don't need a
wildcard queries.

Regards,
Alex

On 23 Sep 2016 1:49 AM, "slee" <sleed...@gmail.com> wrote:

> Alex,
>
> You do have a point with EdgeNGramFilterFactory. As requested, I've attached
> a sample screenshot for your review.
> <http://lucene.472066.n3.nabble.com/file/n4297542/sample.png>
>
> Erick,
>
> Here's my use case. Assume I have the following terms stored in global_Value:
> - executionvenuetype#*OFF*-FACILITY
> - partyid#B2A*SEF*9AJP5P9OLL1190
>
> Now, I want to retrieve any document matching a term in global_Value that
> contains the keyword "off" or "sef". With regards to the leading wildcard,
> that's intentional, not a mail issue. These fields typically contain GUIDs
> and some financial terms (e.g., bonds, swaps, etc.). If I don't use any
> wildcard, then it's an exact match. But my use case dictates that it
> should retrieve documents on a partial match.
>
> So what's my best bet for an analyzer in such cases?
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Performance-Issue-when-querying-Multivalued-fields-SOLR-6-1-0-
> tp4297255p4297542.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread slee
Alex,

You do have a point with EdgeNGramFilterFactory. As requested, I've attached
a sample screenshot for your review.
<http://lucene.472066.n3.nabble.com/file/n4297542/sample.png>

Erick,

Here's my use case. Assume I have the following terms stored in global_Value:
- executionvenuetype#*OFF*-FACILITY
- partyid#B2A*SEF*9AJP5P9OLL1190

Now, I want to retrieve any document matching a term in global_Value that
contains the keyword "off" or "sef". With regards to the leading wildcard,
that's intentional, not a mail issue. These fields typically contain GUIDs
and some financial terms (e.g., bonds, swaps, etc.). If I don't use any
wildcard, then it's an exact match. But my use case dictates that it
should retrieve documents on a partial match.

So what's my best bet for an analyzer in such cases?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Performance-Issue-when-querying-Multivalued-fields-SOLR-6-1-0-tp4297255p4297542.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread Erick Erickson
I totally missed EdgeNGram. Good catch Alex!

Yeah, that's a killer. My shot in the dark here is that
your analysis chain isn't the best choice to support your use-case and you're
shooting yourself in the foot. So let's back up and talk
about your use-case and maybe re-define your analysis
chain for better performance.

Best,
Erick

On Thu, Sep 22, 2016 at 8:21 AM, Alexandre Rafalovitch
<arafa...@gmail.com> wrote:
> Well,
>
> I am guessing this is the line that's causing the problem:
>  maxGramSize="50"/>
>
> Run your real sample for that field against your indexing definition
> in Admin UI and see how many tokens you end up with. You may have 50
> tokens, but if each of them generates up to 47 representations..
>
> Regards,
> Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 22 September 2016 at 22:08, slee <sleed...@gmail.com> wrote:
>> Here's what I have defined in my schema:
>> 
>> 
>>   
>>   
>>   
>>   > maxGramSize="50"/>
>> 
>> 
>>   
>>   
>>   
>> 
>>   
>>
>> > required="true" stored="true"/>
>>
>> This is what I send in the query (2 values):
>> q=global_Value:*mas+AND+global_Value:*sef=text=5=2.2=explicit=global_Value
>>
>> In addition, memory usage is way over 90%, given the heap space set at 5 GB.
>>
>>
>>
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Performance-Issue-when-querying-Multivalued-fields-SOLR-6-1-0-tp4297255p4297474.html
>> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread Alexandre Rafalovitch
Well,

I am guessing this is the line that's causing the problem:


Run your real sample for that field against your indexing definition
in Admin UI and see how many tokens you end up with. You may have 50
tokens, but if each of them generates up to 47 representations..

Regards,
Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 22 September 2016 at 22:08, slee <sleed...@gmail.com> wrote:
> Here's what I have defined in my schema:
> 
> 
>   
>   
>   
>maxGramSize="50"/>
> 
> 
>   
>   
>   
> 
>   
>
>  required="true" stored="true"/>
>
> This is what I send in the query (2 values):
> q=global_Value:*mas+AND+global_Value:*sef=text=5=2.2=explicit=global_Value
>
> In addition, memory usage is way over 90%, given the heap space set at 5 GB.
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Performance-Issue-when-querying-Multivalued-fields-SOLR-6-1-0-tp4297255p4297474.html
> Sent from the Solr - User mailing list archive at Nabble.com.
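The token explosion Alex describes is easy to quantify. Assuming minGramSize=4 (an assumption — the schema fragment lost its attributes in archiving, but 4 is consistent with "up to 47 representations" for a 50-character token), an EdgeNGram-style expansion looks like:

```python
def edge_ngrams(token: str, min_gram: int = 4, max_gram: int = 50) -> list[str]:
    """Prefixes of the token from min_gram to max_gram characters: roughly
    what EdgeNGramFilterFactory emits for a keyword-tokenized value.
    min_gram=4 is an assumption; only maxGramSize="50" survives in the post."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

grams = edge_ngrams("partyid#b2asef9ajp5p9oll1190")  # a 28-char token
print(len(grams))  # 25
fifty = edge_ngrams("x" * 50)
print(len(fifty))  # 47 -> each long value indexes dozens of terms
```

Multiplied over every value of a multivalued field across millions of documents, this is where the index bloat and memory pressure in the thread come from.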


Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread Erick Erickson
Wait: are you really doing leading wildcard queries? If so, that's likely
the root of the problem. Unless you add ReversedWildcardFilterFactory to
your analysis chain, Lucene has to enumerate your entire set of terms to
find likely candidates, which takes a lot of resources. What happens if
you use similar trailing wildcards? And what happens when you use simple
non-wildcard queries?

Or is this just bolding that gets translated to asterisks by the mail
formatting?

Finally, what are typical values in this field? I'm really asking whether
your use of KeywordTokenizer is the best choice here. It often is, but
I've seen it misused, so I thought we should check.

Best,
Erick
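The factory Erick refers to is solr.ReversedWildcardFilterFactory. A
hedged sketch of how it might be wired into a schema (the field-type name
and the rest of the chain are illustrative, not taken from the original
post): at index time each term is additionally stored reversed, so a
leading-wildcard query such as global_Value:*mas can be rewritten into a
cheap trailing-wildcard lookup against the reversed terms.

```xml
<fieldType name="text_rev" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- withOriginal="true" keeps the forward form too, so ordinary
         and trailing-wildcard queries still work on the same field -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The trade-off is a larger index (each term is stored twice) in exchange
for leading wildcards no longer scanning the whole term dictionary.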



On Thu, Sep 22, 2016 at 8:08 AM, slee <sleed...@gmail.com> wrote:
> Here's what I have defined in my schema:
> [schema XML stripped by the mail archive; what survives shows a field
> type chaining a KeywordTokenizerFactory with a filter having
> maxGramSize="50" (apparently an edge n-gram filter), and a global_Value
> field declared required="true" stored="true"]
>
> This is what I send in the query (2 values):
> q=global_Value:*mas+AND+global_Value:*sef=text=5=2.2=explicit=global_Value
>
> In addition, memory is taking way over 90%, given the heap space set at 5g.
>
>
>
>


Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread slee
Here's what I have defined in my schema:

[schema XML stripped by the mail archive; what survives shows a field
type chaining a KeywordTokenizerFactory with a filter having
maxGramSize="50" (apparently an edge n-gram filter), and a global_Value
field declared required="true" stored="true"]
This is what I send in the query (2 values):
q=global_Value:*mas+AND+global_Value:*sef=text=5=2.2=explicit=global_Value

In addition, memory is taking way over 90%, given the heap space set at 5g.






Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-21 Thread Alexandre Rafalovitch
Could you share the field and type definitions and the type of query you
are doing?

Under the covers, multivalued fields are mostly the same as multi-term
strings, just with large gaps between term positions.

And if you return the same set of fields, rehydration of the document
should be the same as well.

Regards,
   Alex
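The "large gaps" Alexandre mentions come from the field type's
positionIncrementGap attribute. A hedged sketch (field and type names are
illustrative): with a gap of 100, the first token of each subsequent
value starts 100+ positions after the last token of the previous value,
which is what keeps phrase queries from matching across values.

```xml
<!-- For values ["quick fox", "lazy dog"]: "quick"=0, "fox"=1, then the
     gap pushes "lazy" to position 102 and "dog" to 103 -->
<fieldType name="text_multi" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="tags" type="text_multi" indexed="true" stored="true"
       multiValued="true"/>
```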

On 22 Sep 2016 8:30 AM, "Stan Lee" <sleed...@gmail.com> wrote:

> I did 3 sets of query as followed:
> - multi-value field only : slow
> -  single field value: fast
> - multi-value and single field combine: slow
>
> So yes, the difference is base on which field you search against. I'm
> experiencing the same issue described here:
>
> http://stackoverflow.com/questions/29745135/performance-issue-with-
> multivalued-field-in-lucene
>
> This individual ended up using elasticsearch which doesn't help me. I'm
> wondering if multivalue fields cannot exceed certain terms? I only have 54
> to 60 terms.
>
>
>   Original Message
> From: arafa...@gmail.com
> Sent: September 21, 2016 7:40 PM
> To: solr-user@lucene.apache.org
> Reply-to: solr-user@lucene.apache.org
> Subject: Re: Performance Issue when querying Multivalued fields [SOLR
> 6.1.0]
>
> Do you _return_ the same set of fields in both queries? Is the difference
> truly just which field you search against?
>
> Regards,
> Alex
>
> On 22 Sep 2016 3:03 AM, "slee" <sleed...@gmail.com> wrote:
>
> > I've been doing a lot of reading on this forum with regards to
> performance
> > on
> > multivalued fields, and nothing helps. When I query on singlie fields,
> the
> > response time is fairly quick (typically < 1sec). However, when I query
> on
> > multivalued fields, the response is > 2 mins ~ 3 mins.
> >
> > Here's my current environment:
> > CPU: Intel Xeon E5-2637 v3 @ 3.5Ghz
> > RAM: 16GB
> > OS: Windows 7 64 Bit
> > HD Controller: SCSI
> >
> > SOLR Documents: 17 million.
> > Average # of terms in a multivalued fields: 54~60
> > Schema: Multivalue field has indexed="true"
> >
> > I've set both my XMS and XMX to 5g, using -m 5g option. Another thing I
> > realized is, every time I query on the multivalued, the memory
> consumptions
> > takes up over 90%. Could this also be the cause of the issue? I have
> tried
> > MMapDirectoryFactory, the results seems to be the same (vs the default
> > NRTCachingDirectoryFactory).
> >
> > Please help. Any advise would be appreciated.
> > Thanks.
> >
> >
> >
> >
> > --
> > View this message in context: http://lucene.472066.n3.
> > nabble.com/Performance-Issue-when-querying-Multivalued-
> > fields-SOLR-6-1-0-tp4297255.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-21 Thread Stan Lee
I did 3 sets of queries, as follows:
- multivalue field only: slow
- single field value: fast
- multivalue and single field combined: slow

So yes, the difference is based on which field you search against. I'm
experiencing the same issue described here:

http://stackoverflow.com/questions/29745135/performance-issue-with-multivalued-field-in-lucene

This individual ended up using Elasticsearch, which doesn't help me. I'm
wondering if multivalued fields cannot exceed a certain number of terms?
I only have 54 to
60 terms.


  Original Message  
From: arafa...@gmail.com
Sent: September 21, 2016 7:40 PM
To: solr-user@lucene.apache.org
Reply-to: solr-user@lucene.apache.org
Subject: Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

Do you _return_ the same set of fields in both queries? Is the difference
truly just which field you search against?

Regards,
    Alex

On 22 Sep 2016 3:03 AM, "slee" <sleed...@gmail.com> wrote:

> I've been doing a lot of reading on this forum with regards to performance
> on
> multivalued fields, and nothing helps. When I query on singlie fields, the
> response time is fairly quick (typically < 1sec). However, when I query on
> multivalued fields, the response is > 2 mins ~ 3 mins.
>
> Here's my current environment:
> CPU: Intel Xeon E5-2637 v3 @ 3.5Ghz
> RAM: 16GB
> OS: Windows 7 64 Bit
> HD Controller: SCSI
>
> SOLR Documents: 17 million.
> Average # of terms in a multivalued fields: 54~60
> Schema: Multivalue field has indexed="true"
>
> I've set both my XMS and XMX to 5g, using -m 5g option. Another thing I
> realized is, every time I query on the multivalued, the memory consumptions
> takes up over 90%. Could this also be the cause of the issue? I have tried
> MMapDirectoryFactory, the results seems to be the same (vs the default
> NRTCachingDirectoryFactory).
>
> Please help. Any advise would be appreciated.
> Thanks.
>
>
>
>
>


Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-21 Thread Alexandre Rafalovitch
Do you _return_ the same set of fields in both queries? Is the difference
truly just which field you search against?

Regards,
Alex

On 22 Sep 2016 3:03 AM, "slee" <sleed...@gmail.com> wrote:

> I've been doing a lot of reading on this forum with regards to performance
> on
> multivalued fields, and nothing helps. When I query on singlie fields, the
> response time is fairly quick (typically < 1sec). However, when I query on
> multivalued fields, the response is > 2 mins ~ 3 mins.
>
> Here's my current environment:
> CPU: Intel Xeon E5-2637 v3 @ 3.5Ghz
> RAM: 16GB
> OS: Windows 7 64 Bit
> HD Controller: SCSI
>
> SOLR Documents: 17 million.
> Average # of terms in a multivalued fields: 54~60
> Schema: Multivalue field has indexed="true"
>
> I've set both my XMS and XMX to 5g, using -m 5g option. Another thing I
> realized is, every time I query on the multivalued, the memory consumptions
> takes up over 90%. Could this also be the cause of the issue? I have tried
> MMapDirectoryFactory, the results seems to be the same (vs the default
> NRTCachingDirectoryFactory).
>
> Please help. Any advise would be appreciated.
> Thanks.
>
>
>
>
>


Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-21 Thread slee
I've been doing a lot of reading on this forum with regard to performance
on multivalued fields, and nothing helps. When I query on single fields,
the response time is fairly quick (typically < 1 sec). However, when I
query on multivalued fields, the response takes 2 to 3 minutes.

Here's my current environment:
CPU: Intel Xeon E5-2637 v3 @ 3.5GHz
RAM: 16GB
OS: Windows 7 64-bit
HD Controller: SCSI

SOLR documents: 17 million
Average # of terms in a multivalued field: 54~60
Schema: the multivalued field has indexed="true"

I've set both my Xms and Xmx to 5g, using the -m 5g option. Another thing
I realized is that every time I query on the multivalued field, memory
consumption goes over 90%. Could this also be the cause of the issue? I
have tried MMapDirectoryFactory; the results seem to be the same (vs. the
default NRTCachingDirectoryFactory).

Please help. Any advice would be appreciated.
Thanks.






Re: solr performance issue

2016-02-09 Thread Zheng Lin Edwin Yeo
1 million documents isn't considered big for Solr. How much RAM does your
machine have?

Regards,
Edwin

On 8 February 2016 at 23:45, Susheel Kumar  wrote:

> 1 million document shouldn't have any issues at all.  Something else is
> wrong with your hw/system configuration.
>
> Thanks,
> Susheel
>
> On Mon, Feb 8, 2016 at 6:45 AM, sara hajili  wrote:
>
> > On Mon, Feb 8, 2016 at 3:04 AM, sara hajili 
> wrote:
> >
> > > sorry i made a mistake i have a bout 1000 K doc.
> > > i mean about 100 doc.
> > >
> > > On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
> > > emir.arnauto...@sematext.com> wrote:
> > >
> > >> Hi Sara,
> > >> Not sure if I am reading this right, but I read it as you have 1000
> doc
> > >> index and issues? Can you tell us bit more about your setup: number of
> > >> servers, hw, index size, number of shards, queries that you run, do
> you
> > >> index at the same time...
> > >>
> > >> It seems to me that you are running Solr on server with limited RAM
> and
> > >> probably small heap. Swapping for sure will slow things down and GC is
> > most
> > >> likely reason for high CPU.
> > >>
> > >> You can use http://sematext.com/spm to collect Solr and host metrics
> > and
> > >> see where the issue is.
> > >>
> > >> Thanks,
> > >> Emir
> > >>
> > >> --
> > >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > >> Solr & Elasticsearch Support * http://sematext.com/
> > >>
> > >>
> > >>
> > >> On 08.02.2016 10:27, sara hajili wrote:
> > >>
> > >>> hi all.
> > >>> i have a problem with my solr performance and usage hardware like a
> > >>> ram,cup...
> > >>> i have a lot of document and so indexed file about 1000 doc in solr
> > that
> > >>> every doc has about 8 field in average.
> > >>> and each field has about 60 char.
> > >>> i set my field as a storedfield = "false" except of  1 field. // i
> read
> > >>> that this help performance.
> > >>> i used copy field and dynamic field if it was necessary . // i read
> > that
> > >>> this help performance.
> > >>> and now my question is that when i run a lot of query on solr i faced
> > >>> with
> > >>> a problem solr use more cpu and ram and after that filled ,it use a
> lot
> > >>>   swapped storage and then use hard,but doesn't create a system file!
> > >>> solr
> > >>> fill hard until i forced to restart server to release hard disk.
> > >>> and now my question is why solr treat in this way? and how i can
> avoid
> > >>> solr
> > >>> to use huge cpu space?
> > >>> any config need?!
> > >>>
> > >>>
> > >>
> > >
> >
>


Re: solr performance issue

2016-02-08 Thread Susheel Kumar
1 million documents shouldn't have any issues at all.  Something else is
wrong with your hw/system configuration.

Thanks,
Susheel

On Mon, Feb 8, 2016 at 6:45 AM, sara hajili  wrote:

> On Mon, Feb 8, 2016 at 3:04 AM, sara hajili  wrote:
>
> > sorry i made a mistake i have a bout 1000 K doc.
> > i mean about 100 doc.
> >
> > On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
> > emir.arnauto...@sematext.com> wrote:
> >
> >> Hi Sara,
> >> Not sure if I am reading this right, but I read it as you have 1000 doc
> >> index and issues? Can you tell us bit more about your setup: number of
> >> servers, hw, index size, number of shards, queries that you run, do you
> >> index at the same time...
> >>
> >> It seems to me that you are running Solr on server with limited RAM and
> >> probably small heap. Swapping for sure will slow things down and GC is
> most
> >> likely reason for high CPU.
> >>
> >> You can use http://sematext.com/spm to collect Solr and host metrics
> and
> >> see where the issue is.
> >>
> >> Thanks,
> >> Emir
> >>
> >> --
> >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> >> Solr & Elasticsearch Support * http://sematext.com/
> >>
> >>
> >>
> >> On 08.02.2016 10:27, sara hajili wrote:
> >>
> >>> hi all.
> >>> i have a problem with my solr performance and usage hardware like a
> >>> ram,cup...
> >>> i have a lot of document and so indexed file about 1000 doc in solr
> that
> >>> every doc has about 8 field in average.
> >>> and each field has about 60 char.
> >>> i set my field as a storedfield = "false" except of  1 field. // i read
> >>> that this help performance.
> >>> i used copy field and dynamic field if it was necessary . // i read
> that
> >>> this help performance.
> >>> and now my question is that when i run a lot of query on solr i faced
> >>> with
> >>> a problem solr use more cpu and ram and after that filled ,it use a lot
> >>>   swapped storage and then use hard,but doesn't create a system file!
> >>> solr
> >>> fill hard until i forced to restart server to release hard disk.
> >>> and now my question is why solr treat in this way? and how i can avoid
> >>> solr
> >>> to use huge cpu space?
> >>> any config need?!
> >>>
> >>>
> >>
> >
>


solr performance issue

2016-02-08 Thread sara hajili
hi all.
I have a problem with my Solr performance and hardware usage (RAM, CPU,
etc.). I have a lot of documents indexed in Solr, about 1000 docs, and
every doc has about 8 fields on average, each field about 60 characters.
I set my fields to stored="false" except for 1 field. // I read that this
helps performance.
I used copyField and dynamicField only where necessary. // I read that
this helps performance.
Now my question: when I run a lot of queries against Solr, it uses more
and more CPU and RAM, and after RAM fills up it uses a lot of swap space
and then the hard disk, but doesn't create a system file! Solr fills the
hard disk until I am forced to restart the server to release it.
Why does Solr behave this way, and how can I prevent it from using so
much CPU?
Is any configuration needed?


Re: solr performance issue

2016-02-08 Thread Emir Arnautovic

Hi Sara,
That is still considered a small index. Can you give us a bit more detail
about your setup?


Thanks,
Emir

On 08.02.2016 12:04, sara hajili wrote:

sorry i made a mistake i have a bout 1000 K doc.
i mean about 100 doc.

On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:


Hi Sara,
Not sure if I am reading this right, but I read it as you have 1000 doc
index and issues? Can you tell us bit more about your setup: number of
servers, hw, index size, number of shards, queries that you run, do you
index at the same time...

It seems to me that you are running Solr on server with limited RAM and
probably small heap. Swapping for sure will slow things down and GC is most
likely reason for high CPU.

You can use http://sematext.com/spm to collect Solr and host metrics and
see where the issue is.

Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



On 08.02.2016 10:27, sara hajili wrote:


hi all.
i have a problem with my solr performance and usage hardware like a
ram,cup...
i have a lot of document and so indexed file about 1000 doc in solr that
every doc has about 8 field in average.
and each field has about 60 char.
i set my field as a storedfield = "false" except of  1 field. // i read
that this help performance.
i used copy field and dynamic field if it was necessary . // i read that
this help performance.
and now my question is that when i run a lot of query on solr i faced with
a problem solr use more cpu and ram and after that filled ,it use a lot
   swapped storage and then use hard,but doesn't create a system file! solr
fill hard until i forced to restart server to release hard disk.
and now my question is why solr treat in this way? and how i can avoid
solr
to use huge cpu space?
any config need?!




--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: solr performance issue

2016-02-08 Thread Emir Arnautovic

Hi Sara,
Not sure if I am reading this right, but I read it as: you have a 1000-doc
index and issues? Can you tell us a bit more about your setup: number of
servers, hardware, index size, number of shards, queries that you run,
and whether you index at the same time...

It seems to me that you are running Solr on a server with limited RAM and
probably a small heap. Swapping will for sure slow things down, and GC is
most likely the reason for the high CPU.

You can use http://sematext.com/spm to collect Solr and host metrics and
see where the issue is.


Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On 08.02.2016 10:27, sara hajili wrote:

hi all.
i have a problem with my solr performance and usage hardware like a
ram,cup...
i have a lot of document and so indexed file about 1000 doc in solr that
every doc has about 8 field in average.
and each field has about 60 char.
i set my field as a storedfield = "false" except of  1 field. // i read
that this help performance.
i used copy field and dynamic field if it was necessary . // i read that
this help performance.
and now my question is that when i run a lot of query on solr i faced with
a problem solr use more cpu and ram and after that filled ,it use a lot
  swapped storage and then use hard,but doesn't create a system file! solr
fill hard until i forced to restart server to release hard disk.
and now my question is why solr treat in this way? and how i can avoid solr
to use huge cpu space?
any config need?!





Re: solr performance issue

2016-02-08 Thread sara hajili
Sorry, I made a mistake: I have about 1000K docs.
I mean about 100 doc.

On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Sara,
> Not sure if I am reading this right, but I read it as you have 1000 doc
> index and issues? Can you tell us bit more about your setup: number of
> servers, hw, index size, number of shards, queries that you run, do you
> index at the same time...
>
> It seems to me that you are running Solr on server with limited RAM and
> probably small heap. Swapping for sure will slow things down and GC is most
> likely reason for high CPU.
>
> You can use http://sematext.com/spm to collect Solr and host metrics and
> see where the issue is.
>
> Thanks,
> Emir
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On 08.02.2016 10:27, sara hajili wrote:
>
>> hi all.
>> i have a problem with my solr performance and usage hardware like a
>> ram,cup...
>> i have a lot of document and so indexed file about 1000 doc in solr that
>> every doc has about 8 field in average.
>> and each field has about 60 char.
>> i set my field as a storedfield = "false" except of  1 field. // i read
>> that this help performance.
>> i used copy field and dynamic field if it was necessary . // i read that
>> this help performance.
>> and now my question is that when i run a lot of query on solr i faced with
>> a problem solr use more cpu and ram and after that filled ,it use a lot
>>   swapped storage and then use hard,but doesn't create a system file! solr
>> fill hard until i forced to restart server to release hard disk.
>> and now my question is why solr treat in this way? and how i can avoid
>> solr
>> to use huge cpu space?
>> any config need?!
>>
>>
>


Re: solr performance issue

2016-02-08 Thread sara hajili
On Mon, Feb 8, 2016 at 3:04 AM, sara hajili  wrote:

> sorry i made a mistake i have a bout 1000 K doc.
> i mean about 100 doc.
>
> On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
> emir.arnauto...@sematext.com> wrote:
>
>> Hi Sara,
>> Not sure if I am reading this right, but I read it as you have 1000 doc
>> index and issues? Can you tell us bit more about your setup: number of
>> servers, hw, index size, number of shards, queries that you run, do you
>> index at the same time...
>>
>> It seems to me that you are running Solr on server with limited RAM and
>> probably small heap. Swapping for sure will slow things down and GC is most
>> likely reason for high CPU.
>>
>> You can use http://sematext.com/spm to collect Solr and host metrics and
>> see where the issue is.
>>
>> Thanks,
>> Emir
>>
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
>>
>> On 08.02.2016 10:27, sara hajili wrote:
>>
>>> hi all.
>>> i have a problem with my solr performance and usage hardware like a
>>> ram,cup...
>>> i have a lot of document and so indexed file about 1000 doc in solr that
>>> every doc has about 8 field in average.
>>> and each field has about 60 char.
>>> i set my field as a storedfield = "false" except of  1 field. // i read
>>> that this help performance.
>>> i used copy field and dynamic field if it was necessary . // i read that
>>> this help performance.
>>> and now my question is that when i run a lot of query on solr i faced
>>> with
>>> a problem solr use more cpu and ram and after that filled ,it use a lot
>>>   swapped storage and then use hard,but doesn't create a system file!
>>> solr
>>> fill hard until i forced to restart server to release hard disk.
>>> and now my question is why solr treat in this way? and how i can avoid
>>> solr
>>> to use huge cpu space?
>>> any config need?!
>>>
>>>
>>
>


Re: Jetty Vs Tomcat (Performance issue)

2015-11-16 Thread Upayavira
Just to be sure, are you installing Solr inside a different Jetty, or
using the Jetty that comes with Solr?

You would be expected to use the one installed and managed by Solr.

Upayavira

On Mon, Nov 16, 2015, at 11:58 AM, Behzad Qureshi wrote:
> Hi All,
> 
> I am using Tomcat server with solr 4.10.3. I want to shift to Jetty as
> replacement of Tomcat server but I am not getting any good results with
> respect to performance. I have tried solr 4.10.3 on both Jetty 8 and
> Jetty
> 9 with java 8. Below are configurations I have used.
> 
> Can anyone please tell me if I am missing anything?
> 
> *Jetty:*
> 
> Xms: 5GB
> Xmx: 50GB
> Xss: 256MB
> 
> 
> *Tomcat:*
> 
> Xms: 5GB
> Xmx: 50GB
> Xss: Default
> 
> 
> *Index Size:*
> 
> 1TB (20 cores)
> 
> 
> 
> -- 
> 
> Regards,
> 
> Behzad Qureshi


Re: Jetty Vs Tomcat (Performance issue)

2015-11-16 Thread Ishan Chattopadhyaya
Also, what are the specific performance issues you are observing?

On Mon, Nov 16, 2015 at 6:41 PM, Upayavira  wrote:

> Just to be sure, are you installing Solr inside a different Jetty, or
> using the Jetty that comes with Solr?
>
> You would be expected to use the one installed and managed by Solr.
>
> Upayavira
>
> On Mon, Nov 16, 2015, at 11:58 AM, Behzad Qureshi wrote:
> > Hi All,
> >
> > I am using Tomcat server with solr 4.10.3. I want to shift to Jetty as
> > replacement of Tomcat server but I am not getting any good results with
> > respect to performance. I have tried solr 4.10.3 on both Jetty 8 and
> > Jetty
> > 9 with java 8. Below are configurations I have used.
> >
> > Can anyone please tell me if I am missing anything?
> >
> > *Jetty:*
> >
> > Xms: 5GB
> > Xmx: 50GB
> > Xss: 256MB
> >
> >
> > *Tomcat:*
> >
> > Xms: 5GB
> > Xmx: 50GB
> > Xss: Default
> >
> >
> > *Index Size:*
> >
> > 1TB (20 cores)
> >
> >
> >
> > --
> >
> > Regards,
> >
> > Behzad Qureshi
>


Re: Jetty Vs Tomcat (Performance issue)

2015-11-16 Thread Timothy Potter
I hope 256MB of Xss is a typo and you really meant 256k right?


On Mon, Nov 16, 2015 at 4:58 AM, Behzad Qureshi
 wrote:
> Hi All,
>
> I am using Tomcat server with solr 4.10.3. I want to shift to Jetty as
> replacement of Tomcat server but I am not getting any good results with
> respect to performance. I have tried solr 4.10.3 on both Jetty 8 and Jetty
> 9 with java 8. Below are configurations I have used.
>
> Can anyone please tell me if I am missing anything?
>
> *Jetty:*
>
> Xms: 5GB
> Xmx: 50GB
> Xss: 256MB
>
>
> *Tomcat:*
>
> Xms: 5GB
> Xmx: 50GB
> Xss: Default
>
>
> *Index Size:*
>
> 1TB (20 cores)
>
>
>
> --
>
> Regards,
>
> Behzad Qureshi
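For scale: -Xss sets the stack size reserved for every JVM thread, so
256MB would reserve that much per request thread and can exhaust address
space quickly; 256k is a typical value. A hedged sketch of the invocation
for Solr 4.x's bundled Jetty (path and the 5GB/50GB heap values mirror
the configuration quoted above; they are not a recommendation):

```shell
# Start the Jetty that ships with Solr 4.x from the example/ directory.
# Note the stack size in kilobytes (256k), not megabytes (256m).
cd solr-4.10.3/example
java -Xms5g -Xmx50g -Xss256k -jar start.jar
```

A 50GB heap also deserves scrutiny on its own: very large heaps lengthen
GC pauses, which would show up directly in QTime.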


Jetty Vs Tomcat (Performance issue)

2015-11-16 Thread Behzad Qureshi
Hi All,

I am using a Tomcat server with Solr 4.10.3. I want to shift to Jetty as
a replacement for Tomcat, but I am not getting good results with respect
to performance. I have tried Solr 4.10.3 on both Jetty 8 and Jetty 9 with
Java 8. Below are the configurations I have used.

Can anyone please tell me if I am missing anything?

*Jetty:*

Xms: 5GB
Xmx: 50GB
Xss: 256MB


*Tomcat:*

Xms: 5GB
Xmx: 50GB
Xss: Default


*Index Size:*

1TB (20 cores)



-- 

Regards,

Behzad Qureshi


Re: Jetty Vs Tomcat (Performance issue)

2015-11-16 Thread Behzad Qureshi
Upayavira:: Just to be sure, are you installing Solr inside a different
Jetty, or using the Jetty that comes with Solr?
*Behzad:: *Jetty that comes with solr.

Jetty-8.1.10.v20130312
Embedded Solr 4.10.3

I also used Jetty 9, not embedded. I tried Solr 4.10.3 with Jetty 9 but
am still facing the same issue.



Ishan:: Also, what are the specific performance issues you are observing?
*Behzad:: *Elapsed time (QTime) of SOLR with Jetty is more than Elapsed
time of Tomcat.



Timothy:: I hope 256MB of Xss is a typo and you really meant 256k right?
*Behzad:: *Right. My bad.

On Tue, Nov 17, 2015 at 3:17 AM, Timothy Potter 
wrote:

> I hope 256MB of Xss is a typo and you really meant 256k right?
>
>
> On Mon, Nov 16, 2015 at 4:58 AM, Behzad Qureshi
>  wrote:
> > Hi All,
> >
> > I am using Tomcat server with solr 4.10.3. I want to shift to Jetty as
> > replacement of Tomcat server but I am not getting any good results with
> > respect to performance. I have tried solr 4.10.3 on both Jetty 8 and
> Jetty
> > 9 with java 8. Below are configurations I have used.
> >
> > Can anyone please tell me if I am missing anything?
> >
> > *Jetty:*
> >
> > Xms: 5GB
> > Xmx: 50GB
> > Xss: 256MB
> >
> >
> > *Tomcat:*
> >
> > Xms: 5GB
> > Xmx: 50GB
> > Xss: Default
> >
> >
> > *Index Size:*
> >
> > 1TB (20 cores)
> >
> >
> >
> > --
> >
> > Regards,
> >
> > Behzad Qureshi
>



-- 

Regards,

Behzad Qureshi

Senior Software Engineer | NorthBay Solutions (Pvt.) Ltd

410-G4, Johar Town, Lahore, Pakistan
Ph: +92 42 35290152-56

Skype ID: behzadqureshi.nbs


RE: Performance issue with FILTER QUERY

2015-08-20 Thread Maulin Rathod
Thanks Erick. Even a 1-second commit interval is fine for us, but in that
case the filter cache will still be flushed every second. The end user will
still feel the slowness, since the query takes around 1 second when we use
a filter query.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 20 August 2015 00:44
To: solr-user@lucene.apache.org
Subject: Re: Performance issue with FILTER QUERY

If you're committing that rapidly then you're correct, filter caching may not 
be a good fit. The entire _point_ of filter caching is to increase performance 
of subsequent executions of the exact same fq clause. But if you're throwing 
them away every second there's little/no benefit.

You really have two choices here
1 lengthen out the commit interval. Frankly, 1 second commit
intervals are rarely necessary despite what
 your product manager says. Really, check this requirement out.
2 disable caches.

Autowarming is potentially useful here, but if your filter queries are taking 
on the order of a second and you're committing every second then autowarming 
takes too long to help.

Best,
Erick

On Wed, Aug 19, 2015 at 12:26 AM, Mikhail Khludnev mkhlud...@griddynamics.com 
wrote:
 Maulin,
 Did you check performance with segmented filters which I advised recently?

 On Wed, Aug 19, 2015 at 10:24 AM, Maulin Rathod mrat...@asite.com wrote:

 As per my understanding caches are flushed every time when add new 
 document to collection (we do soft commit at every 1 sec to make 
 newly added document available for search). Due to which it is not 
 effectively uses cache and hence it slow every time in our case.

 -Original Message-
 From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
 Sent: 19 August 2015 12:16
 To: solr-user@lucene.apache.org
 Subject: Re: Performance issue with FILTER QUERY

 On Wed, 2015-08-19 at 05:55 +, Maulin Rathod wrote:
  SLOW WITH FILTER QUERY (takes more than 1 second) 
  
 
  q=+recipient_id:(4042) AND project_id:(332) AND 
  resource_id:(13332247
  13332245 13332243 13332241 13332239) AND entity_type:(2) AND
  -action_id:(20 32) == This returns 5 records
  fq=+action_status:(0) AND is_active:(true) == This Filter Query 
  returns 9432252 records

 The fq is evaluated independently of the q: For the fq a bitset is 
 allocated, filled and stored in cache. Then the q is evaluated and 
 the two bitsets are merged.

 Next time you use the same fq, it should be cached (if you have 
 caching
 enabled) and be a lot faster.


 Also, if you ran your two tests right after each other, the second 
 one benefits from disk caching. If you had executed them in reverse 
 order, the
 q+fq might have been the fastest one.

 - Toke Eskildsen, State and University Library, Denmark





 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
 mkhlud...@griddynamics.com
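Toke's description of fq evaluation above can be reduced to a toy model:
the filter clause is evaluated once over the whole index into a doc-id
set, cached under the clause text, and subsequent requests just intersect
that cached set with the q result. Everything below is illustrative
(names, predicate, doc-id sets), not Solr's actual implementation, which
uses bitsets over Lucene doc ids.

```python
# Toy model of Solr's filterCache.
filter_cache: dict[str, frozenset[int]] = {}

def search(q_matches: set[int], fq: str, fq_predicate, all_doc_ids: range) -> set[int]:
    if fq not in filter_cache:
        # Cache miss: one full pass over the index to build the filter set.
        filter_cache[fq] = frozenset(d for d in all_doc_ids if fq_predicate(d))
    # Cache hit path is a cheap set intersection with the q result.
    return q_matches & filter_cache[fq]

docs = range(10)
active = lambda d: d % 2 == 0  # stand-in for action_status:(0) AND is_active:(true)
print(search({1, 2, 3, 4}, "active", active, docs))  # {2, 4}
```

This also shows why a soft commit every second hurts: flushing
filter_cache each second forces the expensive full pass on nearly every
request.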


Re: Performance issue with FILTER QUERY

2015-08-19 Thread Erick Erickson
If you're committing that rapidly then you're correct, filter caching may
not be a good fit. The entire _point_ of filter caching is to increase
performance of subsequent executions of the exact same fq clause. But if
you're throwing them away every second there's little/no benefit.

You really have two choices here:
1> lengthen out the commit interval. Frankly, 1-second commit intervals
   are rarely necessary despite what your product manager says. Really,
   check this requirement out.
2> disable caches.

Autowarming is potentially useful here, but if your filter queries are
taking on the order of a second and you're committing every second then
autowarming takes too long to help.

Best,
Erick
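Both of Erick's options can be expressed in configuration. A hedged
sketch (the interval is illustrative): option 1 is the autoSoftCommit
setting in solrconfig.xml.

```xml
<!-- Option 1: lengthen the soft-commit interval, e.g. 60 s instead of 1 s -->
<autoSoftCommit>
  <maxTime>60000</maxTime>
</autoSoftCommit>
```

Option 2 can also be applied per request rather than globally, by marking
the clause uncacheable with a local param, e.g.
fq={!cache=false}action_status:(0) AND is_active:(true), which skips the
filterCache entirely for that clause.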

On Wed, Aug 19, 2015 at 12:26 AM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
 Maulin,
 Did you check performance with segmented filters which I advised recently?

 On Wed, Aug 19, 2015 at 10:24 AM, Maulin Rathod mrat...@asite.com wrote:

 As per my understanding caches are flushed every time when add new
 document to collection (we do soft commit at every 1 sec to make newly
 added document available for search). Due to which it is not effectively
 uses cache and hence it slow every time in our case.

 -Original Message-
 From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
 Sent: 19 August 2015 12:16
 To: solr-user@lucene.apache.org
 Subject: Re: Performance issue with FILTER QUERY

 On Wed, 2015-08-19 at 05:55 +, Maulin Rathod wrote:
  SLOW WITH FILTER QUERY (takes more than 1 second)
  
 
  q=+recipient_id:(4042) AND project_id:(332) AND resource_id:(13332247
  13332245 13332243 13332241 13332239) AND entity_type:(2) AND
  -action_id:(20 32) == This returns 5 records
  fq=+action_status:(0) AND is_active:(true) == This Filter Query
  returns 9432252 records

 The fq is evaluated independently of the q: For the fq a bitset is
 allocated, filled and stored in cache. Then the q is evaluated and the two
 bitsets are merged.

 Next time you use the same fq, it should be cached (if you have caching
 enabled) and be a lot faster.


 Also, if you ran your two tests right after each other, the second one
 benefits from disk caching. If you had executed them in reverse order, the
 q+fq might have been the fastest one.

 - Toke Eskildsen, State and University Library, Denmark





 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
 mkhlud...@griddynamics.com

