Re: FunctionQueries and FieldCache and OOM

2011-03-16 Thread Markus Jelsma
Hi,


> FWIW: it sounds like your problem wasn't actually related to your
> fieldCache, but probably instead it was because of how big your
> queryResultCache is

It's the same cluster as in the other thread. I decided a long time ago that 
documentCache and queryResultCache wouldn't be a good idea because of the 
extreme volume of queries, all hitting very different parts of the index. Hit 
ratios were extremely low even with large cache sizes (and with an 
appropriately high JVM heap). The commit rate is also high, so autowarming a 
cache big enough to reach a high hit ratio would take _very_ long.
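For reference, a minimal sketch of what that looks like in solrconfig.xml 
(the sizes are illustrative, not my actual values; omitting the documentCache 
and queryResultCache elements entirely also disables them):

    <query>
      <!-- filterCache kept: our fq usage is repetitive enough to benefit -->
      <filterCache class="solr.FastLRUCache" size="512"
                   initialSize="512" autowarmCount="128"/>
      <!-- documentCache and queryResultCache effectively disabled -->
      <documentCache class="solr.LRUCache" size="0" autowarmCount="0"/>
      <queryResultCache class="solr.LRUCache" size="0" autowarmCount="0"/>
    </query>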

> 
> : > > Am i correct when i assume that Lucene FieldCache entries are added
> : > > for each unique function query?  In that case, every query is a
> : > > unique cache
> 
> ...no, the FieldCache has one entry per field name, and the value of that
> cache is an "array" keyed off of the internal docId for every doc in the
> index, and the corresponding value (it's an uninverted version of Lucene's
> inverted index for doing fast value lookups by document)
> 
> changes in the *values* used in your function queries won't affect
> FieldCache usage -- only changing the *fields* used in your functions
> would impact that.

Thanks for bringing additional clarity :)

> 
> : > > each unique function query?  In that case, every query is a unique
> : > > cache entry because it operates on milliseconds. If all else fails
> : > > i might be
> 
> what you describe is correct, but not in the FieldCache -- the
> queryResultCache is where queries that deal with the main result set (ie:
> paginated and/or sorted) wind up ... having lots of distinct queries in
> the "bq" (or "q") param will make the number of unique items in that cache
> grow significantly (just like having lots of distinct queries in the "fq"
> will cause your filterCache to grow significantly)
> 
> you should definitely check out what max size you have configured for your
> queryResultCache ... it sounds like it's probably too big, if you were
> getting OOM errors from having high precision dates in your boost queries.
> while i think using less precision is a wise choice, you should still
> consider dialing that max size down, so that if some other usage pattern
> still causes lots of unique queries in a short time period (a bot crawling
> your site map perhaps) it doesn't fill up and cause another OOM

That's one of the reasons i chose to disable the queryResultCache. Can you 
come up with a new explanation, knowing that i only use the filterCache and 
Lucene's fieldValueCache and fieldCache?

Thanks!

> 
> 
> 
> -Hoss


Re: FunctionQueries and FieldCache and OOM

2011-03-16 Thread Chris Hostetter

: Alright, i can now confirm the issue has been resolved by reducing precision. 
: The garbage collector on nodes without reduced precision has a really hard time 
: keeping up and clearly shows a very different graph of heap consumption.
: 
: Consider using MINUTE, HOUR or DAY as precision in case you suffer from 
: excessive memory consumption:
: 
: recip(ms(NOW/,),,1,1)

FWIW: it sounds like your problem wasn't actually related to your 
fieldCache, but probably instead it was because of how big your 
queryResultCache is

: > > Am i correct when i assume that Lucene FieldCache entries are added for
: > > each unique function query?  In that case, every query is a unique cache

...no, the FieldCache has one entry per field name, and the value of that 
cache is an "array" keyed off of the internal docId for every doc in the 
index, and the corresponding value (it's an uninverted version of Lucene's 
inverted index for doing fast value lookups by document)

changes in the *values* used in your function queries won't affect 
FieldCache usage -- only changing the *fields* used in your functions 
would impact that.
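To make that concrete, here is a rough sketch against the Lucene 2.9/3.x 
FieldCache API (the field name "created" is a made-up example):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;

    public class FieldCacheSketch {
      // One uninverted array per (reader, field) pair, indexed by internal
      // docId. Calling this again with the same reader and field returns the
      // same cached array -- no matter what values your function query then
      // computes from it.
      static long[] uninvert(IndexReader reader) throws java.io.IOException {
        return FieldCache.DEFAULT.getLongs(reader, "created");
      }
    }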

: > > each unique function query?  In that case, every query is a unique cache
: > > entry because it operates on milliseconds. If all else fails i might be

what you describe is correct, but not in the FieldCache -- the 
queryResultCache is where queries that deal with the main result set (ie: 
paginated and/or sorted) wind up ... having lots of distinct queries in 
the "bq" (or "q") param will make the number of unique items in that cache 
grow significantly (just like having lots of distinct queries in the "fq" 
will cause your filterCache to grow significantly)
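For example (hypothetical field name, URL encoding omitted; NOW resolves at 
query parse time): these two requests issued a few hundred milliseconds 
apart produce two distinct queryResultCache entries, because the boost 
function embeds a different millisecond value each time...

    q=solr&defType=dismax&bq={!func}recip(ms(NOW,created),3.16e-11,1,1)
        NOW -> 1299766805123
    q=solr&defType=dismax&bq={!func}recip(ms(NOW,created),3.16e-11,1,1)
        NOW -> 1299766805456  (different key; the earlier entry is never reused)

...while with ms(NOW/HOUR,created) every request in the same hour hashes to 
the same cache key.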

you should definitely check out what max size you have configured for your 
queryResultCache ... it sounds like it's probably too big, if you were 
getting OOM errors from having high precision dates in your boost queries.  
while i think using less precision is a wise choice, you should still 
consider dialing that max size down, so that if some other usage pattern 
still causes lots of unique queries in a short time period (a bot crawling 
your site map perhaps) it doesn't fill up and cause another OOM
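i.e. something along these lines in solrconfig.xml (the numbers are only a 
starting point; tune them against your hit/eviction stats):

    <queryResultCache class="solr.LRUCache" size="512"
                      initialSize="512" autowarmCount="32"/>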



-Hoss


Re: FunctionQueries and FieldCache and OOM

2011-03-10 Thread Markus Jelsma
Alright, i can now confirm the issue has been resolved by reducing precision. 
The garbage collector on nodes without reduced precision has a really hard time 
keeping up and clearly shows a very different graph of heap consumption.

Consider using MINUTE, HOUR or DAY as precision in case you suffer from 
excessive memory consumption:

recip(ms(NOW/,),,1,1)
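For a date field, the full function would look something like the wiki's 
date-boost example (mydatefield is a placeholder; 3.16e-11 is roughly one 
over the number of milliseconds in a year):

    recip(ms(NOW/HOUR,mydatefield),3.16e-11,1,1)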

On Thursday 10 March 2011 15:14:25 Markus Jelsma wrote:
> Well, it's quite hard to debug because the values listed on the stats page
> in the fieldCache section don't make much sense. Reducing precision with
> NOW/HOUR, however, does seem to make a difference.
> 
> It is hard (or impossible) to reproduce this in a test setup with the same
> index but without continuous updates and without stress tests. Firing manual
> queries with different values for the bf parameter doesn't show any
> difference in the values listed on the stats page.
> 
> Does someone care to provide an explanation?
> 
> Thanks
> 
> On Wednesday 09 March 2011 22:21:19 Markus Jelsma wrote:
> > Hi,
> > 
> > In one of the environments i'm working on (4 Solr 1.4.1 nodes with
> > replication, 3+ million docs, ~5.5GB index size, high commit rate
> > (~1-2min), high query rate (~50q/s), high number of updates
> > (~1000docs/commit)) the nodes continuously run out of memory.
> > 
> > During development we frequently ran excessive stress tests and after
> > tuning JVM and Solr settings all ran fine. A while ago i added the DisMax
> > bq parameter for boosting recent documents; documents older than a day
> > receive 50% less boost, similar to the example but with a much steeper
> > slope. For clarity, i'm not using the ordinal function, which the wiki
> > warns against for Solr 1.4.1, but the reciprocal version in the bq
> > parameter.
> > 
> > This week we started the stress tests and nodes are going down again.
> > I've reconfigured the nodes to have different settings for the bq
> > parameter (or no bq parameter).
> > 
> > It seems the bq is the cause of the misery.
> > 
> > Issue SOLR- keeps popping up but it has not been resolved. Is there
> > anyone who can confirm one of those patches fixes this issue before i
> > waste hours of work finding out it doesn't? ;)
> > 
> > Am i correct when i assume that Lucene FieldCache entries are added for
> > each unique function query? In that case, every query is a unique cache
> > entry because it operates on milliseconds. If all else fails i might be
> > able to reduce precision by operating on minutes or even more instead of
> > milliseconds. I, however, cannot use other nice math functions in the
> > ms() parameter so that might make things difficult.
> > 
> > However, date math seems available (NOW/HOUR) so i assume it would also
> > work for /HOUR. This way i just might prevent
> > useless entries.
> > 
> > My apologies for this long mail but it may prove useful for other users
> > and hopefully we find the solution and can update the wiki to add this
> > warning.
> > 
> > Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: FunctionQueries and FieldCache and OOM

2011-03-10 Thread Markus Jelsma
Well, it's quite hard to debug because the values listed on the stats page in 
the fieldCache section don't make much sense. Reducing precision with 
NOW/HOUR, however, does seem to make a difference.
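For reference, the fieldCache section on Solr 1.4's stats page 
(/admin/stats.jsp) shows one entry per uninverted field, roughly like this 
(hostname and field name are made up; the exact formatting may differ):

    http://localhost:8983/solr/admin/stats.jsp

      name: fieldCache
      entries_count: 1
      entry#0: 'created' => long[] keyed by docId
      insanity_count: 0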

It is hard (or impossible) to reproduce this in a test setup with the same 
index but without continuous updates and without stress tests. Firing manual 
queries with different values for the bf parameter doesn't show any difference 
in the values listed on the stats page.

Does someone care to provide an explanation?

Thanks

On Wednesday 09 March 2011 22:21:19 Markus Jelsma wrote:
> Hi,
> 
> In one of the environments i'm working on (4 Solr 1.4.1 nodes with
> replication, 3+ million docs, ~5.5GB index size, high commit rate
> (~1-2min), high query rate (~50q/s), high number of updates
> (~1000docs/commit)) the nodes continuously run out of memory.
> 
> During development we frequently ran excessive stress tests and after
> tuning JVM and Solr settings all ran fine. A while ago i added the DisMax
> bq parameter for boosting recent documents; documents older than a day
> receive 50% less boost, similar to the example but with a much steeper
> slope. For clarity, i'm not using the ordinal function, which the wiki
> warns against for Solr 1.4.1, but the reciprocal version in the bq
> parameter.
> 
> This week we started the stress tests and nodes are going down again. I've
> reconfigured the nodes to have different settings for the bq parameter (or
> no bq parameter).
> 
> It seems the bq is the cause of the misery.
> 
> Issue SOLR- keeps popping up but it has not been resolved. Is there
> anyone who can confirm one of those patches fixes this issue before i
> waste hours of work finding out it doesn't? ;)
> 
> Am i correct when i assume that Lucene FieldCache entries are added for
> each unique function query? In that case, every query is a unique cache
> entry because it operates on milliseconds. If all else fails i might be
> able to reduce precision by operating on minutes or even more instead of
> milliseconds. I, however, cannot use other nice math functions in the ms()
> parameter so that might make things difficult.
> 
> However, date math seems available (NOW/HOUR) so i assume it would also
> work for /HOUR. This way i just might prevent
> useless entries.
> 
> My apologies for this long mail but it may prove useful for other users and
> hopefully we find the solution and can update the wiki to add this warning.
> 
> Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


FunctionQueries and FieldCache and OOM

2011-03-09 Thread Markus Jelsma
Hi,

In one of the environments i'm working on (4 Solr 1.4.1 nodes with 
replication, 3+ million docs, ~5.5GB index size, high commit rate (~1-2min), 
high query rate (~50q/s), high number of updates (~1000docs/commit)) the nodes 
continuously run out of memory.

During development we frequently ran excessive stress tests and after tuning 
JVM and Solr settings all ran fine. A while ago i added the DisMax bq parameter 
for boosting recent documents; documents older than a day receive 50% less 
boost, similar to the example but with a much steeper slope. For clarity, i'm 
not using the ordinal function, which the wiki warns against for Solr 1.4.1, 
but the reciprocal version in the bq parameter.

This week we started the stress tests and nodes are going down again. I've 
reconfigured the nodes to have different settings for the bq parameter (or no 
bq parameter).

It seems the bq is the cause of the misery.

Issue SOLR- keeps popping up but it has not been resolved. Is there anyone 
who can confirm one of those patches fixes this issue before i waste hours of 
work finding out it doesn't? ;)

Am i correct when i assume that Lucene FieldCache entries are added for each 
unique function query? In that case, every query is a unique cache entry 
because it operates on milliseconds. If all else fails i might be able to 
reduce precision by operating on minutes or even more instead of 
milliseconds. I, however, cannot use other nice math functions in the ms() 
parameter so that might make things difficult.

However, date math seems available (NOW/HOUR) so i assume it would also work 
for /HOUR. This way i just might prevent useless 
entries.
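To illustrate the idea with made-up (but plausible) timestamps: ms(NOW) 
yields a new value on every request, while ms(NOW/HOUR) is constant for a 
whole hour, so every function query parsed within that hour is identical:

    ms(NOW)       -> 1299766805123, 1299766805456, ...  (unique per request)
    ms(NOW/HOUR)  -> 1299765600000 for the entire hour  (one shared value)
    ms(NOW/DAY)   -> 1299715200000 for the entire day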

My apologies for this long mail but it may prove useful for other users and 
hopefully we find the solution and can update the wiki to add this warning.

Cheers,