RE: Out of memory on sorting

2011-05-26 Thread pravesh
To save memory:

1. Allocate as much memory as you can to the JVM (especially if you are on a
64-bit OS).
2. Set omitNorms=true for your date and id fields (in fact, for all
fields where index-time boosting and length normalization aren't required).
This will require a full reindex.
3. Are you sorting over all the documents in the index? Try to limit the set
using filter queries.
4. Avoid match-all-docs queries like q=*:* (if you are using them).
5. If you can do without sorting on the ID field, sort on a field with
fewer unique terms instead.
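
As a rough illustration of point 2, omitNorms is set per field (or per field type) in schema.xml. This is only a sketch: the field names "id" and "createdOn" are assumptions, not taken from the poster's schema, and the "string" and "date" types are assumed to be defined as in the stock example schema.

```xml
<!-- schema.xml sketch; field names are hypothetical -->
<!-- omitNorms="true" drops the per-document norm data for the field, -->
<!-- which is only needed for index-time boosts and length normalization -->
<field name="id"        type="string" indexed="true" stored="true" omitNorms="true"/>
<field name="createdOn" type="date"   indexed="true" stored="true" omitNorms="true"/>
```

The change only takes effect for documents indexed after it is made, hence the full reindex.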


Hope this helps

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Out-of-memory-on-sorting-tp2960578p2988336.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Out of memory on sorting

2011-05-19 Thread rajini maski
Explicit Warming of Sort Fields

If you do a lot of field-based sorting, it is advantageous to explicitly add
warming queries that sort on those fields to the newSearcher and
firstSearcher event listeners in your solrconfig, so that the FieldCache is
populated before any queries are executed by your users.
firstSearcher:
<lst> <str name="q">solr rocks</str><str name="start">0</str><str name="rows">10</str><str name="sort">empID asc</str></lst>



On Thu, May 19, 2011 at 2:39 PM, Rohit ro...@in-rev.com wrote:

> Hi,
>
> We are moving to a multi-core Solr installation with each of the cores
> having millions of documents; documents would also be added to the index
> on an hourly basis. Everything seems to run fine and I am getting the
> expected results and performance, except where sorting is concerned.
>
> I have an index of 13217121 documents. When I try to get documents
> between two dates and then sort them by ID, Solr goes out of memory. This
> is with just me using the system; we might also have simultaneous users.
> How can I improve this performance?
>
> Rohit




RE: Out of memory on sorting

2011-05-19 Thread Rohit
Thanks for pointing me in the right direction. I now see that in the
configuration for firstSearcher or newSearcher, the <str name="q"> value
needs to be configured in advance. In my case q is ever-changing: users can
search for anything, and the possible queries are unlimited.

How can I make this generic?

-Rohit




Re: Out of memory on sorting

2011-05-19 Thread Erick Erickson
The warming queries warm up the caches used in sorting, so
just including the sort=... parameter will warm the sort caches. The terms
searched are not important. The same is true with facets...
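
A minimal sketch of such a generic warming setup in solrconfig.xml follows. The sort field names "id" and "createdOn" are assumptions standing in for the poster's id and date fields, and the q value is arbitrary, since only the sort parameter matters here.

```xml
<!-- solrconfig.xml sketch (inside the <query> section); field names are hypothetical -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- any q will do; the sort parameter is what populates the sort cache -->
    <lst><str name="q">*:*</str><str name="start">0</str><str name="rows">10</str><str name="sort">id asc</str></lst>
    <lst><str name="q">*:*</str><str name="start">0</str><str name="rows">10</str><str name="sort">createdOn desc</str></lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="start">0</str><str name="rows">10</str><str name="sort">id asc</str></lst>
    <lst><str name="q">*:*</str><str name="start">0</str><str name="rows">10</str><str name="sort">createdOn desc</str></lst>
  </arr>
</listener>
```

Warming only moves the cost of building the sort caches to searcher startup; it does not reduce the memory they need.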

However, I don't understand how that relates to your OOM problems. With
warming queries in place, I'd expect the OOM to start happening on startup,
since you'd be doing the operation that runs you out of memory on startup...

So, we need more details:
1. How is your sort field defined? String? Integer? If it's a string
   and you could change it to a numeric type, you'd use a lot
   less memory.
2. How many distinct terms? I'm guessing one per document. Actually,
   this is somewhat of an anti-pattern in Solr, for all that it's sometimes
   necessary.
3. How much memory are you allocating for the JVM?
4. What other fields are you sorting on, and how many unique values
   are in each? Solr Admin can help you here.

Best
Erick



RE: Out of memory on sorting

2011-05-19 Thread Rohit
Hi Erick,

My OOM problem starts when I query the core with 13217121 documents. My
schema and other details are given below.

1. How is your sort field defined? String? Integer? If it's a string and you
could change it to a numeric type, you'd use a lot less memory.

We primarily use two different sort criteria: one is a date field and the
other is a string (id). I cannot change the id field, as it is also the
uniqueKey for my schema.

2. How many distinct terms? I'm guessing one per document. Actually, this is
somewhat of an anti-pattern in Solr, for all that it's sometimes necessary.

Since one of the fields is a timestamp and the other a unique key,
all values are distinct. (These are tweets collected for a keyword.)

3. How much memory are you allocating for the JVM?

I am starting Solr with the following command:
java -Xms1024M -Xmx2048M -jar start.jar


All our test cases for moving to Solr have passed, so this is proving to be a
big setback. Help would be greatly appreciated.

Regards,
Rohit




Re: Out of memory on sorting

2011-05-19 Thread Erick Erickson
See below:

On Thu, May 19, 2011 at 9:06 AM, Rohit ro...@in-rev.com wrote:
> Hi Erick,
>
> My OOM problem starts when I query the core with 13217121 documents. My
> schema and other details are given below.

Hmm, how many cores are you running and what are they doing? They
all use the same memory pool, so you may be getting some carry-over. One
strategy would be just to move this core to a dedicated machine.


> 1. How is your sort field defined? String? Integer? If it's a string and you
> could change it to a numeric type, you'd use a lot less memory.
>
> We primarily use two different sort criteria: one is a date field and the
> other is a string (id). I cannot change the id field, as it is also the
> uniqueKey for my schema.

OK, but can you use a separate field just for sorting? Populate it with
a copyField and sort on that rather than on ID. This is only helpful if
you can make a compact representation, e.g. an integer.
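
A sketch of what that might look like in schema.xml, under the assumption that every id value is purely numeric and fits in a signed 64-bit long (otherwise indexing the copy would fail). The field name "id_sort" is hypothetical.

```xml
<!-- schema.xml sketch; "id_sort" is a hypothetical field name -->
<!-- precisionStep="0" indexes one term per value, which is enough for sorting -->
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

<!-- the unique key stays a string -->
<field name="id"      type="string" indexed="true" stored="true" required="true"/>
<!-- numeric copy used only for sorting, so it does not need to be stored -->
<field name="id_sort" type="long"   indexed="true" stored="false"/>

<copyField source="id" dest="id_sort"/>
<uniqueKey>id</uniqueKey>
```

Queries would then use sort=id_sort asc instead of sort=id asc. Adding the field requires a reindex so that existing documents get a value.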


> 2. How many distinct terms? I'm guessing one per document. Actually, this is
> somewhat of an anti-pattern in Solr, for all that it's sometimes necessary.
>
> Since one of the fields is a timestamp and the other a unique key,
> all values are distinct. (These are tweets collected for a keyword.)

That's not one, but two fields where all values are distinct. I don't think
the timestamp is much of a problem, though, assuming you're storing it as one
of the numeric types (I'd especially make sure it is one of the Trie types,
specifically tdate, if you're going to do range queries). There are tricks for
dealing with this, but your id field will get you a bigger bang for the buck,
so concentrate on that first.
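
For reference, a tdate field could be declared roughly as follows. The type definition mirrors the one in the stock example schema; the field name "createdOn" is an assumption, not the poster's actual field.

```xml
<!-- schema.xml sketch; "createdOn" is a hypothetical field name -->
<!-- a precisionStep above 0 indexes extra terms that speed up range queries -->
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" omitNorms="true" positionIncrementGap="0"/>

<field name="createdOn" type="tdate" indexed="true" stored="true"/>

<!-- example range filter on this field: -->
<!--   fq=createdOn:[2011-05-01T00:00:00Z TO 2011-05-19T00:00:00Z] -->
```

Switching an existing date field to this type requires a reindex.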

> 3. How much memory are you allocating for the JVM?
>
> I am starting Solr with the following command:
> java -Xms1024M -Xmx2048M -jar start.jar

Well, you can bump this higher if you're on a 64-bit OS. The other
possibility is to shard your index. But really, 13M documents should fit on
one machine.

What does your statistics page tell you, especially about cache usage?



