On Mon, Nov 24, 2008 at 8:48 PM, souravm <[EMAIL PROTECTED]> wrote:
> I have around 200M documents in the index. The field I'm sorting on is a date 
> string (containing date and time in dd-mmm-yyyy hh:mm:ss format) and the 
> field is part of the search criteria.
>
> Also please note that the number of documents returned by the search criteria 
> is much less than 200M. In fact, even with 0 hits I got a JVM out-of-memory 
> exception.

Right... that's just how the Lucene FieldCache used for sorting works right now.
The entire field is un-inverted and held in memory.
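A minimal sketch of why memory use is independent of the hit count: the FieldCache (as described above, not Lucene's actual implementation) builds one in-memory entry per document in the index, so the cost is driven by index size, not result size.

```java
// Back-of-envelope sketch of FieldCache sizing for a sortable field.
// One value is held per document in the index, whether or not it matches
// any query -- which is why even a 0-hit search can exhaust the heap.
public class FieldCacheSketch {
    // Bytes needed for a cache of 32-bit values, one per doc.
    public static long bytesForIntCache(long numDocs) {
        return numDocs * 4L;
    }

    public static void main(String[] args) {
        // 200M docs at 4 bytes each:
        System.out.println(bytesForIntCache(200_000_000L)); // prints 800000000
    }
}
```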

200M docs is a *lot*... you might try indexing your date field as an
integer type, which takes only 4 bytes per doc - but that will
still take up 800MB. Given that 2 searchers can overlap (e.g. while a
new searcher warms), that still adds up to more than your heap - you
will need to increase it.
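One way to get that 4-bytes-per-doc representation (a sketch, not the only option: the parsing format matches the one quoted in the thread, and epoch seconds fit in a signed int until 2038):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateToInt {
    // Convert a "dd-MMM-yyyy HH:mm:ss" date string to epoch seconds,
    // stored as a 4-byte int instead of a sortable string.
    public static int toEpochSeconds(String s) throws Exception {
        SimpleDateFormat fmt =
            new SimpleDateFormat("dd-MMM-yyyy HH:mm:ss", Locale.ENGLISH);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assume UTC timestamps
        Date d = fmt.parse(s);
        return (int) (d.getTime() / 1000L);
    }

    public static void main(String[] args) throws Exception {
        // Index this int field and sort on it instead of the string field.
        System.out.println(toEpochSeconds("24-Nov-2008 20:48:00"));
    }
}
```

Integer comparison preserves chronological order, so sorting on this field gives the same ordering as sorting on the original date string.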

The other option is to split your index across multiple nodes and use
distributed search.  If you want to do any faceting in the future, or
sort on multiple fields, you will need to do this anyway.
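For reference, a distributed request just passes the shard list via the
`shards` parameter on a normal query (host names here are hypothetical;
each shard holds a disjoint slice of the 200M docs, so each node's
FieldCache only covers its own slice):

```
http://host1:8983/solr/select?q=*:*&sort=date_i+asc&shards=host1:8983/solr,host2:8983/solr
```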

-Yonik
