Re: Sending a document to IndexWriter field by field

2014-02-20 Thread Michael McCandless
Yes, all postings for the entire doc are held in RAM data structures ... you could make your own indexing chain to somehow change this behavior, but I don't think that's an easy task.

Mike McCandless
http://blog.mikemccandless.com

On Thu, Feb 20, 2014 at 4:02 PM, Igor Shalyminov wrote:
> Mike,

Re: Sending a document to IndexWriter field by field

2014-02-20 Thread Igor Shalyminov
Mike, thank you! So eventually this amount of data must stay entirely in RAM (as postings) before flushing to disk? Can it be hacked? The documents themselves (that I will deliver to the user) are of a regular size, but the features that I generate grow combinatorially in size and blow the index up i

Re: Sending a document to IndexWriter field by field

2014-02-20 Thread Michael McCandless
Yes, in 4.x IndexWriter now takes an Iterable that enumerates the fields one at a time. You can also pass a Reader to a Field. That said, there will still be massive RAM required by IW to hold the inverted postings for that one document, likely much more RAM than the original document's String co
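
To illustrate the "Iterable that enumerates the fields one at a time" idea, here is a minimal sketch in plain Java with no Lucene dependency. FieldStub and LazyFields are hypothetical names invented for this example; in Lucene 4.x the element type would be IndexableField and the Iterable would be handed to IndexWriter.addDocument. The sketch only shows that each field is created when the consumer asks for it, so the source values never all exist at once. As noted above, it does nothing about the RAM that IndexWriter needs for the inverted postings of that document.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for Lucene's IndexableField: just a name and a value. */
final class FieldStub {
    final String name;
    final String value;

    FieldStub(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

/**
 * Hypothetical Iterable that generates fields on demand instead of
 * building a full list of them up front.
 */
final class LazyFields implements Iterable<FieldStub> {
    private final int count;
    int created = 0; // number of fields materialized so far

    LazyFields(int count) {
        this.count = count;
    }

    @Override
    public Iterator<FieldStub> iterator() {
        return new Iterator<FieldStub>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < count;
            }

            @Override
            public FieldStub next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // The field is created only now, when the consumer pulls it.
                created++;
                FieldStub field = new FieldStub("feature", "value-" + index);
                index++;
                return field;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}
```

A consumer that pulls one field, processes it, and drops the reference before pulling the next one keeps only a single generated value alive at a time on the producer side.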