On Jan 23, 2008 9:04 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Jan 22, 2008 4:10 PM, Owens, Martin <[EMAIL PROTECTED]> wrote:
> > We've got some memory constraint worries from using Java RMI, although I
> > can see this problem could affect the xml requests too. The Java code
> > doesn't seem to handle large files as streams.
> > [...]
>
> If you are talking about a single very large document, you are
> right... there is no way to stream this currently since the XML (and
> CSV) parsers can't give us Readers to various fields. We perhaps
> could in the future provide a field type that pulled its actual value
> from a URL.
>
> -Yonik

Supposing you could do this -- i.e. that you could get Solr to pass a particular field's data to Lucene without reading it all into memory first -- are there any potential problems on the Lucene end? It's not going to turn around and slurp the whole field into memory itself, is it?

That was the indexing side. There is also the searching side, in particular when you need to retrieve the value of a huge stored field. It looks like Lucene will give you a stored field's value as a stream (a Java Reader), but that won't do any good if, behind the scenes, it brings the whole field into memory first.

Then there's the question of whether Solr needs to slurp that whole stream into memory before outputting that field's contents as XML. (I doubt it does, but I haven't looked at any of the code recently.) And if you're using a client such as solrsharp, there's the question of whether *it* will slurp the whole stream into memory.

Maybe this is something to take up on JIRA or solr-dev, rather than here. I was just trying to get a sense of how difficult the proposed feature would be.
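To make the output-side question concrete, here is a rough sketch of what "not slurping" would look like: a field's value is copied from a Reader to a Writer through a small fixed-size buffer, escaping XML characters along the way, so memory use does not depend on the size of the field. This uses only java.io; the class name StreamingFieldWriter and the method copyEscaped are made up for illustration and are not taken from Solr or Lucene, whose actual response-writing code I have not checked.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper for illustration only; not part of Solr or Lucene.
public class StreamingFieldWriter {

    // Copies characters from 'in' to 'out' in chunks of at most 'bufferSize',
    // escaping the characters that are special in XML text content.
    // Returns the number of characters read from 'in'.
    public static long copyEscaped(Reader in, Writer out, int bufferSize) throws IOException {
        char[] buf = new char[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buf[i];
                switch (c) {
                    case '<': out.write("&lt;"); break;
                    case '>': out.write("&gt;"); break;
                    case '&': out.write("&amp;"); break;
                    default:  out.write(c);
                }
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a huge field; in practice the Reader
        // would be backed by a file or network stream.
        Reader field = new StringReader("a < b && c > d");
        StringWriter xml = new StringWriter();
        long chars = copyEscaped(field, xml, 4);
        System.out.println(chars + " chars: " + xml);
    }
}
```

The point is only that every layer in the chain (Lucene, Solr, the client) would have to pass the data along in this chunked fashion; a single layer that collects the value into a String first defeats the whole thing.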