Re: Updating and Appending
On Jan 23, 2008 1:29 PM, Chris Harris [EMAIL PROTECTED] wrote:

> And then if you're using a client such as solrsharp, there's the
> question of whether *it* will slurp the whole stream into memory.

Solrsharp's reads of the XML stream from Solr use the standard .NET framework XML objects, which by default read the entirety of the stream into memory before returning control to your code. The .NET framework does provide facilities for reading XML data in chunks rather than as a full stream, but solrsharp at present uses the framework defaults.

-- jeff
Re: Updating and Appending
On Jan 22, 2008 4:10 PM, Owens, Martin [EMAIL PROTECTED] wrote:

> We've got some memory constraint worries from using Java RMI, although
> I can see this problem could affect the XML requests too. The Java code
> doesn't seem to handle large files as streams.

It depends on which component we are talking about. The CSV loader does handle things as a stream. The XML update handler should also handle things as a stream (only a single document at a time is loaded into memory). If you are talking about a single very large document, you are right: there is currently no way to stream it, since the XML (and CSV) parsers can't give us Readers to individual fields. We could perhaps in the future provide a field type that pulled its actual value from a URL.

-Yonik
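The one-document-at-a-time behavior described above can be illustrated with the JDK's StAX pull parser. This is a hedged sketch of the streaming approach, not Solr's actual update-handler code; the class and method names are invented for the example:

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.Reader;
import java.io.StringReader;

public class StreamingUpdateSketch {
    // Walk an <add> request by pulling parse events one at a time.
    // Only the current event is held in memory, never a full DOM tree,
    // so a handler built this way can process one <doc> before the
    // next one is even read from the wire.
    static int countDocs(Reader xml) throws Exception {
        XMLStreamReader r =
            XMLInputFactory.newInstance().createXMLStreamReader(xml);
        int docs = 0;
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "doc".equals(r.getLocalName())) {
                docs++;
            }
        }
        return docs;
    }

    public static void main(String[] args) throws Exception {
        String add = "<add><doc><field name=\"id\">1</field></doc>"
                   + "<doc><field name=\"id\">2</field></doc></add>";
        System.out.println(countDocs(new StringReader(add)));
    }
}
```

The pull model is what makes the memory profile flat per request; the cost moves to the size of each individual document, which is exactly the remaining problem Yonik identifies.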
Re: Updating and Appending
On Jan 23, 2008 9:04 AM, Yonik Seeley [EMAIL PROTECTED] wrote:

> On Jan 22, 2008 4:10 PM, Owens, Martin [EMAIL PROTECTED] wrote:
> > We've got some memory constraint worries from using Java RMI, although
> > I can see this problem could affect the XML requests too. The Java
> > code doesn't seem to handle large files as streams.
> [...]
> If you are talking about a single very large document, you are right...
> there is no way to stream this currently since the XML (and CSV)
> parsers can't give us Readers to various fields. We perhaps could in
> the future provide a field type that pulled its actual value from a URL.
> -Yonik

Supposing you could do this -- i.e., that you could get Solr to pass a particular field's data to Lucene without reading it all into memory first -- are there any potential problems on the Lucene end? It's not going to turn around and slurp the whole field into memory itself, is it?

That was the indexing side. There is also the searching side, in particular when you need to retrieve the value of a huge stored field. It looks like Lucene will give you a stored field's value as a stream (a Java Reader), but that won't do any good if, behind the scenes, it brings the whole field into memory first. Then there's the question of whether Solr needs to slurp that whole stream into memory before outputting the field's contents as XML. (I doubt it does, but I haven't looked at any of the code recently.) And then if you're using a client such as solrsharp, there's the question of whether *it* will slurp the whole stream into memory.

Maybe this is something to take up on JIRA or solr-dev, rather than here. I was just trying to get a sense of how difficult the proposed feature would be.
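At each layer Chris lists, "streaming vs. slurping" comes down to whether the code copies through a fixed-size buffer or materializes the whole value first. A sketch of the buffer-based copy a streaming response path could use, with plain java.io only (this is illustrative, not Solr's actual response-writer API):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class StreamCopySketch {
    // Copy a field value from a Reader to a Writer in 4 KB chunks.
    // Peak memory is one buffer regardless of how large the stored
    // field is -- provided every layer underneath also streams.
    static long copy(Reader in, Writer out) throws IOException {
        char[] buf = new char[4096];
        long copied = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            copied += n;
        }
        return copied;
    }

    public static void main(String[] args) throws Exception {
        StringWriter w = new StringWriter();
        long n = copy(new StringReader("a huge stored field value"), w);
        System.out.println(n);
    }
}
```

The catch Chris raises is exactly this proviso: a chunked copy at the top buys nothing if a lower layer has already pulled the entire field into one String.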
Re: Updating and Appending
On Jan 23, 2008 4:29 PM, Chris Harris [EMAIL PROTECTED] wrote:

> Supposing you could do this -- i.e., that you could get Solr to pass a
> particular field's data to Lucene without reading it all into memory
> first -- are there any potential problems on the Lucene end? It's not
> going to turn around and slurp the whole field into memory itself, is it?

Well, yes and no. Reader-based fields are for indexed fields only (and Lucene won't read them entirely into memory before indexing), but the Lucene index structures still need to be created in memory. So using a Reader would save memory, but memory use would still grow with the size of the document.

> That was the indexing side. You also have the searching side, in
> particular when you need to retrieve the value of a huge stored field.

Reader-based fields can't be stored (and if that restriction were lifted, it wouldn't help memory anyway unless you wanted to buffer to disk).

> It looks like Lucene will give you a stored field's value as a stream
> (a Java Reader), but that won't do any good if, behind the scenes, it
> brings the whole field into memory first. Then there's the question of
> whether Solr needs to slurp that whole stream into memory before
> outputting that field's contents as XML. (I doubt it does, but I
> haven't looked at any of the code recently.) And then if you're using a
> client such as solrsharp, there's the question of whether *it* will
> slurp the whole stream into memory.

Lucene doesn't currently provide a Reader interface to stored fields, but it should be possible.

> Maybe this is something to take up on JIRA or solr-dev, rather than
> here. I was just trying to get a sense of how difficult the proposed
> feature would be.

One should consider storing really huge fields outside of Solr/Lucene.

-Yonik
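Yonik's "yes and no" can be made concrete with a stdlib-only sketch: even when the input text arrives as a stream and is never held whole, the in-memory structures an indexer builds (here, just a set of distinct terms standing in for Lucene's real inverted-index structures) still grow with the document. This is an illustration of the principle, not Lucene code:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

public class IndexMemorySketch {
    // Tokenize a Reader incrementally: the raw text is consumed a
    // token at a time, but the term dictionary accumulates in memory,
    // so heap use still scales with the number of distinct terms.
    static Set<String> distinctTerms(Reader in) throws IOException {
        Set<String> terms = new HashSet<>();
        StreamTokenizer tok = new StreamTokenizer(in);
        while (tok.nextToken() != StreamTokenizer.TT_EOF) {
            if (tok.ttype == StreamTokenizer.TT_WORD) {
                terms.add(tok.sval.toLowerCase());
            }
        }
        return terms;
    }

    public static void main(String[] args) throws Exception {
        Reader doc = new StringReader("to be or not to be");
        System.out.println(distinctTerms(doc).size());
    }
}
```

This is why a Reader-based field caps the cost of holding the *text*, while the per-document indexing structures remain proportional to the document's content, as Yonik notes.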
Updating and Appending
Hello,

We've got some memory constraint worries from using Java RMI, although I can see this problem could affect the XML requests too. The Java code doesn't seem to handle large files as streams.

We're thinking there are two possible solutions. Either there exists, or we create, a file-path plugin which tells the server to load the contents from a file as a buffer; but this runs the risk that Solr isn't built to deal with buffers and would simply eat up all the RAM trying to load the file as one full string. Or there exists, or we create, some kind of update method which appends the contents to a field's data and runs all the applicable indexing and filters.

What do you think is the best solution for the problem?

Best Regards, Martin Owens