Re: Updating and Appending

2008-01-24 Thread Jeff Rodenburg
On Jan 23, 2008 1:29 PM, Chris Harris [EMAIL PROTECTED] wrote:


  And then if you're using
 a client such as solrsharp, there's the question of whether *it* will
 slurp the whole stream into memory.


Solrsharp's reads of the XML stream from Solr use the standard .NET Framework
XML objects, which by default read the entire stream into memory before
returning control to your code.  The .NET Framework does provide facilities
for reading XML data in chunks rather than as a full stream, but solrsharp at
present uses the framework defaults.
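
For comparison, here is what the chunked style looks like in Java (a sketch
only, since solrsharp itself is .NET code; the "str"/"name" element handling
below just mirrors the shape of a Solr XML response). A pull parser visits
one event at a time instead of buffering the whole document first:

    import java.io.InputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    // Sketch: pull-parse a Solr XML response one event at a time, so only
    // the current element is held in memory, unlike a DOM-style load that
    // buffers the entire stream before returning control.
    public class PullParseSketch {
        public static void listFieldNames(InputStream in) throws Exception {
            XMLStreamReader r =
                XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && "str".equals(r.getLocalName())) {
                    System.out.println(r.getAttributeValue(null, "name"));
                }
            }
            r.close();
        }
    }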

-- jeff


Re: Updating and Appending

2008-01-23 Thread Yonik Seeley
On Jan 22, 2008 4:10 PM, Owens, Martin [EMAIL PROTECTED] wrote:
 We've got some memory-constraint worries from using Java RMI, although I can 
 see this problem could affect the XML requests too. The Java code doesn't 
 seem to handle large files as streams.

It depends on which component we are talking about.
The CSV loader does handle input as a stream.
The XML update handler should also handle input as a stream (only a
single document is loaded into memory at a time).

If you are talking about a single very large document, you are
right... there is currently no way to stream it, since the XML (and
CSV) parsers can't give us Readers for individual fields.  We could
perhaps, in the future, provide a field type that pulls its actual
value from a URL.
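
A hypothetical sketch of that idea (the URL-backed field type does not
exist; the Field(String, Reader) constructor used below is real Lucene API,
and such Reader-based fields are indexed but never stored):

    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.net.URL;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    // Hypothetical sketch: stream a large field value from a URL into the
    // analyzer instead of materializing it as one big String.
    public class UrlFieldSketch {
        public static Document docWithStreamedBody(String id, URL source)
                throws Exception {
            Document doc = new Document();
            doc.add(new Field("id", id, Field.Store.YES,
                              Field.Index.UN_TOKENIZED));
            Reader body = new InputStreamReader(source.openStream(), "UTF-8");
            doc.add(new Field("body", body)); // indexed only; cannot be stored
            return doc;
        }
    }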

-Yonik


Re: Updating and Appending

2008-01-23 Thread Chris Harris
On Jan 23, 2008 9:04 AM, Yonik Seeley [EMAIL PROTECTED] wrote:
 On Jan 22, 2008 4:10 PM, Owens, Martin [EMAIL PROTECTED] wrote:
  We've got some memory-constraint worries from using Java RMI, although I 
  can see this problem could affect the XML requests too. The Java code 
  doesn't seem to handle large files as streams.

 [...]

 If you are talking about a single very large document, you are
 right... there is currently no way to stream it, since the XML (and
 CSV) parsers can't give us Readers for individual fields.  We could
 perhaps, in the future, provide a field type that pulls its actual
 value from a URL.

 -Yonik

Supposing you could do this, i.e. get Solr to pass a particular
field's data to Lucene without reading it all into memory first: are
there any potential problems on the Lucene end? It's not going to turn
around and slurp the whole field into memory itself, is it?

That was the indexing side. You also have the searching side, in
particular when you need to retrieve the value of a huge stored field.
It looks like Lucene will give you a stored field's value as a stream
(a Java Reader), but that won't do any good if, behind the scenes, it
brings the whole field into memory first. Then there's the question of
whether Solr needs to slurp that whole stream into memory before
outputting that field's contents as XML. (I doubt it does, but I
haven't looked at any of the code recently.) And then if you're using
a client such as solrsharp, there's the question of whether *it* will
slurp the whole stream into memory.

Maybe this is something to take up on JIRA or solr-dev, rather than
here. I was just trying to get a sense of how difficult the proposed
feature would be.


Re: Updating and Appending

2008-01-23 Thread Yonik Seeley
On Jan 23, 2008 4:29 PM, Chris Harris [EMAIL PROTECTED] wrote:
 Supposing you could do this, i.e. get Solr to pass a particular
 field's data to Lucene without reading it all into memory first: are
 there any potential problems on the Lucene end? It's not going to turn
 around and slurp the whole field into memory itself, is it?

Well, yes and no. Reader-based fields are for indexed fields only
(and Lucene won't read the value entirely into memory before
indexing), but the Lucene index structures will still need to be
created in memory.  So using a Reader would save memory, but memory
use would still grow with the size of the document.

 That was the indexing side. You also have the searching side, in
 particular when you need to retrieve the value of a huge stored field.

Reader-based fields can't be stored (and even if that restriction
were lifted, it wouldn't help memory anyway unless you were willing
to buffer to disk).

 It looks like Lucene will give you a stored field's value as a stream
 (a Java Reader), but that won't do any good if, behind the scenes, it
 brings the whole field into memory first. Then there's the question of
 whether Solr needs to slurp that whole stream into memory before
 outputting that field's contents as XML. (I doubt it does, but I
 haven't looked at any of the code recently.) And then if you're using
 a client such as solrsharp, there's the question of whether *it* will
 slurp the whole stream into memory.

Lucene doesn't provide a Reader interface to stored fields, but it
should be possible to add one.
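
What such an accessor might look like, purely as a hypothetical sketch
(nothing like this exists in Lucene as of this writing; the interface and
method names are made up):

    import java.io.IOException;
    import java.io.Reader;

    // Hypothetical sketch only: hand back a stored field's value as a
    // character stream, read lazily from the index files rather than
    // materialized as a String first.
    public interface StoredFieldStreams {
        Reader storedFieldReader(int docId, String fieldName)
                throws IOException;
    }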

 Maybe this is something to take up on JIRA or solr-dev, rather than
 here. I was just trying to get a sense of how difficult the proposed
 feature would be.

One should consider storing really huge fields outside of Solr / Lucene.
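
One common shape for that pattern, as a sketch (the "content_path" field
name is made up): the index keeps only a small stored field pointing at the
payload on disk, and the application streams the payload itself in
fixed-size chunks instead of fetching it through Solr.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.Reader;
    import java.io.Writer;

    // Sketch: stream externally stored content to a Writer, holding at most
    // one buffer of it in memory at a time.
    public class ExternalContent {
        public static void copyContent(String contentPath, Writer out)
                throws IOException {
            Reader in = new BufferedReader(new FileReader(contentPath));
            try {
                char[] buf = new char[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            } finally {
                in.close();
            }
        }
    }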

-Yonik


Updating and Appending

2008-01-22 Thread Owens, Martin
Hello,

We've got some memory-constraint worries from using Java RMI, although I can 
see this problem could affect the XML requests too. The Java code doesn't seem 
to handle large files as streams. We're thinking there are two possible 
solutions. Either there exists (or we create) a file-path plugin which tells 
the server to load the contents from a file as a buffer, though this runs the 
risk that Solr isn't built to deal with buffers and would simply eat up all 
the RAM trying to load the file as one full string. Or there exists (or we 
create) some kind of update method which appends the contents to a field's 
data and runs all the applicable indexing and filters.

But I want to know what you guys think is the best solution to this problem.

Best Regards, Martin Owens