Yes, I think there are good reasons why it works like that. The focus of a
search system is to be efficient on the query side, at the cost of being less
efficient on the storage/update side.

You must, however, also note that by default a field is limited to its first
10,000 tokens by the maxFieldLength setting in solrconfig.xml, which you may
also need to raise. But I guess if it's running out of memory, you might have
already done this?
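If memory serves, the setting lives in the <indexDefaults> section of
solrconfig.xml and looks something like this (the value shown is the
default; raise it as needed):

    <indexDefaults>
      ...
      <maxFieldLength>10000</maxFieldLength>
    </indexDefaults>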
Ravish

On Wed, Apr 4, 2012 at 1:34 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:

> There is https://issues.apache.org/jira/browse/LUCENE-3837 but I suppose
> it's too far from completion.
>
> On Wed, Apr 4, 2012 at 2:48 PM, Ravish Bhagdev <ravish.bhag...@gmail.com> wrote:
>
> > Updating a single field is not possible in Solr. The whole record has
> > to be rewritten.
> >
> > 300 MB is still not that big a file. Have you tried doing the indexing
> > (if it's only a one-time thing) by giving it ~2 GB of Xmx?
> >
> > A single file of that size is unusual! May I ask what it is?
> >
> > Rav
> >
> > On Tue, Apr 3, 2012 at 7:32 PM, vybe3142 <vybe3...@gmail.com> wrote:
> >
> > > Some days ago, I posted about an issue with Solr running out of
> > > memory when attempting to index large text files (say 300 MB).
> > > Details at
> > >
> > > http://lucene.472066.n3.nabble.com/Solr-Tika-crashing-when-attempting-to-index-large-files-td3846939.html
> > >
> > > Two things I need to point out:
> > >
> > > 1. I don't need Tika for content extraction, as the files are
> > > already in plain-text format.
> > > 2. The heap space error was caused by a futile Tika/Solr attempt at
> > > creating the corresponding huge XML document in memory.
> > >
> > > I've decided to develop a custom handler that:
> > > 1. reads the file text directly, and
> > > 2. attempts to create a Solr document and add the text data directly
> > > to the corresponding field.
> > >
> > > One approach I've taken is to read manageable chunks of text data
> > > sequentially from the file and process them. We've used this
> > > approach successfully with Lucene in the past, and I'm attempting to
> > > make it work with Solr too. I got most of the work done yesterday,
> > > but need a bit of guidance w.r.t. point 2.
> > >
> > > How can I update the same field multiple times? Looking at the Solr
> > > source, processor.addField() merely
> > > a. adds to the in-memory field map, and
> > > b. attempts to write EVERYTHING to the index later on.
> > >
> > > In my situation, (a) eventually causes a heap space error.
> > >
> > > Here's part of the handler code.
> > >
> > > Thanks much
> > >
> > > --
> > > View this message in context:
> > > http://lucene.472066.n3.nabble.com/Incremantally-updating-a-VERY-LARGE-field-Is-this-possibe-tp3881945p3881945.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> Sincerely yours
> Mikhail Khludnev
> ge...@yandex.ru
>
> <http://www.griddynamics.com>
> <mkhlud...@griddynamics.com>
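P.S. To make the addField() point concrete, here's a rough sketch of the
chunked approach being discussed (class, field, and buffer names are made
up, not taken from the original handler). It also shows why chunking alone
doesn't solve the heap problem: each addField() call just appends another
value to the in-memory document.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    import org.apache.solr.common.SolrInputDocument;

    public class ChunkedFieldSketch {
        // Reads a large plain-text file in fixed-size chunks and appends
        // each chunk to the same field. Every addField() call only adds
        // another value to the in-memory SolrInputDocument, so the full
        // text still ends up on the heap before the document is written
        // to the index.
        public static SolrInputDocument buildDoc(String path) throws IOException {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", path);
            char[] buf = new char[1 << 20]; // 1M-char chunks
            try (BufferedReader in = new BufferedReader(new FileReader(path))) {
                int n;
                while ((n = in.read(buf, 0, buf.length)) != -1) {
                    doc.addField("content", new String(buf, 0, n));
                }
            }
            return doc;
        }
    }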