RE: frequency of commit when building index from scratch

2009-08-25 Thread Fuad Efendi

But again, why would anyone hit OOM? I never have...

What I discovered is: committing millions of docs (in Solr 1.4) may take
several days (although adding the docs takes a day) if you somehow end up
with _many_ segments and bad I/O on <= 2 CPUs; I use a large ramBufferSizeMB
instead of a large mergeFactor, and quad cores...


Yes, I am using SolrJ with the binary format. It takes 20 minutes to commit
millions of docs (including overwrites of existing ones with the same
uniqueId); I usually have 2 segments (>10 GB each)
-Fuad
http://www.casaGURU.com

> If you're using SolrJ, it's due to improvements there too:
> 1) binary format by default - no XML parsing
> 2) not used by default, but try using StreamingUpdateSolrServer
>
> -Yonik
> http://www.lucidimagination.com


> Bill, in most cases you probably cannot do one large commit as you will
> hit OOM. How many documents can be uncommitted is based on the size of
> the documents. Committing every document is slow. I have done a commit
> every 10,000 mostly. Results may vary. Someone might have a better
> answer than me.





Re: frequency of commit when building index from scratch

2009-08-25 Thread Yonik Seeley
On Tue, Aug 25, 2009 at 8:37 PM, Lance Norskog wrote:
> The latest Solr 1.4 can index 200k records in several minutes, then commit
> in a few seconds. I don't know but I'm guessing it is due to Lucene
> improvements. It does not use much memory doing this.

If you're using SolrJ, it's due to improvements there too:
1) binary format by default - no XML parsing
2) not used by default, but try using StreamingUpdateSolrServer

-Yonik
http://www.lucidimagination.com
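
A rough illustration of why a streaming client can help. This is not SolrJ's
actual code. As far as I know, StreamingUpdateSolrServer takes a queue size
and a thread count, buffers added documents in a queue, and lets background
threads send them. The Java sketch below shows only that general pattern,
using the standard library; QueuedSenderSketch, Sink, and streamAll are
hypothetical names, and Sink stands in for the HTTP update request.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a bounded queue drained by background sender threads.
final class QueuedSenderSketch {
    /** Hypothetical stand-in for "send this document to the server". */
    interface Sink {
        void send(String doc);
    }

    // Sentinel compared by identity, so a real document equal to "STOP" is safe.
    private static final String STOP = new String("STOP");

    /** Returns true if all workers finished within the timeout. */
    static boolean streamAll(List<String> docs, Sink sink, int queueSize, int threadCount) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueSize);
        ExecutorService workers = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < threadCount; i++) {
            workers.execute(() -> drain(queue, sink));
        }
        try {
            for (String doc : docs) {
                queue.put(doc); // blocks while the queue is full, bounding memory use
            }
            for (int i = 0; i < threadCount; i++) {
                queue.put(STOP); // one stop marker per worker
            }
            workers.shutdown();
            return workers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            workers.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Worker loop: take documents until the stop marker arrives.
    // Error handling for a failing sink is omitted in this sketch.
    private static void drain(BlockingQueue<String> queue, Sink sink) {
        try {
            while (true) {
                String doc = queue.take();
                if (doc == STOP) {
                    return;
                }
                sink.send(doc);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The bounded queue is the point of the design: the producer never holds more
than queueSize pending documents, while several threads keep sending.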


Re: frequency of commit when building index from scratch

2009-08-25 Thread Lance Norskog
The latest Solr 1.4 can index 200k records in several minutes, then commit
in a few seconds. I don't know but I'm guessing it is due to Lucene
improvements. It does not use much memory doing this.

Lance

On Tue, Aug 25, 2009 at 2:43 PM, Fuad Efendi wrote:

> I do commit once a day, millions of small docs... it takes 20 minutes on
> average... why OOM? I see only reduced I/O...
>
>
> -----Original Message-----
> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> Sent: August-25-09 5:35 PM
> To: solr-user@lucene.apache.org
> Subject: Re: frequency of commit when building index from scratch
>
> On Tue, Aug 25, 2009 at 5:29 PM, Bill Au wrote:
> > Just curious, how often do folks commit when building their Solr/Lucene
> > index from scratch for index with millions of documents?  Should I just wait
> > and do a single commit at the end after adding all the documents to the
> > index?
> >
> > Bill
> >
>
> Bill, in most cases you probably cannot do one large commit as you will
> hit OOM. How many documents can be uncommitted is based on the size of
> the documents. Committing every document is slow. I have done a commit
> every 10,000 mostly. Results may vary. Someone might have a better
> answer than me.
>
>
>


-- 
Lance Norskog
goks...@gmail.com


RE: frequency of commit when building index from scratch

2009-08-25 Thread Fuad Efendi
I do commit once a day, millions of small docs... it takes 20 minutes on
average... why OOM? I see only reduced I/O...


-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: August-25-09 5:35 PM
To: solr-user@lucene.apache.org
Subject: Re: frequency of commit when building index from scratch

On Tue, Aug 25, 2009 at 5:29 PM, Bill Au wrote:
> Just curious, how often do folks commit when building their Solr/Lucene
> index from scratch for index with millions of documents?  Should I just wait
> and do a single commit at the end after adding all the documents to the
> index?
>
> Bill
>

Bill, in most cases you probably cannot do one large commit as you will
hit OOM. How many documents can be uncommitted is based on the size of
the documents. Committing every document is slow. I have done a commit
every 10,000 mostly. Results may vary. Someone might have a better
answer than me.




Re: frequency of commit when building index from scratch

2009-08-25 Thread Bill Au
That's my gut feeling (start big and go lower if OOM occurs) too.

Bill

On Tue, Aug 25, 2009 at 5:34 PM, Edward Capriolo wrote:

> On Tue, Aug 25, 2009 at 5:29 PM, Bill Au wrote:
> > Just curious, how often do folks commit when building their Solr/Lucene
> > index from scratch for index with millions of documents?  Should I just wait
> > and do a single commit at the end after adding all the documents to the
> > index?
> >
> > Bill
> >
>
> Bill, in most cases you probably cannot do one large commit as you will
> hit OOM. How many documents can be uncommitted is based on the size of
> the documents. Committing every document is slow. I have done a commit
> every 10,000 mostly. Results may vary. Someone might have a better
> answer than me.
>


Re: frequency of commit when building index from scratch

2009-08-25 Thread Edward Capriolo
On Tue, Aug 25, 2009 at 5:29 PM, Bill Au wrote:
> Just curious, how often do folks commit when building their Solr/Lucene
> index from scratch for index with millions of documents?  Should I just wait
> and do a single commit at the end after adding all the documents to the
> index?
>
> Bill
>

Bill, in most cases you probably cannot do one large commit as you will
hit OOM. How many documents can be uncommitted is based on the size of
the documents. Committing every document is slow. I have done a commit
every 10,000 mostly. Results may vary. Someone might have a better
answer than me.
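
The "commit every 10,000" approach above can be sketched roughly as follows.
This is a minimal Java illustration, not SolrJ code: PeriodicCommitSketch,
IndexClient, and indexWithPeriodicCommits are hypothetical names, and
IndexClient only stands in for a real client's add and commit calls. The
right batch size is an assumption to tune for your document size and heap.

```java
// Hypothetical sketch: add documents and commit after every N additions.
final class PeriodicCommitSketch {
    /** Hypothetical stand-in for an indexing client with add and commit. */
    interface IndexClient {
        void add(String doc);
        void commit();
    }

    /** Adds every document, commits every commitEvery docs, returns the commit count. */
    static int indexWithPeriodicCommits(Iterable<String> docs, IndexClient client, int commitEvery) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        int pending = 0; // documents added since the last commit
        int commits = 0;
        for (String doc : docs) {
            client.add(doc);
            pending++;
            if (pending == commitEvery) {
                client.commit();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) { // final commit for the leftover partial batch
            client.commit();
            commits++;
        }
        return commits;
    }
}
```

Starting with a large commitEvery and lowering it if OOM occurs matches the
"start big and go lower" suggestion elsewhere in this thread.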


frequency of commit when building index from scratch

2009-08-25 Thread Bill Au
Just curious, how often do folks commit when building their Solr/Lucene
index from scratch for index with millions of documents?  Should I just wait
and do a single commit at the end after adding all the documents to the
index?

Bill