You're sure it's not blocking on indexing IO? If not, then I guess it must be a thread waiting unnecessarily in Solr or in your loading program. To get my loader running at full speed I hooked it up to JProfiler's thread views to see where the stalls were and optimized from there.
-Kallin Nagelberg

-----Original Message-----
From: Thijs [mailto:vonk.th...@gmail.com]
Sent: Thursday, May 20, 2010 11:25 AM
To: solr-user@lucene.apache.org
Subject: Re: Machine utilization while indexing

I already have a BlockingQueue in place (that's my custom queue), and luckily I'm indexing faster than what you're doing. Currently it takes about 2 hours to index the 5 million documents I'm talking about. But I still feel as if my machine is underutilized.

Thijs

On 20-5-2010 17:16, Nagelberg, Kallin wrote:
> How about throwing a BlockingQueue,
> http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueue.html,
> between your document creator and SolrServer? Give it a size of 10,000 or
> something, with one thread trying to feed it, and one thread waiting for it
> to get near full and then draining it. Take the drained results and add them
> to the server (maybe try not using StreamingUpdateSolrServer). Something like
> that worked well for me with about 5,000,000 documents, each ~5 KB, taking
> about 8 hours.
>
> -Kallin Nagelberg
>
> -----Original Message-----
> From: Thijs [mailto:vonk.th...@gmail.com]
> Sent: Thursday, May 20, 2010 11:02 AM
> To: solr-user@lucene.apache.org
> Subject: Machine utilization while indexing
>
> Hi,
>
> I have a question about how I can get Solr to index quicker than it does
> at the moment.
>
> I have to index (and re-index) some 3-5 million documents. These
> documents are preprocessed by a Java application that effectively
> combines multiple database tables with each other to form the
> SolrInputDocument.
>
> What I'm seeing, however, is that the queue of documents that are ready
> to be sent to the Solr server exceeds my preset limit, telling me that
> Solr somehow can't process the documents fast enough.
> (I have created my own queue in front of SolrJ's StreamingUpdateSolrServer,
> as it would not process the documents fast enough, causing
> OutOfMemoryErrors due to the large number of documents building up
> in its queue.)
>
> I have an index that consists for 95% of IDs (Long). We don't do any
> analysis on the fields that are being indexed. The schema is rather
> straightforward.
>
> Most fields look like:
>
> <fieldType name="long" class="solr.LongField" omitNorms="true"/>
> <field name="objectId" type="long" stored="true" indexed="true"
>        required="true"/>
> <field name="listId" type="long" stored="false" indexed="true"
>        multiValued="true"/>
>
> The relevant solrconfig.xml:
>
> <indexDefaults>
>   <useCompoundFile>false</useCompoundFile>
>   <mergeFactor>100</mergeFactor>
>   <RAMBufferSizeMB>256</RAMBufferSizeMB>
>   <maxMergeDocs>2147483647</maxMergeDocs>
>   <maxFieldLength>10000</maxFieldLength>
>   <writeLockTimeout>1000</writeLockTimeout>
>   <commitLockTimeout>10000</commitLockTimeout>
>   <lockType>single</lockType>
> </indexDefaults>
>
> The machines I'm testing on have an
> Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
> with 4 GB of RAM, running Linux, Java 1.6.0_17, Tomcat 6, and Solr 1.4.
>
> What I'm seeing is that the network almost never reaches more than 10%
> of the 1 Gb/s connection, that CPU utilization is always below 25%
> (1 core is used, not the others), and that I don't see heavy disk I/O.
> Also, while indexing, the memory consumption is:
> Free memory: 212.15 MB  Total memory: 509.12 MB  Max memory: 2730.68 MB
>
> In the beginning (with an empty index) I get 2 ms per insert, but this
> slows to 18-19 ms per insert.
>
> Are there any tips/tricks I can use to speed up my indexing? I have a
> feeling that my machine is capable of doing more (using more CPUs); I
> just can't figure out how.
>
> Thijs
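For reference, the bounded-queue pattern suggested earlier in this thread (one producer feeding a BlockingQueue, one consumer draining it in batches) can be sketched with just the JDK. This is a minimal illustration, not the poster's actual loader: the `indexBatch()` method here is a hypothetical stand-in for the real SolrJ call (e.g. `solrServer.add(batch)`), and the capacity/batch numbers are the ones floated in the thread, not tuned values. Because `put()` blocks when the queue is full, document creation throttles itself automatically, which avoids the unbounded buildup that caused the OutOfMemoryErrors described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class QueuedLoader {
    static final int QUEUE_CAPACITY = 10000; // size suggested in the thread
    static final int BATCH_SIZE = 1000;      // docs sent to the server per call

    final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(QUEUE_CAPACITY);
    final AtomicInteger indexed = new AtomicInteger();

    // Producer: blocks automatically when the queue is full, so memory
    // stays bounded no matter how fast documents are created.
    void produce(int docCount) throws InterruptedException {
        for (int i = 0; i < docCount; i++) {
            queue.put("doc-" + i); // blocks while the queue is at capacity
        }
    }

    // Consumer: wait for at least one document, drain up to a full batch,
    // and hand the batch to the indexer in one call.
    void consume(int docCount) throws InterruptedException {
        List<String> batch = new ArrayList<String>(BATCH_SIZE);
        while (indexed.get() < docCount) {
            batch.add(queue.take());              // blocks until a doc arrives
            queue.drainTo(batch, BATCH_SIZE - 1); // grab whatever else is ready
            indexBatch(batch);
            batch.clear();
        }
    }

    // Hypothetical stand-in for solrServer.add(batch).
    void indexBatch(List<String> batch) {
        indexed.addAndGet(batch.size());
    }

    public static void main(String[] args) throws Exception {
        final QueuedLoader loader = new QueuedLoader();
        final int docs = 50000;
        Thread producer = new Thread(new Runnable() {
            public void run() {
                try { loader.produce(docs); } catch (InterruptedException ignored) {}
            }
        });
        Thread consumer = new Thread(new Runnable() {
            public void run() {
                try { loader.consume(docs); } catch (InterruptedException ignored) {}
            }
        });
        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        System.out.println("indexed=" + loader.indexed.get());
    }
}
```

To use more than one core on the Solr side, the same structure extends to several consumer threads, each draining its own batches from the shared queue; batching the `add()` calls also amortizes the per-request HTTP overhead that a document-at-a-time loader pays.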