You're sure it's not blocking on indexing IO? If not, then I guess it must be a thread waiting unnecessarily in Solr or in your loading program. To get my loader running at full speed I hooked it up to JProfiler's thread views to see where the stalls were and optimized from there.
-Kallin Nagelberg

-----Original Message-----
From: Thijs [mailto:vonk.th...@gmail.com]
Sent: Thursday, May 20, 2010 11:25 AM
To: solr-user@lucene.apache.org
Subject: Re: Machine utilization while indexing

I already have a BlockingQueue in place (that's my custom queue), and luckily I'm indexing faster than what you're doing. Currently it takes about 2 hours to index the 5 million documents I'm talking about. But I still feel as if my machine is underutilized.

Thijs

On 20-5-2010 17:16, Nagelberg, Kallin wrote:
> How about throwing a BlockingQueue,
> http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueue.html,
> between your document creator and SolrServer? Give it a size of 10,000 or
> something, with one thread trying to feed it, and one thread waiting for it
> to get near full and then draining it. Take the drained results and add them
> to the server (maybe try not using StreamingUpdateSolrServer). Something like
> that worked well for me with about 5,000,000 documents, each ~5 KB, taking
> about 8 hours.
>
> -Kallin Nagelberg
>
> -----Original Message-----
> From: Thijs [mailto:vonk.th...@gmail.com]
> Sent: Thursday, May 20, 2010 11:02 AM
> To: solr-user@lucene.apache.org
> Subject: Machine utilization while indexing
>
> Hi,
>
> I have a question about how I can get Solr to index quicker than it does
> at the moment.
>
> I have to index (and re-index) some 3-5 million documents. These
> documents are preprocessed by a Java application that effectively
> combines multiple database tables with each other to form the
> SolrInputDocument.
>
> What I'm seeing, however, is that the queue of documents that are ready
> to be sent to the Solr server exceeds my preset limit, telling me that
> Solr somehow can't process the documents fast enough.
> (I have created my own queue in front of SolrJ's StreamingUpdateSolrServer,
> as it would not process the documents fast enough, causing
> OutOfMemoryErrors due to the large number of documents building up
> in its queue.)
>
> I have an index that consists for 95% of IDs (Long). We don't do any
> analysis on the fields that are being indexed. The schema is rather
> straightforward.
>
> Most fields look like:
>
> <fieldType name="long" class="solr.LongField" omitNorms="true"/>
> <field name="objectId" type="long" stored="true" indexed="true"
>        required="true"/>
> <field name="listId" type="long" stored="false" indexed="true"
>        multiValued="true"/>
>
> The relevant solrconfig.xml:
>
> <indexDefaults>
>   <useCompoundFile>false</useCompoundFile>
>   <mergeFactor>100</mergeFactor>
>   <RAMBufferSizeMB>256</RAMBufferSizeMB>
>   <maxMergeDocs>2147483647</maxMergeDocs>
>   <maxFieldLength>10000</maxFieldLength>
>   <writeLockTimeout>1000</writeLockTimeout>
>   <commitLockTimeout>10000</commitLockTimeout>
>   <lockType>single</lockType>
> </indexDefaults>
>
> The machines I'm testing on have an
> Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
> with 4 GB of RAM, running Linux, Java 1.6.0_17, Tomcat 6, and Solr 1.4.
>
> What I'm seeing is that the network almost never reaches more than 10%
> of the 1 Gb/s connection, that CPU utilization is always below 25%
> (1 core is used, not the others), and that I don't see heavy disk I/O.
> Also, while indexing, the memory consumption is:
> Free memory: 212.15 MB  Total memory: 509.12 MB  Max memory: 2730.68 MB
>
> In the beginning (with an empty index) I get 2 ms per insert, but this
> slows to 18-19 ms per insert.
>
> Are there any tips/tricks I can use to speed up my indexing? I have a
> feeling that my machine is capable of doing more (using more CPUs); I
> just can't figure out how.
>
> Thijs
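For reference, the bounded-queue pattern suggested earlier in this thread (one producer feeding a BlockingQueue, one consumer draining it in batches) can be sketched with just the JDK. This is a minimal illustration, not the poster's actual loader: the `indexBatch()` method here is a hypothetical stand-in for the real SolrJ call (e.g. `solrServer.add(batch)`), and the capacity/batch numbers are the ones floated in the thread, not tuned values. Because `put()` blocks when the queue is full, document creation throttles itself automatically, which avoids the unbounded buildup that caused the OutOfMemoryErrors described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class QueuedLoader {
    static final int QUEUE_CAPACITY = 10000; // size suggested in the thread
    static final int BATCH_SIZE = 1000;      // docs sent to the server per call

    final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(QUEUE_CAPACITY);
    final AtomicInteger indexed = new AtomicInteger();

    // Producer: blocks automatically when the queue is full, so memory
    // stays bounded no matter how fast documents are created.
    void produce(int docCount) throws InterruptedException {
        for (int i = 0; i < docCount; i++) {
            queue.put("doc-" + i); // blocks while the queue is at capacity
        }
    }

    // Consumer: wait for at least one document, drain up to a full batch,
    // and hand the batch to the indexer in one call.
    void consume(int docCount) throws InterruptedException {
        List<String> batch = new ArrayList<String>(BATCH_SIZE);
        while (indexed.get() < docCount) {
            batch.add(queue.take());              // blocks until a doc arrives
            queue.drainTo(batch, BATCH_SIZE - 1); // grab whatever else is ready
            indexBatch(batch);
            batch.clear();
        }
    }

    // Hypothetical stand-in for solrServer.add(batch).
    void indexBatch(List<String> batch) {
        indexed.addAndGet(batch.size());
    }

    public static void main(String[] args) throws Exception {
        final QueuedLoader loader = new QueuedLoader();
        final int docs = 50000;
        Thread producer = new Thread(new Runnable() {
            public void run() {
                try { loader.produce(docs); } catch (InterruptedException ignored) {}
            }
        });
        Thread consumer = new Thread(new Runnable() {
            public void run() {
                try { loader.consume(docs); } catch (InterruptedException ignored) {}
            }
        });
        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        System.out.println("indexed=" + loader.indexed.get());
    }
}
```

To use more than one core on the Solr side, the same structure extends to several consumer threads, each draining its own batches from the shared queue; batching the `add()` calls also amortizes the per-request HTTP overhead that a document-at-a-time loader pays.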