Thanks Mike, I'll try that. So, since it's not CPU-bound, you would indeed think indexing here is IO-bound? (Maybe it generally is, I'm not sure.) Can anyone recommend a good tool for profiling IO on Windows?

2008/4/7, Mike Klaas <[EMAIL PROTECTED]>:
>
> On 5-Apr-08, at 7:09 AM, Britske wrote:
> > Indexing of these documents takes a long time. Because of the size of the
> > documents (because of the indexed fields) I am currently batching 50
> > documents at once, which takes about 2 seconds. Without adding the 10000
> > indexed fields to the document, indexing flies at about 15 ms for these
> > 50 documents. Indexing is done using SolrJ.
> >
> > This is on an Intel Core 2 6400 @ 2.13 GHz with 2 GB RAM.
> >
> > To speed this up I let 2 threads do the indexing in parallel. What happens
> > is that Solr just takes double the time (about 4 seconds) to complete these
> > two jobs of 50 docs each in parallel. I figured because of the multi-core
> > setup indexing should improve, which it doesn't.
>
> Multiple processors really only help indexing speeds when there is heavy
> analysis.
>
> > Does this perhaps indicate that the setup is IO-bound? What would be your
> > best guess (given the fact that the schema has a large number of indexed
> > fields) to try next to improve indexing performance?
>
> Use Lucene 2.3 with Solr 1.2, or simply try out Solr trunk. The indexing
> has been reworked to be considerably faster (it also makes better use of
> multiple processors by spawning a background merging thread).
>
> -Mike
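
For anyone who wants to repeat the two-thread experiment quoted above, here is a minimal sketch of a timing harness. It is only an illustration: `BatchTiming`, `timeSequential` and `timeParallel` are made-up names, not part of Solr or SolrJ, and the `Runnable` in `main` is a stand-in (a plain sleep) for whatever call actually sends one batch of 50 documents. If the parallel time comes out close to the sequential time, the batches are probably being serialized somewhere (disk, a lock, or a single-threaded stage). The harness cannot tell you which of those it is.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical timing harness; not part of Solr or SolrJ.
class BatchTiming {

    // Runs the jobs one after another and returns elapsed wall-clock milliseconds.
    static long timeSequential(List<Runnable> jobs) {
        long start = System.nanoTime();
        for (Runnable job : jobs) {
            job.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Runs all jobs concurrently, one thread per job, and returns elapsed
    // wall-clock milliseconds. Expects at least one job.
    static long timeParallel(List<Runnable> jobs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(jobs.size());
        try {
            long start = System.nanoTime();
            List<Future<?>> pending = new ArrayList<>();
            for (Runnable job : jobs) {
                pending.add(pool.submit(job));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion; rethrows a failure from the job
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for "add one batch of 50 documents"; replace the sleep
        // with the real indexing call to measure the actual setup.
        Runnable batch = () -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        List<Runnable> jobs = Arrays.asList(batch, batch);
        System.out.println("sequential ms: " + timeSequential(jobs));
        System.out.println("parallel ms:   " + timeParallel(jobs));
    }
}
```

With the sleep stand-in the parallel run should take roughly half as long as the sequential one, because the two jobs do not compete for anything. The interesting comparison is what happens once the stand-in is swapped for the real batch add.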