On Fri, Aug 01, 2008 at 03:36:13PM -0400, Ian Connor wrote:
> I have a number of documents in files:
>
>   1.xml    <add><doc><fields....></doc></add>
>   2.xml    <add><doc><fields....></doc></add>
>   ...
>   17M.xml  <add><doc><fields....></doc></add>
>
> I have been using cat to join them all together:
>
>   cat 1.xml 2.xml ... 1000.xml | grep -v '<\/add><add>' > /tmp/post.xml
>
> and posting them with curl:
>
>   curl -d @/tmp/post.xml 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml'
>
> Is there a faster way to load up these documents into a number of Solr
> shards? I seem to be able to cover 3000/second just catting them
> together (2500 at a time is the sweet spot for me) - but this slows
> down to under 100/s once I try to do the post with curl.
If the XML files are available locally on the machine where the Solr instances live, you can instead tell Solr to load the file from disk rather than transmitting it over HTTP. You have to set enableRemoteStreaming="true" in solrconfig.xml, and then your curl request would, I think, be:

  curl -d stream.file=/tmp/post.xml http://localhost:8983/solr/update

A similar approach works pretty well for me.

enjoy,

-jeremy

--
========================================================================
Jeremy Hinegardner                                    [EMAIL PROTECTED]
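A minimal sketch of the merge-then-stream flow discussed above, using two hypothetical stand-in files in place of 1.xml..17M.xml. The sed join is a variant of the grep trick from the original post, and the curl line assumes a Solr instance at localhost:8983 with enableRemoteStreaming="true" already set in solrconfig.xml (it is commented out here since it needs a live server):

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Stand-in batch files, one <add> block each. printf emits no trailing
# newline, so after cat the </add><add> seams sit on a single line.
printf '<add><doc><field name="id">1</field></doc></add>' > 1.xml
printf '<add><doc><field name="id">2</field></doc></add>' > 2.xml

# Merge into one <add>...</add> block by deleting the inner seams.
cat 1.xml 2.xml | sed 's|</add><add>||g' > /tmp/post.xml

# With a live Solr instance, have it stream the merged file from local
# disk instead of POSTing the body over HTTP:
# curl 'http://localhost:8983/solr/update?stream.file=/tmp/post.xml&commit=true'
```

Streaming from disk avoids pushing the 17M documents through the HTTP request body, which is where the slowdown to under 100 docs/s was happening.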