Mike: One issue is that you're forcing all the work onto the Solr server, and single-threading to boot by using DIH. You can consider moving to a SolrJ model where you can have N clients sending data to Solr if you can partition the data up amongst the N clients cleanly.
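A minimal sketch of the N-client idea, not a definitive implementation. It assumes the whole file fits in memory as a list of lines and that a round-robin split counts as a clean partition. `ParallelIndexSketch`, `BatchSender`, `partition`, and `indexAll` are hypothetical names for illustration. In a real setup, `BatchSender.send` would turn each row into a `SolrInputDocument` and call `add` on a SolrJ client; that part is left out so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexSketch {

    /** Hypothetical stand-in for one SolrJ client; not a SolrJ type. */
    public interface BatchSender {
        void send(List<String[]> batch);
    }

    /** Round-robin partition: line i goes to partition i % n. */
    public static List<List<String>> partition(List<String> lines, int n) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < lines.size(); i++) {
            parts.get(i % n).add(lines.get(i));
        }
        return parts;
    }

    /** Runs one worker per partition; returns the number of rows handed to the sender. */
    public static int indexAll(List<String> lines, int clients, int batchSize, BatchSender sender) {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        AtomicInteger sent = new AtomicInteger();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (List<String> part : partition(lines, clients)) {
                pending.add(pool.submit(() -> {
                    List<String[]> batch = new ArrayList<>();
                    for (String line : part) {
                        // Split on literal tabs; limit -1 keeps trailing empty columns.
                        batch.add(line.split("\t", -1));
                        if (batch.size() == batchSize) {
                            sender.send(batch);
                            sent.addAndGet(batch.size());
                            batch = new ArrayList<>();
                        }
                    }
                    if (!batch.isEmpty()) {
                        sender.send(batch);
                        sent.addAndGet(batch.size());
                    }
                }));
            }
            // Wait for every worker and surface any failure.
            for (Future<?> f : pending) {
                f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return sent.get();
    }
}
```

The sender is shared across workers here, so it has to be thread-safe; giving each worker its own client is the other option. A single commit after `indexAll` returns would replace per-batch commits.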
FWIW,
Erick

On Sat, Jun 29, 2013 at 8:20 AM, Ahmet Arslan <iori...@yahoo.com> wrote:
> Hi Mike,
>
> You could try http://wiki.apache.org/solr/UpdateCSV
>
> And make sure you commit at the very end.
>
> ________________________________
> From: Mike L. <javaone...@yahoo.com>
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Sent: Saturday, June 29, 2013 3:15 AM
> Subject: FileDataSource vs JdbcDataSouce (speed) Solr 3.5
>
> I've been working on improving index time with a JdbcDataSource DIH-based
> config and found it not to be as performant as I'd hoped, for various
> reasons not specifically due to Solr. With that said, I decided to switch
> gears a bit and test out a FileDataSource setup. I assumed that by
> eliminating network latency I should see drastic improvements in import
> time, but I'm a bit surprised that this process seems to run much slower,
> at least the way I've initially coded it (below).
>
> The below is a barebones file import that I wrote which consumes a
> tab-delimited file. Nothing fancy here. The regex just separates out the
> fields. Is there a faster approach to doing this? If so, what is it?
>
> Also, what is the "recommended" approach for indexing/importing data?
> I know that may come across as a vague question, as there are various
> options available, but which one would be considered the "standard"
> approach within a production enterprise environment?
>
> (below has been cleansed)
>
> <dataConfig>
>   <dataSource name="file" type="FileDataSource" />
>   <document>
>     <entity name="entity1"
>             processor="LineEntityProcessor"
>             url="[location_of_file]/file.csv"
>             dataSource="file"
>             transformer="RegexTransformer,TemplateTransformer">
>       <field column="rawLine"
>              regex="^(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)$"
>              groupNames="field1,field2,field3,field4,field5,field6,field7,field8,field9,field10,field11,field12,field13,field14,field15,field16,field17,field18,field19,field20,field21,field22"
>       />
>     </entity>
>   </document>
> </dataConfig>
>
> Thanks in advance,
> Mike
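
On the regex in the config above: a pattern built from one greedy `(.*)` group per column has to backtrack on every line before the groups settle on the right tabs, which may account for part of the slowdown. That is a guess, not something measured here. As a rough sketch, a client-side parser could split on the tab character directly. `TabSplitSketch`, `splitColumns`, and `regexColumns` are hypothetical names, and the regex is shortened to three columns for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TabSplitSketch {

    // Same shape as the DIH regex, shortened to three columns for illustration.
    private static final Pattern THREE_COLUMNS = Pattern.compile("^(.*)\t(.*)\t(.*)$");

    /** Split on literal tabs; limit -1 keeps trailing empty columns. */
    public static String[] splitColumns(String rawLine) {
        return rawLine.split("\t", -1);
    }

    /** Regex-based equivalent; returns null when the line does not have three columns. */
    public static String[] regexColumns(String rawLine) {
        Matcher m = THREE_COLUMNS.matcher(rawLine);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }
}
```

For a line with exactly two tabs both methods return the same three fields, so the split can replace the regex without changing the field mapping.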