Hi Mike,
You could try http://wiki.apache.org/solr/UpdateCSV and make sure you commit at the very end. (A rough sketch of what the handler call could look like follows, after the quoted message below.)

________________________________
From: Mike L. <javaone...@yahoo.com>
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Sent: Saturday, June 29, 2013 3:15 AM
Subject: FileDataSource vs JdbcDataSource (speed)

Solr 3.5

I've been working on improving index time with a JdbcDataSource DIH-based config and found it not to be as performant as I'd hoped, for various reasons not specifically due to Solr. With that said, I decided to switch gears a bit and test a FileDataSource setup. I assumed that by eliminating network latency I would see drastic improvements in import time, but I'm a bit surprised that this process seems to run much slower, at least the way I've initially coded it (below).

Below is a bare-bones file import that I wrote, which consumes a tab-delimited file. Nothing fancy here; the regex just separates out the fields. Is there a faster approach to doing this? If so, what is it?

Also, what is the "recommended" approach in terms of indexing/importing data? I know that may come across as a vague question, since there are various options available, but which one would be considered the "standard" approach within a production enterprise environment?

(The config below has been cleansed.)

<dataConfig>
  <dataSource name="file" type="FileDataSource" />
  <document>
    <entity name="entity1"
            processor="LineEntityProcessor"
            url="[location_of_file]/file.csv"
            dataSource="file"
            transformer="RegexTransformer,TemplateTransformer">
      <!-- Split each tab-delimited line into 22 named fields. -->
      <field column="rawLine"
             regex="^(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)$"
             groupNames="field1,field2,field3,field4,field5,field6,field7,field8,field9,field10,field11,field12,field13,field14,field15,field16,field17,field18,field19,field20,field21,field22" />
    </entity>
  </document>
</dataConfig>

Thanks in advance,
Mike
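
For what it's worth, here's a minimal sketch of how the CSV handler call could look for that tab-delimited file. Everything specific in it is an assumption on my part: the host/port/core (localhost:8983/solr), the handler path /update/csv from the example solrconfig.xml, the file name, and the placeholder field names.

# Hypothetical example: post a tab-delimited file to the CSV update handler.
# separator=%09 is a URL-encoded tab; header=false plus fieldnames=... because
# the file has no header row; commit=true commits once the upload finishes.
curl 'http://localhost:8983/solr/update/csv?separator=%09&header=false&commit=true&fieldnames=field1,field2,field3,field4,field5,field6,field7,field8,field9,field10,field11,field12,field13,field14,field15,field16,field17,field18,field19,field20,field21,field22' \
     -H 'Content-type: text/plain; charset=utf-8' \
     --data-binary @file.csv

If remote streaming is enabled in solrconfig.xml, you can also pass stream.file=/path/to/file.csv instead of posting the body, so Solr reads the file straight from local disk.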