
One issue is that DIH forces all the work onto the Solr
server, and single-threads it to boot. Consider moving to
a SolrJ model where N clients send data to Solr in parallel,
provided you can partition the data cleanly among those
N clients.
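
A rough sketch of the partitioning idea, for what it's worth. It uses
only the JDK: DocSink is a hypothetical stand-in for whatever SolrJ
client you end up using (a real one would build SolrInputDocuments and
call the client's add method), and the class and method names are made
up for illustration. Lines are dealt out round-robin to N workers; each
worker splits its lines on tabs and sends them in batches. The single
commit at the very end is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class PartitionedIndexer {

    /** Hypothetical stand-in for a SolrJ client; not a real SolrJ type. */
    interface DocSink {
        void send(List<String[]> batch) throws Exception;
    }

    /** Splits one line on the tab character; the -1 limit keeps trailing empty fields. */
    static String[] parseLine(String line) {
        return line.split("\t", -1);
    }

    /** Round-robin partition: line i goes to worker i % n. */
    static List<List<String>> partition(List<String> lines, int n) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < lines.size(); i++) {
            parts.get(i % n).add(lines.get(i));
        }
        return parts;
    }

    /** Runs n workers in parallel and returns the total number of rows sent. */
    static int indexAll(List<String> lines, int n, int batchSize, DocSink sink)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        AtomicInteger sent = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        for (List<String> part : partition(lines, n)) {
            futures.add(pool.submit(() -> {
                List<String[]> batch = new ArrayList<>();
                for (String line : part) {
                    batch.add(parseLine(line));
                    if (batch.size() == batchSize) {
                        sink.send(batch);
                        sent.addAndGet(batch.size());
                        batch = new ArrayList<>();
                    }
                }
                if (!batch.isEmpty()) {      // flush the final partial batch
                    sink.send(batch);
                    sent.addAndGet(batch.size());
                }
                return null;                 // Callable, so send() may throw
            }));
        }
        pool.shutdown();
        for (Future<?> f : futures) {
            f.get();                         // propagate any worker failure
        }
        return sent.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = Arrays.asList("1\ta\tb", "2\tc\td", "3\te\tf");
        // The sink here only prints batch sizes; swap in a real client call.
        int total = indexAll(lines, 2, 2,
                batch -> System.out.println("batch of " + batch.size()));
        System.out.println("rows sent: " + total);
    }
}
```

Whether round-robin is a "clean" partition depends on your data; if
rows for the same document can appear on several lines, you would want
to partition by key instead.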


On Sat, Jun 29, 2013 at 8:20 AM, Ahmet Arslan <iori...@yahoo.com> wrote:

> Hi Mike,
> You could try http://wiki.apache.org/solr/UpdateCSV
> And make sure you commit at the very end.
> ________________________________
>  From: Mike L. <javaone...@yahoo.com>
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Sent: Saturday, June 29, 2013 3:15 AM
> Subject: FileDataSource vs JdbcDataSouce (speed) Solr 3.5
> I've been working on improving index time with a JdbcDataSource DIH based
> config and found it not to be as performant as I'd hoped for, for various
> reasons, not specifically due to solr. With that said, I decided to switch
> gears a bit and test out a FileDataSource setup... I assumed that by eliminating
> network latency, I should see drastic improvements in terms of import
> time..but I'm a bit surprised that this process seems to run much slower,
> at least the way I've initially coded it. (below)
> The below is a barebone file import that I wrote which consumes a tab
> delimited file. Nothing fancy here. The regex just separates out the
> fields... Is there a faster approach to doing this? If so, what is it?
> Also, what is the "recommended" approach in terms of index/importing data?
> I know that may come across as a vague question as there are various
> options available, but which one would be considered the "standard"
> approach within a production enterprise environment.
> (below has been cleansed)
> <dataConfig>
>      <dataSource name="file" type="FileDataSource" />
>    <document>
>          <entity name="entity1"
>                  processor="LineEntityProcessor"
>                  url="[location_of_file]/file.csv"
>                  dataSource="file"
>                  transformer="RegexTransformer,TemplateTransformer">
>              <field column="rawLine"
>                     regex="^(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)$"
>                     groupNames="field1,field2,field3,field4,field5,field6,field7,field8,field9,field10,field11,field12,field13,field14,field15,field16,field17,field18,field19,field20,field21,field22"
>              />
>          </entity>
>    </document>
> </dataConfig>
> Thanks in advance,
> Mike
