RE: DIH parallel processing
This is also what I have done, but I agree with the idea of using something external to load the data.

-----Original Message-----
From: Dyer, James [mailto:james.d...@ingramcontent.com]
Sent: Thursday, October 15, 2015 9:24 AM
To: solr-user@lucene.apache.org
Subject: RE: DIH parallel processing

Nabil,

What we do is have multiple DIH request handlers configured in solrconfig.xml. Then in the SQL query we put something like "where mod(id, ${partition})=0". Then an external script calls a full import on each request handler at the same time and monitors the response. This isn't the most elegant solution, but it gets around the fact that DIH is single-threaded.

James Dyer
Ingram Content Group

-----Original Message-----
From: nabil Kouici [mailto:koui...@yahoo.fr]
Sent: Thursday, October 15, 2015 3:58 AM
To: Solr-user
Subject: DIH parallel processing

Hi All,

I'm using DIH to index more than 15M records from SQL Server into Solr. This takes more than 2 hours, and a large share of that time is spent fetching data from the database. I'm thinking about a way to run parallel (threaded) loads within the same DIH, with each thread loading part of the data. Do you have any experience with this kind of situation?

Regards,
Nabil.
RE: DIH parallel processing
Nabil,

What we do is have multiple DIH request handlers configured in solrconfig.xml. Then in the SQL query we put something like "where mod(id, ${partition})=0". Then an external script calls a full import on each request handler at the same time and monitors the response. This isn't the most elegant solution, but it gets around the fact that DIH is single-threaded.

James Dyer
Ingram Content Group
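A minimal sketch of the external script described in the message above, written in Python. It assumes each DIH handler accepts `command=full-import` and `command=status` and reports `"status": "busy"` while an import is running; the base URL and the handler names (`dataimport0` and so on) are hypothetical. `clean=false` is passed on the assumption that all handlers write to the same core, since a full import otherwise clears the index first. For the partitions not to overlap, each handler's query would presumably need a different remainder, for example `mod(id, 4) = 0` on one handler and `mod(id, 4) = 1` on the next.

```python
import json
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_get_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def handler_url(base_url, handler, **params):
    """Build a request URL for one DIH handler, asking for JSON output."""
    query = urllib.parse.urlencode({"wt": "json", **params})
    return f"{base_url}/{handler}?{query}"


def run_import(base_url, handler, fetch=http_get_json, poll_seconds=5.0):
    """Start a full import on one handler, then poll until it is no longer busy."""
    # clean=false: assumed necessary so one handler does not wipe
    # the documents indexed by the others.
    fetch(handler_url(base_url, handler, command="full-import", clean="false"))
    while True:
        status = fetch(handler_url(base_url, handler, command="status"))
        if status.get("status") != "busy":
            return status
        time.sleep(poll_seconds)


def run_all(base_url, handlers, fetch=http_get_json, poll_seconds=5.0):
    """Run every handler's import at the same time; return the final status of each."""
    with ThreadPoolExecutor(max_workers=len(handlers)) as pool:
        futures = {
            h: pool.submit(run_import, base_url, h, fetch, poll_seconds)
            for h in handlers
        }
        return {h: f.result() for h, f in futures.items()}


# Example call (hypothetical core and handler names):
# run_all("http://localhost:8983/solr/mycore",
#         ["dataimport0", "dataimport1", "dataimport2", "dataimport3"])
```

One thread per handler is enough here because each thread only waits on HTTP responses; the actual import work happens inside Solr.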
Re: DIH parallel processing
Hi Nabil,

Although very convenient for database imports, DIH is single-threaded and difficult to optimise for performance. There is a batchSize parameter that you may try adjusting to see if that helps.

However, we generally avoid the DIH and roll our own indexers using Python or Java, reading the database using SQL (easy in either language) and then posting directly to Solr. This gives us a lot more flexibility in terms of conditioning the data, multi-threading and batching Solr updates.

There are lots of great examples of high-performance indexing code available, e.g.:
http://bryanbende.com/development/2014/08/16/indexing-wikipedia-with-apache-solr/

Best

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.flax.co.uk
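The "roll your own indexer" approach from the message above might look roughly like this in Python. This is only a sketch under stated assumptions, not the code Flax uses: `conn` is any DB-API connection, the update URL is hypothetical, and documents are posted as a JSON array to Solr's update handler. The function names are made up for illustration.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_batches(conn, sql, batch_size=1000):
    """Yield lists of row dicts read from a DB-API connection, batch_size rows at a time."""
    cur = conn.cursor()
    cur.execute(sql)
    columns = [d[0] for d in cur.description]
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield [dict(zip(columns, row)) for row in rows]


def post_batch(update_url, docs):
    """POST one batch of documents to Solr as a JSON array; return the batch size."""
    req = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return len(docs)


def index_all(conn, sql, update_url, post=post_batch, workers=4, batch_size=1000):
    """Read rows on the calling thread and post batches from a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Note: this queues every batch up front. For a very large table,
        # a real indexer would cap the number of batches in flight.
        futures = [
            pool.submit(post, update_url, batch)
            for batch in fetch_batches(conn, sql, batch_size)
        ]
        return sum(f.result() for f in futures)


# Example call (hypothetical URL, table and columns):
# index_all(conn, "SELECT id, title FROM items",
#           "http://localhost:8983/solr/mycore/update?commit=true")
```

Reading on one thread and posting from several keeps the database cursor single-threaded while still overlapping the Solr requests; batch size and worker count are the knobs to tune.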
DIH parallel processing
Hi All,

I'm using DIH to index more than 15M records from SQL Server into Solr. This takes more than 2 hours, and a large share of that time is spent fetching data from the database. I'm thinking about a way to run parallel (threaded) loads within the same DIH, with each thread loading part of the data. Do you have any experience with this kind of situation?

Regards,
Nabil.