9m * 15 = 135m queries in ~80 hours - that's a lot of queries (>400 QPS). I would try to reduce the number of queries:
1. Rewrite your main (root) query to select all possible data:
   * use SQL joins instead of DIH nested entities
   * select data from 1-N related tables (tags, authors, etc.) in the main
     query using the GROUP_CONCAT aggregate function (that's a MySQL-specific
     function, but there are similar functions for other RDBMSes) and then
     split the concatenated data in a DIH transformer.
2. Identify small tables in nested entities and cache them completely with
   CachedSqlEntityProcessor.

On Wed, Aug 8, 2012 at 10:35 AM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
> Hello,
>
> Does your indexer utilize CPU/IO? - check it by iostat/vmstat.
> If it doesn't, take several thread dumps by jvisualvm sampler or jstack,
> try to understand what blocks your threads from progress.
> It might happen you need to speedup your SQL data consumption, to do this,
> you can enable threads in DIH (only in 3.6.1), move from N+1 SQL queries to
> select all/cache approach
> http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor and
> https://issues.apache.org/jira/browse/SOLR-2382
>
> Good luck
>
> On Wed, Aug 8, 2012 at 9:16 AM, Pranav Prakash <pra...@gmail.com> wrote:
>
>> Folks,
>>
>> My full data import takes ~80hrs. It has around ~9m documents and ~15 SQL
>> queries for each document. The database servers are different from Solr
>> Servers. Each document has an update processor chain which (a) calculates
>> signature of the document using SignatureUpdateProcessorFactory and (b)
>> Finds out terms which have term frequency > 2; using a custom processor.
>> The index size is ~ 480GiB
>>
>> I want to know if the amount of time taken is too large compared to the
>> document count? How do I benchmark the stats and what are some of the ways
>> I can improve this? I believe there are some optimizations that I could do
>> at Update Processor Factory level as well. What would be a good way to get
>> dirty on this?
>>
>> *Pranav Prakash*
>>
>> "temet nosce"
>>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Tech Lead
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mkhlud...@griddynamics.com>
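
P.S. A rough sketch of the split step from suggestion 1 above, written in plain Python rather than as an actual DIH transformer (a real one would be Java or a script). The column names, the "|" separator and the helper name split_concatenated are made up for illustration. The idea: the root query returns one row per document with tags and authors already concatenated by GROUP_CONCAT, and the transformer turns each concatenated string back into a multi-valued field.

```python
# Sketch only: plain Python standing in for a DIH transformer step.
# The column names ("tags", "authors") and the "|" separator are assumptions.

SEPARATOR = "|"  # assumed to match GROUP_CONCAT(... SEPARATOR '|') in the root query


def split_concatenated(row, columns, separator=SEPARATOR):
    """Return a copy of row with each listed column split into a list of values."""
    result = dict(row)
    for column in columns:
        value = result.get(column)
        if value is None:
            # GROUP_CONCAT yields NULL when a document has no related rows
            result[column] = []
        else:
            # Drop empty pieces left by stray or trailing separators
            result[column] = [part for part in value.split(separator) if part]
    return result


# One row as the rewritten root query might return it (hypothetical data)
row = {"id": 42, "title": "Example", "tags": "solr|lucene|dih", "authors": None}
doc = split_concatenated(row, ["tags", "authors"])
print(doc)
```

In DIH itself the RegexTransformer's splitBy attribute on a field may already be enough for this kind of split. Also keep in mind that MySQL truncates GROUP_CONCAT results at group_concat_max_len (1024 bytes by default), so that setting probably needs raising for documents with many related rows.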