9m * 15 = 135m queries - that's a lot. Spread over ~80 hours it works out
to more than 400 queries per second, sustained.

I would try to reduce the number of queries:

1. Rewrite your main (root) query to select as much of the data as possible:
* use SQL joins instead of DIH nested entities
* select data from 1-N related tables (tags, authors, etc.) in the main
query using the GROUP_CONCAT aggregate function (it is MySQL-specific,
but other RDBMSes have similar functions), and then split the
concatenated data in a DIH transformer.

2. Identify the small tables used in nested entities and cache them
completely with CachedSqlEntityProcessor.
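
To illustrate the GROUP_CONCAT idea from 1., here is a minimal sketch in
Python. It uses an in-memory SQLite database only because SQLite also has a
group_concat function; the docs/tags tables, the '|' separator and the
split_tags helper are all made up for the example. In DIH the split step
would be done by a transformer rather than by Python code.

```python
import sqlite3

# Hypothetical schema: a "docs" table and a 1-N "tags" table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (doc_id INTEGER, tag TEXT);
    INSERT INTO docs VALUES (1, 'first'), (2, 'second'), (3, 'untagged');
    INSERT INTO tags VALUES (1, 'solr'), (1, 'dih'), (2, 'mysql');
""")

# One query for all documents instead of one tag query per document.
# The '|' separator is an assumption: pick one that cannot occur in the data.
ROOT_QUERY = """
    SELECT d.id, d.title, GROUP_CONCAT(t.tag, '|') AS tags
    FROM docs d
    LEFT JOIN tags t ON t.doc_id = d.id
    GROUP BY d.id, d.title
    ORDER BY d.id
"""

def split_tags(row):
    """Stand-in for the transformer: split the concatenated column into a list."""
    doc_id, title, tags = row
    # Documents without tags come back as NULL (None) from the LEFT JOIN.
    return {"id": doc_id, "title": title, "tags": tags.split("|") if tags else []}

docs = [split_tags(row) for row in conn.execute(ROOT_QUERY)]
for doc in docs:
    print(doc)
```

The order of the values inside the concatenated string is not guaranteed
here, so don't rely on it unless your database lets you specify it.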

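And a rough sketch of what 2. buys you, again in Python with made-up tables
(categories, docs) and a made-up run() helper that logs every query. The
small table is read once into a dict, so each document costs a dictionary
lookup instead of another SQL round trip. This only mimics the idea behind
CachedSqlEntityProcessor; it is not how DIH implements it.

```python
import sqlite3

# Hypothetical schema: a small lookup table and the main documents table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, category_id INTEGER);
    INSERT INTO categories VALUES (1, 'news'), (2, 'blog');
    INSERT INTO docs VALUES (10, 'a', 1), (11, 'b', 2), (12, 'c', 1);
""")

query_log = []  # every SQL statement sent to the database

def run(sql, params=()):
    """Execute a query and record it, to make the saving visible."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()

# Load the small table once and keep it in memory, keyed by its id.
category_cache = dict(run("SELECT id, name FROM categories"))

# The root query runs once; each category lookup is a dict access, not SQL.
indexed = [
    {"id": doc_id, "title": title, "category": category_cache.get(cat_id)}
    for doc_id, title, cat_id in run(
        "SELECT id, title, category_id FROM docs ORDER BY id"
    )
]

print(len(query_log), "queries for", len(indexed), "documents")
```

With the N+1 approach the same three documents would need four queries
(one root query plus one lookup per document); the gap grows with the
document count.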


On Wed, Aug 8, 2012 at 10:35 AM, Mikhail Khludnev
<mkhlud...@griddynamics.com> wrote:
> Hello,
>
> Does your indexer actually utilize CPU/IO? Check with iostat/vmstat.
> If it doesn't, take several thread dumps with the jvisualvm sampler or
> jstack and try to understand what blocks your threads from making progress.
> You might need to speed up your SQL data consumption. To do this, you can
> enable threads in DIH (only in 3.6.1) or move from N+1 SQL queries to a
> select-all/cache approach:
> http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor and
> https://issues.apache.org/jira/browse/SOLR-2382
>
> Good luck
>
> On Wed, Aug 8, 2012 at 9:16 AM, Pranav Prakash <pra...@gmail.com> wrote:
>
>> Folks,
>>
>> My full data import takes ~80hrs. It has around ~9m documents and ~15 SQL
>> queries for each document. The database servers are different from Solr
>> Servers. Each document has an update processor chain which (a) calculates
>> signature of the document using SignatureUpdateProcessorFactory and (b)
>> Finds out terms which have term frequency > 2; using a custom processor.
>> The index size is ~ 480GiB
>>
>> I want to know if the amount of time taken is too large compared to the
>> document count? How do I benchmark the stats and what are some of the ways
>> I can improve this? I believe there are some optimizations that I could do
>> at Update Processor Factory level as well. What would be a good way to get
>> dirty on this?
>>
>> *Pranav Prakash*
>>
>> "temet nosce"
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Tech Lead
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  <mkhlud...@griddynamics.com>
