Hi,

I have a SolrCloud collection (Solr 4.4, writing to HDFS on CDH-5.3)
that is populated by a Flume Morphlines sink. The Flume agent reads
data from Kafka and writes to the Solr collection.

The issue is that the Solr indexing rate is abysmally poor across the
cluster (~6k docs/sec at best, dipping to a few hundred docs/sec), while
the incoming document rate is about 30-40k docs/sec.

I have tried two layouts: wide/thin, with 18 nodes (18 shards), each with
8GB of Java heap + 4GB of non-heap memory; and narrow/thick, the current
setup, with 5 dedicated nodes (5 shards), each with 36GB of Java heap +
16GB of non-heap memory.

On the Flume side, I have gone from 5 Flume instances with a single sink
each to 5 sinks per Flume instance. I have also tweaked batchSize and
batchDuration.

I checked the load on ZooKeeper and don't see it stressed; neither are the
datanodes. On the Solr nodes, Solr is consuming all of its allocated memory
(32GB), but I don't see it hitting any CPU limits.

*But* the indexing rate stubbornly stays at ~6k docs/sec. When I bounce the
Flume agent, it jumps momentarily to several hundred thousand docs/sec, but
then drops back to ~6k docs/sec, and the Flume channels get saturated within
seconds.
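
To put the gap in numbers, here is a rough back-of-the-envelope sketch in
Python. The rates are the ones quoted above; the channel capacity
(ASSUMED_CAPACITY) and the helper name seconds_until_full are hypothetical
and only there for illustration, not taken from my actual configuration.

```python
def seconds_until_full(capacity_events, incoming_per_sec, drain_per_sec):
    """Time for a channel to fill when events arrive faster than they drain."""
    deficit = incoming_per_sec - drain_per_sec
    if deficit <= 0:
        return float("inf")  # the sink keeps up, so the channel never fills
    return capacity_events / deficit


INDEX_RATE = 6_000                             # observed steady-state docs/sec
INCOMING_LOW, INCOMING_HIGH = 30_000, 40_000   # incoming docs/sec
ASSUMED_CAPACITY = 100_000                     # hypothetical channel capacity (events)

# Bounds on how long an empty channel of the assumed size takes to fill.
slowest_fill = seconds_until_full(ASSUMED_CAPACITY, INCOMING_LOW, INDEX_RATE)
fastest_fill = seconds_until_full(ASSUMED_CAPACITY, INCOMING_HIGH, INDEX_RATE)

# Average indexing rate per shard on the current 5-shard layout.
per_shard_rate = INDEX_RATE / 5

print(f"channel fills in {fastest_fill:.1f}-{slowest_fill:.1f} s")
print(f"per-shard indexing rate: {per_shard_rate:.0f} docs/sec")
```

With a deficit of 24-34k docs/sec, any channel of that order of size fills
in a few seconds, which matches what I see.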

Any clues/pointers for troubleshooting would be appreciated.


Thanks,

Tim
