Hi Tim,

Although I doubt Kafka is the problem, I'd look at that first and eliminate it.
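For example, a quick way to rule Kafka out is to run a bare consumer against the same topic and see how fast it drains with Flume and Solr out of the picture. A rough sketch using the old (0.8-style) high-level consumer; the topic name, ZooKeeper address, and group id below are placeholders:

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class KafkaDrainCheck {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("zookeeper.connect", "zkhost:2181");  // placeholder ZK quorum
    props.put("group.id", "drain-check");           // separate group so Flume's offsets are untouched
    props.put("auto.offset.reset", "smallest");
    ConsumerConnector consumer =
        Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

    // One stream on the topic Flume reads from (topic name is a placeholder)
    Map<String, List<KafkaStream<byte[], byte[]>>> streams =
        consumer.createMessageStreams(Collections.singletonMap("events", 1));
    ConsumerIterator<byte[], byte[]> it = streams.get("events").get(0).iterator();

    long count = 0;
    long start = System.currentTimeMillis();
    while (it.hasNext()) {
      it.next();  // just drain, no processing
      if (++count % 100000 == 0) {
        long secs = Math.max(1, (System.currentTimeMillis() - start) / 1000);
        System.out.println(count + " msgs, ~" + (count / secs) + " msgs/sec");
      }
    }
  }
}

If that on its own can't keep up with your 30-40k msgs/sec, the problem is upstream of Solr; if it drains far faster, move on down the pipeline.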
What about those Flume agents? How are they behaving in terms of CPU, GC, and such?
You have 18 Solr nodes... what happens if you increase the number of Flume sinks?
Are you seeing anything specific that makes you think the problem is on the Solr side?
Can you share charts that show your GC activity, disk IO, etc.? (You can share them easily with SPM <http://sematext.com/spm>, which may help others help you more easily.)

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Feb 3, 2015 at 7:47 PM, Tim Smith <secs...@gmail.com> wrote:

> Hi,
>
> I have a SolrCloud (Solr 4.4, writing to HDFS on CDH-5.3) collection
> configured to be populated by a Flume Morphlines sink. The Flume agent
> reads data from Kafka and writes to the Solr collection.
>
> The issue is that the Solr indexing rate is abysmally poor (~6k docs/sec
> at best, dipping to a few hundred per sec) across the cluster. The
> incoming data/document rate is about 30-40k/second.
>
> I have gone wide/thin with 18 nodes, each with 8GB (Java) + 4GB
> (non-heap) memory, and narrow/thick with the current set of 5 dedicated
> nodes, each with 36GB (Java) and 16GB (non-heap) memory (18 shards with
> the former config, 5 shards right now).
>
> On the Flume side, I have gone from 5 Flume instances, each with a
> single sink, to 5 sinks per Flume instance. I have tweaked batchSize and
> batchDuration.
>
> I checked ZooKeeper load and don't see it stressed. Neither are the
> datanodes. On the Solr nodes, Solr is consuming all the allocated memory
> (32GB) but I don't see Solr hitting any CPU limits.
>
> *But*, the indexing rate stubbornly stays at ~6k docs/sec. When I bounce
> the Flume agent, it jumps up momentarily to several hundred thousand but
> then comes down to ~6k/sec, and the Flume channels get saturated within
> seconds.
>
> Any clues/pointers for troubleshooting will be appreciated.
>
>
> Thanks,
>
> Tim
>
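A related check from the other end: index synthetic documents straight into the collection with SolrJ, bypassing Flume and Kafka entirely, to see what Solr itself can absorb. A rough sketch against Solr 4.x, with the ZooKeeper address, collection name, and text field name as placeholders to adjust for your schema:

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SolrIndexBench {
  public static void main(String[] args) throws Exception {
    // Placeholder ZK quorum and collection name; point these at the real cluster
    CloudSolrServer solr = new CloudSolrServer("zkhost:2181");
    solr.setDefaultCollection("collection1");

    long total = 0;
    long start = System.currentTimeMillis();
    while (total < 1000000) {
      // Batch the adds; per-document add() calls will badly understate throughput
      List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
      for (int i = 0; i < 1000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", UUID.randomUUID().toString());
        doc.addField("text_txt", "synthetic payload " + i);  // placeholder field name
        batch.add(doc);
      }
      solr.add(batch);
      total += batch.size();
      if (total % 100000 == 0) {
        long secs = Math.max(1, (System.currentTimeMillis() - start) / 1000);
        System.out.println(total + " docs, ~" + (total / secs) + " docs/sec");
      }
    }
    solr.commit();
    solr.shutdown();
  }
}

If this also tops out around ~6k docs/sec, look at the collection itself (schema, autoCommit/autoSoftCommit settings, the HDFS directory factory); if it runs much faster, the bottleneck is in Flume or between Flume and Kafka.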