One way to avoid managing so many sources would be to put an aggregation point 
between the data generators and the Flume sources. For example, maybe you could 
have the data generators put events into a message queue and then have Flume 
consume from there?
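
As a rough sketch of what that could look like, assuming ActiveMQ as the 
broker, a single agent could drain the queue with Flume's built-in JMS source 
(the broker URL, queue name, and agent/component names below are made up for 
illustration):

  # One agent consuming the shared queue instead of thousands of direct sources.
  a1.sources = jmsSrc
  a1.channels = memCh
  a1.sinks = logSink
  a1.channels.memCh.type = memory

  a1.sources.jmsSrc.type = jms
  a1.sources.jmsSrc.channels = memCh
  a1.sources.jmsSrc.initialContextFactory = org.apache.activemq.jndi.ActiveMQInitialContextFactory
  a1.sources.jmsSrc.connectionFactory = GenericConnectionFactory
  a1.sources.jmsSrc.providerURL = tcp://mq.example.com:61616
  a1.sources.jmsSrc.destinationName = INGEST_EVENTS
  a1.sources.jmsSrc.destinationType = QUEUE

  # Logger sink just for the sketch; in practice this would be HDFS or similar.
  a1.sinks.logSink.type = logger
  a1.sinks.logSink.channel = memCh

You'd still want more than one consuming agent for throughput and failover, 
but the point is that flume.conf stays small no matter how many generators 
there are.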

Andrew

---- On Thu, 04 Sep 2014 08:29:04 -0700 JuanFra Rodriguez 
Cardoso <[email protected]> wrote ---- 


Hi all:

Considering an environment with thousands of sources, what are the best 
practices for managing the agent configuration (flume.conf)? Is it recommended 
to create a multi-layer topology where each agent takes control of a subset of 
the sources?
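
To make the question concrete, I imagine a two-tier layout roughly like the 
following (agent names, hosts, and ports are only illustrative):

  # Tier 1: one of many collector agents running near the data generators.
  tier1.sources = localSrc
  tier1.channels = ch1
  tier1.sinks = toAgg
  tier1.channels.ch1.type = memory
  tier1.sources.localSrc.type = exec
  tier1.sources.localSrc.command = tail -F /var/log/app.log
  tier1.sources.localSrc.channels = ch1
  tier1.sinks.toAgg.type = avro
  tier1.sinks.toAgg.hostname = aggregator.example.com
  tier1.sinks.toAgg.port = 4545
  tier1.sinks.toAgg.channel = ch1

  # Tier 2: an aggregator agent fanning in from all tier-1 agents over Avro RPC.
  agg.sources = fromTier1
  agg.channels = ch1
  agg.sinks = hdfsSink
  agg.channels.ch1.type = memory
  agg.sources.fromTier1.type = avro
  agg.sources.fromTier1.bind = 0.0.0.0
  agg.sources.fromTier1.port = 4545
  agg.sources.fromTier1.channels = ch1
  agg.sinks.hdfsSink.type = hdfs
  agg.sinks.hdfsSink.hdfs.path = /flume/events
  agg.sinks.hdfsSink.channel = ch1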
 
In that case, a configuration management server (such as Puppet) would be 
responsible for generating flume.conf with 'agent.sources' parameters from 
source1 to source3000 (assuming we have 3000 source machines).
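
In other words, each agent's generated file would look something like this 
trimmed-down sketch (source names and ports are invented, and only the first 
few of the agent's entries are shown):

  # Puppet-generated flume.conf for the agent owning one subset of sources.
  agent.sources = source1 source2 source3
  agent.channels = memCh
  agent.channels.memCh.type = memory

  agent.sources.source1.type = netcat
  agent.sources.source1.bind = 0.0.0.0
  agent.sources.source1.port = 44441
  agent.sources.source1.channels = memCh
  # ...and so on for every source assigned to this agent.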

Are my thoughts aligned with these large-scale data ingest scenarios?
 
Thanks a lot!
---
JuanFra