Hello, the scenario JuanFra is describing is related to the question I asked a few days ago [1].
You cannot install Flume agents on the SNMP managed devices, and you cannot modify any software on those devices in order to use the Flume Client SDK (if I understand your idea correctly, Ashish). IMHO, there are two ways to collect SNMP data from SNMP devices with Flume:

1. Create a custom application that runs the SNMP queries against the thousands of devices and logs the answers to a file. Flume can then tail this file with the "exec source" core plugin.

2. Use a flume-snmp-source plugin (similar to [2]); in other words, move the custom SNMP query application into a specialized Flume source plugin.

JuanFra is talking about a scenario like the second one. In that case you have to maintain a huge Flume configuration file, with an entry for each managed device to query. For this situation I see two possible solutions:

1. The flume-snmp-source plugin could take a file with the list of hosts to query as a parameter:

   agent.sources.source1.host = /path/to/list-of-host-file

   However, I guess this breaks the philosophy and simplicity of the other Flume core plugins.

2. Write a small program that fills in the Flume configuration file from a template, or something similar.

(I have appended rough sketches of these ideas at the end of this mail, below the quoted thread.)

Any other ideas? I think this is a good discussion about a real-world use case.

[1] http://mail-archives.apache.org/mod_mbox/flume-user/201409.mbox/browser
[2] https://github.com/javiroman/flume-snmp-source

On Fri, Sep 5, 2014 at 4:56 AM, Ashish <[email protected]> wrote:
>
> Have a look at the Flume Client SDK. One simple way would be to use the Flume
> client implementations to send events to Flume sources; this would
> significantly reduce the number of sources you need to manage.
>
> HTH !
>
>
> On Thu, Sep 4, 2014 at 9:40 PM, JuanFra Rodriguez Cardoso
> <[email protected]> wrote:
>>
>> Thanks Andrew for your quick response.
>>
>> My sources (server PUD) can't put events into an aggregation point. For this
>> reason I'm following a PollingSource schema where my agent needs to be
>> configured with thousands of sources. Any clues for use cases where data is
>> ingested through a polling process?
>>
>> Regards!
>> ---
>> JuanFra Rodriguez Cardoso
>>
>>
>> 2014-09-04 17:41 GMT+02:00 Andrew Ehrlich <[email protected]>:
>>>
>>> One way to avoid managing so many sources would be to have an aggregation
>>> point between the data generators and the Flume sources. For example, maybe
>>> you could have the data generators put events into a message queue (or
>>> queues), then have Flume consume from there?
>>>
>>> Andrew
>>>
>>> ---- On Thu, 04 Sep 2014 08:29:04 -0700 JuanFra Rodriguez
>>> Cardoso<[email protected]> wrote ----
>>>
>>> Hi all:
>>>
>>> Considering an environment with thousands of sources, what are the best
>>> practices for managing the agent configuration (flume.conf)? Is it
>>> recommended to create a multi-layer topology where each agent takes control
>>> of a subset of sources?
>>>
>>> In that case, a configuration management server (such as Puppet) would be
>>> responsible for editing flume.conf with the 'agent.sources' parameters from
>>> source1 to source3000 (assuming we have 3000 source machines).
>>>
>>> Are my thoughts aligned with such scenarios of large-scale data ingest?
>>>
>>> Thanks a lot!
>>> ---
>>> JuanFra
>>>
>>>
>>
>
>
>
> --
> thanks
> ashish
>
> Blog: http://www.ashishpaliwal.com/blog
> My Photo Galleries: http://www.pbase.com/ashishpaliwal
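
P.S. To make the discussion a bit more concrete, here is a rough flume.conf sketch of option 1 (an exec source tailing the log written by an external SNMP poller) and of the host-list parameter from solution 1. The log path is only an example, and the SNMP source type class and the 'hosts' property are hypothetical; they are not part of [2] as it stands today:

    # Option 1: an external poller writes SNMP answers to a log file,
    # and a plain exec source tails it (sink setup omitted for brevity).
    agent.sources = snmplog
    agent.channels = ch1
    agent.channels.ch1.type = memory
    agent.sources.snmplog.type = exec
    agent.sources.snmplog.command = tail -F /var/log/snmp-poller.log
    agent.sources.snmplog.channels = ch1

    # Solution 1 (hypothetical): a single SNMP source reading its targets
    # from a file, instead of one source entry per managed device.
    # agent.sources.snmp1.type = org.example.flume.source.SNMPSource
    # agent.sources.snmp1.hosts = /path/to/list-of-host-file
    # agent.sources.snmp1.channels = ch1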

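And for solution 2, a minimal sketch of the "little program" idea in Python, assuming all it has to do is expand a plain list of hosts into per-source entries of flume.conf (the file names, the source type class and the property names below are placeholders):

    #!/usr/bin/env python
    # Generate per-host Flume source entries from a plain list of SNMP hosts.
    # Reads one host per line from hosts.txt and writes flume-sources.conf.

    with open("hosts.txt") as f:
        hosts = [line.strip() for line in f if line.strip()]

    names = ["source%d" % (i + 1) for i in range(len(hosts))]

    with open("flume-sources.conf", "w") as out:
        out.write("agent.sources = %s\n" % " ".join(names))
        for name, host in zip(names, hosts):
            out.write("agent.sources.%s.type = org.example.flume.source.SNMPSource\n" % name)
            out.write("agent.sources.%s.host = %s\n" % (name, host))
            out.write("agent.sources.%s.channels = ch1\n" % name)

A configuration management tool like Puppet, as JuanFra mentioned, could run something like this and push the generated file out to the agents.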