Hello,

The scenario Juanfra is describing is related to the question I asked
a few days ago [1].

You cannot install Flume agents on the SNMP managed devices, and you
cannot modify any software on the SNMP managed devices to use the
Flume Client SDK (if I understand your idea correctly, Ashish). There
are two ways to collect SNMP data from SNMP devices using Flume (IMHO):

1. Create a custom application that launches the SNMP queries against
the thousands of devices and logs the answers into a file. Flume can
then follow this file with the "exec source" core plugin (tail); see
the sketch after this list.

2. Use a flume-snmp-source plugin (similar to [2]); in other words,
move the custom SNMP query application into a specialized Flume source
plugin.
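
For the first option, a minimal agent configuration could look like
the sketch below. The file paths, names and the sink are illustrative
assumptions; the only essential part is the exec source tailing the
file written by the SNMP polling application:

# The custom SNMP poller writes its answers to /var/log/snmp-poller.log
agent.sources = snmplog
agent.channels = ch1
agent.sinks = sink1

# Tail the log file produced by the external SNMP polling application
agent.sources.snmplog.type = exec
agent.sources.snmplog.command = tail -F /var/log/snmp-poller.log
agent.sources.snmplog.channels = ch1

agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 10000

# Any sink works here; a logger sink is used only for the example
agent.sinks.sink1.type = logger
agent.sinks.sink1.channel = ch1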

Juanfra is talking about a scenario like the second option. In that
case you have to handle a huge Flume configuration file, with an entry
for each managed device to query. For this situation I see two
possible solutions:

1. The flume-snmp-source plugin could take, as a parameter, a file
with the list of hosts to query:

agent.sources.source1.host = /path/to/list-of-host-file

However, I guess this breaks the philosophy and simplicity of the
other Flume core plugins.

2. Create a small program that fills in the Flume configuration file
from a template, or something similar.
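
As an illustration of the second solution, a little generator script
could expand a plain host list into the per-source entries. This is
only a sketch; the source type and property names below are placeholders
for illustration, not the real flume-snmp-source configuration:

#!/usr/bin/env python
# Sketch: expand hosts.txt (one host per line) into per-source entries
# for flume.conf. The SNMPSource class and property names are only
# placeholders, not the real plugin API.
names = []
with open('hosts.txt') as hosts, open('generated-flume.conf', 'w') as conf:
    for i, line in enumerate(hosts, start=1):
        host = line.strip()
        if not host:
            continue
        name = 'source%d' % i
        names.append(name)
        conf.write('agent.sources.%s.type = org.example.flume.SNMPSource\n' % name)
        conf.write('agent.sources.%s.host = %s\n' % (name, host))
        conf.write('agent.sources.%s.channels = ch1\n\n' % name)
    conf.write('agent.sources = %s\n' % ' '.join(names))

A configuration management tool such as Puppet could run something
like this and push the generated file to the agents.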


Any other ideas? I think this is a good discussion about a real-world use case.


[1] http://mail-archives.apache.org/mod_mbox/flume-user/201409.mbox/browser
[2] https://github.com/javiroman/flume-snmp-source

On Fri, Sep 5, 2014 at 4:56 AM, Ashish <[email protected]> wrote:
>
> Have a look at the Flume Client SDK. One simple way would be to use Flume
> client implementations to send Events to Flume Sources; this would
> significantly reduce the number of Sources you need to manage.
>
> HTH !
>
>
> On Thu, Sep 4, 2014 at 9:40 PM, JuanFra Rodriguez Cardoso 
> <[email protected]> wrote:
>>
>> Thanks Andrew for your quick response.
>>
>> My sources (server PDUs) can't push events to an aggregation point. For this
>> reason I'm following a PollingSource scheme, where my agent needs to be
>> configured with thousands of sources. Any clues for use cases where data is
>> ingested via a polling process?
>>
>> Regards!
>> ---
>> JuanFra Rodriguez Cardoso
>>
>>
>> 2014-09-04 17:41 GMT+02:00 Andrew Ehrlich <[email protected]>:
>>>
>>> One way to avoid managing so many sources would be to have an aggregation
>>> point between the data generators and the Flume sources. For example, maybe
>>> you could have the data generators put events into one or more message
>>> queues, then have Flume consume from there?
>>>
>>> Andrew
>>>
>>> ---- On Thu, 04 Sep 2014 08:29:04 -0700 JuanFra Rodriguez
>>> Cardoso <[email protected]> wrote ----
>>>
>>> Hi all:
>>>
>>> Considering an environment with thousands of sources, what are the best
>>> practices for managing the agent configuration (flume.conf)? Is it
>>> recommended to create a multi-layer topology where each agent takes control
>>> of a subset of sources?
>>>
>>> In that case, a configuration management server (such as Puppet) would be
>>> responsible for editing flume.conf with the 'agent.sources' parameters from
>>> source1 to source3000 (assuming we have 3000 source machines).
>>>
>>> Are my thoughts aligned with such large-scale data ingest scenarios?
>>>
>>> Thanks a lot!
>>> ---
>>> JuanFra
>>>
>>>
>>
>
>
>
> --
> thanks
> ashish
>
> Blog: http://www.ashishpaliwal.com/blog
> My Photo Galleries: http://www.pbase.com/ashishpaliwal
