How much data are you ingesting per minute or per second? How many sources are we dealing with here? What kind of channel are you using currently, and what is the memory/storage footprint on the source as well as the sink? Is the traffic uniformly distributed? If not, what is the maximum peak throughput you expect from a given source?
On Thu, Mar 27, 2014 at 11:07 AM, Andrew Ehrlich <[email protected]> wrote:

> What about having more than one flume agent?
>
> You could have two agents that read the small messages and sink to HDFS,
> or two agents that read the messages, serialize them, and send them to a
> third agent which sinks them into HDFS.
>
>
> On Thu, Mar 27, 2014 at 9:43 AM, Chris Schneider <[email protected]> wrote:
>
>> I have a fair bit of data continually being created in the form of
>> smallish messages (a few hundred bytes), which needs to enter flume and
>> eventually sink into HDFS.
>>
>> I need to be sure that the data lands in persistent storage and won't be
>> lost, but otherwise throughput isn't important. It just needs to be fast
>> enough not to back up.
>>
>> I'm running into a bottleneck in the initial ingestion of data.
>>
>> I've tried the netcat source and the thrift source, but both have capped
>> out at a thousand or so records per second.
>>
>> Batching up the thrift API items into sets of 10 and using appendBatch is
>> a pretty large speedup, but still not enough.
>>
>> Here's a gist of my ruby test script, some example runs, and my config:
>>
>> https://gist.github.com/cschneid/9792305
>>
>> 1. Are there any obvious performance changes I can make to speed up
>>    ingestion?
>> 2. How fast can flume reasonably go? Should I switch my source to
>>    something else that's faster? What?
>> 3. Is there a better tool for this kind of task? (rapid, safe ingestion
>>    of small messages)
>>
>> Thanks!
>> Chris
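For reference, a minimal sketch of the appendBatch approach using the Flume Java client SDK (flume-ng-sdk), assuming the agent exposes a thrift source; the host, port, payloads, and batch size of 100 are placeholders, not values from the gist, and real code would add retry handling around the batch call.

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class BatchedThriftSender {
    public static void main(String[] args) throws EventDeliveryException {
        // Placeholder host/port; point this at the agent's thrift source.
        RpcClient client = RpcClientFactory.getThriftInstance("localhost", 4444);
        try {
            List<Event> batch = new ArrayList<Event>();
            for (int i = 0; i < 100; i++) {
                // Smallish payloads, a few hundred bytes each in practice.
                batch.add(EventBuilder.withBody(
                        ("message " + i).getBytes(StandardCharsets.UTF_8)));
            }
            // One RPC round trip for the whole batch instead of 100 appends.
            client.appendBatch(batch);
        } finally {
            client.close();
        }
    }
}

Note that batching only amortizes the per-RPC overhead on the ingestion path; the durability guarantee still comes from the channel on the agent side (e.g. a file channel rather than a memory channel), which is why the channel and its storage footprint matter here.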
