Hi Otis,

Flume was designed as a streaming event transport system, not as a general-purpose file transfer system. The two have quite different characteristics. Binary files could be transported by Flume, but if you tried to transport a 100MB PDF as a single event you may run into issues around memory allocation, GC, transfer speed, etc., since we hold at least one event at a time in memory. However, if you want to transfer a large log file and each line is an event, then it's a perfect use case, because you care about the individual events more than the file itself.
For transferring very large binary files that are not events or records, you may want to look for something that is good at being a single-hop system with resume capability, like rsync, to transfer the files. Then I suppose you could use the hadoop fs shell and a small script to store the data onto HDFS. You probably wouldn't need all the fancy tagging, routing, and serialization features that Flume has.

Hope this helps.

Regards,
Mike

On Sun, Oct 14, 2012 at 5:49 PM, Otis Gospodnetic <[email protected]> wrote:
> Hi,
>
> We're considering using Flume for transport of potentially large
> "documents" (think documents that can be as small as tweets or as large as
> PDF files).
>
> I'm wondering if Flume is suitable for transporting potentially large
> documents (in the most reliable mode, too) or if there is something
> inherent in Flume that makes it a poor choice for this use case?
>
> Thanks,
> Otis
> ----
> Performance Monitoring for Solr / ElasticSearch / HBase -
> http://sematext.com/spm
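[Editor's sketch of the rsync-then-HDFS idea above. All hostnames and paths are made-up placeholders, not anything from the thread; the commands are echoed rather than executed so the sketch is safe to paste as-is — drop the leading `echo` on each line to actually run it.]

```shell
#!/bin/sh
# Rough sketch: pull large binary files with rsync (resumable, single-hop),
# then load the staged copies into HDFS with the hadoop fs shell.

SRC="user@remotehost:/data/docs/"   # placeholder source
DEST_DIR="/tmp/staging"             # placeholder local staging directory
HDFS_DIR="/archive/docs"            # placeholder HDFS target

# --partial keeps partially transferred files so an interrupted copy can resume.
echo rsync -avz --partial "$SRC" "$DEST_DIR"

# Create the target directory in HDFS (if needed) and copy the files in.
echo hadoop fs -mkdir -p "$HDFS_DIR"
echo hadoop fs -put "$DEST_DIR"/* "$HDFS_DIR"
```

A wrapper like this could run from cron, giving you the transfer without any of Flume's event-level machinery.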
