Hi

Based on our observations of our production Flume setup:

We have seen the file roll sink deliver almost 1% more events per day than
the HDFS sink.
(We have a replicating setup, with a separate
file channel for each sink.)

Configuration:
========
Flume version: 1.3.1
Flume topology: 30 first-tier machines and 3 second-tier machines (which deliver to HDFS and the local file system)
HDFS compression codec: lzop
Channels: a file channel for every source-sink pair (see the sketch below)
Hadoop version: 1.0.3 (Apache Hadoop)
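
For reference, a second-tier agent in this kind of setup looks roughly like
the following. This is only an illustrative sketch, not our actual config;
all agent, source, channel, and sink names and paths are made up:

agent.sources = avroSrc
agent.channels = hdfsChannel rollChannel
agent.sinks = hdfsSink fileRollSink

# Replicating selector (the Flume default) copies each event to both channels
agent.sources.avroSrc.type = avro
agent.sources.avroSrc.bind = 0.0.0.0
agent.sources.avroSrc.port = 4141
agent.sources.avroSrc.selector.type = replicating
agent.sources.avroSrc.channels = hdfsChannel rollChannel

# A dedicated file channel per source-sink pair
agent.channels.hdfsChannel.type = file
agent.channels.hdfsChannel.checkpointDir = /flume/hdfs/checkpoint
agent.channels.hdfsChannel.dataDirs = /flume/hdfs/data

agent.channels.rollChannel.type = file
agent.channels.rollChannel.checkpointDir = /flume/roll/checkpoint
agent.channels.rollChannel.dataDirs = /flume/roll/data

# HDFS sink with lzop compression
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = hdfsChannel
agent.sinks.hdfsSink.hdfs.path = /flume/events/%Y-%m-%d
agent.sinks.hdfsSink.hdfs.codeC = lzop
agent.sinks.hdfsSink.hdfs.fileType = CompressedStream

# File roll sink writing to the local file system
agent.sinks.fileRollSink.type = file_roll
agent.sinks.fileRollSink.channel = rollChannel
agent.sinks.fileRollSink.sink.directory = /flume/localroll

Since the replicating selector puts a copy of every event into both channels,
we would expect the two sinks to deliver the same counts over time.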

Things are working fine overall, but we see some data loss on the HDFS side
(though not very large: about 1 million out of 1 billion events).

Is this possible in some scenario? (To add: the datanodes of the Hadoop cluster are heavily loaded. Can that lead to data loss like this?)

Regards,
Jagadish
