Hi

Based on our observations of our production Flume setup:

We have seen the file roll sink deliver almost 1% more events per day than
the HDFS sink.
(We have a replicating setup, with a separate
file channel for each sink.)

Configuration:
========
Flume version: 1.3.1
Flume topology: 30 first-tier machines and 3 second-tier machines (which deliver to HDFS and the local file system)
HDFS compression codec: lzop
Channels: a file channel for every source-sink pair (see the sketch below)
Hadoop version: 1.0.3 (Apache Hadoop)
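
For reference, a second-tier agent in this kind of setup looks roughly like
the following. This is only an illustrative sketch, not our actual config;
all agent, source, channel, and sink names and paths are made up:

agent.sources = avroSrc
agent.channels = hdfsChannel rollChannel
agent.sinks = hdfsSink fileRollSink

# Replicating selector (the Flume default) copies each event to both channels
agent.sources.avroSrc.type = avro
agent.sources.avroSrc.bind = 0.0.0.0
agent.sources.avroSrc.port = 4141
agent.sources.avroSrc.selector.type = replicating
agent.sources.avroSrc.channels = hdfsChannel rollChannel

# A dedicated file channel per source-sink pair
agent.channels.hdfsChannel.type = file
agent.channels.hdfsChannel.checkpointDir = /flume/hdfs/checkpoint
agent.channels.hdfsChannel.dataDirs = /flume/hdfs/data

agent.channels.rollChannel.type = file
agent.channels.rollChannel.checkpointDir = /flume/roll/checkpoint
agent.channels.rollChannel.dataDirs = /flume/roll/data

# HDFS sink with lzop compression
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = hdfsChannel
agent.sinks.hdfsSink.hdfs.path = /flume/events/%Y-%m-%d
agent.sinks.hdfsSink.hdfs.codeC = lzop
agent.sinks.hdfsSink.hdfs.fileType = CompressedStream

# File roll sink writing to the local file system
agent.sinks.fileRollSink.type = file_roll
agent.sinks.fileRollSink.channel = rollChannel
agent.sinks.fileRollSink.sink.directory = /flume/localroll

Since the replicating selector puts a copy of every event into both channels,
we would expect the two sinks to deliver the same counts over time.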

Things are working fine overall, but we see some data loss on the HDFS side
(though not very large: about 1 million out of 1 billion events).

Is this possible in some scenario? (To add: the datanodes of the Hadoop cluster are heavily loaded. Can that lead to data loss like this?)

Regards,
Jagadish
