HDFS sink Bucketwriter working

Jagadish Bihani Wed, 26 Sep 2012 20:24:12 -0700

Hi

I have a few questions about the HDFS sink BucketWriter:


-- How does the HDFS sink's BucketWriter work? What criteria does it use to
create another bucket?

-- How many parameters determine when a file is created in HDFS? Initially
I thought it depended only on the rolling parameters (interval/size), but
apparently it also depends on 'batchsize' and 'txnEventMax'.

-- My requirement is this: I receive data from 10 Avro sinks into a single
Avro source, and I want to write it to HDFS in fixed-size files (say 64 MB).
What should I do? At present, if I set the rolling size to 64 MB,
BucketWriter creates many files (I suspect the number equals 'txnEventMax'),
and after a while it throws exceptions like 'too many open files'. (I have a
limit of 75000 open file descriptors.)
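
A minimal sketch of the size-based rolling setup described above, to make the
scenario concrete. The agent and component names (agent1, avroSrc, ch1,
hdfsSink), the port, and the HDFS path are placeholders assumed for
illustration and are not taken from the actual deployment. The property names
are the ones documented for the Flume 1.x Avro source and HDFS sink.

```properties
# Hypothetical names: agent1, avroSrc, ch1, hdfsSink (assumptions, not from the post)
agent1.sources = avroSrc
agent1.channels = ch1
agent1.sinks = hdfsSink

# Single Avro source that receives from the upstream Avro sinks
agent1.sources.avroSrc.type = avro
agent1.sources.avroSrc.bind = 0.0.0.0
agent1.sources.avroSrc.port = 4141
agent1.sources.avroSrc.channels = ch1

agent1.channels.ch1.type = memory

agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = ch1
# Placeholder path
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events
# Roll on size: 64 MB = 64 * 1024 * 1024 = 67108864 bytes
agent1.sinks.hdfsSink.hdfs.rollSize = 67108864
# A value of 0 turns off the time-based and event-count-based roll triggers
agent1.sinks.hdfsSink.hdfs.rollInterval = 0
agent1.sinks.hdfsSink.hdfs.rollCount = 0
```

In this sketch the time and count triggers are set to 0 so that size is the
only roll criterion. Whether that alone accounts for the number of files
observed is part of the question.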

Information about the above will be of great help in tuning Flume properly for these requirements.


Regards,
Jagadish
