Hi

I have a few questions about the HDFS sink's BucketWriter:

-- How does the HDFS sink's BucketWriter work? What criteria does it use to
create another bucket?

-- Which parameters determine when a new file is created in HDFS? Initially
I thought it depended only on the rolling parameters (interval/size), but
apparently it also depends on 'batchSize' and 'txnEventMax'.

-- My requirement is this: I receive data from 10 Avro sinks into a single Avro source, and I want to write it to HDFS in files of a fixed size (say 64 MB). What should I do? At present, if I set the rolling size to 64 MB, BucketWriter creates many files (I suspect the number equals txnEventMax), and after a while it throws exceptions like 'too many open files'. (I have a limit of 75000 open file descriptors.)
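
To make the setup concrete, a size-only roll at 64 MB would look roughly like the fragment below. This is a sketch, not my exact configuration: the agent name (agent1), sink name (hdfsSink), and path are placeholders, and I am assuming that setting rollInterval and rollCount to 0 disables time-based and count-based rolling.

```properties
# Placeholder agent/sink names and path; only the hdfs.roll* keys matter here.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.hdfs.path = /flume/events
# Roll when the file reaches 64 MB (64 * 1024 * 1024 bytes).
agent1.sinks.hdfsSink.hdfs.rollSize = 67108864
# Assumed: 0 turns off rolling by elapsed time and by event count.
agent1.sinks.hdfsSink.hdfs.rollInterval = 0
agent1.sinks.hdfsSink.hdfs.rollCount = 0
```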

Information on the above would be a great help in tuning Flume properly for these requirements.

Regards,
Jagadish
