Hi,
I have a few questions about the HDFS sink's BucketWriter:
-- How does the HDFS sink's BucketWriter work? What criteria does it use to
create another bucket?
-- Which parameters determine when a new file is created in HDFS? Initially
I thought it depended only on the rolling parameters (interval/size), but
apparently it also depends on 'batchSize' and 'txnEventMax'.
-- My requirement is this: I receive data from 10 Avro sinks into a single
Avro source, and I want to write it to HDFS in fixed-size files (say 64 MB).
What should I do?
At present, if I set the rolling size to 64 MB, BucketWriter creates many
files (I suspect the number is related to txnEventMax), and after a while it
throws exceptions like 'too many open files'. (I have a limit of 75000 open
file descriptors.)
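For concreteness, the kind of sink configuration I am asking about looks roughly like the sketch below. The agent, sink, and channel names and the HDFS path are placeholders, and the batchSize and txnEventMax values are only illustrative. Also, hdfs.txnEventMax may not exist in every Flume version.

```
# Sketch only: agent1, hdfsSink, memChannel and the path are placeholder names.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = memChannel
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events

# Roll on size only: 64 MB = 67108864 bytes.
agent1.sinks.hdfsSink.hdfs.rollSize = 67108864
# A value of 0 disables rolling by time and by event count.
agent1.sinks.hdfsSink.hdfs.rollInterval = 0
agent1.sinks.hdfsSink.hdfs.rollCount = 0

# Illustrative values; these are the parameters that appear to affect file creation.
agent1.sinks.hdfsSink.hdfs.batchSize = 1000
agent1.sinks.hdfsSink.hdfs.txnEventMax = 1000
```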
Any information on the above would be a great help in tuning Flume
properly for these requirements.
Regards,
Jagadish