Since I'm not allowed to set up Flume on prod servers, I have to download the 
logs, put them in a Flume spoolDir and have a sink to consume from the channel 
and write to Cassandra. Everything is working fine.
However, I have a lot of log files in the spoolDir, and the current setup only 
processes one file at a time, so it's taking a while. I want to process many 
files concurrently. One way I thought of is to keep using spoolDir but 
distribute the files across 5-10 different directories and define multiple 
sources/channels/sinks, but this is a bit clumsy. Is there a better way to 
achieve this?
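
For reference, here is a minimal sketch of the multi-directory workaround I had in mind. It assumes one agent with several spooldir sources sharing a single channel and the existing Cassandra sink, so only the source definitions are duplicated (the agent name, directory paths, and the `com.example.CassandraSink` class are placeholders):

```
# Several spooldir sources feeding one shared channel; each source
# runs its own ingest thread, so files are consumed in parallel.
agent.sources  = src1 src2 src3
agent.channels = ch1
agent.sinks    = cassandraSink

agent.sources.src1.type     = spooldir
agent.sources.src1.spoolDir = /data/spool1
agent.sources.src1.channels = ch1

agent.sources.src2.type     = spooldir
agent.sources.src2.spoolDir = /data/spool2
agent.sources.src2.channels = ch1

agent.sources.src3.type     = spooldir
agent.sources.src3.spoolDir = /data/spool3
agent.sources.src3.channels = ch1

agent.channels.ch1.type     = memory
agent.channels.ch1.capacity = 10000

# Placeholder for the custom Cassandra sink class already in use.
agent.sinks.cassandraSink.type    = com.example.CassandraSink
agent.sinks.cassandraSink.channel = ch1
```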
Thanks