We are attempting to use the Spooling Directory Source to read data into Flume. Due to certain restrictions, we're stuck with placing around 2000 files in the directory to be processed every 3-4 minutes.
The source does not seem to be able to keep up with this load and gets progressively slower over time. I'm fairly certain this is because there are so many small files rather than a few really huge ones (but this is what we have to deal with). From what I can tell, the source is single threaded, which is not ideal for this situation. I was thinking of a couple of options:

1. Create multiple Spooling Directory sources pointing at the same directory, each with a different trackerDir.
2. Create multiple Spooling Directory sources pointing at different directories (if we can move the files to different dirs).
3. Use some other source. But given that these files are the inputs I have to work with, I'm not sure there is another viable option. Maybe the Exec source with 'tail', however, I don't think that would be viable either.

Does anyone have any suggestions? Is it even plausible to use multiple spool sources on the same directory? Is there a config I'm missing to process more than one file at a time?

Any help would be appreciated.

-Tim
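For what it's worth, option 2 might look something like the agent config below. This is only a sketch; the agent name, channel settings, and directory paths are placeholders, not our actual setup:

```properties
# Two independent spooldir sources, each draining its own directory.
# Agent name, channel, and paths here are hypothetical examples.
agent.sources = spool1 spool2
agent.channels = ch1

agent.sources.spool1.type = spooldir
agent.sources.spool1.spoolDir = /data/incoming/part1
agent.sources.spool1.channels = ch1

agent.sources.spool2.type = spooldir
agent.sources.spool2.spoolDir = /data/incoming/part2
agent.sources.spool2.channels = ch1

agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 10000
```

Raising batchSize on each source (it defaults to 100, if I remember right) might also squeeze out more throughput per source, though I doubt it fixes the single-threaded bottleneck.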
