Hey, I came across Chukwa in a blog post, and it looks like there is a real effort there to collect data from multiple sources and pump it into HDFS.
I was looking at this PDF from the wiki: https://wiki.apache.org/hadoop/Chukwa?action=AttachFile&do=view&target=ChukwaPoster.pdf. The chart in the middle seems to imply that two of the agents you can run are one that takes in streaming data and one that is tied to Log4J and works with log files in particular.

I'm pretty new to Hadoop, so I'm trying to learn a lot about it in a short time. What I'm looking for is some kind of system that will monitor a directory for files being placed there. I don't know what kind of files they could be: CSVs, PSVs, DOCs, TXTs, and many others. A later stage would be formatting, parsing, and analyzing, but for now I just want to be able to detect when a file lands in the directory. Once a file has been detected, it should be sent on its way into HDFS. This should be a completely autonomous and automatic process (or as close to that as possible).

Is this something Chukwa can help me with? If not, do you know of any system that might do what I want? I've read a little about Oozie, Falcon, Flume, Scribe, and a couple of other projects, but I don't think I've found what I'm looking for. Any information you could provide to help me on my way, or to clear up any misunderstanding I may have, would be great!

Thanks,
jmerv...@rcanalytics.com
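P.S. For concreteness, here is a very rough sketch of the kind of thing I have in mind, written against the java.nio WatchService and the Hadoop FileSystem API. The drop directory, the HDFS target directory, and the namenode URI are just placeholders, and I don't know yet whether rolling something like this by hand is sensible versus letting Chukwa/Flume/etc. handle it.

import java.nio.file.*;
import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class DirectoryToHdfs {
    public static void main(String[] args) throws Exception {
        Path watchDir = Paths.get("/data/incoming");       // local drop directory (placeholder)
        String hdfsDir = "/ingest/raw";                     // target directory in HDFS (placeholder)

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");   // placeholder namenode URI
        FileSystem hdfs = FileSystem.get(conf);

        // Watch the local directory for newly created files.
        WatchService watcher = FileSystems.getDefault().newWatchService();
        watchDir.register(watcher, ENTRY_CREATE);

        while (true) {
            WatchKey key = watcher.take();                  // block until something is dropped in
            for (WatchEvent<?> event : key.pollEvents()) {
                Path newFile = watchDir.resolve((Path) event.context());
                // Copy the new file into HDFS as-is; formatting/parsing would come in a later stage.
                hdfs.copyFromLocalFile(
                        new org.apache.hadoop.fs.Path(newFile.toString()),
                        new org.apache.hadoop.fs.Path(hdfsDir + "/" + newFile.getFileName()));
            }
            key.reset();
        }
    }
}

The obvious gap is that a file might still be mid-copy when the create event fires, which is part of why I'd rather lean on an existing ingestion tool if one already does this well.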