Hey, I came across Chukwa through a blog post, and it looks like there is a real 
effort there to collect data from multiple sources and pump it into HDFS.

I was looking at this PDF from the wiki:
https://wiki.apache.org/hadoop/Chukwa?action=AttachFile&do=view&target=ChukwaPoster.pdf

The chart in the middle seems to imply that two of the agents you can run are 
one that takes in streaming data and one that is tied to Log4J and works with 
log files in particular.

I'm pretty new to Hadoop, so I'm trying to learn a lot about it in a short time. 
What I'm looking for is some kind of system that will monitor a directory 
somewhere for files being placed there. I don't know what kind of files they 
could be: CSVs, PSVs, DOCs, TXTs, and many others. A later stage would be 
formatting, parsing, and analyzing, but for now I just want to be able to detect 
when a file is placed there. Once a file has been detected, it should be sent on 
its way into HDFS. This should be a completely autonomous and automatic process 
(or as close to one as possible).
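To make that concrete, here is a very rough sketch of the kind of thing I have in 
mind, just watching a local directory with Java's WatchService and copying 
anything new into HDFS with the Hadoop FileSystem API. I have no idea whether 
this is the right approach, and the watched directory, namenode URL, and target 
folder below are made-up placeholders:

    import java.nio.file.*;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class DirectoryToHdfsWatcher {
        public static void main(String[] args) throws Exception {
            // Local directory to watch and HDFS destination -- placeholder values.
            Path watchDir = Paths.get("/data/incoming");
            String hdfsUri = "hdfs://namenode:8020";
            String hdfsTarget = "/ingest/raw";

            // Connect to HDFS through the Hadoop FileSystem API.
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", hdfsUri);
            FileSystem hdfs = FileSystem.get(conf);

            // Register for "file created" events on the local directory.
            WatchService watcher = FileSystems.getDefault().newWatchService();
            watchDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

            while (true) {
                WatchKey key = watcher.take();   // block until something happens
                for (WatchEvent<?> event : key.pollEvents()) {
                    Path created = watchDir.resolve((Path) event.context());
                    // Copy the new file into HDFS, keeping its name.
                    hdfs.copyFromLocalFile(
                            new org.apache.hadoop.fs.Path(created.toString()),
                            new org.apache.hadoop.fs.Path(hdfsTarget,
                                    created.getFileName().toString()));
                }
                key.reset();                     // keep watching for more files
            }
        }
    }

I realize something like this doesn't handle files that are still being written 
when the event fires, which is part of why I'd rather use an existing tool than 
roll my own.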

Is this something Chukwa can help me with? If not, do you know of any system 
that might do what I want? I've read a little about Oozie, Falcon, Flume, 
Scribe, and a couple of other projects, but I don't think I've found what I'm 
looking for. Any other information you could provide to help me on my way, or to 
clear up any misunderstanding I may have, would be great!

Thanks
jmerv...@rcanalytics.com
