On Thu, Jul 18, 2013 at 11:08 PM, Sunita Arvind <[email protected]> wrote:
> Hello friends,
>
> I am new to Flume and have written a Python script to fetch some data from
> social media. My response is JSON. I am seeking help on the following issues:
>
> 1. I am finding it hard to make Python and Flume talk. Is it just my
> ignorance, or is it indeed a long route? AFAIK, I need to understand the
> Thrift API, Avro, etc. to achieve this. I also read about pipes. Would this
> be a simple implementation?

Python would work fine. As mentioned earlier, you can use the HTTP Source.
Alternatively, you can use the Avro Source with Avro's Python client.

>
> 2. I am equally comfortable (uncomfortable) in Java. Hence I am wondering if
> it's better to rewrite my application in Java so that I can easily
> integrate it with Flume. Are there any advantages to having a Java
> application, as all of Hadoop is Java?

The advantage would be that you can use Flume's Client SDK, which saves a lot
of work. IMHO, Flume doesn't care which client is pushing the data.

>
> 3. I need to schedule the agent to run on a daily basis. Which of the
> above approaches would help me achieve this easily?

It looks like you have a batch job that runs once at a fixed time each day.
If that's the case, please reconsider whether you need Flume at all. Flume can
certainly be used, but you could also load the data directly into HDFS.
Again, I can't say for sure based on the information provided.

>
> 4. Going by this -
> http://mail-archives.apache.org/mod_mbox/flume-user/201306.mbox/%[email protected]%3E
> - it looks like we need to manually clean up disk space even with Flume. I
> am not clear on the advantages I would have with Flume over using a simple
> cron job to do the task. I can manually write statements like
> "hadoop fs -put <location of output file on local> <location on hdfs>" in
> the cron job instead.

The mailing-list thread you pointed to concerns the RollingFileSink, not the
HDFS Sink, so it does not apply to the HDFS Sink.

HTH!
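For point 1, here is a minimal Python sketch of pushing JSON records to Flume's HTTP Source. It assumes the agent's source is configured with `type = http` and the default JSONHandler, which accepts a JSON array of events, each carrying a string-to-string `headers` map and a string `body`. The URL/port and the `source` header used in the usage note are hypothetical; this is an illustration, not a drop-in client.

```python
import json
import urllib.request

# Hypothetical endpoint: an HTTP Source (type = http, default JSONHandler)
# listening on this host/port. Adjust to match your agent configuration.
FLUME_URL = "http://localhost:5140"


def build_flume_payload(records, headers=None):
    """Wrap each JSON-serialisable record as one Flume event.

    JSONHandler expects a JSON array of events, each with a
    string-to-string "headers" map and a string "body".
    """
    headers = headers or {}
    events = [
        {"headers": dict(headers), "body": json.dumps(record)}
        for record in records
    ]
    return json.dumps(events).encode("utf-8")


def send_to_flume(records, url=FLUME_URL, headers=None, timeout=10):
    """POST a batch of records to the HTTP Source and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=build_flume_payload(records, headers),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises HTTPError on a 4xx/5xx response, e.g. if the
    # source rejects the batch or the channel is full.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Usage might look like `send_to_flume(tweets, headers={"source": "social"})`, where `tweets` is the list of dicts your fetch script already produces. Batching several records per POST keeps the number of HTTP round trips low.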
>
> Appreciate your help and guidance
>
> regards,
> Sunita

--
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
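For points 3 and 4, if the daily batch turns out not to need Flume, the cron approach Sunita describes can be as small as the Python sketch below, which shells out to `hadoop fs -put`. The HDFS directory and the date-based file name are hypothetical, and the directory is assumed to exist already.

```python
import datetime
import subprocess
import sys

# Hypothetical HDFS directory; assumed to already exist.
HDFS_DIR = "/data/social"


def build_put_command(local_path, hdfs_dir, day):
    """Return the `hadoop fs -put` argv that copies one day's output into HDFS.

    The date is encoded in the target file name so daily runs don't collide:
    -put refuses to overwrite an existing target file.
    """
    target = "%s/social-%s.json" % (hdfs_dir.rstrip("/"), day.isoformat())
    return ["hadoop", "fs", "-put", local_path, target]


def load_day(local_path, hdfs_dir=HDFS_DIR, day=None):
    """Upload the local output file for `day` (default: today)."""
    cmd = build_put_command(local_path, hdfs_dir, day or datetime.date.today())
    # check=True raises CalledProcessError on failure, so cron sees a
    # non-zero exit status and can report the failed upload.
    subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Invoked as: python3 load_to_hdfs.py <local-output-file>
    load_day(sys.argv[1])
```

A crontab entry such as `0 2 * * * python3 /path/to/load_to_hdfs.py /tmp/social_media.json` (paths hypothetical) would run it nightly. Note that this leaves cleanup of the local output file to you, which is the trade-off point 4 raises.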
