Hi Mayur,

Those 3 modes are 3 different ways to use Hadoop; however, the only production mode here is the fully distributed one. The other 2 are meant more for local testing. How many nodes are you expecting to run Hadoop on?
JM

2013/3/7 Mayur Patil <[email protected]>:
> Hello,
>
> Now I am slowly understanding Hadoop working.
> As I want to collect the logs from three machines
> including Master itself . My small query is
> which mode should I implement for this??
>
> Standalone Operation
> Pseudo-Distributed Operation
> Fully-Distributed Operation
>
> Seeking for guidance,
>
> Thank you !!
> --
> Cheers,
> Mayur
>
>>> Hi mayur,
>>>
>>> Flume is used for data collection. Pig is used for data processing.
>>> For eg, if you have a bunch of servers that you want to collect the
>>> logs from and push to HDFS - you would use flume. Now if you need to
>>> run some analysis on that data, you could use pig to do that.
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 14, 2013, at 1:39 AM, Mayur Patil <[email protected]>
>>> wrote:
>>>
>>> > Hello,
>>> >
>>> > I just read about Pig
>>> >
>>> >> Pig
>>> >> A data flow language and execution environment for exploring very
>>> >> large datasets.
>>> >> Pig runs on HDFS and MapReduce clusters.
>>> >
>>> > What the actual difference between Pig and Flume makes in logs
>>> > clustering??
>>> >
>>> > Thank you !!
>>> > --
>>> > Cheers,
>>> > Mayur.
>>> >
>>> >> Hey Mayur,
>>> >>>
>>> >>> If you are collecting logs from multiple servers then you can use
>>> >>> flume for the same.
>>> >>>
>>> >>> if the contents of the logs are different in format then you can
>>> >>> just use textfileinput format to read and write into any other
>>> >>> format you want for your processing in later part of your projects
>>> >>>
>>> >>> first thing you need to learn is how to setup hadoop
>>> >>> then you can try writing sample hadoop mapreduce jobs to read from
>>> >>> text file and then process them and write the results into another file
>>> >>> then you can integrate flume as your log collection mechanism
>>> >>> once you get hold on the system then you can decide more on which
>>> >>> paths you want to follow based on your requirements for storage,
>>> >>> compute time, compute capacity, compression etc
>>> >>>
>>> >> --------------
>>> >>
>>> >>> Hi,
>>> >>>
>>> >>> Please read basics on how hadoop works.
>>> >>>
>>> >>> Then start your hands on with map reduce coding.
>>> >>>
>>> >>> The tool which has been made for you is flume , but don't see tool
>>> >>> till you complete above two steps.
>>> >>>
>>> >>> Good luck , keep us posted.
>>> >>>
>>> >>> Regards,
>>> >>>
>>> >>> Jagat Singh
>>> >>>
>>> >>> -----------
>>> >>> Sent from Mobile , short and crisp.
>>> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <[email protected]>
>>> >>> wrote:
>>> >>>
>>> >>>> Hello,
>>> >>>>
>>> >>>> I am new to Hadoop. I am doing a project in cloud in which I
>>> >>>> have to use hadoop for Map-reduce. It is such that I am going
>>> >>>> to collect logs from 2-3 machines having different locations.
>>> >>>> The logs are also in different formats such as .rtf .log .txt
>>> >>>> Later, I have to collect and convert them to one format and
>>> >>>> collect to one location.
>>> >>>>
>>> >>>> So I am asking which module of Hadoop that I need to study
>>> >>>> for this implementation?? Or whole framework should I need
>>> >>>> to study ??
>>> >>>>
>>> >>>> Seeking for guidance,
>>> >>>>
>>> >>>> Thank you !!
>
> --
> Cheers,
> Mayur.
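
For anyone following the advice quoted above (write a sample MapReduce job that reads text lines, processes them, and writes a result), here is a minimal sketch of the map and reduce steps in plain Python. It runs locally and does not use any Hadoop API. The log layout `<host> <LEVEL> <message>` and the names `map_line`, `reduce_counts` and `run_job` are assumptions made up for illustration; they do not come from the thread.

```python
from collections import defaultdict
from itertools import chain


def map_line(line):
    """Map step: emit (level, 1) for one log line.

    Assumes a made-up layout '<host> <LEVEL> <message>'; lines with fewer
    than two fields are skipped.
    """
    fields = line.split()
    if len(fields) >= 2:
        yield fields[1].upper(), 1


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(lines):
    """Run map then reduce locally over an iterable of text lines."""
    return reduce_counts(chain.from_iterable(map_line(line) for line in lines))


if __name__ == "__main__":
    # Hypothetical sample lines standing in for logs from three machines.
    sample = [
        "node1 ERROR disk full",
        "node2 info service started",
        "master error heartbeat lost",
    ]
    print(run_job(sample))
```

Once the logic works on a small sample like this, the same two steps could be moved to a real cluster, with Flume delivering the log files into HDFS as described in the replies.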
