Re: [Hadoop-Help]About Map-Reduce implementation

Jean-Marc Spaggiari Thu, 07 Mar 2013 19:01:01 -0800

Hi Mayur,

Those 3 modes are 3 different ways to run Hadoop. However, the only
production mode is the fully distributed one; the other 2 are meant
for local testing. How many nodes are you expecting to run Hadoop
on?
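
In case it helps, here is a minimal sketch (not an official tool) of how
the three modes differ in practice. On Hadoop 1.x the mode follows from
configuration rather than from a separate install, chiefly the
fs.default.name and mapred.job.tracker properties: file:/// plus "local"
for standalone, localhost addresses for pseudo-distributed, and real
cluster hostnames for fully distributed. The helper name guess_mode and
the hostname "master" are made up for illustration, and the ports are
only the ones commonly seen in setup guides.

```python
from urllib.parse import urlparse

# Addresses that point back at the machine itself.
LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def guess_mode(fs_default_name, mapred_job_tracker):
    """Hypothetical helper: guess the Hadoop 1.x run mode from two settings.

    This is a heuristic for illustration only. A one-machine cluster that
    is configured with its real hostname would be reported as
    fully-distributed.
    """
    # Standalone: local filesystem and the in-process job runner.
    if mapred_job_tracker == "local" and fs_default_name.startswith("file:"):
        return "standalone"

    namenode_host = urlparse(fs_default_name).hostname
    jobtracker_host = mapred_job_tracker.split(":")[0]

    # Pseudo-distributed: all daemons run, but on this one machine.
    if namenode_host in LOCAL_HOSTS and jobtracker_host in LOCAL_HOSTS:
        return "pseudo-distributed"

    # Anything else points at daemons on other machines.
    return "fully-distributed"


if __name__ == "__main__":
    print(guess_mode("file:///", "local"))
    print(guess_mode("hdfs://localhost:9000", "localhost:9001"))
    print(guess_mode("hdfs://master:9000", "master:9001"))  # "master" is hypothetical
```

For three machines including the master, the last case is the one that
applies: every node's configuration points at the master's hostname.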


JM


2013/3/7 Mayur Patil <[email protected]>:
> Hello,
>
>    I am slowly beginning to understand how Hadoop works.
>
>   I want to collect the logs from three machines,
>   including the master itself. My small question is:
>   which mode should I use for this?
>
>                   Standalone Operation
>                   Pseudo-Distributed Operation
>                   Fully-Distributed Operation
>
>      Seeking guidance,
>
>      Thank you !!
> --
> Cheers,
> Mayur
>
>
>
>
>>> Hi Mayur,
>>>
>>> Flume is used for data collection. Pig is used for data processing.
>>> For example, if you have a bunch of servers that you want to collect
>>> logs from and push to HDFS, you would use Flume. If you then need to
>>> run some analysis on that data, you could use Pig to do that.
>>>
>>> On Feb 14, 2013, at 1:39 AM, Mayur Patil <[email protected]>
>>> wrote:
>>>
>>> > Hello,
>>> >
>>> >   I just read about Pig
>>> >
>>> >> Pig
>>> >> A data flow language and execution environment for exploring very
>>> >> large datasets.
>>> >> Pig runs on HDFS and MapReduce clusters.
>>> >
>>> >   What actual difference does choosing Pig or Flume make for log
>>> > clustering?
>>> >
>>> >   Thank you !!
>>> > --
>>> > Cheers,
>>> > Mayur.
>>> >
>>> >
>>> >
>>> >> Hey Mayur,
>>> >>>
>>> >>> If you are collecting logs from multiple servers, then you can use
>>> >>> Flume for that.
>>> >>>
>>> >>> If the logs differ in format, then you can just use TextInputFormat
>>> >>> to read them and write them out in whatever format you want for the
>>> >>> later processing stages of your project.
>>> >>>
>>> >>> The first thing you need to learn is how to set up Hadoop.
>>> >>> Then you can try writing sample Hadoop MapReduce jobs that read from
>>> >>> a text file, process the data, and write the results to another file.
>>> >>> Then you can integrate Flume as your log collection mechanism.
>>> >>> Once you have a handle on the system, you can decide which paths to
>>> >>> follow based on your requirements for storage, compute time,
>>> >>> compute capacity, compression, etc.
>>> >>>
>>> >> --------------
>>> >> --------------
>>> >>
>>> >>> Hi,
>>> >>>
>>> >>> Please read up on the basics of how Hadoop works.
>>> >>>
>>> >>> Then get hands-on with MapReduce coding.
>>> >>>
>>> >>> The tool made for your use case is Flume, but don't look at it
>>> >>> until you have completed the two steps above.
>>> >>>
>>> >>> Good luck, and keep us posted.
>>> >>>
>>> >>> Regards,
>>> >>>
>>> >>> Jagat Singh
>>> >>>
>>> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <[email protected]>
>>> >>> wrote:
>>> >>>
>>> >>>> Hello,
>>> >>>>
>>> >>>>    I am new to Hadoop. I am doing a cloud project in which I
>>> >>>>    have to use Hadoop for MapReduce. I am going to collect logs
>>> >>>>    from 2-3 machines in different locations.
>>> >>>>
>>> >>>>    The logs are also in different formats, such as .rtf, .log,
>>> >>>>    and .txt. Later, I have to convert them to one format and
>>> >>>>    gather them in one location.
>>> >>>>
>>> >>>>    So I am asking which module of Hadoop I need to study for
>>> >>>>    this implementation. Or do I need to study the whole
>>> >>>>    framework?
>>> >>>>
>>> >>>>    Seeking guidance,
>>> >>>>
>>> >>>>    Thank you !!
>
>
>
>
> --
> Cheers,
> Mayur.
