What I really want to know is: in Pig, how can I read an input data set only
once, generate multiple instances with distinct keys for each data point,
and then do a group-by?
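
To make the intended dataflow concrete, here is a rough sketch in plain Python (not Pig Latin). The records and the field names `user`, `country`, and `amount` are made up for illustration. Each record is read once, expanded into one tagged key per grouping dimension, and all of the expanded rows go through a single group-by. In Pig, the same shape would presumably be a FOREACH ... GENERATE FLATTEN(...) over a bag of tagged keys followed by one GROUP, though whether that compiles to a single MapReduce job is not something this sketch shows.

```python
from collections import defaultdict

# Hypothetical input records; the field names are assumptions for illustration.
records = [
    {"user": "u1", "country": "US", "amount": 10},
    {"user": "u2", "country": "US", "amount": 5},
    {"user": "u1", "country": "DE", "amount": 7},
]


def expand(record):
    """Emit one (tagged key, value) pair per grouping dimension for one record."""
    yield ("user", record["user"]), record["amount"]
    yield ("country", record["country"]), record["amount"]


def group_once(rows):
    """Read the input once, expand each record, and sum values per tagged key."""
    totals = defaultdict(int)
    for record in rows:  # single pass over the input
        for key, value in expand(record):
            totals[key] += value
    return dict(totals)


result = group_once(records)
```

The tag in each key (`"user"` or `"country"`) keeps the two groupings from colliding, so one grouping step yields both aggregates.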

Best regards,

Ey-Chih Chow


On Tue, Oct 15, 2013 at 10:16 AM, Pradeep Gollakota <pradeep...@gmail.com> wrote:

> I'm not aware of any way to do that. I think you're also missing the spirit
> of Pig. Pig is meant to be a data workflow language. Describe a workflow
> for your data using PigLatin and Pig will then compile your script to
> MapReduce jobs. The number of MapReduce jobs that it generates is the
> smallest number of jobs (based on the optimizers) that Pig thinks it needs
> to complete the workflow.
>
> Why do you want to control the number of MR jobs?
>
>
> On Tue, Oct 15, 2013 at 10:07 AM, ey-chih chow <eyc...@gmail.com> wrote:
>
> > Thanks everybody.  Is there any way we can programmatically control the
> > number of M-R jobs that a Pig script will generate, similar to writing M-R
> > jobs in Java?
> >
> > Best regards,
> >
> > Ey-Chih Chow
> >
> >
> > On Tue, Oct 15, 2013 at 6:14 AM, Shahab Yunus <shahab.yu...@gmail.com> wrote:
> >
> > > And Geert's comment about using an external-to-Pig approach reminds me
> > > that you also have Netflix's PigLipstick. It is a nice visual tool for the
> > > actual execution, and it stores job history as well.
> > >
> > > Regards,
> > > Shahab
> > >
> > >
> > > On Tue, Oct 15, 2013 at 8:51 AM, Geert Van Landeghem <g...@foundation.be> wrote:
> > >
> > > > You can also use Ambrose to monitor the execution of your Pig script at
> > > > runtime. Note: this works from pig-0.11 on.
> > > >
> > > > It shows you the DAG of MR jobs and which ones are currently being
> > > > executed. As long as pig-ambrose is connected to the execution of your
> > > > script (workflow), you can replay the workflow.
> > > >
> > > > --
> > > > kind regards,
> > > >  Geert
> > > >
> > > >
> > > >
> > > >
> > > > On 15-Oct-2013, at 14:43, Shahab Yunus <shahab.yu...@gmail.com> wrote:
> > > >
> > > > > Have you tried using the ILLUSTRATE and EXPLAIN commands? As far as I
> > > > > know, they don't give you the exact number, as it depends on the actual
> > > > > data, but I believe you can interpret or extrapolate it from the
> > > > > information provided by these commands.
> > > > >
> > > > > Regards,
> > > > > Shahab
> > > > >
> > > > >
> > > > > On Tue, Oct 15, 2013 at 3:57 AM, ey-chih chow <eyc...@gmail.com> wrote:
> > > > >
> > > > >> Hi,
> > > > >>
> > > > > >> I have a Pig script that has two group-by statements on the input
> > > > > >> data set.  Does anybody know how many M-R jobs the script will
> > > > > >> generate?
> > > > >> Thanks.
> > > > >>
> > > > >> Best regards,
> > > > >>
> > > > >> Ey-Chih Chow
> > > > >>
> > > >
> > > >
> > >
> >
>
