Thanks.  This is what I want.

Best regards,

Ey-Chih


On Tue, Oct 15, 2013 at 1:50 PM, Alan Gates <[email protected]> wrote:

> Pig handles doing multiple group bys on the same input, often in a single
> MR job.  So:
>
> A = load 'file';
> B = group A by $0;
> C = foreach B generate group, COUNT(A);
> store C into 'output1';
> D = group A by $1;
> E = foreach D generate group, COUNT(A);
> store E into 'output2';
>
> This can be done in a single MR job.  Is that what you're looking for?
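
To make the idea above concrete outside Pig, here is a minimal Python sketch of reading the input once while counting by two different keys in the same pass. It only illustrates the concept and is not how Pig implements it; the function name `count_by_columns` and the sample rows are made up for this example.

```python
from collections import Counter


def count_by_columns(rows, columns=(0, 1)):
    """Count rows per key for each column in `columns`, scanning `rows` once."""
    counts = {col: Counter() for col in columns}
    for row in rows:          # single pass over the input
        for col in columns:   # each row contributes one key per grouping
            counts[col][row[col]] += 1
    return counts


# Made-up sample data standing in for the contents of 'file'.
rows = [("a", "x"), ("a", "y"), ("b", "x")]
result = count_by_columns(rows)
print(dict(result[0]))  # counts keyed by the first field ($0)
print(dict(result[1]))  # counts keyed by the second field ($1)
```

Each input row is read once and contributes one key per grouping, which is roughly the shape of work that lets both group-bys share a single scan.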
>
> Alan.
>
> On Oct 15, 2013, at 12:12 PM, ey-chih chow wrote:
>
> > What I really want to know is, in Pig, how can I read an input data set
> > only once and generate multiple instances with distinct keys for each
> > data point and do a group-by?
> >
> > Best regards,
> >
> > Ey-Chih Chow
> >
> >
> > On Tue, Oct 15, 2013 at 10:16 AM, Pradeep Gollakota <[email protected]> wrote:
> >
> >> I'm not aware of any way to do that. I think you're also missing the
> >> spirit of Pig. Pig is meant to be a data workflow language. Describe a
> >> workflow for your data using Pig Latin and Pig will then compile your
> >> script to MapReduce jobs. The number of MapReduce jobs that it generates
> >> is the smallest number of jobs (based on the optimizers) that Pig thinks
> >> it needs to complete the workflow.
> >>
> >> Why do you want to control the number of MR jobs?
> >>
> >>
> >> On Tue, Oct 15, 2013 at 10:07 AM, ey-chih chow <[email protected]> wrote:
> >>
> >>> Thanks everybody.  Is there any way we can programmatically control the
> >>> number of M-R jobs that a Pig script will generate, similar to writing
> >>> M-R jobs in Java?
> >>>
> >>> Best regards,
> >>>
> >>> Ey-Chih Chow
> >>>
> >>>
> >>> On Tue, Oct 15, 2013 at 6:14 AM, Shahab Yunus <[email protected]> wrote:
> >>>
> >>>> And Geert's comment about using an external-to-Pig approach reminds me
> >>>> that there is also Netflix's PigLipstick. It is a nice visual tool for
> >>>> the actual execution, and it stores job history as well.
> >>>>
> >>>> Regards,
> >>>> Shahab
> >>>>
> >>>>
> >>>> On Tue, Oct 15, 2013 at 8:51 AM, Geert Van Landeghem <[email protected]> wrote:
> >>>>
> >>>>> You can also use ambrose to monitor execution of your Pig script at
> >>>>> runtime. Remark: from pig-0.11 on.
> >>>>>
> >>>>> It shows you the DAG of MR jobs and which ones are currently being
> >>>>> executed. As long as pig-ambrose is connected to the execution of
> >>>>> your script (workflow), you can replay the workflow.
> >>>>>
> >>>>> --
> >>>>> kind regards,
> >>>>> Geert
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 15-okt.-2013, at 14:43, Shahab Yunus <[email protected]> wrote:
> >>>>>
> >>>>>> Have you tried using the ILLUSTRATE and EXPLAIN commands? As far as
> >>>>>> I know, they don't give you the exact number, as it depends on the
> >>>>>> actual data, but I believe you can interpret or extrapolate it from
> >>>>>> the information provided by these commands.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Shahab
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Oct 15, 2013 at 3:57 AM, ey-chih chow <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I have a Pig script that has two group-by statements on the input
> >>>>>>> data set.  Does anybody know how many M-R jobs the script will
> >>>>>>> generate?  Thanks.
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>>
> >>>>>>> Ey-Chih Chow
> >>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
>
>