Thanks. This is what I want. Best regards,
Ey-Chih

On Tue, Oct 15, 2013 at 1:50 PM, Alan Gates <[email protected]> wrote:

> Pig handles doing multiple group bys on the same input, often in a single
> MR job. So:
>
> A = load 'file';
> B = group A by $0;
> C = foreach B generate group, COUNT(A);
> store C into 'output1';
> D = group A by $1;
> E = foreach D generate group, COUNT(A);
> store E into 'output2';
>
> This can be done in a single MR job. Is that what you're looking for?
>
> Alan.
>
> On Oct 15, 2013, at 12:12 PM, ey-chih chow wrote:
>
> > What I really want to know is, in Pig, how can I read an input data set
> > only once and generate multiple instances with distinct keys for each
> > data point and do a group-by?
> >
> > Best regards,
> >
> > Ey-Chih Chow
> >
> >
> > On Tue, Oct 15, 2013 at 10:16 AM, Pradeep Gollakota <[email protected]> wrote:
> >
> >> I'm not aware of any way to do that. I think you're also missing the
> >> spirit of Pig. Pig is meant to be a data workflow language. Describe a
> >> workflow for your data using PigLatin and Pig will then compile your
> >> script to MapReduce jobs. The number of MapReduce jobs that it generates
> >> is the smallest number of jobs (based on the optimizers) that Pig thinks
> >> it needs to complete the workflow.
> >>
> >> Why do you want to control the number of MR jobs?
> >>
> >>
> >> On Tue, Oct 15, 2013 at 10:07 AM, ey-chih chow <[email protected]> wrote:
> >>
> >>> Thanks everybody. Is there any way we can programmatically control the
> >>> number of M-R jobs that a Pig script will generate, similar to writing
> >>> M-R jobs in Java?
> >>>
> >>> Best regards,
> >>>
> >>> Ey-Chih Chow
> >>>
> >>>
> >>> On Tue, Oct 15, 2013 at 6:14 AM, Shahab Yunus <[email protected]> wrote:
> >>>
> >>>> And Geert's comment about using an external-to-Pig approach reminds me
> >>>> that you have Netflix's PigLipstick too. It is a nice visual tool for
> >>>> actual execution and stores job history as well.
> >>>>
> >>>> Regards,
> >>>> Shahab
> >>>>
> >>>>
> >>>> On Tue, Oct 15, 2013 at 8:51 AM, Geert Van Landeghem <[email protected]> wrote:
> >>>>
> >>>>> You can also use ambrose to monitor execution of your pig script at
> >>>>> runtime. Remark: from pig-0.11 on.
> >>>>>
> >>>>> It shows you the DAG of MR jobs and which are currently being
> >>>>> executed. As long as pig-ambrose is connected to the execution of
> >>>>> your script (workflow) you can replay the workflow.
> >>>>>
> >>>>> --
> >>>>> kind regards,
> >>>>> Geert
> >>>>>
> >>>>>
> >>>>> On 15-okt.-2013, at 14:43, Shahab Yunus <[email protected]> wrote:
> >>>>>
> >>>>>> Have you tried using the ILLUSTRATE and EXPLAIN commands? As far as
> >>>>>> I know, they don't give you the exact number, as it depends on the
> >>>>>> actual data, but I believe you can interpret it/extrapolate it from
> >>>>>> the information provided by these commands.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Shahab
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Oct 15, 2013 at 3:57 AM, ey-chih chow <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I have a Pig script that has two group-by statements on the input
> >>>>>>> data set. Does anybody know how many M-R jobs the script will
> >>>>>>> generate? Thanks.
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>>
> >>>>>>> Ey-Chih Chow
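
For reference, the suggestion in the thread can be written out as a standalone script. This is only a sketch: the paths 'file', 'output1' and 'output2' are the placeholders used in the thread, the input is assumed to have at least two fields, and the second store writes E (the counts) on the assumption that storing the counts was the intent.

```pig
-- Sketch only; paths are the placeholders from the thread.
A = load 'file';

-- First grouping: count records per value of the first field.
B = group A by $0;
C = foreach B generate group, COUNT(A);
store C into 'output1';

-- Second grouping over the same input: count records per value of the
-- second field.
D = group A by $1;
E = foreach D generate group, COUNT(A);
-- Store the counts (E), not the raw grouping (D).
store E into 'output2';
```

Whether Pig actually merges both group-bys into one MR job depends on the Pig version and on multi-query execution being enabled (it is on by default and can be turned off with -M / -no_multiquery). If the script is saved under a hypothetical name such as counts.pig, the plan can be checked from Grunt with `explain -script counts.pig;`, in line with the EXPLAIN suggestion earlier in the thread.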
