If it is just counting the number of records in a file, then how about a short three-liner:

LOGS = LOAD 'log';
LOGS_GROUP = GROUP LOGS ALL;
LOG_COUNT = FOREACH LOGS_GROUP GENERATE COUNT(LOGS);

It did the trick for me.

Warm Regards,
Tariq
cloudfront.blogspot.com


On Mon, May 13, 2013 at 11:57 PM, Shahab Yunus <[email protected]> wrote:

> Not terribly efficient, but off the top of my head: GROUP ALL and then do a
> COUNT (or COUNT(*)). You can implement a follow-up script or add this to
> the existing script once the file has been generated.
>
> Regards,
> Shahab
>
>
> On Mon, May 13, 2013 at 2:16 PM, Mix Nin <[email protected]> wrote:
>
>> OK, let me modify my requirement. I should have specified this at the
>> beginning.
>>
>> I need to get the count of records in an HDFS file created by a Pig script
>> and then store the count in a text file. This should be done automatically
>> on a daily basis, without manual intervention.
>>
>>
>> On Mon, May 13, 2013 at 11:13 AM, Rahul Bhattacharjee <
>> [email protected]> wrote:
>>
>>> How about the second approach: get the application/job ID that Pig
>>> creates and submits to the cluster, and then find the job output counter
>>> for that job from the JT.
>>>
>>> Thanks,
>>> Rahul
>>>
>>>
>>> On Mon, May 13, 2013 at 11:37 PM, Mix Nin <[email protected]> wrote:
>>>
>>>> It is a text file.
>>>>
>>>> If we want to use wc, we need to copy the file from HDFS and then run wc,
>>>> and this may take time. Is there a way to do it without copying the file
>>>> from HDFS to a local directory?
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Mon, May 13, 2013 at 11:04 AM, Rahul Bhattacharjee <
>>>> [email protected]> wrote:
>>>>
>>>>> A few pointers.
>>>>>
>>>>> What kind of files are we talking about? For text you can use wc; for
>>>>> Avro data files you can use avro-tools.
>>>>>
>>>>> Or get the job that Pig is generating, and get the counters for that job
>>>>> from the JT of your Hadoop cluster.
>>>>>
>>>>> Thanks,
>>>>> Rahul
>>>>>
>>>>>
>>>>> On Mon, May 13, 2013 at 11:21 PM, Mix Nin <[email protected]> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> What is the best way to get the count of records in an HDFS file
>>>>>> generated by a Pig script?
>>>>>>
>>>>>> Thanks
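
A note on the GROUP ALL / COUNT approach suggested in this thread: it computes the count but does not write it anywhere, while the requirement asks for the count to be stored in a text file. A minimal sketch of one way to cover that, assuming the default PigStorage text output; the paths 'log' and 'log_count' are hypothetical placeholders, not taken from the original script:

```pig
-- Hypothetical input path: the HDFS file written by the earlier Pig script.
LOGS = LOAD 'log';

-- Put every record into a single group so it can be counted in one pass.
LOGS_GROUP = GROUP LOGS ALL;

-- COUNT skips records whose first field is null;
-- COUNT_STAR(LOGS) would count those records as well.
LOG_COUNT = FOREACH LOGS_GROUP GENERATE COUNT(LOGS);

-- Hypothetical output path: Pig writes a directory of plain-text part files here.
-- The directory must not already exist when the script runs.
STORE LOG_COUNT INTO 'log_count';
```

For a daily unattended run, the script would still need to be driven by a scheduler of your choice, with a fresh output path on each run, since STORE refuses to overwrite an existing directory.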
