Not terribly efficient, but off the top of my head: GROUP ALL and then do a COUNT (or COUNT(*)). You can implement this as a follow-up script, or add it to the existing script once the file has been generated.
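The GROUP ALL + COUNT suggestion above, plus the requirement later in the thread to store the count in a text file, could be sketched in Pig Latin roughly like this (paths and alias names here are hypothetical, not from the original script):

```pig
-- Load the generated HDFS output (hypothetical path/schema).
records = LOAD '/data/pig_output/part-*' AS (line:chararray);

-- GROUP ALL collapses everything into a single group,
-- so COUNT over it yields the total record count.
grouped = GROUP records ALL;
counted = FOREACH grouped GENERATE COUNT(records) AS total;

-- Write the count out as a small text file on HDFS.
STORE counted INTO '/data/pig_output_count';
```

Appended to the end of the existing script (or run as a follow-up script from a daily cron/Oozie job), this would produce the count without manual intervention, at the cost of one extra MapReduce job.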
Regards,
Shahab

On Mon, May 13, 2013 at 2:16 PM, Mix Nin <[email protected]> wrote:
> Ok, let me modify my requirement. I should have specified it in the
> beginning.
>
> I need to get the count of records in an HDFS file created by a Pig script
> and then store the count in a text file. This should be done automatically
> on a daily basis without manual intervention.
>
> On Mon, May 13, 2013 at 11:13 AM, Rahul Bhattacharjee
> <[email protected]> wrote:
>> How about the second approach: get the application/job id that Pig
>> creates and submits to the cluster, and then find the output counter for
>> that job from the JobTracker.
>>
>> Thanks,
>> Rahul
>>
>> On Mon, May 13, 2013 at 11:37 PM, Mix Nin <[email protected]> wrote:
>>> It is a text file.
>>>
>>> If we want to use wc, we need to copy the file from HDFS first, and
>>> that may take time. Is there a way to do it without copying the file
>>> from HDFS to a local directory?
>>>
>>> Thanks
>>>
>>> On Mon, May 13, 2013 at 11:04 AM, Rahul Bhattacharjee
>>> <[email protected]> wrote:
>>>> A few pointers.
>>>>
>>>> What kind of files are we talking about? For text you can use wc; for
>>>> Avro data files you can use avro-tools.
>>>>
>>>> Or get the job that Pig generates, and read the counters for that job
>>>> from the JobTracker of your Hadoop cluster.
>>>>
>>>> Thanks,
>>>> Rahul
>>>>
>>>> On Mon, May 13, 2013 at 11:21 PM, Mix Nin <[email protected]> wrote:
>>>>> Hello,
>>>>>
>>>>> What is the best way to get the count of records in an HDFS file
>>>>> generated by a Pig script?
>>>>>
>>>>> Thanks
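On the question of using wc without copying the file out of HDFS: streaming the file through a pipe avoids the local copy entirely. A minimal sketch (HDFS paths and the counter names are hypothetical; the local lines demonstrate the same pipe pattern on an ordinary file):

```shell
# On HDFS, stream the file contents straight into wc -l -- no local copy:
#   hadoop fs -cat /path/to/pig_output/part-* | wc -l
#
# Or, per the JobTracker suggestion in the thread, read the job's output
# counter instead (counter group/name vary by Hadoop version):
#   hadoop job -counter <job_id> \
#       org.apache.hadoop.mapred.Task\$Counter REDUCE_OUTPUT_RECORDS
#
# The same pipe pattern on a local text file:
printf 'rec1\nrec2\nrec3\n' > /tmp/records.txt
cat /tmp/records.txt | wc -l    # line count of the file (3 here)
```

The pipe version reads each block once and never materializes the file on local disk, so it is as fast as the HDFS read itself; the counter version avoids re-reading the data at all.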
