I am just spitballing here. You might want to override the commitJob method of FileOutputCommitter so that, while committing the job, it writes the value of the job's output record counter (I think there is a standard counter that gives the number of records output by the job) to a file in HDFS.
Not sure if we can plug a custom FileOutputCommitter (FOC) into a Pig workflow. Another option: you can add a statement to the same Pig script we are talking about to get the count of the final bag and then store it in a file. Can you not?

Thanks,
Rahul

On Mon, May 13, 2013 at 11:46 PM, Mix Nin <[email protected]> wrote:

> Ok, let me modify my requirement. I should have specified it at the
> beginning.
>
> I need to get the count of records in an HDFS file created by a Pig script
> and then store the count in a text file. This should be done automatically
> on a daily basis without manual intervention.
>
>
> On Mon, May 13, 2013 at 11:13 AM, Rahul Bhattacharjee <[email protected]> wrote:
>
>> How about the second approach: get the application/job id that Pig
>> creates and submits to the cluster, and then find the job output counter
>> for that job from the JT.
>>
>> Thanks,
>> Rahul
>>
>>
>> On Mon, May 13, 2013 at 11:37 PM, Mix Nin <[email protected]> wrote:
>>
>>> It is a text file.
>>>
>>> If we want to use wc, we need to copy the file from HDFS and then use wc,
>>> and this may take time. Is there a way without copying the file from HDFS
>>> to a local directory?
>>>
>>> Thanks
>>>
>>>
>>> On Mon, May 13, 2013 at 11:04 AM, Rahul Bhattacharjee <[email protected]> wrote:
>>>
>>>> A few pointers.
>>>>
>>>> What kind of files are we talking about? For text you can use wc; for
>>>> Avro data files you can use avro-tools.
>>>>
>>>> Or get the job that Pig is generating, and get the counters for that job
>>>> from the JT of your Hadoop cluster.
>>>>
>>>> Thanks,
>>>> Rahul
>>>>
>>>>
>>>> On Mon, May 13, 2013 at 11:21 PM, Mix Nin <[email protected]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> What is the best way to get the count of records in an HDFS file
>>>>> generated by a Pig script?
>>>>>
>>>>> Thanks
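
On the wc question quoted above: a text file does not have to be copied out of HDFS first, because `hadoop fs -cat` streams it to standard output and that stream can be piped straight into `wc -l`. A minimal sketch, assuming one record per line; the function name `count_records` and the paths in the commented usage line are made up for illustration:

```shell
#!/bin/sh
# count_records: run the command given as arguments and print the number of
# newline-terminated lines it writes to standard output.
count_records() {
    # tr strips the space padding that some wc implementations add.
    "$@" | wc -l | tr -d ' '
}

# Hypothetical usage (paths are placeholders, not taken from the thread).
# The glob is quoted so that hadoop, not the local shell, expands it:
#   count_records hadoop fs -cat '/user/example/pig_output/part-*' > record_count.txt
```

Run from a daily cron entry (or whatever scheduler is already in place), something like this might cover the "no manual intervention" part. Note that a pipeline's exit status is that of its last command, so a failed `hadoop fs -cat` would still yield a count (most likely 0); a real job should check for that separately.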
