Not terribly efficient, but off the top of my head: GROUP ALL and then do a COUNT (or COUNT(*)). You can implement this as a follow-up script, or add it to the existing script once the file has been generated.
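The GROUP ALL + COUNT suggestion above, plus the requirement later in the thread to store the count in a text file, could be sketched in Pig Latin roughly like this (paths and alias names here are hypothetical, not from the original script):

```pig
-- Load the generated HDFS output (hypothetical path/schema).
records = LOAD '/data/pig_output/part-*' AS (line:chararray);

-- GROUP ALL collapses everything into a single group,
-- so COUNT over it yields the total record count.
grouped = GROUP records ALL;
counted = FOREACH grouped GENERATE COUNT(records) AS total;

-- Write the count out as a small text file on HDFS.
STORE counted INTO '/data/pig_output_count';
```

Appended to the end of the existing script (or run as a follow-up script from a daily cron/Oozie job), this would produce the count without manual intervention, at the cost of one extra MapReduce job.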
Regards,
Shahab

On Mon, May 13, 2013 at 2:16 PM, Mix Nin <[email protected]> wrote:
> Ok, let me modify my requirement. I should have specified it in the
> beginning.
>
> I need to get the count of records in an HDFS file created by a Pig script
> and then store the count in a text file. This should be done automatically
> on a daily basis without manual intervention.
>
> On Mon, May 13, 2013 at 11:13 AM, Rahul Bhattacharjee
> <[email protected]> wrote:
>> How about the second approach: get the application/job id that Pig
>> creates and submits to the cluster, and then find the output counter for
>> that job from the JobTracker.
>>
>> Thanks,
>> Rahul
>>
>> On Mon, May 13, 2013 at 11:37 PM, Mix Nin <[email protected]> wrote:
>>> It is a text file.
>>>
>>> If we want to use wc, we need to copy the file from HDFS first, and
>>> that may take time. Is there a way to do it without copying the file
>>> from HDFS to a local directory?
>>>
>>> Thanks
>>>
>>> On Mon, May 13, 2013 at 11:04 AM, Rahul Bhattacharjee
>>> <[email protected]> wrote:
>>>> A few pointers.
>>>>
>>>> What kind of files are we talking about? For text you can use wc; for
>>>> Avro data files you can use avro-tools.
>>>>
>>>> Or get the job that Pig generates, and read the counters for that job
>>>> from the JobTracker of your Hadoop cluster.
>>>>
>>>> Thanks,
>>>> Rahul
>>>>
>>>> On Mon, May 13, 2013 at 11:21 PM, Mix Nin <[email protected]> wrote:
>>>>> Hello,
>>>>>
>>>>> What is the best way to get the count of records in an HDFS file
>>>>> generated by a Pig script?
>>>>>
>>>>> Thanks
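On the question of using wc without copying the file out of HDFS: streaming the file through a pipe avoids the local copy entirely. A minimal sketch (HDFS paths and the counter names are hypothetical; the local lines demonstrate the same pipe pattern on an ordinary file):

```shell
# On HDFS, stream the file contents straight into wc -l -- no local copy:
#   hadoop fs -cat /path/to/pig_output/part-* | wc -l
#
# Or, per the JobTracker suggestion in the thread, read the job's output
# counter instead (counter group/name vary by Hadoop version):
#   hadoop job -counter <job_id> \
#       org.apache.hadoop.mapred.Task\$Counter REDUCE_OUTPUT_RECORDS
#
# The same pipe pattern on a local text file:
printf 'rec1\nrec2\nrec3\n' > /tmp/records.txt
cat /tmp/records.txt | wc -l    # line count of the file (3 here)
```

The pipe version reads each block once and never materializes the file on local disk, so it is as fast as the HDFS read itself; the counter version avoids re-reading the data at all.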
