The script file must reside on HDFS; Oozie will not read it off of S3.
The script itself may still work with data stored on S3, however; that's not an issue.
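For instance, something like the following should work (the paths below are only illustrative; substitute your own HDFS user directory and script name, and run the commands on a node with cluster access):

```shell
# Copy the Pig script onto HDFS, where Oozie can read it.
# Assumes the script has first been downloaded locally from S3.
hadoop fs -mkdir -p /user/panshul/pigfiles
hadoop fs -put countGroups_daily.pig /user/panshul/pigfiles/

# Then point the Oozie pig action's script path at the HDFS
# location (e.g. /user/panshul/pigfiles/countGroups_daily.pig)
# instead of the s3:// URL.
```

The data load/store paths inside the script can remain s3:// URLs; it is only the script file itself that Oozie fetches from HDFS.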

On Sun, Apr 7, 2013 at 10:51 PM, Panshul Whisper <[email protected]> wrote:
> Hello
>
> I am trying to run a pig script which is stored on S3. The cluster scenario
> is as follows:
> * Cluster is installed on EC2 using Cloudera Manager 4.5 Automatic
> Installation
> * Installed version: CDH4
> * Script location on - s3:/pigfiles
> * running as workflow: -> pig ->  script file:
> s3://panshulpigfiles/nysesamples/nysesamplesaws/countGroups_daily.pig
>
> The Pig Script:
> set fs.s3.awsAccessKeyId xxxxxxxxxxxxxxxxxx
> set fs.s3.awsSecretAccessKey xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> --load the sample input file
> data = load 's3://steamdata/nysedata/NYSE_daily.txt' as (exchange:chararray,
> symbol:chararray, date:chararray, open:float, high:float, low:float,
> close:float, volume:int, adj_close:float);
> --group data by symbols
> symbolgrp = group data by symbol;
> --count data in every group
> symcount = foreach symbolgrp generate group,COUNT(data);
> --order the counted list by count
> symcountordered = order symcount by $1;
> store symcountordered into 's3://steamdata/nyseoutput/daily';
>
> Error:
>
> JA008: File does not exist:
> /nysesamples/nysesamplesaws/countGroups_daily.pig
>
> The log file is attached.
>
> Please help me; what am I doing wrong? I can assure you that the input
> path/file exists on S3 and that the AWS access key and secret key entered are correct.
>
> Thanking You,
>
>
> --
> Regards,
> Ouch Whisper
> 010101010101



-- 
Harsh J
