Hi folks,
I am facing a weird issue.
I am running Pig 0.11 on a 64-bit Windows 7 machine with the latest version
of Cygwin.
I have a weblog that I want to order by username, so that all activities for
the same user are grouped together as input for the next stage of processing.
I start a command prompt, run cygwin.bat, change to D:/ in the Cygwin
console, run pig, and type the following script in the Grunt shell (local mode).
(Note: I have set PIG_HOME and PIG_CLASSPATH correctly.)
The script is:
USERACTIVITIES = LOAD '/D:/path/of/logs/useractivities'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',')
    AS (datetimeUnProcessed:chararray, username:chararray, request:chararray);
USERACTIVITIES_ORDERED = ORDER USERACTIVITIES by username;
STORE USERACTIVITIES_ORDERED INTO '/D:/readyfornextinput/useractivities'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',');
When I run ILLUSTRATE USERACTIVITIES_ORDERED, it completes without problems.
But when I run STORE or DUMP, the job fails with:
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/D:/pigsample_1749383998_1377684507424
When I searched for this pigsample_<number> file, I found it in:
D:/tmp/<username>/mapred/local/localRunner
I am not sure why Pig looks for it directly under D:/ instead.
I am also not sure whether this is a Windows/Cygwin-related issue or whether
anyone has seen it on Linux as well.
For reference, here is the stack trace:
2013-08-28 15:38:28,863 [Thread-46] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/D:/pigsample_1749383998_1377684507424
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/D:/pigsample_1288777582_1377684802262
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:190)
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:126)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131)
    ... 6 more
Any help with this would be appreciated.
Regards,
Darpan