Hi, I am using Pig to compute the percentage of each user agent in an
Apache log. The following script fails at the ORDER statement at the very
end (the result relation is correct and can be dumped without problems).
I am relatively new to Pig and could not figure it out, so I would
appreciate your help. The script and the error output follow. Thanks!

logs = LOAD '$LOGS' USING ApacheCombinedLogLoader AS (remoteHost, hyphen,
user, time, method, uri, protocol, statusCode, responseSize, referer,
userAgent);

uarows = FOREACH logs GENERATE userAgent;
total = FOREACH (GROUP uarows ALL) GENERATE COUNT(uarows) as count;
dump total;

gpuarows = GROUP uarows BY userAgent;
result = FOREACH gpuarows {
    subtotal = COUNT(uarows);
    GENERATE flatten(group) as ua, subtotal AS SUB_TOTAL,
             100*(double)subtotal/(double)total.count AS percentage;
};
orderresult = ORDER result BY SUB_TOTAL DESC;
dump orderresult;

-- What's weird is that 'dump result' works just fine, so it is the ORDER
-- line that causes the trouble.

Errors:
2013-04-13 10:36:32,409 [Thread-48] INFO  org.apache.hadoop.mapred.MapTask
- record buffer = 262144/327680
2013-04-13 10:36:32,437 [Thread-48] WARN
org.apache.hadoop.mapred.LocalJobRunner - job_local_0005
java.lang.RuntimeException:
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
does not exist:
file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_259943398_1365820592017
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
    at
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at
org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
Input path does not exist:
file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_259943398_1365820592017
    at
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
    at
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:177)
    at
org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:124)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131)
    ... 6 more
2013-04-13 10:36:32,525 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_local_0005
2013-04-13 10:36:32,526 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Processing aliases orderresult
2013-04-13 10:36:32,526 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- detailed locations: M: orderresult[19,14] C:  R:
2013-04-13 10:36:37,536 [main] WARN
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to
stop immediately on failure.
2013-04-13 10:36:37,536 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- job job_local_0005 has failed! Stop running all dependent jobs
2013-04-13 10:36:37,536 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2013-04-13 10:36:37,537 [main] ERROR
org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2013-04-13 10:36:37,538 [main] INFO
org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion    PigVersion    UserId    StartedAt    FinishedAt    Features
1.0.4    0.11.0    dliu    2013-04-13 10:35:50    2013-04-13 10:36:37    GROUP_BY,ORDER_BY

Some jobs have failed! Stop running all dependent jobs

Job Stats (time in seconds):
JobId    Maps    Reduces    MaxMapTime    MinMapTIme    AvgMapTime    MedianMapTime    MaxReduceTime    MinReduceTime    AvgReduceTime    MedianReducetime    Alias    Feature    Outputs
job_local_0002    1    1    n/a    n/a    n/a    n/a    n/a    n/a    1-18,logs,total,uarows    MULTI_QUERY,COMBINER
job_local_0003    1    1    n/a    n/a    n/a    n/a    n/a    n/a    gpuarows,result    GROUP_BY,COMBINER
job_local_0004    1    1    n/a    n/a    n/a    n/a    n/a    n/a    orderresult    SAMPLER

Failed Jobs:
JobId    Alias    Feature    Message    Outputs
job_local_0005    orderresult    ORDER_BY    Message: Job failed! Error - NA    file:/tmp/temp-1225021115/tmp-62411972,

Input(s):
Successfully read 0 records from:
"file:///home/dliu/ApacheLogAnalysisWithPig/access.log"

Output(s):
Failed to produce result in "file:/tmp/temp-1225021115/tmp-62411972"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_local_0002    ->    job_local_0003,
job_local_0003    ->    job_local_0004,
job_local_0004    ->    job_local_0005,
job_local_0005


2013-04-13 10:36:37,539 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Some jobs have failed! Stop running all dependent jobs
2013-04-13 10:36:37,541 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1066: Unable to open iterator for alias orderresult
Details at logfile:
/home/dliu/ApacheLogAnalysisWithPig/pig_1365820535568.log
