[
https://issues.apache.org/jira/browse/PIG-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551935
]
Ted Dunning commented on PIG-51:
--------------------------------
I think I am still seeing this issue or a cousin even after applying this
patch. I don't know enough to be sure, however.
grunt> ls /logs/search/2007/12/10
/logs/search/2007/12/10/part-00000<r 3> 1313515859
/logs/search/2007/12/10/part-00001<r 3> 1313535390
/logs/search/2007/12/10/part-00002<r 3> 1313485045
/logs/search/2007/12/10/part-00003<r 3> 1313536061
grunt> a = load '/logs/search/2007/12/10' as (eventType, date, month,
week, day, hour, id, videoId, VisitorUID, engineName, query, offset);
grunt> b = filter a by (id neq '-');
grunt> c = group b by id;
grunt> describe c
c: (group, b: (eventType, date, month, week, day, hour, id, videoId,
VisitorUID, engineName, query, offset ) )
grunt> d = foreach c {
>> click = filter b by eventType eq '/search/click';
>> generate COUNT(click);
>> }
grunt> describe d
d: (count1 )
grunt> e = group d by 1;
grunt> describe e
e: (group: ( ), d: (count1 ) )
grunt> f = foreach e generate COUNT(*), SUM(d.count1);
grunt> dump f
----- MapReduce Job -----
Input: [/logs/search/2007/12/10:org.apache.pig.builtin.PigStorage()]
Map: [[*]->[FILTER BY ([PROJECT $6] neq ['-'])]]
Group: [GENERATE {[PROJECT $6],[*]}]
Combine: null
Reduce: GENERATE {[COUNT(GENERATE {[PROJECT $1]->[FILTER BY ([PROJECT $0] eq
['/search/click'])]})]}
Output: /tmp/temp1435257199/tmp1109313480:org.apache.pig.builtin.BinStorage
Split: null
Map parallelism: -1
Reduce parallelism: -1
Job jar size = 482135
2007-12-14 12:04:44,776 [main] INFO org.apache.pig - Pig progress = 0%
2007-12-14 12:04:57,832 [main] INFO org.apache.pig - Pig progress = 0%
2007-12-14 12:04:59,841 [main] INFO org.apache.pig - Pig progress = 0%
2007-12-14 12:05:01,849 [main] INFO org.apache.pig - Pig progress = 0%
2007-12-14 12:05:03,857 [main] INFO org.apache.pig - Pig progress = 0%
2007-12-14 12:05:05,865 [main] INFO org.apache.pig - Pig progress = 1%
2007-12-14 12:05:07,873 [main] INFO org.apache.pig - Pig progress = 1%
2007-12-14 12:05:09,881 [main] INFO org.apache.pig - Pig progress = 2%
2007-12-14 12:05:11,889 [main] INFO org.apache.pig - Pig progress = 2%
2007-12-14 12:05:13,897 [main] INFO org.apache.pig - Pig progress = 2%
2007-12-14 12:05:15,905 [main] INFO org.apache.pig - Pig progress = 3%
2007-12-14 12:05:17,913 [main] INFO org.apache.pig - Pig progress = 3%
2007-12-14 12:05:21,929 [main] INFO org.apache.pig - Pig progress = 3%
2007-12-14 12:05:23,937 [main] INFO org.apache.pig - Pig progress = 4%
2007-12-14 12:05:25,945 [main] INFO org.apache.pig - Pig progress = 4%
2007-12-14 12:05:27,953 [main] INFO org.apache.pig - Pig progress = 4%
2007-12-14 12:05:29,961 [main] INFO org.apache.pig - Pig progress = 5%
2007-12-14 12:05:31,969 [main] INFO org.apache.pig - Pig progress = 5%
2007-12-14 12:05:33,977 [main] INFO org.apache.pig - Pig progress = 5%
2007-12-14 12:05:37,993 [main] INFO org.apache.pig - Pig progress = 5%
2007-12-14 12:05:40,001 [main] INFO org.apache.pig - Pig progress = 6%
2007-12-14 12:05:42,009 [main] INFO org.apache.pig - Pig progress = 6%
2007-12-14 12:05:44,016 [main] INFO org.apache.pig - Pig progress = 6%
2007-12-14 12:05:46,024 [main] INFO org.apache.pig - Pig progress = 7%
2007-12-14 12:05:48,032 [main] INFO org.apache.pig - Pig progress = 7%
2007-12-14 12:05:50,040 [main] INFO org.apache.pig - Pig progress = 8%
2007-12-14 12:05:52,051 [main] INFO org.apache.pig - Pig progress = 8%
2007-12-14 12:05:54,060 [main] INFO org.apache.pig - Pig progress = 8%
2007-12-14 12:05:56,068 [main] INFO org.apache.pig - Pig progress = 8%
2007-12-14 12:05:58,077 [main] INFO org.apache.pig - Pig progress = 8%
2007-12-14 12:06:00,085 [main] INFO org.apache.pig - Pig progress = 9%
2007-12-14 12:06:02,092 [main] INFO org.apache.pig - Pig progress = 9%
2007-12-14 12:06:04,100 [main] INFO org.apache.pig - Pig progress = 9%
2007-12-14 12:06:08,116 [main] INFO org.apache.pig - Pig progress = 10%
2007-12-14 12:06:10,124 [main] INFO org.apache.pig - Pig progress = 10%
2007-12-14 12:06:12,133 [main] INFO org.apache.pig - Pig progress = 10%
2007-12-14 12:06:18,160 [main] INFO org.apache.pig - Pig progress = 10%
2007-12-14 12:06:20,168 [main] INFO org.apache.pig - Pig progress = 10%
2007-12-14 12:06:22,176 [main] INFO org.apache.pig - Pig progress = 11%
2007-12-14 12:06:24,184 [main] INFO org.apache.pig - Pig progress = 11%
2007-12-14 12:06:26,192 [main] INFO org.apache.pig - Pig progress = 11%
2007-12-14 12:06:28,201 [main] INFO org.apache.pig - Pig progress = 12%
2007-12-14 12:06:30,208 [main] INFO org.apache.pig - Pig progress = 12%
2007-12-14 12:06:32,216 [main] INFO org.apache.pig - Pig progress = 12%
2007-12-14 12:06:34,224 [main] INFO org.apache.pig - Pig progress = 13%
2007-12-14 12:06:36,232 [main] INFO org.apache.pig - Pig progress = 13%
2007-12-14 12:06:38,240 [main] INFO org.apache.pig - Pig progress = 13%
2007-12-14 12:06:40,251 [main] INFO org.apache.pig - Pig progress = 14%
2007-12-14 12:06:42,260 [main] INFO org.apache.pig - Pig progress = 14%
2007-12-14 12:06:44,268 [main] INFO org.apache.pig - Pig progress = 15%
2007-12-14 12:06:46,276 [main] INFO org.apache.pig - Pig progress = 15%
2007-12-14 12:06:48,285 [main] INFO org.apache.pig - Pig progress = 15%
2007-12-14 12:06:50,292 [main] INFO org.apache.pig - Pig progress = 15%
2007-12-14 12:06:52,300 [main] INFO org.apache.pig - Pig progress = 16%
2007-12-14 12:06:56,316 [main] INFO org.apache.pig - Pig progress = 16%
2007-12-14 12:06:58,324 [main] INFO org.apache.pig - Pig progress = 17%
2007-12-14 12:07:00,332 [main] INFO org.apache.pig - Pig progress = 17%
2007-12-14 12:07:02,340 [main] INFO org.apache.pig - Pig progress = 17%
2007-12-14 12:07:04,348 [main] INFO org.apache.pig - Pig progress = 17%
...
2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task
(map) tip_200712121227_0004_m_000071 java.lang.RuntimeException:
java.io.IOException: Column number out of range: 6 -- ( )
at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:95)
at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
at org.apache.pig.impl.eval.EvalSpec.simpleEval(EvalSpec.java:216)
at org.apache.pig.impl.eval.cond.CompCond.eval(CompCond.java:58)
at org.apache.pig.impl.eval.FilterSpec$1.add(FilterSpec.java:58)
at org.apache.pig.impl.mapreduceExec.PigMapReduce.run(PigMapReduce.java:113)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
Caused by: java.io.IOException: Column number out of range: 6 -- ( )
at org.apache.pig.data.Tuple.getField(Tuple.java:147)
at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:85)
... 7 more
2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task
(map) tip_200712121227_0004_m_000072
2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task
(map) tip_200712121227_0004_m_000073
2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task
(map) tip_200712121227_0004_m_000074
2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task
(map) tip_200712121227_0004_m_000075 java.lang.RuntimeException:
java.io.IOException: Column number out of range: 6 -- (full, 50)
at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:95)
at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
at org.apache.pig.impl.eval.EvalSpec.simpleEval(EvalSpec.java:216)
at org.apache.pig.impl.eval.cond.CompCond.eval(CompCond.java:58)
at org.apache.pig.impl.eval.FilterSpec$1.add(FilterSpec.java:58)
at org.apache.pig.impl.mapreduceExec.PigMapReduce.run(PigMapReduce.java:113)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
Caused by: java.io.IOException: Column number out of range: 6 -- (full, 50)
at org.apache.pig.data.Tuple.getField(Tuple.java:147)
at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:85)
... 7 more
2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task
(map) tip_200712121227_0004_m_000076
2007-12-14 12:07:06,374 [main] ERROR org.apache.pig - Error message from task
(map) tip_200712121227_0004_m_000079
2007-12-14 12:07:06,375 [main] ERROR org.apache.pig - Error message from task
(reduce) tip_200712121227_0004_r_000000
2007-12-14 12:07:06,375 [main] ERROR org.apache.pig - Error message from task
(reduce) tip_200712121227_0004_r_000001
2007-12-14 12:07:06,375 [main] ERROR org.apache.pig - Error message from task
(reduce) tip_200712121227_0004_r_000002
2007-12-14 12:07:06,375 [main] ERROR org.apache.pig - Error message from task
(reduce) tip_200712121227_0004_r_000003
Job failed
grunt>
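
A note on reading the trace above: each "Column number out of range: 6" message ends with the tuple that failed, and both ( ) and (full, 50) have fewer than seven fields, so the projection of $6 (id) in the filter cannot succeed on them. That suggests, though it does not prove, that some input lines are shorter than the declared schema. The following is a minimal Python sketch of that failure mode, not Pig's implementation. The helper names get_field and split_rows and the sample rows are hypothetical; only the tab delimiter (the PigStorage default) and the (full, 50) tuple come from the session.

```python
def get_field(row, index):
    """Mimic a positional projection that fails on rows that are too short."""
    if index >= len(row):
        raise IndexError(f"Column number out of range: {index} -- {tuple(row)}")
    return row[index]


def split_rows(lines, min_fields, delimiter="\t"):
    """Separate rows that have at least min_fields fields from shorter ones."""
    ok, short = [], []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) >= min_fields:
            ok.append(fields)
        else:
            short.append(fields)
    return ok, short


# Hypothetical sample input: one well-formed 12-field row, one row shaped like
# the (full, 50) tuple from the error message, and one empty line.
sample = [
    "/search/click\t2007-12-10\t12\t50\t10\t3\tu1\tv1\tx\te\tq\t0",
    "full\t50",
    "",
]

# Field 6 (id) needs at least 7 fields to be present.
ok, short = split_rows(sample, min_fields=7)
```

Running a check like split_rows over a sample of the part files would show whether short lines exist in the input. If they do, the error may be a data problem rather than the combiner issue described in PIG-51.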
> Combiner gives wrong result in the presence of flattening
> ---------------------------------------------------------
>
> Key: PIG-51
> URL: https://issues.apache.org/jira/browse/PIG-51
> Project: Pig
> Issue Type: Bug
> Reporter: Utkarsh Srivastava
> Priority: Critical
> Attachments: combiner-flatten.patch
>
>
> If you do something like
> a = load ... as (f1,f2,f3);
> b = group a by (f1,f2);
> c = foreach b generate flatten(group), SUM(a.f3);
> The reduce side refers to fields by number, expecting that the data has not
> been flattened yet. But if the combiner kicks in, it has already flattened
> the group, so the column references are wrong.
>
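
The positional mismatch described in the issue can be illustrated with a small Python sketch. This is a simplified model under stated assumptions, not Pig's actual combiner or reduce code: the functions flatten_group and reduce_sum and the sample record are hypothetical, and a grouped record is modeled as a (group, bag) pair.

```python
def flatten_group(record):
    """Model of flattening: splice the group tuple's fields into the record."""
    group, bag = record
    return (*group, bag)


def reduce_sum(record, bag_index=1, value_index=2):
    """Model of a reduce step that assumes the bag sits at position 1,
    which holds only while the group has not been flattened."""
    bag = record[bag_index]
    return sum(row[value_index] for row in bag)


# Hypothetical grouped record: group key (f1, f2) plus a bag of (f1, f2, f3) rows.
record = (("x", "y"), [("x", "y", 1), ("x", "y", 2)])

# Unflattened input: position 1 is the bag, so the sum of f3 works.
total = reduce_sum(record)

# After an early flatten the record is ("x", "y", bag): position 1 is now f2,
# and the same positional reference no longer points at the bag.
flattened = flatten_group(record)
```

In this model, calling reduce_sum(flattened) fails because position 1 holds the string "y" instead of the bag. That is the same kind of stale column reference the issue describes.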