[jira] [Updated] (PIG-5033) MultiQueryOptimizerTez creates bad plan with union, split and FRJoin
[ https://issues.apache.org/jira/browse/PIG-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-5033:
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 0.16.1
                   0.17.0
           Status: Resolved  (was: Patch Available)

Committed to branch-0.16 and trunk. Thanks [~tmwoodruff] for tracking down the issue and the patch. Thanks Daniel for the review.

> MultiQueryOptimizerTez creates bad plan with union, split and FRJoin
>
>                 Key: PIG-5033
>                 URL: https://issues.apache.org/jira/browse/PIG-5033
>             Project: Pig
>          Issue Type: Bug
>          Components: tez
>    Affects Versions: 0.16.0
>            Reporter: Travis Woodruff
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.17.0, 0.16.1
>
>         Attachments: PIG-5033-2.patch, PIG-5033.patch, input1, input2, input3
>
> This script produces incorrect results:
> {code}
> a = load 'file:///tmp/input1' as (x:int, y:int);
> b = load 'file:///tmp/input2' as (x:int, y:int);
> u = union a,b;
> c = load 'file:///tmp/input3' as (x:int, y:int);
> e = filter c by y > 3;
> f = filter c by y < 2;
> g = join u by x left, e by x using 'replicated';
> h = join g by u::x left, f by x using 'replicated';
> store h into 'file:///tmp/pigoutput';
> {code}
> Without the union, or with opt.multiquery=false, or with non-replicated
> joins, it works as expected.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (PIG-5033) MultiQueryOptimizerTez creates bad plan with union, split and FRJoin
[ https://issues.apache.org/jira/browse/PIG-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-5033:
    Attachment: PIG-5033-2.patch

[~tmwoodruff],
Sorry, I had to take over the patch as it required a little more work. Hope that is fine with you.

[~daijy],
Changes done:
- Inside the union block, the code was not checking whether the output from the predecessor went to a scalar or a replicated join. It always assumed the input went to POShuffleValueInputTez.
- Changed TezCompilerUtil.isNonPackageInput to return true only for POFRJoinTez. We only care about scalars and replicated joins, and it was returning true for POShuffleValueInputTez (union) as well.
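The second change described in the message above can be pictured with a small toy model. This is only a sketch under assumptions: the `Op` class, the `scope-*` keys, and both method bodies are hypothetical stand-ins written for illustration and are not Pig's actual classes or the contents of the committed patch. Only the operator names `POFRJoinTez` and `POShuffleValueInputTez` come from the message.

```java
import java.util.Arrays;
import java.util.List;

public class NonPackageInputSketch {

    // Hypothetical stand-in for a physical operator that reads from named inputs.
    // NOT one of Pig's real classes; only the 'kind' strings echo names from the thread.
    static class Op {
        final String kind;            // e.g. "POFRJoinTez" or "POShuffleValueInputTez"
        final List<String> inputKeys; // keys of the upstream vertices this operator reads

        Op(String kind, String... inputKeys) {
            this.kind = kind;
            this.inputKeys = Arrays.asList(inputKeys);
        }
    }

    // Rough model of the behaviour reported as the bug:
    // any operator reading inputKey counts, including the union input.
    static boolean isNonPackageInputBefore(String inputKey, List<Op> plan) {
        for (Op op : plan) {
            if (op.inputKeys.contains(inputKey)) {
                return true;
            }
        }
        return false;
    }

    // Rough model of the behaviour described as the fix:
    // only a replicated join reading inputKey counts.
    static boolean isNonPackageInputAfter(String inputKey, List<Op> plan) {
        for (Op op : plan) {
            if (op.kind.equals("POFRJoinTez") && op.inputKeys.contains(inputKey)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical plan: a union reading two inputs and a replicated join reading one.
        List<Op> plan = Arrays.asList(
            new Op("POShuffleValueInputTez", "scope-a", "scope-b"),
            new Op("POFRJoinTez", "scope-e"));

        System.out.println(isNonPackageInputBefore("scope-a", plan)); // true
        System.out.println(isNonPackageInputAfter("scope-a", plan));  // false
        System.out.println(isNonPackageInputAfter("scope-e", plan));  // true
    }
}
```

In this model the union's input stops being classified as a non-package input once the check is narrowed, while the replicated join's input is still recognised.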
[jira] [Updated] (PIG-5033) MultiQueryOptimizerTez creates bad plan with union, split and FRJoin
[ https://issues.apache.org/jira/browse/PIG-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Travis Woodruff updated PIG-5033:
    Status: Patch Available  (was: Open)

Here's an attempt at a fix. I have no confidence that this is the best (or even right) way to fix.
[jira] [Updated] (PIG-5033) MultiQueryOptimizerTez creates bad plan with union, split and FRJoin
[ https://issues.apache.org/jira/browse/PIG-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Travis Woodruff updated PIG-5033:
    Attachment: PIG-5033.patch
[jira] [Updated] (PIG-5033) MultiQueryOptimizerTez creates bad plan with union, split and FRJoin
[ https://issues.apache.org/jira/browse/PIG-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Travis Woodruff updated PIG-5033:
    Attachment: input3
                input2
                input1
[jira] [Updated] (PIG-5033) MultiQueryOptimizerTez creates bad plan with union, split and FRJoin
[ https://issues.apache.org/jira/browse/PIG-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Travis Woodruff updated PIG-5033:
    Description:
This script produces incorrect results:

{code}
a = load 'file:///tmp/input1' as (x:int, y:int);
b = load 'file:///tmp/input2' as (x:int, y:int);
u = union a,b;
c = load 'file:///tmp/input3' as (x:int, y:int);
e = filter c by y > 3;
f = filter c by y < 2;
g = join u by x left, e by x using 'replicated';
h = join g by u::x left, f by x using 'replicated';
store h into 'file:///tmp/pigoutput';
{code}

Without the union, or with opt.multiquery=false, or with non-replicated joins, it works as expected.

  was:
This script produces incorrect results:

{code}
a = load 'file:///tmp/input1' as (x:int, y:int);
b = load 'file:///tmp/input1' as (x:int, y:int);
u = union a,b;
c = load 'file:///tmp/input3' as (x:int, y:int);
e = filter c by y > 3;
f = filter c by y < 2;
g = join u by x left, e by x using 'replicated';
h = join g by u::x left, f by x using 'replicated';
store h into 'file:///tmp/pigoutput';
{code}

Without the union, or with opt.multiquery=false, or with non-replicated joins, it works as expected.