[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions
[ https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314227#comment-14314227 ]

Aihua Xu commented on HIVE-9228:

Thanks for your contribution, Navis.

> Problem with subquery using windowing functions
> ---
>
> Key: HIVE-9228
> URL: https://issues.apache.org/jira/browse/HIVE-9228
> Project: Hive
> Issue Type: Bug
> Components: PTF-Windowing
> Affects Versions: 0.14.0, 0.13.1, 1.0.0
> Reporter: Aihua Xu
> Assignee: Navis
> Fix For: 1.2.0
>
> Attachments: HIVE-9228.1.patch.txt, HIVE-9228.2.patch.txt, HIVE-9228.3.patch.txt, create_table_tab1.sql, tab1.csv
>
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> The following query with window functions failed. The internal query works fine.
>
> select col1, col2, col3 from (select col1,col2, col3, count(case when col4=1
> then 1 end ) over (partition by col1, col2) as col5, row_number() over
> (partition by col1, col2 order by col4) as col6 from tab1) t;
>
> HIVE generates an execution plan with 2 jobs.
> 1. The first job basically calculates the window function for col5.
> 2. The second job calculates the window function for col6 and produces the output.
>
> The plan says the first job outputs the columns (col1, col2, col3, col4) to a
> tmp file, since only these columns are used in the later stage. However, the PTF
> operator for the first job outputs (_wcol0, col1, col2, col3, col4), with
> _wcol0 as the result of the window function, even though it is not used.
>
> In the second job, the map operator still reads the 4 columns (col1, col2,
> col3, col4) from the temp file, as the plan says. That causes the exception.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
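The mismatch described in the quoted report can be illustrated outside Hive. The following is a minimal Python sketch, not Hive code: `ptf_output_row` and `read_with_plan` are hypothetical names, the sample values are invented, and it assumes the intermediate file stores values by position, so a reader that follows the 4-column plan cannot line up a 5-field row.

```python
# Columns the plan expects in the intermediate file (from the report).
PLANNED_SCHEMA = ["col1", "col2", "col3", "col4"]

# Columns the first job's PTF operator actually emits (from the report).
ACTUAL_SCHEMA = ["_wcol0", "col1", "col2", "col3", "col4"]


def ptf_output_row(row, window_value):
    """Hypothetical writer: prepend the window result to the input row."""
    return [window_value] + [row[name] for name in PLANNED_SCHEMA]


def read_with_plan(fields, schema):
    """Hypothetical reader: map stored fields to a schema by position.

    Raises ValueError when the field count differs from the schema,
    standing in for the exception seen in the second job.
    """
    if len(fields) != len(schema):
        raise ValueError(
            f"plan expects {len(schema)} columns, file has {len(fields)}"
        )
    return dict(zip(schema, fields))


# Invented sample row, for illustration only.
row = {"col1": "a", "col2": "b", "col3": 10, "col4": 1}
written = ptf_output_row(row, window_value=3)

try:
    read_with_plan(written, PLANNED_SCHEMA)
    mismatch = None
except ValueError as err:
    mismatch = str(err)

print(mismatch)
```

Reading the same row with `ACTUAL_SCHEMA` succeeds, which is the point of the report: the writer and the plan disagree about the column list, and either side could be changed to agree with the other.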
[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions
[ https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312928#comment-14312928 ]

Ashutosh Chauhan commented on HIVE-9228:

+1
[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions
[ https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312790#comment-14312790 ]

Hive QA commented on HIVE-9228:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12697415/HIVE-9228.3.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7531 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_timestamp_funcs
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2724/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2724/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2724/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12697415 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions
[ https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312646#comment-14312646 ]

Aihua Xu commented on HIVE-9228:

[~navis] Thanks for the update. I'm new to Hive and was not very comfortable with my own change. Thanks for stepping in. :)
[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions
[ https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311850#comment-14311850 ]

Navis commented on HIVE-9228:

[~aihuaxu] Sorry for breaking in on this issue. I've been working on the code around CP for other issues and did not want others to waste time understanding the complicated PTF operation. I think the fix is almost done. Sorry again.
[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions
[ https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295254#comment-14295254 ]

Aihua Xu commented on HIVE-9228:

[~navis] Since you are working on this issue and know windowing functions much better than I do, do you want me to assign the issue to you?
[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions
[ https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294710#comment-14294710 ]

Hive QA commented on HIVE-9228:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12694887/HIVE-9228.2.patch.txt

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 7401 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_ptf
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ptf
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2541/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2541/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2541/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12694887 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions
[ https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294595#comment-14294595 ]

Navis commented on HIVE-9228:

Yes, when a PTF column is not selected, we should prune the function itself in the PTF operator. But I thought that not selecting a column which was calculated at heavy cost is a trivial case. Also, the select operator would be removed by IdentityProjectRemover if it's not needed.

By the way, could you review HIVE-9138 first? It's hard to debug anything in PTF without any explain result.
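The pruning idea in this comment can be sketched as follows. This is a simplified Python illustration, not Hive's column pruner: `WindowFunction` and `prune_window_functions` are hypothetical names, and the sketch assumes each window function produces exactly one named output column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowFunction:
    """Hypothetical stand-in for one window function in a PTF operator."""
    output: str    # name of the column the function produces
    inputs: tuple  # input columns the function reads


def prune_window_functions(functions, passthrough, needed):
    """Keep only the window functions whose output is needed downstream.

    Returns the surviving functions and the columns the operator must
    still read: needed pass-through columns plus inputs of kept functions.
    """
    kept = [f for f in functions if f.output in needed]
    required = [c for c in passthrough if c in needed]
    for f in kept:
        for col in f.inputs:
            if col not in required:
                required.append(col)
    return kept, required


# First job from the report: one window function (col5, shown as _wcol0).
functions = [WindowFunction("_wcol0", ("col1", "col2", "col4"))]
passthrough = ["col1", "col2", "col3", "col4"]

# The later stage only uses col1..col4, so _wcol0 is never read.
kept, required = prune_window_functions(
    functions, passthrough, needed={"col1", "col2", "col3", "col4"}
)
print(kept, required)
```

With the function pruned, the operator's output matches the four columns the plan expects. A real operator would presumably also have to keep its own partition and order columns, which this sketch does not model.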
[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions
[ https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294585#comment-14294585 ]

Ashutosh Chauhan commented on HIVE-9228:

Wondering if it is possible to fix this without adding a redundant SelectOp. SelectOp adds latency, so it would be good to avoid it. Adding operators just so we can plan properly and not falter later on in optimization doesn't seem like a good design choice.
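The concern about a redundant select can be made concrete with a small sketch. Assuming a select operator is modeled as a list of (output name, input expression) pairs, a select that forwards every input column unchanged and in order does no useful work and could be dropped. The Python below is a hypothetical illustration of that check, not the implementation of Hive's IdentityProjectRemover; `is_identity_select` and `remove_identity_selects` are invented names, and the pipeline is assumed to be linear.

```python
def is_identity_select(projection, input_columns):
    """Return True when a projection forwards its input unchanged.

    `projection` is a list of (output_name, input_expression) pairs.
    """
    if len(projection) != len(input_columns):
        return False
    return all(
        out == expr == col
        for (out, expr), col in zip(projection, input_columns)
    )


def remove_identity_selects(pipeline, input_columns):
    """Drop identity selects from a linear pipeline of operators.

    `pipeline` is a list of ("select", projection) or (name, None) tuples.
    Only selects are assumed to change the column list in this sketch.
    """
    result = []
    columns = list(input_columns)
    for name, projection in pipeline:
        if name == "select":
            if is_identity_select(projection, columns):
                continue  # forwards everything as-is: safe to skip
            columns = [out for out, _ in projection]
        result.append((name, projection))
    return result


cols = ["col1", "col2", "col3", "col4"]
identity = [(c, c) for c in cols]
narrowing = [("col1", "col1"), ("col2", "col2"), ("col3", "col3")]
pipeline = [
    ("ptf", None),
    ("select", identity),
    ("select", narrowing),
    ("filesink", None),
]
optimized = remove_identity_selects(pipeline, cols)
print([name for name, _ in optimized])
```

In this example the identity select disappears while the narrowing select survives, since it changes the column list that later operators see.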
[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions
[ https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289997#comment-14289997 ]

Hive QA commented on HIVE-9228:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12694092/HIVE-9228.1.patch.txt

{color:red}ERROR:{color} -1 due to 68 failed/errored test(s), 7347 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_excludeHadoop20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_multi
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_simple_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_column_access_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataOnlyOptimizer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataonly1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_gby2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_vc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rcfile_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_25
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_views
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_table_access_keys_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union24
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union28
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union30
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_null
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_6_subq
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_simple_select
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_filter_join_breaktask
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_filter_join_breaktask2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mrr
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_dynam
[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions
[ https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289285#comment-14289285 ]

Aihua Xu commented on HIVE-9228:

Thanks, [~navis]. Will you review the code for this patch as well?
[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions
[ https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279016#comment-14279016 ]

Aihua Xu commented on HIVE-9228:

[~rhbutani] Do you have comments on this issue?
[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions
[ https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264639#comment-14264639 ]

Aihua Xu commented on HIVE-9228:

[~ashutoshc] Are you working in this area? Any ideas?

The following window query throws ArrayIndexOutOfBoundsException.

select st_fips_cd, zip_cd_5, hh_surr_key
from (
  select st_fips_cd, zip_cd_5, hh_surr_key,
    count( case when advtg_len_rsdnc_cd = '1' then 1 end ) over (partition by st_fips_cd, zip_cd_5) as CNT_ADVTG_LEN_RSDNC_CD_1,
    row_number() over (partition by st_fips_cd, zip_cd_5 order by hh_surr_key asc) as analytic_row_number3
  from hh_agg
  where analytic_row_number2 = 1
) t;

Here is the explain extended output for the query. At the File Sink Operator of stage 1 below, it seems that only 4 columns should be output, while the temp table in fact contains one additional column (the value of CNT_ADVTG_LEN_RSDNC_CD_1). I’m investigating this mismatch, but if anyone can confirm it and provide additional info, that would be helpful.

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: hh_agg
            Statistics: Num rows: 33208 Data size: 10361206 Basic stats: COMPLETE Column stats: NONE
            GatherStats: false
            Filter Operator
              isSamplingPred: false
              predicate: (analytic_row_number2 = 1) (type: boolean)
              Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: st_fips_cd (type: string), zip_cd_5 (type: string), st_fips_cd (type: string), zip_cd_5 (type: string)
                sort order:
                Map-reduce partition columns: st_fips_cd (type: string), zip_cd_5 (type: string)
                Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column stats: NONE
                tag: -1
                value expressions: st_fips_cd (type: string), zip_cd_5 (type: string), hh_surr_key (type: bigint), advtg_len_rsdnc_cd (type: string)
      Path -> Alias:
        file:/Users/axu/Documents/localDB/23982_debug [t:hh_agg]
      Path -> Partition:
        file:/Users/axu/Documents/localDB/23982_debug
          Partition
            base file name: 23982_debug
            input format: org.apache.hadoop.mapred.TextInputFormat
            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
            properties:
              COLUMN_STATS_ACCURATE true
              EXTERNAL TRUE
              bucket_count -1
              columns
st_fips_cd,zip_cd_5,hh_surr_key,nbr_hh_in_zip,nbr_nr_adults_in_hh,hh_pop,advtg_len_rsdnc_cd,advtg_home_ownr_cd,dsf_season_cd,advtg_hh_edu_cd,advtg_hh_occupn_cd,advtg_child_presnc_cd,advtg_hh_age_cd,zip_avg_age,zip_mdn_age,mail_rspns_buy_cd,cnt_gend_cd_1,cnt_gend_cd_2,cnt_gend_cd_3,cnt_gend_cd_unk,cnt_advtg_marital_stat_cd_1,cnt_advtg_marital_stat_cd_2,cnt_advtg_marital_stat_cd_unk,cnt_nbr_tradeline_0,cnt_nbr_tradeline_1,cnt_nbr_tradeline_2,cnt_nbr_tradeline_3,cnt_nbr_tradeline_4,cnt_nbr_tradeline_5,cnt_nbr_tradeline_6,cnt_nbr_tradeline_7,cnt_nbr_tradeline_8,cnt_nbr_tradeline_9,cnt_nbr_tradeline_unk,advtg_dwell_type_cd,prprty_mkt_val_cd,zip_avg_prprty_mkt_val,zip_mdn_prprty_mkt_val,zip_avg_home_eqty_amt,zip_mdn_home_eqty_amt,trgt_inc_cd,zip_avg_trgt_inc_narrow_band,zip_mdn_trgt_inc_narrow_band,zip_avg_inc_prodc_asset_cd,zip_mdn_inc_prodc_asset_cd,zip_avg_net_wrth_cd,zip_mdn_net_wrth_cd,rylty_trgt_mktg_val_scr_cd,analytic_row_number2 columns.comments columns.types string:string:bigint:bigint:bigint:tinyint:string:string:string:string:string:string:string:double:double:int:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:string:string:double:double:double:double:string:double:double:double:double:double:double:string:int field.delim , file.inputformat org.apache.hadoop.mapred.TextInputFormat file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat location file:/Users/axu/Documents/localDB/23982_debug name default.hh_agg numFiles 0 numRows 0 rawDataSize 0 serialization.ddl struct hh_agg { string st_fips_cd, string zip_cd_5, i64 hh_surr_key, i64 nbr_hh_in_zip, i64 nbr_nr_adults_in_hh, byte hh_pop, string advtg_len_rsdnc_cd, string advtg_home_ownr_cd, string dsf_season_cd, string advtg_hh_edu_cd, string advtg_hh_occupn_cd, string advtg_child_presnc_cd, string advtg_hh_age_cd, double zip_avg_age, double zip_mdn_age, i32 mail_rspns_buy_cd, i64 cnt_gend_cd_1, i64 cnt_gend_cd_2, 
i64 cnt_gend_cd_3, i64 cnt_gend_cd_unk, i64 cnt_advt