[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions

2015-02-10 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314227#comment-14314227
 ] 

Aihua Xu commented on HIVE-9228:


Thanks for your contribution, Navis.

> Problem with subquery using windowing functions
> ---
>
> Key: HIVE-9228
> URL: https://issues.apache.org/jira/browse/HIVE-9228
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.14.0, 0.13.1, 1.0.0
>Reporter: Aihua Xu
>Assignee: Navis
> Fix For: 1.2.0
>
> Attachments: HIVE-9228.1.patch.txt, HIVE-9228.2.patch.txt, 
> HIVE-9228.3.patch.txt, create_table_tab1.sql, tab1.csv
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> The following query with window functions fails, although the inner query
> works fine on its own:
> select col1, col2, col3 from (select col1, col2, col3, count(case when col4=1
> then 1 end) over (partition by col1, col2) as col5, row_number() over
> (partition by col1, col2 order by col4) as col6 from tab1) t;
> Hive generates an execution plan with 2 jobs:
> 1. The first job calculates the window function for col5.
> 2. The second job calculates the window function for col6 and writes the output.
> According to the plan, the first job writes the columns (col1, col2, col3, col4)
> to a temp file, since only these columns are used in the later stage. However,
> the PTF operator in the first job actually outputs (_wcol0, col1, col2, col3,
> col4), where _wcol0 is the result of the window function, even though it is not
> used. In the second job, the map operator still reads the 4 columns (col1, col2,
> col3, col4) from the temp file as the plan specifies. That causes the exception.
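
The mismatch described above can be illustrated with a toy model. This is only a sketch, not Hive code: the `write_row`, `read_row`, and `read_row_unchecked` helpers and the sample values are hypothetical, and rows are modeled as plain positional lists. It shows how a reader that assumes the planned 4-column layout mis-binds values when the writer actually emits 5.

```python
# Toy model of the plan/operator mismatch (hypothetical helpers, not Hive APIs).

PLANNED_COLUMNS = ["col1", "col2", "col3", "col4"]            # what the plan declares
ACTUAL_COLUMNS = ["_wcol0", "col1", "col2", "col3", "col4"]   # what the PTF operator emits


def write_row(values_by_name, columns):
    """Serialize a row positionally, in the column order the writer uses."""
    return [values_by_name[name] for name in columns]


def read_row(raw, columns):
    """Deserialize a row positionally, rejecting rows of unexpected width."""
    if len(raw) != len(columns):
        raise ValueError(
            f"row has {len(raw)} fields but the plan declares {len(columns)}"
        )
    return dict(zip(columns, raw))


def read_row_unchecked(raw, columns):
    """Same as read_row but without the width check: values silently shift."""
    return dict(zip(columns, raw))


# Assumed sample row; _wcol0 stands for the unused window function result.
row = {"_wcol0": 2, "col1": "a", "col2": "b", "col3": "c", "col4": 1}
raw = write_row(row, ACTUAL_COLUMNS)

# The reader trusts the plan, so every value lands one column to the right.
shifted = read_row_unchecked(raw, PLANNED_COLUMNS)
print(shifted)
```

In this model the window function result ends up bound to `col1` and every later column is displaced by one. A width check such as the one in `read_row` turns the silent shift into an explicit error, but the underlying fix is to make the declared and emitted schemas agree.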



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions

2015-02-09 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312928#comment-14312928
 ] 

Ashutosh Chauhan commented on HIVE-9228:


+1

> Problem with subquery using windowing functions
> ---
>
> Key: HIVE-9228
> URL: https://issues.apache.org/jira/browse/HIVE-9228
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.13.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-9228.1.patch.txt, HIVE-9228.2.patch.txt, 
> HIVE-9228.3.patch.txt, create_table_tab1.sql, tab1.csv
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>


[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions

2015-02-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312790#comment-14312790
 ] 

Hive QA commented on HIVE-9228:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12697415/HIVE-9228.3.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7531 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_timestamp_funcs
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2724/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2724/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2724/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12697415 - PreCommit-HIVE-TRUNK-Build

> Problem with subquery using windowing functions
> ---
>
> Key: HIVE-9228
> URL: https://issues.apache.org/jira/browse/HIVE-9228
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.13.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-9228.1.patch.txt, HIVE-9228.2.patch.txt, 
> HIVE-9228.3.patch.txt, create_table_tab1.sql, tab1.csv
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>


[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions

2015-02-09 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312646#comment-14312646
 ] 

Aihua Xu commented on HIVE-9228:


[~navis] Thanks for the update. I'm new to Hive and was not very comfortable 
with my own change. Thanks for stepping in. :)

> Problem with subquery using windowing functions
> ---
>
> Key: HIVE-9228
> URL: https://issues.apache.org/jira/browse/HIVE-9228
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.13.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-9228.1.patch.txt, HIVE-9228.2.patch.txt, 
> HIVE-9228.3.patch.txt, create_table_tab1.sql, tab1.csv
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>


[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions

2015-02-08 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311850#comment-14311850
 ] 

Navis commented on HIVE-9228:
-

[~aihuaxu] Sorry for breaking in on this issue. I've been working on the code 
around CP for other issues and did not want others to waste time understanding 
the complicated PTF operation. I think the fix is almost done. Sorry again.

> Problem with subquery using windowing functions
> ---
>
> Key: HIVE-9228
> URL: https://issues.apache.org/jira/browse/HIVE-9228
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.13.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-9228.1.patch.txt, HIVE-9228.2.patch.txt, 
> create_table_tab1.sql, tab1.csv
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>


[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions

2015-01-28 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295254#comment-14295254
 ] 

Aihua Xu commented on HIVE-9228:


[~navis] Since you are working on this issue and know windowing functions much 
better than I do, do you want me to assign the issue to you?

> Problem with subquery using windowing functions
> ---
>
> Key: HIVE-9228
> URL: https://issues.apache.org/jira/browse/HIVE-9228
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.13.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-9228.1.patch.txt, HIVE-9228.2.patch.txt, 
> create_table_tab1.sql, tab1.csv
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>


[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions

2015-01-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294710#comment-14294710
 ] 

Hive QA commented on HIVE-9228:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12694887/HIVE-9228.2.patch.txt

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 7401 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_ptf
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ptf
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2541/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2541/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2541/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12694887 - PreCommit-HIVE-TRUNK-Build

> Problem with subquery using windowing functions
> ---
>
> Key: HIVE-9228
> URL: https://issues.apache.org/jira/browse/HIVE-9228
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.13.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-9228.1.patch.txt, HIVE-9228.2.patch.txt, 
> create_table_tab1.sql, tab1.csv
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>


[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions

2015-01-27 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294595#comment-14294595
 ] 

Navis commented on HIVE-9228:
-

Yes, when the PTF column is not selected, we should prune the function itself in 
the PTF operator. But I thought it would be a trivial case for a query not to 
select a column that was expensive to calculate. And the select operator would 
be removed by IdentityProjectRemover if it's not needed. 
By the way, could you review HIVE-9138 first? It's hard to debug anything in 
PTF without any explain result.
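
The pruning idea in the comment above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual patch: `WindowFunction`, `prune_window_functions`, `ptf_output_columns`, and `is_identity_projection` are hypothetical names, and operators are reduced to lists of column names. The first helper drops window functions whose output column nothing downstream needs; the last one shows the kind of check that makes a select operator removable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowFunction:
    """Hypothetical stand-in for one windowing expression in a PTF operator."""
    output_column: str   # e.g. "_wcol0"
    expression: str      # kept only for display


def prune_window_functions(functions, needed_columns):
    """Keep only the window functions whose output is used downstream."""
    return [f for f in functions if f.output_column in needed_columns]


def ptf_output_columns(input_columns, functions):
    """Output schema: window function results first, then pass-through columns."""
    return [f.output_column for f in functions] + list(input_columns)


def is_identity_projection(select_columns, input_columns):
    """A select that forwards its input unchanged does no work and can be dropped."""
    return list(select_columns) == list(input_columns)


inputs = ["col1", "col2", "col3", "col4"]
functions = [WindowFunction("_wcol0", "count(case when col4=1 then 1 end)")]
needed = {"col1", "col2", "col3", "col4"}   # _wcol0 is never selected

kept = prune_window_functions(functions, needed)
pruned_schema = ptf_output_columns(inputs, kept)
print(pruned_schema)
```

Once the unused function is pruned, the operator's output schema in this model matches the four columns the plan declares, so a select placed on top of it would be an identity projection and could be removed.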

> Problem with subquery using windowing functions
> ---
>
> Key: HIVE-9228
> URL: https://issues.apache.org/jira/browse/HIVE-9228
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.13.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-9228.1.patch.txt, HIVE-9228.2.patch.txt, 
> create_table_tab1.sql, tab1.csv
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>


[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions

2015-01-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294585#comment-14294585
 ] 

Ashutosh Chauhan commented on HIVE-9228:


I wonder whether it is possible to fix this without adding a redundant SelectOp. 
SelectOp adds latency, so it would be good to avoid it. Adding operators just 
so that we can plan properly and not falter later on in optimization doesn't 
seem like a good design choice.

> Problem with subquery using windowing functions
> ---
>
> Key: HIVE-9228
> URL: https://issues.apache.org/jira/browse/HIVE-9228
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.13.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-9228.1.patch.txt, HIVE-9228.2.patch.txt, 
> create_table_tab1.sql, tab1.csv
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>


[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions

2015-01-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289997#comment-14289997
 ] 

Hive QA commented on HIVE-9228:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12694092/HIVE-9228.1.patch.txt

{color:red}ERROR:{color} -1 due to 68 failed/errored test(s), 7347 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_excludeHadoop20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_multi
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_simple_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_column_access_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataOnlyOptimizer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataonly1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_gby2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_vc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rcfile_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_25
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_views
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_table_access_keys_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union24
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union28
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union30
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_null
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_6_subq
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_simple_select
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_filter_join_breaktask
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_filter_join_breaktask2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mrr
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_dynam

[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions

2015-01-23 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289285#comment-14289285
 ] 

Aihua Xu commented on HIVE-9228:


Thanks, [~navis]. Could you also do a code review of this patch?

> Problem with subquery using windowing functions
> ---
>
> Key: HIVE-9228
> URL: https://issues.apache.org/jira/browse/HIVE-9228
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.13.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-9228.1.patch.txt, create_table_tab1.sql, tab1.csv
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>


[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions

2015-01-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279016#comment-14279016
 ] 

Aihua Xu commented on HIVE-9228:


[~rhbutani] Do you have comments on this issue? 


> Problem with subquery using windowing functions
> ---
>
> Key: HIVE-9228
> URL: https://issues.apache.org/jira/browse/HIVE-9228
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.13.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: create_table_tab1.sql, tab1.csv
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>


[jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions

2015-01-05 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264639#comment-14264639
 ] 

Aihua Xu commented on HIVE-9228:


[~ashutoshc] Are you working in this area? Any ideas?

The following window query throws an ArrayIndexOutOfBoundsException.

select st_fips_cd, zip_cd_5, hh_surr_key
from
(
  select st_fips_cd, zip_cd_5, hh_surr_key,
    count(case when advtg_len_rsdnc_cd = '1' then 1 end)
      over (partition by st_fips_cd, zip_cd_5) as CNT_ADVTG_LEN_RSDNC_CD_1,
    row_number()
      over (partition by st_fips_cd, zip_cd_5 order by hh_surr_key asc) as analytic_row_number3
  from hh_agg
  where analytic_row_number2 = 1
) t;


Here is the EXPLAIN EXTENDED output for the query. At the File Sink Operator of 
Stage-1 below, it seems that only 4 columns should be written, while the temp 
table in fact contains one additional column (the value of 
CNT_ADVTG_LEN_RSDNC_CD_1). I'm investigating this mismatch; if anyone can 
confirm it or provide additional information, that would be helpful.

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: hh_agg
Statistics: Num rows: 33208 Data size: 10361206 Basic stats: 
COMPLETE Column stats: NONE
GatherStats: false
Filter Operator
  isSamplingPred: false
  predicate: (analytic_row_number2 = 1) (type: boolean)
  Statistics: Num rows: 16604 Data size: 5180603 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: st_fips_cd (type: string), zip_cd_5 (type: 
string), st_fips_cd (type: string), zip_cd_5 (type: string)
sort order: 
Map-reduce partition columns: st_fips_cd (type: string), 
zip_cd_5 (type: string)
Statistics: Num rows: 16604 Data size: 5180603 Basic stats: 
COMPLETE Column stats: NONE
tag: -1
value expressions: st_fips_cd (type: string), zip_cd_5 (type: 
string), hh_surr_key (type: bigint), advtg_len_rsdnc_cd (type: string)
  Path -> Alias:
file:/Users/axu/Documents/localDB/23982_debug [t:hh_agg]
  Path -> Partition:
file:/Users/axu/Documents/localDB/23982_debug 
  Partition
base file name: 23982_debug
input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
  COLUMN_STATS_ACCURATE true
  EXTERNAL TRUE
  bucket_count -1
  columns 
st_fips_cd,zip_cd_5,hh_surr_key,nbr_hh_in_zip,nbr_nr_adults_in_hh,hh_pop,advtg_len_rsdnc_cd,advtg_home_ownr_cd,dsf_season_cd,advtg_hh_edu_cd,advtg_hh_occupn_cd,advtg_child_presnc_cd,advtg_hh_age_cd,zip_avg_age,zip_mdn_age,mail_rspns_buy_cd,cnt_gend_cd_1,cnt_gend_cd_2,cnt_gend_cd_3,cnt_gend_cd_unk,cnt_advtg_marital_stat_cd_1,cnt_advtg_marital_stat_cd_2,cnt_advtg_marital_stat_cd_unk,cnt_nbr_tradeline_0,cnt_nbr_tradeline_1,cnt_nbr_tradeline_2,cnt_nbr_tradeline_3,cnt_nbr_tradeline_4,cnt_nbr_tradeline_5,cnt_nbr_tradeline_6,cnt_nbr_tradeline_7,cnt_nbr_tradeline_8,cnt_nbr_tradeline_9,cnt_nbr_tradeline_unk,advtg_dwell_type_cd,prprty_mkt_val_cd,zip_avg_prprty_mkt_val,zip_mdn_prprty_mkt_val,zip_avg_home_eqty_amt,zip_mdn_home_eqty_amt,trgt_inc_cd,zip_avg_trgt_inc_narrow_band,zip_mdn_trgt_inc_narrow_band,zip_avg_inc_prodc_asset_cd,zip_mdn_inc_prodc_asset_cd,zip_avg_net_wrth_cd,zip_mdn_net_wrth_cd,rylty_trgt_mktg_val_scr_cd,analytic_row_number2
  columns.comments 
  columns.types 
string:string:bigint:bigint:bigint:tinyint:string:string:string:string:string:string:string:double:double:int:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:string:string:double:double:double:double:string:double:double:double:double:double:double:string:int
  field.delim ,
  file.inputformat org.apache.hadoop.mapred.TextInputFormat
  file.outputformat 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  location file:/Users/axu/Documents/localDB/23982_debug
  name default.hh_agg
  numFiles 0
  numRows 0
  rawDataSize 0
  serialization.ddl struct hh_agg { string st_fips_cd, string 
zip_cd_5, i64 hh_surr_key, i64 nbr_hh_in_zip, i64 nbr_nr_adults_in_hh, byte 
hh_pop, string advtg_len_rsdnc_cd, string advtg_home_ownr_cd, string 
dsf_season_cd, string advtg_hh_edu_cd, string advtg_hh_occupn_cd, string 
advtg_child_presnc_cd, string advtg_hh_age_cd, double zip_avg_age, double 
zip_mdn_age, i32 mail_rspns_buy_cd, i64 cnt_gend_cd_1, i64 cnt_gend_cd_2, i64 
cnt_gend_cd_3, i64 cnt_gend_cd_unk, i64 cnt_advt