[jira] [Commented] (HIVE-13453) Support ORDER BY and windowing clause in partitioning clause with distinct function

2016-12-22 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15772203#comment-15772203
 ] 

Lefty Leverenz commented on HIVE-13453:
---

Thanks for the wiki documentation, [~aihuaxu].

* [Windowing and Analytics -- Enhancements to Hive QL -- 4. Distinct support in 
Hive 2.1.0 and later | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics#LanguageManualWindowingAndAnalytics-EnhancementstoHiveQL]

> Support ORDER BY and windowing clause in partitioning clause with distinct 
> function
> ---
>
> Key: HIVE-13453
> URL: https://issues.apache.org/jira/browse/HIVE-13453
> Project: Hive
>  Issue Type: Sub-task
>  Components: PTF-Windowing
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Fix For: 2.2.0
>
> Attachments: HIVE-13453.1.patch, HIVE-13453.2.patch, 
> HIVE-13453.3.patch, HIVE-13453.4.patch
>
>
> Currently the distinct function on partitioning doesn't support the ORDER BY 
> and windowing clauses, for performance reasons. Explore an efficient way to 
> support them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15360) Nested column pruning: add pruned column paths to explain output

2016-12-22 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15772181#comment-15772181
 ] 

Ferdinand Xu commented on HIVE-15360:
-

+1

> Nested column pruning: add pruned column paths to explain output
> 
>
> Key: HIVE-15360
> URL: https://issues.apache.org/jira/browse/HIVE-15360
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-15360.1.patch, HIVE-15360.2.patch, 
> HIVE-15360.3.patch
>
>
> We should add the pruned nested column paths to the explain output for easier 
> tracing and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15499) Nested column pruning: don't prune paths when a SerDe is used only for serializing

2016-12-22 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15772178#comment-15772178
 ] 

Ferdinand Xu commented on HIVE-15499:
-

+1

> Nested column pruning: don't prune paths when a SerDe is used only for 
> serializing
> --
>
> Key: HIVE-15499
> URL: https://issues.apache.org/jira/browse/HIVE-15499
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15499.1.patch, HIVE-15499.2.patch
>
>
> In {{FileSinkOperator}}, a serializer is created to write output data. When 
> initializing it we should not read the 
> {{ColumnProjectionUtils.READ_NESTED_COLUMN_PATH_CONF_STR}} property since 
> this is only used for the read path, and the path may not match the schema 
> for the output table (for instance, in the case of insert).
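
A minimal sketch of that idea (hypothetical helper; the constant and the SerDe 
API are the ones named in the description, everything else is assumed, and this 
is not the actual HIVE-15499 patch):

{code}
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.AbstractSerDe;
import org.apache.hadoop.hive.serde2.ColumnProjectionUtils;
import org.apache.hadoop.hive.serde2.SerDeException;

// Hypothetical helper: initialize a SerDe for the write path with the
// nested-column read projection removed, so pruning configured for readers
// cannot distort the schema the serializer sees for the output table.
final class WritePathSerDeInit {
  static void initForWrite(AbstractSerDe serde, Configuration conf, Properties tblProps)
      throws SerDeException {
    Configuration writeConf = new Configuration(conf); // copy; readers keep their pruning
    writeConf.unset(ColumnProjectionUtils.READ_NESTED_COLUMN_PATH_CONF_STR);
    serde.initialize(writeConf, tblProps);             // serializer sees the full schema
  }
}
{code}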



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15360) Nested column pruning: add pruned column paths to explain output

2016-12-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15772145#comment-15772145
 ] 

Hive QA commented on HIVE-15360:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844518/HIVE-15360.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10895 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=134)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2711/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2711/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2711/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844518 - PreCommit-HIVE-Build

> Nested column pruning: add pruned column paths to explain output
> 
>
> Key: HIVE-15360
> URL: https://issues.apache.org/jira/browse/HIVE-15360
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-15360.1.patch, HIVE-15360.2.patch, 
> HIVE-15360.3.patch
>
>
> We should add the pruned nested column paths to the explain output for easier 
> tracing and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15504) ArrayIndexOutOfBoundsException in GenericUDFTrunc::initialize

2016-12-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15772049#comment-15772049
 ] 

Hive QA commented on HIVE-15504:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844516/HIVE-15504.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10881 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=115)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2710/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2710/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2710/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844516 - PreCommit-HIVE-Build

> ArrayIndexOutOfBoundsException in GenericUDFTrunc::initialize
> -
>
> Key: HIVE-15504
> URL: https://issues.apache.org/jira/browse/HIVE-15504
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-15504.1.patch
>
>
> SELECT TRUNC(d_date) FROM test_date_dim throws an 
> ArrayIndexOutOfBoundsException.
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFTrunc.initialize(GenericUDFTrunc.java:128)
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:139)
> at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:236)
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1102)
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1357)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at 
> org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:227)
> {noformat}
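
The trace shows {{initialize}} reaching {{arguments[1]}} unconditionally, so a 
single-argument call like {{TRUNC(d_date)}} overruns the array. A minimal, 
hypothetical sketch of the kind of arity guard a fix would add (class name and 
message are invented; this is not the actual HIVE-15504 patch):

{code}
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;

// Hypothetical skeleton: validate the argument count up front so a bad call
// fails with a readable error instead of an ArrayIndexOutOfBoundsException.
public class TruncArityCheckSketch extends GenericUDF {
  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 2) {       // the TRUNC(date, fmt) form needs two arguments
      throw new UDFArgumentLengthException(
          "TRUNC requires 2 arguments, got " + arguments.length);
    }
    // ... real type checks on arguments[0] and arguments[1] would follow ...
    return arguments[0];               // placeholder return for the sketch
  }

  @Override
  public Object evaluate(DeferredObject[] args) throws HiveException {
    return null;                       // not relevant to the arity check
  }

  @Override
  public String getDisplayString(String[] children) {
    return "trunc(" + String.join(", ", children) + ")";
  }
}
{code}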



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15499) Nested column pruning: don't prune paths when a SerDe is used only for serializing

2016-12-22 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15499:

Attachment: HIVE-15499.2.patch

> Nested column pruning: don't prune paths when a SerDe is used only for 
> serializing
> --
>
> Key: HIVE-15499
> URL: https://issues.apache.org/jira/browse/HIVE-15499
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15499.1.patch, HIVE-15499.2.patch
>
>
> In {{FileSinkOperator}}, a serializer is created to write output data. When 
> initializing it we should not read the 
> {{ColumnProjectionUtils.READ_NESTED_COLUMN_PATH_CONF_STR}} property since 
> this is only used for the read path, and the path may not match the schema 
> for the output table (for instance, in the case of insert).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15503) LLAP: Fix use of Runtime.getRuntime.maxMemory in Hive operators

2016-12-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771971#comment-15771971
 ] 

Hive QA commented on HIVE-15503:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844513/HIVE-15503.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10881 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=95)

[join_cond_pushdown_unqual4.q,union_remove_7.q,join13.q,join_vc.q,groupby_cube1.q,bucket_map_join_spark2.q,sample3.q,smb_mapjoin_19.q,stats16.q,union23.q,union.q,union31.q,cbo_udf_udaf.q,ptf_decimal.q,bucketmapjoin2.q]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters]
 (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=134)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_1] 
(batchId=92)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] 
(batchId=92)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=92)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2709/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2709/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2709/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844513 - PreCommit-HIVE-Build

> LLAP: Fix use of Runtime.getRuntime.maxMemory in Hive operators
> ---
>
> Key: HIVE-15503
> URL: https://issues.apache.org/jira/browse/HIVE-15503
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-15503.1.patch, HIVE-15503.WIP.patch
>
>
> {code}
> ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java:
> maxHashTblMemory = (long) (memoryPercentage * 
> Runtime.getRuntime().maxMemory());
> ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:// Total Free 
> Memory = maxMemory() - Used Memory;
> ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:long 
> totalFreeMemory = Runtime.getRuntime().maxMemory() -
> {code}
> This will not work very well with LLAP because of the memory sharing by 
> executors. 
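
One hypothetical direction (all names assumed; not the actual patch): route the 
memory ceiling through an injected supplier so LLAP can hand each executor its 
slice of the heap instead of the whole {{maxMemory()}}:

{code}
import java.util.function.LongSupplier;

// Hypothetical abstraction: operators compute their budgets from an injected
// ceiling instead of calling Runtime.getRuntime().maxMemory() directly.
final class OperatorMemoryBound {
  private final LongSupplier maxMemory;

  private OperatorMemoryBound(LongSupplier maxMemory) {
    this.maxMemory = maxMemory;
  }

  // Container mode: the whole JVM heap belongs to a single task.
  static OperatorMemoryBound forContainer() {
    return new OperatorMemoryBound(() -> Runtime.getRuntime().maxMemory());
  }

  // LLAP mode: executors share one JVM, so each gets only its configured slice.
  static OperatorMemoryBound forLlap(long executorMemoryBytes) {
    return new OperatorMemoryBound(() -> executorMemoryBytes);
  }

  // Mirrors the GroupByOperator computation quoted above, against the bound.
  long maxHashTableMemory(float memoryPercentage) {
    return (long) (memoryPercentage * maxMemory.getAsLong());
  }
}
{code}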



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15360) Nested column pruning: add pruned column paths to explain output

2016-12-22 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15360:

Attachment: HIVE-15360.3.patch

Re-attaching patch v3 to trigger the tests again.

> Nested column pruning: add pruned column paths to explain output
> 
>
> Key: HIVE-15360
> URL: https://issues.apache.org/jira/browse/HIVE-15360
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-15360.1.patch, HIVE-15360.2.patch, 
> HIVE-15360.3.patch
>
>
> We should add the pruned nested column paths to the explain output for easier 
> tracing and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15360) Nested column pruning: add pruned column paths to explain output

2016-12-22 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15360:

Attachment: (was: HIVE-15360.3.patch)

> Nested column pruning: add pruned column paths to explain output
> 
>
> Key: HIVE-15360
> URL: https://issues.apache.org/jira/browse/HIVE-15360
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-15360.1.patch, HIVE-15360.2.patch
>
>
> We should add the pruned nested column paths to the explain output for easier 
> tracing and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15112) Implement Parquet vectorization reader for Struct type

2016-12-22 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771934#comment-15771934
 ] 

Chao Sun commented on HIVE-15112:
-

+1 on the latest patch.

> Implement Parquet vectorization reader for Struct type
> --
>
> Key: HIVE-15112
> URL: https://issues.apache.org/jira/browse/HIVE-15112
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Fix For: 2.2.0
>
> Attachments: HIVE-15112.1.patch, HIVE-15112.2.patch, 
> HIVE-15112.3.patch, HIVE-15112.4.patch, HIVE-15112.patch
>
>
> Like HIVE-14815, we need to support the Parquet vectorized reader for the struct type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15504) ArrayIndexOutOfBoundsException in GenericUDFTrunc::initialize

2016-12-22 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15504:

Status: Patch Available  (was: Open)

> ArrayIndexOutOfBoundsException in GenericUDFTrunc::initialize
> -
>
> Key: HIVE-15504
> URL: https://issues.apache.org/jira/browse/HIVE-15504
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-15504.1.patch
>
>
> SELECT TRUNC(d_date) FROM test_date_dim throws an 
> ArrayIndexOutOfBoundsException.
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFTrunc.initialize(GenericUDFTrunc.java:128)
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:139)
> at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:236)
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1102)
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1357)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at 
> org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:227)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15504) ArrayIndexOutOfBoundsException in GenericUDFTrunc::initialize

2016-12-22 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15504:

Attachment: HIVE-15504.1.patch

> ArrayIndexOutOfBoundsException in GenericUDFTrunc::initialize
> -
>
> Key: HIVE-15504
> URL: https://issues.apache.org/jira/browse/HIVE-15504
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15504.1.patch
>
>
> SELECT TRUNC(d_date) FROM test_date_dim throws an 
> ArrayIndexOutOfBoundsException.
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFTrunc.initialize(GenericUDFTrunc.java:128)
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:139)
> at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:236)
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1102)
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1357)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at 
> org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:227)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15504) ArrayIndexOutOfBoundsException in GenericUDFTrunc::initialize

2016-12-22 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15504:

Priority: Trivial  (was: Minor)

> ArrayIndexOutOfBoundsException in GenericUDFTrunc::initialize
> -
>
> Key: HIVE-15504
> URL: https://issues.apache.org/jira/browse/HIVE-15504
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Rajesh Balamohan
>Priority: Trivial
> Attachments: HIVE-15504.1.patch
>
>
> SELECT TRUNC(d_date) FROM test_date_dim throws an 
> ArrayIndexOutOfBoundsException.
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFTrunc.initialize(GenericUDFTrunc.java:128)
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:139)
> at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:236)
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1102)
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1357)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at 
> org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:227)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15476) ObjectStore.getMTableColumnStatistics() should check if colNames is empty

2016-12-22 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771845#comment-15771845
 ] 

Naveen Gangam commented on HIVE-15476:
--

Thanks for the comments. The proposed fix is consistent with the current 
behavior, so +1 from me.

> ObjectStore.getMTableColumnStatistics() should check if colNames is empty
> -
>
> Key: HIVE-15476
> URL: https://issues.apache.org/jira/browse/HIVE-15476
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-15476.1.patch, HIVE-15476.2.patch
>
>
> See the following exception in the log. I can't tell which exact query 
> causes it, though.
> {noformat}
> [pool-4-thread-31]: Exception thrown
> Method/Identifier expected at character 37 in "tableName == t1 && dbName == 
> t2 && ()"
> org.datanucleus.store.query.QueryCompilerSyntaxException: Method/Identifier 
> expected at character 37 in "tableName == t1 && dbName == t2 && ()"
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processPrimary(JDOQLParser.java:810)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processUnaryExpression(JDOQLParser.java:656)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processMultiplicativeExpression(JDOQLParser.java:582)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAdditiveExpression(JDOQLParser.java:553)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processRelationalExpression(JDOQLParser.java:467)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAndExpression(JDOQLParser.java:450)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExclusiveOrExpression(JDOQLParser.java:436)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processInclusiveOrExpression(JDOQLParser.java:422)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalAndExpression(JDOQLParser.java:408)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalOrExpression(JDOQLParser.java:389)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExpression(JDOQLParser.java:378)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processPrimary(JDOQLParser.java:785)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processUnaryExpression(JDOQLParser.java:656)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processMultiplicativeExpression(JDOQLParser.java:582)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAdditiveExpression(JDOQLParser.java:553)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processRelationalExpression(JDOQLParser.java:467)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAndExpression(JDOQLParser.java:450)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExclusiveOrExpression(JDOQLParser.java:436)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processInclusiveOrExpression(JDOQLParser.java:422)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalAndExpression(JDOQLParser.java:412)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalOrExpression(JDOQLParser.java:389)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExpression(JDOQLParser.java:378)
>   at org.datanucleus.query.compiler.JDOQLParser.parse(JDOQLParser.java:99)
>   at 
> org.datanucleus.query.compiler.JavaQueryCompiler.compileFilter(JavaQueryCompiler.java:467)
>   at 
> org.datanucleus.query.compiler.JDOQLCompiler.compile(JDOQLCompiler.java:113)
>   at 
> org.datanucleus.store.query.AbstractJDOQLQuery.compileInternal(AbstractJDOQLQuery.java:367)
>   at 
> org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:240)
>   at org.datanucleus.store.query.Query.executeQuery(Query.java:1744)
>   at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672)
>   at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:312)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:6505)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.access$1200(ObjectStore.java:171)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore$9.getJdoResult(ObjectStore.java:6566)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore$9.getJdoResult(ObjectStore.java:6555)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2629)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatisticsInternal(ObjectStore.java:6554)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatistics(ObjectStore.java:6548)
> 
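
The empty "()" group in the JDOQL filter above is what an empty {{colNames}} 
list produces. A minimal, hypothetical sketch of the guard the title suggests 
(names assumed; not the actual patch):

{code}
import java.util.Collections;
import java.util.List;

// Hypothetical guard: with no columns requested there is nothing to fetch,
// so return early instead of building a JDOQL filter that ends in "&& ()".
final class ColumnStatsGuardSketch {
  static List<String> getTableColumnStatistics(
      String dbName, String tableName, List<String> colNames) {
    if (colNames == null || colNames.isEmpty()) {
      return Collections.emptyList();  // avoids the parser error shown above
    }
    // ... build and run the JDOQL query for the non-empty column list ...
    return colNames;                   // placeholder result for the sketch
  }
}
{code}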

[jira] [Commented] (HIVE-15112) Implement Parquet vectorization reader for Struct type

2016-12-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771840#comment-15771840
 ] 

Hive QA commented on HIVE-15112:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844501/HIVE-15112.4.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10877 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_char_mapjoin1.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2708/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2708/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2708/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844501 - PreCommit-HIVE-Build

> Implement Parquet vectorization reader for Struct type
> --
>
> Key: HIVE-15112
> URL: https://issues.apache.org/jira/browse/HIVE-15112
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Fix For: 2.2.0
>
> Attachments: HIVE-15112.1.patch, HIVE-15112.2.patch, 
> HIVE-15112.3.patch, HIVE-15112.4.patch, HIVE-15112.patch
>
>
> Like HIVE-14815, we need to support the Parquet vectorized reader for the struct type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15503) LLAP: Fix use of Runtime.getRuntime.maxMemory in Hive operators

2016-12-22 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771818#comment-15771818
 ] 

Prasanth Jayachandran commented on HIVE-15503:
--

[~hagleitn] The patch is ready for review now. I am still working on the tests: 
the current test flushes after every row because the reported memory usage is 
high compared to the max value. I will update the patch after identifying the 
root cause. 

> LLAP: Fix use of Runtime.getRuntime.maxMemory in Hive operators
> ---
>
> Key: HIVE-15503
> URL: https://issues.apache.org/jira/browse/HIVE-15503
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-15503.1.patch, HIVE-15503.WIP.patch
>
>
> {code}
> ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java:
> maxHashTblMemory = (long) (memoryPercentage * 
> Runtime.getRuntime().maxMemory());
> ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:// Total Free 
> Memory = maxMemory() - Used Memory;
> ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:long 
> totalFreeMemory = Runtime.getRuntime().maxMemory() -
> {code}
> This will not work very well with LLAP because of the memory sharing by 
> executors. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15503) LLAP: Fix use of Runtime.getRuntime.maxMemory in Hive operators

2016-12-22 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15503:
-
Attachment: HIVE-15503.1.patch

> LLAP: Fix use of Runtime.getRuntime.maxMemory in Hive operators
> ---
>
> Key: HIVE-15503
> URL: https://issues.apache.org/jira/browse/HIVE-15503
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-15503.1.patch, HIVE-15503.WIP.patch
>
>
> {code}
> ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java:
> maxHashTblMemory = (long) (memoryPercentage * 
> Runtime.getRuntime().maxMemory());
> ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:// Total Free 
> Memory = maxMemory() - Used Memory;
> ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:long 
> totalFreeMemory = Runtime.getRuntime().maxMemory() -
> {code}
> This will not work very well with LLAP because of the memory sharing by 
> executors. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15503) LLAP: Fix use of Runtime.getRuntime.maxMemory in Hive operators

2016-12-22 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15503:
-
Status: Patch Available  (was: Open)

> LLAP: Fix use of Runtime.getRuntime.maxMemory in Hive operators
> ---
>
> Key: HIVE-15503
> URL: https://issues.apache.org/jira/browse/HIVE-15503
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-15503.1.patch, HIVE-15503.WIP.patch
>
>
> {code}
> ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java:
> maxHashTblMemory = (long) (memoryPercentage * 
> Runtime.getRuntime().maxMemory());
> ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:// Total Free 
> Memory = maxMemory() - Used Memory;
> ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:long 
> totalFreeMemory = Runtime.getRuntime().maxMemory() -
> {code}
> This will not work very well with LLAP because of the memory sharing by 
> executors. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8373) OOM for a simple query with spark.master=local [Spark Branch]

2016-12-22 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771778#comment-15771778
 ] 

Rui Li commented on HIVE-8373:
--

Thanks [~asears] for the input. Just curious: in which cases should 
MaxMetaspaceSize be set? I think it may be useful for finding classloading 
leaks; otherwise, I guess users don't have to set an upper bound for the 
metaspace, right?

> OOM for a simple query with spark.master=local [Spark Branch]
> -
>
> Key: HIVE-8373
> URL: https://issues.apache.org/jira/browse/HIVE-8373
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: liyunzhang_intel
>
> I have a straightforward query to run in Spark local mode, but get an OOM 
> even though the data volume is tiny:
> {code}
> Exception in thread "Spark Context Cleaner" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Spark Context Cleaner"
> Exception in thread "Executor task launch worker-1" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Executor task launch worker-1"
> Exception in thread "Keep-Alive-Timer" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Keep-Alive-Timer"
> Exception in thread "Driver Heartbeater" 
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "Driver Heartbeater"
> {code}
> The query is:
> {code}
> select product_name, avg(item_price) as avg_price from product join item on 
> item.product_pk=product.product_pk group by product_name order by avg_price;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15357) Fix and re-enable the spark-only tests

2016-12-22 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-15357:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks Chao for the review!

> Fix and re-enable the spark-only tests
> --
>
> Key: HIVE-15357
> URL: https://issues.apache.org/jira/browse/HIVE-15357
> Project: Hive
>  Issue Type: Test
>Reporter: Rui Li
>Assignee: Rui Li
> Fix For: 2.2.0
>
> Attachments: HIVE-15357.1.patch
>
>
> Defined by {{spark.only.query.files}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14956) Parallelize TestHCatLoader

2016-12-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771725#comment-15771725
 ] 

Hive QA commented on HIVE-14956:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844476/HIVE-14956.04.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10895 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=233)
org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testMetastoreProxyUser 
(batchId=218)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2707/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2707/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2707/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844476 - PreCommit-HIVE-Build

> Parallelize TestHCatLoader
> --
>
> Key: HIVE-14956
> URL: https://issues.apache.org/jira/browse/HIVE-14956
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14956.01.patch, HIVE-14956.02.patch, 
> HIVE-14956.03.patch, HIVE-14956.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14956) Parallelize TestHCatLoader

2016-12-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771640#comment-15771640
 ] 

Hive QA commented on HIVE-14956:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844476/HIVE-14956.04.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10876 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=233)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2706/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2706/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2706/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844476 - PreCommit-HIVE-Build

> Parallelize TestHCatLoader
> --
>
> Key: HIVE-14956
> URL: https://issues.apache.org/jira/browse/HIVE-14956
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14956.01.patch, HIVE-14956.02.patch, 
> HIVE-14956.03.patch, HIVE-14956.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15112) Implement Parquet vectorization reader for Struct type

2016-12-22 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-15112:

Attachment: HIVE-15112.4.patch

> Implement Parquet vectorization reader for Struct type
> --
>
> Key: HIVE-15112
> URL: https://issues.apache.org/jira/browse/HIVE-15112
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Fix For: 2.2.0
>
> Attachments: HIVE-15112.1.patch, HIVE-15112.2.patch, 
> HIVE-15112.3.patch, HIVE-15112.4.patch, HIVE-15112.patch
>
>
> Like HIVE-14815, we need to support the Parquet vectorized reader for the struct type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15055) Column pruning for nested fields in Parquet

2016-12-22 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771609#comment-15771609
 ] 

Ferdinand Xu commented on HIVE-15055:
-

Thanks [~csun] for the design document and benchmark information. Could you add 
a little more detail about the query and table structure you used in the 
benchmark, for a better understanding?

> Column pruning for nested fields in Parquet
> ---
>
> Key: HIVE-15055
> URL: https://issues.apache.org/jira/browse/HIVE-15055
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer, Physical Optimizer, 
> Serializers/Deserializers
>Reporter: Chao Sun
>Assignee: Chao Sun
>  Labels: performance
> Attachments: benchmark-hos.pdf, design-doc-nested-column-pruning.pdf
>
>
> Some columnar file formats such as Parquet also store the fields of a struct 
> type column by column, using the encoding described in the Google Dremel 
> paper. It's very common in big data for data to be stored in structs while 
> queries need only a subset of the fields in those structs. However, Hive 
> presently still needs to read the whole struct regardless of whether all 
> fields are selected. Therefore, pruning unwanted sub-fields of structs or 
> nested fields at file reading time would be a big performance boost for such 
> scenarios.
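
As an illustration of what pruning at file-reading time means at the 
configuration level, a hypothetical snippet (the constant is the real one used 
by this feature; the "s.a" path syntax and the surrounding class are 
assumptions):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.ColumnProjectionUtils;

// Hypothetical illustration: request only s.a out of a struct s<a:int, b:string>
// so a columnar reader such as Parquet can skip the chunks for s.b entirely.
final class NestedPruningSketch {
  static Configuration confForQuery() {
    Configuration conf = new Configuration();
    conf.set(ColumnProjectionUtils.READ_NESTED_COLUMN_PATH_CONF_STR, "s.a");
    return conf;
  }
}
{code}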



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15503) LLAP: Fix use of Runtime.getRuntime.maxMemory in Hive operators

2016-12-22 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-15503:
-
Attachment: HIVE-15503.WIP.patch

[~hagleitn] This is a WIP patch. I will post the final patch today for review. I 
have to run through the manual tests and make sure nothing breaks. 

> LLAP: Fix use of Runtime.getRuntime.maxMemory in Hive operators
> ---
>
> Key: HIVE-15503
> URL: https://issues.apache.org/jira/browse/HIVE-15503
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-15503.WIP.patch
>
>
> {code}
> ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java:
> maxHashTblMemory = (long) (memoryPercentage * 
> Runtime.getRuntime().maxMemory());
> ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:// Total Free 
> Memory = maxMemory() - Used Memory;
> ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:long 
> totalFreeMemory = Runtime.getRuntime().maxMemory() -
> {code}
> This will not work very well with LLAP because of the memory sharing by 
> executors. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15476) ObjectStore.getMTableColumnStatistics() should check if colNames is empty

2016-12-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771557#comment-15771557
 ] 

Hive QA commented on HIVE-15476:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844472/HIVE-15476.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10895 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hive.hcatalog.api.TestHCatClientNotification.createTable 
(batchId=220)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2705/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2705/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2705/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844472 - PreCommit-HIVE-Build

> ObjectStore.getMTableColumnStatistics() should check if colNames is empty
> -
>
> Key: HIVE-15476
> URL: https://issues.apache.org/jira/browse/HIVE-15476
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-15476.1.patch, HIVE-15476.2.patch
>
>
> See the following exception in the log. I can't tell which exact query 
> causes it, though.
> {noformat}
> [pool-4-thread-31]: Exception thrown
> Method/Identifier expected at character 37 in "tableName == t1 && dbName == 
> t2 && ()"
> org.datanucleus.store.query.QueryCompilerSyntaxException: Method/Identifier 
> expected at character 37 in "tableName == t1 && dbName == t2 && ()"
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processPrimary(JDOQLParser.java:810)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processUnaryExpression(JDOQLParser.java:656)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processMultiplicativeExpression(JDOQLParser.java:582)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAdditiveExpression(JDOQLParser.java:553)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processRelationalExpression(JDOQLParser.java:467)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAndExpression(JDOQLParser.java:450)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExclusiveOrExpression(JDOQLParser.java:436)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processInclusiveOrExpression(JDOQLParser.java:422)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalAndExpression(JDOQLParser.java:408)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalOrExpression(JDOQLParser.java:389)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExpression(JDOQLParser.java:378)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processPrimary(JDOQLParser.java:785)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processUnaryExpression(JDOQLParser.java:656)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processMultiplicativeExpression(JDOQLParser.java:582)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAdditiveExpression(JDOQLParser.java:553)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processRelationalExpression(JDOQLParser.java:467)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAndExpression(JDOQLParser.java:450)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExclusiveOrExpression(JDOQLParser.java:436)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processInclusiveOrExpression(JDOQLParser.java:422)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalAndExpression(JDOQLParser.java:412)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalOrExpression(JDOQLParser.java:389)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExpression(JDOQLParser.java:378)
>   at org.datanucleus.query.compiler.JDOQLParser.parse(JDOQLParser.java:99)
>   at 
> org.datanucleus.query.compiler.JavaQueryCompiler.compileFilter(JavaQueryCompiler.java:467)
>   at 
> org.datanucleus.query.compiler.JDOQLCompiler.compile(JDOQLCompiler.java:113)
>   at 
> 

[jira] [Commented] (HIVE-15499) Nested column pruning: don't prune paths when a SerDe is used only for serializing

2016-12-22 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771523#comment-15771523
 ] 

Ferdinand Xu commented on HIVE-15499:
-

LGTM, just a minor comment: please remove the extra space before nested_tbl_3.
{code}
+DROP TABLE IF EXISTS  nested_tbl_3;
{code}

> Nested column pruning: don't prune paths when a SerDe is used only for 
> serializing
> --
>
> Key: HIVE-15499
> URL: https://issues.apache.org/jira/browse/HIVE-15499
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15499.1.patch
>
>
> In {{FileSinkOperator}}, a serializer is created to write output data. When 
> initializing it we should not read the 
> {{ColumnProjectionUtils.READ_NESTED_COLUMN_PATH_CONF_STR}} property since 
> this is only used for the read path, and the path may not match the schema 
> for the output table (for instance, in the case of insert).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-15360) Nested column pruning: add pruned column paths to explain output

2016-12-22 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771497#comment-15771497
 ] 

Ferdinand Xu edited comment on HIVE-15360 at 12/23/16 12:50 AM:


Thanks [~csun] for the patch. LGTM. Can you rerun the precommit to check the 
"TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)" failure? It doesn't happen in HIVE-15499.


was (Author: ferd):
Thanks [~csun] for the patch. LGTM +1

> Nested column pruning: add pruned column paths to explain output
> 
>
> Key: HIVE-15360
> URL: https://issues.apache.org/jira/browse/HIVE-15360
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-15360.1.patch, HIVE-15360.2.patch, 
> HIVE-15360.3.patch
>
>
> We should add the pruned nested column paths to the explain output for easier 
> tracing and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15360) Nested column pruning: add pruned column paths to explain output

2016-12-22 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771497#comment-15771497
 ] 

Ferdinand Xu commented on HIVE-15360:
-

Thanks [~csun] for the patch. LGTM +1

> Nested column pruning: add pruned column paths to explain output
> 
>
> Key: HIVE-15360
> URL: https://issues.apache.org/jira/browse/HIVE-15360
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-15360.1.patch, HIVE-15360.2.patch, 
> HIVE-15360.3.patch
>
>
> We should add the pruned nested column paths to the explain output for easier 
> tracing and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15498) sum() over (order by c) should default the windowing spec to RangeBoundarySpec

2016-12-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771490#comment-15771490
 ] 

Hive QA commented on HIVE-15498:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844469/HIVE-15498.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10895 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2704/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2704/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2704/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844469 - PreCommit-HIVE-Build

> sum() over (order by c) should default the windowing spec to RangeBoundarySpec
> --
>
> Key: HIVE-15498
> URL: https://issues.apache.org/jira/browse/HIVE-15498
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15498.1.patch, HIVE-15498.2.patch
>
>
> Currently {{sum() over (partition by a)}} without an order by defaults the 
> windowing to RangeBoundarySpec, while {{sum() over (partition by a order by 
> c)}} defaults to ValueBoundarySpec. This is inconsistent, and the user is 
> confused by the switch from "rows between" to "range between" when an 
> "order by c" clause is added.
> From the comment 
> {noformat}
>   /*
>* - A Window Frame that has only the /start/boundary, then it is 
> interpreted as:
>  BETWEEN <start> AND CURRENT ROW
>* - A Window Specification with an Order Specification and no Window
>*   Frame is interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
>* - A Window Specification with no Order and no Window Frame is 
> interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
>*/
> {noformat}
> We intended to default to "row between" (RangeBoundarySpec), not "range 
> between" (ValueBoundarySpec). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15501) Add INSTR and MONTHS_BETWEEN to UDFs that are Vectorized

2016-12-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771421#comment-15771421
 ] 

Hive QA commented on HIVE-15501:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844460/HIVE-15501.02.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10867 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_char_mapjoin1.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_udf1]
 (batchId=148)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2703/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2703/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2703/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844460 - PreCommit-HIVE-Build

> Add INSTR and MONTHS_BETWEEN to UDFs that are Vectorized
> 
>
> Key: HIVE-15501
> URL: https://issues.apache.org/jira/browse/HIVE-15501
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15501.01.patch, HIVE-15501.02.patch
>
>
> Add INSTR to special list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15487) LLAP: Improvements to random selection while scheduling

2016-12-22 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771369#comment-15771369
 ] 

Gunther Hagleitner commented on HIVE-15487:
---

+1 LGTM

> LLAP: Improvements to random selection while scheduling
> ---
>
> Key: HIVE-15487
> URL: https://issues.apache.org/jira/browse/HIVE-15487
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-15487.1.patch
>
>
> Currently the llap scheduler picks a random host when no locality information 
> is specified or when all requested hosts are busy serving other requests with 
> forced locality. In such cases, we can pick the next available node in a 
> consistent order to get better locality instead of random selection. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15489) Alternatively use table scan stats for HoS

2016-12-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771300#comment-15771300
 ] 

Hive QA commented on HIVE-15489:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844456/HIVE-15489.wip.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10876 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_inner_join]
 (batchId=161)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_sortmerge_join_10]
 (batchId=125)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_tez1]
 (batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_tez2]
 (batchId=99)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[identity_project_remove_skip]
 (batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union22] 
(batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_mapjoin_reduce]
 (batchId=128)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2702/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2702/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2702/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844456 - PreCommit-HIVE-Build

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use the stats in the 
> TS rather than the populated stats in each of the join branches. This could be 
> pretty conservative but is more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15502) CTAS on S3 is broken with credentials exception

2016-12-22 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771285#comment-15771285
 ] 

Sahil Takiar commented on HIVE-15502:
-

I never had to do that in the past, but I can try and see if that works.

> CTAS on S3 is broken with credentials exception
> ---
>
> Key: HIVE-15502
> URL: https://issues.apache.org/jira/browse/HIVE-15502
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> Simple CTAS queries that read from S3 and write to the local fs throw the 
> following exception:
> {code}
> com.amazonaws.AmazonClientException: Unable to load AWS credentials from any 
> provider in the chain
>   at 
> com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
>   at 
> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
>   at 
> com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
>   at 
> com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
>   at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.isEmptyPath(Utilities.java:2308)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.isEmptyPath(Utilities.java:2304)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.getInputPaths(Utilities.java:3013)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:342)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2168)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1824)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1511)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1222)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:777)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Job Submission failed with exception 
> 'com.amazonaws.AmazonClientException(Unable to load AWS credentials from any 
> provider in the chain)'
> {code}
> Seems to only happen when trying to connect to S3 from map tasks. My 
> {{hive-site.xml}} has the following entries:
> {code}
> <configuration>
>   <property>
>     <name>mapreduce.framework.name</name>
>     <value>local</value>
>   </property>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>local</value>
>   </property>
>   <property>
>     <name>fs.default.name</name>
>     <value>file:///</value>
>   </property>
>   <property>
>     <name>fs.s3a.access.key</name>
>     <value>[ACCESS-KEY]</value>
>   </property>
>   <property>
>     <name>fs.s3a.secret.key</name>
>     <value>[SECRET-KEY]</value>
>   </property>
> </configuration>
> {code}
> I've also noticed that now I need to copy the AWS S3 SDK jars into the lib 
> folder before running Hive locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15502) CTAS on S3 is broken with credentials exception

2016-12-22 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771243#comment-15771243
 ] 

Vihang Karajgaonkar commented on HIVE-15502:


[~stakiar] Shouldn't these keys be present in the core-site.xml too for the Map 
tasks to succeed?
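
For illustration only (mirroring the s3a entries from the {{hive-site.xml}} in 
the description; whether they are actually needed here is exactly the open 
question), the corresponding {{core-site.xml}} entries would look like:

{code}
<property>
  <name>fs.s3a.access.key</name>
  <value>[ACCESS-KEY]</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>[SECRET-KEY]</value>
</property>
{code}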

> CTAS on S3 is broken with credentials exception
> ---
>
> Key: HIVE-15502
> URL: https://issues.apache.org/jira/browse/HIVE-15502
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> Simple CTAS queries that read from S3 and write to the local fs throw the 
> following exception:
> {code}
> com.amazonaws.AmazonClientException: Unable to load AWS credentials from any 
> provider in the chain
>   at 
> com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
>   at 
> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
>   at 
> com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
>   at 
> com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
>   at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
>   at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.isEmptyPath(Utilities.java:2308)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.isEmptyPath(Utilities.java:2304)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.getInputPaths(Utilities.java:3013)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:342)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2168)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1824)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1511)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1222)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:777)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Job Submission failed with exception 
> 'com.amazonaws.AmazonClientException(Unable to load AWS credentials from any 
> provider in the chain)'
> {code}
> Seems to only happen when trying to connect to S3 from map tasks. My 
> {{hive-site.xml}} has the following entries:
> {code}
> <configuration>
>   <property>
>     <name>mapreduce.framework.name</name>
>     <value>local</value>
>   </property>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>local</value>
>   </property>
>   <property>
>     <name>fs.default.name</name>
>     <value>file:///</value>
>   </property>
>   <property>
>     <name>fs.s3a.access.key</name>
>     <value>[ACCESS-KEY]</value>
>   </property>
>   <property>
>     <name>fs.s3a.secret.key</name>
>     <value>[SECRET-KEY]</value>
>   </property>
> </configuration>
> {code}
> I've also noticed that now I need to copy the AWS S3 SDK jars into the lib 
> folder before running Hive locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15498) sum() over (order by c) should default the windowing spec to RangeBoundarySpec

2016-12-22 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771230#comment-15771230
 ] 

Aihua Xu commented on HIVE-15498:
-

[~ashutoshc], [~ychena] Can you help review the change? It seems users are 
using sum() over (order by c) as a running total, but we are actually 
incorrectly adding "range between".

Here is what Oracle does (see 
http://www.oracle.com/technetwork/issue-archive/2013/13-mar/o23sql-1906475.html),
 which returns a running total.
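
For illustration, a minimal sketch of the intended behavior, assuming a 
hypothetical table {{t}} with columns {{a}} and {{c}}: after this change, the 
first query should behave like the second, i.e. as a running total over rows 
rather than over the value range of {{c}}.

{code}
-- Intended default frame: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
SELECT a, c, SUM(c) OVER (PARTITION BY a ORDER BY c) AS running_total
FROM t;

-- The same query with the intended default frame spelled out explicitly
SELECT a, c,
       SUM(c) OVER (PARTITION BY a ORDER BY c
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM t;
{code}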

> sum() over (order by c) should default the windowing spec to RangeBoundarySpec
> --
>
> Key: HIVE-15498
> URL: https://issues.apache.org/jira/browse/HIVE-15498
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15498.1.patch, HIVE-15498.2.patch
>
>
> Currently {{sum() over (partition by a)}} without an order by defaults the 
> windowing to RangeBoundarySpec, while {{sum() over (partition by a order by 
> c)}} defaults to ValueBoundarySpec. This is inconsistent, and the user is 
> confused by the switch from "rows between" to "range between" when adding an 
> "order by c" clause.
> From the comment 
> {noformat}
>   /*
>* - A Window Frame that has only the /start/boundary, then it is 
> interpreted as:
>  BETWEEN <start> AND CURRENT ROW
>* - A Window Specification with an Order Specification and no Window
>*   Frame is interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
>* - A Window Specification with no Order and no Window Frame is 
> interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
>*/
> {noformat}
> We intended to default to "rows between" (RangeBoundarySpec), not "range 
> between" (ValueBoundarySpec). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14956) Parallelize TestHCatLoader

2016-12-22 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-14956:
---
Attachment: HIVE-14956.04.patch

Fixed minor issues

> Parallelize TestHCatLoader
> --
>
> Key: HIVE-14956
> URL: https://issues.apache.org/jira/browse/HIVE-14956
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14956.01.patch, HIVE-14956.02.patch, 
> HIVE-14956.03.patch, HIVE-14956.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14956) Parallelize TestHCatLoader

2016-12-22 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771210#comment-15771210
 ] 

Aihua Xu commented on HIVE-14956:
-

The patch looks good to me. +1.

> Parallelize TestHCatLoader
> --
>
> Key: HIVE-14956
> URL: https://issues.apache.org/jira/browse/HIVE-14956
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14956.01.patch, HIVE-14956.02.patch, 
> HIVE-14956.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15485) Investigate the DoAs failure in HoS

2016-12-22 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771202#comment-15771202
 ] 

Chaoyu Tang commented on HIVE-15485:


HIVE-14383 is the right way to renew the delegation token for a long-running 
HoS session. Spark needs the principal/keytab passed in via the --principal and 
--keytab options, and does the renewal by copying the keytab to the cluster and 
handling the Kerberos login inside the application. 
But the --principal and --keytab options cannot be combined with --proxy-user 
in spark-submit.sh, as [~vanzin] suggested, so at this moment we can support 
either the token renewal or the impersonation, but not both.
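
For illustration only (the principal and keytab values below are placeholders), 
the two modes that currently cannot be combined look roughly like this:

{noformat}
# Token renewal for a long-running session
spark-submit --principal hive/host@REALM --keytab /path/to/hive.keytab ...

# Impersonation
spark-submit --proxy-user systest ...

# Combining both is rejected by spark-submit, per the above
spark-submit --proxy-user systest --principal hive/host@REALM \
  --keytab /path/to/hive.keytab ...
{noformat}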

> Investigate the DoAs failure in HoS
> ---
>
> Key: HIVE-15485
> URL: https://issues.apache.org/jira/browse/HIVE-15485
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>
> With DoAs enabled, HoS failed with the following errors:
> {code}
> Exception in thread "main" org.apache.hadoop.security.AccessControlException: 
> systest tries to renew a token with renewer hive
>   at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:484)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:7543)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:555)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.renewDelegationToken(AuthorizationProviderProxyClientProtocol.java:674)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:999)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1783)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135)
> {code}
> It is related to the change from HIVE-14383. It looks like SparkSubmit 
> logs in to Kerberos with the passed-in hive principal/keytab and then tries to 
> create an HDFS delegation token for user systest with renewer hive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15499) Nested column pruning: don't prune paths when a SerDe is used only for serializing

2016-12-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771201#comment-15771201
 ] 

Hive QA commented on HIVE-15499:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844455/HIVE-15499.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10895 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2701/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2701/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2701/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844455 - PreCommit-HIVE-Build

> Nested column pruning: don't prune paths when a SerDe is used only for 
> serializing
> --
>
> Key: HIVE-15499
> URL: https://issues.apache.org/jira/browse/HIVE-15499
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15499.1.patch
>
>
> In {{FileSinkOperator}}, a serializer is created to write output data. When 
> initializing it we should not read the 
> {{ColumnProjectionUtils.READ_NESTED_COLUMN_PATH_CONF_STR}} property since 
> this is only used for the read path, and the path may not match the schema 
> for the output table (for instance, in the case of insert).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14956) Parallelize TestHCatLoader

2016-12-22 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-14956:
---
Attachment: HIVE-14956.03.patch

Addressed comments by [~aihuaxu] on review board

> Parallelize TestHCatLoader
> --
>
> Key: HIVE-14956
> URL: https://issues.apache.org/jira/browse/HIVE-14956
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14956.01.patch, HIVE-14956.02.patch, 
> HIVE-14956.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14701) Some tests in MiniLlap is not showing bucketing information in describe

2016-12-22 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771182#comment-15771182
 ] 

Gunther Hagleitner commented on HIVE-14701:
---

[~prasanth_j] you said this has been fixed elsewhere?

> Some tests in MiniLlap is not showing bucketing information in describe
> ---
>
> Key: HIVE-14701
> URL: https://issues.apache.org/jira/browse/HIVE-14701
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>
> Some bucketing-related tests like infer_bucket_sort_reducers_power_two.q, when 
> run in MiniLlapCliDriver, do not show bucketing information in describe 
> formatted. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15498) sum() over (order by c) should default the windowing spec to RangeBoundarySpec

2016-12-22 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771152#comment-15771152
 ] 

Aihua Xu commented on HIVE-15498:
-

Other databases like Oracle also work this way.

> sum() over (order by c) should default the windowing spec to RangeBoundarySpec
> --
>
> Key: HIVE-15498
> URL: https://issues.apache.org/jira/browse/HIVE-15498
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15498.1.patch, HIVE-15498.2.patch
>
>
> Currently {{sum() over (partition by a)}} without an order by defaults the 
> windowing to RangeBoundarySpec, while {{sum() over (partition by a order by 
> c)}} defaults to ValueBoundarySpec. This is inconsistent, and the user is 
> confused by the switch from "rows between" to "range between" when adding an 
> "order by c" clause.
> From the comment 
> {noformat}
>   /*
>* - A Window Frame that has only the /start/boundary, then it is 
> interpreted as:
>  BETWEEN <start> AND CURRENT ROW
>* - A Window Specification with an Order Specification and no Window
>*   Frame is interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
>* - A Window Specification with no Order and no Window Frame is 
> interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
>*/
> {noformat}
> We intended to default to "rows between" (RangeBoundarySpec), not "range 
> between" (ValueBoundarySpec). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15476) ObjectStore.getMTableColumnStatistics() should check if colNames is empty

2016-12-22 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771148#comment-15771148
 ] 

Aihua Xu commented on HIVE-15476:
-

From the directSQL implementation, we are returning NULL when columns are not 
provided. 

> ObjectStore.getMTableColumnStatistics() should check if colNames is empty
> -
>
> Key: HIVE-15476
> URL: https://issues.apache.org/jira/browse/HIVE-15476
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-15476.1.patch, HIVE-15476.2.patch
>
>
> See the following exception in the log. Can't find out which exact query 
> causes it though.
> {noformat}
> [pool-4-thread-31]: Exception thrown
> Method/Identifier expected at character 37 in "tableName == t1 && dbName == 
> t2 && ()"
> org.datanucleus.store.query.QueryCompilerSyntaxException: Method/Identifier 
> expected at character 37 in "tableName == t1 && dbName == t2 && ()"
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processPrimary(JDOQLParser.java:810)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processUnaryExpression(JDOQLParser.java:656)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processMultiplicativeExpression(JDOQLParser.java:582)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAdditiveExpression(JDOQLParser.java:553)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processRelationalExpression(JDOQLParser.java:467)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAndExpression(JDOQLParser.java:450)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExclusiveOrExpression(JDOQLParser.java:436)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processInclusiveOrExpression(JDOQLParser.java:422)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalAndExpression(JDOQLParser.java:408)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalOrExpression(JDOQLParser.java:389)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExpression(JDOQLParser.java:378)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processPrimary(JDOQLParser.java:785)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processUnaryExpression(JDOQLParser.java:656)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processMultiplicativeExpression(JDOQLParser.java:582)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAdditiveExpression(JDOQLParser.java:553)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processRelationalExpression(JDOQLParser.java:467)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAndExpression(JDOQLParser.java:450)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExclusiveOrExpression(JDOQLParser.java:436)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processInclusiveOrExpression(JDOQLParser.java:422)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalAndExpression(JDOQLParser.java:412)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalOrExpression(JDOQLParser.java:389)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExpression(JDOQLParser.java:378)
>   at org.datanucleus.query.compiler.JDOQLParser.parse(JDOQLParser.java:99)
>   at 
> org.datanucleus.query.compiler.JavaQueryCompiler.compileFilter(JavaQueryCompiler.java:467)
>   at 
> org.datanucleus.query.compiler.JDOQLCompiler.compile(JDOQLCompiler.java:113)
>   at 
> org.datanucleus.store.query.AbstractJDOQLQuery.compileInternal(AbstractJDOQLQuery.java:367)
>   at 
> org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:240)
>   at org.datanucleus.store.query.Query.executeQuery(Query.java:1744)
>   at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672)
>   at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:312)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:6505)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.access$1200(ObjectStore.java:171)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore$9.getJdoResult(ObjectStore.java:6566)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore$9.getJdoResult(ObjectStore.java:6555)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2629)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatisticsInternal(ObjectStore.java:6554)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatistics(ObjectStore.java:6548)
>   at 

[jira] [Updated] (HIVE-15476) ObjectStore.getMTableColumnStatistics() should check if colNames is empty

2016-12-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15476:

Attachment: HIVE-15476.2.patch

patch-2: minor change to add a comment for the function.

> ObjectStore.getMTableColumnStatistics() should check if colNames is empty
> -
>
> Key: HIVE-15476
> URL: https://issues.apache.org/jira/browse/HIVE-15476
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-15476.1.patch, HIVE-15476.2.patch
>
>
> See the following exception in the log. Can't find out which exact query 
> causes it though.
> {noformat}
> [pool-4-thread-31]: Exception thrown
> Method/Identifier expected at character 37 in "tableName == t1 && dbName == 
> t2 && ()"
> org.datanucleus.store.query.QueryCompilerSyntaxException: Method/Identifier 
> expected at character 37 in "tableName == t1 && dbName == t2 && ()"
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processPrimary(JDOQLParser.java:810)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processUnaryExpression(JDOQLParser.java:656)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processMultiplicativeExpression(JDOQLParser.java:582)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAdditiveExpression(JDOQLParser.java:553)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processRelationalExpression(JDOQLParser.java:467)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAndExpression(JDOQLParser.java:450)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExclusiveOrExpression(JDOQLParser.java:436)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processInclusiveOrExpression(JDOQLParser.java:422)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalAndExpression(JDOQLParser.java:408)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalOrExpression(JDOQLParser.java:389)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExpression(JDOQLParser.java:378)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processPrimary(JDOQLParser.java:785)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processUnaryExpression(JDOQLParser.java:656)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processMultiplicativeExpression(JDOQLParser.java:582)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAdditiveExpression(JDOQLParser.java:553)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processRelationalExpression(JDOQLParser.java:467)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAndExpression(JDOQLParser.java:450)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExclusiveOrExpression(JDOQLParser.java:436)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processInclusiveOrExpression(JDOQLParser.java:422)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalAndExpression(JDOQLParser.java:412)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalOrExpression(JDOQLParser.java:389)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExpression(JDOQLParser.java:378)
>   at org.datanucleus.query.compiler.JDOQLParser.parse(JDOQLParser.java:99)
>   at 
> org.datanucleus.query.compiler.JavaQueryCompiler.compileFilter(JavaQueryCompiler.java:467)
>   at 
> org.datanucleus.query.compiler.JDOQLCompiler.compile(JDOQLCompiler.java:113)
>   at 
> org.datanucleus.store.query.AbstractJDOQLQuery.compileInternal(AbstractJDOQLQuery.java:367)
>   at 
> org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:240)
>   at org.datanucleus.store.query.Query.executeQuery(Query.java:1744)
>   at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672)
>   at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:312)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:6505)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.access$1200(ObjectStore.java:171)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore$9.getJdoResult(ObjectStore.java:6566)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore$9.getJdoResult(ObjectStore.java:6555)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2629)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatisticsInternal(ObjectStore.java:6554)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatistics(ObjectStore.java:6548)
>   at 

[jira] [Commented] (HIVE-15360) Nested column pruning: add pruned column paths to explain output

2016-12-22 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771135#comment-15771135
 ] 

Chao Sun commented on HIVE-15360:
-

The test failure for {{orc_ppd_basic}} is not related. I couldn't reproduce it 
locally, and it didn't appear in the previous runs.

> Nested column pruning: add pruned column paths to explain output
> 
>
> Key: HIVE-15360
> URL: https://issues.apache.org/jira/browse/HIVE-15360
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-15360.1.patch, HIVE-15360.2.patch, 
> HIVE-15360.3.patch
>
>
> We should add the pruned nested column paths to the explain output for easier 
> tracing and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15416) CAST to string does not work for large decimal numbers

2016-12-22 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771131#comment-15771131
 ] 

Daniel Dai commented on HIVE-15416:
---

Yes, it is fair to still consider this a bug, as UDFToString is a builtin UDF 
and is implicitly invoked by Hive. We shall convert all builtin UDFs to 
GenericUDF. As for the length of the varchar, I should say it is varchar(41) to 
avoid the situation you described; the size is just a maximum.
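
To illustrate the length bound only (a hedged sketch using the table from the 
description, not a verified workaround): 41 characters are enough for any 
decimal(38) value, i.e. up to 38 digits plus a sign, a decimal point, and one 
character to spare.

{code}
-- Sketch: an explicit cast to a bounded varchar wide enough for decimal(38)
SELECT CAST(decimal_col AS varchar(41)) FROM test_hive_bug30;
{code}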

> CAST to string does not work for large decimal numbers
> --
>
> Key: HIVE-15416
> URL: https://issues.apache.org/jira/browse/HIVE-15416
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Pavel Benes
>Assignee: Daniel Dai
>
> The cast of large decimal values to string does not work and produces NULL 
> values. 
> Steps to reproduce:
> {code}
> hive> create table test_hive_bug30(decimal_col DECIMAL(30,0));
> OK
> {code}
> {code}
> hive> insert into test_hive_bug30 VALUES (123), 
> (9), 
> (99),(999);
> Query ID = benesp_20161212135717_5d16d7f4-7b84-409e-ad00-36085deaae54
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1480833176011_2469)
> 
> VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> 
> Map 1 ..   SUCCEEDED  1  100   0  
>  0
> 
> VERTICES: 01/01  [==>>] 100%  ELAPSED TIME: 7.69 s
> 
> Loading data to table default.test_hive_bug30
> Table default.test_hive_bug30 stats: [numFiles=1, numRows=4, totalSize=68, 
> rawDataSize=64]
> OK
> Time taken: 8.239 seconds
> {code}
> {code}
> hive> select CAST(decimal_col AS STRING) from test_hive_bug30;
> OK
> 123
> NULL
> NULL
> NULL
> Time taken: 0.043 seconds, Fetched: 4 row(s)
> {code}
> The numbers with 29 and 30 digits should be exported, but they are converted 
> to NULL instead. 
> The values are stored correctly as can be seen here:
> {code}
> hive> select * from test_hive_bug30;
> OK
> 123
> 9
> 99
> NULL
> Time taken: 0.447 seconds, Fetched: 4 row(s)
> {code}
> The same issue does not exists for smaller numbers (e.g. DECIMAL(10)).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15055) Column pruning for nested fields in Parquet

2016-12-22 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15055:

Attachment: benchmark-hos.pdf

> Column pruning for nested fields in Parquet
> ---
>
> Key: HIVE-15055
> URL: https://issues.apache.org/jira/browse/HIVE-15055
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer, Physical Optimizer, 
> Serializers/Deserializers
>Reporter: Chao Sun
>Assignee: Chao Sun
>  Labels: performance
> Attachments: benchmark-hos.pdf, design-doc-nested-column-pruning.pdf
>
>
> Some columnar file formats such as Parquet also store fields of struct type 
> column by column, using the encoding described in the Google Dremel paper. 
> It's very common in big data for data to be stored in structs while queries 
> need only a subset of the fields in those structs. However, Hive presently 
> still reads the whole struct regardless of whether all fields are selected. 
> Therefore, pruning unwanted sub-fields of structs or other nested fields at 
> file reading time would be a big performance boost for such scenarios.
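
For illustration, assuming a hypothetical table {{t}} with a column {{s}} of 
type struct<a:int, b:string, c:array<double>>: with nested column pruning, the 
query below would read only {{s.a}} from the Parquet file instead of the whole 
struct {{s}}.

{code}
-- Hypothetical schema: t (s struct<a:int, b:string, c:array<double>>)
SELECT s.a FROM t;
{code}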



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15498) sum() over (order by c) should default the windowing spec to RangeBoundarySpec

2016-12-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15498:

Attachment: HIVE-15498.2.patch

patch-2: Update affected unit tests. 

> sum() over (order by c) should default the windowing spec to RangeBoundarySpec
> --
>
> Key: HIVE-15498
> URL: https://issues.apache.org/jira/browse/HIVE-15498
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15498.1.patch, HIVE-15498.2.patch
>
>
> Currently {{sum() over (partition by a)}} without an order by defaults the 
> windowing to RangeBoundarySpec, while {{sum() over (partition by a order by 
> c)}} defaults to ValueBoundarySpec. This is inconsistent, and the user is 
> confused by the switch from "rows between" to "range between" when adding an 
> "order by c" clause.
> From the comment 
> {noformat}
>   /*
>* - A Window Frame that has only the /start/boundary, then it is 
> interpreted as:
>  BETWEEN <start> AND CURRENT ROW
>* - A Window Specification with an Order Specification and no Window
>*   Frame is interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
>* - A Window Specification with no Order and no Window Frame is 
> interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
>*/
> {noformat}
> We intended to default to "rows between" (RangeBoundarySpec), not "range 
> between" (ValueBoundarySpec). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15501) Add INSTR and MONTHS_BETWEEN to UDFs that are Vectorized

2016-12-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771086#comment-15771086
 ] 

Hive QA commented on HIVE-15501:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844451/HIVE-15501.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10892 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_udf1]
 (batchId=148)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=229)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2700/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2700/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2700/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844451 - PreCommit-HIVE-Build

> Add INSTR and MONTHS_BETWEEN to UDFs that are Vectorized
> 
>
> Key: HIVE-15501
> URL: https://issues.apache.org/jira/browse/HIVE-15501
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15501.01.patch, HIVE-15501.02.patch
>
>
> Add INSTR to special list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15501) Add INSTR and MONTHS_BETWEEN to UDFs that are Vectorized

2016-12-22 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771076#comment-15771076
 ] 

Pengcheng Xiong commented on HIVE-15501:


+1

> Add INSTR and MONTHS_BETWEEN to UDFs that are Vectorized
> 
>
> Key: HIVE-15501
> URL: https://issues.apache.org/jira/browse/HIVE-15501
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15501.01.patch, HIVE-15501.02.patch
>
>
> Add INSTR to special list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15055) Column pruning for nested fields in Parquet

2016-12-22 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15055:

Attachment: design-doc-nested-column-pruning.pdf

> Column pruning for nested fields in Parquet
> ---
>
> Key: HIVE-15055
> URL: https://issues.apache.org/jira/browse/HIVE-15055
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer, Physical Optimizer, 
> Serializers/Deserializers
>Reporter: Chao Sun
>Assignee: Chao Sun
>  Labels: performance
> Attachments: design-doc-nested-column-pruning.pdf
>
>
> Some columnar file formats such as Parquet also store fields of struct type 
> column by column, using the encoding described in the Google Dremel paper. 
> It's very common in big data for data to be stored in structs while queries 
> need only a subset of the fields in those structs. However, Hive presently 
> still reads the whole struct regardless of whether all fields are selected. 
> Therefore, pruning unwanted sub-fields of structs or other nested fields at 
> file reading time would be a big performance boost for such scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15055) Column pruning for nested fields in Parquet

2016-12-22 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15055:

Labels: performance  (was: )

> Column pruning for nested fields in Parquet
> ---
>
> Key: HIVE-15055
> URL: https://issues.apache.org/jira/browse/HIVE-15055
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer, Physical Optimizer, 
> Serializers/Deserializers
>Reporter: Chao Sun
>Assignee: Chao Sun
>  Labels: performance
>
> Some columnar file formats such as Parquet also store fields of struct type 
> column by column, using the encoding described in the Google Dremel paper. 
> It's very common in big data for data to be stored in structs while queries 
> need only a subset of the fields in those structs. However, Hive presently 
> still reads the whole struct regardless of whether all fields are selected. 
> Therefore, pruning unwanted sub-fields of structs or other nested fields at 
> file reading time would be a big performance boost for such scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15055) Column pruning for nested fields in Parquet

2016-12-22 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15055:

Component/s: Serializers/Deserializers

> Column pruning for nested fields in Parquet
> ---
>
> Key: HIVE-15055
> URL: https://issues.apache.org/jira/browse/HIVE-15055
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer, Physical Optimizer, 
> Serializers/Deserializers
>Reporter: Chao Sun
>Assignee: Chao Sun
>
> Some columnar file formats such as Parquet also store fields of struct type 
> column by column, using the encoding described in the Google Dremel paper. 
> It's very common in big data for data to be stored in structs while queries 
> need only a subset of the fields in those structs. However, Hive presently 
> still reads the whole struct regardless of whether all fields are selected. 
> Therefore, pruning unwanted sub-fields of structs or other nested fields at 
> file reading time would be a big performance boost for such scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15501) Add INSTR and MONTHS_BETWEEN to UDFs that are Vectorized

2016-12-22 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15501:

Attachment: HIVE-15501.02.patch

> Add INSTR and MONTHS_BETWEEN to UDFs that are Vectorized
> 
>
> Key: HIVE-15501
> URL: https://issues.apache.org/jira/browse/HIVE-15501
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15501.01.patch, HIVE-15501.02.patch
>
>
> Add INSTR to special list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15501) Add INSTR and MONTHS_BETWEEN to UDFs that are Vectorized

2016-12-22 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15501:

Summary: Add INSTR and MONTHS_BETWEEN to UDFs that are Vectorized  (was: 
Add INSTR to UDFs that are Vectorized)

> Add INSTR and MONTHS_BETWEEN to UDFs that are Vectorized
> 
>
> Key: HIVE-15501
> URL: https://issues.apache.org/jira/browse/HIVE-15501
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15501.01.patch
>
>
> Add INSTR to special list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15476) ObjectStore.getMTableColumnStatistics() should check if colNames is empty

2016-12-22 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771028#comment-15771028
 ] 

Naveen Gangam commented on HIVE-15476:
--

[~aihuaxu] When the query ends up being "tableName == t1 && dbName == t2 && 
()", we would want the query to be {{tableName == t1 && dbName == t2}}, right? 
That should still return the stats for all columns in the table. With the 
patch, we just return when the column list is empty, so no stats for that 
table are returned. 
What should the behavior be? Thanks

> ObjectStore.getMTableColumnStatistics() should check if colNames is empty
> -
>
> Key: HIVE-15476
> URL: https://issues.apache.org/jira/browse/HIVE-15476
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-15476.1.patch
>
>
> See the following exception in the log. Can't find out which exact query 
> causes it though.
> {noformat}
> [pool-4-thread-31]: Exception thrown
> Method/Identifier expected at character 37 in "tableName == t1 && dbName == 
> t2 && ()"
> org.datanucleus.store.query.QueryCompilerSyntaxException: Method/Identifier 
> expected at character 37 in "tableName == t1 && dbName == t2 && ()"
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processPrimary(JDOQLParser.java:810)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processUnaryExpression(JDOQLParser.java:656)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processMultiplicativeExpression(JDOQLParser.java:582)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAdditiveExpression(JDOQLParser.java:553)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processRelationalExpression(JDOQLParser.java:467)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAndExpression(JDOQLParser.java:450)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExclusiveOrExpression(JDOQLParser.java:436)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processInclusiveOrExpression(JDOQLParser.java:422)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalAndExpression(JDOQLParser.java:408)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalOrExpression(JDOQLParser.java:389)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExpression(JDOQLParser.java:378)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processPrimary(JDOQLParser.java:785)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processUnaryExpression(JDOQLParser.java:656)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processMultiplicativeExpression(JDOQLParser.java:582)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAdditiveExpression(JDOQLParser.java:553)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processRelationalExpression(JDOQLParser.java:467)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processAndExpression(JDOQLParser.java:450)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExclusiveOrExpression(JDOQLParser.java:436)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processInclusiveOrExpression(JDOQLParser.java:422)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalAndExpression(JDOQLParser.java:412)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processConditionalOrExpression(JDOQLParser.java:389)
>   at 
> org.datanucleus.query.compiler.JDOQLParser.processExpression(JDOQLParser.java:378)
>   at org.datanucleus.query.compiler.JDOQLParser.parse(JDOQLParser.java:99)
>   at 
> org.datanucleus.query.compiler.JavaQueryCompiler.compileFilter(JavaQueryCompiler.java:467)
>   at 
> org.datanucleus.query.compiler.JDOQLCompiler.compile(JDOQLCompiler.java:113)
>   at 
> org.datanucleus.store.query.AbstractJDOQLQuery.compileInternal(AbstractJDOQLQuery.java:367)
>   at 
> org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:240)
>   at org.datanucleus.store.query.Query.executeQuery(Query.java:1744)
>   at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672)
>   at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:312)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:6505)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.access$1200(ObjectStore.java:171)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore$9.getJdoResult(ObjectStore.java:6566)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore$9.getJdoResult(ObjectStore.java:6555)
>   at 
> 

[jira] [Updated] (HIVE-15489) Alternatively use table scan stats for HoS

2016-12-22 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15489:

Attachment: HIVE-15489.wip.patch

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use the stats in the 
> TS rather than the populated stats in each of the join branches. This could be 
> pretty conservative but is more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15489) Alternatively use table scan stats for HoS

2016-12-22 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15489:

Attachment: (was: HIVE-15489.wip.patch)

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>
> For MapJoin in HoS, we should provide an option to only use the stats in the 
> TS rather than the populated stats in each of the join branches. This could be 
> pretty conservative but is more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15360) Nested column pruning: add pruned column paths to explain output

2016-12-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771001#comment-15771001
 ] 

Hive QA commented on HIVE-15360:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/1282/HIVE-15360.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10865 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_char_mapjoin1.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=134)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2699/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2699/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2699/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 1282 - PreCommit-HIVE-Build

> Nested column pruning: add pruned column paths to explain output
> 
>
> Key: HIVE-15360
> URL: https://issues.apache.org/jira/browse/HIVE-15360
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-15360.1.patch, HIVE-15360.2.patch, 
> HIVE-15360.3.patch
>
>
> We should add the pruned nested column paths to the explain output for easier 
> tracing and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15499) Nested column pruning: don't prune paths when a SerDe is used only for serializing

2016-12-22 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15499:

Status: Patch Available  (was: In Progress)

> Nested column pruning: don't prune paths when a SerDe is used only for 
> serializing
> --
>
> Key: HIVE-15499
> URL: https://issues.apache.org/jira/browse/HIVE-15499
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15499.1.patch
>
>
> In {{FileSinkOperator}}, a serializer is created to write output data. When 
> initializing it we should not read the 
> {{ColumnProjectionUtils.READ_NESTED_COLUMN_PATH_CONF_STR}} property since 
> this is only used for the read path, and the path may not match the schema 
> for the output table (for instance, in the case of insert).
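
A minimal sketch of the guard described above — the {{forSerialization}} flag and the three-argument {{initialize}} are hypothetical, not the actual patch:

{code}
// Hypothetical sketch, not the actual patch. The idea: only consult the
// nested-column read paths when the SerDe is initialized for the read path;
// a serializer created by FileSinkOperator must keep the full output schema
// (e.g. for INSERT), where the read paths may not match the output table.
void initialize(Configuration conf, Properties tbl, boolean forSerialization) {
  if (!forSerialization) {
    // READ_NESTED_COLUMN_PATH_CONF_STR is only meaningful on the read path.
    String nestedPaths =
        conf.get(ColumnProjectionUtils.READ_NESTED_COLUMN_PATH_CONF_STR);
    // ... prune the row object inspector down to nestedPaths ...
  }
  // Otherwise, build the object inspector from the table schema unchanged.
}
{code}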



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HIVE-15499) Nested column pruning: don't prune paths when a SerDe is used only for serializing

2016-12-22 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-15499 started by Chao Sun.
---
> Nested column pruning: don't prune paths when a SerDe is used only for 
> serializing
> --
>
> Key: HIVE-15499
> URL: https://issues.apache.org/jira/browse/HIVE-15499
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15499.1.patch
>
>
> In {{FileSinkOperator}}, a serializer is created to write output data. When 
> initializing it we should not read the 
> {{ColumnProjectionUtils.READ_NESTED_COLUMN_PATH_CONF_STR}} property since 
> this is only used for the read path, and the path may not match the schema 
> for the output table (for instance, in the case of insert).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15499) Nested column pruning: don't prune paths when a SerDe is used only for serializing

2016-12-22 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15499:

Attachment: HIVE-15499.1.patch

> Nested column pruning: don't prune paths when a SerDe is used only for 
> serializing
> --
>
> Key: HIVE-15499
> URL: https://issues.apache.org/jira/browse/HIVE-15499
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15499.1.patch
>
>
> In {{FileSinkOperator}}, a serializer is created to write output data. When 
> initializing it we should not read the 
> {{ColumnProjectionUtils.READ_NESTED_COLUMN_PATH_CONF_STR}} property since 
> this is only used for the read path, and the path may not match the schema 
> for the output table (for instance, in the case of insert).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work stopped] (HIVE-15499) Nested column pruning: don't prune paths when a SerDe is used only for serializing

2016-12-22 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-15499 stopped by Chao Sun.
---
> Nested column pruning: don't prune paths when a SerDe is used only for 
> serializing
> --
>
> Key: HIVE-15499
> URL: https://issues.apache.org/jira/browse/HIVE-15499
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15499.1.patch
>
>
> In {{FileSinkOperator}}, a serializer is created to write output data. When 
> initializing it we should not read the 
> {{ColumnProjectionUtils.READ_NESTED_COLUMN_PATH_CONF_STR}} property since 
> this is only used for the read path, and the path may not match the schema 
> for the output table (for instance, in the case of insert).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-15500) fix the test failure dbtxnmgr_showlocks

2016-12-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu resolved HIVE-15500.
-
Resolution: Cannot Reproduce

It was already fixed by HIVE-15376 a few minutes ago.

> fix the test failure dbtxnmgr_showlocks
> ---
>
> Key: HIVE-15500
> URL: https://issues.apache.org/jira/browse/HIVE-15500
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Attachments: HIVE-15500.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15500) fix the test failure dbtxnmgr_showlocks

2016-12-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15500:

Assignee: (was: Aihua Xu)

> fix the test failure dbtxnmgr_showlocks
> ---
>
> Key: HIVE-15500
> URL: https://issues.apache.org/jira/browse/HIVE-15500
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Priority: Trivial
> Attachments: HIVE-15500.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15501) Add INSTR to UDFs that are Vectorized

2016-12-22 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770928#comment-15770928
 ] 

Pengcheng Xiong commented on HIVE-15501:


[~mmccline], there is one more
{code}
2016-12-22T10:43:32,066  INFO [f0f0bd80-b9b2-4890-b66d-e8ad22526f35 main] 
physical.Vectorizer: Cannot vectorize UDF 
GenericUDFMonthsBetween(Column[_col3], Column[_col90])
2016-12-22T10:43:32,066  INFO [f0f0bd80-b9b2-4890-b66d-e8ad22526f35 main] 
physical.Vectorizer: Cannot vectorize map work key expression
2016-12-22T10:43:32,066  INFO [f0f0bd80-b9b2-4890-b66d-e8ad22526f35 main] 
physical.Vectorizer: MapWork Operator: MAPJOIN could not be vectorized.
{code}

> Add INSTR to UDFs that are Vectorized
> -
>
> Key: HIVE-15501
> URL: https://issues.apache.org/jira/browse/HIVE-15501
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15501.01.patch
>
>
> Add INSTR to special list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15501) Add INSTR to UDFs that are Vectorized

2016-12-22 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15501:

Status: Patch Available  (was: Open)

> Add INSTR to UDFs that are Vectorized
> -
>
> Key: HIVE-15501
> URL: https://issues.apache.org/jira/browse/HIVE-15501
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15501.01.patch
>
>
> Add INSTR to special list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15501) Add INSTR to UDFs that are Vectorized

2016-12-22 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15501:

Attachment: HIVE-15501.01.patch

> Add INSTR to UDFs that are Vectorized
> -
>
> Key: HIVE-15501
> URL: https://issues.apache.org/jira/browse/HIVE-15501
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15501.01.patch
>
>
> Add INSTR to special list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15500) fix the test failure dbtxnmgr_showlocks

2016-12-22 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770911#comment-15770911
 ] 

Prasanth Jayachandran commented on HIVE-15500:
--

already fixed in HIVE-15376?

> fix the test failure dbtxnmgr_showlocks
> ---
>
> Key: HIVE-15500
> URL: https://issues.apache.org/jira/browse/HIVE-15500
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Attachments: HIVE-15500.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15500) fix the test failure dbtxnmgr_showlocks

2016-12-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15500:

Attachment: HIVE-15500.1.patch

> fix the test failure dbtxnmgr_showlocks
> ---
>
> Key: HIVE-15500
> URL: https://issues.apache.org/jira/browse/HIVE-15500
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Trivial
> Attachments: HIVE-15500.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15473) Progress Bar on Beeline client

2016-12-22 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770877#comment-15770877
 ] 

Thejas M Nair commented on HIVE-15473:
--

The information we present right now (in the ux-demo.gif link) is reasonably 
generic for progress notification. Different execution engines would have 
different terminology, and that can be addressed by sending different 
labels/headers along with the progress information. 
The first option is generic enough for the use cases we know or anticipate, and 
it is also simpler. I think we should go for that.


> Progress Bar on Beeline client
> --
>
> Key: HIVE-15473
> URL: https://issues.apache.org/jira/browse/HIVE-15473
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, HiveServer2
>Affects Versions: 2.1.1
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
>
> Hive Cli allows showing progress bar for tez execution engine as shown in 
> https://issues.apache.org/jira/secure/attachment/12678767/ux-demo.gif
> it would be great to have similar progress bar displayed when user is 
> connecting via beeline command line client as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15489) Alternatively use table scan stats for HoS

2016-12-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770840#comment-15770840
 ] 

Hive QA commented on HIVE-15489:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/1280/HIVE-15489.wip.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2698/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2698/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2698/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2016-12-22 19:12:08.482
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-2698/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2016-12-22 19:12:08.484
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   4ba713c..ee35ccb  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 4ba713c HIVE-15335: Fast Decimal (Matt McCline, reviewed by 
Sergey Shelukhin, Prasanth Jayachandran, Owen O'Malley)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 2 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at ee35ccb HIVE-15376 : Improve heartbeater scheduling for 
transactions (Wei Zheng, reviewed by Eugene Koifman)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2016-12-22 19:12:09.663
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java:
 No such file or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 1280 - PreCommit-HIVE-Build

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use stats in the TS 
> rather than the stats populated in each of the join branches. This could be 
> pretty conservative but more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15498) sum() over (order by c) should default the windowing spec to RangeBoundarySpec

2016-12-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770833#comment-15770833
 ] 

Hive QA commented on HIVE-15498:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844434/HIVE-15498.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 28 failed/errored test(s), 10865 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_char_mapjoin1.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_windowing] 
(batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_windowing] 
(batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[leadlag] (batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ptf_general_queries] 
(batchId=6)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[temp_table_windowing_expressions]
 (batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[windowing_decimal] 
(batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[windowing_expressions] 
(batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[windowing_multipartitioning]
 (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[windowing_udaf] 
(batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[windowing_windowspec] 
(batchId=16)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_windowing_2]
 (batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_windowing]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage2] 
(batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage3] 
(batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ptf] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ptf_streaming]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[special_character_in_tabnames_1]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_ptf]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[windowing] 
(batchId=147)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ptf] (batchId=101)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ptf_general_queries]
 (batchId=97)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ptf_streaming] 
(batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] 
(batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[windowing] 
(batchId=116)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2697/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2697/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2697/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 28 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844434 - PreCommit-HIVE-Build

> sum() over (order by c) should default the windowing spec to RangeBoundarySpec
> --
>
> Key: HIVE-15498
> URL: https://issues.apache.org/jira/browse/HIVE-15498
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15498.1.patch
>
>
> Currently {{sum() over (partition by a)}} without order by is defaulted 
> windowing to 

[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-22 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15376:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Eugene for review!

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Fix For: 2.2.0
>
> Attachments: HIVE-15376.1.patch, HIVE-15376.10.patch, 
> HIVE-15376.11.patch, HIVE-15376.12.patch, HIVE-15376.13.patch, 
> HIVE-15376.14.patch, HIVE-15376.2.patch, HIVE-15376.3.patch, 
> HIVE-15376.4.patch, HIVE-15376.5.patch, HIVE-15376.6.patch, 
> HIVE-15376.7.patch, HIVE-15376.8.patch, HIVE-15376.9.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and first heartbeat, but that's not enough, there may 
> still be some issue, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.
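
A minimal sketch of the fix direction (hypothetical names, not the committed patch — see the patch itself for the real logic): schedule the heartbeat relative to the transaction open (Time A) instead of the end of the blocking acquireLocks() call (Time C), so a busy system can no longer push the first heartbeat past hive.txn.timeout:

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch only: heartbeats start as soon as the transaction is
// opened, so a slow acquireLocks() can no longer let the transaction time out
// while it is still waiting for locks.
class HeartbeatSketch {
  private final ScheduledExecutorService heartbeater =
      Executors.newSingleThreadScheduledExecutor();

  ScheduledFuture<?> openTxn(long txnId, long txnTimeoutMs) {
    long intervalMs = txnTimeoutMs / 2;  // heartbeat well inside the timeout
    return heartbeater.scheduleAtFixedRate(
        () -> sendHeartbeat(txnId),      // hypothetical helper
        intervalMs, intervalMs, TimeUnit.MILLISECONDS);
  }

  void sendHeartbeat(long txnId) { /* RPC to the metastore, elided */ }
}
{code}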



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15113) SHOW CREATE TABLE on skewed table returns statement without skew definition

2016-12-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15113:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Yongzhi for reviewing.

I didn't change the output for the special case of a single skew column, since 
the syntax with parentheses also works. 
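
After the fix, the generated statement should presumably also carry the skew definition, along the lines of (illustrative only — not verified against the committed q.out):

{noformat}
SKEWED BY (key)
  ON ((1),(5),(6))
STORED AS DIRECTORIES
{noformat}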

> SHOW CREATE TABLE on skewed table returns statement without skew definition
> ---
>
> Key: HIVE-15113
> URL: https://issues.apache.org/jira/browse/HIVE-15113
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Wojciech Meler
>Assignee: Aihua Xu
> Fix For: 2.2.0
>
> Attachments: HIVE-15113.1.patch, HIVE-15113.2.patch
>
>
> CREATE TABLE IF NOT EXISTS testskew (key int, value STRING)
> SKEWED BY (key) ON (1,5,6) STORED AS DIRECTORIES
> STORED AS ORC;
> SHOW CREATE TABLE testskew;
> CREATE TABLE `testskew`(
>   `key` int, 
>   `value` string)
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
> STORED AS INPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
> OUTPUTFORMAT 
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>   'hdfs://nameservice1/user/hive/warehouse/private_wmeler.db/testskew'
> TBLPROPERTIES (
>   'COLUMN_STATS_ACCURATE'='true', 
>   'numFiles'='4', 
>   'numRows'='19', 
>   'rawDataSize'='1736', 
>   'totalSize'='1184', 
>   'transient_lastDdlTime'='1478098814')



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-22 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770721#comment-15770721
 ] 

Wei Zheng commented on HIVE-15376:
--

HIVE-15345 fixed some typos in the code, but didn't update the .q.out files. 
That caused ptest to keep failing for this test.
(The fix in question: Hearbeat -> Heartbeat)

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.10.patch, 
> HIVE-15376.11.patch, HIVE-15376.12.patch, HIVE-15376.13.patch, 
> HIVE-15376.14.patch, HIVE-15376.2.patch, HIVE-15376.3.patch, 
> HIVE-15376.4.patch, HIVE-15376.5.patch, HIVE-15376.6.patch, 
> HIVE-15376.7.patch, HIVE-15376.8.patch, HIVE-15376.9.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and first heartbeat, but that's not enough, there may 
> still be some issue, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15360) Nested column pruning: add pruned column paths to explain output

2016-12-22 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15360:

Attachment: HIVE-15360.3.patch

> Nested column pruning: add pruned column paths to explain output
> 
>
> Key: HIVE-15360
> URL: https://issues.apache.org/jira/browse/HIVE-15360
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-15360.1.patch, HIVE-15360.2.patch, 
> HIVE-15360.3.patch
>
>
> We should add the pruned nested column paths to the explain output for easier 
> tracing and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15489) Alternatively use table scan stats for HoS

2016-12-22 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15489:

Attachment: (was: HIVE-15489.wip.patch)

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use stats in the TS 
> rather than the stats populated in each of the join branches. This could be 
> pretty conservative but more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15489) Alternatively use table scan stats for HoS

2016-12-22 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15489:

Attachment: HIVE-15489.wip.patch

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use stats in the TS 
> rather than the stats populated in each of the join branches. This could be 
> pretty conservative but more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14956) Parallelize TestHCatLoader

2016-12-22 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770677#comment-15770677
 ] 

Vihang Karajgaonkar commented on HIVE-14956:


Failed tests are unrelated. [~vgumashta] can you please review? Thanks!

> Parallelize TestHCatLoader
> --
>
> Key: HIVE-14956
> URL: https://issues.apache.org/jira/browse/HIVE-14956
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14956.01.patch, HIVE-14956.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15498) sum() over (order by c) should default the windowing spec to RangeBoundarySpec

2016-12-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15498:

Description: 
Currently {{sum() over (partition by a)}} without order by is defaulted 
windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
c)}} is defaulted to ValueBoundarySpec. It's not consistent and the user gets 
confused of the switch from "rows between" to "range between" by adding "order 
by c" clause.

From the comment 
{noformat}
  /*
   * - A Window Frame that has only the /start/boundary, then it is interpreted 
as:
 BETWEEN <start boundary> AND CURRENT ROW
   * - A Window Specification with an Order Specification and no Window
   *   Frame is interpreted as:
 ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
   * - A Window Specification with no Order and no Window Frame is interpreted 
as:
 ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
   */
{noformat}
We intended to set as "row between" (RangeBoundarySpec), not "range between" 
(ValueBoundarySpec). 


  was:
Currently {{sum() over (partition by a)}} without order by is defaulted 
windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
c)}} is defaulted to ValueBoundarySpec.

From the comment 
{noformat}
  /*
   * - A Window Frame that has only the /start/boundary, then it is interpreted 
as:
 BETWEEN <start boundary> AND CURRENT ROW
   * - A Window Specification with an Order Specification and no Window
   *   Frame is interpreted as:
 ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
   * - A Window Specification with no Order and no Window Frame is interpreted 
as:
 ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
   */
{noformat}
We intended to set as "row between" (RangeBoundarySpec), not "range between" 
(ValueBoundarySpec). 



> sum() over (order by c) should default the windowing spec to RangeBoundarySpec
> --
>
> Key: HIVE-15498
> URL: https://issues.apache.org/jira/browse/HIVE-15498
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15498.1.patch
>
>
> Currently {{sum() over (partition by a)}} without order by is defaulted 
> windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
> c)}} is defaulted to ValueBoundarySpec. It's not consistent, and the user is 
> confused by the switch from "rows between" to "range between" when adding the 
> "order by c" clause.
> From the comment 
> {noformat}
>   /*
>* - A Window Frame that has only the /start/boundary, then it is 
> interpreted as:
>  BETWEEN <start boundary> AND CURRENT ROW
>* - A Window Specification with an Order Specification and no Window
>*   Frame is interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
>* - A Window Specification with no Order and no Window Frame is 
> interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
>*/
> {noformat}
> We intended to set as "row between" (RangeBoundarySpec), not "range between" 
> (ValueBoundarySpec). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15498) sum() over (order by c) should default the windowing spec to RangeBoundarySpec

2016-12-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15498:

Description: 
Currently {{sum() over (partition by a)}} without order by is defaulted 
windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
c)}} is defaulted to ValueBoundarySpec.

From the comment 
{noformat}
  /*
   * - A Window Frame that has only the /start/boundary, then it is interpreted 
as:
 BETWEEN <start boundary> AND CURRENT ROW
   * - A Window Specification with an Order Specification and no Window
   *   Frame is interpreted as:
 ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
   * - A Window Specification with no Order and no Window Frame is interpreted 
as:
 ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
   */
{noformat}
We intended to set as "row between" (RangeBoundarySpec), not "range between" 
(ValueBoundarySpec). 


  was:
Currently {{sum() over (partition by a)}} without order by is defaulted 
windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
c)}} is defaulted to ValueBoundarySpec.

From the comment 
{noformat}
  /*
   * - A Window Frame that has only the /start/boundary, then it is interpreted 
as:
 BETWEEN <start boundary> AND CURRENT ROW
   * - A Window Specification with an Order Specification and no Window
   *   Frame is interpreted as:
 ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
   * - A Window Specification with no Order and no Window Frame is interpreted 
as:
 ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
   */
{noformat}
We intended to set as "row between", not "range between". 



> sum() over (order by c) should default the windowing spec to RangeBoundarySpec
> --
>
> Key: HIVE-15498
> URL: https://issues.apache.org/jira/browse/HIVE-15498
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15498.1.patch
>
>
> Currently {{sum() over (partition by a)}} without order by is defaulted 
> windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
> c)}} is defaulted to ValueBoundarySpec.
> From the comment 
> {noformat}
>   /*
>* - A Window Frame that has only the /start/boundary, then it is 
> interpreted as:
>  BETWEEN <start boundary> AND CURRENT ROW
>* - A Window Specification with an Order Specification and no Window
>*   Frame is interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
>* - A Window Specification with no Order and no Window Frame is 
> interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
>*/
> {noformat}
> We intended to set as "row between" (RangeBoundarySpec), not "range between" 
> (ValueBoundarySpec). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15498) sum() over (order by c) should default the windowing spec to RangeBoundarySpec

2016-12-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15498:

Description: 
Currently {{sum() over (partition by a)}} without order by is defaulted 
windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
c)}} is defaulted to ValueBoundarySpec.

From the comment 
{noformat}
  /*
   * - A Window Frame that has only the /start/boundary, then it is interpreted 
as:
 BETWEEN <start boundary> AND CURRENT ROW
   * - A Window Specification with an Order Specification and no Window
   *   Frame is interpreted as:
 ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
   * - A Window Specification with no Order and no Window Frame is interpreted 
as:
 ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
   */
{noformat}
We were trying to set as "row between", not "range between". 


  was:
Currently {{sum() over (partition by a)}} without order by will default 
windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
c)}} will default to ValueBoundarySpec.

From the comment 
{noformat}
  /*
   * - A Window Frame that has only the /start/boundary, then it is interpreted 
as:
 BETWEEN <start boundary> AND CURRENT ROW
   * - A Window Specification with an Order Specification and no Window
   *   Frame is interpreted as:
 ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
   * - A Window Specification with no Order and no Window Frame is interpreted 
as:
 ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
   */
{noformat}
We were trying to set as "row between", not "range between". 



> sum() over (order by c) should default the windowing spec to RangeBoundarySpec
> --
>
> Key: HIVE-15498
> URL: https://issues.apache.org/jira/browse/HIVE-15498
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15498.1.patch
>
>
> Currently {{sum() over (partition by a)}} without order by is defaulted 
> windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
> c)}} is defaulted to ValueBoundarySpec.
> From the comment 
> {noformat}
>   /*
>* - A Window Frame that has only the /start/boundary, then it is 
> interpreted as:
>  BETWEEN <start boundary> AND CURRENT ROW
>* - A Window Specification with an Order Specification and no Window
>*   Frame is interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
>* - A Window Specification with no Order and no Window Frame is 
> interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
>*/
> {noformat}
> We were trying to set as "row between", not "range between". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15498) sum() over (order by c) should default the windowing spec to RangeBoundarySpec

2016-12-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15498:

Description: 
Currently {{sum() over (partition by a)}} without order by is defaulted 
windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
c)}} is defaulted to ValueBoundarySpec.

From the comment 
{noformat}
  /*
   * - A Window Frame that has only the /start/boundary, then it is interpreted 
as:
 BETWEEN <start boundary> AND CURRENT ROW
   * - A Window Specification with an Order Specification and no Window
   *   Frame is interpreted as:
 ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
   * - A Window Specification with no Order and no Window Frame is interpreted 
as:
 ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
   */
{noformat}
We intended to set as "row between", not "range between". 


  was:
Currently {{sum() over (partition by a)}} without order by is defaulted 
windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
c)}} is defaulted to ValueBoundarySpec.

From the comment 
{noformat}
  /*
   * - A Window Frame that has only the /start/boundary, then it is interpreted 
as:
 BETWEEN <start boundary> AND CURRENT ROW
   * - A Window Specification with an Order Specification and no Window
   *   Frame is interpreted as:
 ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
   * - A Window Specification with no Order and no Window Frame is interpreted 
as:
 ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
   */
{noformat}
We were trying to set as "row between", not "range between". 



> sum() over (order by c) should default the windowing spec to RangeBoundarySpec
> --
>
> Key: HIVE-15498
> URL: https://issues.apache.org/jira/browse/HIVE-15498
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15498.1.patch
>
>
> Currently {{sum() over (partition by a)}} without order by is defaulted 
> windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
> c)}} is defaulted to ValueBoundarySpec.
> From the comment 
> {noformat}
>   /*
>* - A Window Frame that has only the /start/boundary, then it is 
> interpreted as:
>  BETWEEN <start boundary> AND CURRENT ROW
>* - A Window Specification with an Order Specification and no Window
>*   Frame is interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
>* - A Window Specification with no Order and no Window Frame is 
> interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
>*/
> {noformat}
> We intended to set as "row between", not "range between". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15498) sum() over (order by c) should default the windowing spec to RangeBoundarySpec

2016-12-22 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770642#comment-15770642
 ] 

Aihua Xu commented on HIVE-15498:
-

The only change in the code is:

{noformat}
wFrame = new WindowFrameSpec(
    new ValueBoundarySpec(Direction.PRECEDING, BoundarySpec.UNBOUNDED_AMOUNT),
    new CurrentRowSpec());
{noformat}

to 
{noformat}
wFrame = new WindowFrameSpec(
    new RangeBoundarySpec(Direction.PRECEDING, BoundarySpec.UNBOUNDED_AMOUNT),
    new CurrentRowSpec());
{noformat}

RangeBoundarySpec is for "ROWS BETWEEN" and ValueBoundarySpec is for "RANGE 
BETWEEN". So it's a little confusing.

> sum() over (order by c) should default the windowing spec to RangeBoundarySpec
> --
>
> Key: HIVE-15498
> URL: https://issues.apache.org/jira/browse/HIVE-15498
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15498.1.patch
>
>
> Currently {{sum() over (partition by a)}} without order by will default 
> windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
> c)}} will default to ValueBoundarySpec.
> From the comment 
> {noformat}
>   /*
>* - A Window Frame that has only the /start/boundary, then it is 
> interpreted as:
>  BETWEEN <start boundary> AND CURRENT ROW
>* - A Window Specification with an Order Specification and no Window
>*   Frame is interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
>* - A Window Specification with no Order and no Window Frame is 
> interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
>*/
> {noformat}
> We were trying to set as "row between", not "range between". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15498) sum() over (order by c) should default the windowing spec to RangeBoundarySpec

2016-12-22 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770637#comment-15770637
 ] 

Aihua Xu commented on HIVE-15498:
-

RangeBoundarySpec should be what the user expects, and with it streaming 
is supported. With ValueBoundarySpec, streaming is not supported, so 
performance suffers.

> sum() over (order by c) should default the windowing spec to RangeBoundarySpec
> --
>
> Key: HIVE-15498
> URL: https://issues.apache.org/jira/browse/HIVE-15498
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15498.1.patch
>
>
> Currently {{sum() over (partition by a)}} without order by will default 
> windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
> c)}} will default to ValueBoundarySpec.
> From the comment 
> {noformat}
>   /*
>* - A Window Frame that has only the /start/boundary, then it is 
> interpreted as:
>  BETWEEN <start boundary> AND CURRENT ROW
>* - A Window Specification with an Order Specification and no Window
>*   Frame is interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
>* - A Window Specification with no Order and no Window Frame is 
> interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
>*/
> {noformat}
> We were trying to set as "row between", not "range between". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15498) sum() over (order by c) should default the windowing spec to RangeBoundarySpec

2016-12-22 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770625#comment-15770625
 ] 

Aihua Xu commented on HIVE-15498:
-

Patch 1: change the default to RangeBoundarySpec for the {{sum() over (order 
by c)}} case, to be consistent.

> sum() over (order by c) should default the windowing spec to RangeBoundarySpec
> --
>
> Key: HIVE-15498
> URL: https://issues.apache.org/jira/browse/HIVE-15498
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15498.1.patch
>
>
> Currently {{sum() over (partition by a)}} without order by will default 
> windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
> c)}} will default to ValueBoundarySpec.
> From the comment 
> {noformat}
>   /*
>* - A Window Frame that has only the /start/boundary, then it is 
> interpreted as:
>  BETWEEN <start boundary> AND CURRENT ROW
>* - A Window Specification with an Order Specification and no Window
>*   Frame is interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
>* - A Window Specification with no Order and no Window Frame is 
> interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
>*/
> {noformat}
> We were trying to set as "row between", not "range between". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15498) sum() over (order by c) should default the windowing spec to RangeBoundarySpec

2016-12-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15498:

Status: Patch Available  (was: Open)

> sum() over (order by c) should default the windowing spec to RangeBoundarySpec
> --
>
> Key: HIVE-15498
> URL: https://issues.apache.org/jira/browse/HIVE-15498
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15498.1.patch
>
>
> Currently {{sum() over (partition by a)}} without order by will default 
> windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
> c)}} will default to ValueBoundarySpec.
> From the comment 
> {noformat}
>   /*
>* - A Window Frame that has only the /start/boundary, then it is 
> interpreted as:
>  BETWEEN <start boundary> AND CURRENT ROW
>* - A Window Specification with an Order Specification and no Window
>*   Frame is interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
>* - A Window Specification with no Order and no Window Frame is 
> interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
>*/
> {noformat}
> We were trying to set as "row between", not "range between". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15498) sum() over (order by c) should default the windowing spec to RangeBoundarySpec

2016-12-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15498:

Attachment: HIVE-15498.1.patch

> sum() over (order by c) should default the windowing spec to RangeBoundarySpec
> --
>
> Key: HIVE-15498
> URL: https://issues.apache.org/jira/browse/HIVE-15498
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-15498.1.patch
>
>
> Currently {{sum() over (partition by a)}} without order by will default 
> windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
> c)}} will default to ValueBoundarySpec.
> From the comment 
> {noformat}
>   /*
>* - A Window Frame that has only the /start/boundary, then it is 
> interpreted as:
>  BETWEEN <start boundary> AND CURRENT ROW
>* - A Window Specification with an Order Specification and no Window
>*   Frame is interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
>* - A Window Specification with no Order and no Window Frame is 
> interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
>*/
> {noformat}
> We were trying to set as "row between", not "range between". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15498) sum() over (order by c) should default the windowing spec to RangeBoundarySpec

2016-12-22 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-15498:

Description: 
Currently {{sum() over (partition by a)}} without order by will default 
windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
c)}} will default to ValueBoundarySpec.

From the comment 
{noformat}
  /*
   * - A Window Frame that has only the /start/boundary, then it is interpreted 
as:
 BETWEEN <start boundary> AND CURRENT ROW
   * - A Window Specification with an Order Specification and no Window
   *   Frame is interpreted as:
 ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
   * - A Window Specification with no Order and no Window Frame is interpreted 
as:
 ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
   */
{noformat}
We were trying to set as "row between", not "range between". 


  was:
Currently {{sum() over (partition by a)}} without order by will default 
windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
c)}} will default to ValueBoundarySpec.

From the comment 
{noformat}
  /*
   * - A Window Frame that has only the /start/boundary, then it is interpreted 
as:
 BETWEEN <start boundary> AND CURRENT ROW
   * - A Window Specification with an Order Specification and no Window
   *   Frame is interpreted as:
 ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
   * - A Window Specification with no Order and no Window Frame is interpreted 
as:
 ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
   */
{noformat}
We were trying to set as "row between". 



> sum() over (order by c) should default the windowing spec to RangeBoundarySpec
> --
>
> Key: HIVE-15498
> URL: https://issues.apache.org/jira/browse/HIVE-15498
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> Currently {{sum() over (partition by a)}} without order by will default 
> windowing to RangeBoundarySpec while  {{sum() over (partition by a order by 
> c)}} will default to ValueBoundarySpec.
> From the comment 
> {noformat}
>   /*
>* - A Window Frame that has only the /start/boundary, then it is 
> interpreted as:
>  BETWEEN <start boundary> AND CURRENT ROW
>* - A Window Specification with an Order Specification and no Window
>*   Frame is interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
>* - A Window Specification with no Order and no Window Frame is 
> interpreted as:
>  ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
>*/
> {noformat}
> We were trying to set as "row between", not "range between". 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15112) Implement Parquet vectorization reader for Struct type

2016-12-22 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770555#comment-15770555
 ] 

Chao Sun commented on HIVE-15112:
-

[~Ferd] what about {{VectorizedDictionaryEncodingColumnReaderTest}} and 
{{VectorizedColumnReaderTest}}? Do they still need to be named 
{{TestVectorizedDictionaryEncodingColumnReader}} and 
{{TestVectorizedColumnReader}}? Otherwise the tests will not be triggered for 
these two, I think, since the test runner only picks up classes named Test*.

> Implement Parquet vectorization reader for Struct type
> --
>
> Key: HIVE-15112
> URL: https://issues.apache.org/jira/browse/HIVE-15112
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Fix For: 2.2.0
>
> Attachments: HIVE-15112.1.patch, HIVE-15112.2.patch, 
> HIVE-15112.3.patch, HIVE-15112.patch
>
>
> Like HIVE-14815, we need support Parquet vectorized reader for struct type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-22 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770530#comment-15770530
 ] 

Eugene Koifman commented on HIVE-15376:
---

+1 patch 14

Why did the output in dbtxnmgr_showlocks.q.out change?  I can't tell what the 
diff is

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.10.patch, 
> HIVE-15376.11.patch, HIVE-15376.12.patch, HIVE-15376.13.patch, 
> HIVE-15376.14.patch, HIVE-15376.2.patch, HIVE-15376.3.patch, 
> HIVE-15376.4.patch, HIVE-15376.5.patch, HIVE-15376.6.patch, 
> HIVE-15376.7.patch, HIVE-15376.8.patch, HIVE-15376.9.patch
>
>
> HIVE-12366 improved the heartbeater logic by bringing down the gap between 
> the lock acquisition and first heartbeat, but that's not enough, there may 
> still be some issue, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12971) Hive Support for Kudu

2016-12-22 Thread bimal tandel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770351#comment-15770351
 ] 

bimal tandel commented on HIVE-12971:
-

My plan is to complete the storage handler for the latest version of Kudu next 
week.

> Hive Support for Kudu
> -
>
> Key: HIVE-12971
> URL: https://issues.apache.org/jira/browse/HIVE-12971
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Lenni Kuff
>Assignee: bimal tandel
>
> JIRA for tracking work related to Hive/Kudu integration.
> It would be useful to allow Kudu data to be accessible via Hive. This would 
> involve creating a Kudu SerDe/StorageHandler and implementing support for 
> QUERY and DML commands like SELECT, INSERT, UPDATE, and DELETE. Kudu 
> Input/OutputFormats classes already exist. The work can be staged to support 
> this functionality incrementally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15497) Unthrown SerDeException in ThriftJDBCBinarySerDe.java

2016-12-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770237#comment-15770237
 ] 

ASF GitHub Bot commented on HIVE-15497:
---

GitHub user lifove opened a pull request:

https://github.com/apache/hive/pull/126

HIVE-15497: Fix an unthrown SerDeException

https://issues.apache.org/jira/browse/HIVE-15497

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lifove/hive HIVE-15497

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/126.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #126


commit 76cd811bd9372d45a75103344aa932f00338e10b
Author: lifove 
Date:   2016-12-22T14:53:08Z

HIVE-15497: Fix an unthrown SerDeException




> Unthrown SerDeException in ThriftJDBCBinarySerDe.java
> -
>
> Key: HIVE-15497
> URL: https://issues.apache.org/jira/browse/HIVE-15497
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Jaechang Nam
>Priority: Trivial
> Attachments: HIVE-15497.txt
>
>
> There is an unthrown SerDeException in 
> serde/src/java/org/apache/hadoop/hive/serde2/thrift/ThriftJDBCBinarySerDe.java
>  (found in the current GitHub snapshot, 
> 4ba713ccd85c3706d195aeef9476e6e6363f1c21)
> {code}
> initializeRowAndColumns();
> try {
>   thriftFormatter.initialize(conf, tbl);
> } catch (Exception e) {
>   new SerDeException(e);
> }
> {code}
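
The fix is presumably a one-liner (not verified against the PR contents): actually throw the constructed exception instead of discarding it:

{code}
try {
  thriftFormatter.initialize(conf, tbl);
} catch (Exception e) {
  // Previously the exception was built but never thrown, silently
  // swallowing initialization failures.
  throw new SerDeException(e);
}
{code}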



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15497) Unthrown SerDeException in ThriftJDBCBinarySerDe.java

2016-12-22 Thread Jaechang Nam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaechang Nam updated HIVE-15497:

Attachment: HIVE-15497.txt

> Unthrown SerDeException in ThriftJDBCBinarySerDe.java
> -
>
> Key: HIVE-15497
> URL: https://issues.apache.org/jira/browse/HIVE-15497
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Jaechang Nam
>Priority: Trivial
> Attachments: HIVE-15497.txt
>
>
> There is an unthrown SerDeException in 
> serde/src/java/org/apache/hadoop/hive/serde2/thrift/ThriftJDBCBinarySerDe.java
>  (found in the current GitHub snapshot, 
> 4ba713ccd85c3706d195aeef9476e6e6363f1c21)
> {code}
>  91 initializeRowAndColumns();
>  92 try {
>  93   thriftFormatter.initialize(conf, tbl);
>  94 } catch (Exception e) {
>  95   new SerDeException(e);
>  96 }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15324) Enable round() function to accept scale argument as non-constants

2016-12-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770057#comment-15770057
 ] 

Hive QA commented on HIVE-15324:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844382/HIVE-15324.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10895 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_round] (batchId=70)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=134)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2696/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2696/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2696/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844382 - PreCommit-HIVE-Build

> Enable round() function to accept scale argument as non-constants
> -
>
> Key: HIVE-15324
> URL: https://issues.apache.org/jira/browse/HIVE-15324
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-15324.1.patch, HIVE-15324.patch
>
>
> The round() function should accept the scale argument as a non-constant; this 
> will enable queries like: 
> {quote}
> create table sampletable(c double, d int);
> select round(c,d) from sampletable;
> {quote}
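
Semantically this is per-row rounding: the scale comes out of a column instead of being folded as a constant at compile time. A sketch of the expected behavior using java.math.BigDecimal (not the actual UDF code; Hive's round() rounds half-up):

{code}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class RoundSketch {
  // round(c, d) where the scale d is an ordinary column value, evaluated per row.
  static BigDecimal round(double c, int d) {
    return BigDecimal.valueOf(c).setScale(d, RoundingMode.HALF_UP);
  }

  public static void main(String[] args) {
    System.out.println(round(3.14159, 2));  // 3.14
    System.out.println(round(3.14159, 4));  // 3.1416
  }
}
{code}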



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15325) Scale is greater than decimal values trunc(d,s) returns wrong results

2016-12-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15769971#comment-15769971
 ] 

Hive QA commented on HIVE-15325:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844372/HIVE-15325.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10880 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=102)

[vector_decimal_aggregate.q,ppd_join3.q,auto_join23.q,join10.q,union_remove_11.q,union_ppr.q,union_remove_19.q,join32.q,groupby_multi_single_reducer2.q,input18.q,stats3.q,parquet_join.q,join26.q,groupby1.q,join_reorder2.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] 
(batchId=71)
org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener.org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener
 (batchId=209)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2695/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2695/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2695/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844372 - PreCommit-HIVE-Build

> Scale is greater than decimal values trunc(d,s) returns wrong results
> -
>
> Key: HIVE-15325
> URL: https://issues.apache.org/jira/browse/HIVE-15325
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-15325.patch
>
>
> When the scale is greater than the number of decimal digits, trunc(d,s) returns wrong results.
> {quote}
> select trunc(1234567891.1234567891,15), trunc(1234567891.1234567891,25), 
> trunc(1234567891.1234567891,20), trunc(1234567891.1234567891,50) FROM src 
> tablesample (1 rows);
> {quote}
> Add tests with negative numbers as well as the no-op case (e.g. select trunc(12.34, 
> 100)).
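
The no-op case is the crux: when the requested scale is at or beyond the value's actual scale, the value should come back unchanged. A sketch of the expected semantics using java.math.BigDecimal (not the UDF implementation itself):

{code}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class TruncSketch {
  // trunc(d, s): drop fractional digits beyond scale s; never pad or round up.
  static BigDecimal trunc(BigDecimal d, int s) {
    if (s >= d.scale()) {
      return d;                                // no-op, e.g. trunc(12.34, 100)
    }
    return d.setScale(s, RoundingMode.DOWN);   // DOWN truncates toward zero
  }

  public static void main(String[] args) {
    BigDecimal v = new BigDecimal("1234567891.1234567891");
    System.out.println(trunc(v, 5));                          // 1234567891.12345
    System.out.println(trunc(v, 15));                         // unchanged
    System.out.println(trunc(new BigDecimal("-12.345"), 1));  // -12.3
  }
}
{code}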



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15416) CAST to string does not work for large decimal numbers

2016-12-22 Thread Pavel Benes (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15769952#comment-15769952
 ] 

Pavel Benes commented on HIVE-15416:


Thanks for your reply; I can confirm that the approach you describe works. 
However, I have two comments on it:
 - It looks like a workaround rather than a proper solution. The user may not 
be aware of those UDF/GenericUDF subtleties, and even if he is, he still needs 
to treat DECIMAL columns differently from other types, which can be cast to 
string without problems.
 - Is a VARCHAR size of 38 enough to hold even the largest decimal values, 
possibly including a minus sign and a decimal separator?
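
On the second point: Hive's maximum DECIMAL precision is 38 digits, so the worst-case plain-string form needs up to 41 characters once a minus sign, a decimal point, and a leading zero are counted. A quick check:

{code}
import java.math.BigDecimal;

public class DecimalWidthCheck {
  public static void main(String[] args) {
    // 38 significant digits, all fractional: the "-0." prefix adds 3 characters.
    BigDecimal worst =
        new BigDecimal("-0.99999999999999999999999999999999999999");
    System.out.println(worst.toPlainString().length());  // 41
  }
}
{code}

So VARCHAR(38) can truncate the extreme cases; VARCHAR(41) would be the safe bound.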


> CAST to string does not work for large decimal numbers
> --
>
> Key: HIVE-15416
> URL: https://issues.apache.org/jira/browse/HIVE-15416
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Pavel Benes
>Assignee: Daniel Dai
>
> The cast of large decimal values to string does not work and produces NULL 
> values. 
> Steps to reproduce:
> {code}
> hive> create table test_hive_bug30(decimal_col DECIMAL(30,0));
> OK
> {code}
> {code}
> hive> insert into test_hive_bug30 VALUES (123), 
> (99999999999999999999999999999), 
> (999999999999999999999999999999), (9999999999999999999999999999999);
> Query ID = benesp_20161212135717_5d16d7f4-7b84-409e-ad00-36085deaae54
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1480833176011_2469)
> ----------------------------------------------------------------------------
> VERTICES   STATUS      TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------
> Map 1 ...  SUCCEEDED       1          1        0        0       0       0
> ----------------------------------------------------------------------------
> VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 7.69 s
> ----------------------------------------------------------------------------
> Loading data to table default.test_hive_bug30
> Table default.test_hive_bug30 stats: [numFiles=1, numRows=4, totalSize=68, 
> rawDataSize=64]
> OK
> Time taken: 8.239 seconds
> {code}
> {code}
> hive> select CAST(decimal_col AS STRING) from test_hive_bug30;
> OK
> 123
> NULL
> NULL
> NULL
> Time taken: 0.043 seconds, Fetched: 4 row(s)
> {code}
> The numbers with 29 and 30 digits should be exported, but they are converted 
> to NULL instead. 
> The values are stored correctly as can be seen here:
> {code}
> hive> select * from test_hive_bug30;
> OK
> 123
> 99999999999999999999999999999
> 999999999999999999999999999999
> NULL
> Time taken: 0.447 seconds, Fetched: 4 row(s)
> {code}
> The same issue does not exist for smaller numbers (e.g. DECIMAL(10)).
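
For what it's worth, java.math.BigDecimal itself renders these magnitudes without loss, which suggests the NULLs come from Hive's cast path rather than from the numeric representation. A quick check:

{code}
import java.math.BigDecimal;

public class LargeDecimalToString {
  public static void main(String[] args) {
    BigDecimal d29 = new BigDecimal("99999999999999999999999999999");   // 29 digits
    BigDecimal d30 = new BigDecimal("999999999999999999999999999999");  // 30 digits
    System.out.println(d29.toPlainString());  // prints all 29 digits
    System.out.println(d30.toPlainString());  // prints all 30 digits
  }
}
{code}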



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

