[jira] [Comment Edited] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-16 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129923#comment-16129923
 ] 

liyunzhang_intel edited comment on HIVE-17321 at 8/17/17 5:06 AM:
--

[~lirui]: patch looks good. Are the changes to *q.out all about ORC?


was (Author: kellyzly):
[~lirui]: patch looks good, but I have one question: why do the statistics in 
limitpushdown.q change? 
Before:
{code}
if ((OrcInputFormat.class.isAssignableFrom(inputFormat) ||
    MapredParquetInputFormat.class.isAssignableFrom(inputFormat)) &&
    (noScan || partialScan)) {
{code}

Now:
{code}
if (OrcInputFormat.class.isAssignableFrom(inputFormat) ||
    MapredParquetInputFormat.class.isAssignableFrom(inputFormat)) {
{code}
If the InputFormat is TextFile, I think your patch will not change the result. 
If my understanding is wrong, please tell me.

> HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan 
> is not specified
> -
>
> Key: HIVE-17321
> URL: https://issues.apache.org/jira/browse/HIVE-17321
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
> Attachments: HIVE-17321.1.patch, HIVE-17321.2.patch
>
>
> Need to implement HIVE-9560 for Spark.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-16 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129923#comment-16129923
 ] 

liyunzhang_intel edited comment on HIVE-17321 at 8/17/17 5:06 AM:
--

[~lirui]: patch looks good. Are the changes to *q.out all about ORC?


was (Author: kellyzly):
[~lirui]: patch looks good. the changes about *q.out is all about orc?

> HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan 
> is not specified
> -
>
> Key: HIVE-17321
> URL: https://issues.apache.org/jira/browse/HIVE-17321
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
> Attachments: HIVE-17321.1.patch, HIVE-17321.2.patch
>
>
> Need to implement HIVE-9560 for Spark.





[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-16 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129923#comment-16129923
 ] 

liyunzhang_intel commented on HIVE-17321:
-

[~lirui]: patch looks good, but I have one question: why do the statistics in 
limitpushdown.q change? 
Before:
{code}
if ((OrcInputFormat.class.isAssignableFrom(inputFormat) ||
    MapredParquetInputFormat.class.isAssignableFrom(inputFormat)) &&
    (noScan || partialScan)) {
{code}

Now:
{code}
if (OrcInputFormat.class.isAssignableFrom(inputFormat) ||
    MapredParquetInputFormat.class.isAssignableFrom(inputFormat)) {
{code}
If the InputFormat is TextFile, I think your patch will not change the result. 
If my understanding is wrong, please tell me.
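
To make the question concrete, the two guards quoted above can be compared side by side. This is only a minimal sketch, not the patch itself: {{OrcLike}}, {{ParquetLike}} and {{TextLike}} are hypothetical stand-ins for the real input format classes, and {{guardBefore}}/{{guardAfter}} are illustrative names.

```java
// Hypothetical stand-ins for OrcInputFormat, MapredParquetInputFormat
// and a plain text input format; none of these are real Hive classes.
class OrcLike {}
class ParquetLike {}
class TextLike {}

final class StatsGuardSketch {
    // Guard as quoted under "Before": the format check is combined
    // with the noscan/partialscan flags.
    static boolean guardBefore(Class<?> inputFormat, boolean noScan, boolean partialScan) {
        return (OrcLike.class.isAssignableFrom(inputFormat)
                || ParquetLike.class.isAssignableFrom(inputFormat))
                && (noScan || partialScan);
    }

    // Guard as quoted under "Now": only the input format matters.
    static boolean guardAfter(Class<?> inputFormat) {
        return OrcLike.class.isAssignableFrom(inputFormat)
                || ParquetLike.class.isAssignableFrom(inputFormat);
    }
}
```

Under these assumptions a text format fails both guards regardless of the flags, while an ORC or Parquet format analyzed without noscan/partialscan fails the old guard and passes the new one.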

> HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan 
> is not specified
> -
>
> Key: HIVE-17321
> URL: https://issues.apache.org/jira/browse/HIVE-17321
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
> Attachments: HIVE-17321.1.patch, HIVE-17321.2.patch
>
>
> Need to implement HIVE-9560 for Spark.





[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129911#comment-16129911
 ] 

Xuefu Zhang commented on HIVE-17321:


+1 patch looks good to me. [~kellyzly], please let us know if you have more 
questions/comments.

> HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan 
> is not specified
> -
>
> Key: HIVE-17321
> URL: https://issues.apache.org/jira/browse/HIVE-17321
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
> Attachments: HIVE-17321.1.patch, HIVE-17321.2.patch
>
>
> Need to implement HIVE-9560 for Spark.





[jira] [Commented] (HIVE-15104) Hive on Spark generate more shuffle data than hive on mr

2017-08-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129895#comment-16129895
 ] 

Xuefu Zhang commented on HIVE-15104:


Hi [~lirui], thanks for continuing the work here. The improvement is impressive 
and not much perf degradation is observed. Let me dig out my old benchmarks 
and see whether those patches help.

> Hive on Spark generate more shuffle data than hive on mr
> 
>
> Key: HIVE-15104
> URL: https://issues.apache.org/jira/browse/HIVE-15104
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.2.1
>Reporter: wangwenli
>Assignee: Rui Li
> Attachments: HIVE-15104.1.patch, HIVE-15104.2.patch, 
> HIVE-15104.3.patch, HIVE-15104.4.patch, TPC-H 100G.xlsx
>
>
> The same SQL, run on the Spark and MR engines, generates different sizes of 
> shuffle data.
> I think this is because Hive on MR serializes only part of the HiveKey, while 
> Hive on Spark, which uses Kryo, serializes the full HiveKey object.  
> What is your opinion?





[jira] [Commented] (HIVE-17343) create a mechanism to get rid of some globals in HS2

2017-08-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129890#comment-16129890
 ] 

Hive QA commented on HIVE-17343:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882262/HIVE-17343.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6431/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6431/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6431/

Messages:
{noformat}
 This message was trimmed, see log for full details 
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MGlobalPrivilege
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDBPrivilege
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTablePrivilege
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MPartitionPrivilege
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MTableColumnPrivilege
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MPartitionColumnPrivilege
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MPartitionEvent
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MMasterKey
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDelegationToken
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MTableColumnStatistics
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MVersionTable
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MMetastoreDBProperties
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MResourceUri
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFunction
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MNotificationLog
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MNotificationNextId
DataNucleus Enhancer completed with success for 31 classes. Timings : input=281 
ms, enhance=314 ms, total=595 ms. Consult the log for full details
ANTLR Parser Generator  Version 3.5.2
Output file 
/data/hiveptest/working/apache-github-source-source/metastore/target/generated-sources/antlr3/org/apache/hadoop/hive/metastore/parser/FilterParser.java
 does not exist: must build 
/data/hiveptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g
org/apache/hadoop/hive/metastore/parser/Filter.g
ANTLR Parser Generator  Version 3.5.2
Output file 
/data/hiveptest/working/apache-github-source-source/ql/target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveLexer.java
 does not exist: must build 
/data/hiveptest/working/apache-github-source-source/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g
org/apache/hadoop/hive/ql/parse/HiveLexer.g
Output file 
/data/hiveptest/working/apache-github-source-source/ql/target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParser.java
 does not exist: must build 
/data/hiveptest/working/apache-github-source-source/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
org/apache/hadoop/hive/ql/parse/HiveParser.g
Output file 
/data/hiveptest/working/apache-github-source-source/ql/target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HintParser.java
 does not exist: must build 
/data/hiveptest/working/apache-github-source-source/ql/src/java/org/apache/hadoop/hive/ql/parse/HintParser.g
org/apache/hadoop/hive/ql/parse/HintParser.g
Generating vector expression code
Generating vector expression test code
ERROR StatusLogger No log4j2 configuration file found. Using default 
configuration: logging only errors to the console.
[ERROR] COMPILATION ERROR : 
[ERROR] 
/data/hiveptest/working/apache-github-source-source/hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatLoaderEncryption.java:[278,9]
 method getForHiveCommandInternal in class 
org.apache.hadoop.hive.ql.processors.CommandProcessorFactory cannot be applied 
to given types;
  required: 
java.lang.String[],org.apache.hadoop.hive.conf.HiveConf,org.apache.hadoop.hive.ql.session.HiveServerEnvironment,boolean
  found: java.lang.String[],org.apache.hadoop.hive.conf.HiveConf,boolean
  reason: actual and formal argument lists differ in length
[ERROR] ResourceManager : unable to find resource 'VM_global_library.vm' in any 
resource loader.
Loading source files for package org.apache.hive.hcatalog.templeton...
[parsing started 
RegularFileObject[/data/hiveptest/working/apache-github-source-source/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/CompleteDelegator.java]]
[parsing completed 35ms]
[parsing started 

[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-16 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129891#comment-16129891
 ] 

Rui Li commented on HIVE-17321:
---

The latest failures are unrelated. The changes to the golden files are all 
statistics changes, which is expected.
[~kellyzly], [~xuefuz] could you take a look? Thanks.

> HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan 
> is not specified
> -
>
> Key: HIVE-17321
> URL: https://issues.apache.org/jira/browse/HIVE-17321
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
> Attachments: HIVE-17321.1.patch, HIVE-17321.2.patch
>
>
> Need to implement HIVE-9560 for Spark.





[jira] [Assigned] (HIVE-17343) create a mechanism to get rid of some globals in HS2

2017-08-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-17343:
---

Assignee: Sergey Shelukhin

> create a mechanism to get rid of some globals in HS2
> 
>
> Key: HIVE-17343
> URL: https://issues.apache.org/jira/browse/HIVE-17343
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17343.patch
>
>
> The intent is to initialize things once in HS2 ctor/init, and then be able to 
> access them from queries, etc. without using globals or threadlocals.
> Things like future workload management work, LLAP coordinator, materialized 
> view registry, etc. could be accessed this way.
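
As a rough illustration of the idea in the description, a server-scoped object can be built once at start-up and passed explicitly to whatever needs it. This is a minimal sketch under stated assumptions, not the contents of HIVE-17343.patch: {{ServerEnvironmentSketch}}, {{ViewRegistrySketch}} and {{QuerySketch}} are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shared service, standing in for something like a view registry.
final class ViewRegistrySketch {
    final String name;

    ViewRegistrySketch(String name) {
        this.name = name;
    }
}

// Hypothetical holder created once during server init and handed to consumers.
final class ServerEnvironmentSketch {
    private final Map<Class<?>, Object> services = new ConcurrentHashMap<>();

    // Register a shared service instance under its type.
    <T> void register(Class<T> type, T instance) {
        services.put(type, instance);
    }

    // Look up a service by type; returns null if nothing was registered.
    <T> T get(Class<T> type) {
        return type.cast(services.get(type));
    }
}

// Hypothetical query-side consumer: it receives the environment through its
// constructor instead of reading a static field or a ThreadLocal.
final class QuerySketch {
    private final ServerEnvironmentSketch env;

    QuerySketch(ServerEnvironmentSketch env) {
        this.env = env;
    }

    String describeRegistry() {
        ViewRegistrySketch registry = env.get(ViewRegistrySketch.class);
        return registry == null ? "no registry" : registry.name;
    }
}
```

Passing the holder through constructors keeps the dependency visible at each call site, which is the property that globals and threadlocals lack.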





[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129875#comment-16129875
 ] 

Hive QA commented on HIVE-17321:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882252/HIVE-17321.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10977 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6430/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6430/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6430/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882252 - PreCommit-HIVE-Build

> HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan 
> is not specified
> -
>
> Key: HIVE-17321
> URL: https://issues.apache.org/jira/browse/HIVE-17321
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
> Attachments: HIVE-17321.1.patch, HIVE-17321.2.patch
>
>
> Need to implement HIVE-9560 for Spark.





[jira] [Updated] (HIVE-17343) create a mechanism to get rid of some globals in HS2

2017-08-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17343:

Description: 
The intent is to initialize things once in HS2 ctor/init, and then be able to 
access them from queries, etc. without using globals or threadlocals.
Things like future workload management work, LLAP coordinator, materialized 
view registry, etc. could be accessed this way.

  was:The intent is to initialize things once in HS2 ctor/init, and then be 
able to access them from queries, etc. without using globals or threadlocals.


> create a mechanism to get rid of some globals in HS2
> 
>
> Key: HIVE-17343
> URL: https://issues.apache.org/jira/browse/HIVE-17343
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: HIVE-17343.patch
>
>
> The intent is to initialize things once in HS2 ctor/init, and then be able to 
> access them from queries, etc. without using globals or threadlocals.
> Things like future workload management work, LLAP coordinator, materialized 
> view registry, etc. could be accessed this way.





[jira] [Updated] (HIVE-17343) create a mechanism to get rid of some globals in HS2

2017-08-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17343:

Status: Patch Available  (was: Open)

> create a mechanism to get rid of some globals in HS2
> 
>
> Key: HIVE-17343
> URL: https://issues.apache.org/jira/browse/HIVE-17343
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: HIVE-17343.patch
>
>
> The intent is to initialize things once in HS2 ctor/init, and then be able to 
> access them from queries, etc. without using globals or threadlocals.





[jira] [Updated] (HIVE-17343) create a mechanism to get rid of some globals in HS2

2017-08-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17343:

Attachment: HIVE-17343.patch

[~thejas] does this make sense? I might add this to a couple more places, esp. 
the repetitive code in all the tasks that should do the same thing as TezTask. 


> create a mechanism to get rid of some globals in HS2
> 
>
> Key: HIVE-17343
> URL: https://issues.apache.org/jira/browse/HIVE-17343
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: HIVE-17343.patch
>
>
> The intent is to initialize things once in HS2 ctor/init, and then be able to 
> access them from queries, etc. without using globals or threadlocals.





[jira] [Commented] (HIVE-15104) Hive on Spark generate more shuffle data than hive on mr

2017-08-16 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129871#comment-16129871
 ] 

Rui Li commented on HIVE-15104:
---

Hi [~xuefuz], with HIVE-17114 and HIVE-17321 the benchmark results become more 
stable and the improvement is a little higher. Here's the latest [100GB TPC-DS 
result|https://docs.google.com/spreadsheets/d/1ba-AbUpJOHNb0_5PZyWQHzrH4wfRljMxQP9vA9JACHg/edit?usp=sharing].
Would you mind sharing your benchmark tool so that I can look into the perf 
degradation? Thanks.

> Hive on Spark generate more shuffle data than hive on mr
> 
>
> Key: HIVE-15104
> URL: https://issues.apache.org/jira/browse/HIVE-15104
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.2.1
>Reporter: wangwenli
>Assignee: Rui Li
> Attachments: HIVE-15104.1.patch, HIVE-15104.2.patch, 
> HIVE-15104.3.patch, HIVE-15104.4.patch, TPC-H 100G.xlsx
>
>
> The same SQL, run on the Spark and MR engines, generates different sizes of 
> shuffle data.
> I think this is because Hive on MR serializes only part of the HiveKey, while 
> Hive on Spark, which uses Kryo, serializes the full HiveKey object.  
> What is your opinion?





[jira] [Updated] (HIVE-17343) create a mechanism to get rid of some globals in HS2

2017-08-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17343:

Summary: create a mechanism to get rid of some globals in HS2  (was: create 
a mechanism to get rid of globals in HS2)

> create a mechanism to get rid of some globals in HS2
> 
>
> Key: HIVE-17343
> URL: https://issues.apache.org/jira/browse/HIVE-17343
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> The intent is to initialize things once in HS2 ctor/init, and then be able to 
> access them from queries, etc. without using globals or threadlocals.





[jira] [Commented] (HIVE-17286) Avoid expensive String serialization/deserialization for bitvectors

2017-08-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129832#comment-16129832
 ] 

Ashutosh Chauhan commented on HIVE-17286:
-

+1 pending tests

> Avoid expensive String serialization/deserialization for bitvectors
> ---
>
> Key: HIVE-17286
> URL: https://issues.apache.org/jira/browse/HIVE-17286
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17286.01.patch, HIVE-17286.02.patch, 
> HIVE-17286.03.patch, HIVE-17286.04.patch, HIVE-17286.patch
>
>






[jira] [Commented] (HIVE-17292) Change TestMiniSparkOnYarnCliDriver test configuration to use the configured cores

2017-08-16 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129807#comment-16129807
 ] 

Rui Li commented on HIVE-17292:
---

Hi [~pvary], as Xuefu agrees, let's only fix the yarn tests for now. Please 
provide a polished patch and create an RB for it. Thanks.

> Change TestMiniSparkOnYarnCliDriver test configuration to use the configured 
> cores
> --
>
> Key: HIVE-17292
> URL: https://issues.apache.org/jira/browse/HIVE-17292
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark, Test
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17292.1.patch, HIVE-17292.2.patch, 
> HIVE-17292.3.patch, HIVE-17292.5.patch
>
>
> Currently the {{hive-site.xml}} for the {{TestMiniSparkOnYarnCliDriver}} test 
> defines 2 cores and 2 executors, but only 1 is used, because the MiniCluster 
> does not allow the creation of the 3rd container.
> The FairScheduler uses 1GB increments for memory, but the containers would 
> like to use only 512MB. We should change the FairScheduler configuration to 
> use only the requested 512MB.
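
The effect of the increment can be shown with a little arithmetic. The following is a minimal sketch, assuming the scheduler rounds each request up to the next multiple of its increment; the class and method names are hypothetical, and the 2048MB total used in the note below is an illustrative figure, not taken from the test configuration.

```java
final class ContainerSizeSketch {
    // Round a memory request up to the next multiple of the scheduler increment.
    static int roundUpMb(int requestedMb, int incrementMb) {
        if (requestedMb <= 0 || incrementMb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return ((requestedMb + incrementMb - 1) / incrementMb) * incrementMb;
    }

    // Number of containers of the rounded size that fit in the available memory.
    static int containersThatFit(int totalMb, int requestedMb, int incrementMb) {
        return totalMb / roundUpMb(requestedMb, incrementMb);
    }
}
```

With a 1024MB increment a 512MB request is rounded up to 1024MB, so a hypothetical 2048MB cluster holds only 2 containers; with a 512MB increment the same cluster holds 4, which leaves room for a 3rd container.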





[jira] [Updated] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-16 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-17321:
--
Attachment: HIVE-17321.2.patch

Update golden files

> HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan 
> is not specified
> -
>
> Key: HIVE-17321
> URL: https://issues.apache.org/jira/browse/HIVE-17321
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Minor
> Attachments: HIVE-17321.1.patch, HIVE-17321.2.patch
>
>
> Need to implement HIVE-9560 for Spark.





[jira] [Updated] (HIVE-17342) Where condition with 1=0 should be treated similar to limit 0

2017-08-16 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-17342:
---
Description: 
In some cases, queries are executed with a WHERE condition of "1=0" just to 
get the schema, e.g. 
{noformat}
SELECT * FROM (select avg(d_year) as  y from date_dim where d_year>1999) q 
WHERE 1=0
{noformat}

Currently Hive executes the query; it would be good to treat this like 
"limit 0", which does not execute the query.

{code}
hive> explain SELECT * FROM (select avg(d_year) as  y from date_dim where d_year>1999) q WHERE 1=0;
OK
Plan optimized by CBO.

Vertex dependency in root stage
Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 2 vectorized, llap
      File Output Operator [FS_13]
        Group By Operator [GBY_12] (rows=1 width=76)
          Output:["_col0"],aggregations:["avg(VALUE._col0)"]
        <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized, llap
          PARTITION_ONLY_SHUFFLE [RS_11]
            Group By Operator [GBY_10] (rows=1 width=76)
              Output:["_col0"],aggregations:["avg(d_year)"]
              Filter Operator [FIL_9] (rows=1 width=0)
                predicate:false
                TableScan [TS_0] (rows=1 width=0)
                  default@date_dim,date_dim,Tbl:PARTIAL,Col:NONE,Output:["d_year"]
{code}

It generates 0 splits, but still sends a DAG plan to the AM and receives 0 
rows as output.

  was:
In some cases, queries may get executed with where condition mentioning to 
"1=0" to get schema. E.g 
{noformat}
SELECT * FROM (select avg(d_year) as  y from date_dim where d_year>1999) q 
WHERE 1=0
{noformat}

Currently hive executes the query; it would be good to consider this similar to 
"limit 0" which does not execute the query.


> Where condition with 1=0 should be treated similar to limit 0
> -
>
> Key: HIVE-17342
> URL: https://issues.apache.org/jira/browse/HIVE-17342
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
>
> In some cases, queries are executed with a WHERE condition of "1=0" just to 
> get the schema, e.g. 
> {noformat}
> SELECT * FROM (select avg(d_year) as  y from date_dim where d_year>1999) q 
> WHERE 1=0
> {noformat}
> Currently Hive executes the query; it would be good to treat this like 
> "limit 0", which does not execute the query.
> {code}
> hive> explain SELECT * FROM (select avg(d_year) as  y from date_dim where d_year>1999) q WHERE 1=0;
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
>     limit:-1
>     Stage-1
>       Reducer 2 vectorized, llap
>       File Output Operator [FS_13]
>         Group By Operator [GBY_12] (rows=1 width=76)
>           Output:["_col0"],aggregations:["avg(VALUE._col0)"]
>         <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized, llap
>           PARTITION_ONLY_SHUFFLE [RS_11]
>             Group By Operator [GBY_10] (rows=1 width=76)
>               Output:["_col0"],aggregations:["avg(d_year)"]
>               Filter Operator [FIL_9] (rows=1 width=0)
>                 predicate:false
>                 TableScan [TS_0] (rows=1 width=0)
>                   default@date_dim,date_dim,Tbl:PARTIAL,Col:NONE,Output:["d_year"]
> {code}
> It generates 0 splits, but still sends a DAG plan to the AM and receives 0 
> rows as output.
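
The requested behaviour amounts to one extra short-circuit condition. This is a minimal sketch of that decision only, assuming the constant predicate applies to the outermost query block as in the example above; {{SkipExecutionSketch}} and its parameters are hypothetical and do not correspond to Hive's optimizer API.

```java
final class SkipExecutionSketch {
    // limit: the LIMIT value, or null when the query has no LIMIT clause.
    // constantPredicate: the folded value of the outermost WHERE clause,
    // or null when it does not fold to a constant.
    static boolean canSkipExecution(Integer limit, Boolean constantPredicate) {
        boolean limitZero = limit != null && limit == 0;
        boolean alwaysFalse = Boolean.FALSE.equals(constantPredicate);
        return limitZero || alwaysFalse;
    }
}
```

In this sketch "limit 0" and a predicate that folds to false (as in {{predicate:false}} in the plan above) lead to the same outcome: no DAG needs to be submitted.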





[jira] [Updated] (HIVE-17340) TxnHandler.checkLock() - reduce number of SQL statements

2017-08-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17340:
--
Summary: TxnHandler.checkLock() - reduce number of SQL statements  (was: 
TxnHandler.checkLock(Connection dbConn, long extLockId) optimization)

> TxnHandler.checkLock() - reduce number of SQL statements
> 
>
> Key: HIVE-17340
> URL: https://issues.apache.org/jira/browse/HIVE-17340
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> This calls acquire(Connection dbConn, Statement stmt, long extLockId, 
> LockInfo lockInfo)
> for each lock in the same DB transaction, issuing 1 UPDATE stmt per acquire().
> There is no reason they cannot all be sent in 1 statement if all the locks 
> are granted.
> With a lot of partitions this can be a perf issue.
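
One way to picture the proposal is to build a single UPDATE covering every granted lock instead of one per lock. This is a minimal sketch, not the eventual fix: the table name {{HIVE_LOCKS}}, the column names and the state value {{'a'}} are assumptions made for illustration, and {{BatchedAcquireSketch}} is a hypothetical class.

```java
import java.util.List;
import java.util.stream.Collectors;

final class BatchedAcquireSketch {
    // Build one UPDATE that marks all given component locks of an external
    // lock as acquired. Table, column and state names are assumptions.
    static String buildBatchedUpdate(long extLockId, List<Long> intLockIds) {
        if (intLockIds.isEmpty()) {
            throw new IllegalArgumentException("no locks to acquire");
        }
        String ids = intLockIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
        return "UPDATE HIVE_LOCKS SET HL_LOCK_STATE = 'a'"
                + " WHERE HL_LOCK_EXT_ID = " + extLockId
                + " AND HL_LOCK_INT_ID IN (" + ids + ")";
    }
}
```

For N partitions this sends 1 statement rather than N. A real implementation would also need to respect any database limit on the size of an IN list, which the sketch ignores.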





[jira] [Updated] (HIVE-17341) DbTxnManger.startHeartbeat() - randomize initial delay

2017-08-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17341:
--
Summary: DbTxnManger.startHeartbeat() - randomize initial delay  (was: 
DbTxnManger.startHeartbeat())

> DbTxnManger.startHeartbeat() - randomize initial delay
> --
>
> Key: HIVE-17341
> URL: https://issues.apache.org/jira/browse/HIVE-17341
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> This sets up a fixed delay for all heartbeats.  If many queries land on the 
> server at the same time,
> they will wake up and start heartbeating at the same time, causing a bottleneck.
> Add some random element to the heartbeat delay.
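
A possible shape for the randomization, using only the JDK: pick the initial delay uniformly within one heartbeat interval and keep the period fixed. This is a minimal sketch under that assumption; {{HeartbeatJitterSketch}} and its methods are hypothetical and are not the DbTxnManager code.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

final class HeartbeatJitterSketch {
    // Pick an initial delay uniformly from [0, intervalMs).
    static long randomInitialDelayMs(long intervalMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return ThreadLocalRandom.current().nextLong(intervalMs);
    }

    // Schedule the heartbeat with a randomized first run and a fixed period.
    static ScheduledFuture<?> startHeartbeat(ScheduledExecutorService pool,
                                             Runnable heartbeat,
                                             long intervalMs) {
        return pool.scheduleAtFixedRate(
                heartbeat, randomInitialDelayMs(intervalMs), intervalMs, TimeUnit.MILLISECONDS);
    }
}
```

Queries that arrive together then send their first heartbeat at different offsets within the interval, so later heartbeats stay spread out as well.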





[jira] [Assigned] (HIVE-17341) DbTxnManger.startHeartbeat()

2017-08-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17341:
-


> DbTxnManger.startHeartbeat()
> 
>
> Key: HIVE-17341
> URL: https://issues.apache.org/jira/browse/HIVE-17341
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> This sets up a fixed delay for all heartbeats.  If many queries land on the 
> server at the same time,
> they will wake up and start heartbeating at the same time, causing a bottleneck.
> Add some random element to the heartbeat delay.





[jira] [Assigned] (HIVE-17340) TxnHandler.checkLock(Connection dbConn, long extLockId) optimization

2017-08-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17340:
-


> TxnHandler.checkLock(Connection dbConn, long extLockId) optimization
> 
>
> Key: HIVE-17340
> URL: https://issues.apache.org/jira/browse/HIVE-17340
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> This calls acquire(Connection dbConn, Statement stmt, long extLockId, 
> LockInfo lockInfo)
> for each lock in the same DB transaction, issuing 1 UPDATE stmt per acquire().
> There is no reason they cannot all be sent in 1 statement if all the locks 
> are granted.
> With a lot of partitions this can be a perf issue.





[jira] [Commented] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129712#comment-16129712
 ] 

Hive QA commented on HIVE-17100:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882203/HIVE-17100.04.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10977 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6429/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6429/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6429/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882203 - PreCommit-HIVE-Build

> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, 
> HIVE-17100.03.patch, HIVE-17100.04.patch
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated number of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all loads, will add a log as follows to consolidate the 
> load.
> 

[jira] [Updated] (HIVE-17233) Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.

2017-08-16 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17233:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.4.0, 3.0.0
       Status: Resolved  (was: Patch Available)

Thanks for reviewing, [~thejas]. I've checked this into {{master}}, 
{{branch-2}}, and {{branch-2.2}}.

> Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.
> 
>
> Key: HIVE-17233
> URL: https://issues.apache.org/jira/browse/HIVE-17233
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17233.1.patch
>
>
> This has to do with {{HIVE-15575}}. {{TezCompiler}} seems to set 
> {{mapred.input.dir.recursive}} to {{true}}. This is acceptable for Hive jobs, 
> since this allows Hive to consume its peculiar {{UNION ALL}} output, where 
> the output of each relation is stored in a separate sub-directory of the 
> output-dir.
> For such output to be readable through HCatalog (via Pig/HCatLoader), 
> {{mapred.input.dir.recursive}} should be set from {{HCatInputFormat}} as 
> well. Otherwise, one gets zero records for that input.
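
To illustrate why a non-recursive listing yields zero records for this layout, here is a minimal sketch that uses the local filesystem as a stand-in for HDFS. The class `RecursiveInputListing` and its method are hypothetical and are not part of HCatInputFormat; the `recursive` flag plays the role that {{mapred.input.dir.recursive}} plays for the real input format.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class RecursiveInputListing {
    // Lists the data files under inputDir. With recursive == false, only files
    // directly inside inputDir are returned, so files that live in
    // per-relation sub-directories are never seen.
    static List<Path> listInputFiles(Path inputDir, boolean recursive) throws IOException {
        int depth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(inputDir, depth)) {
            return paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}
```

When every data file sits one level down, as with the UNION ALL output described above, the non-recursive call returns an empty list while the recursive call finds all files.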





[jira] [Commented] (HIVE-17336) Missing class 'org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat' from Hive on Spark when inserting into hbase based table

2017-08-16 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129700#comment-16129700
 ] 

Vihang Karajgaonkar commented on HIVE-17336:


LGTM +1 thanks for the change [~aihuaxu]

> Missing class 'org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat' from 
> Hive on Spark when inserting into hbase based table
> ---
>
> Key: HIVE-17336
> URL: https://issues.apache.org/jira/browse/HIVE-17336
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17336.1.patch
>
>
> When inserting into an HBase-based table from Hive on Spark, the following 
> exception is thrown:
> {noformat}
> Error while processing statement: FAILED: Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat
> Serialization trace:
> inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
> tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
> conf (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:183)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:326)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:314)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:759)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObjectOrNull(SerializationUtilities.java:201)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:132)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:178)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:178)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
>  at 
> 

[jira] [Updated] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-08-16 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15686:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.4.0
   Status: Resolved  (was: Patch Available)

Checked into {{branch-2}}. It looks like {{branch-2.2}} has HIVE-14688 checked 
in, obviating this change. :/

Thanks for the review, [~owen.omalley]! :]

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 2.4.0
>
> Attachments: HIVE-15686.1-branch-2.patch
>
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.
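
The trace above shows a partition path on one NameNode being handed to a filesystem object bound to another. Below is a minimal sketch of that mismatch using only `java.net.URI`. The class `FsOwnershipCheck` and the helper `belongsTo` are hypothetical; this is a simplified stand-in for the kind of check that raises "Wrong FS", not Hadoop's `FileSystem.checkPath` and not the attached patch.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Objects;

class FsOwnershipCheck {
    // Returns true when the path's scheme and authority match the filesystem URI.
    // Unqualified paths (no scheme) are treated as belonging to the filesystem.
    static boolean belongsTo(URI fsUri, URI path) {
        if (path.getScheme() == null) {
            return true;
        }
        return path.getScheme().equalsIgnoreCase(fsUri.getScheme())
                && Objects.equals(lower(path.getAuthority()), lower(fsUri.getAuthority()));
    }

    // Lower-cases a possibly null authority so host names compare case-insensitively.
    private static String lower(String s) {
        return s == null ? null : s.toLowerCase(Locale.ROOT);
    }
}
```

With the two URIs from the error message, `belongsTo` is false for the remote partition path. One plausible way to avoid the exception, stated here as an assumption, is to resolve the filesystem from the partition path itself (Hadoop offers `Path.getFileSystem(Configuration)` for this) before asking about encryption zones.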





[jira] [Commented] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-08-16 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129693#comment-16129693
 ] 

Mithun Radhakrishnan commented on HIVE-15686:
-

As established in 
[HIVE-8472|https://issues.apache.org/jira/browse/HIVE-8472?focusedCommentId=16128283&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16128283],
 these tests seem busted on {{branch-2}}. 

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15686.1-branch-2.patch
>
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.





[jira] [Commented] (HIVE-17327) LLAP IO: restrict native file ID usage to default FS to avoid hypothetical collisions when HDFS federation is used

2017-08-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129628#comment-16129628
 ] 

Hive QA commented on HIVE-17327:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882182/HIVE-17327.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10977 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistCpAs (batchId=250)
org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistcp (batchId=250)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6428/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6428/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6428/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882182 - PreCommit-HIVE-Build

> LLAP IO: restrict native file ID usage to default FS to avoid hypothetical 
> collisions when HDFS federation is used
> --
>
> Key: HIVE-17327
> URL: https://issues.apache.org/jira/browse/HIVE-17327
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17327.01.patch, HIVE-17327.patch
>
>






[jira] [Updated] (HIVE-17325) Clean up intermittently failing unit tests

2017-08-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-17325:
--
Summary: Clean up intermittently failing unit tests  (was: Clean up 
intermittently failing uni tests)

> Clean up intermittently failing unit tests
> --
>
> Key: HIVE-17325
> URL: https://issues.apache.org/jira/browse/HIVE-17325
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> We have a number of intermittently failing tests.  I propose to disable these 
> so that we can get clean (or at least cleaner) CI runs.





[jira] [Updated] (HIVE-15899) check CTAS over acid table

2017-08-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15899:
--
Issue Type: Sub-task  (was: Task)
Parent: HIVE-17204

> check CTAS over acid table 
> ---
>
> Key: HIVE-15899
> URL: https://issues.apache.org/jira/browse/HIVE-15899
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> Need to add a test to check whether CREATE TABLE AS (CTAS) works correctly 
> with ACID tables.





[jira] [Commented] (HIVE-17336) Missing class 'org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat' from Hive on Spark when inserting into hbase based table

2017-08-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129527#comment-16129527
 ] 

Hive QA commented on HIVE-17336:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882177/HIVE-17336.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10977 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6427/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6427/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6427/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882177 - PreCommit-HIVE-Build

> Missing class 'org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat' from 
> Hive on Spark when inserting into hbase based table
> ---
>
> Key: HIVE-17336
> URL: https://issues.apache.org/jira/browse/HIVE-17336
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17336.1.patch
>
>
> When inserting into an HBase-based table from Hive on Spark, the following 
> exception is thrown:
> {noformat}
> Error while processing statement: FAILED: Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat
> Serialization trace:
> inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
> tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
> conf (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:183)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:326)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:314)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:759)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObjectOrNull(SerializationUtilities.java:201)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:132)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> 

[jira] [Commented] (HIVE-17325) Clean up intermittently failing uni tests

2017-08-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129515#comment-16129515
 ] 

Alan Gates commented on HIVE-17325:
---

Thanks for tracking down all these things.

I'm all for fixing these issues, and reverting those we can't fix, rather than 
just turning off the tests.  Let's use this as an umbrella JIRA to track those 
others and get them in, then we'll turn off whatever tests we can't fix or 
revert.

> Clean up intermittently failing uni tests
> -
>
> Key: HIVE-17325
> URL: https://issues.apache.org/jira/browse/HIVE-17325
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> We have a number of intermittently failing tests.  I propose to disable these 
> so that we can get clean (or at least cleaner) CI runs.





[jira] [Commented] (HIVE-17233) Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.

2017-08-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129473#comment-16129473
 ] 

Thejas M Nair commented on HIVE-17233:
--

+1

> Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.
> 
>
> Key: HIVE-17233
> URL: https://issues.apache.org/jira/browse/HIVE-17233
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17233.1.patch
>
>
> This has to do with {{HIVE-15575}}. {{TezCompiler}} seems to set 
> {{mapred.input.dir.recursive}} to {{true}}. This is acceptable for Hive jobs, 
> since this allows Hive to consume its peculiar {{UNION ALL}} output, where 
> the output of each relation is stored in a separate sub-directory of the 
> output-dir.
> For such output to be readable through HCatalog (via Pig/HCatLoader), 
> {{mapred.input.dir.recursive}} should be set from {{HCatInputFormat}} as 
> well. Otherwise, one gets zero records for that input.





[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-16 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17100:

Attachment: HIVE-17100.04.patch

Added 04.patch with the changes below.
- Added a new metastore API to get the number of events for the given DB, 
starting from a given event ID.
- Added a tag to each log message, such as REPL_START, TABLE_DUMP...
- Added dbName to all logs so that they can be mapped properly.
> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, 
> HIVE-17100.03.patch, HIVE-17100.04.patch
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated number of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all loads, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of database dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can differ greatly from the actual 
> number, as the number of events is not known up front until we read from the 
> metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of incremental load, will add one log with below details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)
> * Total number of events to load
> * Load Start Time{color}
> * After each event load, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event load end time
> * Event load progress. Format is Event sequence no/ Total number of 
> events.{color}
> * After completion of all event loads, will add a log as follows to 
> 

[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-16 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17100:

Status: Patch Available  (was: Open)

> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, 
> HIVE-17100.03.patch, HIVE-17100.04.patch
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated number of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all loads, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of database dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can differ greatly from the actual 
> number, as the number of events is not known up front until we read from the 
> metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of incremental load, will add one log with below details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)
> * Total number of events to load
> * Load Start Time{color}
> * After each event load, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event load end time
> * Event load progress. Format is Event sequence no/ Total number of 
> events.{color}
> * After completion of all event loads, will add a log as follows to 
> consolidate the load.
> {color:#59afe1}* Target Database Name.
> * Load Type (INCREMENTAL).
> * Load End Time.
> * Total number of events loaded.
> * Last Repl ID of the loaded database.{color}




[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-08-16 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17100:

Status: Open  (was: Patch Available)

> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, 
> HIVE-17100.03.patch
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated number of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all loads, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of database dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can differ greatly from the actual 
> number, as the number of events is not known up front until we read from the 
> metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of incremental load, will add one log with below details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)
> * Total number of events to load
> * Load Start Time{color}
> * After each event load, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event load end time
> * Event load progress. Format is Event sequence no/ Total number of 
> events.{color}
> * After completion of all event loads, will add a log as follows to 
> consolidate the load.
> {color:#59afe1}* Target Database Name.
> * Load Type (INCREMENTAL).
> * Load End Time.
> * Total number of events loaded.
> * Last Repl ID of the loaded database.{color}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-08-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129423#comment-16129423
 ] 

Hive QA commented on HIVE-15686:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882174/HIVE-15686.1-branch-2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10603 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=139)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] (batchId=125)
org.apache.hadoop.hive.ql.security.TestExtendedAcls.testPartition (batchId=228)
org.apache.hadoop.hive.ql.security.TestFolderPermissions.testPartition (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176)
org.apache.hive.jdbc.TestJdbcDriver2.testSelectExecAsync2 (batchId=222)
org.apache.hive.jdbc.TestJdbcDriver2.testYarnATSGuid (batchId=222)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6426/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6426/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6426/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882174 - PreCommit-HIVE-Build

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15686.1-branch-2.patch
>
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during processing of message.
> java.lang.IllegalArgumentException: Wrong FS: hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> 

[jira] [Commented] (HIVE-17327) LLAP IO: restrict native file ID usage to default FS to avoid hypothetical collisions when HDFS federation is used

2017-08-16 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129352#comment-16129352
 ] 

Gopal V commented on HIVE-17327:


LGTM - +1 tests pending on 01 patch

> LLAP IO: restrict native file ID usage to default FS to avoid hypothetical 
> collisions when HDFS federation is used
> --
>
> Key: HIVE-17327
> URL: https://issues.apache.org/jira/browse/HIVE-17327
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17327.01.patch, HIVE-17327.patch
>
>






[jira] [Commented] (HIVE-17233) Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.

2017-08-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129347#comment-16129347
 ] 

Hive QA commented on HIVE-17233:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12880101/HIVE-17233.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10976 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6425/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6425/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6425/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12880101 - PreCommit-HIVE-Build

> Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.
> 
>
> Key: HIVE-17233
> URL: https://issues.apache.org/jira/browse/HIVE-17233
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17233.1.patch
>
>
> This has to do with {{HIVE-15575}}. {{TezCompiler}} seems to set 
> {{mapred.input.dir.recursive}} to {{true}}. This is acceptable for Hive jobs, 
> since this allows Hive to consume its peculiar {{UNION ALL}} output, where 
> the output of each relation is stored in a separate sub-directory of the 
> output-dir.
> For such output to be readable through HCatalog (via Pig/HCatLoader), 
> {{mapred.input.dir.recursive}} should be set from {{HCatInputFormat}} as 
> well. Otherwise, one gets zero records for that input.
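
The "zero records" symptom described above can be reproduced without Hadoop. The following is a minimal sketch, assuming a local directory layout that mimics the UNION ALL output (one sub-directory per relation); the class InputListing, its method names, the sub-directory names "1" and "2" and the file name are all hypothetical. It only illustrates the difference between a flat and a recursive listing and is not the HCatInputFormat change itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for input-file discovery; not Hive or HCatalog code.
final class InputListing {

    // Lists regular files under dir. With recursive=false only the top level is
    // inspected, which mirrors reading without mapred.input.dir.recursive=true.
    static List<Path> listInputFiles(Path dir, boolean recursive) {
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(dir, maxDepth)) {
            return paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a temporary output directory in which each relation of a
    // UNION ALL has written one file into its own sub-directory.
    static Path createUnionAllStyleOutput() {
        try {
            Path outputDir = Files.createTempDirectory("union_all_output");
            for (String leg : new String[] {"1", "2"}) {
                Path subDir = Files.createDirectory(outputDir.resolve(leg));
                Files.write(subDir.resolve("000000_0"), "row\n".getBytes(StandardCharsets.UTF_8));
            }
            return outputDir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On such a layout the flat listing finds no files at all, while the recursive listing finds one file per sub-directory.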





[jira] [Resolved] (HIVE-17256) add a notion of a guaranteed task to LLAP

2017-08-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-17256.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Committed to master. Thanks for the review!
I will write something up in the HS2 patch.

> add a notion of a guaranteed task to LLAP
> -
>
> Key: HIVE-17256
> URL: https://issues.apache.org/jira/browse/HIVE-17256
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-17256.01.patch, HIVE-17256.patch
>
>
> Tasks are basically on two levels, guaranteed and speculative, with 
> speculative being the default. As long as no one uses the new flag, the tasks 
> behave the same.
> All the tasks that do have the flag also behave the same with regard to each 
> other.
> The difference is that a guaranteed task always has higher priority than, and 
> preempts, a speculative task. 
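
The two-level ordering described above can be sketched as a comparator. This is an illustration under assumptions, not the LLAP scheduler code: the classes TaskInfo and TaskOrdering, the numeric priority field (lower value meaning more urgent) and the method names are hypothetical. Ordering within one level is left to the existing priority, since the description says tasks on the same level behave as before.

```java
import java.util.Comparator;

// Hypothetical task descriptor; field names are assumptions.
final class TaskInfo {
    final String name;
    final boolean guaranteed; // false means speculative, the default level
    final int priority;       // lower value = more urgent (assumed convention)

    TaskInfo(String name, boolean guaranteed, int priority) {
        this.name = name;
        this.guaranteed = guaranteed;
        this.priority = priority;
    }
}

final class TaskOrdering {
    // Guaranteed tasks sort before speculative ones regardless of priority;
    // within the same level the existing priority decides.
    static final Comparator<TaskInfo> GUARANTEED_FIRST = (a, b) -> {
        if (a.guaranteed != b.guaranteed) {
            return a.guaranteed ? -1 : 1;
        }
        return Integer.compare(a.priority, b.priority);
    };

    // Cross-level rule only: a guaranteed task preempts a speculative one.
    // Preemption between tasks of the same level is not modelled here.
    static boolean preemptsAcrossLevels(TaskInfo candidate, TaskInfo running) {
        return candidate.guaranteed && !running.guaranteed;
    }
}
```

Used as the comparator of a java.util.PriorityQueue, GUARANTEED_FIRST hands out a guaranteed task before any speculative task, even one with a more urgent priority value.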





[jira] [Updated] (HIVE-17330) refactor TezSessionPoolManager to separate its multiple functions

2017-08-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17330:

Attachment: HIVE-17330.01.patch

Accounting for nulls in non-HS2 case

> refactor TezSessionPoolManager to separate its multiple functions
> -
>
> Key: HIVE-17330
> URL: https://issues.apache.org/jira/browse/HIVE-17330
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17330.01.patch, HIVE-17330.patch
>
>
> TezSessionPoolManager would retain things specific to current Hive session 
> management. 
> The session pool itself, as well as expiration tracking, the pool session 
> implementation, and some config validation can be separated out and made 
> independent from the pool.





[jira] [Comment Edited] (HIVE-15899) check CTAS over acid table

2017-08-16 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013166#comment-16013166
 ] 

Eugene Koifman edited comment on HIVE-15899 at 8/16/17 7:40 PM:


When the bucketing requirement is relaxed, CTAS can be successful.  What will the 
acid write look like?
It cannot retain the 1/ and 2/ subdirectories.  Should each leg rely on writerId to 
create a set of unique tranche files across all subdirs so that acid "move" can 
put all of them into a delta?  
Then how do we later remove the "move" for S3?

Perhaps it should rely on statementId and create delta_x_x_0, delta_x_x_1, etc.


was (Author: ekoifman):
When the bucketing requirement is relaxed, CTAS can be successful.  What will the 
acid write look like?
It cannot retain the 1/ and 2/ subdirectories.  Should each leg rely on writerId to 
create a set of unique tranche files across all subdirs so that acid "move" can 
put all of them into a delta?  
Then how do we later remove the "move" for S3?

> check CTAS over acid table 
> ---
>
> Key: HIVE-15899
> URL: https://issues.apache.org/jira/browse/HIVE-15899
> Project: Hive
>  Issue Type: Task
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> Need to add a test to check whether CREATE TABLE AS (CTAS) works correctly with 
> acid tables.





[jira] [Commented] (HIVE-17195) Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.

2017-08-16 Thread Sankar Hariappan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129300#comment-16129300
 ] 

Sankar Hariappan commented on HIVE-17195:
-

Thanks for the commit [~thejas] and for the review [~anishek]!

> Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.
> --
>
> Key: HIVE-17195
> URL: https://issues.apache.org/jira/browse/HIVE-17195
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DAG, DR, Executor, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17195.01.patch, HIVE-17195.02.patch
>
>
> Currently, a long chain of REPL LOAD tasks leads to very deep recursion when 
> traversing the DAG.
> For example, the getMRTasks, getTezTasks, getSparkTasks and iterateTasks 
> methods run recursively to traverse the DAG.
> The traversal logic needs to be modified to reduce stack usage.
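
One common way to reduce stack usage in such a traversal is to replace recursion with an explicit work stack. The following is a minimal sketch of that general technique, not the change made in the attached patches: TaskNode and DagTraversal are hypothetical names and do not correspond to Hive's Task classes or to the getMRTasks/getTezTasks/getSparkTasks/iterateTasks methods.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical DAG node; stands in for a task with child tasks.
final class TaskNode {
    final String name;
    final List<TaskNode> children = new ArrayList<>();

    TaskNode(String name) {
        this.name = name;
    }
}

final class DagTraversal {
    // Visits every node reachable from the roots exactly once, using a
    // heap-allocated stack instead of recursive calls, so the depth of the
    // task chain no longer affects the thread's call stack.
    static List<TaskNode> collect(List<TaskNode> roots) {
        List<TaskNode> visited = new ArrayList<>();
        // Identity-based set: a node shared by several parents is visited once.
        Set<TaskNode> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<TaskNode> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            TaskNode current = pending.pop();
            if (!seen.add(current)) {
                continue; // already handled via another parent
            }
            visited.add(current);
            for (TaskNode child : current.children) {
                pending.push(child);
            }
        }
        return visited;
    }
}
```

The memory needed for pending tasks moves from the call stack to the heap, so a chain of many thousands of tasks no longer risks a StackOverflowError.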





[jira] [Updated] (HIVE-17183) Disable rename operations during bootstrap dump

2017-08-16 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17183:

Status: Patch Available  (was: Open)

> Disable rename operations during bootstrap dump
> ---
>
> Key: HIVE-17183
> URL: https://issues.apache.org/jira/browse/HIVE-17183
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17183.01.patch, HIVE-17183.02.patch, 
> HIVE-17183.03.patch
>
>
> Currently, a bootstrap dump can lead to data loss if any rename happens while 
> the dump is in progress. Support for renames can be added in the next 
> development phase, as it needs a proper design to keep track of renamed 
> tables/partitions. 
> So, for the time being, we shall disable rename operations while a bootstrap 
> dump is in progress to avoid any inconsistent state.
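
As a rough illustration of the idea, a guard that rejects renames while a dump is running might look like the sketch below. Everything in it is an assumption made for the example: the class BootstrapDumpGuard, its methods and the in-memory set are hypothetical, and the issue text does not say how the patch actually records that a dump is in progress.

```java
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory guard; not the mechanism used by the patch.
final class BootstrapDumpGuard {
    // Names of databases that currently have a bootstrap dump in progress.
    private final Set<String> dbsBeingDumped = ConcurrentHashMap.newKeySet();

    private static String normalize(String dbName) {
        return dbName.toLowerCase(Locale.ROOT);
    }

    void dumpStarted(String dbName) {
        dbsBeingDumped.add(normalize(dbName));
    }

    void dumpFinished(String dbName) {
        dbsBeingDumped.remove(normalize(dbName));
    }

    // Call before executing a table/partition rename in the given database.
    void checkRenameAllowed(String dbName) {
        if (dbsBeingDumped.contains(normalize(dbName))) {
            throw new IllegalStateException(
                "Rename is disabled while a bootstrap dump of " + dbName + " is in progress");
        }
    }
}
```

A real implementation would also have to cover state shared across processes and dumps that fail without calling dumpFinished, which this sketch ignores.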





[jira] [Updated] (HIVE-17183) Disable rename operations during bootstrap dump

2017-08-16 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17183:

Attachment: HIVE-17183.03.patch

> Disable rename operations during bootstrap dump
> ---
>
> Key: HIVE-17183
> URL: https://issues.apache.org/jira/browse/HIVE-17183
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17183.01.patch, HIVE-17183.02.patch, 
> HIVE-17183.03.patch
>
>
> Currently, a bootstrap dump can lead to data loss if any rename happens while 
> the dump is in progress. Support for renames can be added in the next 
> development phase, as it needs a proper design to keep track of renamed 
> tables/partitions. 
> So, for the time being, we shall disable rename operations while a bootstrap 
> dump is in progress to avoid any inconsistent state.





[jira] [Updated] (HIVE-17183) Disable rename operations during bootstrap dump

2017-08-16 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17183:

Attachment: (was: HIVE-17183.03.patch)

> Disable rename operations during bootstrap dump
> ---
>
> Key: HIVE-17183
> URL: https://issues.apache.org/jira/browse/HIVE-17183
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17183.01.patch, HIVE-17183.02.patch
>
>
> Currently, a bootstrap dump can lead to data loss if any rename happens while 
> the dump is in progress. Support for renames can be added in the next 
> development phase, as it needs a proper design to keep track of renamed 
> tables/partitions. 
> So, for the time being, we shall disable rename operations while a bootstrap 
> dump is in progress to avoid any inconsistent state.





[jira] [Assigned] (HIVE-17339) Acid feature parity laundry list

2017-08-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17339:
-


> Acid feature parity laundry list
> 
>
> Key: HIVE-17339
> URL: https://issues.apache.org/jira/browse/HIVE-17339
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> 1. insert into T select  - this can sometimes use DISTCP  
> (hive.exec.copyfile.maxsize).  What does this mean for acid?
> 2. 





[jira] [Updated] (HIVE-17183) Disable rename operations during bootstrap dump

2017-08-16 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17183:

Status: Open  (was: Patch Available)

> Disable rename operations during bootstrap dump
> ---
>
> Key: HIVE-17183
> URL: https://issues.apache.org/jira/browse/HIVE-17183
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17183.01.patch, HIVE-17183.02.patch, 
> HIVE-17183.03.patch
>
>
> Currently, a bootstrap dump can lead to data loss if any rename happens while 
> the dump is in progress. Support for renames can be added in the next 
> development phase, as it needs a proper design to keep track of renamed 
> tables/partitions. 
> So, for the time being, we shall disable rename operations while a bootstrap 
> dump is in progress to avoid any inconsistent state.





[jira] [Commented] (HIVE-17183) Disable rename operations during bootstrap dump

2017-08-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129284#comment-16129284
 ] 

Hive QA commented on HIVE-17183:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882172/HIVE-17183.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10977 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6424/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6424/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6424/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882172 - PreCommit-HIVE-Build

> Disable rename operations during bootstrap dump
> ---
>
> Key: HIVE-17183
> URL: https://issues.apache.org/jira/browse/HIVE-17183
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17183.01.patch, HIVE-17183.02.patch, 
> HIVE-17183.03.patch
>
>
> Currently, a bootstrap dump can lead to data loss if any rename happens 
> while the dump is in progress. Supporting renames during a dump can be left to 
> the next development phase, as it needs a proper design to keep track of 
> renamed tables/partitions. 
> So, for the time being, we shall disable rename operations while a bootstrap 
> dump is in progress to avoid any inconsistent state.





[jira] [Updated] (HIVE-17328) Remove special handling for Acid tables wherever possible

2017-08-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17328:
--
Description: 
There are various places in the code that do something like 
{noformat}
if(acid update or delete) {
 do something
}
else {
do something else
}
{noformat}
This complicates the code and means that the acid code path is not properly 
tested in many new non-acid features or bug fixes.

Some work to simplify this was done in HIVE-15844.

_SortedDynPartitionOptimizer_ has some special logic.
_ReduceSinkOperator_ relies on partitioning columns for update/delete to be 
_UDFToInteger(RecordIdentifier)_, which is set up in _SemanticAnalyzer_.  
Consequently _SemanticAnalyzer_ has special logic to set it up.
_FileSinkOperator_ has some specialization.

_AbstractCorrelationProcCtx_ makes changes specific to acid writes setting 
hive.optimize.reducededuplication.min.reducer=1


With acid 2.0 (HIVE-17089) a lot more of it can be simplified/removed.
Generally, Acid Insert follows the same code path as regular insert except that 
the writer in _FileSinkOperator_ is Acid specific.
So all the specialization is to route Update/Delete events to the right place.

We can do the U=D+I early in the operator pipeline so that an Update is a Hive 
multi-insert with 1 leg being the Insert leg and the other being the Delete leg 
(like Merge stmt).
The Delete events themselves don't need to be routed in any particular way if 
we always ship all delete_delta files for each split.  This is ok since delete 
events are very small and highly compressible.  What is shipped is independent 
of what needs to be loaded into memory.

This would allow removing almost all special code paths.
If need be, we can also have the compactor rewrite the delete files so that the 
name of the file matches the contents, making it as if they were bucketed 
properly, and use that to reduce what needs to be shipped for each split.  This 
may help with some extreme cases where someone updates 1B rows.


This would in particular allow DISTRIBUTE BY for update/delete.
Is this currently supported for Acid insert?
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy


  was:
There are various places in the code that do something like 
{noformat}
if(acid update or delete) {
 do something
}
else {
do something else
}
{noformat}
This complicates the code and means that the acid code path is not properly 
tested in many new non-acid features or bug fixes.

Some work to simplify this was done in HIVE-15844.

_SortedDynPartitionOptimizer_ has some special logic.
_ReduceSinkOperator_ relies on partitioning columns for update/delete to be 
_UDFToInteger(RecordIdentifier)_, which is set up in _SemanticAnalyzer_.  
Consequently _SemanticAnalyzer_ has special logic to set it up.
_FileSinkOperator_ has some specialization.

_AbstractCorrelationProcCtx_ makes changes specific to acid writes setting 
hive.optimize.reducededuplication.min.reducer=1


With acid 2.0 (HIVE-17089) a lot more of it can be simplified/removed.
Generally, Acid Insert follows the same code path as regular insert except that 
the writer in _FileSinkOperator_ is Acid specific.
So all the specialization is to route Update/Delete events to the right place.

We can do the U=D+I early in the operator pipeline so that an Update is a Hive 
multi-insert with 1 leg being the Insert leg and the other being the Delete leg 
(like Merge stmt).
The Delete events themselves don't need to be routed in any particular way if 
we always ship all delete_delta files for each split.  This is ok since delete 
events are very small and highly compressible.  What is shipped is independent 
of what needs to be loaded into memory.

This would allow removing almost all special code paths.
If need be, we can also have the compactor rewrite the delete files so that the 
name of the file matches the contents, making it as if they were bucketed 
properly, and use that to reduce what needs to be shipped for each split.  This 
may help with some extreme cases where someone updates 1B rows.
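
The "U=D+I" idea in the description (an Update becomes one Delete leg plus one Insert leg) can be illustrated with a minimal sketch. Everything here is hypothetical: `UpdateSplitter`, `Event` and `Kind` are illustrative names, not Hive classes, and the sketch assumes the insert leg receives its row id later from the writer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "Update = Delete + Insert"; not Hive code.
final class UpdateSplitter {
  enum Kind { DELETE, INSERT }

  static final class Event {
    final Kind kind;
    final long originalRowId; // -1 for inserts (assumption: assigned by the writer later)
    final String value;       // null for deletes, which only identify a row

    Event(Kind kind, long originalRowId, String value) {
      this.kind = kind;
      this.originalRowId = originalRowId;
      this.value = value;
    }
  }

  // Rewrites "update row <rowId> to <newValue>" as two independent events,
  // so that each leg can follow the ordinary delete or insert code path.
  static List<Event> splitUpdate(long rowId, String newValue) {
    List<Event> events = new ArrayList<>();
    events.add(new Event(Kind.DELETE, rowId, null));   // delete leg
    events.add(new Event(Kind.INSERT, -1L, newValue)); // insert leg
    return events;
  }

  public static void main(String[] args) {
    for (Event e : splitUpdate(42L, "new value")) {
      System.out.println(e.kind + " rowId=" + e.originalRowId + " value=" + e.value);
    }
  }
}
```

Once an update is split this early, nothing downstream needs to know it was an update, which is the property the description relies on to remove the special code paths.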



> Remove special handling for Acid tables wherever possible
> -
>
> Key: HIVE-17328
> URL: https://issues.apache.org/jira/browse/HIVE-17328
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> There are various places in the code that do something like 
> {noformat}
> if(acid update or delete) {
>  do something
> }
> else {
> do something else
> }
> {noformat}
> This complicates the code and means that the acid code path is not properly 
> tested in many new non-acid features or bug fixes.
> Some work to simplify this was done in HIVE-15844.
> _SortedDynPartitionOptimizer_ has some special logic
> _ReduceSinkOperator_ relies on 

[jira] [Updated] (HIVE-17327) LLAP IO: restrict native file ID usage to default FS to avoid hypothetical collisions when HDFS federation is used

2017-08-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17327:

Attachment: HIVE-17327.01.patch

Fixing the NPE. Looks like setupPool is not called outside of HS2.

> LLAP IO: restrict native file ID usage to default FS to avoid hypothetical 
> collisions when HDFS federation is used
> --
>
> Key: HIVE-17327
> URL: https://issues.apache.org/jira/browse/HIVE-17327
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17327.01.patch, HIVE-17327.patch
>
>






[jira] [Commented] (HIVE-13875) Beeline ignore where clause when it is the last line of file and missing a EOL hence give wrong query result

2017-08-16 Thread Chad Laurent (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129205#comment-16129205
 ] 

Chad Laurent commented on HIVE-13875:
-

This affects the last line of a query in general, not just the where clause.  I 
ran into this with group by, so my query did not return any results at all.

> Beeline ignore where clause when it is the last line of file and missing a 
> EOL hence give wrong query result
> 
>
> Key: HIVE-13875
> URL: https://issues.apache.org/jira/browse/HIVE-13875
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.2.1
>Reporter: Lu Ji
>Priority: Minor
>
> Steps to reproduce:
> Say we have a simple table:
> {code}
> select * from lji.lu_test;
> +---+--+--+
> | lu_test.name  | lu_test.country  |
> +---+--+--+
> | john  | us   |
> | hong  | cn   |
> +---+--+--+
> 2 rows selected (0.04 seconds)
> {code}
> We have a simple query in a file. But note that this file is missing the last EOL.
> {code}
> cat -A test.hql
> use lji;$
> select * from lu_test$
> where country='us';[lji@~]$
> {code}
> Then if we execute the file using both hive CLI and beeline + HS2, we get 
> different results.
> {code}
> [lji@~]$ hive -f test.hql
> WARNING: Use "yarn jar" to launch YARN applications.
> Logging initialized using configuration in 
> file:/etc/hive/2.3.4.7-4/0/hive-log4j.properties
> OK
> Time taken: 1.624 seconds
> OK
> john	us
> Time taken: 1.482 seconds, Fetched: 1 row(s)
> [lji@~]$ beeline -u "jdbc:hive2://XXX:1/default;principal=hive/_HOST@XXX" 
> -f test.hql
> WARNING: Use "yarn jar" to launch YARN applications.
> Connecting to jdbc:hive2://XXXl:1/default;principal=hive/_HOST@XXX
> Connected to: Apache Hive (version 1.2.1.2.3.4.7-4)
> Driver: Hive JDBC (version 1.2.1.2.3.4.7-4)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> 0: jdbc:hive2://XXX> use lji;
> No rows affected (0.06 seconds)
> 0: jdbc:hive2://XXX> select * from lu_test
> 0: jdbc:hive2://XXX> where 
> country='us';+---+--+--+
> | lu_test.name  | lu_test.country  |
> +---+--+--+
> | john  | us   |
> | hong  | cn   |
> +---+--+--+
> 2 rows selected (0.073 seconds)
> 0: jdbc:hive2://XXX>
> Closing: 0: jdbc:hive2://XXX:1/default;principal=hive/_HOST@XXX
> {code}
> Obviously, beeline gave the wrong result. It ignores the where clause in the 
> last line.
> I know it is quite weird for a file to be missing the last EOL, but for 
> whatever reason, we have quite a few files in this state. 
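
To make the reported behavior concrete, here is a minimal sketch of a script reader that yields the same statements whether or not the file ends with an EOL. It is not Beeline's implementation: `ScriptStatements` is a hypothetical helper, and it assumes statements end with `;` at the end of a line, ignoring quoting and comments.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a simplified stand-in for a script reader.
final class ScriptStatements {
  // Joins lines into statements terminated by ';' at end of line.
  static List<String> parse(String script) {
    List<String> statements = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(new StringReader(script))) {
      String line;
      // readLine() also returns a final line that has no line terminator,
      // so the last line is handled like any other.
      while ((line = reader.readLine()) != null) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
          continue;
        }
        if (current.length() > 0) {
          current.append(' ');
        }
        current.append(trimmed);
        if (trimmed.endsWith(";")) {
          statements.add(current.substring(0, current.length() - 1));
          current.setLength(0);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // Keep a trailing statement even if it lacks the ';' terminator.
    if (current.length() > 0) {
      statements.add(current.toString());
    }
    return statements;
  }

  public static void main(String[] args) {
    // Same content as test.hql above: no EOL after the last line.
    String script = "use lji;\nselect * from lu_test\nwhere country='us';";
    parse(script).forEach(System.out::println);
  }
}
```

With this approach the where clause on the last line stays part of the second statement, which matches what the hive CLI run above returns.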





[jira] [Commented] (HIVE-17336) Missing class 'org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat' from Hive on Spark when inserting into hbase based table

2017-08-16 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129182#comment-16129182
 ] 

Aihua Xu commented on HIVE-17336:
-

[~xuefuz] Can you take a look at the simple change? Also, I'm wondering why we 
are making a copy of the JobConf here. It seems we can work directly on the 
original conf object.

{noformat}
JobConf jobConf = new JobConf(conf);
jobConf.set(MR_JAR_PROPERTY, "");
for (BaseWork work : sparkWork.getAllWork()) {
  work.configureJobConf(jobConf);
}
addJars(jobConf.get(MR_JAR_PROPERTY)); 
{noformat}

> Missing class 'org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat' from 
> Hive on Spark when inserting into hbase based table
> ---
>
> Key: HIVE-17336
> URL: https://issues.apache.org/jira/browse/HIVE-17336
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17336.1.patch
>
>
> When inserting into an HBase-based table from Hive on Spark, the following 
> exception is thrown:
> {noformat}
> Error while processing statement: FAILED: Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat
> Serialization trace:
> inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
> tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
> conf (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:183)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:326)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:314)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:759)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObjectOrNull(SerializationUtilities.java:201)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:132)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:178)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
>  at 
> 

[jira] [Updated] (HIVE-17336) Missing class 'org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat' from Hive on Spark when inserting into hbase based table

2017-08-16 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-17336:

Status: Patch Available  (was: Open)

patch-1: Hive on Spark incorrectly adds jars from conf (where tmpjar is empty) 
rather than from the copied jobConf. With this change, HoS is able to add the 
required HBase jars.
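
The pitfall described in patch-1 (configuring a copy, then reading the original) can be reproduced without Hadoop. The sketch below uses `java.util.Properties` as a stand-in for `JobConf`; the key name `tmpjars`, the jar name and the `configure` method are assumptions made for illustration only.

```java
import java.util.Properties;

// Hypothetical stand-in for the conf / jobConf pair; not Hive or Hadoop code.
final class CopiedConfDemo {
  static final String JAR_KEY = "tmpjars"; // assumed key name

  // Mirrors "new JobConf(conf)": an independent copy of the settings.
  static Properties copyOf(Properties original) {
    Properties copy = new Properties();
    copy.putAll(original);
    return copy;
  }

  // Stand-in for work.configureJobConf(jobConf): records the jars a plan needs.
  static void configure(Properties jobConf) {
    jobConf.setProperty(JAR_KEY, "hbase-handler.jar"); // illustrative value
  }

  // Returns { value read from the original, value read from the copy }.
  static String[] readBoth() {
    Properties conf = new Properties();
    conf.setProperty(JAR_KEY, "");
    Properties jobConf = copyOf(conf);
    configure(jobConf);
    return new String[] { conf.getProperty(JAR_KEY), jobConf.getProperty(JAR_KEY) };
  }

  public static void main(String[] args) {
    String[] values = readBoth();
    System.out.println("original: '" + values[0] + "'");
    System.out.println("copy:     '" + values[1] + "'");
  }
}
```

Because the copy is independent, changes made to it never reach the original, so the jar list has to be read back from the object that was actually configured.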

> Missing class 'org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat' from 
> Hive on Spark when inserting into hbase based table
> ---
>
> Key: HIVE-17336
> URL: https://issues.apache.org/jira/browse/HIVE-17336
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17336.1.patch
>
>
> When inserting into an HBase-based table from Hive on Spark, the following 
> exception is thrown:
> {noformat}
> Error while processing statement: FAILED: Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat
> Serialization trace:
> inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
> tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
> conf (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:183)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:326)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:314)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:759)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObjectOrNull(SerializationUtilities.java:201)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:132)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:178)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:178)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
>  at 
> 

[jira] [Updated] (HIVE-17336) Missing class 'org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat' from Hive on Spark when inserting into hbase based table

2017-08-16 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-17336:

Attachment: HIVE-17336.1.patch

> Missing class 'org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat' from 
> Hive on Spark when inserting into hbase based table
> ---
>
> Key: HIVE-17336
> URL: https://issues.apache.org/jira/browse/HIVE-17336
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17336.1.patch
>
>
> When inserting into an HBase-based table from Hive on Spark, the following 
> exception is thrown:
> {noformat}
> Error while processing statement: FAILED: Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat
> Serialization trace:
> inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
> tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
> conf (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:183)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:326)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:314)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:759)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObjectOrNull(SerializationUtilities.java:201)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:132)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:178)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:178)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
>  at 

[jira] [Updated] (HIVE-17195) Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.

2017-08-16 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-17195:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the patch [~sankarh] and for the review [~anishek]


> Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.
> --
>
> Key: HIVE-17195
> URL: https://issues.apache.org/jira/browse/HIVE-17195
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DAG, DR, Executor, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17195.01.patch, HIVE-17195.02.patch
>
>
> Currently, a long chain of REPL LOAD tasks leads to very deep recursion when 
> traversing the DAG.
> For example, the getMRTasks, getTezTasks, getSparkTasks and iterateTasks 
> methods run recursively to traverse the DAG.
> We need to modify this traversal logic to reduce stack usage.
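
One common way to reduce stack usage in such a traversal is to replace recursion with an explicit work stack kept on the heap. The sketch below shows that general technique only; it is not the attached patch, and `TaskNode` and `IterativeDagTraversal` are hypothetical names rather than Hive classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a task in the DAG; not Hive's Task class.
final class TaskNode {
  final String name;
  final List<TaskNode> children = new ArrayList<>();

  TaskNode(String name) {
    this.name = name;
  }
}

final class IterativeDagTraversal {
  // Visits every task reachable from the roots exactly once. The pending
  // tasks live in a heap-allocated deque, so the depth of the task chain
  // does not translate into JVM stack depth.
  static List<TaskNode> collect(List<TaskNode> roots) {
    List<TaskNode> visited = new ArrayList<>();
    Set<TaskNode> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    Deque<TaskNode> stack = new ArrayDeque<>(roots);
    while (!stack.isEmpty()) {
      TaskNode task = stack.pop();
      if (!seen.add(task)) {
        continue; // a child shared by several parents was already visited
      }
      visited.add(task);
      for (TaskNode child : task.children) {
        stack.push(child);
      }
    }
    return visited;
  }

  public static void main(String[] args) {
    // Build a long linear chain, the shape a recursive traversal handles worst.
    TaskNode root = new TaskNode("t0");
    TaskNode tail = root;
    for (int i = 1; i < 200_000; i++) {
      TaskNode next = new TaskNode("t" + i);
      tail.children.add(next);
      tail = next;
    }
    System.out.println(collect(Collections.singletonList(root)).size());
  }
}
```

The identity-based `seen` set matters for a DAG: without it, a task reachable through several parents would be visited once per path.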





[jira] [Commented] (HIVE-17195) Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.

2017-08-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129160#comment-16129160
 ] 

Thejas M Nair commented on HIVE-17195:
--

+1
The cleanup of older code can be done in a follow-up patch. I will create 
another jira for that. Let's not let the perfect get in the way of the good.


> Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.
> --
>
> Key: HIVE-17195
> URL: https://issues.apache.org/jira/browse/HIVE-17195
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DAG, DR, Executor, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17195.01.patch, HIVE-17195.02.patch
>
>
> Currently, a long chain of REPL LOAD tasks leads to very deep recursion when 
> traversing the DAG.
> For example, the getMRTasks, getTezTasks, getSparkTasks and iterateTasks 
> methods run recursively to traverse the DAG.
> We need to modify this traversal logic to reduce stack usage.





[jira] [Commented] (HIVE-17233) Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.

2017-08-16 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129156#comment-16129156
 ] 

Mithun Radhakrishnan commented on HIVE-17233:
-

[~thejas], could I please bother you to have a look at this one? It's a simple 
fix, and shouldn't take much of your time.

> Set "mapred.input.dir.recursive" for HCatInputFormat-based jobs.
> 
>
> Key: HIVE-17233
> URL: https://issues.apache.org/jira/browse/HIVE-17233
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17233.1.patch
>
>
> This has to do with {{HIVE-15575}}. {{TezCompiler}} seems to set 
> {{mapred.input.dir.recursive}} to {{true}}. This is acceptable for Hive jobs, 
> since this allows Hive to consume its peculiar {{UNION ALL}} output, where 
> the output of each relation is stored in a separate sub-directory of the 
> output-dir.
> For such output to be readable through HCatalog (via Pig/HCatLoader), 
> {{mapred.input.dir.recursive}} should be set from {{HCatInputFormat}} as 
> well. Otherwise, one gets zero records for that input.
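
The "zero records" symptom follows from how a non-recursive listing treats sub-directories. The sketch below reproduces it with plain `java.nio.file` rather than Hadoop or HCatalog; the sub-directory names `1` and `2` and the file name `000000_0` are illustrative assumptions about the output layout.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Illustration only: plain java.nio, not Hadoop's input-format listing.
final class RecursiveListing {
  // Builds an output dir where each relation's output sits in its own
  // sub-directory (names are hypothetical).
  static Path createUnionAllLayout() {
    try {
      Path outputDir = Files.createTempDirectory("union_all_output");
      for (String sub : new String[] {"1", "2"}) {
        Path subDir = Files.createDirectory(outputDir.resolve(sub));
        Files.write(subDir.resolve("000000_0"), "row\n".getBytes(StandardCharsets.UTF_8));
      }
      return outputDir;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Counts data files directly under dir, or anywhere below it when recursive.
  static long countFiles(Path dir, boolean recursive) {
    try (Stream<Path> entries = recursive ? Files.walk(dir) : Files.list(dir)) {
      return entries.filter(Files::isRegularFile).count();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path dir = createUnionAllLayout();
    System.out.println("non-recursive: " + countFiles(dir, false));
    System.out.println("recursive:     " + countFiles(dir, true));
  }
}
```

A reader that only looks at the top level sees directories and no data files, which is why the recursive setting has to be enabled on the reading side as well.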





[jira] [Updated] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-08-16 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15686:

Status: Patch Available  (was: Open)

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15686.1-branch-2.patch
>
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.





[jira] [Updated] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-08-16 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15686:

Status: Open  (was: Patch Available)

Re-submitting for {{branch-2}} tests.

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15686.1-branch-2.patch
>
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.
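
The patch itself is not quoted in this message, so the following is only a sketch of why the check fails. It is a simplified, hypothetical stand-in for the scheme/authority comparison behind the "Wrong FS" error (the class and method names are invented, and this is not Hadoop's actual {{FileSystem.checkPath}} logic). A common remedy for this class of error, and an assumption about the fix here, is to resolve the filesystem from the path itself (e.g. {{Path.getFileSystem(Configuration)}}) rather than asking the default filesystem about a remote path.

```java
import java.net.URI;

/** Simplified, hypothetical illustration of a "Wrong FS" style check. */
class WrongFsSketch {

  /** Returns true if {@code path} can be served by the filesystem rooted at {@code fsUri}. */
  static boolean belongsTo(URI fsUri, URI path) {
    String scheme = path.getScheme();
    if (scheme == null) {
      // No scheme: the path is relative to whichever filesystem is asked.
      return true;
    }
    if (!scheme.equalsIgnoreCase(fsUri.getScheme())) {
      return false;
    }
    String authority = path.getAuthority();
    if (authority == null) {
      // Simplification: a scheme-only path is treated as matching.
      return true;
    }
    // Same scheme but a different namenode host:port means a different filesystem.
    return authority.equalsIgnoreCase(fsUri.getAuthority());
  }
}
```

With the URIs from the trace above, the remote partition path does not belong to the local cluster's filesystem, which is the condition the encryption-zone check trips over.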





[jira] [Updated] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-08-16 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15686:

Attachment: HIVE-15686.1-branch-2.patch

Renamed patch, to apply to {{branch-2}}.

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15686.1-branch-2.patch
>
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.





[jira] [Updated] (HIVE-15686) Partitions on Remote HDFS break encryption-zone checks

2017-08-16 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-15686:

Attachment: (was: HIVE-15686.branch-2.patch)

> Partitions on Remote HDFS break encryption-zone checks
> --
>
> Key: HIVE-15686
> URL: https://issues.apache.org/jira/browse/HIVE-15686
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-15686.1-branch-2.patch
>
>
> This is in relation to HIVE-13243, which fixes encryption-zone checks for 
> external tables.
> Unfortunately, this is still borked for partitions with remote HDFS paths. 
> The code fails as follows:
> {noformat}
> 2015-12-09 19:26:14,997 ERROR [pool-4-thread-1476] server.TThreadPoolServer 
> (TThreadPoolServer.java:run_aroundBody0(305)) - Error occurred during 
> processing of message.
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://remote-cluster-nn1.myth.net:8020/dbs/mythdb/myth_table/dt=20170120, 
> expected: hdfs://local-cluster-n1.myth.net:8020
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1985)
> at 
> org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1290)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.checkTrashPurgeCombination(HiveMetaStore.java:1746)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partitions_req(HiveMetaStore.java:2974)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy5.drop_partitions_req(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:10005)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_partitions_req.getResult(ThriftHiveMetastore.java:9989)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:767)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$2.run(HadoopThriftAuthBridge.java:763)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:285)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I have a really simple fix.





[jira] [Updated] (HIVE-17183) Disable rename operations during bootstrap dump

2017-08-16 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17183:

Status: Patch Available  (was: Open)

> Disable rename operations during bootstrap dump
> ---
>
> Key: HIVE-17183
> URL: https://issues.apache.org/jira/browse/HIVE-17183
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17183.01.patch, HIVE-17183.02.patch, 
> HIVE-17183.03.patch
>
>
> Currently, a bootstrap dump can lead to data loss if any rename happens 
> while the dump is in progress. Support for renames can be added in the next 
> development phase, as it needs a proper design to keep track of renamed 
> tables/partitions. 
> So, for the time being, we shall disable rename operations while a bootstrap 
> dump is in progress, to avoid any inconsistent state.
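
As a rough illustration of the idea in the description, a rename guard could look like the sketch below. Everything here is hypothetical (the class, the method names, and the in-memory counter); the attached patches are not quoted in this message, and a real implementation would need state that is visible to every metastore instance rather than a counter in one JVM.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical guard that rejects renames while any bootstrap dump is running. */
class RenameGuardSketch {

  // Number of bootstrap dumps currently in progress (assumed to live in one JVM).
  private final AtomicInteger activeDumps = new AtomicInteger();

  void beginBootstrapDump() {
    activeDumps.incrementAndGet();
  }

  void endBootstrapDump() {
    activeDumps.decrementAndGet();
  }

  /** Throws if a rename is attempted while a bootstrap dump is in progress. */
  void checkRenameAllowed(String objectName) {
    if (activeDumps.get() > 0) {
      throw new IllegalStateException(
          "Rename of " + objectName + " is disabled while a bootstrap dump is in progress");
    }
  }
}
```

A caller would wrap the dump in {{beginBootstrapDump()}} / {{endBootstrapDump()}} (ideally in a try/finally) and call {{checkRenameAllowed()}} at the start of every rename operation.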





[jira] [Updated] (HIVE-17183) Disable rename operations during bootstrap dump

2017-08-16 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17183:

Attachment: HIVE-17183.03.patch

Added 03.patch with fixes for Anishek's comments.

> Disable rename operations during bootstrap dump
> ---
>
> Key: HIVE-17183
> URL: https://issues.apache.org/jira/browse/HIVE-17183
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17183.01.patch, HIVE-17183.02.patch, 
> HIVE-17183.03.patch
>
>
> Currently, a bootstrap dump can lead to data loss if any rename happens 
> while the dump is in progress. Support for renames can be added in the next 
> development phase, as it needs a proper design to keep track of renamed 
> tables/partitions. 
> So, for the time being, we shall disable rename operations while a bootstrap 
> dump is in progress, to avoid any inconsistent state.





[jira] [Updated] (HIVE-17218) Canonical-ize hostnames for Hive metastore, and HS2 servers.

2017-08-16 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17218:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Committed to {{master}}, {{branch-2}}, and {{branch-2.2}}. Thanks for the 
review, [~thejas]! :]

> Canonical-ize hostnames for Hive metastore, and HS2 servers.
> 
>
> Key: HIVE-17218
> URL: https://issues.apache.org/jira/browse/HIVE-17218
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore, Security
>Affects Versions: 1.2.2, 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17218.1.patch
>
>
> Currently, the {{HiveMetastoreClient}} and {{HiveConnection}} do not 
> canonical-ize the hostnames of the metastore/HS2 servers. In deployments 
> where there are multiple such servers behind a VIP, this causes a number of 
> inconveniences:
> # The client-side configuration (e.g. {{hive.metastore.uris}} in 
> {{hive-site.xml}}) needs to specify the VIP's hostname, and cannot use a 
> simplified CNAME, in the thrift URL. If the 
> {{hive.metastore.kerberos.principal}} is specified using {{_HOST}}, one sees 
> GSS failures as follows:
> {noformat}
> hive --hiveconf hive.metastore.kerberos.principal=hive/_h...@grid.myth.net --hiveconf hive.metastore.uris="thrift://simplified-hcat-cname.grid.myth.net:56789"
> ...
> Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:542)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
> ...
> {noformat}
> This is because {{_HOST}} is filled in with the CNAME, and not the 
> canonicalized name.
> # Oozie workflows that use HCat {{}} have to always use the VIP 
> hostname, and can't use {{_HOST}}-based service principals, if the CNAME 
> differs from the VIP name.
> If the client-code simply canonical-ized the hostnames, it would enable the 
> use of both simplified CNAMEs, and _HOST in service principals.
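
To make the proposal concrete, here is a minimal sketch of substituting {{_HOST}} with a canonicalized hostname. The class and method names are hypothetical, the example principal and hostnames in the usage note are made up, and lower-casing the result is an assumption; the committed patch is not quoted in this message. {{InetAddress.getCanonicalHostName()}} is the standard JDK call for resolving a CNAME to its canonical name.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Hypothetical helper: fill _HOST with the canonical name instead of the CNAME. */
class CanonicalHostSketch {

  /** Replaces the _HOST placeholder with the canonicalized, lower-cased form of {@code host}. */
  static String resolvePrincipal(String principal, String host,
                                 UnaryOperator<String> canonicalizer) {
    String canonical = canonicalizer.apply(host).toLowerCase(Locale.ROOT);
    return principal.replace("_HOST", canonical);
  }

  /** DNS-backed canonicalizer; falls back to the given name if resolution fails. */
  static String dnsCanonicalize(String host) {
    try {
      return InetAddress.getByName(host).getCanonicalHostName();
    } catch (UnknownHostException e) {
      return host;
    }
  }
}
```

In production the canonicalizer would be {{CanonicalHostSketch::dnsCanonicalize}}; passing it in as a parameter keeps the substitution logic testable without DNS. For example, with a resolver that maps a CNAME to {{vip-hcat.example.com}}, the principal {{hive/_HOST@EXAMPLE.COM}} becomes {{hive/vip-hcat.example.com@EXAMPLE.COM}}.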





[jira] [Updated] (HIVE-17183) Disable rename operations during bootstrap dump

2017-08-16 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17183:

Status: Open  (was: Patch Available)

> Disable rename operations during bootstrap dump
> ---
>
> Key: HIVE-17183
> URL: https://issues.apache.org/jira/browse/HIVE-17183
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17183.01.patch, HIVE-17183.02.patch
>
>
> Currently, a bootstrap dump can lead to data loss if any rename happens 
> while the dump is in progress. Support for renames can be added in the next 
> development phase, as it needs a proper design to keep track of renamed 
> tables/partitions. 
> So, for the time being, we shall disable rename operations while a bootstrap 
> dump is in progress, to avoid any inconsistent state.





[jira] [Updated] (HIVE-17272) when hive.vectorized.execution.enabled is true, query on empty partitioned table fails with NPE

2017-08-16 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-17272:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks [~vihangk1] for reviewing.

> when hive.vectorized.execution.enabled is true, query on empty partitioned 
> table fails with NPE
> ---
>
> Key: HIVE-17272
> URL: https://issues.apache.org/jira/browse/HIVE-17272
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Fix For: 3.0.0
>
> Attachments: HIVE-17272.2.patch
>
>
> {noformat}
> set hive.vectorized.execution.enabled=true;
> CREATE TABLE `tab`(`x` int) PARTITIONED BY ( `y` int) stored as parquet;
> select * from tab t1 join tab t2 where t1.x=t2.x;
> {noformat}
> The query fails with the following exception.
> {noformat}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.createAndInitPartitionContext(VectorMapOperator.java:386)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.internalSetChildren(VectorMapOperator.java:559)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.setChildren(VectorMapOperator.java:474)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:106) 
> ~[hive-exec-2.3.0.jar:2.3.0]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_101]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101]
> at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
> ~[hadoop-common-2.6.0.jar:?]
> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) 
> ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_101]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101]
> at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
> ~[hadoop-common-2.6.0.jar:?]
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413) 
> ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) 
> ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
>  ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_101]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[?:1.8.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[?:1.8.0_101]
> at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_101]
> {noformat}





[jira] [Updated] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()

2017-08-16 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17169:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Committed to {{master}}, {{branch-2}}, and {{branch-2.2}}. Thanks for the 
review, [~owen.omalley]! :]

> Avoid extra call to KeyProvider::getMetadata()
> --
>
> Key: HIVE-17169
> URL: https://issues.apache.org/jira/browse/HIVE-17169
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17169.1-branch-2.patch, HIVE-17169.1.patch
>
>
> Here's the code from {{Hadoop23Shims}}:
> {code:title=Hadoop23Shims.java|borderStyle=solid}
> @Override
> public int comparePathKeyStrength(Path path1, Path path2) throws IOException {
>   EncryptionZone zone1, zone2;
>   zone1 = hdfsAdmin.getEncryptionZoneForPath(path1);
>   zone2 = hdfsAdmin.getEncryptionZoneForPath(path2);
>   if (zone1 == null && zone2 == null) {
>     return 0;
>   } else if (zone1 == null) {
>     return -1;
>   } else if (zone2 == null) {
>     return 1;
>   }
>   return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName());
> }
>
> private int compareKeyStrength(String keyname1, String keyname2) throws IOException {
>   KeyProvider.Metadata meta1, meta2;
>   if (keyProvider == null) {
>     throw new IOException("HDFS security key provider is not configured on your server.");
>   }
>   meta1 = keyProvider.getMetadata(keyname1);
>   meta2 = keyProvider.getMetadata(keyname2);
>   if (meta1.getBitLength() < meta2.getBitLength()) {
>     return -1;
>   } else if (meta1.getBitLength() == meta2.getBitLength()) {
>     return 0;
>   } else {
>     return 1;
>   }
> }
>   }
> {code}
> It turns out that {{EncryptionZone}} already has the cipher's bit-length 
> stored in a member variable. One shouldn't need an additional name-node call 
> ({{KeyProvider::getMetadata()}}) only to fetch it again.
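
The shape of the change described above can be sketched as follows. {{Zone}} is a hypothetical stand-in for {{EncryptionZone}} that simply carries a bit-length field; how the real class exposes that value, and what the committed patch actually does, is not shown in this message. The point is only that the comparison can run on data the zones already hold, without a second lookup by key name.

```java
/** Sketch: compare key strength from data already carried by the zones. */
class KeyStrengthSketch {

  /** Hypothetical stand-in for an encryption zone that knows its cipher bit-length. */
  static final class Zone {
    final String keyName;
    final int bitLength;

    Zone(String keyName, int bitLength) {
      this.keyName = keyName;
      this.bitLength = bitLength;
    }
  }

  /** Same null ordering as the quoted code: an unencrypted (null) zone is the weakest. */
  static int comparePathKeyStrength(Zone zone1, Zone zone2) {
    if (zone1 == null && zone2 == null) {
      return 0;
    } else if (zone1 == null) {
      return -1;
    } else if (zone2 == null) {
      return 1;
    }
    // No key-provider round trip: both bit-lengths are already in hand.
    return Integer.compare(zone1.bitLength, zone2.bitLength);
  }
}
```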





[jira] [Commented] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()

2017-08-16 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129118#comment-16129118
 ] 

Mithun Radhakrishnan commented on HIVE-17169:
-

As established in 
[HIVE-8472|https://issues.apache.org/jira/browse/HIVE-8472?focusedCommentId=16128283=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16128283],
 these tests seem busted on {{branch-2}}. 

> Avoid extra call to KeyProvider::getMetadata()
> --
>
> Key: HIVE-17169
> URL: https://issues.apache.org/jira/browse/HIVE-17169
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17169.1-branch-2.patch, HIVE-17169.1.patch
>
>
> Here's the code from {{Hadoop23Shims}}:
> {code:title=Hadoop23Shims.java|borderStyle=solid}
> @Override
> public int comparePathKeyStrength(Path path1, Path path2) throws IOException {
>   EncryptionZone zone1, zone2;
>   zone1 = hdfsAdmin.getEncryptionZoneForPath(path1);
>   zone2 = hdfsAdmin.getEncryptionZoneForPath(path2);
>   if (zone1 == null && zone2 == null) {
>     return 0;
>   } else if (zone1 == null) {
>     return -1;
>   } else if (zone2 == null) {
>     return 1;
>   }
>   return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName());
> }
>
> private int compareKeyStrength(String keyname1, String keyname2) throws IOException {
>   KeyProvider.Metadata meta1, meta2;
>   if (keyProvider == null) {
>     throw new IOException("HDFS security key provider is not configured on your server.");
>   }
>   meta1 = keyProvider.getMetadata(keyname1);
>   meta2 = keyProvider.getMetadata(keyname2);
>   if (meta1.getBitLength() < meta2.getBitLength()) {
>     return -1;
>   } else if (meta1.getBitLength() == meta2.getBitLength()) {
>     return 0;
>   } else {
>     return 1;
>   }
> }
>   }
> {code}
> It turns out that {{EncryptionZone}} already has the cipher's bit-length 
> stored in a member variable. One shouldn't need an additional name-node call 
> ({{KeyProvider::getMetadata()}}) only to fetch it again.





[jira] [Commented] (HIVE-17292) Change TestMiniSparkOnYarnCliDriver test configuration to use the configured cores

2017-08-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129121#comment-16129121
 ] 

Xuefu Zhang commented on HIVE-17292:


Given that we need to update a lot of test files, I'm fine with only fixing 
mini-yarn tests. Thanks for working on this.

> Change TestMiniSparkOnYarnCliDriver test configuration to use the configured 
> cores
> --
>
> Key: HIVE-17292
> URL: https://issues.apache.org/jira/browse/HIVE-17292
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark, Test
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17292.1.patch, HIVE-17292.2.patch, 
> HIVE-17292.3.patch, HIVE-17292.5.patch
>
>
> Currently the {{hive-site.xml}} for the {{TestMiniSparkOnYarnCliDriver}} test 
> defines 2 cores and 2 executors, but only 1 is used, because the MiniCluster 
> does not allow the creation of the 3rd container.
> The FairScheduler uses 1GB increments for memory, but the containers would 
> like to use only 512MB. We should change the FairScheduler configuration to 
> use only the requested 512MB.
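
The arithmetic behind the description can be sketched as below: a scheduler that allocates memory in fixed increments rounds each request up to the next multiple of the increment. The class and method names are hypothetical, and the actual MiniCluster memory limit is not stated in this message, so the container totals in the usage note are illustrative only.

```java
/** Sketch of increment-based rounding of container memory requests. */
class IncrementRoundingSketch {

  /** Rounds {@code requestedMb} up to the nearest multiple of {@code incrementMb}. */
  static int roundUp(int requestedMb, int incrementMb) {
    return ((requestedMb + incrementMb - 1) / incrementMb) * incrementMb;
  }
}
```

With a 1024MB increment, a 512MB request is rounded up to 1024MB, so three containers consume 3072MB; with a 512MB increment the same three containers fit in 1536MB.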





[jira] [Updated] (HIVE-17316) Use String.startsWith for the hidden configuration variables

2017-08-16 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-17316:
---
Description: Currently HiveConf variables which should not be displayed to 
the user need to be enumerated. We should enhance this to be able to hide 
configuration variables by string prefix not just full equality.  (was: 
Currently HiveConf variables which should not be displayed to the user need to 
be enumerated. We should enhance this to be able to hide configuration 
variables by substring not just full equality.)

> Use String.startsWith for the hidden configuration variables
> 
>
> Key: HIVE-17316
> URL: https://issues.apache.org/jira/browse/HIVE-17316
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17316.01.patch, HIVE-17316.02.patch, 
> HIVE-17316.03.patch
>
>
> Currently HiveConf variables which should not be displayed to the user need 
> to be enumerated. We should enhance this to be able to hide configuration 
> variables by string prefix not just full equality.
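
A minimal sketch of prefix-based hiding, assuming the hidden entries are kept as a plain list of strings. The class name, method name, and the example keys in the usage note are hypothetical and are not taken from {{HiveConf}} or the attached patches.

```java
import java.util.List;

/** Sketch: hide a configuration variable if it starts with any hidden entry. */
class HiddenConfSketch {

  static boolean isHidden(String name, List<String> hiddenPrefixes) {
    for (String prefix : hiddenPrefixes) {
      // startsWith also covers full equality, so exact entries keep working.
      if (name.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }
}
```

For example, a single entry {{example.secret.}} hides both {{example.secret.key}} and {{example.secret.token}}, while {{example.public.key}} stays visible. Unlike a substring match, an entry cannot accidentally hide a variable that merely contains it somewhere in the middle.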





[jira] [Updated] (HIVE-17316) Use String.startsWith for the hidden configuration variables

2017-08-16 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-17316:
---
Summary: Use String.startsWith for the hidden configuration variables  
(was: Use String.contains for the hidden configuration variables)

> Use String.startsWith for the hidden configuration variables
> 
>
> Key: HIVE-17316
> URL: https://issues.apache.org/jira/browse/HIVE-17316
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17316.01.patch, HIVE-17316.02.patch, 
> HIVE-17316.03.patch
>
>
> Currently HiveConf variables which should not be displayed to the user need 
> to be enumerated. We should enhance this to be able to hide configuration 
> variables by substring not just full equality.





[jira] [Commented] (HIVE-17322) Serialise BeeLine qtest execution to prevent flakyness

2017-08-16 Thread Barna Zsombor Klara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129117#comment-16129117
 ] 

Barna Zsombor Klara commented on HIVE-17322:


[~pvary]
I agree. I raised and linked the followup Jira and resolved the other two.

> Serialise BeeLine qtest execution to prevent flakyness
> --
>
> Key: HIVE-17322
> URL: https://issues.apache.org/jira/browse/HIVE-17322
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-17322.01.patch, HIVE-17322.02.patch, 
> HIVE-17322.03.patch, HIVE-17322.04.patch, HIVE-17322.05.patch
>
>






[jira] [Resolved] (HIVE-16959) Flaky Test : TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]

2017-08-16 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara resolved HIVE-16959.

Resolution: Fixed

Flakiness should be fixed by HIVE-17322. Please reopen if test starts failing 
again.

> Flaky Test : 
> TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
> 
>
> Key: HIVE-16959
> URL: https://issues.apache.org/jira/browse/HIVE-16959
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Barna Zsombor Klara
>
> Test failed on the pre-commit but runs locally.
> Error Message
> Client result comparison failed with error code = 1 while executing 
> fname=insert_overwrite_local_directory_1
> 1172d1171
> < k21=v21#k22=v22#k31=v31:foo2





[jira] [Resolved] (HIVE-16796) Flaky Test : TestBeeLineDriver.testCliDriver[create_merge_compressed]

2017-08-16 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara resolved HIVE-16796.

Resolution: Fixed

Flakiness should be fixed by HIVE-17322. Please reopen if test starts failing 
again.

> Flaky Test : TestBeeLineDriver.testCliDriver[create_merge_compressed]
> -
>
> Key: HIVE-16796
> URL: https://issues.apache.org/jira/browse/HIVE-16796
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>
> Has been failing in many of the recent builds.
> https://builds.apache.org/job/PreCommit-HIVE-Build/5481/
> https://builds.apache.org/job/PreCommit-HIVE-Build/5482/





[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs

2017-08-16 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17181:

      Resolution: Fixed
   Fix Version/s: 2.4.0, 3.0.0
Target Version/s: 3.0.0, 2.4.0
          Status: Resolved  (was: Patch Available)

Committed to {{master}}, {{branch-2}}, and {{branch-2.2}}. Thank you for the 
review, [~thejas]! :]

> HCatOutputFormat should expose complete output-schema (including 
> partition-keys) for dynamic-partitioning MR jobs
> -
>
> Key: HIVE-17181
> URL: https://issues.apache.org/jira/browse/HIVE-17181
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17181.1-branch-2.patch, HIVE-17181.1.patch, 
> HIVE-17181.2.patch, HIVE-17181.3.patch
>
>
> Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic 
> partitioning are expected to call the following API methods:
> # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to 
> write to. This call populates the {{OutputJobInfo}} with details fetched from 
> the Metastore.
> # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data 
> being written.
> It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows:
> {code:java}
> HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf));
> {code}
> Unfortunately, {{getTableSchema()}} returns only the record-schema, not the 
> entire table's schema. We'll need a better API for use in M/R jobs to get the 
> complete table-schema.
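
To make the distinction above concrete, here is a small sketch of what a "complete" schema means relative to the record-schema. It deliberately avoids the HCatalog classes and models columns as plain strings; the class CompleteSchemaSketch and its method are hypothetical, and the ordering (record columns first, partition keys appended) is an assumption for the example rather than something stated in this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; it does not use or mirror the HCatalog API.
final class CompleteSchemaSketch {
    private CompleteSchemaSketch() {}

    // Builds the full column list by appending the partition keys to the
    // record columns, skipping any key that is already present.
    static List<String> completeSchema(List<String> recordColumns,
                                       List<String> partitionKeys) {
        List<String> all = new ArrayList<>(recordColumns);
        for (String key : partitionKeys) {
            if (!all.contains(key)) {
                all.add(key);
            }
        }
        return all;
    }
}
```

A dynamic-partitioning job that passes only the record columns as its output schema leaves out the partition keys, which is the mistake the description warns about.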





[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-08-16 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-8472:
---
Fix Version/s: (was: 2.2.0)

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-8472.1-branch-2.patch, HIVE-8472.1.patch, 
> HIVE-8472.2-branch-2.patch, HIVE-8472.3.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.





[jira] [Commented] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs

2017-08-16 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129106#comment-16129106
 ] 

Mithun Radhakrishnan commented on HIVE-17181:
-

As established in 
[HIVE-8472|https://issues.apache.org/jira/browse/HIVE-8472?focusedCommentId=16128283=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16128283],
 these tests seem busted on {{branch-2}}. The failures are in tests that do not 
use {{HCatOutputFormat}}.

> HCatOutputFormat should expose complete output-schema (including 
> partition-keys) for dynamic-partitioning MR jobs
> -
>
> Key: HIVE-17181
> URL: https://issues.apache.org/jira/browse/HIVE-17181
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17181.1-branch-2.patch, HIVE-17181.1.patch, 
> HIVE-17181.2.patch, HIVE-17181.3.patch
>
>
> Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic 
> partitioning are expected to call the following API methods:
> # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to 
> write to. This call populates the {{OutputJobInfo}} with details fetched from 
> the Metastore.
> # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data 
> being written.
> It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows:
> {code:java}
> HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf));
> {code}
> Unfortunately, {{getTableSchema()}} returns only the record-schema, not the 
> entire table's schema. We'll need a better API for use in M/R jobs to get the 
> complete table-schema.





[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-08-16 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-8472:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.2.0, 2.4.0, 3.0.0
       Status: Resolved  (was: Patch Available)

Checked into {{master}}, {{branch-2}}, and {{branch-2.2}}. Thank you for the 
reviews and guidance, [~alangates] && [~leftylev].

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Fix For: 3.0.0, 2.4.0, 2.2.0
>
> Attachments: HIVE-8472.1-branch-2.patch, HIVE-8472.1.patch, 
> HIVE-8472.2-branch-2.patch, HIVE-8472.3.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.





[jira] [Commented] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-08-16 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129084#comment-16129084
 ] 

Mithun Radhakrishnan commented on HIVE-8472:


Ok, that settles it. These tests are busted on {{branch-2}}. Committing my 
changes, shortly.

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1-branch-2.patch, HIVE-8472.1.patch, 
> HIVE-8472.2-branch-2.patch, HIVE-8472.3.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.





[jira] [Assigned] (HIVE-17336) Missing class 'org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat' from Hive on Spark when inserting into hbase based table

2017-08-16 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu reassigned HIVE-17336:
---


> Missing class 'org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat' from 
> Hive on Spark when inserting into hbase based table
> ---
>
> Key: HIVE-17336
> URL: https://issues.apache.org/jira/browse/HIVE-17336
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> When inserting into an HBase-based table from Hive on Spark, the following 
> exception is thrown:
> {noformat}
> Error while processing statement: FAILED: Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat
> Serialization trace:
> inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
> tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
> conf (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:183)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:326)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:314)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:759)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObjectOrNull(SerializationUtilities.java:201)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:132)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:178)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:178)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
>  at 
> 

[jira] [Commented] (HIVE-17300) WebUI query plan graphs

2017-08-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129003#comment-16129003
 ] 

Hive QA commented on HIVE-17300:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882138/HIVE-17300.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10976 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[ptf_matchpath] 
(batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6423/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6423/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6423/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882138 - PreCommit-HIVE-Build

> WebUI query plan graphs
> ---
>
> Key: HIVE-17300
> URL: https://issues.apache.org/jira/browse/HIVE-17300
> Project: Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Karen Coppage
>Assignee: Karen Coppage
> Attachments: complete_success.png, full_mapred_stats.png, 
> graph_with_mapred_stats.png, HIVE-17300.3.patch, HIVE-17300.patch, 
> last_stage_error.png, last_stage_running.png, non_mapred_task_selected.png
>
>
> Hi all,
> I’m working on a feature of the Hive WebUI Query Plan tab that would provide 
> the option to display the query plan as a nice graph (scroll down for 
> screenshots). If you click on one of the graph’s stages, the plan for that 
> stage appears as text below. 
> Stages are color-coded if they have a status (Success, Error, Running), and 
> the rest are grayed out. Coloring is based on status already available in the 
> WebUI, under the Stages tab.
> There is an additional option to display stats for MapReduce tasks. This 
> includes the job’s ID, tracking URL (where the logs are found), and mapper 
> and reducer numbers/progress, among other info. 
> The library I’m using for the graph is called vis.js (http://visjs.org/). It 
> has an Apache license, and the only necessary file to be included from this 
> library is about 700 KB.
> I tried to keep server-side changes minimal, and graph generation is taken 
> care of by the client. Plans with more than a given number of stages 
> (default: 25) won't be displayed in order to preserve resources.
> I’d love to hear any and all input from the community about this feature: do 
> you think it’s useful, and is there anything important I’m missing?
> Thanks,
> Karen Coppage
> Review request: https://reviews.apache.org/r/61663/
> Any input is welcome!





[jira] [Commented] (HIVE-17272) when hive.vectorized.execution.enabled is true, query on empty partitioned table fails with NPE

2017-08-16 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128999#comment-16128999
 ] 

Aihua Xu commented on HIVE-17272:
-

[~ngangam] With my fix above, we shouldn't run into the NPE in the place you 
mentioned, since it returns before calling init() from 
createAndInitPartitionContext().

> when hive.vectorized.execution.enabled is true, query on empty partitioned 
> table fails with NPE
> ---
>
> Key: HIVE-17272
> URL: https://issues.apache.org/jira/browse/HIVE-17272
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17272.2.patch
>
>
> {noformat}
> set hive.vectorized.execution.enabled=true;
> CREATE TABLE `tab`(`x` int) PARTITIONED BY ( `y` int) stored as parquet;
> select * from tab t1 join tab t2 where t1.x=t2.x;
> {noformat}
> The query fails with the following exception.
> {noformat}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.createAndInitPartitionContext(VectorMapOperator.java:386)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.internalSetChildren(VectorMapOperator.java:559)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.setChildren(VectorMapOperator.java:474)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:106) 
> ~[hive-exec-2.3.0.jar:2.3.0]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_101]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101]
> at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
> ~[hadoop-common-2.6.0.jar:?]
> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) 
> ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_101]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101]
> at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
> ~[hadoop-common-2.6.0.jar:?]
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413) 
> ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) 
> ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
>  ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_101]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[?:1.8.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[?:1.8.0_101]
> at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_101]
> {noformat}





[jira] [Commented] (HIVE-17272) when hive.vectorized.execution.enabled is true, query on empty partitioned table fails with NPE

2017-08-16 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128971#comment-16128971
 ] 

Aihua Xu commented on HIVE-17272:
-

[~ngangam] I don't see a case which will fail in that function. Do you have 
one? 

> when hive.vectorized.execution.enabled is true, query on empty partitioned 
> table fails with NPE
> ---
>
> Key: HIVE-17272
> URL: https://issues.apache.org/jira/browse/HIVE-17272
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17272.2.patch
>
>
> {noformat}
> set hive.vectorized.execution.enabled=true;
> CREATE TABLE `tab`(`x` int) PARTITIONED BY ( `y` int) stored as parquet;
> select * from tab t1 join tab t2 where t1.x=t2.x;
> {noformat}
> The query fails with the following exception.
> {noformat}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.createAndInitPartitionContext(VectorMapOperator.java:386)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.internalSetChildren(VectorMapOperator.java:559)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.setChildren(VectorMapOperator.java:474)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:106) 
> ~[hive-exec-2.3.0.jar:2.3.0]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_101]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101]
> at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
> ~[hadoop-common-2.6.0.jar:?]
> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) 
> ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_101]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101]
> at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
> ~[hadoop-common-2.6.0.jar:?]
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413) 
> ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) 
> ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
>  ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_101]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[?:1.8.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[?:1.8.0_101]
> at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_101]
> {noformat}





[jira] [Commented] (HIVE-17267) Make HMS Notification Listeners typesafe

2017-08-16 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128913#comment-16128913
 ] 

Peter Vary commented on HIVE-17267:
---

+1 LGTM

> Make HMS Notification Listeners typesafe
> 
>
> Key: HIVE-17267
> URL: https://issues.apache.org/jira/browse/HIVE-17267
> Project: Hive
>  Issue Type: Bug
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17267.01.patch, HIVE-17267.02.patch, 
> HIVE-17267.03.patch
>
>
> Currently in the HMS we support two types of notification listeners, 
> transactional and non-transactional ones. Transactional listeners will only 
> be invoked if the jdbc transaction finished successfully while 
> non-transactional ones are supposed to be resilient and will be invoked in 
> any case, even for failures.
> Having the same type for these two is a source of confusion and opens the 
> door for misconfigurations. We should try to fix this.
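
One way to picture the type separation asked for above is sketched below. It is not the design from the attached patches: the interface names, the registry class, and the String event payload are all hypothetical and chosen only for the example. The point is that with two unrelated types, registering a listener in the wrong list fails at compile time instead of being a silent misconfiguration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration; these are not the HMS listener classes.
interface TransactionalListenerSketch {
    // Called only after the transaction has committed successfully.
    void onCommitted(String event);
}

interface NonTransactionalListenerSketch {
    // Called in every case; 'succeeded' reports the outcome.
    void onEvent(String event, boolean succeeded);
}

final class ListenerRegistrySketch {
    private final List<TransactionalListenerSketch> transactional = new ArrayList<>();
    private final List<NonTransactionalListenerSketch> nonTransactional = new ArrayList<>();

    void addTransactional(TransactionalListenerSketch listener) {
        transactional.add(listener);
    }

    void addNonTransactional(NonTransactionalListenerSketch listener) {
        nonTransactional.add(listener);
    }

    // Transactional listeners fire only on success; the others always fire.
    void fire(String event, boolean committed) {
        if (committed) {
            for (TransactionalListenerSketch listener : transactional) {
                listener.onCommitted(event);
            }
        }
        for (NonTransactionalListenerSketch listener : nonTransactional) {
            listener.onEvent(event, committed);
        }
    }
}
```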





[jira] [Updated] (HIVE-17322) Serialise BeeLine qtest execution to prevent flakyness

2017-08-16 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-17322:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
       Status: Resolved  (was: Patch Available)

Pushed to master.
Thanks for your contribution [~zsombor.klara]!

Could you please create a follow-up jira to track determining the root cause 
of the problem? We might also close HIVE-16796 and HIVE-16959, so that if the 
problem pops up again we can see it. What do you think?

Thanks,
Peter

> Serialise BeeLine qtest execution to prevent flakyness
> --
>
> Key: HIVE-17322
> URL: https://issues.apache.org/jira/browse/HIVE-17322
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-17322.01.patch, HIVE-17322.02.patch, 
> HIVE-17322.03.patch, HIVE-17322.04.patch, HIVE-17322.05.patch
>
>






[jira] [Commented] (HIVE-17300) WebUI query plan graphs

2017-08-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128904#comment-16128904
 ] 

Hive QA commented on HIVE-17300:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882135/HIVE-17300.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10976 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
org.apache.hive.minikdc.TestSSLWithMiniKdc.testConnection (batchId=241)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6422/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6422/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6422/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882135 - PreCommit-HIVE-Build

> WebUI query plan graphs
> ---
>
> Key: HIVE-17300
> URL: https://issues.apache.org/jira/browse/HIVE-17300
> Project: Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Karen Coppage
>Assignee: Karen Coppage
> Attachments: complete_success.png, full_mapred_stats.png, 
> graph_with_mapred_stats.png, HIVE-17300.3.patch, HIVE-17300.patch, 
> last_stage_error.png, last_stage_running.png, non_mapred_task_selected.png
>
>
> Hi all,
> I’m working on a feature of the Hive WebUI Query Plan tab that would provide 
> the option to display the query plan as a nice graph (scroll down for 
> screenshots). If you click on one of the graph’s stages, the plan for that 
> stage appears as text below. 
> Stages are color-coded if they have a status (Success, Error, Running), and 
> the rest are grayed out. Coloring is based on status already available in the 
> WebUI, under the Stages tab.
> There is an additional option to display stats for MapReduce tasks. This 
> includes the job’s ID, tracking URL (where the logs are found), and mapper 
> and reducer numbers/progress, among other info. 
> The library I’m using for the graph is called vis.js (http://visjs.org/). It 
> has an Apache license, and the only necessary file to be included from this 
> library is about 700 KB.
> I tried to keep server-side changes minimal, and graph generation is taken 
> care of by the client. Plans with more than a given number of stages 
> (default: 25) won't be displayed in order to preserve resources.
> I’d love to hear any and all input from the community about this feature: do 
> you think it’s useful, and is there anything important I’m missing?
> Thanks,
> Karen Coppage
> Review request: https://reviews.apache.org/r/61663/
> Any input is welcome!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17198) Flaky test: TestBeeLineDriver [smb_mapjoin_7]

2017-08-16 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128901#comment-16128901
 ] 

Peter Vary commented on HIVE-17198:
---

[~janulatha]: Could you give a few more details about this problem? 
When did you hit this error? Is there anything in hive.log or anywhere 
else that could help us understand the problem?

Thanks,
Peter

> Flaky test: TestBeeLineDriver [smb_mapjoin_7]
> -
>
> Key: HIVE-17198
> URL: https://issues.apache.org/jira/browse/HIVE-17198
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Janaki Lahorani
>
> Error:
> Exception running or analyzing the results of the query file: 
> org.apache.hive.beeline.QFile@4f7b68ad





[jira] [Commented] (HIVE-17322) Serialise BeeLine qtest execution to prevent flakyness

2017-08-16 Thread Barna Zsombor Klara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128859#comment-16128859
 ] 

Barna Zsombor Klara commented on HIVE-17322:


[~pvary]
BeeLineTest failures seem to have disappeared. I think we can safely assume 
that the parallelism was the source of the flakiness.

> Serialise BeeLine qtest execution to prevent flakyness
> --
>
> Key: HIVE-17322
> URL: https://issues.apache.org/jira/browse/HIVE-17322
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Minor
> Attachments: HIVE-17322.01.patch, HIVE-17322.02.patch, 
> HIVE-17322.03.patch, HIVE-17322.04.patch, HIVE-17322.05.patch
>
>






[jira] [Assigned] (HIVE-17333) Schema changes in HIVE-12274 for Oracle may not work for upgrade

2017-08-16 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam reassigned HIVE-17333:



> Schema changes in HIVE-12274 for Oracle may not work for upgrade
> 
>
> Key: HIVE-17333
> URL: https://issues.apache.org/jira/browse/HIVE-17333
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>
> According to 
> https://asktom.oracle.com/pls/asktom/f?p=100:11:0P11_QUESTION_ID:1770086700346491686
> (reported in HIVE-12274), the ALTER TABLE command that changes a column's 
> datatype from {{VARCHAR}} to {{CLOB}} may not work. The correct way to 
> accomplish this is to add a new temp column, copy the values from the current 
> column, drop the current column, and rename the new column to the old 
> column's name.
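
The four steps described above can be sketched as a small Java helper that only builds the Oracle DDL/DML strings. This is an illustration under assumptions, not the actual upgrade script: the class name, the method name, the "_TMP" suffix, and the table/column names in main() are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class ClobMigrationSketch {

    // Builds the statements for the add / copy / drop / rename sequence.
    // The "_TMP" suffix for the temporary column is an assumption.
    static List<String> varcharToClobSteps(String table, String column) {
        String tmp = column + "_TMP";
        return Arrays.asList(
            // 1. add a new temp column with the target type
            "ALTER TABLE " + table + " ADD (" + tmp + " CLOB)",
            // 2. copy the values from the current column
            "UPDATE " + table + " SET " + tmp + " = " + column,
            // 3. drop the current column
            "ALTER TABLE " + table + " DROP COLUMN " + column,
            // 4. rename the temp column to the old column's name
            "ALTER TABLE " + table + " RENAME COLUMN " + tmp + " TO " + column);
    }

    public static void main(String[] args) {
        // MY_TABLE and MY_COLUMN are placeholders, not real metastore names.
        for (String stmt : varcharToClobSteps("MY_TABLE", "MY_COLUMN")) {
            System.out.println(stmt + ";");
        }
    }
}
```

The helper only prints statements, so the sequence can be reviewed before anything is run against a metastore database.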





[jira] [Updated] (HIVE-17300) WebUI query plan graphs

2017-08-16 Thread Karen Coppage (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-17300:
-
Description: 
Hi all,

I’m working on a feature of the Hive WebUI Query Plan tab that would provide 
the option to display the query plan as a nice graph (scroll down for 
screenshots). If you click on one of the graph’s stages, the plan for that 
stage appears as text below. 

Stages are color-coded if they have a status (Success, Error, Running), and the 
rest are grayed out. Coloring is based on status already available in the 
WebUI, under the Stages tab.

There is an additional option to display stats for MapReduce tasks. This 
includes the job’s ID, tracking URL (where the logs are found), and mapper and 
reducer numbers/progress, among other info. 

The library I’m using for the graph is called vis.js (http://visjs.org/). It 
has an Apache license, and the only necessary file to be included from this 
library is about 700 KB.

I tried to keep server-side changes minimal, and graph generation is taken care 
of by the client. Plans with more than a given number of stages (default: 25) 
won't be displayed in order to preserve resources.

I’d love to hear any and all input from the community about this feature: do 
you think it’s useful, and is there anything important I’m missing?

Thanks,

Karen Coppage

Review request: https://reviews.apache.org/r/61663/
Any input is welcome!

  was:
Hi all,

I’m working on a feature of the Hive WebUI Query Plan tab that would provide 
the option to display the query plan as a nice graph (scroll down for 
screenshots). If you click on one of the graph’s stages, the plan for that 
stage appears as text below. 

Stages are color-coded if they have a status (Success, Error, Running), and the 
rest are grayed out. Coloring is based on status already available in the 
WebUI, under the Stages tab.

There is an additional option to display stats for MapReduce tasks. This 
includes the job’s ID, tracking URL (where the logs are found), and mapper and 
reducer numbers/progress, among other info. 

The library I’m using for the graph is called vis.js (http://visjs.org/). It 
has an Apache license, and the only necessary file to be included from this 
library is about 700 KB.

I tried to keep server-side changes minimal, and graph generation is taken care 
of by the client. Plans with more than a given number of stages (default: 25) 
won't be displayed in order to preserve resources.

I’d love to hear any and all input from the community about this feature: do 
you think it’s useful, and is there anything important I’m missing?

Thanks,

Karen Coppage


> WebUI query plan graphs
> ---
>
> Key: HIVE-17300
> URL: https://issues.apache.org/jira/browse/HIVE-17300
> Project: Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Karen Coppage
>Assignee: Karen Coppage
> Attachments: complete_success.png, full_mapred_stats.png, 
> graph_with_mapred_stats.png, HIVE-17300.3.patch, HIVE-17300.patch, 
> last_stage_error.png, last_stage_running.png, non_mapred_task_selected.png
>
>
> Hi all,
> I’m working on a feature of the Hive WebUI Query Plan tab that would provide 
> the option to display the query plan as a nice graph (scroll down for 
> screenshots). If you click on one of the graph’s stages, the plan for that 
> stage appears as text below. 
> Stages are color-coded if they have a status (Success, Error, Running), and 
> the rest are grayed out. Coloring is based on status already available in the 
> WebUI, under the Stages tab.
> There is an additional option to display stats for MapReduce tasks. This 
> includes the job’s ID, tracking URL (where the logs are found), and mapper 
> and reducer numbers/progress, among other info. 
> The library I’m using for the graph is called vis.js (http://visjs.org/). It 
> has an Apache license, and the only necessary file to be included from this 
> library is about 700 KB.
> I tried to keep server-side changes minimal, and graph generation is taken 
> care of by the client. Plans with more than a given number of stages 
> (default: 25) won't be displayed in order to preserve resources.
> I’d love to hear any and all input from the community about this feature: do 
> you think it’s useful, and is there anything important I’m missing?
> Thanks,
> Karen Coppage
> Review request: https://reviews.apache.org/r/61663/
> Any input is welcome!





[jira] [Updated] (HIVE-17300) WebUI query plan graphs

2017-08-16 Thread Karen Coppage (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-17300:
-
Attachment: HIVE-17300.3.patch

> WebUI query plan graphs
> ---
>
> Key: HIVE-17300
> URL: https://issues.apache.org/jira/browse/HIVE-17300
> Project: Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Karen Coppage
>Assignee: Karen Coppage
> Attachments: complete_success.png, full_mapred_stats.png, 
> graph_with_mapred_stats.png, HIVE-17300.3.patch, HIVE-17300.patch, 
> last_stage_error.png, last_stage_running.png, non_mapred_task_selected.png
>
>
> Hi all,
> I’m working on a feature of the Hive WebUI Query Plan tab that would provide 
> the option to display the query plan as a nice graph (scroll down for 
> screenshots). If you click on one of the graph’s stages, the plan for that 
> stage appears as text below. 
> Stages are color-coded if they have a status (Success, Error, Running), and 
> the rest are grayed out. Coloring is based on status already available in the 
> WebUI, under the Stages tab.
> There is an additional option to display stats for MapReduce tasks. This 
> includes the job’s ID, tracking URL (where the logs are found), and mapper 
> and reducer numbers/progress, among other info. 
> The library I’m using for the graph is called vis.js (http://visjs.org/). It 
> has an Apache license, and the only necessary file to be included from this 
> library is about 700 KB.
> I tried to keep server-side changes minimal, and graph generation is taken 
> care of by the client. Plans with more than a given number of stages 
> (default: 25) won't be displayed in order to preserve resources.
> I’d love to hear any and all input from the community about this feature: do 
> you think it’s useful, and is there anything important I’m missing?
> Thanks,
> Karen Coppage





[jira] [Updated] (HIVE-17300) WebUI query plan graphs

2017-08-16 Thread Karen Coppage (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-17300:
-
Attachment: (was: HIVE-17300.2.patch)

> WebUI query plan graphs
> ---
>
> Key: HIVE-17300
> URL: https://issues.apache.org/jira/browse/HIVE-17300
> Project: Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Karen Coppage
>Assignee: Karen Coppage
> Attachments: complete_success.png, full_mapred_stats.png, 
> graph_with_mapred_stats.png, HIVE-17300.patch, last_stage_error.png, 
> last_stage_running.png, non_mapred_task_selected.png
>
>
> Hi all,
> I’m working on a feature of the Hive WebUI Query Plan tab that would provide 
> the option to display the query plan as a nice graph (scroll down for 
> screenshots). If you click on one of the graph’s stages, the plan for that 
> stage appears as text below. 
> Stages are color-coded if they have a status (Success, Error, Running), and 
> the rest are grayed out. Coloring is based on status already available in the 
> WebUI, under the Stages tab.
> There is an additional option to display stats for MapReduce tasks. This 
> includes the job’s ID, tracking URL (where the logs are found), and mapper 
> and reducer numbers/progress, among other info. 
> The library I’m using for the graph is called vis.js (http://visjs.org/). It 
> has an Apache license, and the only necessary file to be included from this 
> library is about 700 KB.
> I tried to keep server-side changes minimal, and graph generation is taken 
> care of by the client. Plans with more than a given number of stages 
> (default: 25) won't be displayed in order to preserve resources.
> I’d love to hear any and all input from the community about this feature: do 
> you think it’s useful, and is there anything important I’m missing?
> Thanks,
> Karen Coppage





[jira] [Updated] (HIVE-17300) WebUI query plan graphs

2017-08-16 Thread Karen Coppage (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-17300:
-
Attachment: HIVE-17300.2.patch

> WebUI query plan graphs
> ---
>
> Key: HIVE-17300
> URL: https://issues.apache.org/jira/browse/HIVE-17300
> Project: Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Karen Coppage
>Assignee: Karen Coppage
> Attachments: complete_success.png, full_mapred_stats.png, 
> graph_with_mapred_stats.png, HIVE-17300.2.patch, HIVE-17300.patch, 
> last_stage_error.png, last_stage_running.png, non_mapred_task_selected.png
>
>
> Hi all,
> I’m working on a feature of the Hive WebUI Query Plan tab that would provide 
> the option to display the query plan as a nice graph (scroll down for 
> screenshots). If you click on one of the graph’s stages, the plan for that 
> stage appears as text below. 
> Stages are color-coded if they have a status (Success, Error, Running), and 
> the rest are grayed out. Coloring is based on status already available in the 
> WebUI, under the Stages tab.
> There is an additional option to display stats for MapReduce tasks. This 
> includes the job’s ID, tracking URL (where the logs are found), and mapper 
> and reducer numbers/progress, among other info. 
> The library I’m using for the graph is called vis.js (http://visjs.org/). It 
> has an Apache license, and the only necessary file to be included from this 
> library is about 700 KB.
> I tried to keep server-side changes minimal, and graph generation is taken 
> care of by the client. Plans with more than a given number of stages 
> (default: 25) won't be displayed in order to preserve resources.
> I’d love to hear any and all input from the community about this feature: do 
> you think it’s useful, and is there anything important I’m missing?
> Thanks,
> Karen Coppage





[jira] [Commented] (HIVE-17332) NullPointer exception when processing query

2017-08-16 Thread Lukas Waldmann (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128805#comment-16128805
 ] 

Lukas Waldmann commented on HIVE-17332:
---

When hive.auto.convert.join is set to true, the query finishes without an 
exception.

> NullPointer exception when processing query
> ---
>
> Key: HIVE-17332
> URL: https://issues.apache.org/jira/browse/HIVE-17332
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Lukas Waldmann
>
> Hive query:
> {code}
> select count(*) from (select * from EXM_BASE_DATA, (select max(snapshot) 
> max_snapshot from EXM_BASE_DATA) s0 where snapshot == max_snapshot) t;
> {code}
> finishes with a NullPointerException,
> while 
> {code}
> select * from EXM_BASE_DATA, (select max(snapshot) max_snapshot from 
> EXM_BASE_DATA) s0 where snapshot == max_snapshot
> {code}
> is executed without error





[jira] [Commented] (HIVE-17332) NullPointer exception when processing query

2017-08-16 Thread Lukas Waldmann (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128801#comment-16128801
 ] 

Lukas Waldmann commented on HIVE-17332:
---

snapshot is the partitioning column.
Exception thrown:
{code}
TaskAttempt 3 failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row 
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row 
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:328)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
... 17 more
Caused by: java.lang.NullPointerException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.io.Text.set(Text.java:225)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow$VarCharExtractorByValue.extract(VectorExtractRow.java:518)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRow(VectorExtractRow.java:744)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.process(VectorReduceSinkOperator.java:102)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:167)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
... 18 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 
killedTasks:1, Vertex vertex_1498932448021_153317_4_00 [Map 5] killed/failed 
due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 4, 
vertexId=vertex_1498932448021_153317_4_04, diagnostics=[Vertex received Kill 
while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, 
failedTasks:0 killedTasks:1, Vertex vertex_1498932448021_153317_4_04 [Reducer 
4] killed/failed due to:OTHER_VERTEX_FAILURE]Vertex killed, vertexName=Reducer 
3, vertexId=vertex_1498932448021_153317_4_03, diagnostics=[Vertex received Kill 
while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, 
failedTasks:0 killedTasks:1, Vertex vertex_1498932448021_153317_4_03 [Reducer 
3] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to 
VERTEX_FAILURE. failedVertices:1 killedVertices:2
{code}

> NullPointer exception when processing query
> ---
>
> Key: HIVE-17332
> URL: https://issues.apache.org/jira/browse/HIVE-17332
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Lukas Waldmann
>
> Hive query:
> {code}
> select count(*) from (select * from EXM_BASE_DATA, (select max(snapshot) 
> max_snapshot from EXM_BASE_DATA) s0 where snapshot == 

[jira] [Commented] (HIVE-17331) Path must be used as key type of the pathToAlises

2017-08-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128792#comment-16128792
 ] 

Hive QA commented on HIVE-17331:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882115/HIVE-17331.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10976 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat6]
 (batchId=7)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress (batchId=222)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6421/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6421/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6421/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882115 - PreCommit-HIVE-Build

> Path must be used as key type of the pathToAlises
> -
>
> Key: HIVE-17331
> URL: https://issues.apache.org/jira/browse/HIVE-17331
> Project: Hive
>  Issue Type: Bug
>Reporter: Oleg Danilov
>Assignee: Oleg Danilov
>Priority: Minor
> Attachments: HIVE-17331.patch
>
>
> This code looks up the pathToAliases map with a String key instead of a Path, 
> so get(String) appears to always return null.
> +*GenMapRedUtils.java*+
> {code:java}
> for (int pos = 0; pos < size; pos++) {
>   String taskTmpDir = taskTmpDirLst.get(pos);
>   TableDesc tt_desc = tt_descLst.get(pos);
>   MapWork mWork = plan.getMapWork();
>   if (mWork.getPathToAliases().get(taskTmpDir) == null) {
>     taskTmpDir = taskTmpDir.intern();
>     Path taskTmpDirPath =
>         StringInternUtils.internUriStringsInPath(new Path(taskTmpDir));
>     mWork.removePathToAlias(taskTmpDirPath);
>     mWork.addPathToAlias(taskTmpDirPath, taskTmpDir);
>     mWork.addPathToPartitionInfo(taskTmpDirPath,
>         new PartitionDesc(tt_desc, null));
>     mWork.getAliasToWork().put(taskTmpDir, topOperators.get(pos));
> {code}
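
A minimal sketch of why such a lookup misses. It assumes java.nio.file.Path as a stand-in for Hadoop's Path so that it runs without Hive on the classpath; the class and method names are hypothetical. Map.get accepts any Object, so passing a String compiles, but a String never equals a Path key and the lookup returns null.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PathKeyLookupDemo {

    // Stand-in for the pathToAliases map: keyed by Path, not by String.
    static Map<Path, List<String>> newPathToAliases(String dir) {
        Map<Path, List<String>> pathToAliases = new HashMap<>();
        List<String> aliases = new ArrayList<>();
        aliases.add(dir);
        pathToAliases.put(Paths.get(dir), aliases);
        return pathToAliases;
    }

    // Looks the directory up with a String key, as the quoted snippet does.
    static boolean foundByString(String dir) {
        Map<Path, List<String>> pathToAliases = newPathToAliases(dir);
        // Compiles because get() takes Object, but String.equals(Path) is false.
        return pathToAliases.get(dir) != null;
    }

    // Looks the directory up with a Path key, matching the map's key type.
    static boolean foundByPath(String dir) {
        Map<Path, List<String>> pathToAliases = newPathToAliases(dir);
        return pathToAliases.get(Paths.get(dir)) != null;
    }

    public static void main(String[] args) {
        String dir = "/tmp/hive/task1"; // hypothetical temp directory
        System.out.println("found by String key: " + foundByString(dir));
        System.out.println("found by Path key:   " + foundByPath(dir));
    }
}
```

If the same holds for Hadoop's Path, the null check in the quoted loop would be true on every iteration, which matches the behaviour the description suspects.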





[jira] [Commented] (HIVE-17272) when hive.vectorized.execution.enabled is true, query on empty partitioned table fails with NPE

2017-08-16 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128774#comment-16128774
 ] 

Naveen Gangam commented on HIVE-17272:
--

[~aihuaxu] It appears the NPE could occur in both init() methods as well. Have 
you investigated these?
{code}
public void init(Configuration hconf)
throws Exception {
  VectorPartitionDesc vectorPartDesc = partDesc.getVectorPartitionDesc();
...
  TypeInfo[] dataTypeInfos = vectorPartDesc.getDataTypeInfos();
{code}

Thanks
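
To illustrate the concern, here is a minimal sketch of a null guard around the dereference shown above. It is not the actual patch: PartitionDesc and VectorPartitionDesc are hypothetical stand-ins for the Hive classes, type infos are reduced to strings, and returning 0 for a missing descriptor is an assumed policy.

```java
class VectorPartitionDescGuardDemo {

    // Hypothetical stand-in for Hive's VectorPartitionDesc.
    static class VectorPartitionDesc {
        private final String[] dataTypeInfos;

        VectorPartitionDesc(String[] dataTypeInfos) {
            this.dataTypeInfos = dataTypeInfos;
        }

        String[] getDataTypeInfos() {
            return dataTypeInfos;
        }
    }

    // Hypothetical stand-in for Hive's PartitionDesc; the vector descriptor may be null.
    static class PartitionDesc {
        private final VectorPartitionDesc vectorPartitionDesc;

        PartitionDesc(VectorPartitionDesc vectorPartitionDesc) {
            this.vectorPartitionDesc = vectorPartitionDesc;
        }

        VectorPartitionDesc getVectorPartitionDesc() {
            return vectorPartitionDesc;
        }
    }

    // Mirrors the two lines from init(), with a guard before the dereference.
    static int columnCount(PartitionDesc partDesc) {
        VectorPartitionDesc vectorPartDesc = partDesc.getVectorPartitionDesc();
        if (vectorPartDesc == null) {
            // Assumed policy: treat a missing descriptor as "no columns to set up".
            return 0;
        }
        return vectorPartDesc.getDataTypeInfos().length;
    }

    public static void main(String[] args) {
        System.out.println(columnCount(new PartitionDesc(null)));
        System.out.println(columnCount(
            new PartitionDesc(new VectorPartitionDesc(new String[] {"int", "int"}))));
    }
}
```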

> when hive.vectorized.execution.enabled is true, query on empty partitioned 
> table fails with NPE
> ---
>
> Key: HIVE-17272
> URL: https://issues.apache.org/jira/browse/HIVE-17272
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.1.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17272.2.patch
>
>
> {noformat}
> set hive.vectorized.execution.enabled=true;
> CREATE TABLE `tab`(`x` int) PARTITIONED BY ( `y` int) stored as parquet;
> select * from tab t1 join tab t2 where t1.x=t2.x;
> {noformat}
> The query fails with the following exception.
> {noformat}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.createAndInitPartitionContext(VectorMapOperator.java:386)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.internalSetChildren(VectorMapOperator.java:559)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.setChildren(VectorMapOperator.java:474)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:106) 
> ~[hive-exec-2.3.0.jar:2.3.0]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_101]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101]
> at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
> ~[hadoop-common-2.6.0.jar:?]
> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) 
> ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_101]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_101]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101]
> at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) 
> ~[hadoop-common-2.6.0.jar:?]
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
> ~[hadoop-common-2.6.0.jar:?]
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413) 
> ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) 
> ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
>  ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_101]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[?:1.8.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[?:1.8.0_101]
> at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_101]
> {noformat}





[jira] [Commented] (HIVE-17322) Serialise BeeLine qtest execution to prevent flakyness

2017-08-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128699#comment-16128699
 ] 

Hive QA commented on HIVE-17322:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12882105/HIVE-17322.05.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10976 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testDelegationTokenSharedStore
 (batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6420/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6420/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6420/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12882105 - PreCommit-HIVE-Build

> Serialise BeeLine qtest execution to prevent flakyness
> --
>
> Key: HIVE-17322
> URL: https://issues.apache.org/jira/browse/HIVE-17322
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Minor
> Attachments: HIVE-17322.01.patch, HIVE-17322.02.patch, 
> HIVE-17322.03.patch, HIVE-17322.04.patch, HIVE-17322.05.patch
>
>






[jira] [Commented] (HIVE-17325) Clean up intermittently failing uni tests

2017-08-16 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128674#comment-16128674
 ] 

Peter Vary commented on HIVE-17325:
---

You are right, [~zsombor.klara]; I made a copy-paste mistake :)

> Clean up intermittently failing uni tests
> -
>
> Key: HIVE-17325
> URL: https://issues.apache.org/jira/browse/HIVE-17325
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> We have a number of intermittently failing tests.  I propose to disable these 
> so that we can get clean (or at least cleaner) CI runs.




