[jira] [Commented] (HIVE-20930) VectorCoalesce in FILTER mode doesn't take effect

2018-11-27 Thread Teddy Choi (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700183#comment-16700183
 ] 

Teddy Choi commented on HIVE-20930:
---

Pushed to master. Thanks, [~ashutoshc]!

> VectorCoalesce in FILTER mode doesn't take effect
> -
>
> Key: HIVE-20930
> URL: https://issues.apache.org/jira/browse/HIVE-20930
> Project: Hive
>  Issue Type: Bug
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Major
> Attachments: HIVE-20930.1.patch, HIVE-20930.2.patch, 
> HIVE-20930.3.patch
>
>
> HIVE-20277 fixed vectorized case expressions for FILTER, but VectorCoalesce 
> is still not fixed.
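
For context on what "taking effect" means here: in Hive's vectorized execution, an expression used as a FILTER must rewrite the batch's selected-row array, not merely compute an output column. The sketch below is illustrative only (not Hive's actual VectorCoalesce code; all names are made up):

{code}
// Illustrative sketch, not Hive's VectorCoalesce: a filter-mode vectorized
// expression must compress the selected[] array so only qualifying rows remain.
public class FilterModeSketch {
  public static void main(String[] args) {
    long[] colA = {1, 0, 3};     // 0 stands in for NULL in this toy example
    long[] colB = {9, 8, 7};
    int[] selected = {0, 1, 2};  // rows currently alive in the batch
    int size = 3;

    int newSize = 0;
    for (int j = 0; j < size; j++) {
      int i = selected[j];
      long coalesced = (colA[i] != 0) ? colA[i] : colB[i]; // coalesce(colA, colB)
      if (coalesced > 2) {          // the filter predicate
        selected[newSize++] = i;    // keep the row: compress selected in place
      }
    }
    // Rows at indices 1 and 2 pass. A projection-only coalesce would leave
    // selected[] untouched, so the filter would have no effect.
    System.out.println("rows passing filter: " + newSize);
  }
}
{code}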



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20794) Use Zookeeper for metastore service discovery

2018-11-27 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-20794:
--
Attachment: HIVE-20794.08
Status: Patch Available  (was: In Progress)

The last patch fixed a findbugs warning about synchronization but caused a 
failure in TestActivePassiveHA.testConnectionActivePassiveHAServiceDiscovery 
because the HiveServer2 didn't shut down within the specified time. After 
debugging it I found that this happened because of the synchronization on the 
method isDeregisteredWithZooKeeper(). This patch moves that synchronization to 
where it should be and fixes the test failure as well as the findbugs notice. 
Also updated the pull request.
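
To make the lock-scope point concrete, here is a minimal sketch (illustrative only, not the actual HIVE-20794 patch; the class and field names are assumptions): synchronizing the whole method on the object's monitor can block shutdown whenever another long-running synchronized method holds the same monitor, while guarding just the flag with a dedicated lock keeps the read cheap.

{code}
// Minimal sketch of narrowing synchronization scope (names are illustrative).
public class DeregistrationFlag {
  private final Object lock = new Object();
  private boolean deregisteredWithZooKeeper = false;

  public boolean isDeregisteredWithZooKeeper() {
    synchronized (lock) {   // held only for the flag read, never across I/O
      return deregisteredWithZooKeeper;
    }
  }

  public void setDeregisteredWithZooKeeper(boolean value) {
    synchronized (lock) {   // held only for the flag write
      deregisteredWithZooKeeper = value;
    }
  }
}
{code}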

> Use Zookeeper for metastore service discovery
> -
>
> Key: HIVE-20794
> URL: https://issues.apache.org/jira/browse/HIVE-20794
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20794.01, HIVE-20794.02, HIVE-20794.03, 
> HIVE-20794.03, HIVE-20794.04, HIVE-20794.05, HIVE-20794.06, HIVE-20794.07, 
> HIVE-20794.07, HIVE-20794.08
>
>
> Right now, multiple metastore services can be specified in 
> hive.metastore.uris configuration, but that list is static and can not be 
> modified dynamically. Use Zookeeper for dynamic service discovery of 
> metastore.
> h3. Improve ZooKeeperHiveHelper class (suggestions for name welcome)
> The Zookeeper related code (for service discovery) accesses Zookeeper 
> parameters directly from HiveConf. The class is changed so that it could be 
> used for both HiveServer2 and Metastore server and works with both the 
> configurations. The following methods from HiveServer2 are now moved into 
> ZooKeeperHiveHelper:
>  # startZookeeperClient
>  # addServerInstanceToZooKeeper
>  # removeServerInstanceFromZooKeeper
> h3. HiveMetaStore conf changes
>  # THRIFT_URIS (hive.metastore.uris) can also be used to specify a ZooKeeper 
> quorum. When THRIFT_SERVICE_DISCOVERY_MODE 
> (hive.metastore.service.discovery.mode) is set to "zookeeper", the URIs are 
> used as the ZooKeeper quorum. When it is left empty, the URIs are used to 
> locate the metastore directly.
>  # Here's a list of HiveServer2's parameters and their proposed metastore conf 
> counterparts. It looks odd that the Metastore-related configurations have 
> their macros start with THRIFT rather than METASTORE; I have just followed 
> the naming convention used for the other parameters.
>  ** HIVE_SERVER2_ZOOKEEPER_NAMESPACE - THRIFT_ZOOKEEPER_NAMESPACE 
> (hive.metastore.zookeeper.namespace)
>  ** HIVE_ZOOKEEPER_CLIENT_PORT - THRIFT_ZOOKEEPER_CLIENT_PORT 
> (hive.metastore.zookeeper.client.port)
>  ** HIVE_ZOOKEEPER_CONNECTION_TIMEOUT - THRIFT_ZOOKEEPER_CONNECTION_TIMEOUT 
> (hive.metastore.zookeeper.connection.timeout)
>  ** HIVE_ZOOKEEPER_CONNECTION_MAX_RETRIES - 
> THRIFT_ZOOKEEPER_CONNECTION_MAX_RETRIES 
> (hive.metastore.zookeeper.connection.max.retries)
>  ** HIVE_ZOOKEEPER_CONNECTION_BASESLEEPTIME - 
> THRIFT_ZOOKEEPER_CONNECTION_BASESLEEPTIME 
> (hive.metastore.zookeeper.connection.basesleeptime)
>  # An additional configuration, THRIFT_BIND_HOST, is used to specify the host 
> address to bind the Metastore service to. Right now the Metastore binds to *, 
> i.e. all addresses, and then doesn't know which of those addresses it should 
> add to ZooKeeper. THRIFT_BIND_HOST solves that problem: when this 
> configuration is specified, the metastore server binds to that address and 
> also adds it to ZooKeeper if the dynamic service discovery mode is ZooKeeper.
> The following Hive ZK configurations seem to be related to managing locks and 
> thus seem irrelevant for the metastore ZK:
>  # HIVE_ZOOKEEPER_SESSION_TIMEOUT
>  # HIVE_ZOOKEEPER_CLEAN_EXTRA_NODES
> Since there is no configuration to be published, 
> HIVE_ZOOKEEPER_PUBLISH_CONFIGS does not have a THRIFT counterpart.
> h3. HiveMetaStore class changes
>  # startMetaStore should also register the instance with Zookeeper, when 
> configured.
>  # When shutting a metastore server down it should deregister itself from 
> Zookeeper, when configured.
>  # These changes use the refactored code described above.
> h3. HiveMetaStoreClient class changes
> When the service discovery mode is zookeeper, we fetch the metastore URIs from 
> the specified ZooKeeper ensemble and treat them as if they were specified in 
> THRIFT_URIS, i.e. use the existing mechanisms to choose a metastore server to 
> connect to and establish a connection.
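
A minimal sketch of the client-side lookup described above, assuming Curator is used and that each metastore publishes itself as a host:port child znode under the namespace (both assumptions for illustration; this is not the patch itself):

{code}
// Hedged sketch: resolve metastore URIs from ZooKeeper, then hand them to the
// existing THRIFT_URIS connection logic.
import java.util.List;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class MetastoreUriResolver {
  public static List<String> fetchMetastoreUris(String zkQuorum, String namespace)
      throws Exception {
    CuratorFramework zk = CuratorFrameworkFactory.newClient(
        zkQuorum, new ExponentialBackoffRetry(1000, 3)); // base sleep ms, max retries
    zk.start();
    try {
      // Assumed layout: each live metastore is a child of /<namespace>,
      // named "host:port".
      return zk.getChildren().forPath("/" + namespace);
    } finally {
      zk.close();
    }
  }
}
{code}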



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20330) HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs

2018-11-27 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated HIVE-20330:
--
Status: Patch Available  (was: In Progress)

> HCatLoader cannot handle multiple InputJobInfo objects for a job with 
> multiple inputs
> -
>
> Key: HIVE-20330
> URL: https://issues.apache.org/jira/browse/HIVE-20330
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-20330.0.patch, HIVE-20330.1.patch, 
> HIVE-20330.2.patch, HIVE-20330.3.patch
>
>
> While running performance tests on Pig (0.12 and 0.17) we've observed a huge 
> performance drop in a workload that has multiple inputs from HCatLoader.
> The reason is that for a particular MR job with multiple Hive tables as 
> input, Pig calls {{setLocation}} on each {{LoaderFunc (HCatLoader)}} instance 
> but only one table's information (InputJobInfo instance) gets tracked in the 
> JobConf. (This is under config key {{HCatConstants.HCAT_KEY_JOB_INFO}}).
> Any such call overwrites the preexisting value, and thus only the last table's 
> information will be considered when Pig calls {{getStatistics}} to estimate 
> the required reducer count.
> In cases when there are 2 input tables, 256GB and 1MB in size respectively, 
> Pig will query the size information from HCat for both of them, but it will 
> either see 1MB+1MB=2MB or 256GB+256GB=0.5TB depending on input order in the 
> execution plan's DAG.
> It should of course see 256.00097GB in total and use 257 reducers by default 
> accordingly.
> In unlucky cases this will be seen as 2MB and 1 reducer will have to struggle 
> with the actual 256.00097GB...
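
The overwrite is easy to reproduce with a plain Hadoop Configuration; the sketch below is illustrative (the key string is written out here only for the demo; the real code uses the HCatConstants.HCAT_KEY_JOB_INFO constant):

{code}
// Hedged sketch of the clobbering described above, not the HCatalog code itself.
import org.apache.hadoop.conf.Configuration;

public class OverwriteDemo {
  public static void main(String[] args) {
    final String key = "hcat.job.info"; // illustrative stand-in for HCAT_KEY_JOB_INFO
    Configuration jobConf = new Configuration(false);
    jobConf.set(key, "serialized InputJobInfo for table A (256GB)");
    jobConf.set(key, "serialized InputJobInfo for table B (1MB)");
    // Only table B survives, so reducer estimation sees 1MB per input table.
    System.out.println(jobConf.get(key));
  }
}
{code}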



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20330) HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs

2018-11-27 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated HIVE-20330:
--
Attachment: HIVE-20330.3.patch

> HCatLoader cannot handle multiple InputJobInfo objects for a job with 
> multiple inputs
> -
>
> Key: HIVE-20330
> URL: https://issues.apache.org/jira/browse/HIVE-20330
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-20330.0.patch, HIVE-20330.1.patch, 
> HIVE-20330.2.patch, HIVE-20330.3.patch
>
>
> While running performance tests on Pig (0.12 and 0.17) we've observed a huge 
> performance drop in a workload that has multiple inputs from HCatLoader.
> The reason is that for a particular MR job with multiple Hive tables as 
> input, Pig calls {{setLocation}} on each {{LoaderFunc (HCatLoader)}} instance 
> but only one table's information (InputJobInfo instance) gets tracked in the 
> JobConf. (This is under config key {{HCatConstants.HCAT_KEY_JOB_INFO}}).
> Any such call overwrites the preexisting value, and thus only the last table's 
> information will be considered when Pig calls {{getStatistics}} to estimate 
> the required reducer count.
> In cases when there are 2 input tables, 256GB and 1MB in size respectively, 
> Pig will query the size information from HCat for both of them, but it will 
> either see 1MB+1MB=2MB or 256GB+256GB=0.5TB depending on input order in the 
> execution plan's DAG.
> It should of course see 256.00097GB in total and use 257 reducers by default 
> accordingly.
> In unlucky cases this will be seen as 2MB and 1 reducer will have to struggle 
> with the actual 256.00097GB...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20330) HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs

2018-11-27 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated HIVE-20330:
--
Attachment: (was: HIVE-20330.4.patch)

> HCatLoader cannot handle multiple InputJobInfo objects for a job with 
> multiple inputs
> -
>
> Key: HIVE-20330
> URL: https://issues.apache.org/jira/browse/HIVE-20330
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-20330.0.patch, HIVE-20330.1.patch, 
> HIVE-20330.2.patch
>
>
> While running performance tests on Pig (0.12 and 0.17) we've observed a huge 
> performance drop in a workload that has multiple inputs from HCatLoader.
> The reason is that for a particular MR job with multiple Hive tables as 
> input, Pig calls {{setLocation}} on each {{LoaderFunc (HCatLoader)}} instance 
> but only one table's information (InputJobInfo instance) gets tracked in the 
> JobConf. (This is under config key {{HCatConstants.HCAT_KEY_JOB_INFO}}).
> Any such call overwrites the preexisting value, and thus only the last table's 
> information will be considered when Pig calls {{getStatistics}} to estimate 
> the required reducer count.
> In cases when there are 2 input tables, 256GB and 1MB in size respectively, 
> Pig will query the size information from HCat for both of them, but it will 
> either see 1MB+1MB=2MB or 256GB+256GB=0.5TB depending on input order in the 
> execution plan's DAG.
> It should of course see 256.00097GB in total and use 257 reducers by default 
> accordingly.
> In unlucky cases this will be seen as 2MB and 1 reducer will have to struggle 
> with the actual 256.00097GB...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20330) HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs

2018-11-27 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated HIVE-20330:
--
Status: In Progress  (was: Patch Available)

> HCatLoader cannot handle multiple InputJobInfo objects for a job with 
> multiple inputs
> -
>
> Key: HIVE-20330
> URL: https://issues.apache.org/jira/browse/HIVE-20330
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-20330.0.patch, HIVE-20330.1.patch, 
> HIVE-20330.2.patch
>
>
> While running performance tests on Pig (0.12 and 0.17) we've observed a huge 
> performance drop in a workload that has multiple inputs from HCatLoader.
> The reason is that for a particular MR job with multiple Hive tables as 
> input, Pig calls {{setLocation}} on each {{LoaderFunc (HCatLoader)}} instance 
> but only one table's information (InputJobInfo instance) gets tracked in the 
> JobConf. (This is under config key {{HCatConstants.HCAT_KEY_JOB_INFO}}).
> Any such call overwrites the preexisting value, and thus only the last table's 
> information will be considered when Pig calls {{getStatistics}} to estimate 
> the required reducer count.
> In cases when there are 2 input tables, 256GB and 1MB in size respectively, 
> Pig will query the size information from HCat for both of them, but it will 
> either see 1MB+1MB=2MB or 256GB+256GB=0.5TB depending on input order in the 
> execution plan's DAG.
> It should of course see 256.00097GB in total and use 257 reducers by default 
> accordingly.
> In unlucky cases this will be seen as 2MB and 1 reducer will have to struggle 
> with the actual 256.00097GB...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20330) HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs

2018-11-27 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated HIVE-20330:
--
Attachment: HIVE-20330.4.patch

> HCatLoader cannot handle multiple InputJobInfo objects for a job with 
> multiple inputs
> -
>
> Key: HIVE-20330
> URL: https://issues.apache.org/jira/browse/HIVE-20330
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-20330.0.patch, HIVE-20330.1.patch, 
> HIVE-20330.2.patch
>
>
> While running performance tests on Pig (0.12 and 0.17) we've observed a huge 
> performance drop in a workload that has multiple inputs from HCatLoader.
> The reason is that for a particular MR job with multiple Hive tables as 
> input, Pig calls {{setLocation}} on each {{LoaderFunc (HCatLoader)}} instance 
> but only one table's information (InputJobInfo instance) gets tracked in the 
> JobConf. (This is under config key {{HCatConstants.HCAT_KEY_JOB_INFO}}).
> Any such call overwrites the preexisting value, and thus only the last table's 
> information will be considered when Pig calls {{getStatistics}} to estimate 
> the required reducer count.
> In cases when there are 2 input tables, 256GB and 1MB in size respectively, 
> Pig will query the size information from HCat for both of them, but it will 
> either see 1MB+1MB=2MB or 256GB+256GB=0.5TB depending on input order in the 
> execution plan's DAG.
> It should of course see 256.00097GB in total and use 257 reducers by default 
> accordingly.
> In unlucky cases this will be seen as 2MB and 1 reducer will have to struggle 
> with the actual 256.00097GB...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20794) Use Zookeeper for metastore service discovery

2018-11-27 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-20794:
--
Status: In Progress  (was: Patch Available)

> Use Zookeeper for metastore service discovery
> -
>
> Key: HIVE-20794
> URL: https://issues.apache.org/jira/browse/HIVE-20794
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20794.01, HIVE-20794.02, HIVE-20794.03, 
> HIVE-20794.03, HIVE-20794.04, HIVE-20794.05, HIVE-20794.06, HIVE-20794.07, 
> HIVE-20794.07
>
>
> Right now, multiple metastore services can be specified in 
> hive.metastore.uris configuration, but that list is static and can not be 
> modified dynamically. Use Zookeeper for dynamic service discovery of 
> metastore.
> h3. Improve ZooKeeperHiveHelper class (suggestions for name welcome)
> The Zookeeper related code (for service discovery) accesses Zookeeper 
> parameters directly from HiveConf. The class is changed so that it could be 
> used for both HiveServer2 and Metastore server and works with both the 
> configurations. The following methods from HiveServer2 are now moved into 
> ZooKeeperHiveHelper:
>  # startZookeeperClient
>  # addServerInstanceToZooKeeper
>  # removeServerInstanceFromZooKeeper
> h3. HiveMetaStore conf changes
>  # THRIFT_URIS (hive.metastore.uris) can also be used to specify a ZooKeeper 
> quorum. When THRIFT_SERVICE_DISCOVERY_MODE 
> (hive.metastore.service.discovery.mode) is set to "zookeeper", the URIs are 
> used as the ZooKeeper quorum. When it is left empty, the URIs are used to 
> locate the metastore directly.
>  # Here's a list of HiveServer2's parameters and their proposed metastore conf 
> counterparts. It looks odd that the Metastore-related configurations have 
> their macros start with THRIFT rather than METASTORE; I have just followed 
> the naming convention used for the other parameters.
>  ** HIVE_SERVER2_ZOOKEEPER_NAMESPACE - THRIFT_ZOOKEEPER_NAMESPACE 
> (hive.metastore.zookeeper.namespace)
>  ** HIVE_ZOOKEEPER_CLIENT_PORT - THRIFT_ZOOKEEPER_CLIENT_PORT 
> (hive.metastore.zookeeper.client.port)
>  ** HIVE_ZOOKEEPER_CONNECTION_TIMEOUT - THRIFT_ZOOKEEPER_CONNECTION_TIMEOUT 
> (hive.metastore.zookeeper.connection.timeout)
>  ** HIVE_ZOOKEEPER_CONNECTION_MAX_RETRIES - 
> THRIFT_ZOOKEEPER_CONNECTION_MAX_RETRIES 
> (hive.metastore.zookeeper.connection.max.retries)
>  ** HIVE_ZOOKEEPER_CONNECTION_BASESLEEPTIME - 
> THRIFT_ZOOKEEPER_CONNECTION_BASESLEEPTIME 
> (hive.metastore.zookeeper.connection.basesleeptime)
>  # An additional configuration, THRIFT_BIND_HOST, is used to specify the host 
> address to bind the Metastore service to. Right now the Metastore binds to *, 
> i.e. all addresses, and then doesn't know which of those addresses it should 
> add to ZooKeeper. THRIFT_BIND_HOST solves that problem: when this 
> configuration is specified, the metastore server binds to that address and 
> also adds it to ZooKeeper if the dynamic service discovery mode is ZooKeeper.
> The following Hive ZK configurations seem to be related to managing locks and 
> thus seem irrelevant for the metastore ZK:
>  # HIVE_ZOOKEEPER_SESSION_TIMEOUT
>  # HIVE_ZOOKEEPER_CLEAN_EXTRA_NODES
> Since there is no configuration to be published, 
> HIVE_ZOOKEEPER_PUBLISH_CONFIGS does not have a THRIFT counterpart.
> h3. HiveMetaStore class changes
>  # startMetaStore should also register the instance with Zookeeper, when 
> configured.
>  # When shutting a metastore server down it should deregister itself from 
> Zookeeper, when configured.
>  # These changes use the refactored code described above.
> h3. HiveMetaStoreClient class changes
> When the service discovery mode is zookeeper, we fetch the metastore URIs from 
> the specified ZooKeeper ensemble and treat them as if they were specified in 
> THRIFT_URIS, i.e. use the existing mechanisms to choose a metastore server to 
> connect to and establish a connection.
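
For the server side of the same design (the startMetaStore registration described above), here is a hedged sketch of how an instance might publish itself on startup; this is an illustration, not the patch, and the znode layout is an assumption. The ephemeral mode is what makes the discovery list dynamic: the znode vanishes when the metastore process dies.

{code}
// Hedged sketch: publish this metastore under the ZooKeeper namespace.
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class MetastoreZkRegistration {
  public static CuratorFramework register(String zkQuorum, String namespace,
      String hostPort) throws Exception {
    CuratorFramework zk = CuratorFrameworkFactory.newClient(
        zkQuorum, new ExponentialBackoffRetry(1000, 3));
    zk.start();
    // Ephemeral znode: removed automatically when this client's session ends,
    // so the returned client must stay open for the lifetime of the server.
    zk.create()
      .creatingParentsIfNeeded()
      .withMode(CreateMode.EPHEMERAL)
      .forPath("/" + namespace + "/" + hostPort, hostPort.getBytes("UTF-8"));
    return zk;
  }
}
{code}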



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20974) TezTask should set task exception on failures

2018-11-27 Thread Rajesh Balamohan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned HIVE-20974:
---

Assignee: Rajesh Balamohan

> TezTask should set task exception on failures
> -
>
> Key: HIVE-20974
> URL: https://issues.apache.org/jira/browse/HIVE-20974
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Trivial
>
> TezTask logs the error as "Failed to execute tez graph" and proceeds further. 
> The "TaskRunner.runSequential()" code would not be able to get these exceptions 
> for TezTask, so if there are any failure hooks configured, these exceptions 
> wouldn't show up in them.
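
A self-contained sketch of why recording the exception matters (illustrative classes, not Hive's Task/TezTask): anything that inspects the task afterwards, such as a failure hook, sees null unless the catch block stores the exception before returning the error code.

{code}
// Hedged sketch: store the failure on the task so later inspection can see it.
public class TaskSketch {
  private Throwable exception;

  public void setException(Throwable t) { exception = t; }
  public Throwable getException() { return exception; }

  public int execute() {
    try {
      throw new RuntimeException("Failed to execute tez graph");
    } catch (Exception e) {
      setException(e); // without this line a failure hook reading
                       // getException() would see null
      return 1;        // non-zero return code, as before
    }
  }

  public static void main(String[] args) {
    TaskSketch t = new TaskSketch();
    int rc = t.execute();
    System.out.println("rc=" + rc + ", exception=" + t.getException());
  }
}
{code}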



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20440) Create better cache eviction policy for SmallTableCache

2018-11-27 Thread Antal Sinkovits (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-20440:
---
Attachment: HIVE-20440.15.patch

> Create better cache eviction policy for SmallTableCache
> ---
>
> Key: HIVE-20440
> URL: https://issues.apache.org/jira/browse/HIVE-20440
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
> Attachments: HIVE-20440.01.patch, HIVE-20440.02.patch, 
> HIVE-20440.03.patch, HIVE-20440.04.patch, HIVE-20440.05.patch, 
> HIVE-20440.06.patch, HIVE-20440.07.patch, HIVE-20440.08.patch, 
> HIVE-20440.09.patch, HIVE-20440.10.patch, HIVE-20440.11.patch, 
> HIVE-20440.12.patch, HIVE-20440.13.patch, HIVE-20440.14.patch.txt, 
> HIVE-20440.15.patch
>
>
> Enhance the SmallTableCache to use a Guava cache with soft references, so that 
> entries are evicted when there is memory pressure.
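
A minimal sketch of the described approach using Guava's CacheBuilder (illustrative; not the actual SmallTableCache): soft values let the JVM reclaim cached tables when memory runs low instead of evicting on a fixed schedule.

{code}
// Hedged sketch: Guava cache whose values are held via SoftReference.
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class SmallTableCacheSketch {
  private final Cache<String, Object> cache = CacheBuilder.newBuilder()
      .softValues() // values become softly reachable; GC evicts under pressure
      .build();

  public void put(String path, Object smallTableContainer) {
    cache.put(path, smallTableContainer);
  }

  public Object get(String path) {
    return cache.getIfPresent(path); // null if absent or already collected
  }
}
{code}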



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20971) TestJdbcWithDBTokenStore[*] should both use MiniHiveKdc.getMiniHS2WithKerbWithRemoteHMSWithKerb

2018-11-27 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700120#comment-16700120
 ] 

Hive QA commented on HIVE-20971:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 10m 38s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15070/dev-support/hive-personality.sh |
| git revision | master / d3f8aba |
| Default Java | 1.8.0_111 |
| modules | C: itests/hive-minikdc U: itests/hive-minikdc |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15070/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> TestJdbcWithDBTokenStore[*] should both use 
> MiniHiveKdc.getMiniHS2WithKerbWithRemoteHMSWithKerb
> ---
>
> Key: HIVE-20971
> URL: https://issues.apache.org/jira/browse/HIVE-20971
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
> Attachments: HIVE-20971.2.patch, HIVE-20971.patch
>
>
> The original intent was to use 
> MiniHiveKdc.getMiniHS2WithKerbWithRemoteHMSWithKerb in both cases



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20330) HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs

2018-11-27 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated HIVE-20330:
--
Status: Patch Available  (was: In Progress)

> HCatLoader cannot handle multiple InputJobInfo objects for a job with 
> multiple inputs
> -
>
> Key: HIVE-20330
> URL: https://issues.apache.org/jira/browse/HIVE-20330
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-20330.0.patch, HIVE-20330.1.patch, 
> HIVE-20330.2.patch
>
>
> While running performance tests on Pig (0.12 and 0.17) we've observed a huge 
> performance drop in a workload that has multiple inputs from HCatLoader.
> The reason is that for a particular MR job with multiple Hive tables as 
> input, Pig calls {{setLocation}} on each {{LoaderFunc (HCatLoader)}} instance 
> but only one table's information (InputJobInfo instance) gets tracked in the 
> JobConf. (This is under config key {{HCatConstants.HCAT_KEY_JOB_INFO}}).
> Any such call overwrites the preexisting value, and thus only the last table's 
> information will be considered when Pig calls {{getStatistics}} to estimate 
> the required reducer count.
> In cases when there are 2 input tables, 256GB and 1MB in size respectively, 
> Pig will query the size information from HCat for both of them, but it will 
> either see 1MB+1MB=2MB or 256GB+256GB=0.5TB depending on input order in the 
> execution plan's DAG.
> It should of course see 256.00097GB in total and use 257 reducers by default 
> accordingly.
> In unlucky cases this will be seen as 2MB and 1 reducer will have to struggle 
> with the actual 256.00097GB...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20330) HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs

2018-11-27 Thread Adam Szita (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated HIVE-20330:
--
Attachment: HIVE-20330.2.patch

> HCatLoader cannot handle multiple InputJobInfo objects for a job with 
> multiple inputs
> -
>
> Key: HIVE-20330
> URL: https://issues.apache.org/jira/browse/HIVE-20330
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-20330.0.patch, HIVE-20330.1.patch, 
> HIVE-20330.2.patch
>
>
> While running performance tests on Pig (0.12 and 0.17) we've observed a huge 
> performance drop in a workload that has multiple inputs from HCatLoader.
> The reason is that for a particular MR job with multiple Hive tables as 
> input, Pig calls {{setLocation}} on each {{LoaderFunc (HCatLoader)}} instance 
> but only one table's information (InputJobInfo instance) gets tracked in the 
> JobConf. (This is under config key {{HCatConstants.HCAT_KEY_JOB_INFO}}).
> Any such call overwrites the preexisting value, and thus only the last table's 
> information will be considered when Pig calls {{getStatistics}} to estimate 
> the required reducer count.
> In cases when there are 2 input tables, 256GB and 1MB in size respectively, 
> Pig will query the size information from HCat for both of them, but it will 
> either see 1MB+1MB=2MB or 256GB+256GB=0.5TB depending on input order in the 
> execution plan's DAG.
> It should of course see 256.00097GB in total and use 257 reducers by default 
> accordingly.
> In unlucky cases this will be seen as 2MB and 1 reducer will have to struggle 
> with the actual 256.00097GB...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20969) HoS sessionId generation can cause race conditions when uploading files to HDFS

2018-11-27 Thread Peter Vary (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-20969:
--
   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master.

Thanks for the review [~stakiar]!

> HoS sessionId generation can cause race conditions when uploading files to 
> HDFS
> ---
>
> Key: HIVE-20969
> URL: https://issues.apache.org/jira/browse/HIVE-20969
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 4.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20969.2.patch, HIVE-20969.patch
>
>
> The observed exception is:
> {code}
> Caused by: java.io.FileNotFoundException: File does not exist: /tmp/hive/_spark_session_dir/0/hive-exec-2.1.1-SNAPSHOT.jar (inode 21140) [Lease.  Holder: DFSClient_NONMAPREDUCE_304217459_39, pending creates: 1]
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2781)
>   at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:599)
>   at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2660)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:872)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:550)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> {code}
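
On the race itself: if two HoS sessions generate the same small sessionId, they upload to the same /tmp/hive/_spark_session_dir/<id>/ path and collide on the HDFS lease, as in the trace above. A hedged sketch of one way out (illustrative, not necessarily the committed fix) is to derive the directory from a UUID, which cannot collide in practice:

{code}
// Hedged sketch: collision-free session directory naming.
import java.util.UUID;

public class SessionDirSketch {
  static String sessionDir(String scratchDir) {
    return scratchDir + "/_spark_session_dir/" + UUID.randomUUID();
  }

  public static void main(String[] args) {
    // Two sessions now get distinct upload paths instead of both using ".../0".
    System.out.println(sessionDir("/tmp/hive"));
    System.out.println(sessionDir("/tmp/hive"));
  }
}
{code}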



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20794) Use Zookeeper for metastore service discovery

2018-11-27 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700057#comment-16700057
 ] 

Hive QA commented on HIVE-20794:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12949613/HIVE-20794.07

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15632 tests 
executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.TestActivePassiveHA.testConnectionActivePassiveHAServiceDiscovery (batchId=259)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15069/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15069/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15069/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12949613 - PreCommit-HIVE-Build

> Use Zookeeper for metastore service discovery
> -
>
> Key: HIVE-20794
> URL: https://issues.apache.org/jira/browse/HIVE-20794
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20794.01, HIVE-20794.02, HIVE-20794.03, 
> HIVE-20794.03, HIVE-20794.04, HIVE-20794.05, HIVE-20794.06, HIVE-20794.07, 
> HIVE-20794.07
>
>
> Right now, multiple metastore services can be specified in 
> hive.metastore.uris configuration, but that list is static and can not be 
> modified dynamically. Use Zookeeper for dynamic service discovery of 
> metastore.
> h3. Improve ZooKeeperHiveHelper class (suggestions for name welcome)
> The Zookeeper related code (for service discovery) accesses Zookeeper 
> parameters directly from HiveConf. The class is changed so that it could be 
> used for both HiveServer2 and Metastore server and works with both the 
> configurations. The following methods from HiveServer2 are now moved into 
> ZooKeeperHiveHelper:
>  # startZookeeperClient
>  # addServerInstanceToZooKeeper
>  # removeServerInstanceFromZooKeeper
> h3. HiveMetaStore conf changes
>  # THRIFT_URIS (hive.metastore.uris) can also be used to specify a ZooKeeper 
> quorum. When THRIFT_SERVICE_DISCOVERY_MODE 
> (hive.metastore.service.discovery.mode) is set to "zookeeper", the URIs are 
> used as the ZooKeeper quorum. When it is left empty, the URIs are used to 
> locate the metastore directly.
>  # Here's a list of HiveServer2's parameters and their proposed metastore conf 
> counterparts. It looks odd that the Metastore-related configurations have 
> their macros start with THRIFT rather than METASTORE; I have just followed 
> the naming convention used for the other parameters.
>  ** HIVE_SERVER2_ZOOKEEPER_NAMESPACE - THRIFT_ZOOKEEPER_NAMESPACE 
> (hive.metastore.zookeeper.namespace)
>  ** HIVE_ZOOKEEPER_CLIENT_PORT - THRIFT_ZOOKEEPER_CLIENT_PORT 
> (hive.metastore.zookeeper.client.port)
>  ** HIVE_ZOOKEEPER_CONNECTION_TIMEOUT - THRIFT_ZOOKEEPER_CONNECTION_TIMEOUT 
> (hive.metastore.zookeeper.connection.timeout)
>  ** HIVE_ZOOKEEPER_CONNECTION_MAX_RETRIES - 
> THRIFT_ZOOKEEPER_CONNECTION_MAX_RETRIES 
> (hive.metastore.zookeeper.connection.max.retries)
>  ** HIVE_ZOOKEEPER_CONNECTION_BASESLEEPTIME - 
> THRIFT_ZOOKEEPER_CONNECTION_BASESLEEPTIME 
> (hive.metastore.zookeeper.connection.basesleeptime)
>  # An additional configuration, THRIFT_BIND_HOST, is used to specify the host 
> address to bind the Metastore service to. Right now the Metastore binds to *, 
> i.e. all addresses, and then doesn't know which of those addresses it should 
> add to ZooKeeper. THRIFT_BIND_HOST solves that problem: when this 
> configuration is specified, the metastore server binds to that address and 
> also adds it to ZooKeeper if the dynamic service discovery mode is ZooKeeper.
> The following Hive ZK configurations seem to be related to managing locks and 
> thus seem irrelevant for the metastore ZK:
>  # HIVE_ZOOKEEPER_SESSION_TIMEOUT
>  # HIVE_ZOOKEEPER_CLEAN_EXTRA_NODES
> Since there is no configuration to be published, 
> HIVE_ZOOKEEPER_PUBLISH_CONFIGS does not have a THRIFT counterpart.
> h3. HiveMetaStore class changes
>  # startMetaStore should also register the instance with Zookeeper, when 
> configured.
>  # When shutting a metastore server down it should deregister itself from 
> Zookeeper, when configured.
>  # These changes use the refactored code described above.
> h3. HiveMetaStoreClient class 

[jira] [Commented] (HIVE-20794) Use Zookeeper for metastore service discovery

2018-11-27 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700046#comment-16700046
 ] 

Hive QA commented on HIVE-20794:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 54s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 48s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 30s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 2m 11s{color} | {color:blue} standalone-metastore/metastore-common in master has 29 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 1m 0s{color} | {color:blue} standalone-metastore/metastore-server in master has 185 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 41s{color} | {color:blue} ql in master has 2312 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 33s{color} | {color:blue} service in master has 48 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 34s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 44s{color} | {color:blue} itests/util in master has 48 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 17s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 7s{color} | {color:green} The patch standalone-metastore passed checkstyle {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 7s{color} | {color:green} The patch metastore-common passed checkstyle {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} The patch common passed checkstyle {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 6s{color} | {color:green} The patch metastore-server passed checkstyle {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} ql: The patch generated 0 new + 17 unchanged - 4 fixed = 17 total (was 21) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} The patch service passed checkstyle {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} The patch hive-unit passed checkstyle {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} The patch util passed checkstyle {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 10m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| 
