[jira] [Created] (HIVE-23582) LLAP: Make SplitLocationProvider impl pluggable

2020-05-29 Thread Prasanth Jayachandran (Jira)
Prasanth Jayachandran created HIVE-23582:


 Summary: LLAP: Make SplitLocationProvider impl pluggable
 Key: HIVE-23582
 URL: https://issues.apache.org/jira/browse/HIVE-23582
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


LLAP uses the HostAffinitySplitLocationProvider implementation by default. For 
non-ZooKeeper-based environments, a different split location provider may be 
used. To facilitate that, make the SplitLocationProvider implementation class 
pluggable. 
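
One common way to make an implementation class pluggable is to read a class 
name from configuration and instantiate it reflectively. The sketch below only 
illustrates that idea and is not the actual Hive change: the LocationProvider 
interface, both provider classes and the configuration key 
"example.split.location.provider.class" are hypothetical stand-ins.

```java
import java.util.Map;

class PluggableProviderSketch {

  /** Hypothetical, simplified stand-in for a split location provider interface. */
  public interface LocationProvider {
    String[] getLocations(String splitId);
  }

  /** Hypothetical default, used when no class is configured. */
  public static class DefaultProvider implements LocationProvider {
    @Override
    public String[] getLocations(String splitId) {
      return new String[] {"default-host"};
    }
  }

  /** Hypothetical alternative, e.g. for an environment without ZooKeeper. */
  public static class StaticProvider implements LocationProvider {
    @Override
    public String[] getLocations(String splitId) {
      return new String[] {"static-host"};
    }
  }

  // Hypothetical configuration key; not an actual Hive property name.
  public static final String PROVIDER_CLASS_KEY = "example.split.location.provider.class";

  /** Instantiates the configured provider, falling back to the default one. */
  public static LocationProvider create(Map<String, String> conf) {
    String className = conf.getOrDefault(PROVIDER_CLASS_KEY, DefaultProvider.class.getName());
    try {
      return Class.forName(className)
          .asSubclass(LocationProvider.class)   // rejects classes of the wrong type
          .getDeclaredConstructor()             // requires a no-argument constructor
          .newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalArgumentException("Cannot instantiate provider " + className, e);
    }
  }
}
```

With an empty configuration map, create() returns the default provider; setting 
the key to the name of another class on the classpath swaps the implementation 
without touching the calling code.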



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23581) On service discovery mode, the initial port of hiveserver2 to which zookeeper is applied is not changed.

2020-05-29 Thread shinsunwoo (Jira)
shinsunwoo created HIVE-23581:

 Summary: On service discovery mode, the initial port of 
hiveserver2 to which zookeeper is applied is not changed.
 Key: HIVE-23581
 URL: https://issues.apache.org/jira/browse/HIVE-23581
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: All Versions
Reporter: shinsunwoo


When HiveServer2 is accessed through the Hive JDBC driver with 
`hive.server2.support.dynamic.service.discovery` and 
`hive.server2.limit.connections.per.user` set, the JDBC driver randomly picks 
the connection information (host, port) of one of the HiveServer2 instances 
registered in ZooKeeper.

However, if the connection to the first HiveServer2 obtained from ZooKeeper 
fails because of the `hive.server2.limit.connections.per.user` setting, the 
port is not reset, because of the following code logic.

 

* https://github.com/apache/hive/blob/8443e50fdfa284531300f3ab283a7e4959dba623/jdbc/src/java/org/apache/hive/jdbc/ZooKeeperHiveClientHelper.java#L320

 
{code:java}
if ((matcher.group(1).equals("hive.server2.thrift.http.port"))
   && !(connParams.getPort() > 0)) {
  connParams.setPort(Integer.parseInt(matcher.group(2)));
}
{code}
 

Therefore, if the next reachable HiveServer2 listens on a different port than 
the first one, the driver keeps the stale port and a problem occurs.

So I changed the code to reset the port number to -1 whenever the update 
function (updateConnParamsFromZooKeeper) is executed.
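
The effect of the guard and of the proposed reset can be reproduced in 
isolation. The sketch below is a simplified model, not the driver code: 
ConnParams, applyServerConfig, the regular expression and the resetPortFirst 
flag are hypothetical stand-ins, and only the guarded if-block mirrors the 
snippet quoted above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PortResetSketch {

  // Hypothetical pattern for "key=value;" pairs published by a server.
  private static final Pattern KV_PATTERN = Pattern.compile("([^=;]+)=([^;]*);?");

  /** Hypothetical stand-in for the driver's connection parameters. */
  public static class ConnParams {
    private int port = -1;
    public int getPort() { return port; }
    public void setPort(int port) { this.port = port; }
  }

  /**
   * Applies one server's published configuration to connParams.
   * With resetPortFirst=true the port is cleared before parsing, which is
   * the behaviour proposed in this issue.
   */
  public static void applyServerConfig(String serverConf, ConnParams connParams,
                                       boolean resetPortFirst) {
    if (resetPortFirst) {
      connParams.setPort(-1);
    }
    Matcher matcher = KV_PATTERN.matcher(serverConf);
    while (matcher.find()) {
      // Same guard as in the quoted code: the port is only set if none is set yet.
      if (matcher.group(1).equals("hive.server2.thrift.http.port")
          && !(connParams.getPort() > 0)) {
        connParams.setPort(Integer.parseInt(matcher.group(2)));
      }
    }
  }
}
```

Calling applyServerConfig twice on the same ConnParams with ports 10001 and 
10002 leaves the port at 10001 when resetPortFirst is false, and moves it to 
10002 when resetPortFirst is true.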





[jira] [Created] (HIVE-23580) deleteOnExit set is not cleaned up, causing memory pressure

2020-05-29 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-23580:


 Summary: deleteOnExit set is not cleaned up, causing memory 
pressure
 Key: HIVE-23580
 URL: https://issues.apache.org/jira/browse/HIVE-23580
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 4.0.0
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


removeScratchDir doesn't always call cancelDeleteOnExit() on context::clear





[jira] [Created] (HIVE-23579) Introduce ReturnTypeInference for sketch functions

2020-05-29 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-23579:

 Summary: Introduce ReturnTypeInference for sketch functions
 Key: HIVE-23579
 URL: https://issues.apache.org/jira/browse/HIVE-23579
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich


Currently the return type of every sketch function is processed through the 
UDF API, which is not ideal.

A better approach would be to somehow tie in the GenericUDF#initialize method.





[jira] [Created] (HIVE-23578) Collect ignored tests

2020-05-29 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-23578:

 Summary: Collect ignored tests
 Key: HIVE-23578
 URL: https://issues.apache.org/jira/browse/HIVE-23578
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich








Re: [DISCUSS] Replace ptest with hive-test-kube

2020-05-29 Thread Zoltan Haindrich

Hey all!

The patch is now in master - so every new PR or a push on it will trigger a new 
run.

Please decide which one you would like to use: open a PR to see the new one work, or upload a patch file to the jira. Please don't do both, because in that case 2 
executions will happen.


The job execution time (2-4 hours) of a single run is a bit higher than usual 
on the ptest server - this is mostly to increase throughput.

The patch also disabled a set of tests; I will send the full list of skipped 
tests shortly.

cheers,
Zoltan


On 5/27/20 1:50 PM, Zoltan Haindrich wrote:

Hello all!

The new stuff is ready to be switched on. It needs to be merged into master 
- and after that anyone who opens a PR will get a run by the new HiveQA infra.
I propose to run the 2 systems side-by-side for some time - the regular master 
builds will start; and we will see how frequently that is polluted by flaky 
tests.

Note that the current patch also disables around 25 more tests to increase stability. To get a better overview of the disabled tests, I think the "direction of the 
information flow" should be altered. What I mean by that is: instead of just throwing in a jira for "disable test x" and opening a new one like "fix test x", only open the 
latter and place the jira reference into the ignore message; meanwhile also add a regular report about the actually disabled tests, so people who do know about the 
importance of a particular test can get involved.


Note: the builds.apache.org instance will be shut down sometime in the future as well...but I think the new one is a good-enough alternative to not have to migrate the 
Hive-precommit job over to https://ci-hadoop.apache.org/.


http://34.66.156.144:8080/job/hive-precommit/job/PR-948/5/
https://issues.apache.org/jira/browse/HIVE-22942
https://github.com/apache/hive/pull/948/files

cheers,
Zoltan

On 5/18/20 1:42 PM, Zoltan Haindrich wrote:

Hey!

On 5/18/20 11:51 AM, Zoltan Chovan wrote:

Thank you for all of your efforts, this looks really promising. With moving
to github PRs, would that also mean that we move away from the reviewboard
for code review?

I hadn't thought about that. I think using github's review interface will remain optional, because both review systems have their own strong points - I wouldn't force 
anyone to use one over the other. (For some patches reviewboard is much better, because it's able to track content moves a bit better than github - meanwhile github has 
a small feature that lets you mark files as reviewed.)
As a matter of fact, we have sometimes had patches on the jiras which had neither an RB nor a PR to review them - having a PR there will at least make it easier for 
reviewers to comment.



Also, what happens if a PR is updated? Will the tests run for both or just
for the latest version?

It will trigger a new build. If there is already a build in progress, that will prevent a new build from starting until it finishes...and there is also a 5 builds/day 
limit, which might induce some wait.


cheers,
Zoltan



Regards,
Zoltan

On Sun, May 17, 2020 at 10:51 PM Zoltan Haindrich  wrote:


Hello all!

The proposed system has become more stable lately - and I think I've
solved a few sources of flakiness.
To be really usable I also wanted to add a way to dynamically
enable/disable a set of tests (for example the replication tests take ~7
hours to execute from the total of 24 hours - and they are also a bit
unstable, so not running them when not necessary would be beneficial in
multiple ways) - but to do this the best would be to throw in junit5;
unfortunately the current ptest installation uses maven 3.0.5, which
doesn't like this kind of thing - so instead of hacking a fix for that
I've removed it from the dev branch for now.

I would like to propose to start an evaluation phase of the new test
procedures (INFRA-20269).
The process would look something like this:
* someone opens a PR - the tests will be run on the changes
* on every active branch the tests will run from time to time
    * this will produce a bunch of test runs on the master branch as well,
      which will show how well the tests behave on the master branch without
      any patches
* runs on branches (PRs or active development branches, e.g. master) will be
  rate limited to 5 builds/day
* at most ~4 builds at a time - to maximize resource usage
* turnaround time for a build is right now 2 hours - which I feel is a
  balanced choice between speed/response time

Possible future benefits:
* toggle features using github tags
* optional test groups (metastore/replication tests)
* ability to run the metastore verification tests
* possibility to add smoke tests

To enable this I will have to finish the HIVE-22942 ticket - beyond the
new Jenkinsfile which defines the full logic;
although I've sunk a lot of time into fixing all kinds of flaky tests, I
would like to disable around 25 tests.

I also would like to propose a method to verify the stability of a 

Re: Review Request 72553: HIVE-23555 Cancel compaction jobs when hive.compactor.worker.timeout is reached

2020-05-29 Thread Laszlo Pinter via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72553/#review220913
---


Ship it!

- Laszlo Pinter


On May 28, 2020, 8:58 a.m., Peter Vary wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72553/
> ---
> 
> (Updated May 28, 2020, 8:58 a.m.)
> 
> 
> Review request for hive, Karen Coppage and Laszlo Pinter.
> 
> 
> Bugs: HIVE-23555
> https://issues.apache.org/jira/browse/HIVE-23555
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Run the actual execution in a new thread, and use Future.get with timeout
> 
> 
> Diffs
> -
> 
>   hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java 569de706df
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorTestUtil.java e70d8783bc
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java 32fe535b2b
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java ecaad509ed
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java 5fa3d9ad42
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java b378d40964
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java fa2ede3738
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java aa258b331f
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/RemoteCompactorThread.java 4235184fec
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java 8180adcd66
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java 366282a30f
>   ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java 3ff68a3c7e
>   ql/src/test/org/apache/hadoop/hive/ql/stats/TestStatsUpdaterThread.java 84827d1604
>   ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/CompactorTest.java 9a9ab53fcc
>   ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestWorker.java 443f982d66
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java e20fdaf03d
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreThread.java ea6155200c
>   streaming/src/test/org/apache/hive/streaming/TestStreaming.java 6101caac66
> 
> 
> Diff: https://reviews.apache.org/r/72553/diff/2/
> 
> 
> Testing
> ---
> 
> Created unit tests to check the timeout functionality.
> 
> 
> Thanks,
> 
> Peter Vary
> 
>
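
The approach named in the review description above (run the work in a new 
thread and use Future.get with a timeout) can be sketched in plain Java as 
follows. This is a generic illustration using java.util.concurrent, not the 
code under review; the class and method names are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutSketch {

  /**
   * Runs the task on a separate thread and waits at most timeoutMs for it.
   * Returns true if the task finished in time, false if it timed out and
   * was cancelled (the worker thread is interrupted).
   */
  public static boolean runWithTimeout(Runnable task, long timeoutMs) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<?> future = executor.submit(task);
    try {
      future.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the running task
      return false;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      future.cancel(true);
      return false;
    } catch (ExecutionException e) {
      throw new IllegalStateException("Task failed", e.getCause());
    } finally {
      executor.shutdownNow();
    }
  }
}
```

Note that cancellation relies on interruption, so a task only stops early if 
it responds to interrupts.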



[jira] [Created] (HIVE-23577) Utility to generate/manage delegation token for Hive Metastore.

2020-05-29 Thread Dharmesh Jain (Jira)
Dharmesh Jain created HIVE-23577:


 Summary: Utility to generate/manage delegation token for Hive 
Metastore.
 Key: HIVE-23577
 URL: https://issues.apache.org/jira/browse/HIVE-23577
 Project: Hive
  Issue Type: Bug
  Components: Authentication, Metastore, Security, Standalone Metastore
Affects Versions: 3.0.0
 Environment: Secure(Kerberos enabled) environment.
Reporter: Dharmesh Jain


Create a utility to generate and manage delegation tokens for the Hive 
Metastore, along the same lines as DelegationTokenFetcher for HDFS.





[jira] [Created] (HIVE-23576) Getting partition of type int from metastore sometimes fail on cast error

2020-05-29 Thread Lev Katzav (Jira)
Lev Katzav created HIVE-23576:

 Summary: Getting partition of type int from metastore sometimes 
fail on cast error
 Key: HIVE-23576
 URL: https://issues.apache.org/jira/browse/HIVE-23576
 Project: Hive
  Issue Type: Bug
  Components: Hive, Standalone Metastore
Affects Versions: 3.1.2
 Environment: metastore db - postgres (tried on 9.3 and 11.5)
Reporter: Lev Katzav


+given the following situation:+

there are 2 tables (in db "intpartitionbugtest"), each with a few rows:
 # *test_table_int_1* partitioned by *y* of type *int*
 # *test_table_string_1* partitioned by *x* of type *string*

here is the output of the following query on the metastore db:
{code:sql}
select
"PARTITIONS"."PART_ID",
"TBLS"."TBL_NAME",
"FILTER0"."PART_KEY_VAL",
"PART_NAME"
from
"PARTITIONS"
inner join "TBLS" on
"PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID"
inner join "DBS" on
"TBLS"."DB_ID" = "DBS"."DB_ID"
inner join "PARTITION_KEY_VALS" "FILTER0" on
"FILTER0"."PART_ID" = "PARTITIONS"."PART_ID"
{code}
 

!image-2020-05-29-14-15-44-756.png!

+the problem+

when running a hive query on the table *test_table_int_1* that filters on *y=1*,
 the following exception is sometimes thrown on the metastore

 
{code:java}
javax.jdo.JDODataStoreException: Error executing SQL query "select 
"PARTITIONS"."PART_ID" from "PARTITIONS"  inner join "TBLS" on 
"PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ?   inner 
join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID"  and "DBS"."NAME" = ? inner 
join "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = 
"PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 where "DBS"."CTLG_NAME" 
= ?  and (((case when "FILTER0"."PART_KEY_VAL" <> ? then 
cast("FILTER0"."PART_KEY_VAL" as decimal(21,0)) else null end) = ?))".
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543) ~[datanucleus-api-jdo-4.2.4.jar:?]
at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:391) ~[datanucleus-api-jdo-4.2.4.jar:?]
at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:267) ~[datanucleus-api-jdo-4.2.4.jar:?]
at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.executeWithArray(MetaStoreDirectSql.java:2003) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:593) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:481) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.metastore.ObjectStore$11.getSqlResult(ObjectStore.java:3853) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.metastore.ObjectStore$11.getSqlResult(ObjectStore.java:3843) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:3577) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:3861) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:3516) [hive-exec-3.1.2.jar:3.1.2]
at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) [hive-exec-3.1.2.jar:3.1.2]
at com.sun.proxy.$Proxy28.getPartitionsByFilter(Unknown Source) [?:?]
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:5883) [hive-exec-3.1.2.jar:3.1.2]
at sun.reflect.GeneratedMethodAccessor69.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) [hive-exec-3.1.2.jar:3.1.2]
at com.sun.proxy.$Proxy30.get_partitions_by_filter(Unknown Source) [?:?]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_filter.getResult(ThriftHiveMetastore.java:16234) [hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_filter.getResult(ThriftHiveMetastore.java:16218)
 

Re: Review Request 72462: MSCK REPAIR cannot discover partitions with upper case directory names

2020-05-29 Thread Sankar Hariappan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72462/#review220910
---


Ship it!

- Sankar Hariappan


On May 27, 2020, 5:02 a.m., Adesh Rao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72462/
> ---
> 
> (Updated May 27, 2020, 5:02 a.m.)
> 
> 
> Review request for hive and Sankar Hariappan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The fix converts the partition keys found in the HDFS directory name to 
> lowercase, but stores the HDFS directory as is for the partition path.
> 
> 
> Diffs
> -
> 
>   itests/src/test/resources/testconfiguration.properties 92ae8c28e8 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestMsckCreatePartitionsInBatches.java 7821f40a82
>   ql/src/test/queries/clientnegative/msck_repair_5.q PRE-CREATION
>   ql/src/test/queries/clientnegative/msck_repair_6.q PRE-CREATION
>   ql/src/test/queries/clientpositive/msck_repair_4.q PRE-CREATION
>   ql/src/test/queries/clientpositive/msck_repair_5.q PRE-CREATION
>   ql/src/test/queries/clientpositive/msck_repair_6.q PRE-CREATION
>   ql/src/test/results/clientnegative/msck_repair_5.q.out PRE-CREATION
>   ql/src/test/results/clientnegative/msck_repair_6.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/msck_repair_4.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/msck_repair_5.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/msck_repair_6.q.out PRE-CREATION
>   standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/CheckResult.java 5287f47e21
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java 6f4400a8ef
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java f4e109d1b0
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java 92d10cd0e1
> 
> 
> Diff: https://reviews.apache.org/r/72462/diff/6/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Adesh Rao
> 
>
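
The behaviour described in the review request above (lowercase the partition 
keys, keep the directory path unchanged) can be illustrated with a small 
standalone sketch. This is not the patch itself: the class and method names are 
hypothetical, and details such as unescaping of partition values are left out.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class PartitionKeyCaseSketch {

  /**
   * Parses a relative partition path such as "Year=2020/Month=05" into a
   * key/value spec. Keys are lowercased; values and the path string itself
   * are left untouched, so the caller can still store the original path.
   */
  public static Map<String, String> parsePartitionSpec(String relativePath) {
    Map<String, String> spec = new LinkedHashMap<>();
    for (String component : relativePath.split("/")) {
      int eq = component.indexOf('=');
      if (eq <= 0) {
        continue; // not a key=value directory, skip it
      }
      String key = component.substring(0, eq).toLowerCase(Locale.ROOT);
      String value = component.substring(eq + 1);
      spec.put(key, value);
    }
    return spec;
  }
}
```

For the path "Year=2020/Month=05" the resulting spec has the keys "year" and 
"month", while the path string passed in keeps its original upper-case 
directory names.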