[jira] [Created] (HIVE-23557) HiveVolcanoPlanner works with 2 different RelMetadataQuery types which may lead to problems

2020-05-27 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-23557:
---

 Summary: HiveVolcanoPlanner works with 2 different 
RelMetadataQuery types which may lead to problems
 Key: HIVE-23557
 URL: https://issues.apache.org/jira/browse/HIVE-23557
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


Switching to trace mode uncovers Volcano estimation issues that lead to the usage of 
multiple (different) RelMetadataQuery sources; the actual assertion is triggered because 
the new RelMetadataQuery gives a better estimation for the same RelNode.

The planner switches over to the default RelMetadataQuery provider after some time; 
however, this switch may cause trouble, as the already cached internal visited node 
sets become incorrect after the switch.

https://github.com/apache/hive/blob/f49d257c560c81c38259e95023b20c544acb4d10/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L2234
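
For illustration only, a minimal sketch of keeping the planner on a single metadata 
provider for the whole planning session - not the actual CalcitePlanner code, and 
assuming the Calcite APIs Hive already relies on (RelMetadataQuery.THREAD_PROVIDERS, 
JaninoRelMetadataProvider, RelOptCluster.invalidateMetadataQuery):

import org.apache.calcite.plan.RelOptCluster;
import org.apache.calcite.rel.metadata.JaninoRelMetadataProvider;
import org.apache.calcite.rel.metadata.RelMetadataProvider;
import org.apache.calcite.rel.metadata.RelMetadataQuery;

public final class MetadataQuerySetup {
  private MetadataQuerySetup() {}

  /** Pin one provider before any planning happens on this cluster. */
  public static void pinProvider(RelOptCluster cluster, RelMetadataProvider hiveProvider) {
    RelMetadataQuery.THREAD_PROVIDERS.set(JaninoRelMetadataProvider.of(hiveProvider));
    cluster.invalidateMetadataQuery(); // drop any RelMetadataQuery cached with the old provider
  }
}

The point is simply that the provider is registered once, before any RelMetadataQuery 
can be cached, so Volcano never compares estimates coming from two different providers.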




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps

2020-05-27 Thread Toshihiko Uchida (Jira)
Toshihiko Uchida created HIVE-23556:
---

 Summary: Support hive.metastore.limit.partition.request for 
get_partitions_ps
 Key: HIVE-23556
 URL: https://issues.apache.org/jira/browse/HIVE-23556
 Project: Hive
  Issue Type: Improvement
Reporter: Toshihiko Uchida


HIVE-13884 added the configuration hive.metastore.limit.partition.request to 
limit the number of partitions that can be requested.
Currently, it takes effect for the following MetaStore APIs:
* get_partitions,
* get_partitions_with_auth,
* get_partitions_by_filter,
* get_partitions_spec_by_filter,
* get_partitions_by_expr,

but not for
* get_partitions_ps,
* get_partitions_ps_with_auth.

This issue proposes to apply the configuration also to get_partitions_ps and 
get_partitions_ps_with_auth.
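
For illustration, a minimal, self-contained sketch of the kind of check the ps-style 
calls would need (the class and method names are made up and this is not the actual 
HiveMetaStore handler code; the already-covered APIs perform an equivalent check 
before returning):

import java.util.List;

public final class PartitionRequestLimit {
  private PartitionRequestLimit() {}

  /**
   * Mirrors the hive.metastore.limit.partition.request semantics:
   * a limit of -1 means unlimited, otherwise fail when the request exceeds it.
   */
  public static void check(List<?> partitions, int limit, String tableName) {
    if (limit > -1 && partitions.size() > limit) {
      throw new IllegalArgumentException("Number of partitions scanned (=" + partitions.size()
          + ") on table '" + tableName + "' exceeds limit (=" + limit + ")");
    }
  }
}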



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23558) Remove compute_stats UDAF

2020-05-27 Thread Jesus Camacho Rodriguez (Jira)
Jesus Camacho Rodriguez created HIVE-23558:
--

 Summary: Remove compute_stats UDAF
 Key: HIVE-23558
 URL: https://issues.apache.org/jira/browse/HIVE-23558
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


HIVE-23530 replaces its usage completely. This issue is to remove it from Hive.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Replace ptest with hive-test-kube

2020-05-27 Thread Zoltan Haindrich

Hello all!

The new system is ready to be switched on. It needs to be merged into master - and after 
that, anyone who opens a PR will get a run by the new HiveQA infra.
I propose to run the two systems side by side for some time - the regular master builds 
will start, and we will see how frequently they are polluted by flaky tests.

Note that the current patch also disables around ~25 more tests to increase stability. To get a better overview of the disabled tests, I think the "direction of the 
information flow" should be altered; what I mean by that is: instead of throwing in a jira for "disable test x" and opening a new one like "fix test x", only open the 
latter and place the jira reference into the ignore message (see the example below); meanwhile, also publish a regular report about the currently disabled tests - so people who 
know about the importance of a particular test can get involved.
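
To make that concrete, a hypothetical example of the convention (the test name and 
jira id are made up):

// Hypothetical example: only the "fix test x" jira exists, and its id is carried
// in the ignore message so a recurring report over disabled tests can link to it.
import org.junit.Ignore;
import org.junit.Test;

public class SomeFlakyTest {

  @Ignore("HIVE-99999: flaky on the new CI, re-enable once the fix jira is resolved")
  @Test
  public void testSomethingUnstable() {
    // test body unchanged; only the annotation is added when disabling
  }
}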


Note: the builds.apache.org instance will be shut down at some point in the future as well...but I think the new one is a good-enough alternative, so we don't have to migrate the 
Hive-precommit job over to https://ci-hadoop.apache.org/.


http://34.66.156.144:8080/job/hive-precommit/job/PR-948/5/
https://issues.apache.org/jira/browse/HIVE-22942
https://github.com/apache/hive/pull/948/files

cheers,
Zoltan

On 5/18/20 1:42 PM, Zoltan Haindrich wrote:

Hey!

On 5/18/20 11:51 AM, Zoltan Chovan wrote:

Thank you for all of your efforts, this looks really promising. With the move
to github PRs, would that also mean that we move away from reviewboard
for code review?
I hadn't thought about that. I think using github's review interface will remain optional, because both review systems have their own strong points - I wouldn't force anyone 
to use one over the other. (For some patches reviewboard is much better, because it tracks content moves a bit better than github - meanwhile, github has a small 
feature that lets you mark files as reviewed.)
As a matter of fact, we sometimes had patches on the jiras which had neither an RB nor a PR to review them - having a PR there will at least make it easier for 
reviewers to comment.



Also, what happens if a PR is updated? Will the tests run for both or just
for the latest version?
It will trigger a new build - if there is already a build in progress, that will prevent the new build from starting until it finishes...and there is also a 5 builds/day 
limit, which might cause some waiting.


cheers,
Zoltan



Regards,
Zoltan

On Sun, May 17, 2020 at 10:51 PM Zoltan Haindrich  wrote:


Hello all!

The proposed system has become more stable lately - and I think I've
solved a few sources of flakiness.
To be really usable, I also wanted to add a way to dynamically
enable/disable a set of tests (for example, the replication tests take ~7
hours to execute out of the total of 24 hours - and they are also a bit
unstable, so not running them when not necessary would be beneficial in
multiple ways). The best way to do this would be to bring in junit5;
unfortunately, the current ptest installation uses maven 3.0.5, which
doesn't like this kind of thing - so instead of hacking a fix for that
I've removed it from the dev branch for now.
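
For reference, the junit5 mechanism meant above would look roughly like this (the
tag name is made up); tagged tests can then be included or excluded per run via
surefire's -Dgroups / -DexcludedGroups without touching the code:

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

// Tagging the whole class puts every test in it into the "replication" group,
// which a build can then opt in to or skip.
@Tag("replication")
class ReplicationScenarioTest {

  @Test
  void testBootstrapReplication() {
    // long-running replication scenario would go here
  }
}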

I would like to propose starting an evaluation phase of the new test
procedures (INFRA-20269).
The process would look something like this:
* someone opens a PR - the tests will be run on the changes
* on every active branch the tests will run from time to time
    * this will produce a bunch of test runs on the master branch as well,
which will show how well the tests behave on the master branch without any
patches
* runs on branches (PRs or active development branches (eg: master)) will be
rate limited to 5 builds/day
* at most ~4 builds at a time - to maximize resource usage
* turnaround time for a build is right now 2 hours - which I feel is a
balanced choice between speed and response time

Possible future benefits:
* toggle features using github tags
* optional testgroups (metastore/replication) tests
* ability to run the metastore verification tests
* possibility to add smoke tests

To enable this I will have to finish the HIVE-22942 ticket - beyond the
new Jenkinsfile, which defines the full logic;
although I've sunk a lot of time into fixing all kinds of flaky tests, I
would still like to disable around ~25 tests.

I also would like to propose a method to verify the stability of a single
test: run it 100 times in series in the same place where the precommit
tests are running.
This will put the bar high enough that only totally stable tests could
satisfy it (a 99%-stable test has a 0.99^100 ≈ 36% chance of passing all
100 runs without being caught :D).
Once this is in service it could be used to validate that an existing test
is unstable (before disabling it) - and then used again, when re-enabling
it, to prove that it got fixed.

Please let me know what you think!

cheers,
Zoltan



On 4/29/20 4:28 PM, Zoltan Haindrich wrote:

Hey All!

I was planning to replace the ptest stuff 

q test changes

2020-05-27 Thread Miklos Gergely
Hi,

Due to a recent modification, the q tests in hive are executed by
TestMiniLlapLocalCliDriver by default, i.e. if they are not declared to be
handled otherwise in testconfiguration.properties. Also, from now on, q
tests that are to be executed with TestCliDriver must be declared in the
mr.query.files section of testconfiguration.properties.

Please also note that a check was added to the build process which, from now
on, enforces that the tests specified in any section
of testconfiguration.properties must be written in lexicographical order.
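
For illustration, an excerpt of what such a section could look like (the q file
names are made up); note the lexicographical order:

# q tests that should still run with TestCliDriver; everything not listed in
# any section defaults to TestMiniLlapLocalCliDriver.
mr.query.files=alpha_test.q,\
  beta_test.q,\
  gamma_test.q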

Thanks

-- 
*Miklós Gergely* | Staff Software Engineer
t. +36 (30) 579-6433
cloudera.com 
--


[jira] [Created] (HIVE-23559) Optimise Hive::moveAcidFiles for cloud storage

2020-05-27 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-23559:
---

 Summary: Optimise Hive::moveAcidFiles for cloud storage
 Key: HIVE-23559
 URL: https://issues.apache.org/jira/browse/HIVE-23559
 Project: Hive
  Issue Type: Bug
Reporter: Rajesh Balamohan


[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L4752]

It ends up transferring the DELTA, DELETE_DELTA and BASE prefixes sequentially from 
the staging to the final location.

This causes delays even for simple update statements that update a small number of 
records in cloud storage.
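
A minimal, self-contained sketch of the general idea (not the actual Hive::moveAcidFiles 
change): submit the individual moves to a small pool and wait for all of them instead of 
performing them one after another; on object stores every rename is a copy, so the 
sequential variant pays that latency once per file group.

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public final class ParallelMove {
  private ParallelMove() {}

  public static void moveAll(List<Path> sources, Path targetDir, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Path>> pending = new ArrayList<>();
      for (Path src : sources) {
        // each move runs on its own thread; Files.move returns the target path
        pending.add(pool.submit(() ->
            Files.move(src, targetDir.resolve(src.getFileName()),
                StandardCopyOption.REPLACE_EXISTING)));
      }
      for (Future<Path> f : pending) {
        f.get(); // propagate any failure from the individual moves
      }
    } finally {
      pool.shutdown();
    }
  }
}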



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Review Request 72553: HIVE-23555 Cancel compaction jobs when hive.compactor.worker.timeout is reached

2020-05-27 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72553/
---

Review request for hive, Karen Coppage and Laszlo Pinter.


Bugs: HIVE-23555
https://issues.apache.org/jira/browse/HIVE-23555


Repository: hive-git


Description
---

Run the actual execution in a new thread, and use Future.get with timeout
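
For context, a self-contained illustration of that pattern (class and method names are 
made up; this is not the Worker patch itself): run the work on a separate thread, wait 
at most the configured timeout, and cancel the task if it does not finish in time.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class TimedExecution {
  private TimedExecution() {}

  /** Returns true if the task finished within timeoutMs, false if it was cancelled. */
  public static boolean runWithTimeout(Runnable task, long timeoutMs) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<?> future = executor.submit(task);
    try {
      future.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the worker thread once the timeout is reached
      return false;
    } finally {
      executor.shutdownNow();
    }
  }
}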


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java 
b378d40964 
  
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/RemoteCompactorThread.java 
4235184fec 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java 8180adcd66 
  ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/CompactorTest.java 
9a9ab53fcc 
  ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestWorker.java 
443f982d66 


Diff: https://reviews.apache.org/r/72553/diff/1/


Testing
---

Created unit tests to check the timeout functionality.


Thanks,

Peter Vary



Re: Review Request 72526: HIVE-23493

2020-05-27 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72526/#review220896
---




ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
Lines 54 (patched)


Cool comment! :)



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
Lines 99 (patched)


Could you add some more comments within this method (even if they are 
short) just going over the overall workflow?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
Lines 104 (patched)


What if root has multiple inputs? This may not be supported right now, but 
to ensure correctness, add a check on the number of inputs and bail out 
otherwise.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
Lines 129 (patched)


_SourceTable_ class name is a bit misleading. Is this another mapping? 
_SourceTableMapping_?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
Lines 204 (patched)


One way to extend this to more complex expressions is to use a RexVisitor 
to gather the RexTableInputRef(s) in the expression.
Then if they all come from the same table, you execute logic similar to 
what you are doing now.
However, the _mapping_ will be from position to RexNode.
When you are checking whether the keys are present, you would need to check 
for the TABLE_INPUT_REF RexNodes in those targets.
Finally, a RexShuttle can be used to replace RexTableInputRefs by the 
corresponding RexInputRef after additional join.
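
A sketch of the RexVisitor part of this suggestion, assuming Calcite's RexVisitorImpl 
(the collector class name here is made up): walk an arbitrary expression and gather 
every table input reference it touches.

import java.util.HashSet;
import java.util.Set;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexTableInputRef;
import org.apache.calcite.rex.RexVisitorImpl;

public final class TableInputRefCollector extends RexVisitorImpl<Void> {
  private final Set<RexTableInputRef> refs = new HashSet<>();

  private TableInputRefCollector() {
    super(true); // deep traversal into operands
  }

  public static Set<RexTableInputRef> collect(RexNode expression) {
    TableInputRefCollector collector = new TableInputRefCollector();
    expression.accept(collector);
    return collector.refs;
  }

  @Override
  public Void visitTableInputRef(RexTableInputRef ref) {
    refs.add(ref);
    return null;
  }
}

The position-to-RexNode mapping and the RexShuttle replacement described above would 
then build on the collected refs.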



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
Lines 226 (patched)


It seems problematic to skip the _cast_ here. Can't this potentially 
produce incorrect results?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
Lines 250 (patched)


Nit. Renaming RelNodes to _leftInput_ and _rightInput_ may make things more 
clear.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
Lines 283 (patched)


Doesn't it make things easier if we just keep a MultiMap here? It seems you 
could potentially get rid of the bit sets too.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
Lines 284 (patched)


These bitsets, as well as _targetFields_ below, are not actually treated as immutable 
(you are calling _set_ on them). Iirc the _set_ operation instantiates 
a new bitset under the hood and returns it, hence I believe this may not be 
correct (the actual bitset is not changing). Even if it did, it would not be very 
efficient.
If you are keeping mutable bitsets, the better choice is probably to use 
the mutable BitSet.
Otherwise, the best option is to try to make this whole class immutable, 
e.g., use an ImmutableBitSet.Builder in _getExpressionLineageOf_ and then pass 
an ImmutableBitSet in the constructor.
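
To illustrate the point (a standalone example, not the patched code): set() does not 
mutate the receiver, it returns a new bit set, so the result must be captured - or the 
set built once up front via the builder.

import org.apache.calcite.util.ImmutableBitSet;

public final class BitSetExample {
  private BitSetExample() {}

  public static void main(String[] args) {
    ImmutableBitSet bits = ImmutableBitSet.of();
    bits.set(3);                      // no effect: the returned copy is dropped
    bits = bits.set(3);               // correct, but allocates a new bit set per call

    ImmutableBitSet.Builder builder = ImmutableBitSet.builder();
    builder.set(3);
    builder.set(7);
    ImmutableBitSet built = builder.build();  // single immutable result
    System.out.println(bits + " " + built);
  }
}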



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinRule.java
Lines 35 (patched)


We should be able to reuse some of the code below if this rule extends 
_HiveRelFieldTrimmerRule_. You can have a trimmer instance in the super class 
and pass it in the constructor. Then this subclass will only provide the trimmer 
implementation while reusing the rest of the logic. Does that make sense?



ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
Lines 2389 (patched)


Unless I am mistaken, this will execute _HiveCardinalityPreservingJoinRule_ 
at least twice.
Although maybe this is obvious, is _HiveProjectMerge_ actually needed?
If it is, a better choice is to use a different partial program for it.


- Jesús Camacho Rodríguez


On May 26, 2020, 6:29 p.m., Krisztian Kasa wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72526/
> ---
> 
> (Updated May 26, 2020, 6:29 p.m.)
> 
> 

Review Request 72544: Remove hcatalog streaming

2020-05-27 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72544/
---

Review request for hive and prasanthj.


Bugs: HIVE-19926
https://issues.apache.org/jira/browse/HIVE-19926


Repository: hive-git


Description
---

Remove hcatalog streaming


Diffs
-

  hcatalog/pom.xml c1506d8dc2 
  hcatalog/streaming/pom.xml fef5bf75ab 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java
 bc99b6c824 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/ConnectionError.java
 897b82634b 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/DelimitedInputWriter.java
 85c3429329 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/HeartBeatFailure.java
 5d9b763c84 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/HiveEndPoint.java
 dc8c6636bc 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/ImpersonationFailed.java
 7932077595 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/InvalidColumn.java
 a7af608d76 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/InvalidPartition.java
 82b6db8ddd 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/InvalidTable.java
 9772c5c7bd 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/InvalidTrasactionState.java
 921d4dac9e 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/PartitionCreationFailed.java
 1913d33542 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/QueryFailedException.java
 f78be7f7ff 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/RecordWriter.java
 0f3c0bcfea 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/SerializationError.java
 33d2ceffe7 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/StreamingConnection.java
 3af9aed36b 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/StreamingException.java
 421eaf079e 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/StreamingIOFailure.java
 247424f185 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/StrictJsonWriter.java
 d588f71a5c 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/StrictRegexWriter.java
 28406d38e8 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/TransactionBatch.java
 96aae02170 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/TransactionBatchUnAvailable.java
 ae3587edc9 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/TransactionError.java
 d438447b18 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/HiveConfFactory.java
 ebe032d705 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/UgiMetaStoreClientFactory.java
 615fc1a751 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/AcidTable.java
 40de497b08 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/AcidTableSerializer.java
 43ac527e79 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/ClientException.java
 206a0ba54e 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/ConnectionException.java
 2b3b299f54 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/MutatorClient.java
 11664f6a7d 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/MutatorClientBuilder.java
 1575d8d4a4 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/TableType.java
 02c9e69605 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/Transaction.java
 e1c6735d6d 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/TransactionException.java
 21cffa12a9 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/lock/HeartbeatFactory.java
 ba0fa1e149 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/lock/HeartbeatTimerTask.java
 81f99de390 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/lock/Lock.java
 88970da3a5 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/lock/LockException.java
 bce232a883 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/lock/LockFailureListener.java
 a3845ea784 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/doc-files/system-overview.dot
 c5a8dbdf1c 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/package.html
 7bc75c0ee0 
  
hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/worker/BucketIdException.java
 040fce3ec7 
  

[jira] [Created] (HIVE-23552) TestMiniLlapCliDriver.testCliDriver[merge_test_dummy_operator] is unstable

2020-05-27 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-23552:
---

 Summary: 
TestMiniLlapCliDriver.testCliDriver[merge_test_dummy_operator] is unstable
 Key: HIVE-23552
 URL: https://issues.apache.org/jira/browse/HIVE-23552
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23553) Bump ORC version

2020-05-27 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-23553:
-

 Summary: Bump ORC version
 Key: HIVE-23553
 URL: https://issues.apache.org/jira/browse/HIVE-23553
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Make sure we are using one of the more recent orc.version values that include 
row filtering (ORC-577, ORC-622).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23554) [LLAP] ReadPipeline support for ColumnVectorBatch with FilterContext

2020-05-27 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-23554:
-

 Summary: [LLAP] ReadPipeline support for ColumnVectorBatch with 
FilterContext
 Key: HIVE-23554
 URL: https://issues.apache.org/jira/browse/HIVE-23554
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


Currently the readPipeline in LLAP supports consuming ColumnVectorBatches.
As each batch can now be tied to a Filter (HIVE-23215), we should update the 
pipeline to consume BatchWrappers of a ColumnVectorBatch and a Filter instead.
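
A minimal sketch of such a wrapper (the name and generic signature are hypothetical, 
not the actual HIVE-23554 patch): the read pipeline would hand downstream a batch 
together with the filter that applies to it, instead of the bare batch.

public final class BatchWrapper<B, F> {
  private final B batch;
  private final F filter;   // e.g. the per-batch filter introduced by HIVE-23215

  public BatchWrapper(B batch, F filter) {
    this.batch = batch;
    this.filter = filter;
  }

  public B getBatch() { return batch; }
  public F getFilter() { return filter; }
}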



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23555) Cancel compaction jobs when hive.compactor.worker.timeout is reached

2020-05-27 Thread Peter Vary (Jira)
Peter Vary created HIVE-23555:
-

 Summary: Cancel compaction jobs when hive.compactor.worker.timeout 
is reached
 Key: HIVE-23555
 URL: https://issues.apache.org/jira/browse/HIVE-23555
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Reporter: Peter Vary
Assignee: Peter Vary


Currently, when a compactor worker thread is stuck or has been working too long on a 
compaction, the initiator might decide to start a new compaction because of 
a timeout, but the old worker might still wait for the results of the job.
It would be good to cancel the worker as well after the timeout is reached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)