[jira] [Created] (HIVE-23530) Use SQL functions instead of compute_stats UDAF to compute column statistics

2020-05-21 Thread Jesus Camacho Rodriguez (Jira)
Jesus Camacho Rodriguez created HIVE-23530:
--

 Summary: Use SQL functions instead of compute_stats UDAF to 
compute column statistics
 Key: HIVE-23530
 URL: https://issues.apache.org/jira/browse/HIVE-23530
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Currently we compute column statistics by relying on the {{compute_stats}} 
UDAF. For instance, for a given table {{tbl}}, the query to compute statistics 
for columns is translated internally into:
{code}
SELECT compute_stats(c1),
   compute_stats(c2),
   ...
FROM tbl;
{code}
{{compute_stats}} produces data for the stats available for each column type, 
e.g., struct<"max":long,"min":long,"countnulls":long,...>.

This issue is to produce a query that relies purely on SQL functions instead:
{code}
SELECT max(c1), min(c1), count(case when c1 is null then 1 else null end),
   ...
FROM tbl;
{code}

This will allow us to deprecate the {{compute_stats}} UDAF since it mostly 
duplicates functionality found in those other functions. Additionally, many of 
those functions already provide a vectorized implementation so the approach 
could potentially improve the performance of column stats collection.
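
As a rough illustration of the rewrite described above, the following sketch builds the SQL-function form of the statistics query for a list of columns. The class and method names are hypothetical and not part of Hive; only the three aggregates shown in the example query are generated, and the real set of statistics depends on the column type.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Hive. */
final class ColumnStatsQuerySketch {

    /** Builds a stats query that uses plain SQL aggregates for each column. */
    static String buildStatsQuery(String table, List<String> columns) {
        List<String> exprs = new ArrayList<>();
        for (String c : columns) {
            exprs.add("max(" + c + ")");
            exprs.add("min(" + c + ")");
            // Null count expressed exactly as in the example query above.
            exprs.add("count(case when " + c + " is null then 1 else null end)");
        }
        return "SELECT " + String.join(", ", exprs) + " FROM " + table;
    }

    public static void main(String[] args) {
        List<String> cols = new ArrayList<>();
        cols.add("c1");
        cols.add("c2");
        System.out.println(buildStatsQuery("tbl", cols));
    }
}
```

Because each statistic becomes an ordinary aggregate call, any vectorized implementation of those aggregates would be picked up without special handling.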



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23529) CTAS is broken for uniontype when row_deserialize

2020-05-21 Thread Mustafa Iman (Jira)
Mustafa Iman created HIVE-23529:
---

 Summary: CTAS is broken for uniontype when row_deserialize
 Key: HIVE-23529
 URL: https://issues.apache.org/jira/browse/HIVE-23529
 Project: Hive
  Issue Type: Bug
Reporter: Mustafa Iman
Assignee: Mustafa Iman


CTAS queries fail when there is a uniontype in the source table and 
hive.vectorized.use.vector.serde.deserialize=false.

ObjectInspectorUtils.copyToStandardObject in the ROW_DESERIALIZE path extracts 
the value from the union type. However, VectorAssignRow expects a StandardUnion 
object, causing a ClassCastException for any CTAS query.
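
The failure mode can be pictured with a small stand-alone sketch. The types below are hypothetical stand-ins, not Hive classes: one step unwraps a tagged union and hands over only the inner value, while the consumer still casts to the wrapper type.

```java
/** Illustration only; TaggedUnion is a hypothetical stand-in, not a Hive class. */
final class UnionCastSketch {

    /** Minimal tagged union wrapper: a tag plus the wrapped value. */
    static final class TaggedUnion {
        final byte tag;
        final Object value;

        TaggedUnion(byte tag, Object value) {
            this.tag = tag;
            this.value = value;
        }
    }

    /** Mimics a copy step that unwraps the union and returns only the value. */
    static Object copyUnwrapping(TaggedUnion u) {
        return u.value;
    }

    /** Mimics a consumer that expects the wrapper and casts to it. */
    static byte readTag(Object o) {
        return ((TaggedUnion) o).tag;
    }

    /** Returns true if the consumer fails with a ClassCastException. */
    static boolean failsWithClassCast(Object o) {
        try {
            readTag(o);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        TaggedUnion u = new TaggedUnion((byte) 0, 42);
        System.out.println(failsWithClassCast(u));                 // wrapper preserved
        System.out.println(failsWithClassCast(copyUnwrapping(u))); // wrapper lost
    }
}
```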





[jira] [Created] (HIVE-23528) TestJdbcWithServiceDiscovery is unstable

2020-05-21 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-23528:
---

 Summary: TestJdbcWithServiceDiscovery is unstable
 Key: HIVE-23528
 URL: https://issues.apache.org/jira/browse/HIVE-23528
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


{code}
Error Message
tExecute
Stacktrace
java.lang.AssertionError: tExecute
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertNotNull(Assert.java:621)
at 
org.apache.hive.jdbc.TestJdbcWithServiceDiscovery.testKillQueryWithDifferentServer(TestJdbcWithServiceDiscovery.java:271)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
Standard Output
{code}





Re: Review Request 72532: HIVE-23495 AcidUtils.getAcidState cleanup

2020-05-21 Thread Peter Varga via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72532/
---

(Updated May 21, 2020, 1:23 p.m.)


Review request for hive, Karen Coppage and Peter Vary.


Repository: hive-git


Description
---

Since HIVE-21225 there are two redundant implementations of 
AcidUtils.getAcidState.

The previous implementation (without the recursive listing) can be removed.

Performance can also be improved by removing unnecessary fileStatus calls.


Diffs (updated)
-

  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
 569de706df 
  
hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/mutate/StreamingAssert.java
 86f762e97c 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java bf332bc0b8 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java ca234cfb37 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 1059cb227f 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
16c915959c 
  
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
 598220b0c4 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java 5fa3d9ad42 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
018c73376f 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java fa2ede3738 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MinorQueryCompactor.java 
d83a50f555 
  
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MmMajorQueryCompactor.java 
5e11d8d2d8 
  
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MmMinorQueryCompactor.java 
1bdec7df2d 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java a96cf1e731 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java 366282a30f 
  ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java 9e6d47ebc5 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 
12a15a16eb 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcRawRecordMerger.java 
f63c40a7b5 
  streaming/src/test/org/apache/hive/streaming/TestStreaming.java 6101caac66 


Diff: https://reviews.apache.org/r/72532/diff/2/

Changes: https://reviews.apache.org/r/72532/diff/1-2/


Testing
---


Thanks,

Peter Varga



Review Request 72538: TestMiniLlapLocalCliDriver should be the default driver for q tests

2020-05-21 Thread Miklos Gergely

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72538/
---

Review request for hive, Jesús Camacho Rodríguez and Krisztian Kasa.


Bugs: HIVE-23510
https://issues.apache.org/jira/browse/HIVE-23510


Repository: hive-git


Description
---

Set TestMiniLlapLocalCliDriver as the default driver. For now the few tests 
still processed by TestCliDriver should be marked in the 
testconfiguration.properties, until it is completely eliminated.


Diffs
-

  itests/src/test/resources/testconfiguration.properties e7c3e432ee 
  itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java 
d7f519fac2 
  ql/src/test/results/clientpositive/llap/quotedid_basic_standard.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/quotedid_basic_standard.q.out 2374dc84ea 


Diff: https://reviews.apache.org/r/72538/diff/1/


Testing
---


Thanks,

Miklos Gergely



[jira] [Created] (HIVE-23527) CUME_DIST should be treated as a function returning double

2020-05-21 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-23527:
---

 Summary: CUME_DIST should be treated as a function returning 
double
 Key: HIVE-23527
 URL: https://issues.apache.org/jira/browse/HIVE-23527
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


Right now it is handled as an integer in Calcite, which might cause some trouble 
(at least for me).

Interestingly enough, it seems that only percent_rank has the double type:

https://github.com/apache/hive/blob/b047cfae872244e769f5d1c3c11811d1c49c19ad/ql/src/java/org/apache/hadoop/hive/ql/parse/type/HiveFunctionHelper.java#L392

This change goes back to HIVE-9133, and it seems to me that removing this 
tweak just works, so I'll submit a patch to remove this conditional.
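
To show why the return type matters, here is a minimal sketch of the usual CUME_DIST definition: the number of rows with a value less than or equal to the current row's value, divided by the total number of rows. The class is hypothetical and assumes the input is already sorted ascending; it is not how Hive or Calcite implement the function.

```java
/** Illustration only; assumes values are sorted in ascending order. */
final class CumeDistSketch {

    /** Returns CUME_DIST for each row: (rows with value <= current) / total rows. */
    static double[] cumeDist(int[] sortedValues) {
        int n = sortedValues.length;
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            int j = i;
            // Advance to the last row that shares the current value (its peers).
            while (j + 1 < n && sortedValues[j + 1] == sortedValues[i]) {
                j++;
            }
            out[i] = (double) (j + 1) / n;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] d = cumeDist(new int[] {10, 20, 20, 40});
        for (double v : d) {
            // Prints the double result next to what an integer type would keep.
            System.out.println(v + " -> as integer: " + (int) v);
        }
    }
}
```

For the values 10, 20, 20, 40 the results are 0.25, 0.75, 0.75 and 1.0; typed as an integer, the first three collapse to 0.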





[jira] [Created] (HIVE-23526) Out of sequence seen in Beeline may swallow the real problem

2020-05-21 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-23526:
--

 Summary: Out of sequence seen in Beeline may swallow the real 
problem 
 Key: HIVE-23526
 URL: https://issues.apache.org/jira/browse/HIVE-23526
 Project: Hive
  Issue Type: Improvement
  Components: Beeline
 Environment: Hive 1.2.2
Reporter: Zhihua Deng


Sometimes we see an 'out of sequence response' message in Beeline, for example:

Error: org.apache.thrift.TApplicationException: CloseOperation failed: out of 
sequence response (state=08S01,code=0)
java.sql.SQLException: org.apache.thrift.TApplicationException: CloseOperation 
failed: out of sequence response
at 
org.apache.hive.jdbc.HiveStatement.closeClientOperation(HiveStatement.java:198)
at org.apache.hive.jdbc.HiveStatement.close(HiveStatement.java:217)
at org.apache.hive.beeline.Commands.execute(Commands.java:891)
at org.apache.hive.beeline.Commands.sql(Commands.java:713)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:976)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:816)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:774)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:487)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:470)

There are no other useful messages to help figure out the cause, which makes 
the problem puzzling, as Beeline does not have concurrency problems on the 
underlying Thrift transport.
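
One way such a message can hide an earlier failure is sketched below. Thrift clients tag each request with a sequence id and reject a reply whose id does not match the latest request. The toy client here is hypothetical (it is not the Thrift or Hive JDBC code), and the scenario is an assumption for illustration: an earlier call is abandoned before its reply is read, so the next call picks up the stale reply.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustration only; a toy client, not the Thrift or Hive JDBC implementation. */
final class SeqIdSketch {

    static final class Response {
        final int seqId;
        final String payload;

        Response(int seqId, String payload) {
            this.seqId = seqId;
            this.payload = payload;
        }
    }

    /** Toy client reading replies in arrival order from a single connection. */
    static final class Client {
        private final Deque<Response> wire = new ArrayDeque<>();
        private int seqId = 0;

        /** Sends a request and returns the sequence id assigned to it. */
        int send() {
            return ++seqId;
        }

        /** Simulates the server writing a reply for the given request. */
        void serverReplies(int forSeqId, String payload) {
            wire.addLast(new Response(forSeqId, payload));
        }

        /** Reads the next reply and checks it against the latest request id. */
        String receive() {
            Response r = wire.removeFirst();
            if (r.seqId != seqId) {
                throw new IllegalStateException("out of sequence response");
            }
            return r.payload;
        }
    }

    /** Returns the payload read by the second call, or the error message. */
    static String staleReplyScenario() {
        Client c = new Client();
        int first = c.send(); // e.g. a statement execution
        // Assumed scenario: the client gives up on the first call (say, after a
        // read timeout) without consuming its reply, which arrives later.
        c.serverReplies(first, "result of first call");
        int second = c.send(); // e.g. the follow-up close call
        c.serverReplies(second, "closed");
        try {
            return c.receive(); // reads the stale reply for the first call
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(staleReplyScenario());
    }
}
```

In this sketch the user only sees the sequence error from the second call; the reason the first call was abandoned is never reported.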






Re: Review Request 72528: ValidTxnManager doesn't consider txns opened and committed between snapshot generation and locking when evaluating ValidTxnListState

2020-05-21 Thread Denys Kuzmenko via Review Board


> On May 20, 2020, 3:16 p.m., Peter Varga wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Driver.java
> > Lines 686 (patched)
> > 
> >
> > I have concerns here, but I am not sure if they are well founded or 
> > not. I think this will break what the outside world thinks of snapshot 
> > isolation. I might have a hypothetical client that inserts lots of data in 
> > a source table and sometimes issues a merge statement from the source to the 
> > target table. They have some requirement that the target table cannot have 
> > partial data regarding some property. For example, they are inserting sales 
> > data, and the target table cannot contain half the data of a day; it can 
> > either have all or none. So the client issues the inserts into the source 
> > table synchronously, ordered by date, and when it gets to the next day it 
> > issues a merge statement asynchronously and continues to insert the data for 
> > the next day synchronously. It might think that it is safe to do so: since 
> > the merge statement has a snapshot, it will not see the data inserted 
> > afterwards. But with this change it will break.
> > It might not be the best example, since how would the client know when 
> > the snapshot is actually captured? But I am not familiar enough with the 
> > ecosystem: does anything use Hive by issuing the compile and run 
> > separately? Because there you could be sure, before this change, that the 
> > compilation order also meant snapshot order. In summary, I don't know 
> > what the outside world expects of snapshot isolation.
> 
> Denys Kuzmenko wrote:
> insert into source and merge from source into target won't conflict with 
> each other, they touch different tables. Maybe I am missing something here...
> 
> Peter Varga wrote:
> My example was not perfect. I don't mean that it will conflict with the 
> insert into the source table. It can conflict with some other client's 
> transaction. My main point is that, after the conflict is noticed and you 
> regenerate the snapshot, it will start to read results from transactions that 
> were opened and committed after the original query was compiled, and I'm just 
> trying to figure out what kind of problems it can cause, if any. In my 
> example you start to read records inserted later, but what if somebody added 
> a new partition since the compilation? Wouldn't it cause a problem?
> 
> Denys Kuzmenko wrote:
> There might be an issue, as we won't create any locks for the 
> newly created partition; however, we'll start reading it.
> Instead of rollback & retry on the Hive side, we might consider just failing 
> and letting the user retry.

However, it still leaves the question of what happens now in Hive when somebody 
adds a new partition (insert with dynamic partitioning) since the compilation 
(merge insert). I'll test this out.
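
The gap discussed in this thread can be sketched with a simplified snapshot model. Everything below is a hypothetical simplification for illustration (a high watermark plus the set of transactions open at capture time), not the actual ValidTxnList or ValidTxnManager code: a transaction that opens and commits after the snapshot is captured but before locks are acquired is invisible to the snapshot, so the snapshot is stale at lock time.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustration only; a simplified model, not Hive's ValidTxnList classes. */
final class SnapshotSketch {

    /** Snapshot: highest txn id seen at capture, plus txns still open then. */
    static final class Snapshot {
        final long highWatermark;
        final Set<Long> openTxns;

        Snapshot(long highWatermark, Set<Long> openTxns) {
            this.highWatermark = highWatermark;
            this.openTxns = new HashSet<>(openTxns);
        }

        /** A txn is visible if it is not above the watermark and was not open. */
        boolean isVisible(long txnId) {
            return txnId <= highWatermark && !openTxns.contains(txnId);
        }
    }

    /**
     * Returns true if some txn that is committed by lock time is invisible to
     * the snapshot, meaning the snapshot no longer reflects the committed state.
     */
    static boolean isStale(Snapshot snapshot, Set<Long> committedAtLockTime) {
        for (long txnId : committedAtLockTime) {
            if (!snapshot.isVisible(txnId)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<Long> open = new HashSet<>();
        open.add(9L);
        Snapshot s = new Snapshot(10L, open);

        Set<Long> committed = new HashSet<>();
        committed.add(8L);
        System.out.println(isStale(s, committed)); // nothing new committed

        committed.add(11L); // opened and committed after the snapshot was taken
        System.out.println(isStale(s, committed));
    }
}
```

Whether a stale snapshot should trigger regeneration or a plain failure is exactly the open question above; the sketch only shows how the staleness can be detected.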


- Denys


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72528/#review220838
---


On May 19, 2020, 11:19 a.m., Denys Kuzmenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72528/
> ---
> 
> (Updated May 19, 2020, 11:19 a.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez, Peter Varga, and Peter Vary.
> 
> 
> Bugs: HIVE-23503
> https://issues.apache.org/jira/browse/HIVE-23503
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> ValidTxnManager doesn't consider txns opened and committed between snapshot 
> generation and locking when evaluating ValidTxnListState. This causes issues 
> like duplicate inserts in the case of a concurrent merge insert & insert.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java e70c92eef4 
>   ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java a8c83fc504 
>   ql/src/java/org/apache/hadoop/hive/ql/ValidTxnManager.java 7d49c57dda 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 71afcbdc68 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java 
> 0383881acc 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java 
> 600289f837 
>   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
> 8a15b7cc5d 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
>  65df9c2ba9 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
>  887d4303f4 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java
>  312936efa8 
>   

Re: Review Request 72528: ValidTxnManager doesn't consider txns opened and committed between snapshot generation and locking when evaluating ValidTxnListState

2020-05-21 Thread Denys Kuzmenko via Review Board


> On May 20, 2020, 3:16 p.m., Peter Varga wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Driver.java
> > Lines 686 (patched)
> > 
> >
> > I have concerns here, but I am not sure if they are well founded or 
> > not. I think this will break what the outside world thinks of snapshot 
> > isolation. I might have a hypothetical client that inserts lots of data in 
> > a source table and sometimes issues a merge statement from the source to the 
> > target table. They have some requirement that the target table cannot have 
> > partial data regarding some property. For example, they are inserting sales 
> > data, and the target table cannot contain half the data of a day; it can 
> > either have all or none. So the client issues the inserts into the source 
> > table synchronously, ordered by date, and when it gets to the next day it 
> > issues a merge statement asynchronously and continues to insert the data for 
> > the next day synchronously. It might think that it is safe to do so: since 
> > the merge statement has a snapshot, it will not see the data inserted 
> > afterwards. But with this change it will break.
> > It might not be the best example, since how would the client know when 
> > the snapshot is actually captured? But I am not familiar enough with the 
> > ecosystem: does anything use Hive by issuing the compile and run 
> > separately? Because there you could be sure, before this change, that the 
> > compilation order also meant snapshot order. In summary, I don't know 
> > what the outside world expects of snapshot isolation.
> 
> Denys Kuzmenko wrote:
> insert into source and merge from source into target won't conflict with 
> each other, they touch different tables. Maybe I am missing something here...
> 
> Peter Varga wrote:
> My example was not perfect. I don't mean that it will conflict with the 
> insert into the source table. It can conflict with some other client's 
> transaction. My main point is that, after the conflict is noticed and you 
> regenerate the snapshot, it will start to read results from transactions that 
> were opened and committed after the original query was compiled, and I'm just 
> trying to figure out what kind of problems it can cause, if any. In my 
> example you start to read records inserted later, but what if somebody added 
> a new partition since the compilation? Wouldn't it cause a problem?

There might be an issue, as we won't create any locks for the newly 
created partition; however, we'll start reading it.
Instead of rollback & retry on the Hive side, we might consider just failing and 
letting the user retry.


- Denys


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72528/#review220838
---


On May 19, 2020, 11:19 a.m., Denys Kuzmenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72528/
> ---
> 
> (Updated May 19, 2020, 11:19 a.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez, Peter Varga, and Peter Vary.
> 
> 
> Bugs: HIVE-23503
> https://issues.apache.org/jira/browse/HIVE-23503
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> ValidTxnManager doesn't consider txns opened and committed between snapshot 
> generation and locking when evaluating ValidTxnListState. This causes issues 
> like duplicate inserts in the case of a concurrent merge insert & insert.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java e70c92eef4 
>   ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java a8c83fc504 
>   ql/src/java/org/apache/hadoop/hive/ql/ValidTxnManager.java 7d49c57dda 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 71afcbdc68 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java 
> 0383881acc 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java 
> 600289f837 
>   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
> 8a15b7cc5d 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
>  65df9c2ba9 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
>  887d4303f4 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java
>  312936efa8 
>   storage-api/src/java/org/apache/hadoop/hive/common/ValidReadTxnList.java 
> b8ff03f9c4 
>   storage-api/src/java/org/apache/hadoop/hive/common/ValidTxnList.java 
> d4c3b09730 
> 
> 
> Diff: https://reviews.apache.org/r/72528/diff/1/
> 
> 
> Testing
> ---
> 
> 

[jira] [Created] (HIVE-23525) TestAcidTxnCleanerService is unstable

2020-05-21 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-23525:
---

 Summary: TestAcidTxnCleanerService is unstable
 Key: HIVE-23525
 URL: https://issues.apache.org/jira/browse/HIVE-23525
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich


from time to time this exception happens

http://34.66.156.144:8080/job/hive-c/7/console

{code}
15:03:41  [INFO] 
15:03:41  [INFO] ---
15:03:41  [INFO]  T E S T S
15:03:41  [INFO] ---
15:03:42  [INFO] Running 
org.apache.hadoop.hive.metastore.txn.TestAcidTxnCleanerService
15:04:10  [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time 
elapsed: 25.582 s <<< FAILURE! - in 
org.apache.hadoop.hive.metastore.txn.TestAcidTxnCleanerService
15:04:10  [ERROR] 
cleansAllCommittedTxns(org.apache.hadoop.hive.metastore.txn.TestAcidTxnCleanerService)
  Time elapsed: 9.952 s  <<< FAILURE!
15:04:10  java.lang.AssertionError: expected:<6> but was:<7>
15:04:10at 
org.apache.hadoop.hive.metastore.txn.TestAcidTxnCleanerService.cleansAllCommittedTxns(TestAcidTxnCleanerService.java:107)
15:04:10  
15:04:10  [INFO] 
15:04:10  [INFO] Results:
15:04:10  [INFO] 
15:04:10  [ERROR] Failures: 
15:04:10  [ERROR]   TestAcidTxnCleanerService.cleansAllCommittedTxns:107 
expected:<6> but was:<7>
15:04:10  [INFO] 
15:04:10  [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0
15:04:10  [INFO] 
15:04:10  [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.20.1:test (default-test) on 
project hive-standalone-metastore-server: There are test failures.
15:04:10  [ERROR] 
15:04:10  [ERROR] Please refer to 
/home/jenkins/agent/workspace/hive-c/standalone-metastore/metastore-server/target/surefire-reports
 for the individual test results.
15:04:10  [ERROR] Please refer to dump files (if any exist) 
[date]-jvmRun[N].dump, [date].dumpstream and [date]-jvmRun[N].dumpstream.
15:04:10  [ERROR] -> [Help 1]
15:04:10  [ERROR] 
15:04:10  [ERROR] To see the full stack trace of the errors, re-run Maven with 
the -e switch.
15:04:10  [ERROR] Re-run Maven using the -X switch to enable full debug logging.
15:04:10  [ERROR] 
15:04:10  [ERROR] For more information about the errors and possible solutions, 
please read the following articles:
15:04:10  [ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
{code}





[jira] [Created] (HIVE-23524) TestNewGetSplitsFormat is unstable

2020-05-21 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-23524:
---

 Summary: TestNewGetSplitsFormat is unstable
 Key: HIVE-23524
 URL: https://issues.apache.org/jira/browse/HIVE-23524
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich



http://34.66.156.144:8080/job/hive-c/5/console

{code}
12:05:16  [INFO] ---
12:05:16  [INFO]  T E S T S
12:05:16  [INFO] ---
12:05:16  [INFO] Running org.apache.hive.jdbc.TestNewGetSplitsFormat
12:12:38  [ERROR] Tests run: 9, Failures: 0, Errors: 2, Skipped: 0, Time 
elapsed: 433.083 s <<< FAILURE! - in org.apache.hive.jdbc.TestNewGetSplitsFormat
12:12:38  [ERROR] 
testLlapInputFormatEndToEnd(org.apache.hive.jdbc.TestNewGetSplitsFormat)  Time 
elapsed: 82.662 s  <<< ERROR!
12:12:38  java.io.IOException: java.sql.SQLException: Error while cleaning up 
the server resources
12:12:38at 
org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:350)
12:12:38at 
org.apache.hadoop.hive.llap.LlapArrowRowInputFormat.getSplits(LlapArrowRowInputFormat.java:54)
12:12:38at 
org.apache.hive.jdbc.TestNewGetSplitsFormat.processQuery(TestNewGetSplitsFormat.java:87)
12:12:38at 
org.apache.hive.jdbc.BaseJdbcWithMiniLlap.processQuery(BaseJdbcWithMiniLlap.java:676)
12:12:38at 
org.apache.hive.jdbc.BaseJdbcWithMiniLlap.testLlapInputFormatEndToEnd(BaseJdbcWithMiniLlap.java:223)
12:12:38at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
12:12:38at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
12:12:38at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
12:12:38at java.lang.reflect.Method.invoke(Method.java:498)
12:12:38at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
12:12:38at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
12:12:38at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
12:12:38at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
12:12:38at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
12:12:38  Caused by: java.sql.SQLException: Error while cleaning up the server 
resources
12:12:38at 
org.apache.hive.jdbc.HiveConnection.close(HiveConnection.java:1018)
12:12:38at 
org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:342)
12:12:38... 13 more
12:12:38  Caused by: org.apache.thrift.transport.TTransportException: 
java.net.SocketTimeoutException: Read timed out
12:12:38at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
12:12:38at 
org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
12:12:38at 
org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:374)
12:12:38at 
org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:451)
12:12:38at 
org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:433)
12:12:38at 
org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
12:12:38at 
org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
12:12:38at 
org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:380)
12:12:38at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:230)
12:12:38at 
org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
12:12:38at 
org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_CloseSession(TCLIService.java:199)
12:12:38at 
org.apache.hive.service.rpc.thrift.TCLIService$Client.CloseSession(TCLIService.java:186)
12:12:38at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
12:12:38at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
12:12:38at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
12:12:38at java.lang.reflect.Method.invoke(Method.java:498)
12:12:38at 
org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1732)
12:12:38at com.sun.proxy.$Proxy139.CloseSession(Unknown Source)
12:12:38at 
org.apache.hive.jdbc.HiveConnection.close(HiveConnection.java:1016)
12:12:38... 14 more
12:12:38  Caused by: java.net.SocketTimeoutException: Read timed out
12:12:38at java.net.SocketInputStream.socketRead0(Native Method)
12:12:38at 
java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
12:12:38at 

[jira] [Created] (HIVE-23523) TestTriggersTezSessionPoolManager sometimes exits the JVM

2020-05-21 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-23523:
---

 Summary: TestTriggersTezSessionPoolManager sometimes exits the JVM
 Key: HIVE-23523
 URL: https://issues.apache.org/jira/browse/HIVE-23523
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich


{code}
12:34:01  [INFO] 
12:34:01  [INFO] ---
12:34:01  [INFO]  T E S T S
12:34:01  [INFO] ---
12:34:01  [INFO] Running org.apache.hive.jdbc.TestTriggersTezSessionPoolManager
12:44:09  [INFO] 
12:44:09  [INFO] Results:
12:44:09  [INFO] 
12:44:09  [INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
12:44:09  [INFO] 
12:44:09  [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
project hive-it-unit: There are test failures.
12:44:09  [ERROR] 
12:44:09  [ERROR] Please refer to 
/home/jenkins/agent/workspace/hive-c/itests/hive-unit/target/surefire-reports 
for the individual test results.
12:44:09  [ERROR] Please refer to dump files (if any exist) 
[date]-jvmRun[N].dump, [date].dumpstream and [date]-jvmRun[N].dumpstream.
12:44:09  [ERROR] ExecutionException The forked VM terminated without properly 
saying goodbye. VM crash or System.exit called?
12:44:09  [ERROR] Command was /bin/sh -c cd 
/home/jenkins/agent/workspace/hive-c/itests/hive-unit && 
/usr/lib/jvm/zulu-8-amd64/jre/bin/java -Xmx2048m -jar 
/home/jenkins/agent/workspace/hive-c/itests/hive-unit/target/surefire/surefirebooter2756019330500992692.jar
 /home/jenkins/agent/workspace/hive-c/itests/hive-unit/target/surefire 
2020-05-18T10-33-57_988-jvmRun1 surefire6986381633375506092tmp 
surefire_08030518552390892289tmp
12:44:09  [ERROR] Process Exit Code: 0
12:44:09  [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: 
ExecutionException The forked VM terminated without properly saying goodbye. VM 
crash or System.exit called?
12:44:09  [ERROR] Command was /bin/sh -c cd 
/home/jenkins/agent/workspace/hive-c/itests/hive-unit && 
/usr/lib/jvm/zulu-8-amd64/jre/bin/java -Xmx2048m -jar 
/home/jenkins/agent/workspace/hive-c/itests/hive-unit/target/surefire/surefirebooter2756019330500992692.jar
 /home/jenkins/agent/workspace/hive-c/itests/hive-unit/target/surefire 
2020-05-18T10-33-57_988-jvmRun1 surefire6986381633375506092tmp 
surefire_08030518552390892289tmp
12:44:09  [ERROR] Process Exit Code: 0
12:44:09  [ERROR]   at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:494)
12:44:09  [ERROR]   at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:441)
12:44:09  [ERROR]   at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:293)
12:44:09  [ERROR]   at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:245)
12:44:09  [ERROR]   at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1149)
12:44:09  [ERROR]   at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:978)
12:44:09  [ERROR]   at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:854)
12:44:09  [ERROR]   at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
12:44:09  [ERROR]   at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:210)
12:44:09  [ERROR]   at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:156)
12:44:09  [ERROR]   at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:148)
12:44:09  [ERROR]   at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
12:44:09  [ERROR]   at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
12:44:09  [ERROR]   at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)
12:44:09  [ERROR]   at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
12:44:09  [ERROR]   at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305)
12:44:09  [ERROR]   at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)
12:44:09  [ERROR]   at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105)
12:44:09  [ERROR]   at org.apache.maven.cli.MavenCli.execute(MavenCli.java:957)
12:44:09  [ERROR]   at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:289)
12:44:09  [ERROR]   at org.apache.maven.cli.MavenCli.main(MavenCli.java:193)
12:44:09  [ERROR]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
12:44:09  

Re: Review Request 72528: ValidTxnManager doesn't consider txns opened and committed between snapshot generation and locking when evaluating ValidTxnListState

2020-05-21 Thread Peter Varga via Review Board


> On May 20, 2020, 3:16 p.m., Peter Varga wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Driver.java
> > Lines 686 (patched)
> > 
> >
> > I have concerns here, but I am not sure whether they are well founded. 
> > I think this will break what the outside world expects of snapshot 
> > isolation. Consider a hypothetical client that inserts lots of data into 
> > a source table and sometimes issues a merge statement from the source to 
> > the target table. It has a requirement that the target table cannot hold 
> > partial data with respect to some property. For example, it inserts sales 
> > data, and the target table cannot contain half of a day's data: it must 
> > have either all of it or none. So the client issues the inserts into the 
> > source table synchronously, ordered by date, and when it reaches the next 
> > day it issues a merge statement asynchronously and continues to insert 
> > the data for the next day synchronously. It might think that this is 
> > safe: since the merge statement has a snapshot, it will not see the data 
> > inserted afterwards. But with this change that assumption breaks.
> > It might not be the best example, since how would the client know when 
> > the snapshot is actually captured? But I am not familiar enough with the 
> > ecosystem: does anything use Hive by issuing the compile and run steps 
> > separately? In that case you could be sure, before this change, that 
> > compilation order also meant snapshot order. In summary, I don't know 
> > what the outside world expects of snapshot isolation.
> 
> Denys Kuzmenko wrote:
> Insert into source and merge from source into target won't conflict with 
> each other; they touch different tables. Maybe I am missing something here...

My example was not perfect. I don't mean that it will conflict with the insert 
into the source table. It can conflict with some other client's transaction. My 
main point is that, after the conflict is noticed and you regenerate the 
snapshot, the query will start to read results from transactions that were 
opened and committed after the original query was compiled, and I'm just trying 
to figure out what kind of problems that can cause, if any. In my example you 
start to read records inserted later, but what if somebody added a new 
partition since the compilation? Wouldn't that cause a problem?
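
To make the concern concrete, here is a toy model. It is not Hive code: 
`ToyTxnLog` and its methods are hypothetical, and the snapshot is simplified to 
a single high-water mark with no open or aborted transactions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTxnLog:
    """Toy transaction log: maps committed txn id -> rows it inserted."""
    committed: dict = field(default_factory=dict)
    next_txn_id: int = 1

    def commit(self, rows):
        """Commit a transaction that inserts `rows`; return its txn id."""
        txn_id = self.next_txn_id
        self.next_txn_id += 1
        self.committed[txn_id] = list(rows)
        return txn_id

    def snapshot(self):
        """Return a high-water mark: every txn id <= this value is visible."""
        return self.next_txn_id - 1

    def read(self, high_water_mark):
        """Return the rows visible under the given snapshot."""
        rows = []
        for txn_id in sorted(self.committed):
            if txn_id <= high_water_mark:
                rows.extend(self.committed[txn_id])
        return rows


log = ToyTxnLog()
log.commit(["day1-a", "day1-b"])

# Snapshot captured when the (hypothetical) merge query is compiled.
compile_snapshot = log.snapshot()

# Another transaction opens and commits after compilation.
log.commit(["day2-a"])

# Reading with the original snapshot hides the later commit...
rows_original = log.read(compile_snapshot)

# ...while a regenerated snapshot exposes it.
regenerated_snapshot = log.snapshot()
rows_regenerated = log.read(regenerated_snapshot)
```

In this model the regenerated snapshot returns the day-2 row that the 
compile-time snapshot would have hidden, which is the behavior change I am 
asking about.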


- Peter


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72528/#review220838
---


On May 19, 2020, 11:19 a.m., Denys Kuzmenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72528/
> ---
> 
> (Updated May 19, 2020, 11:19 a.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez, Peter Varga, and Peter Vary.
> 
> 
> Bugs: HIVE-23503
> https://issues.apache.org/jira/browse/HIVE-23503
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> ValidTxnManager doesn't consider txns opened and committed between snapshot 
> generation and locking when evaluating ValidTxnListState. This causes issues 
> like duplicate inserts in case of a concurrent merge insert & insert.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java e70c92eef4 
>   ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java a8c83fc504 
>   ql/src/java/org/apache/hadoop/hive/ql/ValidTxnManager.java 7d49c57dda 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 71afcbdc68 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java 0383881acc 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java 600289f837 
>   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 8a15b7cc5d 
>   standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 65df9c2ba9 
>   standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 887d4303f4 
>   standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java 312936efa8 
>   storage-api/src/java/org/apache/hadoop/hive/common/ValidReadTxnList.java b8ff03f9c4 
>   storage-api/src/java/org/apache/hadoop/hive/common/ValidTxnList.java d4c3b09730 
> 
> 
> Diff: https://reviews.apache.org/r/72528/diff/1/
> 
> 
> Testing
> ---
> 
> DbTxnManager tests.
> 
> 
> Thanks,
> 
> Denys Kuzmenko
> 
>



[jira] [Created] (HIVE-23522) repl bootstrap load: optimize partition loads

2020-05-21 Thread Anishek Agarwal (Jira)
Anishek Agarwal created HIVE-23522:
--

 Summary: repl bootstrap load: optimize partition loads 
 Key: HIVE-23522
 URL: https://issues.apache.org/jira/browse/HIVE-23522
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Anishek Agarwal


When "hive.repl.dump.metadata.only.for.external.table" = true is used in repl 
dump, we only dump metadata for external tables. On the load side, partitioned 
external tables are currently loaded one partition at a time, even though HMS 
has an API for bulk partition updates. We should use the bulk API in such 
scenarios. This will significantly improve performance during the bootstrap 
phase. 
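
A rough sketch of the difference, using an in-memory stand-in rather than the 
real HMS client. `FakeMetastore`, `add_partition`, `add_partitions`, and the 
batch size are all illustrative assumptions, and the only thing modeled is the 
number of calls made.

```python
from itertools import islice


class FakeMetastore:
    """Hypothetical stand-in for a metastore client; counts round trips."""

    def __init__(self):
        self.partitions = []
        self.calls = 0

    def add_partition(self, partition):
        """One call per partition (the current load-side behavior)."""
        self.calls += 1
        self.partitions.append(partition)

    def add_partitions(self, partitions):
        """One call for a whole batch of partitions (the bulk path)."""
        self.calls += 1
        self.partitions.extend(partitions)


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_one_by_one(client, partitions):
    for partition in partitions:
        client.add_partition(partition)


def load_in_bulk(client, partitions, batch_size=100):
    for batch in batched(partitions, batch_size):
        client.add_partitions(batch)


# 21 illustrative partition specs.
partitions = [f"ds=2020-05-{day:02d}" for day in range(1, 22)]

single = FakeMetastore()
load_one_by_one(single, partitions)

bulk = FakeMetastore()
load_in_bulk(bulk, partitions, batch_size=10)
```

Both loaders end up with the same partitions registered; the bulk path simply 
makes one call per batch instead of one per partition. Any real gain would 
depend on the per-call overhead of the metastore.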


