Submitting a Patch Against Branch 3

2019-10-28 Thread David Mollitor
Hello Gang,

I have attempted a couple of times now to submit a patch for branch-3 of
Hive.  None of my attempts have been successful and I'm not sure why they
are failing.  The following JIRA is a very trivial change but YETUS won't
build it.

Any thoughts?

https://issues.apache.org/jira/browse/HIVE-18415

Thanks!


[jira] [Created] (HIVE-22418) auto_join0.q and auto_join1.q tests can break cbo_limit.q

2019-10-28 Thread László Bodor (Jira)
László Bodor created HIVE-22418:
---

 Summary: auto_join0.q and auto_join1.q tests can break cbo_limit.q
 Key: HIVE-22418
 URL: https://issues.apache.org/jira/browse/HIVE-22418
 Project: Hive
  Issue Type: Bug
Reporter: László Bodor








Re: Review Request 71671: HIVE-22401: Refactor CompactorMR

2019-10-28 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71671/#review218420
---



Thanks for the patch.
A nit and a question.


ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Line 239 (original), 220 (patched)


Could we remove the extra spaces while we are here, please?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MmMajorQueryCompactor.java
Lines 85-87 (patched)


Maybe this should be checked outside? Is this something general? Or am I 
mistaken?


- Peter Vary


On Oct. 24, 2019, 3:57 p.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71671/
> ---
> 
> (Updated Oct. 24, 2019, 3:57 p.m.)
> 
> 
> Review request for hive, Marta Kuczora, Peter Vary, and Adam Szita.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-22401: Refactor CompactorMR
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> 0f1579aa542f83b68f2efc92e08e6c0a32bd113d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MajorQueryCompactor.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MmMajorQueryCompactor.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactor.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactorFactory.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/71671/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



[jira] [Created] (HIVE-22417) Remove stringifyException from MetaStore

2019-10-28 Thread David Mollitor (Jira)
David Mollitor created HIVE-22417:
-

 Summary: Remove stringifyException from MetaStore
 Key: HIVE-22417
 URL: https://issues.apache.org/jira/browse/HIVE-22417
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore, Standalone Metastore
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor








[jira] [Created] (HIVE-22416) MR-related operation logs missing when parallel execution is enabled

2019-10-28 Thread Karen Coppage (Jira)
Karen Coppage created HIVE-22416:


 Summary:  MR-related operation logs missing when parallel 
execution is enabled
 Key: HIVE-22416
 URL: https://issues.apache.org/jira/browse/HIVE-22416
 Project: Hive
  Issue Type: Bug
Reporter: Karen Coppage
Assignee: Karen Coppage


Repro steps:
 1. Happy path, parallel execution disabled
{code:java}
0: jdbc:hive2://localhost:1> set hive.exec.parallel=false;
No rows affected (0.023 seconds)
0: jdbc:hive2://localhost:1> select count  (*) from t1;
INFO  : Compiling 
command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d):
 select count  (*) from t1
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c0, 
type:bigint, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d);
 Time taken: 0.309 seconds
INFO  : Executing 
command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d):
 select count  (*) from t1
WARN  : 
INFO  : Query ID = 
karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
DEBUG : Configuring job job_local495362389_0008 with 
file:/tmp/hadoop/mapred/staging/karencoppage495362389/.staging/job_local495362389_0008
 as the submit dir
DEBUG : adding the following namenodes' delegation tokens:[file:///]
DEBUG : Creating splits at 
file:/tmp/hadoop/mapred/staging/karencoppage495362389/.staging/job_local495362389_0008
INFO  : number of splits:0
INFO  : Submitting tokens for job: job_local495362389_0008
INFO  : Executing with tokens: []
INFO  : The url to track the job: http://localhost:8080/
INFO  : Job running in-process (local Hadoop)
INFO  : 2019-10-28 15:26:22,537 Stage-1 map = 0%,  reduce = 100%
INFO  : Ended Job = job_local495362389_0008
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 0 msec
INFO  : Completed executing 
command(queryId=karencoppage_20191028152610_a26c25e1-9834-446a-9a56-c676cb693e7d);
 Time taken: 6.497 seconds
INFO  : OK
DEBUG : Shutting down query select count  (*) from t1
+-+
| c0  |
+-+
| 0   |
+-+
1 row selected (11.874 seconds)
{code}
2. Faulty path, parallel execution enabled
{code:java}
0: jdbc:hive2://localhost:1> set 
hive.server2.logging.operation.level=EXECUTION;
No rows affected (0.236 seconds)
0: jdbc:hive2://localhost:1> set hive.exec.parallel=true;
No rows affected (0.01 seconds)
0: jdbc:hive2://localhost:1> select count  (*) from t1;
INFO  : Compiling 
command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77):
 select count  (*) from t1
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c0, 
type:bigint, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77);
 Time taken: 4.707 seconds
INFO  : Executing 
command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77):
 select count  (*) from t1
WARN  : 
INFO  : Query ID = 
karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in parallel
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 0 msec
INFO  : Completed executing 
command(queryId=karencoppage_20191028155346_4e7b793b-654e-4d69-b588-f3f0d3ae0c77);
 Time taken: 44.577 seconds
INFO  : OK
DEBUG : Shutting down query select count  (*) from t1
+-+
| c0  |
+-+
| 0   |
+-+
1 row selected (54.665 seconds)
{code}
The issue is that Log4j stores the session ID and query ID in per-thread metadata (org.apache.logging.log4j.ThreadContext.getImmutableContext()). If the queryId is missing from this metadata, the RoutingAppender (which is defined programmatically in LogDivertAppender) routes the log event to a NullAppender, which logs nothing. If the queryId is present, the RoutingAppender routes the event to the "query-appender" logger, which writes the line to the operation log/Beeline. This routing does not happen in a multi-threaded context, because the new threads created for parallel query execution do not carry the queryId/sessionId.
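
A minimal sketch, not part of the reported issue or an actual Hive fix, of how the missing context could be propagated: Log4j's ThreadContext is per-thread, so a worker thread only routes to the operation log if the parent's queryId/sessionId entries are copied into it before it starts logging. The class and method names below are illustrative.
{code:java}
// Sketch only: copy the parent's Log4j ThreadContext (which carries
// queryId/sessionId in HS2) into the worker thread so the RoutingAppender
// no longer falls through to the NullAppender.
import java.util.Map;
import org.apache.logging.log4j.ThreadContext;

public class ThreadContextPropagation {

  static Runnable withParentContext(Runnable task) {
    // Captured on the submitting (parent) thread.
    final Map<String, String> parentContext = ThreadContext.getImmutableContext();
    return () -> {
      try {
        ThreadContext.putAll(parentContext);  // make queryId visible to the router
        task.run();
      } finally {
        ThreadContext.clearMap();             // don't leak context into pooled threads
      }
    };
  }

  public static void main(String[] args) throws InterruptedException {
    ThreadContext.put("queryId", "demo_query_id");
    Thread worker = new Thread(withParentContext(() ->
        System.out.println("worker sees queryId = " + ThreadContext.get("queryId"))));
    worker.start();
    worker.join();
  }
}
{code}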

Re: Review Request 71589: Create read-only transactions

2019-10-28 Thread Denys Kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71589/
---

(Updated Oct. 28, 2019, 1:40 p.m.)


Review request for hive, Laszlo Pinter and Peter Vary.


Bugs: HIVE-21114
https://issues.apache.org/jira/browse/HIVE-21114


Repository: hive-git


Description
---

With HIVE-21036 we have a way to indicate that a txn is read only.
We should (at least in auto-commit mode) determine if the single stmt is a read 
and mark the txn accordingly.
Then we can optimize TxnHandler.commitTxn() so that it doesn't do any checks in 
write_set etc.

TxnHandler.commitTxn() already starts with lockTransactionRecord(stmt, txnid, 
TXN_OPEN) so it can read the txn type in the same SQL stmt.

HiveOperation only has QUERY, which covers both Insert and Select, so this requires figuring out how to determine whether a query is a SELECT. By the time Driver.openTransaction() is called, we have already parsed the query, so there should be a way to know if the statement only reads (see the sketch below).

For multi-stmt txns (once these are supported) we should allow the user to indicate that a txn is read-only and then disallow any statements that can make modifications in this txn. That should be a different jira.
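
To make the auto-commit idea above concrete, here is a hedged sketch. The StatementKind enum, TxnClient interface, and the boolean openTxn flag are placeholders for illustration, not the actual Hive or Metastore API, and a real implementation would classify the statement from the parsed AST rather than the raw SQL text.
{code:java}
// Hedged sketch only: all names here are placeholders, not the real API.
public final class ReadOnlyTxnSketch {

  enum StatementKind { READ, WRITE }

  // Stand-in for the metastore transaction client.
  interface TxnClient {
    long openTxn(boolean readOnly);
  }

  // Decide whether the (already parsed) statement can modify data.
  static StatementKind classify(String sql) {
    String s = sql.trim().toLowerCase();
    boolean readsOnly = s.startsWith("select") || s.startsWith("explain");
    return readsOnly ? StatementKind.READ : StatementKind.WRITE;
  }

  // In auto-commit mode, open the txn with the matching type so commitTxn()
  // can skip the write_set conflict checks for read-only transactions.
  static long openAutoCommitTxn(String sql, TxnClient client) {
    return client.openTxn(classify(sql) == StatementKind.READ);
  }
}
{code}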


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 91910d1c0c 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java fcf499d53a 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 943aa383bb 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java ac813c8288 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java 1c53426966 
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager.java 
cc86afedbf 
  ql/src/test/org/apache/hadoop/hive/ql/parse/TestParseUtils.java PRE-CREATION 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 504e6b12a1 
  
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java
 46fc3ed1e0 


Diff: https://reviews.apache.org/r/71589/diff/5/


Testing
---

Unit + manual test


File Attachments (updated)


HIVE-21114.1.patch
  
https://reviews.apache.org/media/uploaded/files/2019/10/10/0929ed4a-17be-4098-8c61-0819a30613fd__HIVE-21114.1.patch
HIVE-21114.5.patch
  
https://reviews.apache.org/media/uploaded/files/2019/10/17/80cbb092-97d6-48d2-b603-24213141cb5e__HIVE-21114.5.patch
HIVE-21114.8.patch
  
https://reviews.apache.org/media/uploaded/files/2019/10/22/b14eedb4-a2f1-4f77-9676-c258b6804b98__HIVE-21114.8.patch
HIVE-21114.8.patch
  
https://reviews.apache.org/media/uploaded/files/2019/10/22/9096f402-3d2e-4cd2-9f85-df1dfeb25863__HIVE-21114.8.patch
HIVE-21114.8.patch
  
https://reviews.apache.org/media/uploaded/files/2019/10/28/a001316c-bcf4-43a2-83fa-7d49183b2a7f__HIVE-21114.8.patch


Thanks,

Denys Kuzmenko



[jira] [Created] (HIVE-22415) Upgrade to Java 11

2019-10-28 Thread David Mollitor (Jira)
David Mollitor created HIVE-22415:
-

 Summary: Upgrade to Java 11
 Key: HIVE-22415
 URL: https://issues.apache.org/jira/browse/HIVE-22415
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor








[jira] [Created] (HIVE-22414) Make LLAP CacheTags more memory efficient

2019-10-28 Thread Ádám Szita (Jira)
Ádám Szita created HIVE-22414:
-

 Summary: Make LLAP CacheTags more memory efficient
 Key: HIVE-22414
 URL: https://issues.apache.org/jira/browse/HIVE-22414
 Project: Hive
  Issue Type: Improvement
  Components: llap
Reporter: Ádám Szita
Assignee: Ádám Szita








[jira] [Created] (HIVE-22413) Avoid dirty read when reading the ACID table while compaction is running

2019-10-28 Thread Hocheol Park (Jira)
Hocheol Park created HIVE-22413:
---

 Summary: Avoid dirty read when reading the ACID table while 
compaction is running
 Key: HIVE-22413
 URL: https://issues.apache.org/jira/browse/HIVE-22413
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Reporter: Hocheol Park


There is a problem where a dirty read occurs when reading an ACID table while base or delta directories are being created by the compactor. It is especially likely to occur on S3 storage, because the “move” logic of S3 is “copy and delete”, and the copy takes a long time when the files are large or the bucketing count is high.

Here is the logic to avoid this problem: while listing the child directories of the partition during an ACID read, if “_tmp”-prefixed directories exist, compare each directory name against its “_tmp” counterpart and skip the directory when the names match (see the sketch below). The read then uses the files from before the merge, so the results do not change.
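
A hedged sketch of that skip logic; the exact "_tmp" naming convention (final directory name with a "_tmp" prefix) is assumed from the description above rather than taken from an actual patch.
{code:java}
// Sketch only: skip base/delta directories that still have a "_tmp"-prefixed
// staging twin, i.e. that the compactor has not finished moving into place.
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TmpAwareLister {

  static List<FileStatus> listCommittedDirs(FileSystem fs, Path partitionDir) throws IOException {
    FileStatus[] children = fs.listStatus(partitionDir);

    // Names that are still being written by the compactor.
    Set<String> inFlight = new HashSet<>();
    for (FileStatus child : children) {
      String name = child.getPath().getName();
      if (name.startsWith("_tmp")) {
        inFlight.add(name.substring("_tmp".length()));
      }
    }

    // Keep only directories that are neither staging dirs nor have a staging twin.
    List<FileStatus> committed = new ArrayList<>();
    for (FileStatus child : children) {
      String name = child.getPath().getName();
      if (!name.startsWith("_tmp") && !inFlight.contains(name)) {
        committed.add(child);
      }
    }
    return committed;
  }
}
{code}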





[jira] [Created] (HIVE-22412) StatsUtils throw NPE when explain

2019-10-28 Thread xiepengjie (Jira)
xiepengjie created HIVE-22412:
-

 Summary: StatsUtils throw NPE when explain
 Key: HIVE-22412
 URL: https://issues.apache.org/jira/browse/HIVE-22412
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: xiepengjie
Assignee: xiepengjie


A demo that reproduces the issue:
{code:java}
create table explain_npe ( c1 map<string,string> );
explain select c1 from explain_npe where c1 is null;{code}
The error looks like this:
{code:java}
2019-10-10 09:11:52,670 ERROR ql.Driver (SessionState.java:printError(1068)) - 
FAILED: NullPointerException null
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.stats.StatsUtils.getSizeOfMap(StatsUtils.java:1045)
at 
org.apache.hadoop.hive.ql.stats.StatsUtils.getSizeOfComplexTypes(StatsUtils.java:931)
at 
org.apache.hadoop.hive.ql.stats.StatsUtils.getAvgColLenOfVariableLengthTypes(StatsUtils.java:869)
at 
org.apache.hadoop.hive.ql.stats.StatsUtils.estimateRowSizeFromSchema(StatsUtils.java:526)
at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:223)
at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:136)
at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:124)
at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:111)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
at 
org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
at 
org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:192)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10205)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:210)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:425)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:309)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1153)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1206)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1082)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

{code}





Re: Review Request 71685: HIVE-22374 Upgrade commons-compress version to 1.19

2019-10-28 Thread Sumin Byeon

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71685/
---

(Updated Oct. 28, 2019, 8:44 a.m.)


Review request for hive.


Repository: hive-git


Description (updated)
---

Upgrade commons-compress version to 1.19 to prevent potential security issues.

https://issues.apache.org/jira/browse/HIVE-22374


Diffs
-

  pom.xml ff2c86a960 


Diff: https://reviews.apache.org/r/71685/diff/1/


Testing
---

No additional test has been provided


Thanks,

Sumin Byeon



[jira] [Created] (HIVE-22411) Performance degradation on single row inserts

2019-10-28 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-22411:


 Summary: Performance degradation on single row inserts
 Key: HIVE-22411
 URL: https://issues.apache.org/jira/browse/HIVE-22411
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0
 Attachments: Screen Shot 2019-10-17 at 8.40.50 PM.png

Executing single insert statements on a transactional table affects write performance on an S3 file system. Each insert creates a new delta directory. After each insert, Hive calculates statistics such as the number of files in the table and the total size of the table. To do this it traverses the directory recursively, and during the recursion a separate listStatus call is executed for each path. In the end, the more delta directories you have, the more time it takes to calculate the statistics.

Therefore insertion time goes up linearly:

!Screen Shot 2019-10-17 at 8.40.50 PM.png|width=601,height=436!

The fix is to use fs.listFiles(path, /*recursive*/ true) instead of the handcrafted recursive method (see the sketch below).
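
A minimal sketch of that replacement, assuming the goal is a single recursive listing call instead of one listStatus() RPC per subdirectory; the class name and the counted statistics are illustrative, not the actual patch.
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class TableSizeStats {

  public static void main(String[] args) throws IOException {
    Path tableDir = new Path(args[0]);
    FileSystem fs = tableDir.getFileSystem(new Configuration());

    long numFiles = 0;
    long totalSize = 0;

    // Single call; the iterator streams results, so deep delta trees need no per-path RPC.
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(tableDir, /* recursive */ true);
    while (it.hasNext()) {
      LocatedFileStatus status = it.next();
      numFiles++;
      totalSize += status.getLen();
    }
    System.out.println("numFiles=" + numFiles + " totalSize=" + totalSize);
  }
}
{code}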


