Re: Review Request 71707: Performance degradation on single row inserts

2019-10-31 Thread Slim Bouguerra

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71707/#review218479
---



Looked at the code; looks good to me.

- Slim Bouguerra


On Oct. 31, 2019, 11:16 a.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71707/
> ---
> 
> (Updated Oct. 31, 2019, 11:16 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra.
> 
> 
> Bugs: HIVE-22411
> https://issues.apache.org/jira/browse/HIVE-22411
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Executing single insert statements on a transactional table affects write 
> performance on an S3 file system. Each insert creates a new delta directory. 
> After each insert, Hive calculates statistics such as the number of files in 
> the table and the total size of the table. To calculate these, it traverses 
> the table directory recursively, and during the recursion a separate 
> listStatus call is executed for each path. As a result, the more delta 
> directories you have, the more time it takes to calculate the statistics.
> 
> Therefore insertion time goes up linearly with the number of inserts.
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
>  38e843aeacf 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
>  155ecb18bf5 
> 
> 
> Diff: https://reviews.apache.org/r/71707/diff/1/
> 
> 
> Testing
> ---
> 
> measured and plotted insertion time
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



[jira] [Created] (HIVE-22442) Datanucleus cannot map MTableColumnStatistics.bitVector when writing to HiveMetastore

2019-10-31 Thread Rentao Wu (Jira)
Rentao Wu created HIVE-22442:


 Summary: Datanucleus cannot map MTableColumnStatistics.bitVector 
when writing to HiveMetastore 
 Key: HIVE-22442
 URL: https://issues.apache.org/jira/browse/HIVE-22442
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema, Metastore, SQL, Standalone Metastore
Affects Versions: 3.1.2
Reporter: Rentao Wu


I'm seeing that on insert statements, the StatsTask fails while persisting 
MTableColumnStatistics into the metastore: the new bitVector column (defined to 
map to the BLOB SQL type) cannot be mapped, which causes my insert statements to 
fail. Any ideas on how to fix this issue?

 

I'm using:

Hive 3.1.2

Hadoop 3.2.1

Hive Metastore: Mariadb 5.5.64

Datanucleus: (default versions defined in pom.xml)

/usr/lib/hive/lib/datanucleus-api-jdo-4.2.4.jar
/usr/lib/hive/lib/datanucleus-core-4.1.17.jar
/usr/lib/hive/lib/datanucleus-rdbms-4.1.19.jar

 

2019-10-29T19:01:11,799 ERROR [aba24ff8-5560-411d-a1ee-141871cd4b4b main([])]: 
exec.StatsTask (:()) - Failed to run stats task
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Failed 
to generate new Mapping of type 
org.datanucleus.store.rdbms.mapping.java.ArrayMapping, exception : JDBC type 
BLOB declared for field 
"org.apache.hadoop.hive.metastore.model.MTableColumnStatistics.bitVector" of 
java type java.io.Serializable cant be mapped for this datastore.
JDBC type BLOB declared for field 
"org.apache.hadoop.hive.metastore.model.MTableColumnStatistics.bitVector" of 
java type java.io.Serializable cant be mapped for this datastore.
org.datanucleus.exceptions.NucleusException: JDBC type BLOB declared for field 
"org.apache.hadoop.hive.metastore.model.MTableColumnStatistics.bitVector" of 
java type java.io.Serializable cant be mapped for this datastore.
 at 
org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getDatastoreMappingClass(RDBMSMappingManager.java:1386)
 at 
org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.createDatastoreMapping(RDBMSMappingManager.java:1616)
 at 
org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.prepareDatastoreMapping(SingleFieldMapping.java:59)
 at 
org.datanucleus.store.rdbms.mapping.java.AbstractContainerMapping.prepareDatastoreMapping(AbstractContainerMapping.java:99)
 at 
org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.initialize(SingleFieldMapping.java:48)
 at 
org.datanucleus.store.rdbms.mapping.java.AbstractContainerMapping.initialize(AbstractContainerMapping.java:67)
 at 
org.datanucleus.store.rdbms.mapping.RDBMSMappingManager.getMapping(RDBMSMappingManager.java:482)
 at 
org.datanucleus.store.rdbms.table.ClassTable.manageMembers(ClassTable.java:536)
 at 
org.datanucleus.store.rdbms.table.ClassTable.manageClass(ClassTable.java:442)
 at 
org.datanucleus.store.rdbms.table.ClassTable.initializeForClass(ClassTable.java:1270)
 at org.datanucleus.store.rdbms.table.ClassTable.initialize(ClassTable.java:276)
 at 
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.initializeClassTables(RDBMSStoreManager.java:3279)
 at 
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2889)
 at 
org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
 at 
org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1627)
 at 
org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:672)
 at 
org.datanucleus.store.rdbms.RDBMSStoreManager.getPropertiesForGenerator(RDBMSStoreManager.java:2088)
 at 
org.datanucleus.store.AbstractStoreManager.getStrategyValue(AbstractStoreManager.java:1271)
 at 
org.datanucleus.ExecutionContextImpl.newObjectId(ExecutionContextImpl.java:3760)
 at 
org.datanucleus.state.StateManagerImpl.setIdentity(StateManagerImpl.java:2267)
 at 
org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:484)
 at 
org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:120)
 at 
org.datanucleus.state.ObjectProviderFactoryImpl.newForPersistentNew(ObjectProviderFactoryImpl.java:218)
 at 
org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:2079)
 at 
org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:1923)
 at 
org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1778)
 at 
org.datanucleus.ExecutionContextThreadedImpl.persistObject(ExecutionContextThreadedImpl.java:217)
 at 
org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:724)
 at 
org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:749)
 at 
org.apache.hadoop.hive.metastore.ObjectStore.writeMTableColumnStatistics(ObjectStore.java:8152)
 at 

Re: Submitting a Patch Against Branch 3

2019-10-31 Thread Alan Gates
My thought would be to not worry about Yetus for branches, since it doesn't
work.  As long as it passes the regression tests for the branch it should
be fine.

Alan.

On Thu, Oct 31, 2019 at 10:05 AM David Mollitor  wrote:

> Hello Peter,
>
> Is there a way then to build against branch-3?
>
> Directions I got are from here:
>
> https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CreatingaPatch
>
>
> Thanks!
>
> On Thu, Oct 31, 2019 at 1:03 PM Peter Vary 
> wrote:
>
> > Hi David,
> >
> > Unfortunately, Yetus as it is now does not understand the concept of
> > branches based on the patch names. :(
> >
> > Thanks,
> > Peter
> >
> > > On Oct 29, 2019, at 00:22, David Mollitor  wrote:
> > >
> > > Hello Gang,
> > >
> > > I have attempted a couple of times now to submit a patch for branch-3
> of
> > > Hive.  None of my attempts have been successful and I'm not sure why
> they
> > > are failing.  The following JIRA is a very trivial change but YETUS
> won't
> > > build it.
> > >
> > > Any thoughts?
> > >
> > > https://issues.apache.org/jira/browse/HIVE-18415
> > >
> > > Thanks!
> >
> >
>


Re: Submitting a Patch Against Branch 3

2019-10-31 Thread David Mollitor
Hello Peter,

Is there a way then to build against branch-3?

Directions I got are from here:
https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CreatingaPatch


Thanks!

On Thu, Oct 31, 2019 at 1:03 PM Peter Vary 
wrote:

> Hi David,
>
> Unfortunately, Yetus as it is now does not understand the concept of
> branches based on the patch names. :(
>
> Thanks,
> Peter
>
> > On Oct 29, 2019, at 00:22, David Mollitor  wrote:
> >
> > Hello Gang,
> >
> > I have attempted a couple of times now to submit a patch for branch-3 of
> > Hive.  None of my attempts have been successful and I'm not sure why they
> > are failing.  The following JIRA is a very trivial change but YETUS won't
> > build it.
> >
> > Any thoughts?
> >
> > https://issues.apache.org/jira/browse/HIVE-18415
> >
> > Thanks!
>
>


Re: Submitting a Patch Against Branch 3

2019-10-31 Thread Peter Vary
Hi David,

Unfortunately, Yetus as it is now does not understand the concept of branches 
based on the patch names. :(

Thanks,
Peter

> On Oct 29, 2019, at 00:22, David Mollitor  wrote:
> 
> Hello Gang,
> 
> I have attempted a couple of times now to submit a patch for branch-3 of
> Hive.  None of my attempts have been successful and I'm not sure why they
> are failing.  The following JIRA is a very trivial change but YETUS won't
> build it.
> 
> Any thoughts?
> 
> https://issues.apache.org/jira/browse/HIVE-18415
> 
> Thanks!



[jira] [Created] (HIVE-22441) Metrics Subsystem Improvements

2019-10-31 Thread David Mollitor (Jira)
David Mollitor created HIVE-22441:
-

 Summary: Metrics Subsystem Improvements
 Key: HIVE-22441
 URL: https://issues.apache.org/jira/browse/HIVE-22441
 Project: Hive
  Issue Type: Improvement
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor


# CodahaleMetrics uses a Guava LoadingCache, which is already thread-safe, and 
then puts an explicit lock around the structure.  Use the Java 8 Map API with 
ConcurrentHashMap instead (see the sketch below).
# Introduce Java 8 APIs
# Simplifications
# Update unit tests to no longer rely on a 'sleep'

https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/CodahaleMetrics.java#L91-L94
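
Not the CodahaleMetrics patch itself, just a minimal sketch of the pattern item 1 
suggests, using a hypothetical CounterCache holder: ConcurrentHashMap.computeIfAbsent 
gives atomic get-or-create semantics without an explicit lock around the map.

{code:java}
import java.util.concurrent.ConcurrentHashMap;

import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;

// Hypothetical holder illustrating the proposed pattern: the map itself
// provides atomic get-or-create, so no explicit lock is needed around it.
public class CounterCache {
  private final MetricRegistry registry = new MetricRegistry();
  private final ConcurrentHashMap<String, Counter> counters = new ConcurrentHashMap<>();

  public Counter getOrCreate(String name) {
    // computeIfAbsent creates the counter atomically on first access,
    // replacing the LoadingCache + synchronized block combination.
    return counters.computeIfAbsent(name, registry::counter);
  }
}
{code}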




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Review Request 71708: HIVE-22435

2019-10-31 Thread Krisztian Kasa

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71708/
---

Review request for hive, Jesús Camacho Rodríguez and Zoltan Haindrich.


Bugs: HIVE-20148 and HIVE-22435
https://issues.apache.org/jira/browse/HIVE-20148
https://issues.apache.org/jira/browse/HIVE-22435


Repository: hive-git


Description
---

Exception when using the VectorTopNKeyOperator
===

VectorTopNKeyOperator extends TopNKeyOperator and calls its super.initializeOp 
method
https://github.com/apache/hive/blob/5c8392468cb581f53b6cb55d201fc933dca025e3/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java#L71

which is geared toward non-vectorized execution.

Fix: derive VectorTopNKeyOperator from Operator instead of TopNKeyOperator and 
do the initialization there:
- map the key columns with the inputObjInspectors
- set up comparators for the mapped keys using the ObjectInspector extracted 
from the inputObjInspectors
- add a KeyWeapped class for storing key entries in the priority queue (see the 
sketch below)
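
Not taken from the patch: a minimal, hypothetical sketch of the bounded 
priority-queue idea behind the last bullet, keeping at most n keys and evicting 
the current worst one under the supplied comparator.

{code:java}
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical bounded top-n key store: a max-heap of at most n keys whose head
// is the current "worst" key; rows whose key falls outside the top n are dropped.
public class TopNKeys<K> {
  private final int n;
  private final Comparator<K> comparator;
  private final PriorityQueue<K> heap;

  public TopNKeys(int n, Comparator<K> comparator) {
    this.n = n;
    this.comparator = comparator;
    // Reverse the ordering so the head of the heap is the largest (worst) key.
    this.heap = new PriorityQueue<>(n, comparator.reversed());
  }

  /** Returns true if the key belongs to the current top n and should be forwarded. */
  public boolean offer(K key) {
    if (heap.size() < n) {
      heap.add(key);
      return true;
    }
    if (comparator.compare(key, heap.peek()) < 0) {
      heap.poll();   // evict the current worst key
      heap.add(key);
      return true;
    }
    return false;
  }
}
{code}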


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 
c80bc804a2 
  ql/src/test/queries/clientpositive/vector_topnkey.q e1b7d26afe 
  ql/src/test/results/clientpositive/llap/vector_topnkey.q.out d859270ff0 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectComparator.java
 PRE-CREATION 


Diff: https://reviews.apache.org/r/71708/diff/1/


Testing
---

run q test: vector_topnkey and limit_pushdown3 after applying the patch for 
TopNKey pushdown 
(https://issues.apache.org/jira/secure/attachment/12984389/HIVE-20150.15.patch)


Thanks,

Krisztian Kasa



Re: Review Request 71589: Create read-only transactions

2019-10-31 Thread Denys Kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71589/
---

(Updated Oct. 31, 2019, 3:21 p.m.)


Review request for hive, Laszlo Pinter and Peter Vary.


Bugs: HIVE-21114
https://issues.apache.org/jira/browse/HIVE-21114


Repository: hive-git


Description
---

With HIVE-21036 we have a way to indicate that a txn is read-only.
We should (at least in auto-commit mode) determine whether the single stmt is a 
read and mark the txn accordingly.
Then we can optimize TxnHandler.commitTxn() so that it doesn't do any checks in 
write_set etc.

TxnHandler.commitTxn() already starts with lockTransactionRecord(stmt, txnid, 
TXN_OPEN), so it can read the txn type in the same SQL stmt.

HiveOperation only has QUERY, which covers both INSERT and SELECT, so this 
requires figuring out how to determine whether a query is a plain SELECT. By the 
time Driver.openTransaction() is called, we have already parsed the query, so 
there should be a way to know if the statement only reads.

For multi-stmt txns (once these are supported) we should allow the user to 
indicate that a txn is read-only and then reject any statements that would make 
modifications in this txn. That should be a separate JIRA.
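
A rough sketch of the idea, with hypothetical names (the TxnType values and the 
openTxnInMetastore/markCommitted/checkWriteSetConflicts stubs are illustrative, 
not the real metastore API): classify the statement after parsing, record the 
txn type when the txn is opened, and let commitTxn skip write-set bookkeeping 
for read-only txns.

{code:java}
// Hypothetical sketch; method names and signatures are illustrative only.
enum TxnType { DEFAULT, READ_ONLY }

class ReadOnlyTxnSketch {

  /** Open the txn with a read-only type when the parsed statement only reads. */
  long openTxn(boolean statementOnlyReads) {
    TxnType type = statementOnlyReads ? TxnType.READ_ONLY : TxnType.DEFAULT;
    return openTxnInMetastore(type);   // record the type alongside the txn record
  }

  /** Commit can skip write-set/conflict checks for read-only txns. */
  void commitTxn(long txnId, TxnType type) {
    if (type == TxnType.READ_ONLY) {
      markCommitted(txnId);            // nothing was written, so no conflict detection
      return;
    }
    checkWriteSetConflicts(txnId);     // normal path for read-write txns
    markCommitted(txnId);
  }

  // Stubs standing in for metastore calls, to keep the sketch self-contained.
  private long openTxnInMetastore(TxnType type) { return 1L; }
  private void markCommitted(long txnId) { }
  private void checkWriteSetConflicts(long txnId) { }
}
{code}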


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 91910d1c0c 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java fcf499d53a 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 943aa383bb 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java ac813c8288 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java 1c53426966 
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager.java 
cc86afedbf 
  ql/src/test/org/apache/hadoop/hive/ql/parse/TestParseUtils.java PRE-CREATION 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 355c4f5374 
  
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java
 b5f22092a9 


Diff: https://reviews.apache.org/r/71589/diff/7/

Changes: https://reviews.apache.org/r/71589/diff/6-7/


Testing
---

Unit + manual test


File Attachments


HIVE-21114.1.patch
  
https://reviews.apache.org/media/uploaded/files/2019/10/10/0929ed4a-17be-4098-8c61-0819a30613fd__HIVE-21114.1.patch
HIVE-21114.5.patch
  
https://reviews.apache.org/media/uploaded/files/2019/10/17/80cbb092-97d6-48d2-b603-24213141cb5e__HIVE-21114.5.patch
HIVE-21114.8.patch
  
https://reviews.apache.org/media/uploaded/files/2019/10/22/b14eedb4-a2f1-4f77-9676-c258b6804b98__HIVE-21114.8.patch
HIVE-21114.8.patch
  
https://reviews.apache.org/media/uploaded/files/2019/10/22/9096f402-3d2e-4cd2-9f85-df1dfeb25863__HIVE-21114.8.patch
HIVE-21114.8.patch
  
https://reviews.apache.org/media/uploaded/files/2019/10/28/a001316c-bcf4-43a2-83fa-7d49183b2a7f__HIVE-21114.8.patch
HIVE-21114.10.patch
  
https://reviews.apache.org/media/uploaded/files/2019/10/31/93854078-7b63-46ec-95a0-62ab783ee54c__HIVE-21114.10.patch


Thanks,

Denys Kuzmenko



Re: Review Request 71589: Create read-only transactions

2019-10-31 Thread Denys Kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71589/
---

(Updated Oct. 31, 2019, 3:20 p.m.)


Review request for hive, Laszlo Pinter and Peter Vary.


Bugs: HIVE-21114
https://issues.apache.org/jira/browse/HIVE-21114


Repository: hive-git


Description
---

With HIVE-21036 we have a way to indicate that a txn is read-only.
We should (at least in auto-commit mode) determine whether the single stmt is a 
read and mark the txn accordingly.
Then we can optimize TxnHandler.commitTxn() so that it doesn't do any checks in 
write_set etc.

TxnHandler.commitTxn() already starts with lockTransactionRecord(stmt, txnid, 
TXN_OPEN), so it can read the txn type in the same SQL stmt.

HiveOperation only has QUERY, which covers both INSERT and SELECT, so this 
requires figuring out how to determine whether a query is a plain SELECT. By the 
time Driver.openTransaction() is called, we have already parsed the query, so 
there should be a way to know if the statement only reads.

For multi-stmt txns (once these are supported) we should allow the user to 
indicate that a txn is read-only and then reject any statements that would make 
modifications in this txn. That should be a separate JIRA.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 91910d1c0c 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java fcf499d53a 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 943aa383bb 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java ac813c8288 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java 1c53426966 
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager.java 
cc86afedbf 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 355c4f5374 
  
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java
 b5f22092a9 


Diff: https://reviews.apache.org/r/71589/diff/6/


Testing
---

Unit + manual test


File Attachments (updated)


HIVE-21114.1.patch
  
https://reviews.apache.org/media/uploaded/files/2019/10/10/0929ed4a-17be-4098-8c61-0819a30613fd__HIVE-21114.1.patch
HIVE-21114.5.patch
  
https://reviews.apache.org/media/uploaded/files/2019/10/17/80cbb092-97d6-48d2-b603-24213141cb5e__HIVE-21114.5.patch
HIVE-21114.8.patch
  
https://reviews.apache.org/media/uploaded/files/2019/10/22/b14eedb4-a2f1-4f77-9676-c258b6804b98__HIVE-21114.8.patch
HIVE-21114.8.patch
  
https://reviews.apache.org/media/uploaded/files/2019/10/22/9096f402-3d2e-4cd2-9f85-df1dfeb25863__HIVE-21114.8.patch
HIVE-21114.8.patch
  
https://reviews.apache.org/media/uploaded/files/2019/10/28/a001316c-bcf4-43a2-83fa-7d49183b2a7f__HIVE-21114.8.patch
HIVE-21114.10.patch
  
https://reviews.apache.org/media/uploaded/files/2019/10/31/93854078-7b63-46ec-95a0-62ab783ee54c__HIVE-21114.10.patch


Thanks,

Denys Kuzmenko



[jira] [Created] (HIVE-22440) Beeline Stops Running Command At Comment

2019-10-31 Thread Shawn Weeks (Jira)
Shawn Weeks created HIVE-22440:
--

 Summary: Beeline Stops Running Command At Comment
 Key: HIVE-22440
 URL: https://issues.apache.org/jira/browse/HIVE-22440
 Project: Hive
  Issue Type: Bug
  Components: Beeline, CLI
Reporter: Shawn Weeks
 Attachments: bug_exec.sql, bug_run.sql

This seems to be related to HIVE-13864 but I'm not sure that HIVE-16935 fixes 
it as the example is different. Please flag as a dupe and close if that's the 
case.

If you download the two attached files and run this command, it quits parsing 
at the comment.

{{beeline -u jdbc:hive2://localhost:1 -f ./bug_run.sql}}

{{Connecting to jdbc:hive2://localhost:1/default
Connected to: Apache Hive (version 2.3.5-amzn-1)
Driver: Hive JDBC (version 2.3.5-amzn-1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:1/default> !run bug_example.sql
>>>  create table if not exists bug_test (c1 string) stored as orc;
INFO  : Compiling 
command(queryId=hive_20191031141030_42617cb4-bff2-48a3-85f2-e4eb08997a71): 
create table if not exists bug_test (c1 string) stored as orc
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling 
command(queryId=hive_20191031141030_42617cb4-bff2-48a3-85f2-e4eb08997a71); Time 
taken: 0.005 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=hive_20191031141030_42617cb4-bff2-48a3-85f2-e4eb08997a71): 
create table if not exists bug_test (c1 string) stored as orc
INFO  : Completed executing 
command(queryId=hive_20191031141030_42617cb4-bff2-48a3-85f2-e4eb08997a71); Time 
taken: 0.002 seconds
INFO  : OK
No rows affected (0.048 seconds)
>>>
>>>  select 'Hello World'
from bug_test;
INFO  : Compiling 
command(queryId=hive_20191031141030_b97f4ba2-7d18-49b8-a348-8316cc8ae144): 
select 'Hello World'
from bug_test
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
type:string, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20191031141030_b97f4ba2-7d18-49b8-a348-8316cc8ae144); Time 
taken: 0.07 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=hive_20191031141030_b97f4ba2-7d18-49b8-a348-8316cc8ae144): 
select 'Hello World'
from bug_test
INFO  : Completed executing 
command(queryId=hive_20191031141030_b97f4ba2-7d18-49b8-a348-8316cc8ae144); Time 
taken: 0.0 seconds
INFO  : OK
+--+
| _c0  |
+--+
+--+
No rows selected (0.1 seconds)
>>>
>>>  select c1 -- a comment
, c1 -- another comment
from bug_test;
. . . . . . . . . . . . . . . . . . . .>
. . . . . . . . . . . . . . . . . . . .> Error: Error while compiling 
statement: FAILED: SemanticException [Error 10004]: Line 1:7 Invalid table 
alias or column reference 'c1': (possible column names are: ) 
(state=42000,code=10004)
Aborting command set because "force" is false and command failed: "select c1 -- 
a comment
, c1 -- another comment
from bug_test;"
Closing: 0: jdbc:hive2://localhost:1/default}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 71671: HIVE-22401: Refactor CompactorMR

2019-10-31 Thread Laszlo Pinter via Review Board


> On Oct. 28, 2019, 5:44 p.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MmMajorQueryCompactor.java
> > Lines 85-87 (patched)
> > 
> >
> > Maybe this should be checked outside? Is this something general? Or am 
> > I mistaken?

In my next change, related to query-based minor compaction, I will refactor 
this part as well.


- Laszlo


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71671/#review218420
---


On Oct. 24, 2019, 3:57 p.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71671/
> ---
> 
> (Updated Oct. 24, 2019, 3:57 p.m.)
> 
> 
> Review request for hive, Marta Kuczora, Peter Vary, and Adam Szita.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-22401: Refactor CompactorMR
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> 0f1579aa542f83b68f2efc92e08e6c0a32bd113d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MajorQueryCompactor.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MmMajorQueryCompactor.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactor.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/QueryCompactorFactory.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/71671/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Review Request 71707: Performance degradation on single row inserts

2019-10-31 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71707/
---

Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra.


Bugs: HIVE-22411
https://issues.apache.org/jira/browse/HIVE-22411


Repository: hive-git


Description
---

Executing single insert statements on a transactional table affects write 
performance on an S3 file system. Each insert creates a new delta directory. 
After each insert, Hive calculates statistics such as the number of files in the 
table and the total size of the table. To calculate these, it traverses the table 
directory recursively, and during the recursion a separate listStatus call is 
executed for each path. As a result, the more delta directories you have, the 
more time it takes to calculate the statistics.

Therefore insertion time goes up linearly with the number of inserts.
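
For illustration only (not the actual patch): a minimal sketch, in terms of the 
Hadoop FileSystem API, of the difference between the per-directory recursion 
described above and a single recursive listing via FileSystem.listFiles(path, true). 
The countRecursively and countWithListFiles names are hypothetical.

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

class StatsListingSketch {

  /** Per-directory recursion: one listStatus call per directory, so the cost
   *  grows with the number of delta directories. */
  static long[] countRecursively(FileSystem fs, Path dir) throws IOException {
    long files = 0, bytes = 0;
    for (FileStatus status : fs.listStatus(dir)) {
      if (status.isDirectory()) {
        long[] child = countRecursively(fs, status.getPath());
        files += child[0];
        bytes += child[1];
      } else {
        files++;
        bytes += status.getLen();
      }
    }
    return new long[] {files, bytes};
  }

  /** Single recursive listing: one iterator over every file under dir, which
   *  typically needs far fewer remote calls on an object store such as S3. */
  static long[] countWithListFiles(FileSystem fs, Path dir) throws IOException {
    long files = 0, bytes = 0;
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(dir, true);
    while (it.hasNext()) {
      files++;
      bytes += it.next().getLen();
    }
    return new long[] {files, bytes};
  }
}
{code}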


Diffs
-

  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
 38e843aeacf 
  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
 155ecb18bf5 


Diff: https://reviews.apache.org/r/71707/diff/1/


Testing
---

measured and plotted insertion time


Thanks,

Attila Magyar



[jira] [Created] (HIVE-22439) TestStatsReplicationScenariosACIDNoAutogather timed out after 40 minutes

2019-10-31 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-22439:
---

 Summary: TestStatsReplicationScenariosACIDNoAutogather timed out 
after 40 minutes
 Key: HIVE-22439
 URL: https://issues.apache.org/jira/browse/HIVE-22439
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
 Attachments: ptest_logs.tgz

On the ptest server, the test sometimes reaches the 40-minute timeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22438) Additional comma is added to projection column ids

2019-10-31 Thread Wenning Ding (Jira)
Wenning Ding created HIVE-22438:
---

 Summary: Additional comma is added to projection column ids
 Key: HIVE-22438
 URL: https://issues.apache.org/jira/browse/HIVE-22438
 Project: Hive
  Issue Type: Bug
Reporter: Wenning Ding
Assignee: Wenning Ding


I ran into this issue when querying Hudi data through Hive.

Basically, to query a Hudi-style table, Hudi implements its own InputFormat 
class and overrides the getRecordReader method. In this method, for its own 
reasons, Hudi manually adds several projection column ids and projection column 
names each time getRecordReader is called. Like this:
 
{code:java}
public RecordReader getRecordReader(final InputSplit split, final JobConf job,
    final Reporter reporter) throws IOException {
  // Hudi re-adds its own projection columns before delegating to the parent reader.
  if (!job.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR).contains("col_a")) {
    job.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "col_a");
  }
  if (!job.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR).contains("1")) {
    job.set(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR, "1");
  }
  return super.getRecordReader(split, job, reporter);
}
{code}
 

In this situation, it causes a problem for COUNT(*) or COUNT(1) queries. Note 
that for COUNT(*) or COUNT(1), Hive doesn't need to read any columns, so the 
projection column ids value is an empty string.

Here is a log example to show the whole workflow.
{code:java}
[DEBUG] [TezChild] |split.TezGroupedSplitsInputFormat|: Init record reader for 
index 0 of 2
[INFO] [TezChild] |realtime.HoodieParquetRealtimeInputFormat|: Before adding 
Hoodie columns, Projections : Ids :
[INFO] [TezChild] |hadoop.HoodieParquetInputFormat|: After adding Hoodie 
columns, Projections :col_a Ids :1
[DEBUG] [TezChild] |split.TezGroupedSplitsInputFormat|: Init record reader for 
index 1 of 2
[INFO] [TezChild] |realtime.HoodieParquetRealtimeInputFormat|: Before adding 
Hoodie columns, Projections :col_a Ids :,1
{code}
As we can see, the second time around the projection ids value becomes ",1", and 
that additional comma causes exceptions further downstream.
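
A standalone, hypothetical illustration (not Hive's actual ColumnProjectionUtils 
code) of how a leading comma can appear when a column id is appended to an empty 
projection id string without an emptiness check:

{code:java}
// Hypothetical append helpers; they only demonstrate where the ",1" value
// seen in the log above can come from.
public class ProjectionCommaDemo {

  static String appendId(String existingIds, String newId) {
    // Buggy pattern: always inserts the separator, even when existingIds is "".
    return existingIds + "," + newId;
  }

  static String appendIdFixed(String existingIds, String newId) {
    return existingIds.isEmpty() ? newId : existingIds + "," + newId;
  }

  public static void main(String[] args) {
    System.out.println(appendId("", "1"));      // prints ",1" -> breaks downstream parsing
    System.out.println(appendIdFixed("", "1")); // prints "1"
  }
}
{code}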

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)