Re: [VOTE] Release Apache Hive 4.0.0 (Release Candidate 0)

2024-03-27 Thread Marta Kuczora
+1 (binding)

Thanks a lot Denys for driving the release!

* Verified the checksum and signature [OK]

* Built Hive 4.0.0 from source [OK]

* Initialized metastore with MySQL [OK]

* Built package and ran metastore and hiveserver [OK]

* Deployed and started the binary tarball with Hadoop 3.3.6 and Tez 0.10.3 [OK]

* Ran some simple Hive queries with external/acid/iceberg tables [OK]
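The checksum item above can be reproduced with a small Java program. This is a minimal sketch, assuming the 64-hex-digit checksums in the vote email below are SHA-256 digests (the email does not name the algorithm); the class name `ChecksumCheck` is hypothetical. Signature verification against the KEYS file is a separate step (typically done with gpg) and is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumCheck {

    // Lowercase hex encoding, matching the format of the checksums in the vote email.
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // SHA-256 is required on every Java platform, so the checked exception is rethrown unchecked.
    static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // SHA-256 of an in-memory byte array.
    static String sha256Hex(byte[] data) {
        return toHex(sha256().digest(data));
    }

    // SHA-256 of a file, streamed in chunks so a large tarball is not loaded into memory.
    static String sha256HexOfFile(Path file) throws IOException {
        MessageDigest md = sha256();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    // Usage: java ChecksumCheck.java <file> <expected-sha256-hex>
    public static void main(String[] args) throws IOException {
        String actual = sha256HexOfFile(Paths.get(args[0]));
        boolean ok = actual.equalsIgnoreCase(args[1].trim());
        System.out.println((ok ? "[OK] " : "[MISMATCH] ") + actual);
    }
}
```

For example, `java ChecksumCheck.java apache-hive-4.0.0-src.tar.gz 4dbc9321d245e7fd26198e5d3dff95e5f7d0673d54d0727787d72956a1bca4f5` compares the source tarball against the value published in the vote email.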


Regards,

Marta

On Tue, Mar 26, 2024 at 8:26 AM Denys Kuzmenko  wrote:

> Hi Everyone,
>
> We would like to thank everyone who has contributed to the project and
> request the Hive PMC members to review and vote on this new release candidate.
>
> Apache Hive 4.0.0 RC-0 artifacts are available here:
> https://people.apache.org/~dkuzmenko/apache-hive-4.0.0-rc0/
>
>
> The checksums are as follows:
> - 83eb88549ae88d3df6a86bb3e2526c7f4a0f21acafe21452c18071cee058c666
> apache-hive-4.0.0-bin.tar.gz
> - 4dbc9321d245e7fd26198e5d3dff95e5f7d0673d54d0727787d72956a1bca4f5
> apache-hive-4.0.0-src.tar.gz
>
>
> You can find the KEYS file here:
>
> * https://downloads.apache.org/hive/KEYS
>
>
> A staged Maven repository URL is:
> https://repository.apache.org/content/repositories/orgapachehive-1127/
>
> The git commit hash is:
>
> https://github.com/apache/hive/commit/183f8cb41d3dbed961ffd27999876468ff06690c
>
>
> This corresponds to the tag: release-4.0.0-rc0
> * https://github.com/apache/hive/tree/release-4.0.0-rc0
>
> The vote is open for the next 72 hours and passes if a majority of at least
> three +1 PMC votes are cast.
>
> (Only PMC members have binding votes, however, other community members
> are encouraged to cast non-binding votes.)
>
>
> [ ] +1 Release this package as Apache Hive 4.0.0
> [ ] +0
> [ ] -1 Do not release this because...
>
>
> Please download, verify, and test.
>
>
> Regards,
>
> Denys
>
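The passing rule stated in the vote email above can be expressed as a small tally function. This sketch assumes the usual ASF majority-approval reading (at least three binding +1 votes and more binding +1 than binding -1 votes, with non-binding votes not counted toward the result); the `Vote` record and the class name are hypothetical.

```java
import java.util.List;

public class ReleaseVoteTally {

    // One cast vote: +1, 0 or -1, and whether it is binding (cast by a PMC member).
    record Vote(int value, boolean binding) {}

    // Assumed rule: at least three binding +1 votes and more binding +1 than binding -1.
    // Non-binding votes are encouraged but ignored in the count.
    static boolean passes(List<Vote> votes) {
        long plus = votes.stream().filter(v -> v.binding() && v.value() == 1).count();
        long minus = votes.stream().filter(v -> v.binding() && v.value() == -1).count();
        return plus >= 3 && plus > minus;
    }

    public static void main(String[] args) {
        List<Vote> votes = List.of(
                new Vote(1, true), new Vote(1, true), new Vote(1, true),
                new Vote(-1, false));
        System.out.println(passes(votes)); // true
    }
}
```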


[jira] [Created] (HIVE-25457) Implement querying Iceberg table metadata

2021-08-17 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-25457:


 Summary: Implement querying Iceberg table metadata
 Key: HIVE-25457
 URL: https://issues.apache.org/jira/browse/HIVE-25457
 Project: Hive
  Issue Type: New Feature
Reporter: Marta Kuczora
Assignee: Marta Kuczora
 Fix For: 4.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [EXTERNAL] Re: Welcome Marta to Hive PMC

2021-08-04 Thread Marta Kuczora
Thanks a lot, I am really honored.

On Tue, Aug 3, 2021 at 11:31 AM Sankar Hariappan
 wrote:

> Congrats Marta!
>
> Thanks,
> Sankar
>
> -Original Message-
> From: Peter Vary 
> Sent: 03 August 2021 14:26
> To: dev@hive.apache.org
> Subject: [EXTERNAL] Re: Welcome Marta to Hive PMC
>
> Congratulations Marta!
>
> > On Aug 3, 2021, at 10:01, Karen Coppage  wrote:
> >
> > Congratulations!!  
> >
> > Karen
> >
> >> On 2021. Aug 3., at 6:50, Ashutosh Chauhan 
> wrote:
> >>
> >> Hi all,
> >>
> >> It's an honor to announce that Apache Hive PMC has recently voted to
> >> invite Marta Kuczora as a new Hive PMC member. Marta is a long time
> >> Hive contributor and committer, and has made significant contributions
> in Hive.
> >> Please join me in congratulating her and looking forward to a bigger
> >> role that she will play in the Apache Hive project.
> >>
> >> Thanks,
> >> Ashutosh
> >
>
>


[jira] [Created] (HIVE-25357) Fix the checkstyle issue in HiveIcebergMetaHook which breaks the build

2021-07-20 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-25357:


 Summary: Fix the checkstyle issue in HiveIcebergMetaHook which 
breaks the build
 Key: HIVE-25357
 URL: https://issues.apache.org/jira/browse/HIVE-25357
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Marta Kuczora
Assignee: Marta Kuczora
 Fix For: 4.0.0


[ERROR] 
/home/jenkins/agent/workspace/hive-precommit_master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java:221:3:
 Cyclomatic Complexity is 13 (max allowed is 12). [CyclomaticComplexity]

This issue probably came in with 
[this|https://github.com/apache/hive/commit/76c49b9df957c8c05b81a4016282c03648b728b9]
 commit 





[jira] [Created] (HIVE-25325) Add TRUNCATE TABLE support for Hive Iceberg tables

2021-07-12 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-25325:


 Summary: Add TRUNCATE TABLE support for Hive Iceberg tables
 Key: HIVE-25325
 URL: https://issues.apache.org/jira/browse/HIVE-25325
 Project: Hive
  Issue Type: Improvement
Reporter: Marta Kuczora
Assignee: Marta Kuczora
 Fix For: 4.0.0








[jira] [Created] (HIVE-25310) Fix local test run problems with Iceberg tests: Socket closed by peer

2021-07-07 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-25310:


 Summary: Fix local test run problems with Iceberg tests: Socket 
closed by peer
 Key: HIVE-25310
 URL: https://issues.apache.org/jira/browse/HIVE-25310
 Project: Hive
  Issue Type: Test
Reporter: Marta Kuczora
Assignee: Marta Kuczora








[jira] [Created] (HIVE-25264) Add tests to verify Hive can read/write after schema change on Iceberg table

2021-06-18 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-25264:


 Summary: Add tests to verify Hive can read/write after schema 
change on Iceberg table
 Key: HIVE-25264
 URL: https://issues.apache.org/jira/browse/HIVE-25264
 Project: Hive
  Issue Type: Test
Reporter: Marta Kuczora
Assignee: Marta Kuczora








[jira] [Created] (HIVE-25258) Incorrect row order after query-based MINOR compaction

2021-06-16 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-25258:


 Summary: Incorrect row order after query-based MINOR compaction
 Key: HIVE-25258
 URL: https://issues.apache.org/jira/browse/HIVE-25258
 Project: Hive
  Issue Type: Bug
Reporter: Marta Kuczora
Assignee: Marta Kuczora








[jira] [Created] (HIVE-25257) Incorrect row order validation for query-based MAJOR compaction

2021-06-16 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-25257:


 Summary: Incorrect row order validation for query-based MAJOR 
compaction
 Key: HIVE-25257
 URL: https://issues.apache.org/jira/browse/HIVE-25257
 Project: Hive
  Issue Type: Bug
Reporter: Marta Kuczora
Assignee: Marta Kuczora








[jira] [Created] (HIVE-24642) Multiple file listing calls are executed in the MoveTask in case of direct inserts

2021-01-15 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-24642:


 Summary: Multiple file listing calls are executed in the MoveTask 
in case of direct inserts
 Key: HIVE-24642
 URL: https://issues.apache.org/jira/browse/HIVE-24642
 Project: Hive
  Issue Type: Improvement
Reporter: Marta Kuczora
Assignee: Marta Kuczora


When inserting data into a table with dynamic partitioning and direct insert 
on, the MoveTask performs several file listings to look up the newly created 
partitions and files. Check whether all of these file listings are necessary or 
whether they can be optimized to use fewer listings.
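One way to reduce repeated listings is to list the target directory once and reuse the result. The sketch below is hypothetical and is not the MoveTask code: it uses `java.nio.file` on a local directory as a stand-in for the Hadoop `FileSystem` API, and it assumes each partition is a directory under the table root.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SingleListingSketch {

    // Walks the table directory once and groups every regular file by the
    // directory it lives in, relative to the table root (e.g. "l_tax=0.0").
    // Later steps can look up partitions and files in this map instead of
    // issuing a new listing call for each partition.
    static Map<String, List<String>> filesByPartition(Path tableRoot) throws IOException {
        try (Stream<Path> paths = Files.walk(tableRoot)) {
            return paths
                    .filter(Files::isRegularFile)
                    .collect(Collectors.groupingBy(
                            p -> tableRoot.relativize(p.getParent()).toString(),
                            TreeMap::new,
                            Collectors.mapping(p -> p.getFileName().toString(),
                                    Collectors.toList())));
        }
    }

    public static void main(String[] args) throws IOException {
        filesByPartition(Path.of(args[0]))
                .forEach((partition, files) -> System.out.println(partition + " -> " + files));
    }
}
```

The trade-off is memory: the whole listing is held at once, which is usually acceptable for the files produced by a single insert.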





[jira] [Created] (HIVE-24530) Potential NPE in FileSinkOperator.closeRecordwriters method

2020-12-14 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-24530:


 Summary: Potential NPE in FileSinkOperator.closeRecordwriters 
method
 Key: HIVE-24530
 URL: https://issues.apache.org/jira/browse/HIVE-24530
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Marta Kuczora
Assignee: Marta Kuczora








[jira] [Created] (HIVE-24506) Investigate the materialized_view_create_rewrite_4.q test with direct insert on

2020-12-08 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-24506:


 Summary: Investigate the materialized_view_create_rewrite_4.q test 
with direct insert on
 Key: HIVE-24506
 URL: https://issues.apache.org/jira/browse/HIVE-24506
 Project: Hive
  Issue Type: Task
Reporter: Marta Kuczora








[jira] [Created] (HIVE-24505) Investigate if the arrays in the FileSinkOperator could be replaced by Lists

2020-12-08 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-24505:


 Summary: Investigate if the arrays in the FileSinkOperator could 
be replaced by Lists
 Key: HIVE-24505
 URL: https://issues.apache.org/jira/browse/HIVE-24505
 Project: Hive
  Issue Type: Task
Affects Versions: 4.0.0
Reporter: Marta Kuczora
Assignee: Marta Kuczora


The FileSinkOperator uses some array variables, like
Path[] outPaths;
Path[] outPathsCommitted;
Path[] finalPaths;
RecordWriter[] outWriters;
RecordUpdater[] updaters;
Working with these is not always convenient, for example in the 
createDynamicBucket method, where they are extended with new elements, or in the 
case of an UPDATE operation with direct insert on, where the delete deltas have 
to be collected separately because the outPaths array contains only the inserted 
deltas. These operations would be much easier with lists.
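The difference described above can be illustrated with a minimal sketch. It is hypothetical: paths are plain strings instead of Hadoop `Path` objects, and the method names and bucket/delta names are illustrative only, not taken from FileSinkOperator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OutPathsSketch {

    // Array style: adding a path for a newly created dynamic bucket means
    // allocating a larger array and copying the old contents over.
    static String[] appendToArray(String[] outPaths, String newPath) {
        String[] grown = Arrays.copyOf(outPaths, outPaths.length + 1);
        grown[outPaths.length] = newPath;
        return grown;
    }

    // List style: insert and delete deltas can live in their own lists and
    // be combined on demand, instead of being tracked in fixed-size arrays.
    static List<String> allDeltas(List<String> insertDeltas, List<String> deleteDeltas) {
        List<String> all = new ArrayList<>(insertDeltas);
        all.addAll(deleteDeltas);
        return all;
    }

    public static void main(String[] args) {
        String[] asArray = appendToArray(new String[] {"bucket_00000"}, "bucket_00001");
        List<String> asList = new ArrayList<>(List.of("bucket_00000"));
        asList.add("bucket_00001"); // the list grows in place
        System.out.println(Arrays.asList(asArray).equals(asList)); // true
    }
}
```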





[jira] [Created] (HIVE-24336) Turn off the direct insert for EXPLAIN ANALYZE queries

2020-10-30 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-24336:


 Summary: Turn off the direct insert for EXPLAIN ANALYZE queries
 Key: HIVE-24336
 URL: https://issues.apache.org/jira/browse/HIVE-24336
 Project: Hive
  Issue Type: Bug
Reporter: Marta Kuczora
Assignee: Marta Kuczora
 Fix For: 4.0.0








[jira] [Created] (HIVE-24322) In case of direct insert, the attempt ID has to be checked when reading the manifest files

2020-10-28 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-24322:


 Summary: In case of direct insert, the attempt ID has to be 
checked when reading the manifest files
 Key: HIVE-24322
 URL: https://issues.apache.org/jira/browse/HIVE-24322
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Marta Kuczora
Assignee: Marta Kuczora
 Fix For: 4.0.0


In [IMPALA-10247|https://issues.apache.org/jira/browse/IMPALA-10247] there was 
an exception from Hive when trying to load the data:
{noformat}
2020-10-13T16:50:53,424 ERROR [HiveServer2-Background-Pool: Thread-23832] exec.Task: Job Commit failed with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)'
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
 at org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1468)
 at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
 at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
 at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
 at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:627)
 at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:342)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
 at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
 at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
 at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
 at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
 at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
 at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
 at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
 at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
 at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
 at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:392)
 at org.apache.hadoop.hive.ql.exec.Utilities.handleDirectInsertTableFinalPath(Utilities.java:4587)
 at org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1462)
 ... 29 more
{noformat}

The reason for the exception was that Hive was trying to read an empty manifest 
file. Manifest files are used in case of direct insert to determine which files 
need to be kept and which ones need to be cleaned up. They are created by the 
tasks and use the task attempt ID as a postfix. In this particular test, one of 
the containers ran out of memory, so Tez killed it right after the manifest file 
was created but before the paths were written into it. This was the manifest 
file for task attempt 0. Tez then assigned a new container to the task, so a new 
attempt was made with attemptId=1. This one was successful and wrote its 
manifest file correctly. But Hive didn't know about this, since the out-of-memory 
issue was handled by Tez under the hood: there was no exception in Hive, and 
therefore no clean-up in the manifest folder. When Hive reads the manifest 
files, it simply reads every file from the defined folder, so it tried to read 
the manifest files for both attempt 0 and attempt 1.
If there are multiple manifest files with the same name but different attempt 
IDs, Hive should only read the one with the highest attempt ID.
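The proposed rule (read only the manifest with the highest attempt ID for each task) can be sketched as follows. The file naming scheme `<taskId>_<attemptId>.manifest` is an assumption made for illustration, not necessarily the format Hive uses, and the class is hypothetical. Attempt IDs are compared numerically so that attempt 10 wins over attempt 9.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ManifestSelector {

    // Keeps, for each task, only the manifest written by the highest attempt ID.
    // Assumed (hypothetical) naming scheme: "<taskId>_<attemptId>.manifest",
    // where attemptId is a decimal number.
    static List<String> latestAttemptManifests(List<String> fileNames) {
        Map<String, Integer> bestAttempt = new HashMap<>();
        Map<String, String> bestFile = new HashMap<>();
        for (String name : fileNames) {
            if (!name.endsWith(".manifest")) {
                continue; // not a manifest file
            }
            String base = name.substring(0, name.length() - ".manifest".length());
            int sep = base.lastIndexOf('_');
            if (sep < 0) {
                continue; // no attempt suffix: ignored in this sketch
            }
            String taskId = base.substring(0, sep);
            int attempt = Integer.parseInt(base.substring(sep + 1));
            Integer current = bestAttempt.get(taskId);
            if (current == null || attempt > current) {
                bestAttempt.put(taskId, attempt);
                bestFile.put(taskId, name);
            }
        }
        List<String> result = new ArrayList<>(bestFile.values());
        Collections.sort(result); // deterministic order for callers
        return result;
    }

    public static void main(String[] args) {
        System.out.println(latestAttemptManifests(
                List.of("000000_0.manifest", "000000_1.manifest", "000001_0.manifest")));
        // [000000_1.manifest, 000001_0.manifest]
    }
}
```

In the scenario from the issue, the empty manifest of attempt 0 would simply be skipped because attempt 1 of the same task also left a manifest.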





[jira] [Created] (HIVE-23763) Query based minor compaction produces wrong files when rows with different buckets Ids are processed by the same FileSinkOperator

2020-06-25 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-23763:


 Summary: Query based minor compaction produces wrong files when 
rows with different buckets Ids are processed by the same FileSinkOperator
 Key: HIVE-23763
 URL: https://issues.apache.org/jira/browse/HIVE-23763
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 4.0.0
Reporter: Marta Kuczora
Assignee: Marta Kuczora
 Fix For: 4.0.0








Re: Review Request 72532: HIVE-23495 AcidUtils.getAcidState cleanup

2020-06-15 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72532/#review221005
---


Ship it!




Ship It!

- Marta Kuczora


On June 8, 2020, 10:58 a.m., Peter Varga wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72532/
> ---
> 
> (Updated June 8, 2020, 10:58 a.m.)
> 
> 
> Review request for hive, Karen Coppage, Marta Kuczora, and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Since HIVE-21225 there are two redundant implementations of 
> AcidUtils.getAcidState.
> 
> The previous implementation (without the recursive listing) can be removed.
> 
> The performance can also be improved by removing unnecessary fileStatus 
> calls.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 635ed3149c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java ca234cfb37 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 1059cb227f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 16c915959c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java 598220b0c4 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java 2a15913f9f 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 4e5d5b003b 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 7913295380 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MinorQueryCompactor.java d83a50f555 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MmMajorQueryCompactor.java 5e11d8d2d8 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MmMinorQueryCompactor.java 1bdec7df2d 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java 75941b3f33 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java 337f469d1a 
>   ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java f351f04b08 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java e4440e9136 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcRawRecordMerger.java f63c40a7b5 
>   streaming/src/test/org/apache/hive/streaming/TestStreaming.java 3a3b267927 
> 
> 
> Diff: https://reviews.apache.org/r/72532/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Peter Varga
> 
>



[jira] [Created] (HIVE-23444) Concurrent ACID direct inserts may fail with FileNotFoundException

2020-05-11 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-23444:


 Summary: Concurrent ACID direct inserts may fail with 
FileNotFoundException
 Key: HIVE-23444
 URL: https://issues.apache.org/jira/browse/HIVE-23444
 Project: Hive
  Issue Type: Bug
Reporter: Marta Kuczora
Assignee: Marta Kuczora
 Fix For: 4.0.0


{noformat}
2020-04-30 15:56:54,706 ERROR org.apache.hive.service.cli.operation.Operation: 
[HiveServer2-Background-Pool: Thread-675]: Error running hive query: 
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: 
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: 
java.io.FileNotFoundException: File 
hdfs://ns1/warehouse/tablespace/managed/hive/tpch_unbucketed.db/concurrent_insert_partitioned/l_tax=0.0/_tmp.delta_001_001_
 does not exist.
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:362) ~[hive-service-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:241) ~[hive-service-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) ~[hive-service-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) [hive-service-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at java.security.AccessController.doPrivileged(Native Method) [?:?]
at javax.security.auth.Subject.doAs(Subject.java:423) [?:?]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) [hadoop-common-3.1.1.7.1.1.0-493.jar:?]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340) [hive-service-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.FileNotFoundException: File hdfs://ns1/warehouse/tablespace/managed/hive/tpch_unbucketed.db/concurrent_insert_partitioned/l_tax=0.0/_tmp.delta_001_001_ does not exist.
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2465) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:2228) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at org.apache.hadoop.hive.ql.exec.MoveTask.handleStaticParts(MoveTask.java:522) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:442) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) ~[hive-exec-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) ~[hive-service-3.1.3000.7.1.1.0-493.jar:3.1.3000.7.1.1.0-493

[jira] [Created] (HIVE-23442) ACID major compaction doesn't read base correct if it was written by insert overwrite by direct insert

2020-05-11 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-23442:


 Summary: ACID major compaction doesn't read base correct if it was 
written by insert overwrite by direct insert
 Key: HIVE-23442
 URL: https://issues.apache.org/jira/browse/HIVE-23442
 Project: Hive
  Issue Type: Bug
Reporter: Marta Kuczora








[jira] [Created] (HIVE-23410) ACID: Improve the delete and update operations to avoid the move step

2020-05-08 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-23410:


 Summary: ACID: Improve the delete and update operations to avoid 
the move step
 Key: HIVE-23410
 URL: https://issues.apache.org/jira/browse/HIVE-23410
 Project: Hive
  Issue Type: Improvement
Affects Versions: 4.0.0
Reporter: Marta Kuczora
Assignee: Marta Kuczora


This is a follow-up task for 
[HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the insert 
operation has been modified to write directly to the table locations instead of 
the staging directory. The same improvement should be done for the ACID update 
and delete operations as well.





[jira] [Created] (HIVE-23345) INT64 Parquet timestamps cannot be read into bigint Hive type

2020-04-30 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-23345:


 Summary: INT64 Parquet timestamps cannot be read into bigint Hive 
type
 Key: HIVE-23345
 URL: https://issues.apache.org/jira/browse/HIVE-23345
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Marta Kuczora
Assignee: Marta Kuczora








[jira] [Created] (HIVE-23286) The clean-up in case of an aborted FileSinkOperator is not correct for ACID direct insert

2020-04-23 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-23286:


 Summary: The clean-up in case of an aborted FileSinkOperator is 
not correct for ACID direct insert
 Key: HIVE-23286
 URL: https://issues.apache.org/jira/browse/HIVE-23286
 Project: Hive
  Issue Type: Bug
Reporter: Marta Kuczora
Assignee: Marta Kuczora
 Fix For: 4.0.0








Re: Review Request 72336: HIVE-23114: Insert overwrite with dynamic partitioning is not working correctly with direct insert

2020-04-08 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72336/
---

(Updated April 8, 2020, 1:47 p.m.)


Review request for hive and Peter Vary.


Changes
---

Fixing whitespaces.


Bugs: HIVE-23114
https://issues.apache.org/jira/browse/HIVE-23114


Repository: hive-git


Description
---

The idea behind the patch is the following:
When doing a multi-statement insert overwrite with dynamic partitioning, the 
partition information will be written to the manifest file. With this 
information, each FileSinkOperator can clean up only the partition directories 
it wrote itself, and does not clean up the partition directories written by the 
other FileSinkOperators.
If a statement from the insert overwrite query doesn't produce any data, a 
manifest file will still be written; otherwise the missing manifest file would 
result in a table-level clean-up, which could delete the data written by the 
other FileSinkOperators.
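The clean-up rule described above can be sketched as a small decision function. This is a hypothetical illustration, not code from the patch: operator IDs and partition names are made up, and manifests are modeled as an in-memory map. It shows why an empty manifest still matters: without one, the function has to fall back to table-level clean-up.

```java
import java.util.Map;
import java.util.Set;

public class CleanupScope {

    // Decides which partition directories one FileSinkOperator may clean up.
    // manifests maps an operator ID to the partitions listed in its manifest file;
    // a missing key means that operator wrote no manifest at all.
    static Set<String> partitionsToClean(Map<String, Set<String>> manifests,
                                         String operatorId,
                                         Set<String> allTablePartitions) {
        Set<String> written = manifests.get(operatorId);
        if (written == null) {
            // No manifest: fall back to table-level clean-up, which can also
            // delete partitions written by other FileSinkOperators.
            return allTablePartitions;
        }
        // Manifest present (possibly empty): clean only what this operator wrote.
        return written;
    }

    public static void main(String[] args) {
        Set<String> all = Set.of("l_tax=0.0", "l_tax=0.1");
        Map<String, Set<String>> manifests = Map.of(
                "fs1", Set.of("l_tax=0.0"), // wrote one partition
                "fs2", Set.of());           // produced no data, but still wrote a manifest
        System.out.println(partitionsToClean(manifests, "fs1", all)); // [l_tax=0.0]
        System.out.println(partitionsToClean(manifests, "fs2", all)); // []
    }
}
```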


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties e99ce7babb 
  ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java d68d8f9409 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 04166a23ee 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java e25dc54e7d 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 17e6cdf162 
  ql/src/test/queries/clientpositive/acid_direct_insert_insert_overwrite.q PRE-CREATION 
  ql/src/test/queries/clientpositive/acid_multiinsert_dyn_part.q PRE-CREATION 
  ql/src/test/results/clientpositive/acid_direct_insert_insert_overwrite.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/acid_multiinsert_dyn_part.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/llap/acid_direct_insert_insert_overwrite.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/llap/acid_multiinsert_dyn_part.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/72336/diff/2/

Changes: https://reviews.apache.org/r/72336/diff/1-2/


Testing
---

Added specific q tests for different insert overwrite scenarios.


Thanks,

Marta Kuczora



Review Request 72336: HIVE-23114: Insert overwrite with dynamic partitioning is not working correctly with direct insert

2020-04-08 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72336/
---

Review request for hive and Peter Vary.


Bugs: HIVE-23114
https://issues.apache.org/jira/browse/HIVE-23114


Repository: hive-git


Description
---

The idea behind the patch is the following:
When doing a multi-statement insert overwrite with dynamic partitioning, the 
partition information will be written to the manifest file. With this 
information, each FileSinkOperator can clean up only the partition directories 
it wrote itself, and does not clean up the partition directories written by the 
other FileSinkOperators.
If a statement from the insert overwrite query doesn't produce any data, a 
manifest file will still be written; otherwise the missing manifest file would 
result in a table-level clean-up, which could delete the data written by the 
other FileSinkOperators.


Diffs
-

  itests/src/test/resources/testconfiguration.properties e99ce7babb 
  ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java d68d8f9409 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 04166a23ee 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java e25dc54e7d 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 17e6cdf162 
  ql/src/test/queries/clientpositive/acid_direct_insert_insert_overwrite.q PRE-CREATION 
  ql/src/test/queries/clientpositive/acid_multiinsert_dyn_part.q PRE-CREATION 
  ql/src/test/results/clientpositive/acid_direct_insert_insert_overwrite.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/acid_multiinsert_dyn_part.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/llap/acid_direct_insert_insert_overwrite.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/llap/acid_multiinsert_dyn_part.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/72336/diff/1/


Testing
---

Added specific q tests for different insert overwrite scenarios.


Thanks,

Marta Kuczora



[jira] [Created] (HIVE-23114) Insert overwrite with dynamic partitioning is not working correctly with ACID tables with direct insert and with insert-only tables

2020-03-31 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-23114:


 Summary: Insert overwrite with dynamic partitioning is not working 
correctly with ACID tables with direct insert and with insert-only tables
 Key: HIVE-23114
 URL: https://issues.apache.org/jira/browse/HIVE-23114
 Project: Hive
  Issue Type: Bug
Reporter: Marta Kuczora
Assignee: Marta Kuczora








Re: Review Request 72181: HIVE-22832: Parallelise direct insert directory cleaning process

2020-03-04 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72181/#review219763
---


Ship it!




Ship It!

- Marta Kuczora


On March 2, 2020, 9:22 a.m., Marton Bod wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72181/
> ---
> 
> (Updated March 2, 2020, 9:22 a.m.)
> 
> 
> Review request for hive, Marta Kuczora and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-22832: Parallelise direct insert directory cleaning process
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java e9966e6364 
> 
> 
> Diff: https://reviews.apache.org/r/72181/diff/1/
> 
> 
> Testing
> ---
> 
> pre-commit build success: 
> https://builds.apache.org/job/PreCommit-HIVE-Build/20874/
> 
> 
> Thanks,
> 
> Marton Bod
> 
>



[jira] [Created] (HIVE-22969) Union remove optimisation results incorrect data when inserting to ACID table

2020-03-03 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-22969:


 Summary: Union remove optimisation results incorrect data when 
inserting to ACID table
 Key: HIVE-22969
 URL: https://issues.apache.org/jira/browse/HIVE-22969
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Marta Kuczora
Assignee: Marta Kuczora


Steps to reproduce the issue:
{noformat}
create table input_text(key string, val string) stored as textfile location 
'/Users/martakuczora/work/hive/warehouse/external/input_text';

create table output_acid(key string, val string) stored as orc 
tblproperties('transactional'='true');
insert into input_text values ('1','1'), ('2','2'),('3','3');
{noformat}
{noformat}
set hive.mapred.mode=nonstrict;
set hive.stats.autogather=false;
set hive.optimize.union.remove=true;
set hive.auto.convert.join=true;
set hive.exec.submitviachild=false;
set hive.exec.submit.local.task.via.child=false;

SELECT * FROM (
select key, val from input_text
union all
select a.key as key, b.val as val FROM input_text a join input_text b on 
a.key=b.key) c;

The result of the select:
1   1
2   2
3   3
1   1
2   2
3   3
{noformat}
{noformat}
insert into table output_acid
SELECT * FROM (
select key, val from input_text
union all
select a.key as key, b.val as val FROM input_text a join input_text b on 
a.key=b.key) c;

select * from output_acid;
The result:
1   1
2   2
3   3
{noformat}

The folder of the output_acid table contained the following delta directories:
{noformat}
drwxr-xr-x  6 martakuczora  staff  192 Mar  2 16:29 delta_000_000
drwxr-xr-x  6 martakuczora  staff  192 Mar  2 16:29 delta_001_001_0001
{noformat}
As can be seen, the statement ID is missing from the name of the first directory, 
and when the select statement runs on the table, this directory is ignored. 
That's why only half of the data was returned when running the select on the 
output_acid table.
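To make the naming difference concrete, below is a minimal sketch that splits a delta directory name into its write ID range and an optional statement ID. It assumes the delta_<minWriteId>_<maxWriteId>[_<statementId>] layout seen in the listing above. DeltaName is a hypothetical class, not Hive's AcidUtils code, and it does not reproduce the rule by which Hive decides to ignore a directory.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for the parts of an ACID delta directory name.
final class DeltaName {
  // delta_<minWriteId>_<maxWriteId>, optionally followed by _<statementId>.
  private static final Pattern DELTA = Pattern.compile("delta_(\\d+)_(\\d+)(?:_(\\d+))?");

  final long minWriteId;
  final long maxWriteId;
  final OptionalInt statementId; // empty when the name has no statement ID suffix

  private DeltaName(long minWriteId, long maxWriteId, OptionalInt statementId) {
    this.minWriteId = minWriteId;
    this.maxWriteId = maxWriteId;
    this.statementId = statementId;
  }

  /** Parses a directory name; throws IllegalArgumentException if it is not a delta name. */
  static DeltaName parse(String dirName) {
    Matcher m = DELTA.matcher(dirName);
    if (!m.matches()) {
      throw new IllegalArgumentException("Not a delta directory: " + dirName);
    }
    OptionalInt stmt = m.group(3) == null
        ? OptionalInt.empty()
        : OptionalInt.of(Integer.parseInt(m.group(3)));
    return new DeltaName(Long.parseLong(m.group(1)), Long.parseLong(m.group(2)), stmt);
  }
}
```

With this sketch, DeltaName.parse("delta_000_000").statementId is empty, while DeltaName.parse("delta_001_001_0001") carries statement ID 1.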

If either hive.stats.autogather is set to true or hive.optimize.union.remove is 
set to false, the result of the insert is correct. In this case there is only 
1 delta directory in the table's folder.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22918) Investigate empty bucket file creation for ACID tables

2020-02-21 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-22918:


 Summary: Investigate empty bucket file creation for ACID tables
 Key: HIVE-22918
 URL: https://issues.apache.org/jira/browse/HIVE-22918
 Project: Hive
  Issue Type: Task
Affects Versions: 4.0.0
Reporter: Marta Kuczora
Assignee: Marton Bod


When we create an insert-only bucketed table with 5 buckets and insert only one 
row into it, Hive creates empty files for the other 4 buckets. This logic is in 
the code for ACID tables as well, but when I checked the table's final directory 
after the insert, I found that only 1 file had been created. While debugging this 
issue, I found that the empty files are created in the staging directory outside 
the delta directory, so they don't get copied to the final directory by the move 
task. This behavior seems broken, but I'm not sure whether we really need the 
empty files in this case.

This Jira is about investigating whether or not we need these empty files for 
ACID tables and, if we do, fixing the code to create them for ACID tables as well.
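The 5-bucket, 1-row scenario can be illustrated with a minimal sketch that computes which bucket IDs received no output file. For an insert-only table, these are the buckets that get an empty file. EmptyBuckets.missing is a hypothetical helper and takes bucket IDs directly instead of parsing Hive's bucket file names. It is not the Hive code that builds the emptyBuckets list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating the idea of an "empty buckets" list.
final class EmptyBuckets {
  /** Returns the bucket IDs in [0, numBuckets) that have no output file, in ascending order. */
  static List<Integer> missing(int numBuckets, Set<Integer> bucketsWithFiles) {
    List<Integer> empty = new ArrayList<>();
    for (int bucket = 0; bucket < numBuckets; bucket++) {
      if (!bucketsWithFiles.contains(bucket)) {
        empty.add(bucket);
      }
    }
    return empty;
  }
}
```

For a 5-bucket table where only bucket 2 received the row, missing(5, {2}) yields [0, 1, 3, 4]: the four buckets for which an empty file would be expected.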



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22917) Configuration for Hive to recognise non-empty destination folders

2020-02-21 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-22917:


 Summary: Configuration for Hive to recognise non-empty destination 
folders
 Key: HIVE-22917
 URL: https://issues.apache.org/jira/browse/HIVE-22917
 Project: Hive
  Issue Type: Task
Reporter: Marta Kuczora
Assignee: Marta Kuczora


Currently Hive overwrites the LOCATION folder in case of INSERT or CTAS, even if 
it is non-empty.
Investigate this behavior and whether we can introduce a switch whereby any 
ALTER/INSERT, CTAS, CREATE or DROP operation/transaction would be aborted if the 
switch is ON and the LOCATION clause points to a non-empty folder.

{noformat}
>> create table test (json_data string)
 STORED AS TEXTFILE
 LOCATION 'hdfs://host-10-17-102-132.coe.>ra.com:8020/tmp/test'
 TBLPROPERTIES ('serialization.null.format' = '');

>> insert into test values('test0');
>> insert into test values('test1');
>> insert into test values('test2');

>> select * from test;
INFO : Compiling 
command(queryId=hive_20200207150101_601d6dbc-99cb-446d-86ac-6f8ce5304681): 
select * from test
INFO : Executing 
command(queryId=hive_20200207150101_601d6dbc-99cb-446d-86ac-6f8ce5304681): 
select * from test
INFO : Completed executing 
command(queryId=hive_20200207150101_601d6dbc-99cb-446d-86ac-6f8ce5304681); Time 
taken: 0.001 seconds
INFO : OK
-+
test.json_data
-+
test0
test1
test2
-+

>> select * from test_id2;
INFO : Compiling 
command(queryId=hive_20200207145656_e99d1a0d-ea4c-4636-ae3a-dd930df14644): 
select * from test_id2
INFO : Executing 
command(queryId=hive_20200207145656_e99d1a0d-ea4c-4636-ae3a-dd930df14644): 
select * from test_id2
INFO : Completed executing 
command(queryId=hive_20200207145656_e99d1a0d-ea4c-4636-ae3a-dd930df14644); Time 
taken: 0.001 seconds
INFO : OK
--+
test_id2.id
--+
1
13
14
--+

>> create table test2 (json_data int)
 STORED AS TEXTFILE
 LOCATION 'hdfs://host-10-17-102-132.coe.>ra.com:8020/tmp/test'
 as SELECT * from test_id;

INFO : Completed executing 
command(queryId=hive_20200207150303_cbb57a17-1242-46dc-a98e-addf50f01c5b); Time 
taken: 13.137 seconds
INFO : OK
No rows affected (13.226 seconds)

SELECT * from test;
INFO : Compiling 
command(queryId=hive_20200207150404_d0aabd08-a15f-4e6c-99a3-e607b8a6cfd3): 
SELECT * from test
INFO : Executing 
command(queryId=hive_20200207150404_d0aabd08-a15f-4e6c-99a3-e607b8a6cfd3): 
SELECT * from test
INFO : Completed executing 
command(queryId=hive_20200207150404_d0aabd08-a15f-4e6c-99a3-e607b8a6cfd3); Time 
taken: 0.001 seconds
INFO : OK
-+
test.json_data
-+
1
13
14
-+
3 rows selected (0.081 seconds)
{noformat}
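Below is a minimal sketch of the proposed guard, using local java.nio paths as a stand-in for HDFS. The switch (failOnNonEmptyLocation) and the LocationGuard class are hypothetical, because the Jira does not name a configuration property. The sketch only shows the check itself, not where Hive would call it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical guard; local paths stand in for the table's LOCATION on HDFS.
final class LocationGuard {
  /** Returns true if the directory exists and contains at least one entry. */
  static boolean isNonEmptyDir(Path dir) throws IOException {
    if (!Files.isDirectory(dir)) {
      return false; // a missing location is treated as empty
    }
    try (Stream<Path> entries = Files.list(dir)) { // the stream must be closed
      return entries.findAny().isPresent();
    }
  }

  /** Aborts the operation when the hypothetical switch is on and the location is non-empty. */
  static void check(boolean failOnNonEmptyLocation, Path location) throws IOException {
    if (failOnNonEmptyLocation && isNonEmptyDir(location)) {
      throw new IllegalStateException("LOCATION is not empty: " + location);
    }
  }
}
```

With the switch off, the check is a no-op, which matches today's behavior of writing into a non-empty folder.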



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 71904: HIVE-21164: ACID: explore how we can avoid a move step during inserts/compaction

2020-02-18 Thread Marta Kuczora via Review Board


> On Feb. 4, 2020, 3:49 p.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
> > Lines 1732-1737 (patched)
> > <https://reviews.apache.org/r/71904/diff/3/?file=2210218#file2210218line1732>
> >
> > What about using lambda here?
> 
> Marta Kuczora wrote:
> Fixed it.

In the end, this part of the code was removed.


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71904/#review219487
-------


On Feb. 18, 2020, 12:21 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71904/
> ---
> 
> (Updated Feb. 18, 2020, 12:21 p.m.)
> 
> 
> Review request for hive, Gopal V and Peter Vary.
> 
> 
> Bugs: HIVE-21164
> https://issues.apache.org/jira/browse/HIVE-21164
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Extended the original patch by saving the task attempt IDs in the file 
> names, and also fixed some bugs in the original patch.
> With this fix, inserting into an ACID table does not use the move task to place 
> the generated files into the final directory. Instead, it writes every file to 
> the final directory and then cleans up the files which are not needed (like 
> those written by failed task attempts).
> Also fixed the replication tests, which had failed with the original patch as well.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java d3cb60b790 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  da677c7977 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
> 056cd27496 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java
>  31d15fdef9 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorTestUtil.java
>  c2aa73b5f1 
>   itests/src/test/resources/testconfiguration.properties 1b1bf1147a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java 
> 9a3258115b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 9ad4e71482 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 06e4ebee82 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6c67bc7dd8 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidInputFormat.java bba3960102 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java 1e8bb223f2 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 2f5ec5270c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 
> 8980a6292a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/RecordUpdater.java 737e6774b7 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 76984abd0a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java 
> c4c56f8477 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
> b8a0f0465c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 
> 398698ec06 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
>  2543dc6fc4 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1eb9c12cc8 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 73ca658d9c 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> 33d3beba46 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
> c102a69f8f 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java ecc7bdee4d 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java bed05819b5 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> 739f2b654b 
>   ql/src/java/org/apache/hadoop/hive/ql/util/UpgradeTool.java 58e6289583 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnAddPartition.java c9cb6692df 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands.java 842140815d 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java e56d83158f 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java 908ceb43fc 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnConcatenate.java 8676e0db11 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java 66b2b2768b 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java bb55d9fd79 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnNoBuckets.java ea6b1d9bec 
>   ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java 
> af14

Re: Review Request 71904: HIVE-21164: ACID: explore how we can avoid a move step during inserts/compaction

2020-02-18 Thread Marta Kuczora via Review Board


> On Feb. 4, 2020, 10:16 p.m., Rajesh Balamohan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
> > Line 4382 (original), 4397 (patched)
> > <https://reviews.apache.org/r/71904/diff/3/?file=2210218#file2210218line4397>
> >
> > Is this needed for direct insert? In object stores, calls could get 
> > throttled.

That's a really good question; I was thinking about it a lot. I think it is not 
needed. This method does two things: it removes the temporary and duplicate 
files, and it returns the emptyBuckets list. This list contains elements if the 
number of buckets is bigger than the number of files. In this case, empty files 
are created for MM tables. But this is not the case for ACID tables: no empty 
files are created for them. I want to revisit whether or not we need these empty 
files, but for now I would keep the existing behaviour for ACID tables.
About the temp file removal: when the direct insert is finished, all files which 
are not committed (meaning not listed in the manifest files) are deleted prior to 
this call, so there shouldn't be any unnecessary files left at this point.
I will remove this call and upload a patch to see the result of the pre-commit 
tests. If everything passes, I think it is safe to remove this call in case of 
direct insert.
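The cleanup rule described above (delete every file that no manifest lists as committed) can be sketched as a simple set difference. ManifestCleanup and filesToDelete are hypothetical names. The file names in the usage note, with an attempt suffix, are an illustrative assumption, not Hive's exact naming. The sketch only computes the list and does not read manifests or touch the file system.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: decides which files a direct insert should clean up.
final class ManifestCleanup {
  /** Returns the files on disk that no manifest lists as committed, in their original order. */
  static List<String> filesToDelete(Collection<String> filesOnDisk,
                                    Collection<? extends Collection<String>> manifests) {
    Set<String> committed = new HashSet<>();
    for (Collection<String> manifest : manifests) {
      committed.addAll(manifest); // union of all committed file names
    }
    List<String> toDelete = new ArrayList<>();
    for (String file : filesOnDisk) {
      if (!committed.contains(file)) {
        toDelete.add(file); // e.g. output of a failed task attempt
      }
    }
    return toDelete;
  }
}
```

For example, if bucket_00000_0 (a failed attempt), bucket_00000_1 and bucket_00001_0 are on disk and only the last two appear in manifests, only bucket_00000_0 is returned for deletion.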


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71904/#review219494
-------


On Feb. 18, 2020, 12:21 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71904/
> ---
> 
> (Updated Feb. 18, 2020, 12:21 p.m.)
> 
> 
> Review request for hive, Gopal V and Peter Vary.
> 
> 
> Bugs: HIVE-21164
> https://issues.apache.org/jira/browse/HIVE-21164
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Extended the original patch with saving the task attempt ids in the file 
> names and also fixed some bugs in the original patch.
> With this fix, inserting into an ACID table would not use move task to place 
> the generated files into the final directory. It will inserts every files to 
> the final directory and then clean up the files which are not needed (like 
> written by failed task attempts).
> Also fixed the replication tests which failed for the original patch as well.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java d3cb60b790 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  da677c7977 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
> 056cd27496 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java
>  31d15fdef9 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorTestUtil.java
>  c2aa73b5f1 
>   itests/src/test/resources/testconfiguration.properties 1b1bf1147a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java 
> 9a3258115b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 9ad4e71482 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 06e4ebee82 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6c67bc7dd8 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidInputFormat.java bba3960102 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java 1e8bb223f2 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 2f5ec5270c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 
> 8980a6292a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/RecordUpdater.java 737e6774b7 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 76984abd0a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java 
> c4c56f8477 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
> b8a0f0465c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 
> 398698ec06 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
>  2543dc6fc4 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1eb9c12cc8 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 73ca658d9c 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> 33d3beba46 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
> c102a69f8f 
>   ql/src/java/org/apache/hadoop/hive/ql/pl

Re: Review Request 71904: HIVE-21164: ACID: explore how we can avoid a move step during inserts/compaction

2020-02-18 Thread Marta Kuczora via Review Board
/mm_all.q.out 226f2a9374 
  
ql/src/test/results/clientpositive/llap/tez_acid_union_dynamic_partition.q.out 
PRE-CREATION 
  
ql/src/test/results/clientpositive/llap/tez_acid_union_dynamic_partition_2.q.out
 PRE-CREATION 
  ql/src/test/results/clientpositive/llap/tez_acid_union_multiinsert.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/mm_all.q.out 143ebd69f9 
  streaming/src/test/org/apache/hive/streaming/TestStreaming.java 35a220facd 


Diff: https://reviews.apache.org/r/71904/diff/5/

Changes: https://reviews.apache.org/r/71904/diff/4-5/


Testing
---

Had to modify some tests because of the file name changes. Also added some 
specific tests.
In the pre-commit run all tests passed successfully.


Thanks,

Marta Kuczora



Re: Review Request 71904: HIVE-21164: ACID: explore how we can avoid a move step during inserts/compaction

2020-02-14 Thread Marta Kuczora via Review Board
/clientpositive/llap/insert_overwrite.q.out fbc3326b39 
  ql/src/test/results/clientpositive/llap/mm_all.q.out 226f2a9374 
  
ql/src/test/results/clientpositive/llap/tez_acid_union_dynamic_partition.q.out 
PRE-CREATION 
  
ql/src/test/results/clientpositive/llap/tez_acid_union_dynamic_partition_2.q.out
 PRE-CREATION 
  ql/src/test/results/clientpositive/llap/tez_acid_union_multiinsert.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/mm_all.q.out 143ebd69f9 
  streaming/src/test/org/apache/hive/streaming/TestStreaming.java 35a220facd 


Diff: https://reviews.apache.org/r/71904/diff/4/

Changes: https://reviews.apache.org/r/71904/diff/3-4/


Testing
---

Had to modify some tests because of the file name changes. Also added some 
specific tests.
In the pre-commit run all tests passed successfully.


Thanks,

Marta Kuczora



Re: Review Request 71904: HIVE-21164: ACID: explore how we can avoid a move step during inserts/compaction

2020-02-14 Thread Marta Kuczora via Review Board


> On Feb. 4, 2020, 3:49 p.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
> > Lines 1444 (patched)
> > <https://reviews.apache.org/r/71904/diff/3/?file=2210216#file2210216line1446>
> >
> > Why is this null?

It is null because, if the union all optimization is on, the different branches 
of the union are translated into different FileSinkOperators, each writing to its 
own separate directory. Normally they write to the staging directory, under 
folders with the 'HIVE_UNION_SUBDIR_' prefix, and then the move tasks move these 
files to the final table directory. For ACID tables these FileSinkOperators would 
write to different delta directories anyway, so the tasks can write directly to 
the final table location instead of the 'HIVE_UNION_SUBDIR_' folders. That's why 
the unionSuffix is null here. In other cases, it has the 'HIVE_UNION_SUBDIR_' value.
Btw, I locally modified many union q tests to run with ACID tables and ran them 
with MR and Tez. I found one bug, which I fixed, and I also added some union q 
tests that run with ACID tables and direct insert.
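Below is a minimal sketch of the decision described in the reply above, using local java.nio paths. For an ACID direct insert the union suffix is null and the branch writes under the table location. Otherwise, the branch gets its own HIVE_UNION_SUBDIR_ folder in the staging directory. UnionOutputDir and outputDir are hypothetical names, and the numeric suffix after the prefix is an assumption. The per-branch delta directories that ACID tables use are not modelled.

```java
import java.nio.file.Path;

// Hypothetical illustration of where one union branch's FileSinkOperator writes.
final class UnionOutputDir {
  // Prefix quoted in the review; the numeric suffix format is an assumption.
  static final String UNION_PREFIX = "HIVE_UNION_SUBDIR_";

  /**
   * With ACID direct insert the union suffix is null and the table location itself
   * is returned; otherwise the branch gets its own subfolder under the staging dir.
   */
  static Path outputDir(boolean acidDirectInsert, Path tableDir, Path stagingDir, int branch) {
    String unionSuffix = acidDirectInsert ? null : UNION_PREFIX + branch;
    return unionSuffix == null ? tableDir : stagingDir.resolve(unionSuffix);
  }
}
```

In the non-direct case, the move task later still has to move the contents of each HIVE_UNION_SUBDIR_ folder to the table directory. Skipping that step is the point of direct insert.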


> On Feb. 4, 2020, 3:49 p.m., Peter Vary wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/TestTxnNoBuckets.java
> > Lines 77 (patched)
> > <https://reviews.apache.org/r/71904/diff/3/?file=2210244#file2210244line77>
> >
> > We created this variable, so shouldn't we use it? Maybe even make it a 
> > constant?

You're right. I moved this to a constant and changed the tests.


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71904/#review219487
-----------


On Jan. 31, 2020, 4:12 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71904/
> ---
> 
> (Updated Jan. 31, 2020, 4:12 p.m.)
> 
> 
> Review request for hive, Gopal V and Peter Vary.
> 
> 
> Bugs: HIVE-21164
> https://issues.apache.org/jira/browse/HIVE-21164
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Extended the original patch by saving the task attempt IDs in the file 
> names, and also fixed some bugs in the original patch.
> With this fix, inserting into an ACID table does not use the move task to place 
> the generated files into the final directory. Instead, it writes every file to 
> the final directory and then cleans up the files which are not needed (like 
> those written by failed task attempts).
> Also fixed the replication tests, which had failed with the original patch as well.
> 
> 
> Diffs
> -
> 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  da677c7977 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
> 056cd27496 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java
>  31d15fdef9 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorTestUtil.java
>  c2aa73b5f1 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
>  4c0137 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java 
> 9a3258115b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 9ad4e71482 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 06e4ebee82 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6c67bc7dd8 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidInputFormat.java bba3960102 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java 1e8bb223f2 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 2f5ec5270c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 
> 8980a6292a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/RecordUpdater.java 737e6774b7 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 76984abd0a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java 
> c4c56f8477 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
> b8a0f0465c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 
> 398698ec06 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
>  2543dc6fc4 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 7f061d4a6b 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 73ca658d9c 
>   ql/src/java/org/apache/hadoop/hive/q

Re: Review Request 71904: HIVE-21164: ACID: explore how we can avoid a move step during inserts/compaction

2020-02-04 Thread Marta Kuczora via Review Board


> On Feb. 4, 2020, 3:49 p.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
> > Lines 1732-1737 (patched)
> > <https://reviews.apache.org/r/71904/diff/3/?file=2210218#file2210218line1732>
> >
> > What about using lambda here?

Fixed it.


> On Feb. 4, 2020, 3:49 p.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
> > Lines 7442-7443 (original), 7456-7460 (patched)
> > <https://reviews.apache.org/r/71904/diff/3/?file=2210231#file2210231line7459>
> >
> > nit: Maybe if/else

Fixed it.


> On Feb. 4, 2020, 3:49 p.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
> > Lines 7562-7563 (original), 7600-7604 (patched)
> > <https://reviews.apache.org/r/71904/diff/3/?file=2210231#file2210231line7605>
> >
> > nit: Maybe if/else?

Fixed it.


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71904/#review219487
---


On Jan. 31, 2020, 4:12 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71904/
> ---
> 
> (Updated Jan. 31, 2020, 4:12 p.m.)
> 
> 
> Review request for hive, Gopal V and Peter Vary.
> 
> 
> Bugs: HIVE-21164
> https://issues.apache.org/jira/browse/HIVE-21164
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Extended the original patch by saving the task attempt IDs in the file 
> names, and also fixed some bugs in the original patch.
> With this fix, inserting into an ACID table does not use the move task to place 
> the generated files into the final directory. Instead, it writes every file to 
> the final directory and then cleans up the files which are not needed (like 
> those written by failed task attempts).
> Also fixed the replication tests, which had failed with the original patch as well.
> 
> 
> Diffs
> -
> 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  da677c7977 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
> 056cd27496 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java
>  31d15fdef9 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorTestUtil.java
>  c2aa73b5f1 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
>  4c0137 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java 
> 9a3258115b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 9ad4e71482 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 06e4ebee82 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6c67bc7dd8 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidInputFormat.java bba3960102 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java 1e8bb223f2 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 2f5ec5270c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 
> 8980a6292a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/RecordUpdater.java 737e6774b7 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 76984abd0a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java 
> c4c56f8477 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
> b8a0f0465c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 
> 398698ec06 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
>  2543dc6fc4 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 7f061d4a6b 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 73ca658d9c 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> 5fcc367cc9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
> c102a69f8f 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java ecc7bdee4d 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java bed05819b5 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> bb70db4524 
>   ql/src/java/org/apache/hadoop/hive/ql/util/UpgradeTool.java 58e6289583 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnAddPartition.java c9cb6692df 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCo

Re: Review Request 71904: HIVE-21164: ACID: explore how we can avoid a move step during inserts/compaction

2020-02-04 Thread Marta Kuczora via Review Board


> On Feb. 4, 2020, 3:49 p.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
> > Lines 7526-7543 (patched)
> > <https://reviews.apache.org/r/71904/diff/3/?file=2210231#file2210231line7529>
> >
> > Is this duplicated code?

Yes, but I cannot move this whole part to a separate method, because both the 
acidOp and the isDirectInsert variables have to be set. I can create one method 
for getting the value of isDirectInsert and another one for getting the tmp dir.


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71904/#review219487
---


On Jan. 31, 2020, 4:12 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71904/
> ---
> 
> (Updated Jan. 31, 2020, 4:12 p.m.)
> 
> 
> Review request for hive, Gopal V and Peter Vary.
> 
> 
> Bugs: HIVE-21164
> https://issues.apache.org/jira/browse/HIVE-21164
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Extended the original patch by saving the task attempt IDs in the file 
> names, and also fixed some bugs in the original patch.
> With this fix, inserting into an ACID table does not use the move task to place 
> the generated files into the final directory. Instead, it writes every file to 
> the final directory and then cleans up the files which are not needed (like 
> those written by failed task attempts).
> Also fixed the replication tests, which had failed with the original patch as well.
> 
> 
> Diffs
> -
> 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  da677c7977 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
> 056cd27496 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java
>  31d15fdef9 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorTestUtil.java
>  c2aa73b5f1 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
>  4c0137 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java 
> 9a3258115b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 9ad4e71482 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 06e4ebee82 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6c67bc7dd8 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidInputFormat.java bba3960102 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java 1e8bb223f2 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 2f5ec5270c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 
> 8980a6292a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/RecordUpdater.java 737e6774b7 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 76984abd0a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java 
> c4c56f8477 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
> b8a0f0465c 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 
> 398698ec06 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
>  2543dc6fc4 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 7f061d4a6b 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
> 73ca658d9c 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> 5fcc367cc9 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
> c102a69f8f 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java ecc7bdee4d 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java bed05819b5 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> bb70db4524 
>   ql/src/java/org/apache/hadoop/hive/ql/util/UpgradeTool.java 58e6289583 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnAddPartition.java c9cb6692df 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands.java 842140815d 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java 88ca683173 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java 908ceb43fc 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnConcatenate.java 8676e0db11 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java 66b2b2768b 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java bb55d9fd79 
>   ql/

Re: Review Request 71904: HIVE-21164: ACID: explore how we can avoid a move step during inserts/compaction

2020-02-04 Thread Marta Kuczora via Review Board


> On Feb. 4, 2020, 3:49 p.m., Peter Vary wrote:
> > Thanks for the patch! This will be very-very useful.
> > Some minor comments, questions...

Thanks a lot for the review!!


> On Feb. 4, 2020, 3:49 p.m., Peter Vary wrote:
> > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorTestUtil.java
> > Lines 55 (patched)
> > <https://reviews.apache.org/r/71904/diff/3/?file=2210213#file2210213line55>
> >
> > Is this import used?

You're right, it is not used. Removed it.


> On Feb. 4, 2020, 3:49 p.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
> > Lines 843 (patched)
> > <https://reviews.apache.org/r/71904/diff/3/?file=2210216#file2210216line845>
> >
> > Does inheritPerms still work? I kind of remember that it was 
> > removed from Hive some time ago...

No, I think this log message was just a copy-paste error. Fixed it.


> On Feb. 4, 2020, 3:49 p.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
> > Lines 1799 (patched)
> > <https://reviews.apache.org/r/71904/diff/3/?file=2210218#file2210218line1799>
> >
> > Maybe use a slightly different log message, so we can easily distinguish 
> > between this and the line below

Fixed it.


> On Feb. 4, 2020, 3:49 p.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
> > Lines 7379 (patched)
> > <https://reviews.apache.org/r/71904/diff/3/?file=2210231#file2210231line7379>
> >
> > We might want to make this feature configurable, to turn it on/off in 
> > case we missed some edge cases

You are absolutely right. I introduced a config parameter so we can turn this 
feature on/off.


> On Feb. 4, 2020, 3:49 p.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
> > Lines 493-494 (patched)
> > <https://reviews.apache.org/r/71904/diff/3/?file=2210235#file2210235line493>
> >
> > nit: Formatting? Really not important, just for completeness' sake 
> > :D

Fixed it.


> On Feb. 4, 2020, 3:49 p.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
> > Lines 690-691 (patched)
> > <https://reviews.apache.org/r/71904/diff/3/?file=2210235#file2210235line690>
> >
> > nit: Formatting?

Fixed it.


> On Feb. 4, 2020, 3:49 p.m., Peter Vary wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
> > Lines 1246 (patched)
> > <https://reviews.apache.org/r/71904/diff/3/?file=2210248#file2210248line1246>
> >
> > Does this table always exist? Shall we use "drop table if exists" 
> > instead?

Fixed it.


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71904/#review219487
---


On Jan. 31, 2020, 4:12 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71904/
> ---
> 
> (Updated Jan. 31, 2020, 4:12 p.m.)
> 
> 
> Review request for hive, Gopal V and Peter Vary.
> 
> 
> Bugs: HIVE-21164
> https://issues.apache.org/jira/browse/HIVE-21164
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Extended the original patch to save the task attempt ids in the file names, 
> and also fixed some bugs in the original patch.
> With this fix, inserting into an ACID table no longer uses a move task to 
> place the generated files into the final directory. Instead, every file is 
> written directly to the final directory, and the files which are not needed 
> (like those written by failed task attempts) are cleaned up afterwards.
> Also fixed the replication tests, which failed for the original patch as well.
> 
> 
> Diffs
> -
> 
>   hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java da677c7977
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 056cd27496
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java 31d15fdef9
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorTestUtil.java c2aa73b5f1
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java 4c0137
>   ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMer

Re: Review Request 72074: HIVE-21215: Read Parquet INT64 timestamp

2020-02-03 Thread Marta Kuczora via Review Board


> On Feb. 3, 2020, 9:12 a.m., Karen Coppage wrote:
> > Thanks for the patch, looks good! Two ideas:
> > 1. It would be nice to have a unit test that reads a date before October 
> > 1582, so it's clear that we're using the Proleptic Gregorian calendar.
> > 2. ParquetTimestampUtils would be more readable if the big multipliers were 
> > declared as constants and/or in this format: e.g. 1_000_000.
> > 
> > Thanks!

Thanks a lot Karen for the review. These are really good points. I fixed the 
numbers in ParquetTimestampUtils and also added some test cases for dates 
before 1582.
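The two review points can be illustrated with a minimal Java sketch, assuming the INT64 value counts milli-, micro- or nanoseconds since 1970-01-01T00:00:00 UTC (i.e. a UTC-normalized timestamp). Int64TimestampSketch and its constants are hypothetical names, not the code in ParquetTimestampUtils. It shows named multipliers written as 1_000_000, and that java.time uses the proleptic Gregorian calendar, so a date before October 1582 round-trips unchanged.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helper (not Hive's ParquetTimestampUtils): decodes a Parquet INT64
// timestamp, assumed to be UTC-normalized, into a LocalDateTime.
final class Int64TimestampSketch {
    // Named multipliers instead of bare literals, as suggested in the review.
    static final long MILLIS_PER_SECOND = 1_000L;
    static final long MICROS_PER_SECOND = 1_000_000L;
    static final long NANOS_PER_SECOND = 1_000_000_000L;
    static final long NANOS_PER_MICRO = 1_000L;

    private Int64TimestampSketch() {}

    // value: units since the epoch; unitsPerSecond: one of the *_PER_SECOND constants.
    static LocalDateTime fromEpoch(long value, long unitsPerSecond) {
        // floorDiv/floorMod keep the fraction non-negative for instants before 1970.
        long seconds = Math.floorDiv(value, unitsPerSecond);
        long fraction = Math.floorMod(value, unitsPerSecond);
        int nanos = (int) (fraction * (NANOS_PER_SECOND / unitsPerSecond));
        // java.time uses the proleptic Gregorian (ISO-8601) calendar, so dates before
        // October 1582 are not switched to the Julian calendar.
        return LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        // 1500-03-01T12:30:15.123456 UTC expressed as microseconds since the epoch.
        LocalDateTime old = LocalDateTime.of(1500, 3, 1, 12, 30, 15, 123_456_000);
        long micros = old.toEpochSecond(ZoneOffset.UTC) * MICROS_PER_SECOND
                + old.getNano() / NANOS_PER_MICRO;
        System.out.println(fromEpoch(micros, MICROS_PER_SECOND));
    }
}
```

Using floorDiv/floorMod rather than / and % matters for negative values: -1 microsecond must map to 23:59:59.999999 on 1969-12-31, not to a negative nanosecond field.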


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72074/#review219466
-------


On Feb. 3, 2020, 12:31 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72074/
> ---
> 
> (Updated Feb. 3, 2020, 12:31 p.m.)
> 
> 
> Review request for hive, Karen Coppage and Peter Vary.
> 
> 
> Bugs: HIVE-21215
> https://issues.apache.org/jira/browse/HIVE-21215
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Implemented the read path for Parquet INT64 timestamp.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java f2c1493f56
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java d67b030648
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/ParquetTimestampUtils.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java 519bd813e9
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java 2803baf90c
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/convert/TestETypeConverter.java f6ee57140c
> 
> 
> Diff: https://reviews.apache.org/r/72074/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>



Re: Review Request 72074: HIVE-21215: Read Parquet INT64 timestamp

2020-02-03 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72074/
---

(Updated Feb. 3, 2020, 12:31 p.m.)


Review request for hive, Karen Coppage and Peter Vary.


Bugs: HIVE-21215
https://issues.apache.org/jira/browse/HIVE-21215


Repository: hive-git


Description
---

Implemented the read path for Parquet INT64 timestamp.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java f2c1493f56 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java d67b030648
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/ParquetTimestampUtils.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java 519bd813e9
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java 2803baf90c
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/convert/TestETypeConverter.java f6ee57140c


Diff: https://reviews.apache.org/r/72074/diff/2/

Changes: https://reviews.apache.org/r/72074/diff/1-2/


Testing
---


Thanks,

Marta Kuczora



Review Request 72074: HIVE-21215: Read Parquet INT64 timestamp

2020-01-31 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72074/
---

Review request for hive, Karen Coppage and Peter Vary.


Bugs: HIVE-21215
https://issues.apache.org/jira/browse/HIVE-21215


Repository: hive-git


Description
---

Implemented the read path for Parquet INT64 timestamp.


Diffs
-

  common/src/java/org/apache/hadoop/hive/common/type/Timestamp.java f2c1493f56 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java d67b030648
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/ParquetTimestampUtils.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java 519bd813e9
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java 2803baf90c
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/convert/TestETypeConverter.java f6ee57140c


Diff: https://reviews.apache.org/r/72074/diff/1/


Testing
---


Thanks,

Marta Kuczora



Re: Review Request 71904: HIVE-21164: ACID: explore how we can avoid a move step during inserts/compaction

2020-01-31 Thread Marta Kuczora via Review Board
---

Had to modify some tests because of the file name changes. Also added some 
specific tests.
In the pre-commit run all tests passed successfully.


Thanks,

Marta Kuczora



Re: Review Request 71904: HIVE-21164: ACID: explore how we can avoid a move step during inserts/compaction

2020-01-31 Thread Marta Kuczora via Review Board
 some 
specific tests.
In the pre-commit run all tests passed successfully.


Thanks,

Marta Kuczora



[jira] [Created] (HIVE-22716) Reading to ByteBuffer is broken in ParquetFooterInputFromCache

2020-01-10 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-22716:


 Summary: Reading to ByteBuffer is broken in ParquetFooterInputFromCache
 Key: HIVE-22716
 URL: https://issues.apache.org/jira/browse/HIVE-22716
 Project: Hive
  Issue Type: Bug
  Components: llap
Reporter: Marta Kuczora
Assignee: Marta Kuczora
 Fix For: 4.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22648) Upgrade Parquet to 1.11.0

2019-12-14 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-22648:


 Summary: Upgrade Parquet to 1.11.0
 Key: HIVE-22648
 URL: https://issues.apache.org/jira/browse/HIVE-22648
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Reporter: Marta Kuczora
Assignee: Karen Coppage


[WIP until Parquet community releases version 1.11.0]

The new Parquet version (1.11.0) uses 
[LogicalTypes|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md]
 instead of OriginalTypes.
 These are backwards-compatible with OriginalTypes.

Thanks to [~kuczoram] for her work on this patch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Review Request 71904: HIVE-21164: ACID: explore how we can avoid a move step during inserts/compaction

2019-12-12 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71904/
---

Review request for hive, Gopal V and Peter Vary.


Bugs: HIVE-21164
https://issues.apache.org/jira/browse/HIVE-21164


Repository: hive-git


Description
---

Extended the original patch to save the task attempt ids in the file names, 
and also fixed some bugs in the original patch.
With this fix, inserting into an ACID table no longer uses a move task to place 
the generated files into the final directory. Instead, every file is written 
directly to the final directory, and the files which are not needed (like those 
written by failed task attempts) are cleaned up afterwards.
Also fixed the replication tests, which failed for the original patch as well.
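A minimal sketch of the cleanup idea described above: every task attempt writes its output straight into the final directory, and afterwards only the files of the committed attempt for each bucket are kept. The bucket_<bucketId>_<attemptId> naming pattern and the committedAttemptByBucket map are hypothetical, chosen for illustration; they are not the file names or data structures used by the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the file-name pattern is an assumption for illustration,
// not the naming scheme used by the HIVE-21164 patch.
final class AttemptCleanupSketch {
    // bucket_<bucketId>_<taskAttemptId>, e.g. "bucket_00000_1"
    private static final Pattern NAME = Pattern.compile("bucket_(\\d+)_(\\d+)");

    private AttemptCleanupSketch() {}

    // Returns the files in the final directory that were not written by the
    // committed attempt of their bucket, i.e. the ones to clean up.
    static List<String> filesToDelete(List<String> fileNames,
                                      Map<Integer, Integer> committedAttemptByBucket) {
        List<String> toDelete = new ArrayList<>();
        for (String name : fileNames) {
            Matcher m = NAME.matcher(name);
            if (!m.matches()) {
                continue; // leave files that do not follow the pattern untouched
            }
            int bucket = Integer.parseInt(m.group(1));
            int attempt = Integer.parseInt(m.group(2));
            Integer committed = committedAttemptByBucket.get(bucket);
            // No committed attempt for the bucket, or a different attempt: not needed.
            if (committed == null || committed != attempt) {
                toDelete.add(name);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<String> files = List.of("bucket_00000_0", "bucket_00000_1", "bucket_00001_0");
        System.out.println(filesToDelete(files, Map.of(0, 1, 1, 0))); // [bucket_00000_0]
    }
}
```

Encoding the attempt id in the file name is what makes this possible without a move step: the final directory itself records which attempt produced each file.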


Diffs
-

  hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java da677c7
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 2868427
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java 31d15fd
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java 445e39c
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java b7245e2
  ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java 9a32581
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 9ad4e71 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 06e4ebe 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 3d30d09 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidInputFormat.java bba3960 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java 1e8bb22 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 3c508ec 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 8980a62 
  ql/src/java/org/apache/hadoop/hive/ql/io/RecordUpdater.java 737e677 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 76984ab 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java c4c56f8 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 2ac6232 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 3fa61d3
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java 2543dc6
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java f4bd0f9 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 73ca658 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 90549f9 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java c102a69 
  ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java ecc7bde 
  ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java bed0581 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 2b2cc1a 
  ql/src/java/org/apache/hadoop/hive/ql/util/UpgradeTool.java 58e6289 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnAddPartition.java c9cb669 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands.java 8421408 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java 88ca683 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java 908ceb4 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnConcatenate.java 8676e0d 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java 66b2b27 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java bb55d9f 
  ql/src/test/org/apache/hadoop/hive/ql/TestTxnNoBuckets.java ea6b1d9 
  ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java af14e62 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java dd70524 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestFileSinkOperator.java 2c4b69b 
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java c033a94 
  ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/CompactorTest.java cfd7290
  ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestWorker.java 70ae85c 
  ql/src/test/results/clientpositive/acid_subquery.q.out 1dc1775 
  ql/src/test/results/clientpositive/create_transactional_full_acid.q.out e324d5e
  ql/src/test/results/clientpositive/encrypted/encryption_insert_partition_dynamic.q.out 61b0057
  ql/src/test/results/clientpositive/llap/acid_no_buckets.q.out 5571c53 
  ql/src/test/results/clientpositive/llap/insert_overwrite.q.out fbc3326 
  ql/src/test/results/clientpositive/llap/mm_all.q.out 7542a6a 
  ql/src/test/results/clientpositive/mm_all.q.out 1377856 
  streaming/src/test/org/apache/hive/streaming/TestStreaming.java 58b3ae2 


Diff: https://reviews.apache.org/r/71904/diff/1/


Testing
---

Had to modify some tests because of the file name changes. Also added some 
specific tests.
In the pre-commit run all tests passed successfully.


Thanks,

Marta Kuczora



[jira] [Created] (HIVE-22375) ObjectStore.lockNotificationSequenceForUpdate is leaking query in case of error

2019-10-21 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-22375:


 Summary: ObjectStore.lockNotificationSequenceForUpdate is leaking query in case of error
 Key: HIVE-22375
 URL: https://issues.apache.org/jira/browse/HIVE-22375
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Affects Versions: 4.0.0
Reporter: Marta Kuczora
Assignee: Marta Kuczora


In the ObjectStore.lockNotificationSequenceForUpdate method, the query doesn't 
get closed if an error occurs:

{noformat}
  private void lockNotificationSequenceForUpdate() throws MetaException {
    if (sqlGenerator.getDbProduct() == DatabaseProduct.DERBY && directSql != null) {
      new RetryingExecutor(conf, () -> {
        directSql.lockDbTable("NOTIFICATION_SEQUENCE");
      }).run();
    } else {
      String selectQuery = "select \"NEXT_EVENT_ID\" from \"NOTIFICATION_SEQUENCE\"";
      String lockingQuery = sqlGenerator.addForUpdateClause(selectQuery);
      new RetryingExecutor(conf, () -> {
        prepareQuotes();
        Query query = pm.newQuery("javax.jdo.query.SQL", lockingQuery);
        query.setUnique(true);
        // only need to execute it to get db Lock
        query.execute();
        query.closeAll();
      }).run();
    }
  }
{noformat}
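One common way to avoid this kind of leak is to close the query in a finally block, so closeAll() runs even when execute() throws. The sketch below uses a hypothetical QueryHandle interface as a stand-in for javax.jdo.Query (which does provide execute() and closeAll()) so that it compiles without JDO on the classpath; it is not necessarily the fix that was committed for this issue.

```java
// Hypothetical stand-in for javax.jdo.Query so the sketch compiles without JDO.
final class CloseQuerySketch {
    interface QueryHandle {
        Object execute();
        void closeAll();
    }

    private CloseQuerySketch() {}

    // Runs the locking query and releases it whether or not execute() throws.
    static void runLockingQuery(QueryHandle query) {
        try {
            // only need to execute it to get the DB lock
            query.execute();
        } finally {
            query.closeAll(); // reached on success and on error
        }
    }

    public static void main(String[] args) {
        QueryHandle failing = new QueryHandle() {
            @Override public Object execute() { throw new IllegalStateException("backend DB error"); }
            @Override public void closeAll() { System.out.println("query closed"); }
        };
        try {
            runLockingQuery(failing);
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The exception still propagates to the caller (here, the retrying executor), so retry behaviour is unchanged; only the resource handling differs.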



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22345) Accidentally committed HIVE-21327 with wrong commit message

2019-10-15 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-22345:


 Summary: Accidentally committed HIVE-21327 with wrong commit message
 Key: HIVE-22345
 URL: https://issues.apache.org/jira/browse/HIVE-22345
 Project: Hive
  Issue Type: Bug
Reporter: Marta Kuczora
Assignee: Marta Kuczora
 Fix For: 4.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22336) The updates should be pushed to the Metastore backend DB before creating the notification event

2019-10-14 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-22336:


 Summary: The updates should be pushed to the Metastore backend DB before creating the notification event
 Key: HIVE-22336
 URL: https://issues.apache.org/jira/browse/HIVE-22336
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 4.0.0
Reporter: Marta Kuczora


There was an issue on HDP-3.1 where a table couldn't be deleted because some 
related objects (like the storage descriptor) were missing from the metastore. 
A previous delete attempt on that table had gone wrong, but no rollback 
happened; that's why the SD was missing. In that previous delete, the 
notification creation swallowed the error that came from the backend DB, which 
is why no rollback happened. These are the steps of the first delete attempt:

 
# Open a transaction (transaction_1) - this step was successful
# Delete all the objects which are related to the table - this step was 
successful too, so the SD and other objects were deleted
# Delete the table - this step failed in the backend DB, but according to the 
log the delete happens in a batch statement, so it isn't necessarily executed 
at this moment, and no error is visible here
# Create a notification about the table delete:
## Open another transaction for the notification creation (transaction_2) - 
call the ObjectStore.openTransaction method, which increases a counter of open 
transactions and then checks whether there is already an active transaction. If 
there is, it just returns true and doesn't actually create a new transaction.
## Lock the notification id in the metastore backend db for update - here is 
where the exception from the backend DB (let's call it "MySQL Exception") 
manifests
## If an exception occurs while acquiring the lock, retry - The "MySQL 
Exception" was caught, and since the exception type is not checked, the retry 
mechanism assumes it happened because the lock for the notification id couldn't 
be acquired, so it retries and "forgets" about the "MySQL Exception".
## If the lock was acquired successfully, create the notification - On the 
second attempt, the lock was acquired successfully, so the notification was 
created successfully.
## Commit transaction_2 - This just decreases the transaction counter but 
doesn't actually commit anything.
# Commit transaction_1 - This commits the transaction, but since the error had 
already manifested and was effectively "handled", no error is seen here; the 
commit reports success, so no rollback happens and the table object is left in 
an invalid state.
# If the commit was not successful then rollback
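The sub-steps for transaction_2 rely on counter-based nesting of transactions: only the outermost openTransaction/commitTransaction pair reaches the backend DB. The simplified model below illustrates this; NestedTxnSketch and its fields are hypothetical and do not reproduce the ObjectStore implementation.

```java
// Simplified, hypothetical model of counter-based transaction nesting;
// names are illustrative, not ObjectStore's actual fields or methods.
final class NestedTxnSketch {
    private int openTransactionCalls = 0;
    private int physicalBegins = 0;
    private int physicalCommits = 0;

    boolean openTransaction() {
        openTransactionCalls++;
        if (openTransactionCalls == 1) {
            physicalBegins++; // only the outermost call starts a real DB transaction
        }
        return true; // nested calls just return true
    }

    boolean commitTransaction() {
        if (openTransactionCalls <= 0) {
            throw new IllegalStateException("commit without a matching openTransaction");
        }
        openTransactionCalls--;
        if (openTransactionCalls == 0) {
            physicalCommits++; // only the outermost commit reaches the DB
        }
        return true;
    }

    int physicalBegins() { return physicalBegins; }

    int physicalCommits() { return physicalCommits; }

    public static void main(String[] args) {
        NestedTxnSketch txn = new NestedTxnSketch();
        txn.openTransaction();   // transaction_1
        txn.openTransaction();   // "transaction_2": nested, no new DB transaction
        txn.commitTransaction(); // commit of transaction_2: counter only
        System.out.println(txn.physicalCommits()); // 0
        txn.commitTransaction(); // commit of transaction_1: the real commit
        System.out.println(txn.physicalCommits()); // 1
    }
}
```

Because the inner commit only touches the counter, whatever the inner block swallowed is invisible at the outer commit, which is the situation described in the last steps.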

In the customer setup, this issue could be fixed by adding a flush call before 
creating the notification event, so all the updates are pushed to the backend 
DB and the error manifests at this point. The error then propagates back to the 
HiveMetastore class, which does the rollback, and the delete table operation 
fails as it should, since the table couldn't be deleted. The HiveMetastore 
retry mechanism could then try the table deletion again.
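A toy model of the failure and of the proposed flush may make the ordering clearer. It assumes, as described above, that deletes are batched and only executed later (here when flush() runs, standing in for the lock query forcing the batch to execute), and that the notification step retries on any exception. BatchingStore, dropTable and addNotification are hypothetical names, not metastore APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model with hypothetical names: deletes are queued and only executed on flush,
// and the notification step retries on any exception.
final class FlushBeforeNotifySketch {
    static final class BatchingStore {
        private final List<String> pending = new ArrayList<>();
        private final String failingDelete;

        BatchingStore(String failingDelete) { this.failingDelete = failingDelete; }

        void delete(String object) { pending.add(object); } // queued, no error yet

        void flush() {
            List<String> batch = new ArrayList<>(pending);
            pending.clear(); // the batch is consumed whether or not it fails
            if (batch.contains(failingDelete)) {
                throw new IllegalStateException("backend DB error deleting " + failingDelete);
            }
        }
    }

    // The lock query forces the pending batch to run; every exception is treated as
    // "lock not acquired", so it is retried and the backend error is swallowed.
    static void addNotification(BatchingStore store, List<String> events) {
        for (int attempt = 0; attempt < 3; attempt++) {
            try {
                store.flush();
                events.add("DROP_TABLE");
                return;
            } catch (IllegalStateException e) {
                // no check on the exception type: retry
            }
        }
        throw new IllegalStateException("could not lock the notification sequence");
    }

    // With flushFirst, the backend error surfaces before the notification is created.
    static void dropTable(BatchingStore store, List<String> events, boolean flushFirst) {
        store.delete("SD");
        store.delete("TBL");
        if (flushFirst) {
            store.flush(); // error propagates to the caller, which can roll back
        }
        addNotification(store, events);
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        dropTable(new BatchingStore("TBL"), events, false);
        System.out.println("without flush: no error, events=" + events);
        try {
            dropTable(new BatchingStore("TBL"), new ArrayList<>(), true);
        } catch (IllegalStateException e) {
            System.out.println("with flush: " + e.getMessage());
        }
    }
}
```

Without the flush, the failed delete is consumed by the first retry attempt and the drop "succeeds"; with the flush, the error leaves dropTable before any notification is written.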




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 71606: HIVE-21407: Parquet predicate pushdown is not working correctly for char column types

2019-10-10 Thread Marta Kuczora via Review Board


> On Oct. 10, 2019, 9 a.m., Peter Vary wrote:
> > Thanks for chasing this down!
> > Really appreciate it!

Thanks a lot for the review!


> On Oct. 10, 2019, 9 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/LeafFilterFactory.java
> > Lines 157 (patched)
> > <https://reviews.apache.org/r/71606/diff/1/?file=216#file216line158>
> >
> > Is this the best way to check this?
> > Does this always start with "char"? Could it be "CHAR"? Or is anything else not possible?

It always starts with "char", but you are right that this is not the best way 
to check it. I changed it to at least use the serde constant for the CHAR type name.


> On Oct. 10, 2019, 9 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/LeafFilterFactory.java
> > Lines 181 (patched)
> > <https://reviews.apache.org/r/71606/diff/1/?file=216#file216line182>
> >
> > I do not like this.
> > Either we only aim for space, or we aim for whitespace characters, but 
> > the check and the replace are different.

You are right, thanks for pointing this out. Since the regex always replaces 
the whitespace at the end of the string, the check whether the string ends 
with a space is not even necessary. If it doesn't end with whitespace, the 
regex replace does nothing.
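The CHAR handling discussed here can be sketched as follows, assuming the Hive type name is available as a string such as "char(10)" and that the local CHAR_TYPE_NAME constant mirrors serdeConstants.CHAR_TYPE_NAME ("char"). CharPpdSketch.literalForPredicate is a hypothetical helper, not the LeafFilterFactory code; it shows why the separate ends-with-space check is redundant.

```java
// Hypothetical sketch of the HIVE-21407 idea: for a CHAR column, strip trailing
// whitespace from the literal before it is pushed into the Parquet predicate.
final class CharPpdSketch {
    // Assumed to match serdeConstants.CHAR_TYPE_NAME; type names look like "char(10)".
    static final String CHAR_TYPE_NAME = "char";

    private CharPpdSketch() {}

    static String literalForPredicate(String hiveTypeName, String value) {
        if (hiveTypeName != null && hiveTypeName.startsWith(CHAR_TYPE_NAME)) {
            // No endsWith(" ") pre-check: without trailing whitespace the regex is a no-op.
            return value.replaceAll("\\s+$", "");
        }
        return value; // varchar/string literals are pushed down unchanged
    }

    public static void main(String[] args) {
        System.out.println("[" + literalForPredicate("char(10)", "abc   ") + "]");    // [abc]
        System.out.println("[" + literalForPredicate("varchar(10)", "abc  ") + "]"); // [abc  ]
    }
}
```

Note that \s also matches tabs and line breaks; a pattern like " +$" would strip only spaces. That is exactly the space-versus-whitespace choice raised in the review, and the check and the replace should agree on it.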


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71606/#review218175
---


On Oct. 10, 2019, 11:39 a.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71606/
> ---
> 
> (Updated Oct. 10, 2019, 11:39 a.m.)
> 
> 
> Review request for hive and Peter Vary.
> 
> 
> Bugs: HIVE-21407
> https://issues.apache.org/jira/browse/HIVE-21407
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The previous approach didn't solve all use cases. In this new approach the 
> Hive type is sent to the Parquet PPD part, and the value which is pushed to 
> the predicate is trimmed in case of the CHAR Hive type.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/FilterPredicateLeafBuilder.java 5b051dd
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/LeafFilterFactory.java fc9188f
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 033e26a
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java ca5e085
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 0210a0a
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java 7c7c657
>   ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 4c40908
>   ql/src/test/queries/clientpositive/parquet_ppd_char.q 386fb25 
>   ql/src/test/queries/clientpositive/parquet_ppd_char2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_ppd_char2.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/71606/diff/2/
> 
> 
> Testing
> ---
> 
> Added a new q test for testing the PPD for char and varchar types. Also 
> extended the unit tests for the 
> ParquetFilterPredicateConverter.toFilterPredicate method.
> 
> 
> The TestParquetRecordReaderWrapper and the TestParquetFilterPredicate are 
> both testing the same thing, the behavior of the 
> ParquetFilterPredicateConverter.toFilterPredicate method. It doesn't make 
> sense to have tests for the same use case in different test classes, so I 
> moved the test cases from TestParquetRecordReaderWrapper to 
> TestParquetFilterPredicate.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>



Re: Review Request 71606: HIVE-21407: Parquet predicate pushdown is not working correctly for char column types

2019-10-10 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71606/
---

(Updated Oct. 10, 2019, 11:39 a.m.)


Review request for hive and Peter Vary.


Changes
---

Fixed the issues from the review.


Bugs: HIVE-21407
https://issues.apache.org/jira/browse/HIVE-21407


Repository: hive-git


Description
---

The previous approach didn't solve all use cases. In this new approach the Hive 
type is sent to the Parquet PPD part, and the value which is pushed to the 
predicate is trimmed in case of the CHAR Hive type.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/FilterPredicateLeafBuilder.java 5b051dd
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/LeafFilterFactory.java fc9188f
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 033e26a
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java ca5e085
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 0210a0a
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java 7c7c657
  ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 4c40908
  ql/src/test/queries/clientpositive/parquet_ppd_char.q 386fb25 
  ql/src/test/queries/clientpositive/parquet_ppd_char2.q PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_ppd_char2.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/71606/diff/2/

Changes: https://reviews.apache.org/r/71606/diff/1-2/


Testing
---

Added a new q test for testing the PPD for char and varchar types. Also extended 
the unit tests for the ParquetFilterPredicateConverter.toFilterPredicate method.


The TestParquetRecordReaderWrapper and the TestParquetFilterPredicate are both 
testing the same thing, the behavior of the 
ParquetFilterPredicateConverter.toFilterPredicate method. It doesn't make sense 
to have tests for the same use case in different test classes, so I moved the 
test cases from TestParquetRecordReaderWrapper to 
TestParquetFilterPredicate.


Thanks,

Marta Kuczora



Review Request 71606: HIVE-21407: Parquet predicate pushdown is not working correctly for char column types

2019-10-10 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71606/
---

Review request for hive and Peter Vary.


Bugs: HIVE-21407
https://issues.apache.org/jira/browse/HIVE-21407


Repository: hive-git


Description
---

The previous approach didn't solve all use cases. In this new approach the Hive 
type is sent to the Parquet PPD part, and the value which is pushed to the 
predicate is trimmed in case of the CHAR Hive type.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/FilterPredicateLeafBuilder.java 5b051dd
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/LeafFilterFactory.java fc9188f
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java 033e26a
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java ca5e085
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 0210a0a
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java 7c7c657
  ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 4c40908
  ql/src/test/queries/clientpositive/parquet_ppd_char.q 386fb25 
  ql/src/test/queries/clientpositive/parquet_ppd_char2.q PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_ppd_char2.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/71606/diff/1/


Testing
---

Added a new q test for testing the PPD for char and varchar types. Also extended 
the unit tests for the ParquetFilterPredicateConverter.toFilterPredicate method.


The TestParquetRecordReaderWrapper and the TestParquetFilterPredicate are both 
testing the same thing, the behavior of the 
ParquetFilterPredicateConverter.toFilterPredicate method. It doesn't make sense 
to have tests for the same use case in different test classes, so I moved the 
test cases from TestParquetRecordReaderWrapper to 
TestParquetFilterPredicate.


Thanks,

Marta Kuczora



Review Request 71558: HIVE-21987: Hive is unable to read Parquet int32 annotated with decimal

2019-09-30 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71558/
---

Review request for hive and Peter Vary.


Bugs: HIVE-21987
https://issues.apache.org/jira/browse/HIVE-21987


Repository: hive-git


Description
---

Added support to read INT32 Parquet decimals.
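For context, a Parquet INT32 column annotated as DECIMAL(precision, scale) stores the unscaled integer, and the Parquet format specification allows INT32 for precisions 1 to 9. A minimal decoding sketch follows; Int32DecimalSketch is a hypothetical name, and this is not the ETypeConverter or ParquetDataColumnReaderFactory code from the patch.

```java
import java.math.BigDecimal;

// Hypothetical sketch: an INT32 Parquet DECIMAL(precision, scale) holds the unscaled
// value, so the decimal value is unscaled * 10^-scale.
final class Int32DecimalSketch {
    // Per the Parquet format spec, INT32 may back decimals with precision 1..9.
    static final int MAX_INT32_PRECISION = 9;

    private Int32DecimalSketch() {}

    static BigDecimal decode(int unscaled, int precision, int scale) {
        if (precision < 1 || precision > MAX_INT32_PRECISION) {
            throw new IllegalArgumentException("INT32 decimal precision must be 1.." + MAX_INT32_PRECISION);
        }
        BigDecimal value = BigDecimal.valueOf(unscaled, scale);
        if (value.precision() > precision) {
            throw new ArithmeticException(value + " does not fit DECIMAL(" + precision + "," + scale + ")");
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(decode(12345, 5, 2)); // 123.45
        System.out.println(decode(-5, 3, 3));    // -0.005
    }
}
```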


Diffs
-

  data/files/parquet_int_decimal_1.parquet PRE-CREATION 
  data/files/parquet_int_decimal_2.parquet PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 350ae2d
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java 320ce52
  ql/src/test/queries/clientpositive/parquet_int_decimal.q PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_int_decimal.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/type_change_test_fraction.q.out 07cf8fa 


Diff: https://reviews.apache.org/r/71558/diff/1/


Testing
---

Added new q tests for the use-case.


Thanks,

Marta Kuczora



[jira] [Created] (HIVE-22271) Create index on the TBL_COL_PRIVS table for the columns COLUMN_NAME, PRINCIPAL_NAME, PRINCIPAL_TYPE and TBL_ID

2019-09-30 Thread Marta Kuczora (Jira)
Marta Kuczora created HIVE-22271:


 Summary: Create index on the TBL_COL_PRIVS table for the columns COLUMN_NAME, PRINCIPAL_NAME, PRINCIPAL_TYPE and TBL_ID
 Key: HIVE-22271
 URL: https://issues.apache.org/jira/browse/HIVE-22271
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Marta Kuczora


In one of the escalations for HDP-3.1.0 we found that the table privilege 
checks could be very slow, and that these checks could be sped up by defining an 
INDEX on the TBL_COL_PRIVS table for the following columns: 
COLUMN_NAME, PRINCIPAL_NAME, PRINCIPAL_TYPE, TBL_ID

In the MySQL slow query log, we found that the following query was executed 
slowly:
{noformat}
SELECT DISTINCT 'org.apache.hadoop.hive.metastore.model.MTableColumnPrivilege' AS `NUCLEUS_TYPE`,
  `A0`.`AUTHORIZER`, `A0`.`COLUMN_NAME`, `A0`.`CREATE_TIME`, `A0`.`GRANT_OPTION`, `A0`.`GRANTOR`,
  `A0`.`GRANTOR_TYPE`, `A0`.`PRINCIPAL_NAME`, `A0`.`PRINCIPAL_TYPE`, `A0`.`TBL_COL_PRIV`,
  `A0`.`TBL_COLUMN_GRANT_ID`
FROM `TBL_COL_PRIVS` `A0`
LEFT OUTER JOIN `TBLS` `B0` ON `A0`.`TBL_ID` = `B0`.`TBL_ID`
LEFT OUTER JOIN `DBS` `C0` ON `B0`.`DB_ID` = `C0`.`DB_ID`
WHERE `A0`.`PRINCIPAL_NAME` = 'xxx' AND `A0`.`PRINCIPAL_TYPE` = 'GROUP'
  AND `B0`.`TBL_NAME` = '' AND `C0`.`NAME` = 'xxx' AND `C0`.`CTLG_NAME` = 'xxx'
  AND `A0`.`COLUMN_NAME` = 'xxx'
{noformat}
When we checked the explain plan of this query, we saw that the index defined 
on the TBL_COL_PRIVS table was not used. The slow query uses the COLUMN_NAME, 
PRINCIPAL_NAME, PRINCIPAL_TYPE and TBL_ID columns, and after creating an index 
on only these columns, we saw a significant performance improvement.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 71271: HIVE-21580: Introduce ISO 8601 week numbering SQL:2016 formats

2019-08-21 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71271/#review217349
---


Fix it, then Ship it!




Thanks a lot for the patch. I have only one comment; otherwise it looks good.


common/src/java/org/apache/hadoop/hive/common/format/datetime/HiveSqlDateTimeFormatter.java
Lines 1025 (patched)
<https://reviews.apache.org/r/71271/#comment304639>

As we discussed, it would be enough to update the variable in an if 
statement when the temporalField is "IsoFields.WEEK_BASED_YEAR". In that case, 
there is no need for the updateVar method which is a bit confusing.
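For reference, java.time already exposes the ISO 8601 week fields that the IYYY and IW patterns correspond to. The sketch below (IsoWeekSketch is a hypothetical helper, not HiveSqlDateTimeFormatter code) shows why the week-based year needs its own handling: around January 1st it can differ from the calendar year.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoField;
import java.time.temporal.IsoFields;

// Illustrative sketch using java.time, not HiveSqlDateTimeFormatter itself.
final class IsoWeekSketch {
    private IsoWeekSketch() {}

    // IYYY: ISO week-based year, which may differ from the calendar year near January 1st.
    static int isoYear(LocalDate date) { return date.get(IsoFields.WEEK_BASED_YEAR); }

    // IW: ISO week of the week-based year (1..52 or 1..53).
    static int isoWeek(LocalDate date) { return date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR); }

    // Resolves IYYY + IW + day of week (1 = Monday ... 7 = Sunday) back to a date.
    static LocalDate fromIso(int isoYear, int isoWeek, int dayOfWeek) {
        return LocalDate.of(isoYear, 1, 4) // January 4th is always in ISO week 1
                .with(IsoFields.WEEK_OF_WEEK_BASED_YEAR, isoWeek)
                .with(ChronoField.DAY_OF_WEEK, dayOfWeek);
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2021, 1, 1);
        System.out.println(d + " -> IYYY=" + isoYear(d) + ", IW=" + isoWeek(d)); // 2021-01-01 -> IYYY=2020, IW=53
        System.out.println(fromIso(2020, 53, 5)); // 2021-01-01
    }
}
```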


- Marta Kuczora


On Aug. 12, 2019, 10:59 a.m., Karen Coppage wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71271/
> ---
> 
> (Updated Aug. 12, 2019, 10:59 a.m.)
> 
> 
> Review request for hive and Marta Kuczora.
> 
> 
> Bugs: HIVE-21580
> https://issues.apache.org/jira/browse/HIVE-21580
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Enable Hive to parse the following datetime formats when any 
> combination/subset of these or previously implemented patterns is provided in 
> one string. Also catch combinations that conflict.
> 
> IYYY
> IYY
> IY
> I
> IW
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/format/datetime/HiveSqlDateTimeFormatter.java 9443e8ec78
>   common/src/test/org/apache/hadoop/hive/common/format/datetime/TestHiveSqlDateTimeFormatter.java ff41534fce
> 
> 
> Diff: https://reviews.apache.org/r/71271/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Karen Coppage
> 
>



Re: Review Request 71016: HIVE-21578: Introduce SQL:2016 formats FM, FX, and nested strings

2019-07-26 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71016/#review216889
---


Ship it!




Ship It!

- Marta Kuczora


On July 26, 2019, 10:01 a.m., Karen Coppage wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71016/
> ---
> 
> (Updated July 26, 2019, 10:01 a.m.)
> 
> 
> Review request for hive and Marta Kuczora.
> 
> 
> Bugs: HIVE-21578
> https://issues.apache.org/jira/browse/HIVE-21578
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Enable Hive to parse the following datetime formats when any combination or 
> subset of these or previously implemented formats is provided in one string. 
> 
> "text" (nested strings)
> FM
> FX
> 
> 
> Diffs
> -
> 
>   
> common/src/java/org/apache/hadoop/hive/common/format/datetime/HiveSqlDateTimeFormatter.java
>  998e5a2f6a 
>   
> common/src/test/org/apache/hadoop/hive/common/format/datetime/TestHiveSqlDateTimeFormatter.java
>  ac57842148 
>   ql/src/test/queries/clientpositive/cast_datetime_with_sql_2016_format.q 
> 5a2a6d7894 
>   ql/src/test/results/clientpositive/cast_datetime_with_sql_2016_format.q.out 
> e1fd341050 
> 
> 
> Diff: https://reviews.apache.org/r/71016/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Karen Coppage
> 
>
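A rough sketch of how two of these additions could behave when formatting a date, under stated assumptions: double-quoted "text" is copied to the output literally, and FM suppresses zero padding for the token immediately after it. Only YYYY, MM and DD are supported, and FX is left out because it concerns parsing; the class is hypothetical and not taken from the patch:

```java
import java.time.LocalDate;

public final class FillModeSketch {

  /** Formats a date with a tiny, hypothetical subset: YYYY, MM, DD, FM and "quoted text". */
  static String format(LocalDate date, String pattern) {
    StringBuilder out = new StringBuilder();
    boolean fillMode = false; // assumption: FM affects only the next token
    int i = 0;
    while (i < pattern.length()) {
      char c = pattern.charAt(i);
      if (c == '"') {
        // Nested string: copied verbatim up to the closing double quote.
        int end = pattern.indexOf('"', i + 1);
        if (end < 0) {
          throw new IllegalArgumentException("Unterminated quoted text at index " + i);
        }
        out.append(pattern, i + 1, end);
        i = end + 1;
      } else if (at(pattern, i, "FM")) {
        fillMode = true;
        i += 2;
      } else if (at(pattern, i, "YYYY")) {
        out.append(number(date.getYear(), 4, fillMode));
        fillMode = false;
        i += 4;
      } else if (at(pattern, i, "MM")) {
        out.append(number(date.getMonthValue(), 2, fillMode));
        fillMode = false;
        i += 2;
      } else if (at(pattern, i, "DD")) {
        out.append(number(date.getDayOfMonth(), 2, fillMode));
        fillMode = false;
        i += 2;
      } else {
        out.append(c); // separators such as '-', '/', '.' or ' '
        i++;
      }
    }
    return out.toString();
  }

  /** Case-insensitive token match at position i. */
  private static boolean at(String pattern, int i, String token) {
    return pattern.regionMatches(true, i, token, 0, token.length());
  }

  /** Zero-pads to the given width unless fill mode suppresses the padding. */
  private static String number(int value, int width, boolean fillMode) {
    String digits = Integer.toString(value);
    StringBuilder sb = new StringBuilder();
    if (!fillMode) {
      for (int k = digits.length(); k < width; k++) {
        sb.append('0');
      }
    }
    return sb.append(digits).toString();
  }
}
```

For 2019-07-05, "DD.MM.YYYY" gives 05.07.2019, "FMDD/FMMM/YYYY" gives 5/7/2019, and DD "of" MM gives 05 of 07.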



Re: Review Request 71016: HIVE-21578: Introduce SQL:2016 formats FM, FX, and nested strings

2019-07-26 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71016/#review216885
---



Thanks a lot for the patch! I had two comments, but otherwise it looks good. 
Nice testing btw!! :)

- Marta Kuczora


On July 5, 2019, 7:51 a.m., Karen Coppage wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71016/
> ---
> 
> (Updated July 5, 2019, 7:51 a.m.)
> 
> 
> Review request for hive and Marta Kuczora.
> 
> 
> Bugs: HIVE-21578
> https://issues.apache.org/jira/browse/HIVE-21578
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Enable Hive to parse the following datetime formats when any combination or 
> subset of these or previously implemented formats is provided in one string. 
> 
> "text" (nested strings)
> FM
> FX
> 
> 
> Diffs
> -
> 
>   
> common/src/java/org/apache/hadoop/hive/common/format/datetime/HiveSqlDateTimeFormatter.java
>  998e5a2f6a 
>   
> common/src/test/org/apache/hadoop/hive/common/format/datetime/TestHiveSqlDateTimeFormatter.java
>  4e822d53f9 
>   ql/src/test/queries/clientpositive/cast_datetime_with_sql_2016_format.q 
> 5a2a6d7894 
>   ql/src/test/results/clientpositive/cast_datetime_with_sql_2016_format.q.out 
> e1fd341050 
> 
> 
> Diff: https://reviews.apache.org/r/71016/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Karen Coppage
> 
>



Re: Review Request 71016: HIVE-21578: Introduce SQL:2016 formats FM, FX, and nested strings

2019-07-26 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71016/#review216884
---




common/src/java/org/apache/hadoop/hive/common/format/datetime/HiveSqlDateTimeFormatter.java
Lines 546 (patched)
<https://reviews.apache.org/r/71016/#comment304127>

I think it would be better to split this method into two: one for checking 
only fm and one for checking only fx. Returning one boolean while setting another 
one in the background can be a bit confusing for the caller.
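One possible shape for the suggested split, with hypothetical names: each check answers a single question without side effects, and the caller updates its own state explicitly instead of relying on a flag set in the background:

```java
import java.util.List;

public final class ModifierChecksSketch {

  /** True only for the FM (fill mode) modifier; no side effects. */
  static boolean isFillMode(String token) {
    return "FM".equalsIgnoreCase(token);
  }

  /** True only for the FX (format exact) modifier; no side effects. */
  static boolean isFormatExact(String token) {
    return "FX".equalsIgnoreCase(token);
  }

  /** Hypothetical state owned by the caller. */
  static final class Flags {
    boolean formatExact;
    int fillModeCount;
  }

  /** The caller decides what each answer means and updates its state explicitly. */
  static Flags scan(List<String> tokens) {
    Flags flags = new Flags();
    for (String token : tokens) {
      if (isFormatExact(token)) {
        flags.formatExact = true;
      } else if (isFillMode(token)) {
        flags.fillModeCount++;
      }
    }
    return flags;
  }
}
```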



common/src/test/org/apache/hadoop/hive/common/format/datetime/TestHiveSqlDateTimeFormatter.java
Lines 252 (patched)
<https://reviews.apache.org/r/71016/#comment304128>

I think this test could be split up to have separate tests for fm, fx and 
fm-fx cases. It is just a nit, but I think it is a good idea to focus on one 
use-case per test.


- Marta Kuczora


On July 5, 2019, 7:51 a.m., Karen Coppage wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71016/
> ---
> 
> (Updated July 5, 2019, 7:51 a.m.)
> 
> 
> Review request for hive and Marta Kuczora.
> 
> 
> Bugs: HIVE-21578
> https://issues.apache.org/jira/browse/HIVE-21578
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Enable Hive to parse the following datetime formats when any combination or 
> subset of these or previously implemented formats is provided in one string. 
> 
> "text" (nested strings)
> FM
> FX
> 
> 
> Diffs
> -
> 
>   
> common/src/java/org/apache/hadoop/hive/common/format/datetime/HiveSqlDateTimeFormatter.java
>  998e5a2f6a 
>   
> common/src/test/org/apache/hadoop/hive/common/format/datetime/TestHiveSqlDateTimeFormatter.java
>  4e822d53f9 
>   ql/src/test/queries/clientpositive/cast_datetime_with_sql_2016_format.q 
> 5a2a6d7894 
>   ql/src/test/results/clientpositive/cast_datetime_with_sql_2016_format.q.out 
> e1fd341050 
> 
> 
> Diff: https://reviews.apache.org/r/71016/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Karen Coppage
> 
>



Re: Review Request 71011: HIVE-21957: Create temporary table like should omit transactional properties.

2019-07-25 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71011/#review216850
---


Ship it!




Ship It!

- Marta Kuczora


On July 4, 2019, 1:52 p.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71011/
> ---
> 
> (Updated July 4, 2019, 1:52 p.m.)
> 
> 
> Review request for hive, Marta Kuczora and Thejas Nair.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-21957: Create temporary table like should omit transactional properties.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> e09fc379f5e0127367e73ed4c4556522de9838a8 
> 
> 
> Diff: https://reviews.apache.org/r/71011/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>
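A minimal sketch of the behavior named in the title, assuming the transactional settings live in the table properties under Hive's usual ACID keys, "transactional" and "transactional_properties". The helper name and the case-insensitive matching are assumptions, and the actual fix in SemanticAnalyzer may look different:

```java
import java.util.HashMap;
import java.util.Map;

public final class TempTableLikeSketch {

  /** Copies the source table's properties for CREATE TEMPORARY TABLE ... LIKE, minus the ACID ones. */
  static Map<String, String> propertiesForTempTable(Map<String, String> sourceProps) {
    Map<String, String> props = new HashMap<>(sourceProps); // never mutate the source table's map
    props.keySet().removeIf(key ->
        key.equalsIgnoreCase("transactional")
            || key.equalsIgnoreCase("transactional_properties"));
    return props;
  }
}
```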



Re: Review Request 71011: HIVE-21957: Create temporary table like should omit transactional properties.

2019-07-25 Thread Marta Kuczora via Review Board


> On July 18, 2019, noon, Marta Kuczora wrote:
> > Thanks a lot for the patch!
> > Just one question: could you add a test for the fixed use case?
> 
> Laszlo Pinter wrote:
> I will add a test in a separate patch.

Ok, thanks!


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71011/#review216720
---


On July 4, 2019, 1:52 p.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71011/
> ---
> 
> (Updated July 4, 2019, 1:52 p.m.)
> 
> 
> Review request for hive, Marta Kuczora and Thejas Nair.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-21957: Create temporary table like should omit transactional properties.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> e09fc379f5e0127367e73ed4c4556522de9838a8 
> 
> 
> Diff: https://reviews.apache.org/r/71011/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Re: Review Request 71011: HIVE-21957: Create temporary table like should omit transactional properties.

2019-07-18 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71011/#review216720
---



Thanks a lot for the patch!
Just one question: could you add a test for the fixed use case?

- Marta Kuczora


On July 4, 2019, 1:52 p.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71011/
> ---
> 
> (Updated July 4, 2019, 1:52 p.m.)
> 
> 
> Review request for hive, Marta Kuczora and Thejas Nair.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-21957: Create temporary table like should omit transactional properties.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> e09fc379f5e0127367e73ed4c4556522de9838a8 
> 
> 
> Diff: https://reviews.apache.org/r/71011/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Re: Review Request 70920: HIVE-21868: Vectorize CAST...FORMAT

2019-07-09 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70920/#review216444
---


Ship it!




Ship It!

- Marta Kuczora


On July 4, 2019, 3:04 p.m., Karen Coppage wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70920/
> ---
> 
> (Updated July 4, 2019, 3:04 p.m.)
> 
> 
> Review request for hive and Marta Kuczora.
> 
> 
> Bugs: HIVE-21868
> https://issues.apache.org/jira/browse/HIVE-21868
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Vectorize UDFs for CAST (<date/timestamp> AS STRING/CHAR/VARCHAR FORMAT 
> <template>) and CAST (<string/char/varchar> AS TIMESTAMP/DATE FORMAT <template>).
> 
> 
> Diffs
> -
> 
>   
> common/src/java/org/apache/hadoop/hive/common/format/datetime/HiveSqlDateTimeFormatter.java
>  4e024a357b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 
> fa9d1e9783 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastDateToCharWithFormat.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastDateToString.java
>  dfa9f8a00d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastDateToStringWithFormat.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastDateToVarCharWithFormat.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastStringToDate.java
>  a6dff12e1a 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastStringToDateWithFormat.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastStringToTimestamp.java
>  b48b0136eb 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastStringToTimestampWithFormat.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastTimestampToCharWithFormat.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastTimestampToString.java
>  adc3a9d7b9 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastTimestampToStringWithFormat.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastTimestampToVarCharWithFormat.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCastFormat.java 
> 16742eee9b 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorMathFunctions.java
>  663237739e 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTypeCasts.java
>  58fd7b030e 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTypeCastsWithFormat.java
>  PRE-CREATION 
>   ql/src/test/queries/clientnegative/udf_cast_format_bad_pattern.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/cast_datetime_with_sql_2016_format.q 
> 269edf6da6 
>   ql/src/test/results/clientnegative/udf_cast_format_bad_pattern.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/cast_datetime_with_sql_2016_format.q.out 
> 4a502b9700 
> 
> 
> Diff: https://reviews.apache.org/r/70920/diff/5/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Karen Coppage
> 
>
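For readers new to vectorization, a simplified, hypothetical analogue of what a vectorized CAST(date AS STRING FORMAT ...) expression does: build the formatter once, then loop over a whole batch instead of evaluating row by row. java.time's DateTimeFormatter stands in for the SQL:2016 formatter here, and the plain-array batch is an assumption:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public final class VectorizedCastSketch {

  /**
   * Formats a batch of dates stored as epoch days. Hive's real column vectors also
   * carry noNulls/isRepeating flags, and the row batch a selected array; those
   * optimizations are left out of this sketch.
   */
  static void castDateToString(long[] epochDays, boolean[] isNull, int size,
                               DateTimeFormatter formatter, String[] output) {
    for (int i = 0; i < size; i++) {
      // A NULL input row stays NULL; otherwise the shared formatter is reused.
      output[i] = isNull[i] ? null : formatter.format(LocalDate.ofEpochDay(epochDays[i]));
    }
  }
}
```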



Re: Review Request 70963: HIVE-21874: Implement add partitions related methods on temporary table

2019-07-03 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70963/#review216337
---


Ship it!




Ship It!

- Marta Kuczora


On July 1, 2019, 9:20 a.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70963/
> ---
> 
> (Updated July 1, 2019, 9:20 a.m.)
> 
> 
> Review request for hive, Marta Kuczora, Peter Vary, and Adam Szita.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-21874: Implement add partitions related methods on temporary table
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
>  957ebb12725e9deac7e7644709521a998df4dbb4 
>   
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestSessionHiveMetastoreClientAddPartitionsFromSpecTempTable.java
>  PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestSessionHiveMetastoreClientAddPartitionsTempTable.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java
>  a15f5ea0453c7459217d229fa373cc1fec2f4d7a 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java
>  25643495b53e1ede473c48a90b208b43070ee6aa 
> 
> 
> Diff: https://reviews.apache.org/r/70963/diff/2/
> 
> 
> Testing
> ---
> 
> Unit testing is done via 
> TestSessionHiveMetastoreClientAddPartitionsTempTable, 
> TestSessionHiveMetastoreClientAddPartitionsFromSpecTempTable.
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Re: Review Request 70963: HIVE-21874: Implement add partitions related methods on temporary table

2019-07-03 Thread Marta Kuczora via Review Board


> On June 28, 2019, 2:42 p.m., Marta Kuczora wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
> > Line 1046 (original), 1049-1050 (patched)
> > <https://reviews.apache.org/r/70963/diff/1/?file=2152472#file2152472line1049>
> >
> > Why do you need to make the DB and table names lower case?
> 
> Laszlo Pinter wrote:
> Partition properties such as the table and DB names must be stored in lower case. 
> This is the same in HiveMetaStore as well. 
> Other properties are case sensitive.

Ah, I see, thanks for the explanation.
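The rule from the explanation above, as a small sketch with a hypothetical class: database and table names are normalized to lower case, while partition values stay case sensitive and are kept as given:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public final class PartitionRefSketch {
  final String dbName;
  final String tableName;
  final List<String> values;

  PartitionRefSketch(String dbName, String tableName, List<String> values) {
    // Identifiers are stored lower case; Locale.ROOT avoids locale surprises such as the Turkish dotless i.
    this.dbName = dbName.toLowerCase(Locale.ROOT);
    this.tableName = tableName.toLowerCase(Locale.ROOT);
    // Partition values are case sensitive, so they are copied verbatim.
    this.values = Collections.unmodifiableList(new ArrayList<>(values));
  }
}
```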


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70963/#review216227
---


On July 1, 2019, 9:20 a.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70963/
> ---
> 
> (Updated July 1, 2019, 9:20 a.m.)
> 
> 
> Review request for hive, Marta Kuczora, Peter Vary, and Adam Szita.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-21874: Implement add partitions related methods on temporary table
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
>  957ebb12725e9deac7e7644709521a998df4dbb4 
>   
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestSessionHiveMetastoreClientAddPartitionsFromSpecTempTable.java
>  PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestSessionHiveMetastoreClientAddPartitionsTempTable.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java
>  a15f5ea0453c7459217d229fa373cc1fec2f4d7a 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java
>  25643495b53e1ede473c48a90b208b43070ee6aa 
> 
> 
> Diff: https://reviews.apache.org/r/70963/diff/2/
> 
> 
> Testing
> ---
> 
> Unit testing is done via 
> TestSessionHiveMetastoreClientAddPartitionsTempTable, 
> TestSessionHiveMetastoreClientAddPartitionsFromSpecTempTable.
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Re: Review Request 70920: HIVE-21868: Vectorize CAST...FORMAT

2019-07-03 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70920/#review216334
---



Thanks a lot Karen for the patch!
I have some questions, but otherwise the change looks good to me.


common/src/java/org/apache/hadoop/hive/common/format/datetime/HiveSqlDateTimeFormatter.java
Line 223 (original), 224 (patched)
<https://reviews.apache.org/r/70920/#comment303500>

Why did you change the type of this variable to ArrayList from List?



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastDateToString.java
Lines 59 (patched)
<https://reviews.apache.org/r/70920/#comment303501>

Do the CastDateToString, CastDateToChar and CastDateToVarchar UDFs use this 
method, or is this just a typo and only the CastDateToStringWithFormat, ... UDFs 
use it?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCastFormat.java
Line 200 (original), 202 (patched)
<https://reviews.apache.org/r/70920/#comment303502>

Can the formattedOutput variable ever be null after this change? 
If there is a scenario where it can be null, it will cause problems when trying 
to cast it.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCastFormat.java
Line 217 (original), 220 (patched)
<https://reviews.apache.org/r/70920/#comment303503>

The same question about being null (previous comment) applies to the t and 
d variables as well.
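To illustrate the concern, a hypothetical null-safe variant: the parse step returns null for input that does not match the pattern, and every conversion checks for null before casting or converting. The names (tryParse, toVarchar, toEpochDay) are illustrative stand-ins for formattedOutput, t and d, not the code under review:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public final class NullSafeCastSketch {

  /** Returns null instead of throwing when the input is null or does not match. */
  static LocalDate tryParse(String input, DateTimeFormatter formatter) {
    if (input == null) {
      return null;
    }
    try {
      return LocalDate.parse(input, formatter);
    } catch (DateTimeParseException e) {
      return null;
    }
  }

  /** VARCHAR-like truncation; the null guard comes before any conversion. */
  static String toVarchar(String formattedOutput, int maxLength) {
    if (formattedOutput == null) {
      return null;
    }
    return formattedOutput.length() <= maxLength
        ? formattedOutput
        : formattedOutput.substring(0, maxLength);
  }

  /** Converts a parsed date, propagating null rather than dereferencing it. */
  static Long toEpochDay(LocalDate d) {
    return d == null ? null : d.toEpochDay();
  }
}
```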


- Marta Kuczora


On June 26, 2019, 8:44 a.m., Karen Coppage wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70920/
> ---
> 
> (Updated June 26, 2019, 8:44 a.m.)
> 
> 
> Review request for hive and Marta Kuczora.
> 
> 
> Bugs: HIVE-21868
> https://issues.apache.org/jira/browse/HIVE-21868
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Vectorize UDFs for CAST (<date/timestamp> AS STRING/CHAR/VARCHAR FORMAT 
> <template>) and CAST (<string/char/varchar> AS TIMESTAMP/DATE FORMAT <template>).
> 
> 
> Diffs
> -
> 
>   
> common/src/java/org/apache/hadoop/hive/common/format/datetime/HiveSqlDateTimeFormatter.java
>  4e024a357b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 
> fa9d1e9783 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastDateToCharWithFormat.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastDateToString.java
>  dfa9f8a00d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastDateToStringWithFormat.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastDateToVarCharWithFormat.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastStringToDate.java
>  a6dff12e1a 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastStringToDateWithFormat.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastStringToTimestamp.java
>  b48b0136eb 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastStringToTimestampWithFormat.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastTimestampToCharWithFormat.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastTimestampToString.java
>  adc3a9d7b9 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastTimestampToStringWithFormat.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CastTimestampToVarCharWithFormat.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCastFormat.java 
> 16742eee9b 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorMathFunctions.java
>  663237739e 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTypeCasts.java
>  58fd7b030e 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTypeCastsWithFormat.java
>  PRE-CREATION 
>   ql/src/test/queries/clientnegative/udf_cast_format_bad_pattern.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/cast_datetime_with_sql_2016_format.q 
> 269edf6da6 
>   ql/src/test/results/clientnegative/udf_cast_format_bad_pattern.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/cast_datetime_with_sql_2016_format.q.out 
> 4a502b9700 
> 
> 
> Diff: https://reviews.apache.org/r/70920/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Karen Coppage
> 
>



Re: Review Request 70963: HIVE-21874: Implement add partitions related methods on temporary table

2019-06-28 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70963/#review216227
---




ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
Line 1046 (original), 1049-1050 (patched)
<https://reviews.apache.org/r/70963/#comment303353>

Why do you need to make the DB and table names lower case?



ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
Lines 1100 (patched)
<https://reviews.apache.org/r/70963/#comment303354>

Why do you need to get the newly added partition from the "parts" list, when 
the addPartition method already returns the newly added Partition?
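A sketch of the alternative the question implies, with a hypothetical in-memory store: since addPartition already returns the stored partition, the caller can collect that return value directly instead of looking it up in the "parts" list afterwards:

```java
import java.util.ArrayList;
import java.util.List;

public final class AddPartitionSketch {

  /** Hypothetical partition: just its values. */
  static final class Partition {
    final List<String> values;

    Partition(List<String> values) {
      this.values = new ArrayList<>(values);
    }
  }

  private final List<Partition> parts = new ArrayList<>();

  /** Stores a copy of the partition and returns the stored instance. */
  Partition addPartition(Partition partition) {
    Partition stored = new Partition(partition.values);
    parts.add(stored);
    return stored;
  }

  /** Collects each return value directly instead of reading it back from parts. */
  List<Partition> addPartitions(List<Partition> newParts) {
    List<Partition> added = new ArrayList<>();
    for (Partition p : newParts) {
      added.add(addPartition(p));
    }
    return added;
  }

  int size() {
    return parts.size();
  }
}
```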


- Marta Kuczora


On June 27, 2019, 9:07 a.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70963/
> ---
> 
> (Updated June 27, 2019, 9:07 a.m.)
> 
> 
> Review request for hive, Marta Kuczora, Peter Vary, and Adam Szita.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-21874: Implement add partitions related methods on temporary table
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
>  957ebb12725e9deac7e7644709521a998df4dbb4 
>   
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestSessionHiveMetastoreClientAddPartitionsFromSpecTempTable.java
>  PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestSessionHiveMetastoreClientAddPartitionsTempTable.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java
>  a15f5ea0453c7459217d229fa373cc1fec2f4d7a 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java
>  25643495b53e1ede473c48a90b208b43070ee6aa 
> 
> 
> Diff: https://reviews.apache.org/r/70963/diff/1/
> 
> 
> Testing
> ---
> 
> Unit testing is done via 
> TestSessionHiveMetastoreClientAddPartitionsTempTable, 
> TestSessionHiveMetastoreClientAddPartitionsFromSpecTempTable.
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Re: Review Request 70963: HIVE-21874: Implement add partitions related methods on temporary table

2019-06-28 Thread Marta Kuczora via Review Board


> On June 28, 2019, 2:42 p.m., Marta Kuczora wrote:
> >

Thanks a lot for the patch!


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70963/#review216227
---


On June 27, 2019, 9:07 a.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70963/
> ---
> 
> (Updated June 27, 2019, 9:07 a.m.)
> 
> 
> Review request for hive, Marta Kuczora, Peter Vary, and Adam Szita.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-21874: Implement add partitions related methods on temporary table
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
>  957ebb12725e9deac7e7644709521a998df4dbb4 
>   
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestSessionHiveMetastoreClientAddPartitionsFromSpecTempTable.java
>  PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestSessionHiveMetastoreClientAddPartitionsTempTable.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java
>  a15f5ea0453c7459217d229fa373cc1fec2f4d7a 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java
>  25643495b53e1ede473c48a90b208b43070ee6aa 
> 
> 
> Diff: https://reviews.apache.org/r/70963/diff/1/
> 
> 
> Testing
> ---
> 
> Unit testing is done via 
> TestSessionHiveMetastoreClientAddPartitionsTempTable, 
> TestSessionHiveMetastoreClientAddPartitionsFromSpecTempTable.
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Re: Review Request 70934: HIVE-18735: Create table like loses transactional attribute.

2019-06-26 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70934/#review216150
---


Ship it!




Thanks a lot for the patch. It looks good to me.

- Marta Kuczora


On June 25, 2019, 12:47 p.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70934/
> ---
> 
> (Updated June 25, 2019, 12:47 p.m.)
> 
> 
> Review request for hive, Eugene Koifman, Marta Kuczora, Peter Vary, and Adam 
> Szita.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-18735: Create table like loses transactional attribute.
> 
> 
> Diffs
> -
> 
>   hbase-handler/src/test/results/positive/hbase_queries.q.out 
> 0c21d6d74882788d5748639ea2675579893791af 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> d395db1b59d021789b1bb47c7f09ff337cba2dd0 
>   ql/src/test/results/clientpositive/alter_rename_table.q.out 
> dd656954a1877f7f808de81f6952d7cf8ebfda2f 
>   ql/src/test/results/clientpositive/alter_table_stats_status.q.out 
> efa2834e0d6dbd77181473c214b77d09fcc1fe69 
>   ql/src/test/results/clientpositive/autoColumnStats_1.q.out 
> 1f594ddb6816805d22a1152c261dda75490cd5d0 
>   ql/src/test/results/clientpositive/autoColumnStats_2.q.out 
> 121a10384bca03942c297dd0488aceaf0d3bed68 
>   ql/src/test/results/clientpositive/autoColumnStats_3.q.out 
> 777d165dc26fb11a6fd863fe1f375c6ae3d55b2a 
>   ql/src/test/results/clientpositive/autoColumnStats_8.q.out 
> 0e1868bd52d717a6103f1456a1d4e525e85d8622 
>   ql/src/test/results/clientpositive/create_alter_list_bucketing_table1.q.out 
> 593ae8389971449ad0f8704d911f6f7c6bcc 
>   ql/src/test/results/clientpositive/create_like.q.out 
> f4a5ed55a568b0160a6c87cb2fe8c7cd9b20c7c8 
>   ql/src/test/results/clientpositive/create_like2.q.out 
> 7152f52fcf82d5052a67be6e27bda532f2b521bd 
>   ql/src/test/results/clientpositive/create_like_tbl_props.q.out 
> 4d11fc3c9e39c18dd18fdb585ad1831a0a068768 
>   ql/src/test/results/clientpositive/create_table_like_stats.q.out 
> 4aa1b4f167a99ffc97d97bb62e0f5313fd83314e 
>   ql/src/test/results/clientpositive/describe_table.q.out 
> 8c7a16c4b65d3f3951e6c230c42325056a7eab0b 
>   ql/src/test/results/clientpositive/erasurecoding/erasure_simple.q.out 
> 3ceb3d03c2614f3256a822c9f105ed6e9f2bada8 
>   ql/src/test/results/clientpositive/explain_ddl.q.out 
> c53ffae8003bdcc320d4910f021c821c0777bdeb 
>   ql/src/test/results/clientpositive/llap/autoColumnStats_1.q.out 
> 7272a9c925a4115ee3f1d3a4e6576057d75ac994 
>   ql/src/test/results/clientpositive/llap/autoColumnStats_2.q.out 
> 1a4b164b0925860543dd74215e0820fe84c5f3f1 
>   
> ql/src/test/results/clientpositive/llap/insert_values_orig_table_use_metadata.q.out
>  6c892cc5b87960b086d90c43516526056bdf221f 
>   ql/src/test/results/clientpositive/llap/stats_noscan_1.q.out 
> af55d23484ddb74a2c5b7f06c4e91a6063ae11dc 
>   ql/src/test/results/clientpositive/llap/whroot_external1.q.out 
> cac158c92669f1ad532ada3d6620adebeb909eae 
>   ql/src/test/results/clientpositive/load_dyn_part8.q.out 
> 7b1b5c1f862a581af3b2c4cabe21b6d186601652 
>   ql/src/test/results/clientpositive/merge3.q.out 
> 4e670558808894b0dd5f7b8815987e03de1dc6d3 
>   ql/src/test/results/clientpositive/mm_default.q.out 
> 70519b7da8346ddc2de74e46010183d2c9ab11ee 
>   ql/src/test/results/clientpositive/partition_discovery.q.out 
> cddb6e56ba8db9162c491125e3efd3acd2ed29b2 
>   ql/src/test/results/clientpositive/spark/load_dyn_part8.q.out 
> aebf4382cd78b02d9b7bab7285254431f04e29c0 
>   ql/src/test/results/clientpositive/spark/stats12.q.out 
> 9db43ef112d0898c08429c839e196b3e48067383 
>   ql/src/test/results/clientpositive/spark/stats13.q.out 
> 4922d717a0074146d6da91aae859f09aa5a2b623 
>   ql/src/test/results/clientpositive/spark/stats14.q.out 
> eb8a995e298d77098c5d7a01086943dc08307c19 
>   ql/src/test/results/clientpositive/spark/stats15.q.out 
> 3874e6de249428404946f461eec3575d6dcb50a5 
>   ql/src/test/results/clientpositive/spark/stats2.q.out 
> 30339caeb2cff5cc96101d8cbf5f3ed8b5b01667 
>   ql/src/test/results/clientpositive/spark/stats6.q.out 
> 77be16cb13558e6b2af2e772ff0505ea4dba8125 
>   ql/src/test/results/clientpositive/spark/stats7.q.out 
> fe942ad94b35288b0cc74d1434429378835ce9c2 
>   ql/src/test/results/clientpositive/spark/stats8.q.out 
> edfbd57f72b55d040233328330562703286627d3 
>   ql/src/test/results/clientpositive/spark/stats9.q.out 
> ed226b68d21733c7d371472d99f714b759e380e2 
>   ql/src/test/results/cl




Re: Review Request 70867: HIVE-21814: Implement list partitions related methods on temporary tables

2019-06-20 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70867/#review215995
---


Ship it!




Ship It!

- Marta Kuczora


On June 20, 2019, 9:53 a.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70867/
> ---
> 
> (Updated June 20, 2019, 9:53 a.m.)
> 
> 
> Review request for hive, Marta Kuczora, Peter Vary, and Adam Szita.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-21814: Implement list partitions related methods on temporary tables
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
>  b71ef5a725d610cda402717f501f6c6a0f653216 
>   
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestSessionHiveMetastoreClientListPartitionsTempTable.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/ConditionalIgnoreOnSessionHiveMetastoreClient.java
>  99039b08d014cddc9de12e70801267eba7331266 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestListPartitions.java
>  34ceb34de646cc2e501564e9b3a0cb8cc8a034e1 
> 
> 
> Diff: https://reviews.apache.org/r/70867/diff/2/
> 
> 
> Testing
> ---
> 
> Unit testing is done via 
> TestSessionHiveMetastoreClientListPartitionsTempTable.
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>
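For context, a hypothetical in-memory sketch of listing partitions for a session-local temporary table by partial partition values. Two semantics are assumptions made for illustration only: an empty string in the partial values matches any value, and a negative maxParts means no limit:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class TempTablePartitionsSketch {
  // Partition name (e.g. "ds=2019-06-20/hr=10") -> values, kept in insertion order.
  private final Map<String, List<String>> partitions = new LinkedHashMap<>();

  void add(String name, List<String> values) {
    partitions.put(name, new ArrayList<>(values));
  }

  /** Lists names of partitions whose leading values match partialValues. */
  List<String> listPartitionNames(List<String> partialValues, int maxParts) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, List<String>> e : partitions.entrySet()) {
      if (maxParts >= 0 && result.size() >= maxParts) {
        break;
      }
      if (matches(e.getValue(), partialValues)) {
        result.add(e.getKey());
      }
    }
    return result;
  }

  private static boolean matches(List<String> values, List<String> partial) {
    if (partial.size() > values.size()) {
      return false;
    }
    for (int i = 0; i < partial.size(); i++) {
      String p = partial.get(i);
      if (!p.isEmpty() && !p.equals(values.get(i))) {
        return false;
      }
    }
    return true;
  }
}
```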



Re: Review Request 70841: HIVE-21576: Introduce CAST...FORMAT and limited list of SQL:2016 datetime formats

2019-06-19 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70841/#review215966
---


Ship it!




Ship It!

- Marta Kuczora


On June 14, 2019, 8:30 a.m., Karen Coppage wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70841/
> ---
> 
> (Updated June 14, 2019, 8:30 a.m.)
> 
> 
> Review request for hive, Gabor Kaszab and Marta Kuczora.
> 
> 
> Bugs: HIVE-21576
> https://issues.apache.org/jira/browse/HIVE-21576
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Timestamp and date handling and formatting are currently implemented in Hive 
> using (sometimes very specific) Java SimpleDateFormat patterns with both 
> SimpleDateFormat and java.time.format.DateTimeFormatter; however, these patterns 
> are not what most standard SQL systems use. For examples, see Vertica, Netezza, 
> Oracle, and PostgreSQL.
> 
> **Cast...Format**
> 
> SQL:2016 introduced the FORMAT clause for CAST, which is the standard way to 
> do string <-> datetime conversions.
> 
> For example:
> 
> CAST(<datetime> AS <char string type> [FORMAT <template>])
> CAST(<char string> AS <datetime type> [FORMAT <template>])
> cast(dt as string format 'DD-MM-YYYY')
> cast('01-05-2017' as date format 'DD-MM-YYYY')
> Stuff like this wouldn't need to happen.
> 
> **New SQL:2016 Patterns**
> 
> Some conflicting examples:
> 
> SimpleDateFormat: 'MMM dd, yyyy HH:mm:ss'
> SQL:2016: 'mon dd, yyyy hh24:mi:ss'
> 
> SimpleDateFormat: 'yyyy-MM-dd HH:mm:ss'
> SQL:2016: 'yyyy-mm-dd hh24:mi:ss'
> 
> For the full list of patterns, see subsection "Proposal for Impala’s datetime 
> patterns" in this doc: 
> https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/edit
> 
> **Continued usage of SimpleDateFormat patterns**
> 
> [Update] This feature will NOT be behind a flag in order to keep things 
> simple for users. Existing Hive functions that accept SimpleDateFormat 
> patterns as input will continue to do so. Please let me know if you disagree 
> with this decision. These are the functions (afaik) affected:
> 
> from_unixtime(bigint unixtime[, string format])
> unix_timestamp(string date, string pattern)
> to_unix_timestamp(date[, pattern])
> add_months(string start_date, int num_months, output_date_format)
> date_format(date/timestamp/string ts, string fmt)
> This description is a heavily edited description of IMPALA-4018.
> 
> 
> Diffs
> -
> 
>   
> common/src/java/org/apache/hadoop/hive/common/format/datetime/HiveSqlDateTimeFormatter.java
>  PRE-CREATION 
>   
> common/src/java/org/apache/hadoop/hive/common/format/datetime/package-info.java
>  PRE-CREATION 
>   
> common/src/test/org/apache/hadoop/hive/common/format/datetime/TestHiveSqlDateTimeFormatter.java
>  PRE-CREATION 
>   
> common/src/test/org/apache/hadoop/hive/common/format/datetime/package-info.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d08b05fb68 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 58fe0cd32e 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCastFormat.java 
> PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFCastFormat.java
>  PRE-CREATION 
>   ql/src/test/queries/clientpositive/cast_datetime_with_sql_2016_format.q 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/cast_datetime_with_sql_2016_format.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/show_functions.q.out 374e9c4fce 
> 
> 
> Diff: https://reviews.apache.org/r/70841/diff/8/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Karen Coppage
> 
>
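To illustrate why the two pattern languages conflict, here is a minimal, hypothetical Java sketch. It is not the HiveSqlDateTimeFormatter from this patch: it translates only a handful of SQL:2016 tokens (yyyy, mm, dd, hh24, mi, ss) into a java.time DateTimeFormatter pattern. Note that mm means month in SQL:2016 but minutes in SimpleDateFormat. Treating templates as case-insensitive, the token table and the class name are assumptions made for illustration only.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class Sql2016PatternSketch {
    // Hypothetical, deliberately tiny token table (SQL:2016 token -> java.time pattern).
    // Entries are tried in insertion order; put longer tokens first when extending it.
    private static final Map<String, String> TOKENS = new LinkedHashMap<>();

    static {
        TOKENS.put("yyyy", "uuuu"); // year
        TOKENS.put("hh24", "HH");   // hour of day, 0-23
        TOKENS.put("mm", "MM");     // month (in SimpleDateFormat "mm" would be minutes)
        TOKENS.put("dd", "dd");     // day of month
        TOKENS.put("mi", "mm");     // minutes
        TOKENS.put("ss", "ss");     // seconds
    }

    /**
     * Translates a template into a java.time pattern. The template is treated as
     * case-insensitive; only plain separators such as '-', ':', ',' and ' ' are
     * expected between tokens. Any other letter is rejected.
     */
    static String toJavaPattern(String template) {
        String t = template.toLowerCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        int i = 0;
        scan:
        while (i < t.length()) {
            for (Map.Entry<String, String> token : TOKENS.entrySet()) {
                if (t.startsWith(token.getKey(), i)) {
                    out.append(token.getValue());
                    i += token.getKey().length();
                    continue scan;
                }
            }
            char c = t.charAt(i);
            if (Character.isLetter(c)) {
                throw new IllegalArgumentException(
                        "Unsupported token at: " + template.substring(i));
            }
            out.append(c); // separator, copied unchanged
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String tsPattern = toJavaPattern("yyyy-mm-dd hh24:mi:ss");
        System.out.println(tsPattern); // uuuu-MM-dd HH:mm:ss
        LocalDateTime ts = LocalDateTime.parse("2017-05-01 13:45:30",
                DateTimeFormatter.ofPattern(tsPattern));
        System.out.println(ts); // 2017-05-01T13:45:30

        LocalDate date = LocalDate.parse("01-05-2017",
                DateTimeFormatter.ofPattern(toJavaPattern("DD-MM-YYYY")));
        System.out.println(date); // 2017-05-01
    }
}
```

Tokens outside the table (for example mon, or a bare hh) raise an IllegalArgumentException instead of being passed through silently, which avoids accidentally reinterpreting them with SimpleDateFormat-style meanings.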



Re: Review Request 70841: HIVE-21576: Introduce CAST...FORMAT and limited list of SQL:2016 datetime formats

2019-06-19 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70841/#review215964
---



Thanks a lot for the patch.
I have some minor hints/questions, but otherwise the change looks good to me.


common/src/java/org/apache/hadoop/hive/common/format/datetime/HiveSqlDateTimeFormatter.java
Lines 631-651 (patched)
<https://reviews.apache.org/r/70841/#comment302889>

Is this method still needed?



common/src/java/org/apache/hadoop/hive/common/format/datetime/HiveSqlDateTimeFormatter.java
Lines 756 (patched)
<https://reviews.apache.org/r/70841/#comment302888>

Does this issue still exist?



common/src/test/org/apache/hadoop/hive/common/format/datetime/TestHiveSqlDateTimeFormatter.java
Lines 235 (patched)
<https://reviews.apache.org/r/70841/#comment302890>

You could add a message to the assertEquals to make it easier to identify 
which test case is failing.


- Marta Kuczora


On June 14, 2019, 8:30 a.m., Karen Coppage wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70841/
> ---
> 
> (Updated June 14, 2019, 8:30 a.m.)
> 
> 
> Review request for hive, Gabor Kaszab and Marta Kuczora.
> 
> 
> Bugs: HIVE-21576
> https://issues.apache.org/jira/browse/HIVE-21576
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Timestamp and date handling and formatting are currently implemented in Hive 
> using (sometimes very specific) Java SimpleDateFormat patterns, with both 
> SimpleDateFormat and java.time.DateTimeFormatter. However, these patterns are 
> not what most standard SQL systems use; see, for example, Vertica, Netezza, 
> Oracle, and PostgreSQL.
> 
> **Cast...Format**
> 
> SQL:2016 introduced the FORMAT clause for CAST, which is the standard way to 
> do string <-> datetime conversions.
> 
> For example:
> 
> CAST(<datetime> AS <char string type> [FORMAT <template>])
> CAST(<char string> AS <datetime type> [FORMAT <template>])
> cast(dt as string format 'DD-MM-YYYY')
> cast('01-05-2017' as date format 'DD-MM-YYYY')
> Stuff like this wouldn't need to happen.
> 
> **New SQL:2016 Patterns**
> 
> Some conflicting examples:
> 
> SimpleDateFormat: 'MMM dd, yyyy HH:mm:ss'
> SQL:2016: 'mon dd, yyyy hh24:mi:ss'
> 
> SimpleDateFormat: 'yyyy-MM-dd HH:mm:ss'
> SQL:2016: 'yyyy-mm-dd hh24:mi:ss'
> 
> For the full list of patterns, see subsection "Proposal for Impala’s datetime 
> patterns" in this doc: 
> https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/edit
> 
> **Continued usage of SimpleDateFormat patterns**
> 
> [Update] This feature will NOT be behind a flag in order to keep things 
> simple for users. Existing Hive functions that accept SimpleDateFormat 
> patterns as input will continue to do so. Please let me know if you disagree 
> with this decision. These are the functions (afaik) affected:
> 
> from_unixtime(bigint unixtime[, string format])
> unix_timestamp(string date, string pattern)
> to_unix_timestamp(date[, pattern])
> add_months(string start_date, int num_months, output_date_format)
> date_format(date/timestamp/string ts, string fmt)
> This description is a heavily edited description of IMPALA-4018.
> 
> 
> Diffs
> -
> 
>   
> common/src/java/org/apache/hadoop/hive/common/format/datetime/HiveSqlDateTimeFormatter.java
>  PRE-CREATION 
>   
> common/src/java/org/apache/hadoop/hive/common/format/datetime/package-info.java
>  PRE-CREATION 
>   
> common/src/test/org/apache/hadoop/hive/common/format/datetime/TestHiveSqlDateTimeFormatter.java
>  PRE-CREATION 
>   
> common/src/test/org/apache/hadoop/hive/common/format/datetime/package-info.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d08b05fb68 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 58fe0cd32e 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCastFormat.java 
> PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFCastFormat.java
>  PRE-CREATION 
>   ql/src/test/queries/clientpositive/cast_datetime_with_sql_2016_format.q 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/cast_datetime_with_sql_2016_format.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/show_functions.q.out 374e9c4fce 
> 
> 
> Diff: https://reviews.apache.org/r/70841/diff/8/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Karen Coppage
> 
>



Re: Review Request 70867: HIVE-21814: Implement list partitions related methods on temporary tables

2019-06-19 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70867/#review215963
---


Ship it!




Thanks for the patch.
I have two minor hints; otherwise the change looks good to me. Please just 
consider them before the next batch of partitioned temp table changes.

- Marta Kuczora


On June 17, 2019, 3:36 p.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70867/
> ---
> 
> (Updated June 17, 2019, 3:36 p.m.)
> 
> 
> Review request for hive, Marta Kuczora, Peter Vary, and Adam Szita.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-21814: Implement list partitions related methods on temporary tables
> 
> This change is the next step to support partitions on temporary tables. 
> HIVE-18739 and HIVE-20661 added partial support for partition columns on 
> temporary tables, but it was not complete and it was available only for 
> internal usage. This change addresses the missing functionality related to 
> listing partitions from temporary tables, although it still remains unexposed 
> until all the partition related functionalities (get, list, add, alter etc.) 
> are implemented.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
>  b71ef5a725d610cda402717f501f6c6a0f653216 
>   
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestSessionHiveMetastoreClientListPartitionsTempTable.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/ConditionalIgnoreOnSessionHiveMetastoreClient.java
>  99039b08d014cddc9de12e70801267eba7331266 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestListPartitions.java
>  34ceb34de646cc2e501564e9b3a0cb8cc8a034e1 
> 
> 
> Diff: https://reviews.apache.org/r/70867/diff/1/
> 
> 
> Testing
> ---
> 
> Unit testing is done via 
> TestSessionHiveMetastoreClientListPartitionsTempTable.
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Re: Review Request 70867: HIVE-21814: Implement list partitions related methods on temporary tables

2019-06-19 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70867/#review215960
---




ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
Lines 1250-1255 (patched)
<https://reviews.apache.org/r/70867/#comment302886>

This code snippet is used in multiple methods, so it might make sense to 
extract it into a separate method.
But since you have more patches coming for the temp table partition 
handling, it is fine if you address this in a later patch.



ql/src/test/org/apache/hadoop/hive/ql/metadata/TestSessionHiveMetastoreClientListPartitionsTempTable.java
Lines 133 (patched)
<https://reviews.apache.org/r/70867/#comment302887>

Would it make sense to add a test with a low max parts number to check that the 
method returns the correct number of partitions?
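As a sketch of the suggested test: the metastore client's listPartitions calls take a max parts argument, and it is assumed here that a negative value (commonly -1) means "no limit". The hypothetical helper below applies that rule to an in-memory list; it is not the SessionHiveMetaStoreClient code. The checks exercise a low limit, a limit above the partition count, and -1, and each failure message names the failing case.

```java
import java.util.ArrayList;
import java.util.List;

public class MaxPartsSketch {
    /**
     * Hypothetical helper: returns at most maxParts partitions, keeping their order.
     * A negative maxParts is assumed to mean "return everything".
     */
    static <T> List<T> applyMaxParts(List<T> partitions, short maxParts) {
        if (maxParts < 0 || maxParts >= partitions.size()) {
            return new ArrayList<>(partitions);
        }
        return new ArrayList<>(partitions.subList(0, maxParts));
    }

    // Stand-in for an assertion with a message identifying the failing case.
    private static void check(boolean condition, String testCase) {
        if (!condition) {
            throw new AssertionError("Failed case: " + testCase);
        }
    }

    public static void main(String[] args) {
        List<String> parts = List.of("p=1", "p=2", "p=3", "p=4");
        // A low limit should truncate the result and keep the order.
        check(applyMaxParts(parts, (short) 2).equals(List.of("p=1", "p=2")), "low limit");
        // A limit above the partition count should return every partition.
        check(applyMaxParts(parts, (short) 10).size() == 4, "high limit");
        // -1 (assumed "no limit") should return every partition.
        check(applyMaxParts(parts, (short) -1).size() == 4, "no limit");
        System.out.println("all max parts checks passed");
    }
}
```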


- Marta Kuczora


On June 17, 2019, 3:36 p.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70867/
> ---
> 
> (Updated June 17, 2019, 3:36 p.m.)
> 
> 
> Review request for hive, Marta Kuczora, Peter Vary, and Adam Szita.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-21814: Implement list partitions related methods on temporary tables
> 
> This change is the next step to support partitions on temporary tables. 
> HIVE-18739 and HIVE-20661 added partial support for partition columns on 
> temporary tables, but it was not complete and it was available only for 
> internal usage. This change addresses the missing functionality related to 
> listing partitions from temporary tables, although it still remains unexposed 
> until all the partition related functionalities (get, list, add, alter etc.) 
> are implemented.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
>  b71ef5a725d610cda402717f501f6c6a0f653216 
>   
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestSessionHiveMetastoreClientListPartitionsTempTable.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/ConditionalIgnoreOnSessionHiveMetastoreClient.java
>  99039b08d014cddc9de12e70801267eba7331266 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestListPartitions.java
>  34ceb34de646cc2e501564e9b3a0cb8cc8a034e1 
> 
> 
> Diff: https://reviews.apache.org/r/70867/diff/1/
> 
> 
> Testing
> ---
> 
> Unit testing is done via 
> TestSessionHiveMetastoreClientListPartitionsTempTable.
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Re: Review Request 70850: HIVE-21812: Implement get partition related methods on temporary tables

2019-06-13 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70850/#review215869
---


Ship it!




Ship It!

- Marta Kuczora


On June 13, 2019, 8:01 a.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70850/
> ---
> 
> (Updated June 13, 2019, 8:01 a.m.)
> 
> 
> Review request for hive, Marta Kuczora, Peter Vary, and Adam Szita.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-21812: Implement get partition related methods on temporary tables
> 
> HIVE-18739 and HIVE-20661 added partial support for partition columns on 
> temporary tables, but it was not complete and it was available only for 
> internal usage. This change addresses the missing functionality related to 
> getting partitions from temporary tables, although it still remains unexposed 
> until all the partition related functionalities (get, list, add, alter etc.) 
> are implemented.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
>  410868cacfe53e8898d4e08572d7a01e05b7eb49 
>   
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestSessionHiveMetastoreClientGetPartitionsTempTable.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/ConditionalIgnoreOnSessionHiveMetastoreClient.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/CustomIgnoreRule.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/MetaStoreClientTest.java
>  dc48fa8308a07f68c5e21a2d95f40127d3ff41df 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestGetPartitions.java
>  4d7f7c12203a9a90568f4aae644ff5cabaafa18c 
> 
> 
> Diff: https://reviews.apache.org/r/70850/diff/1/
> 
> 
> Testing
> ---
> 
> Unit testing is done via TestSessionHiveMetastoreClientGetPartitionsTempTable.
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Re: Review Request 70474: HIVE-21407: Parquet predicate pushdown is not working correctly for char column types

2019-05-09 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70474/
---

(Updated May 9, 2019, 7:51 a.m.)


Review request for hive and Peter Vary.


Changes
---

Fixed the whitespace issue.


Bugs: HIVE-21407
https://issues.apache.org/jira/browse/HIVE-21407


Repository: hive-git


Description
---

The idea behind the patch is to extend the predicate pushed to Parquet for CHAR 
columns with an “or” clause that contains the same expression with both the 
padded and the stripped value.
Example:
Column c is of type CHAR(10) and the search expression is c='apple'.
Before the patch, the predicate pushed to Parquet looked like c='apple '; after 
the patch it looks like (c='apple ' or c='apple').
Since the value 'apple' is stored in Parquet without padding, the predicate 
before the patch didn’t return any rows. With the patch, it returns the 
correct row.
Since there is no distinction between CHAR and VARCHAR at the predicate level, 
the predicates for VARCHARs will be changed as well, so the result set returned 
from Parquet will be wider than before.
Example:
A table contains a VARCHAR(10) column c, with one row where c='apple' and 
another row where c='apple '. If the search expression is c='apple ', both rows 
will be returned from Parquet after the patch. But since Hive does additional 
filtering on the rows returned from Parquet, this won’t be a problem: the result 
set returned by Hive will contain only the row with the value 'apple '.
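The effect of the widened predicate can be modeled in plain Java. This is a simplified, hypothetical sketch that does not use LeafFilterFactory or the Parquet filter API: stored values are plain strings, a CHAR literal is assumed to reach the pushdown already padded to the column length, the patched pushdown matches either that literal or its trailing-space-stripped form, and a final exact comparison stands in for Hive's own filtering in the VARCHAR case. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class CharPredicateSketch {
    /** Pads a value with trailing spaces, as a CHAR(length) literal is assumed to be. */
    static String pad(String value, int length) {
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < length) {
            sb.append(' ');
        }
        return sb.toString();
    }

    /** Removes trailing spaces only. */
    static String stripTrailingSpaces(String value) {
        int end = value.length();
        while (end > 0 && value.charAt(end - 1) == ' ') {
            end--;
        }
        return value.substring(0, end);
    }

    /** Models the pushdown before the patch: c = literal. */
    static List<String> scanLiteralOnly(List<String> stored, String literal) {
        List<String> out = new ArrayList<>();
        for (String v : stored) {
            if (v.equals(literal)) out.add(v);
        }
        return out;
    }

    /** Models the pushdown after the patch: (c = literal OR c = stripped literal). */
    static List<String> scanLiteralOrStripped(List<String> stored, String literal) {
        String stripped = stripTrailingSpaces(literal);
        List<String> out = new ArrayList<>();
        for (String v : stored) {
            if (v.equals(literal) || v.equals(stripped)) out.add(v);
        }
        return out;
    }

    /** Models Hive filtering the rows returned by Parquet (exact VARCHAR match). */
    static List<String> hiveFilter(List<String> rows, String literal) {
        List<String> out = new ArrayList<>();
        for (String v : rows) {
            if (v.equals(literal)) out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        // CHAR(10): 'apple' is stored without padding, the literal arrives padded.
        List<String> charColumn = List.of("apple", "pear");
        String charLiteral = pad("apple", 10);
        System.out.println(scanLiteralOnly(charColumn, charLiteral).size());       // 0
        System.out.println(scanLiteralOrStripped(charColumn, charLiteral).size()); // 1

        // VARCHAR(10): both 'apple' and 'apple ' are stored.
        List<String> varcharColumn = List.of("apple", "apple ");
        List<String> fromParquet = scanLiteralOrStripped(varcharColumn, "apple ");
        System.out.println(fromParquet.size());                       // 2
        System.out.println(hiveFilter(fromParquet, "apple ").size()); // 1
    }
}
```

The wider Parquet result in the VARCHAR case is harmless in this model for the same reason the description gives: the residual filter on the Hive side restores the exact result.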


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/LeafFilterFactory.java 
be4c0d5 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java
 0210a0a 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java
 d464046 
  ql/src/test/queries/clientpositive/parquet_ppd_char.q 4230d8c 
  ql/src/test/queries/clientpositive/parquet_ppd_char2.q PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_ppd_char2.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/70474/diff/2/

Changes: https://reviews.apache.org/r/70474/diff/1-2/


Testing
---

Added a new q test for testing PPD for char and varchar types. Also extended 
the unit tests for the ParquetFilterPredicateConverter.toFilterPredicate method.

The TestParquetRecordReaderWrapper and the TestParquetFilterPredicate classes 
both test the same thing: the behavior of the 
ParquetFilterPredicateConverter.toFilterPredicate method. It doesn't make sense 
to have tests for the same use case in different test classes, so I moved the 
test cases from TestParquetRecordReaderWrapper to 
TestParquetFilterPredicate.


Thanks,

Marta Kuczora



[jira] [Created] (HIVE-21632) Hive should not push partition columns to the Parquet predicate, even if the data file contains a column with the same name as the partition column

2019-04-18 Thread Marta Kuczora (JIRA)
Marta Kuczora created HIVE-21632:


 Summary: Hive should not push partition columns to the Parquet 
predicate, even if the data file contains a column with the same name as the 
partition column
 Key: HIVE-21632
 URL: https://issues.apache.org/jira/browse/HIVE-21632
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Marta Kuczora






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Review Request 70474: HIVE-21407: Parquet predicate pushdown is not working correctly for char column types

2019-04-14 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70474/
---

Review request for hive and Peter Vary.


Bugs: HIVE-21407
https://issues.apache.org/jira/browse/HIVE-21407


Repository: hive-git


Description
---

The idea behind the patch is to extend the predicate pushed to Parquet for CHAR 
columns with an “or” clause that contains the same expression with both the 
padded and the stripped value.
Example:
Column c is of type CHAR(10) and the search expression is c='apple'.
Before the patch, the predicate pushed to Parquet looked like c='apple '; after 
the patch it looks like (c='apple ' or c='apple').
Since the value 'apple' is stored in Parquet without padding, the predicate 
before the patch didn’t return any rows. With the patch, it returns the 
correct row.
Since there is no distinction between CHAR and VARCHAR at the predicate level, 
the predicates for VARCHARs will be changed as well, so the result set returned 
from Parquet will be wider than before.
Example:
A table contains a VARCHAR(10) column c, with one row where c='apple' and 
another row where c='apple '. If the search expression is c='apple ', both rows 
will be returned from Parquet after the patch. But since Hive does additional 
filtering on the rows returned from Parquet, this won’t be a problem: the result 
set returned by Hive will contain only the row with the value 'apple '.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/LeafFilterFactory.java 
be4c0d5 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java
 0210a0a 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java
 d464046 
  ql/src/test/queries/clientpositive/parquet_ppd_char.q 4230d8c 
  ql/src/test/queries/clientpositive/parquet_ppd_char2.q PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_ppd_char2.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/70474/diff/1/


Testing
---

Added a new q test for testing PPD for char and varchar types. Also extended 
the unit tests for the ParquetFilterPredicateConverter.toFilterPredicate method.

The TestParquetRecordReaderWrapper and the TestParquetFilterPredicate classes 
both test the same thing: the behavior of the 
ParquetFilterPredicateConverter.toFilterPredicate method. It doesn't make sense 
to have tests for the same use case in different test classes, so I moved the 
test cases from TestParquetRecordReaderWrapper to 
TestParquetFilterPredicate.


Thanks,

Marta Kuczora



[jira] [Created] (HIVE-21407) Parquet predicate pushdown is not working correctly for char column types

2019-03-07 Thread Marta Kuczora (JIRA)
Marta Kuczora created HIVE-21407:


 Summary: Parquet predicate pushdown is not working correctly for 
char column types
 Key: HIVE-21407
 URL: https://issues.apache.org/jira/browse/HIVE-21407
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Marta Kuczora








[jira] [Created] (HIVE-21327) Predicate is not pushed to Parquet if hive.parquet.timestamp.skip.conversion=true

2019-02-26 Thread Marta Kuczora (JIRA)
Marta Kuczora created HIVE-21327:


 Summary: Predicate is not pushed to Parquet if 
hive.parquet.timestamp.skip.conversion=true
 Key: HIVE-21327
 URL: https://issues.apache.org/jira/browse/HIVE-21327
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Marta Kuczora
Assignee: Marta Kuczora








[DISCUSS] Consistent Timestamps across Hadoop

2018-12-06 Thread Marta Kuczora
Hi Hive Community,

I would like to share the following document on our "Consistent Timestamp
types in Hadoop" plans for review.
https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit

With this plan we would like to reach an agreement on consistent timestamp
behavior across Hive, Spark and Impala, and in order to achieve this, we are
sharing this document with all three communities.

Please review and comment, any feedback is much appreciated!

Regards,
Marta


Re: [ANNOUNCE] New committer: Bharathkrishna Guruvayoor Murali

2018-12-03 Thread Marta Kuczora
Congratulations Bharath!

On Mon, Dec 3, 2018 at 8:45 AM Peter Vary 
wrote:

> Congratulations!
>
> > On Dec 3, 2018, at 05:32, Sankar Hariappan 
> wrote:
> >
> > Congrats Bharath!
> >
> > Best regards
> > Sankar
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On 03/12/18, 7:38 AM, "Vihang Karajgaonkar" 
> wrote:
> >
> >> Congratulations Bharath!
> >>
> >> On Sun, Dec 2, 2018 at 9:33 AM Sahil Takiar 
> wrote:
> >>
> >>> Congrats Bharath!
> >>>
> >>> On Sun, Dec 2, 2018 at 11:14 AM Andrew Sherman
> >>>  wrote:
> >>>
>  Congratulations Bharath!
> 
>  On Sat, Dec 1, 2018 at 10:26 AM Ashutosh Chauhan <
> hashut...@apache.org>
>  wrote:
> 
> > Apache Hive's Project Management Committee (PMC) has invited
> > Bharathkrishna Guruvayoor Murali to become a committer, and we are
> > pleased to announce that he has accepted.
> >
> > Bharath, welcome, thank you for your contributions, and we look forward
> > to your further interactions with the community!
> >
> > Ashutosh Chauhan (on behalf of the Apache Hive PMC)
> >
> 
> >>>
> >>>
> >>> --
> >>> Sahil Takiar
> >>> Software Engineer
> >>> takiar.sa...@gmail.com | (510) 673-0309
> >>>
>
>


Re: Review Request 69432: HIVE-20964 Create a test that checks the level of the parallel compilation

2018-11-23 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69432/#review210821
---


Ship it!




Ship It!

- Marta Kuczora


On Nov. 22, 2018, 3:19 p.m., Peter Vary wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69432/
> ---
> 
> (Updated Nov. 22, 2018, 3:19 p.m.)
> 
> 
> Review request for hive, Denys Kuzmenko, Marta Kuczora, and Adam Szita.
> 
> 
> Bugs: HIVE-20964
> https://issues.apache.org/jira/browse/HIVE-20964
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> * Created 2 query types in the TestCompileLock mock driver. The original 
> SHORT_QUERY finishes in 0.5s as before, while the new LONG_QUERY finishes 
> only after 5s.
> * Using the new 5s query, I created a new test where the compile quota is 4 
> and the number of parallel requests is 10, so the test expects 6 queries to 
> fail with a timeout.
> * Added a new verifyThatTimedOutCompileOpsCount method to validate the number 
> of timed-out queries.
> * The other changes just push down the query string so that the 
> compileAndRespond method can decide which query to run.
> 
> 
> Diffs
> -
> 
>   ql/src/test/org/apache/hadoop/hive/ql/TestCompileLock.java 8dc05ff480 
> 
> 
> Diff: https://reviews.apache.org/r/69432/diff/1/
> 
> 
> Testing
> ---
> 
> Ran the new test and all the old tests in TestCompileLock.
> 
> 
> Thanks,
> 
> Peter Vary
> 
>
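The arithmetic behind the new test (a compile quota of 4 with 10 parallel requests leaves 6 requests to time out) can be reproduced with a small, self-contained Java model. This is a hypothetical sketch that uses java.util.concurrent.Semaphore as the compile quota; it is not Hive's compile lock or the TestCompileLock mock driver. Instead of a 5s LONG_QUERY, each successful request holds its slot until every request has either acquired a slot or timed out, which makes the 4/6 split deterministic rather than timing-dependent.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CompileQuotaSketch {
    /** Returns {acquired, timedOut} for `requests` parallel compilations under `quota`. */
    static int[] simulate(int quota, int requests, long timeoutMs) throws InterruptedException {
        Semaphore compileSlots = new Semaphore(quota);
        CountDownLatch attemptsDone = new CountDownLatch(requests);
        CountDownLatch releaseSlots = new CountDownLatch(1);
        AtomicInteger acquired = new AtomicInteger();
        AtomicInteger timedOut = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(requests);
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                try {
                    if (compileSlots.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
                        acquired.incrementAndGet();
                        attemptsDone.countDown();
                        // Model a long query: keep the slot until every request has tried.
                        releaseSlots.await();
                        compileSlots.release();
                    } else {
                        timedOut.incrementAndGet();
                        attemptsDone.countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        attemptsDone.await();      // every request either got a slot or timed out
        releaseSlots.countDown();  // now let the "long queries" finish
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new int[] {acquired.get(), timedOut.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] result = simulate(4, 10, 200);
        System.out.println("acquired=" + result[0] + ", timedOut=" + result[1]);
    }
}
```

No slot is released before all attempts have been decided, so exactly `quota` requests succeed and the remaining `requests - quota` time out.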



Re: [ANNOUNCE] New PMC Member : Peter Vary

2018-07-31 Thread Marta Kuczora
Congratulations Peter!

On Mon, Jul 30, 2018 at 7:53 PM Andrew Sherman
 wrote:

> Congratulations Peter!
>
> On Sun, Jul 29, 2018 at 1:32 PM Vineet Garg  wrote:
>
> > Congratulations Peter!
> >
> > > On Jul 26, 2018, at 11:25 AM, Ashutosh Chauhan 
> > wrote:
> > >
> > > On behalf of the Hive PMC I am delighted to announce Peter Vary is
> > joining
> > > Hive PMC.
> > > Thanks Peter for all your contributions till now. Looking forward to
> many
> > > more.
> > >
> > > Welcome, Peter!
> > >
> > > Thanks,
> > > Ashutosh
> >
> >
>


Re: [ANNOUNCE] New PMC Member : Sahil Takiar

2018-07-31 Thread Marta Kuczora
Congratulations Sahil!

On Mon, Jul 30, 2018 at 9:44 AM Peter Vary 
wrote:

> Congratulations Sahil!
>
> > On Jul 29, 2018, at 22:32, Vineet Garg  wrote:
> >
> > Congratulations Sahil!
> >
> >> On Jul 26, 2018, at 11:28 AM, Ashutosh Chauhan 
> wrote:
> >>
> >> On behalf of the Hive PMC I am delighted to announce Sahil Takiar is
> >> joining Hive PMC.
> >> Thanks Sahil for all your contributions till now. Looking forward to
> many
> >> more.
> >>
> >> Welcome, Sahil!
> >>
> >> Thanks,
> >> Ashutosh
> >
>
>


Re: [ANNOUNCE] New PMC Member : Vihang Karajgaonkar

2018-07-31 Thread Marta Kuczora
Congratulations Vihang!

On Mon, Jul 30, 2018 at 9:44 AM Peter Vary 
wrote:

> Congratulations Vihang!
>
> > On Jul 29, 2018, at 22:32, Vineet Garg  wrote:
> >
> > Congratulations Vihang!
> >
> >> On Jul 26, 2018, at 11:27 AM, Ashutosh Chauhan 
> wrote:
> >>
> >> On behalf of the Hive PMC I am delighted to announce Vihang
> Karajgaonkar
> >> is joining Hive PMC.
> >> Thanks Vihang for all your contributions till now. Looking forward to
> many
> >> more.
> >>
> >> Welcome, Vihang!
> >>
> >> Thanks,
> >> Ashutosh
> >
>
>


Re: [ANNOUNCE] New PMC Member : Vineet Garg

2018-07-31 Thread Marta Kuczora
Congratulations Vineet!

On Mon, Jul 30, 2018 at 9:45 AM Peter Vary 
wrote:

> Congratulations Vineet!
>
> > On Jul 30, 2018, at 01:59, Ashutosh Chauhan 
> wrote:
> >
> > On behalf of the Hive PMC I am delighted to announce Vineet Garg is
> joining
> > Hive PMC.
> > Thanks Vineet for all your contributions till now. Looking forward to
> many
> > more.
> >
> > Welcome, Vineet!
> >
> > Thanks,
> > Ashutosh
>
>


Re: [ANNOUNCE] New committer: Slim Bouguerra

2018-07-31 Thread Marta Kuczora
Congratulations Slim!

On Mon, Jul 30, 2018 at 2:01 AM Ashutosh Chauhan 
wrote:

> Apache Hive's Project Management Committee (PMC) has invited Slim Bouguerra
> to become a committer, and we are pleased to announce that he has accepted.
>
> Slim, welcome, thank you for your contributions, and we look forward to
> your further interactions with the community!
>
> Ashutosh Chauhan (on behalf of the Apache Hive PMC)
>


Re: New committer announcement : Marta Kuczora

2018-06-25 Thread Marta Kuczora
Thank you all!!


On Thu, Jun 21, 2018 at 9:03 AM Lefty Leverenz 
wrote:

> Congratulations Marta!
>
> -- Lefty
>
>
> On Thu, Jun 21, 2018 at 1:46 AM Prasanth Jayachandran <
> pjayachand...@hortonworks.com> wrote:
>
> > Congratulations!
> >
> > Thanks
> > Prasanth
> >
> >
> >
> > On Wed, Jun 20, 2018 at 10:44 PM -0700, "Vihang Karajgaonkar"
> > mailto:vih...@cloudera.com.INVALID>> wrote:
> >
> >
> > Congrats Marta!
> >
> > On Wed, Jun 20, 2018 at 8:46 PM, Zoltan Haindrich  wrote:
> >
> > > Congratulations Márta!
> > >
> > > On 20 June 2018 22:20:30 CEST, Deepak Jaiswal
> > > wrote:
> > > >Congratulations Marta.
> > > >
> > > >On 6/20/18, 12:06 PM, "Ashutosh Chauhan"  wrote:
> > > >
> > > >Apache Hive's Project Management Committee (PMC) has invited Marta
> > > >Kuczora to become a committer, and we are pleased to announce that she
> > > >has accepted.
> > > >
> > > >Marta, welcome, thank you for your contributions, and we look forward
> > > >to your further interactions with the community!
> > > >
> > > >Ashutosh Chauhan (on behalf of the Apache Hive PMC)
> > > >
> > >
> >
> >
>


Re: Review Request 66667: HIVE-19046: Refactor the common parts of the HiveMetastore add_partition_core and add_partitions_pspec_core methods

2018-06-25 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7/
---

(Updated June 25, 2018, 12:01 p.m.)


Review request for hive, Peter Vary, Sahil Takiar, and Adam Szita.


Changes
---

Rebased the patch


Bugs: HIVE-19046
https://issues.apache.org/jira/browse/HIVE-19046


Repository: hive-git


Description
---

Most of the code in these methods is the same, so I refactored the common 
parts into shared methods.


Diffs (updated)
-

  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 e9d7e7c 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java
 bf559b4 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java
 4f11a55 


Diff: https://reviews.apache.org/r/7/diff/7/

Changes: https://reviews.apache.org/r/7/diff/6-7/


Testing
---


Thanks,

Marta Kuczora



Re: Review Request 66667: HIVE-19046: Refactor the common parts of the HiveMetastore add_partition_core and add_partitions_pspec_core methods

2018-06-11 Thread Marta Kuczora via Review Board


> On June 6, 2018, 10:34 p.m., Alexander Kolbasov wrote:
> > Looks good, a few nits below.

Thanks for looking into this review. I have fixed or answered the issues. 
Please let me know if the patch looks OK; then I will upload it to the Jira to 
run the pre-commit tests.


> On June 6, 2018, 10:34 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 3227 (original), 3322 (patched)
> > <https://reviews.apache.org/r/7/diff/2/?file=2012314#file2012314line3325>
> >
> > Is it possible to do it once in the constructor instead? I suspect that 
> > this is a non-trivial operation.

To be honest, I am not sure it would be worth moving this part to the 
constructor, and I don't know what side effects it would have. In HIVE-15137, 
where this part was added to the code, the problem was that if two HiveCli 
instances were started with different users and both users added a partition, 
the owner of the partition directories was always the first user. Wouldn't 
moving this code to the constructor affect this use case? Would it still work 
correctly? I think this should be investigated. I am also not sure about the 
benefit of moving this code: the current user is fetched only once when 
creating a batch of partitions, and I don't see this as a very expensive call. 
If we want to move it, I would suggest investigating it and doing it in a 
separate Jira. What do you think?


> On June 6, 2018, 10:34 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Lines 3253 (patched)
> > <https://reviews.apache.org/r/7/diff/5/?file=2034474#file2034474line3253>
> >
> > Can you clarify that "clean up" means removing the associated directory?

I fixed it accordingly.


> On June 6, 2018, 10:34 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Lines 3268 (patched)
> > <https://reviews.apache.org/r/7/diff/5/?file=2034474#file2034474line3268>
> >
> > Please add a Javadoc here explaining what is checked by validation. 
> > Also it isn't obvious that validation has side effects (updating partsToAdd)

Added Javadoc


> On June 6, 2018, 10:34 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 3247 (original), 3343 (patched)
> > <https://reviews.apache.org/r/7/diff/5/?file=2034474#file2034474line3346>
> >
> > addedPartitions is not defined here so it isn't obvious that it should 
> > be thread-safe. Is it possible to allocate and return addedPartitions here 
> > so that you guarantee the use of a thread-safe map? 
> > 
> > Another way you can do it is to collect added partitions in thread-safe 
> > local map and then copy it to the resulting map once you are done with 
> > concurrent part.

The createPartitionFolders method is called with a ConcurrentHashMap, so I 
thought that would do the trick. 
Returning the addedPartitions map would be complicated, as we would have to 
return the newParts list as well. So I fixed this issue by introducing a local 
map and then copying the result to the addedPartitions map.
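The "local map, then copy" approach can be sketched as follows. This is a hypothetical simplification, not the patch itself: the method name mirrors the discussion, but the parameters are reduced to partition names, and the folder creation is replaced by a placeholder. Worker threads only write to a local `ConcurrentHashMap`; the caller's map is touched only on the calling thread after every task has completed, so the caller does not need to pass a thread-safe map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LocalMapThenCopySketch {

  /**
   * Creates "folders" for the given partitions concurrently and records the created
   * ones in addedPartitions, which may be any Map implementation.
   */
  static void createPartitionFolders(List<String> partNames,
      Map<String, Boolean> addedPartitions) throws InterruptedException, ExecutionException {
    Map<String, Boolean> localAdded = new ConcurrentHashMap<>(); // shared by worker threads
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (String name : partNames) {
        futures.add(pool.submit(() -> {
          // Hypothetical placeholder for creating the partition directory.
          localAdded.put(name, Boolean.TRUE);
        }));
      }
      for (Future<?> f : futures) {
        f.get(); // wait for every task; a task failure surfaces as ExecutionException
      }
    } finally {
      pool.shutdown();
    }
    // Single-threaded copy into the caller's map once the concurrent part is done.
    addedPartitions.putAll(localAdded);
  }
}
```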


- Marta


-------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7/#review203099
---


On June 11, 2018, 11:27 a.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7/
> ---
> 
> (Updated June 11, 2018, 11:27 a.m.)
> 
> 
> Review request for hive, Peter Vary, Sahil Takiar, and Adam Szita.
> 
> 
> Bugs: HIVE-19046
> https://issues.apache.org/jira/browse/HIVE-19046
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Most of the code in these methods is the same. Refactored these common code 
> parts into shared methods.
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  b9f5fb8 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java
>  bf559b4 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java
>  4f11a55 
> 
> 
> Diff: https://reviews.apache.org/r/7/diff/6/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>



Re: Review Request 66667: HIVE-19046: Refactor the common parts of the HiveMetastore add_partition_core and add_partitions_pspec_core methods

2018-06-11 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7/
---

(Updated June 11, 2018, 11:27 a.m.)


Review request for hive, Peter Vary, Sahil Takiar, and Adam Szita.


Changes
---

Address review findings.


Bugs: HIVE-19046
https://issues.apache.org/jira/browse/HIVE-19046


Repository: hive-git


Description
---

Most of the code in these methods is the same. Refactored these common code 
parts into shared methods.


Diffs (updated)
-

  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 b9f5fb8 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java
 bf559b4 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java
 4f11a55 


Diff: https://reviews.apache.org/r/7/diff/6/

Changes: https://reviews.apache.org/r/7/diff/5-6/


Testing
---


Thanks,

Marta Kuczora



Re: Review Request 66667: HIVE-19046: Refactor the common parts of the HiveMetastore add_partition_core and add_partitions_pspec_core methods

2018-06-01 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7/
---

(Updated June 1, 2018, 12:31 p.m.)


Review request for hive, Peter Vary, Sahil Takiar, and Adam Szita.


Bugs: HIVE-19046
https://issues.apache.org/jira/browse/HIVE-19046


Repository: hive-git


Description
---

Most of the code in these methods is the same. Refactored these common code 
parts into shared methods.


Diffs (updated)
-

  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 d8b8414 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java
 88064d9 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java
 debcd0e 


Diff: https://reviews.apache.org/r/7/diff/5/

Changes: https://reviews.apache.org/r/7/diff/4-5/


Testing
---


Thanks,

Marta Kuczora



Re: Review Request 66667: HIVE-19046: Refactor the common parts of the HiveMetastore add_partition_core and add_partitions_pspec_core methods

2018-05-29 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7/
---

(Updated May 29, 2018, 4:24 p.m.)


Review request for hive, Peter Vary, Sahil Takiar, and Adam Szita.


Changes
---

Rebased the patch.


Bugs: HIVE-19046
https://issues.apache.org/jira/browse/HIVE-19046


Repository: hive-git


Description
---

Most of the code in these methods is the same. Refactored these common code 
parts into shared methods.


Diffs (updated)
-

  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 c1d25db 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java
 88064d9 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java
 debcd0e 


Diff: https://reviews.apache.org/r/7/diff/4/

Changes: https://reviews.apache.org/r/7/diff/3-4/


Testing
---


Thanks,

Marta Kuczora



Re: Review Request 66667: HIVE-19046: Refactor the common parts of the HiveMetastore add_partition_core and add_partitions_pspec_core methods

2018-05-23 Thread Marta Kuczora via Review Board


> On April 18, 2018, 9:52 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 3323 (original), 3396 (patched)
> > <https://reviews.apache.org/r/7/diff/1/?file=2004741#file2004741line3399>
> >
> > Should we set interrupted flag on the thread if we get 
> > InterruptedException?
> 
> Marta Kuczora wrote:
> Could you please give me some details about why you think it is needed? I 
> don't actually know whether it is needed. My idea here was to go through all 
> the FutureTasks and, if one of them didn't finish successfully (there was 
> either an error or the task was interrupted), throw an exception, because it 
> would mean that not all partition folders were created successfully. For this 
> I don't think that I should set anything on the thread, but I might be missing 
> something. So could you please explain your thoughts on this?

I just uploaded a new patch with this change.
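The usual Java idiom behind the reviewer's suggestion is sketched below: a blocking call such as `Future.get()` clears the thread's interrupted status when it throws `InterruptedException`, so the handler sets it again before converting the exception, letting code further up the stack still see that an interruption happened. The class, the method, and the exception type are hypothetical stand-ins, not the code in the uploaded patch.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class WaitForFolderTasksSketch {

  /** Hypothetical checked exception standing in for the metastore's own exception type. */
  static final class PartitionFolderException extends Exception {
    PartitionFolderException(String msg, Throwable cause) {
      super(msg, cause);
    }
  }

  /** Waits for all folder-creation tasks and fails if any did not complete successfully. */
  static void waitForAll(List<? extends Future<?>> tasks) throws PartitionFolderException {
    for (Future<?> task : tasks) {
      try {
        task.get();
      } catch (InterruptedException e) {
        // Throwing InterruptedException cleared the flag; restore it for callers.
        Thread.currentThread().interrupt();
        throw new PartitionFolderException("Interrupted while creating partition folders", e);
      } catch (ExecutionException e) {
        throw new PartitionFolderException("Failed to create a partition folder", e.getCause());
      }
    }
  }
}
```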


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7/#review201465
-------


On May 23, 2018, 4:24 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7/
> ---
> 
> (Updated May 23, 2018, 4:24 p.m.)
> 
> 
> Review request for hive, Peter Vary, Sahil Takiar, and Adam Szita.
> 
> 
> Bugs: HIVE-19046
> https://issues.apache.org/jira/browse/HIVE-19046
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Most of the code in these methods is the same. Refactored these common code 
> parts into shared methods.
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  92d2e3f 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java
>  88064d9 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java
>  debcd0e 
> 
> 
> Diff: https://reviews.apache.org/r/7/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>



Re: Review Request 66667: HIVE-19046: Refactor the common parts of the HiveMetastore add_partition_core and add_partitions_pspec_core methods

2018-05-23 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7/
---

(Updated May 23, 2018, 4:24 p.m.)


Review request for hive, Peter Vary, Sahil Takiar, and Adam Szita.


Changes
---

Address review finding.


Bugs: HIVE-19046
https://issues.apache.org/jira/browse/HIVE-19046


Repository: hive-git


Description
---

Most of the code in these methods is the same. Refactored these common code 
parts into shared methods.


Diffs (updated)
-

  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 92d2e3f 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java
 88064d9 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java
 debcd0e 


Diff: https://reviews.apache.org/r/7/diff/3/

Changes: https://reviews.apache.org/r/7/diff/2-3/


Testing
---


Thanks,

Marta Kuczora



[jira] [Created] (HIVE-19656) Upgrade Hive to PARQUET 1.10.0

2018-05-22 Thread Marta Kuczora (JIRA)
Marta Kuczora created HIVE-19656:


 Summary: Upgrade Hive to PARQUET 1.10.0
 Key: HIVE-19656
 URL: https://issues.apache.org/jira/browse/HIVE-19656
 Project: Hive
  Issue Type: Improvement
Affects Versions: 3.1.0
Reporter: Marta Kuczora
Assignee: Marta Kuczora


In the future, the new Parquet logical types for the timestamp type should be 
introduced to Hive. The implementation of these logical types is planned for 
the next Parquet release. Before that, we should upgrade to Parquet 1.10.0, 
which has already been released.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Review Request 66935: HIVE-18977: Listing partitions returns different results with JDO and direct SQL

2018-05-03 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66935/
---

Review request for hive, Alan Gates and Peter Vary.


Bugs: HIVE-18977
https://issues.apache.org/jira/browse/HIVE-18977


Repository: hive-git


Description
---

Some of the test cases in TestListPartitions fail when directSQL is disabled.


Diffs
-

  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 4601e09 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
 6645e55 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java
 d608e50 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestListPartitions.java
 a8b6e31 


Diff: https://reviews.apache.org/r/66935/diff/1/


Testing
---


Thanks,

Marta Kuczora



Re: Review Request 66667: HIVE-19046: Refactor the common parts of the HiveMetastore add_partition_core and add_partitions_pspec_core methods

2018-04-26 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7/
---

(Updated April 26, 2018, 11:18 a.m.)


Review request for hive, Peter Vary, Sahil Takiar, and Adam Szita.


Changes
---

Fixed review findings


Bugs: HIVE-19046
https://issues.apache.org/jira/browse/HIVE-19046


Repository: hive-git


Description
---

Most of the code in these methods is the same. Refactored these common code 
parts into shared methods.


Diffs (updated)
-

  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 397a081 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java
 88064d9 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java
 debcd0e 


Diff: https://reviews.apache.org/r/7/diff/2/

Changes: https://reviews.apache.org/r/7/diff/1-2/


Testing
---


Thanks,

Marta Kuczora



Re: Review Request 66667: HIVE-19046: Refactor the common parts of the HiveMetastore add_partition_core and add_partitions_pspec_core methods

2018-04-26 Thread Marta Kuczora via Review Board
> > Should we set interrupted flag on the thread if we get 
> > InterruptedException?

Could you please give me some details about why you think it is needed? I don't 
actually know whether it is needed. My idea here was to go through all the 
FutureTasks and, if one of them didn't finish successfully (there was either an 
error or the task was interrupted), throw an exception, because it would mean 
that not all partition folders were created successfully. For this I don't 
think that I should set anything on the thread, but I might be missing 
something. So could you please explain your thoughts on this?


> On April 18, 2018, 9:52 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 3500 (original), 3525 (patched)
> > <https://reviews.apache.org/r/7/diff/1/?file=2004741#file2004741line3576>
> >
> > Style nit: validPartition doesn't add any value here, why not just
> > 
> > if (validatePartition(part, catName, tblName, dbName,
> >   partsToAdd, ms, ifNotExists)) {... }

Fixed it.


> On April 18, 2018, 9:52 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 3595 (original), 3548 (patched)
> > <https://reviews.apache.org/r/7/diff/1/?file=2004741#file2004741line3671>
> >
> > Note that cleanupPartitionFolders() here may throw an exception, thus 
> > preventing other cleanup.

I guess you mean the same issue here as in your previous comment:
"Here we are trying to nuke a  bunch of values. If a single one fails, we do 
not attempt to delete others. Since you are just doing refactoring it is out of 
scope but I think the proper behavior is to continue nuking for others as well."

I would close this issue and continue the discussion under the other comment, 
just to have it in one place. If you meant something else, please feel free to 
reopen this issue.
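The behavior the reviewer proposes ("continue nuking for others as well") amounts to best-effort cleanup: attempt every deletion, remember the failures, and report them once at the end instead of stopping at the first error. A minimal sketch follows; the `Deleter` interface is a hypothetical stand-in for the real file-system call, and returning the failed paths (rather than throwing) is one possible design choice, not necessarily what HiveMetaStore does.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BestEffortCleanupSketch {

  /** Hypothetical deleter standing in for the real file-system call. */
  interface Deleter {
    void delete(String path) throws IOException;
  }

  /**
   * Tries to delete every path, even if some deletions fail.
   *
   * @return the paths that could not be deleted (empty if all succeeded)
   */
  static List<String> cleanupAll(List<String> paths, Deleter deleter) {
    List<String> failed = new ArrayList<>();
    for (String path : paths) {
      try {
        deleter.delete(path);
      } catch (IOException | RuntimeException e) {
        failed.add(path); // remember it and keep going with the remaining paths
      }
    }
    return failed;
  }
}
```

The caller can then log or rethrow once with the full list of leftover paths.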


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7/#review201465
---


On April 17, 2018, 1:37 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7/
> ---
> 
> (Updated April 17, 2018, 1:37 p.m.)
> 
> 
> Review request for hive, Peter Vary, Sahil Takiar, and Adam Szita.
> 
> 
> Bugs: HIVE-19046
> https://issues.apache.org/jira/browse/HIVE-19046
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Most of the code in these methods is the same. Refactored these common code 
> parts into shared methods.
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  ae9ec5c 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java
>  f8497c7 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java
>  fc0c60f 
> 
> 
> Diff: https://reviews.apache.org/r/7/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>



Re: Review Request 66774: HIVE-19285: Add logs to the subclasses of MetaDataOperation

2018-04-26 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66774/
---

(Updated April 26, 2018, 8:16 a.m.)


Review request for hive and Peter Vary.


Changes
---

Fixed stylecheck issues.


Bugs: HIVE-19285
https://issues.apache.org/jira/browse/HIVE-19285


Repository: hive-git


Description
---

Subclasses of MetaDataOperation are not writing anything to the logs. It would 
be useful to have some INFO and DEBUG level logging in these classes.
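A hypothetical sketch of the INFO/DEBUG split described above: one INFO line summarizing the request, and DEBUG-level lines for per-result detail. It uses `java.util.logging` only so that the example is self-contained (`FINE` plays the role of DEBUG); the class name, the prefix-based filtering (the real operations accept SQL-style name patterns), and the message texts are assumptions, not the contents of the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

public class GetTablesLoggingSketch {

  private static final Logger LOG = Logger.getLogger(GetTablesLoggingSketch.class.getName());

  /** Hypothetical simplified metadata operation that filters table names by a prefix. */
  static List<String> getTables(String schemaName, String tablePrefix, List<String> allTables) {
    // INFO: one line per request, describing what was asked for.
    LOG.info(() -> "Fetching table metadata: schema=" + schemaName + ", tablePrefix=" + tablePrefix);
    List<String> result = new ArrayList<>();
    for (String table : allTables) {
      if (table.startsWith(tablePrefix)) {
        result.add(table);
        // DEBUG-level detail; the Supplier avoids building the string when disabled.
        LOG.fine(() -> "Adding table to the result: " + table);
      }
    }
    LOG.fine(() -> "Returning " + result.size() + " table(s)");
    return result;
  }
}
```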


Diffs (updated)
-

  
service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
 7944467 
  
service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java 
d67ea90 
  
service/src/java/org/apache/hive/service/cli/operation/GetCrossReferenceOperation.java
 99ccd4e 
  
service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
 091bf50 
  
service/src/java/org/apache/hive/service/cli/operation/GetPrimaryKeysOperation.java
 e603fdd 
  
service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java 
de09ec9 
  
service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
 59cfbb2 
  
service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java 
c9233d0 
  
service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
 ac078b4 
  service/src/java/org/apache/hive/service/cli/operation/MetadataOperation.java 
bf7c021 


Diff: https://reviews.apache.org/r/66774/diff/3/

Changes: https://reviews.apache.org/r/66774/diff/2-3/


Testing
---

Just adding some additional log messages. Tested locally by checking the log 
messages for different use cases.


Thanks,

Marta Kuczora



Re: Review Request 66774: HIVE-19285: Add logs to the subclasses of MetaDataOperation

2018-04-24 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66774/
---

(Updated April 24, 2018, 6:35 p.m.)


Review request for hive and Peter Vary.


Changes
---

Fixed review findings.


Bugs: HIVE-19285
https://issues.apache.org/jira/browse/HIVE-19285


Repository: hive-git


Description
---

Subclasses of MetaDataOperation are not writing anything to the logs. It would 
be useful to have some INFO and DEBUG level logging in these classes.


Diffs (updated)
-

  
service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
 7944467 
  
service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java 
d67ea90 
  
service/src/java/org/apache/hive/service/cli/operation/GetCrossReferenceOperation.java
 99ccd4e 
  
service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
 091bf50 
  
service/src/java/org/apache/hive/service/cli/operation/GetPrimaryKeysOperation.java
 e603fdd 
  
service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java 
de09ec9 
  
service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
 59cfbb2 
  
service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java 
c9233d0 
  
service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
 ac078b4 
  service/src/java/org/apache/hive/service/cli/operation/MetadataOperation.java 
bf7c021 


Diff: https://reviews.apache.org/r/66774/diff/2/

Changes: https://reviews.apache.org/r/66774/diff/1-2/


Testing
---

Just adding some additional log messages. Tested locally by checking the log 
messages for different use cases.


Thanks,

Marta Kuczora


