[GitHub] hive pull request #318: Hive 18832

2018-03-08 Thread anishek
GitHub user anishek opened a pull request:

https://github.com/apache/hive/pull/318

Hive 18832

Support change management for trashing data files from ACID tables.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anishek/hive HIVE-18832

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/318.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #318


commit 825f9d2f734ecf641438e95e23ed0384f5ed58ff
Author: Anishek Agarwal 
Date:   2018-03-06T04:34:52Z

HIVE-18832: Support change management for trashing data files from ACID 
tables.

commit cfba321380a368da2886a38d27a67be0a6f7b506
Author: Anishek Agarwal 
Date:   2018-03-08T06:52:16Z

HIVE-18832: Support change management for trashing data files from ACID 
tables.




---


[jira] [Created] (HIVE-18922) Hive is not cleaning up staging directories

2018-03-08 Thread Anant Mittal (JIRA)
Anant Mittal created HIVE-18922:
---

 Summary: Hive is not cleaning up staging directories
 Key: HIVE-18922
 URL: https://issues.apache.org/jira/browse/HIVE-18922
 Project: Hive
  Issue Type: Bug
Reporter: Anant Mittal


Hive is creating hdfs folders with format 
/.hive-staging_hive__-xx/-ext-x

These are not cleaned up even after a long time. The folder is used to load 
data into the table. Example:

Loading data to table default.tablename from 
hdfs://clustermachine/apps/hive/warehouse/tablename/.hive-staging_hive_2018-01-31_11-45-14_005_1129336997995057804-51/-ext-1

 

This might be covered to some extent by HIVE-11940, but I want to make sure all 
cases are addressed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18921) SparkClientImpl should react to errors sent from the RemoteDriver

2018-03-08 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-18921:
---

 Summary: SparkClientImpl should react to errors sent from the 
RemoteDriver
 Key: HIVE-18921
 URL: https://issues.apache.org/jira/browse/HIVE-18921
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar


Right now, when {{RemoteDriver#shutdown}} runs it may or may not send an error 
message to the client.

When it does, we simply log the error and then continue running normally. 
Eventually we will hit an error saying that the RPC channel has been closed.

There are two improvements we can make:
(1) If we get an error from the {{RemoteDriver}} we can treat the RPC channel 
as functionally closed, rather than proceeding with normal execution.
(2) If there is no known error on the driver side, at least send some type of 
termination signal so the client knows a shutdown is coming and can act 
accordingly.





[jira] [Created] (HIVE-18920) CBO: Initialize the Janino providers ahead of 1st query

2018-03-08 Thread Gopal V (JIRA)
Gopal V created HIVE-18920:
--

 Summary: CBO: Initialize the Janino providers ahead of 1st query
 Key: HIVE-18920
 URL: https://issues.apache.org/jira/browse/HIVE-18920
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V


Hive Calcite metadata providers are compiled when the 1st query comes in.

If a second query arrives before the 1st one has built a metadata provider, it 
will also try to do the same thing, because the cache is not populated yet.

With 1024 concurrent users, it takes 6 minutes for the 1st query to finish 
fighting all the other queries which are trying to load that cache.
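
One way to picture the proposed fix is a single-flight cache that is warmed up 
before the first query, so concurrent callers wait for one build instead of 
each compiling their own. The following is only a minimal sketch: 
ProviderCache, buildProvider and warmUp are hypothetical names, not Hive or 
Calcite APIs, and the sleep merely simulates the compilation cost.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not Hive or Calcite code.
public class ProviderCache {
  private static final ConcurrentHashMap<String, Object> CACHE = new ConcurrentHashMap<>();
  public static final AtomicInteger BUILDS = new AtomicInteger();

  // Stand-in for the expensive provider compilation step.
  private static Object buildProvider(String key) {
    BUILDS.incrementAndGet();
    try {
      Thread.sleep(50); // simulate compile time
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return new Object();
  }

  // computeIfAbsent applies the builder at most once per key;
  // concurrent callers for the same key block until it completes.
  public static Object get(String key) {
    return CACHE.computeIfAbsent(key, ProviderCache::buildProvider);
  }

  // Eager warm-up that a server could call at startup, ahead of the 1st query.
  public static void warmUp() {
    get("default");
  }

  // Fires `users` concurrent lookups and returns how many builds happened in total.
  public static int runConcurrentQueries(int users) {
    ExecutorService pool = Executors.newFixedThreadPool(users);
    CountDownLatch start = new CountDownLatch(1);
    for (int i = 0; i < users; i++) {
      pool.submit(() -> {
        try {
          start.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        get("default");
      });
    }
    start.countDown();
    pool.shutdown();
    try {
      pool.awaitTermination(30, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return BUILDS.get();
  }

  public static void main(String[] args) {
    System.out.println("builds = " + runConcurrentQueries(16));
  }
}
```

Calling warmUp() at startup would move the single build out of the query path 
entirely; the computeIfAbsent guard only covers the case where queries arrive 
before warm-up has finished.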





[jira] [Created] (HIVE-18919) remove separate keytab setting for ZK in LLAP

2018-03-08 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-18919:
---

 Summary: remove separate keytab setting for ZK in LLAP
 Key: HIVE-18919
 URL: https://issues.apache.org/jira/browse/HIVE-18919
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin








[jira] [Created] (HIVE-18918) Bad error message in CompactorMR.launchCompactionJob()

2018-03-08 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-18918:
-

 Summary: Bad error message in CompactorMR.launchCompactionJob()
 Key: HIVE-18918
 URL: https://issues.apache.org/jira/browse/HIVE-18918
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.2.2
Reporter: Eugene Koifman
Assignee: Eugene Koifman


{noformat}
  rj.waitForCompletion();
  if (!rj.isSuccessful()) {
    throw new IOException(compactionType == CompactionType.MAJOR ? "Major" : "Minor" +
        " compactor job failed for " + jobName + "! Hadoop JobId: " + rj.getID());
  }
{noformat}

This produces no useful information in the case of a Major compaction.
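
The likely cause is operator precedence: + binds tighter than ?:, so the 
concatenation attaches only to the "Minor" branch and a failed major compaction 
reports just "Major". A small self-contained sketch of the effect follows; 
CompactorMessageDemo and its enum are stand-ins for illustration, not the Hive 
classes.

```java
// Stand-in classes for illustration; not the Hive CompactorMR code.
public class CompactorMessageDemo {
  public enum CompactionType { MAJOR, MINOR }

  // Mirrors the reported expression: the suffix belongs to the "Minor" branch only.
  public static String buggy(CompactionType type, String jobName, String jobId) {
    return type == CompactionType.MAJOR ? "Major"
        : "Minor" + " compactor job failed for " + jobName + "! Hadoop JobId: " + jobId;
  }

  // Parenthesizing the ternary applies the suffix to both branches.
  public static String fixed(CompactionType type, String jobName, String jobId) {
    return (type == CompactionType.MAJOR ? "Major" : "Minor")
        + " compactor job failed for " + jobName + "! Hadoop JobId: " + jobId;
  }

  public static void main(String[] args) {
    System.out.println(buggy(CompactionType.MAJOR, "job-a", "job_1")); // prints "Major"
    System.out.println(fixed(CompactionType.MAJOR, "job-a", "job_1"));
  }
}
```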





Re: Review Request 65988: HIVE-18907: Create utility to fix acid key index issue from HIVE-18817

2018-03-08 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65988/
---

(Updated March 8, 2018, 11:40 p.m.)


Review request for hive, Eugene Koifman and Prasanth_J.


Changes
---

Fixed a couple of places where reader instances weren't being closed. 
Also, the FSInputDataInputStream only needs to be created/opened once per 
file, not once per stripe.


Bugs: HIVE-18907
https://issues.apache.org/jira/browse/HIVE-18907


Repository: hive-git


Description
---

Create utility similar to orcfiledump to check/fix this particular acid key 
index issue.


Diffs (updated)
-

  bin/ext/fixacidkeyindex.sh PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/FixAcidKeyIndex.java 
PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFixAcidKeyIndex.java 
PRE-CREATION 


Diff: https://reviews.apache.org/r/65988/diff/3/

Changes: https://reviews.apache.org/r/65988/diff/2-3/


Testing
---


Thanks,

Jason Dere



[jira] [Created] (HIVE-18916) SparkClientImpl doesn't error out if spark-submit fails

2018-03-08 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-18916:
---

 Summary: SparkClientImpl doesn't error out if spark-submit fails
 Key: HIVE-18916
 URL: https://issues.apache.org/jira/browse/HIVE-18916
 Project: Hive
  Issue Type: Sub-task
Reporter: Sahil Takiar


If {{spark-submit}} returns a non-zero exit code, {{SparkClientImpl}} will 
simply log the exit code, but won't throw an error. Eventually, the connection 
timeout will get triggered and an exception like {{Timed out waiting for client 
connection}} will be logged, which is pretty misleading.
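
A rough sketch of the desired behavior, assuming the client waits on a future 
for the driver connection: a non-zero exit code fails that future right away, 
so the caller sees the real cause instead of the later timeout. 
SubmitExitCheck, onProcessExit and awaitConnection are hypothetical names 
and do not reflect the actual SparkClientImpl code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration; not the SparkClientImpl implementation.
public class SubmitExitCheck {
  // Called by whatever thread watches the child process (assumed hook).
  public static void onProcessExit(int exitCode, CompletableFuture<String> connection) {
    if (exitCode != 0) {
      // Fail the pending connection instead of only logging the exit code.
      connection.completeExceptionally(
          new IllegalStateException("spark-submit exited with code " + exitCode));
    }
  }

  // Waits for the connection and returns either its value or a failure message.
  public static String awaitConnection(CompletableFuture<String> connection, long timeoutMs) {
    try {
      return connection.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (ExecutionException e) {
      return "FAILED: " + e.getCause().getMessage();
    } catch (TimeoutException e) {
      return "FAILED: Timed out waiting for client connection";
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "FAILED: interrupted";
    }
  }

  // Simulates a process that exits with the given code and never connects.
  public static String simulate(int exitCode, long timeoutMs) {
    CompletableFuture<String> connection = new CompletableFuture<>();
    onProcessExit(exitCode, connection);
    return awaitConnection(connection, timeoutMs);
  }

  public static void main(String[] args) {
    System.out.println(simulate(1, 1000));
  }
}
```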





[jira] [Created] (HIVE-18917) Add spark.home to hive.conf.restricted.list

2018-03-08 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-18917:
---

 Summary: Add spark.home to hive.conf.restricted.list
 Key: HIVE-18917
 URL: https://issues.apache.org/jira/browse/HIVE-18917
 Project: Hive
  Issue Type: Task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Add spark.home to hive.conf.restricted.list so that users cannot set it.





[jira] [Created] (HIVE-18915) Better client logging when a HoS session can't be opened

2018-03-08 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-18915:
---

 Summary: Better client logging when a HoS session can't be opened
 Key: HIVE-18915
 URL: https://issues.apache.org/jira/browse/HIVE-18915
 Project: Hive
  Issue Type: Sub-task
Reporter: Sahil Takiar


Users just get a {{FAILED: Execution Error, return code 30041 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client 
for Spark session [id]}} when a HoS session can't be opened, would be better 





[jira] [Created] (HIVE-18914) View definition resolved incorrectly for windowing case

2018-03-08 Thread Anant Mittal (JIRA)
Anant Mittal created HIVE-18914:
---

 Summary: View definition resolved incorrectly for windowing case
 Key: HIVE-18914
 URL: https://issues.apache.org/jira/browse/HIVE-18914
 Project: Hive
  Issue Type: Bug
Reporter: Anant Mittal


Selecting from a view fails because a column name is not resolved properly.

Error seen:

FAILED: SemanticException Failed to breakup Windowing invocations into Groups. 
At least 1 group must only depend on input columns. Also check for circular 
dependencies.
Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:182 
Expression not in GROUP BY key 'amt' in definition of VIEW badview [
select `t1`.`partcol` from (select `b`.`partcol`, `b`.`amt`, `b`.`rownum` from 
( SELECT  partCol, +*SUM(`tdr`.`amt`)*+ AS `amt`, row_number() over (partition 
by `tdr`.`partcol` order by sum(+*amt*+) desc) `rownum` FROM (select 
`source`.`partcol`,`source`.`amt` from `default`.`source` group by 
`source`.`partcol`,`source`.`amt` )as `tdr` group by `tdr`.`partcol`)as `b`)as 
`t1` group by `t1`.`partcol`
] used as badview at Line 1:14

 

Queries to reproduce:

CREATE TABLE source(partCol STRING,amt DECIMAL(38,3));

CREATE VIEW badView AS select partCol from (select b.* from ( SELECT  partCol, 
SUM(amt) AS amt, row_number() over (partition by partCol order by sum(amt) 
desc) rownum FROM (select partCol,amt from source group by partCol,amt )as tdr 
group by partCol)as b)as t1 group by partCol;

select * from badView;

 

Note: Running the subquery in the AS part of view definition works correctly.

 





[jira] [Created] (HIVE-18913) Change DataType to Collection or Set

2018-03-08 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-18913:
--

 Summary: Change DataType to Collection or Set
 Key: HIVE-18913
 URL: https://issues.apache.org/jira/browse/HIVE-18913
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 3.0.0, 2.4.0
Reporter: BELUGA BEHR


https://github.com/apache/hive/blob/9b36ffa92cc4e0f47ea03d8d167debe743342f5b/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java

Please change the signature of these methods to be {{Set}} or {{Collection}}

{code}
public static List getColumnNames*
{code}

Often these calls are used to get a list of columns which is then searched 
through.  It would be more performant (and database set-theory correct) if the 
returned data type were a {{Set}} and not a {{List}}.  The returned collections 
should probably be immutable as well.
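
A minimal sketch of what such a signature could look like. ColumnNamesDemo 
and this getColumnNames are hypothetical stand-ins, not the methods in 
Utilities.java; the element type String is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in; not the Hive Utilities class.
public class ColumnNamesDemo {
  // Returns the column names as an immutable Set.
  public static Set<String> getColumnNames(List<String> declared) {
    // LinkedHashSet keeps declaration order and gives expected O(1) contains(),
    // where List.contains() is a linear scan.
    return Collections.unmodifiableSet(new LinkedHashSet<>(declared));
  }

  public static void main(String[] args) {
    Set<String> cols = getColumnNames(Arrays.asList("partcol", "amt", "rownum"));
    System.out.println(cols.contains("amt"));
  }
}
```

Callers that only iterate or test membership would not need to change beyond 
the declared type; callers that rely on positional access would.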





[jira] [Created] (HIVE-18912) Stats task should set inaccurate flag on failure

2018-03-08 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-18912:
---

 Summary: Stats task should set inaccurate flag on failure
 Key: HIVE-18912
 URL: https://issues.apache.org/jira/browse/HIVE-18912
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


See the code where it presets it e.g. for txn table.
{noformat}
StatsSetupConst.setBasicStatsState(parameters, StatsSetupConst.FALSE);
{noformat}
In my testing, I noticed that sometimes when the stats task fails, the 
accurate=true flag is not cleared.





[jira] [Created] (HIVE-18911) LOAD.. code for MM has some suspect/dead code

2018-03-08 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-18911:
---

 Summary: LOAD.. code for MM has some suspect/dead code
 Key: HIVE-18911
 URL: https://issues.apache.org/jira/browse/HIVE-18911
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


Discovered in HIVE-18571 and added TODO-s that need to be addressed.
E.g. {noformat}
if (isMmTableWrite) {
  // We will load into MM directory, and delete from the parent if needed.
  // TODO: this looks invalid after ACID integration. What about base dirs?
  destPath = new Path(destPath, AcidUtils.deltaSubdir(writeId, writeId, stmtId));
...
  // TODO: loadFileType for MM table will no longer be REPLACE_ALL
  filter = (loadFileType == LoadFileType.REPLACE_ALL)
{noformat}
2 places like that

Also replaceFiles has isMmTableWrite flag that should no longer be needed 
(since for a transactional table we should never replace files). Either there's 
some invalid code path that relies on it (load table?), or it is just unused 
and needs to be removed.
Also used in 2 places, "TODO: this should never run for MM tables anymore. 
Remove the flag, and maybe the filter?"





[jira] [Created] (HIVE-18910) Migrate to Murmur hash for shuffle and bucketing

2018-03-08 Thread Deepak Jaiswal (JIRA)
Deepak Jaiswal created HIVE-18910:
-

 Summary: Migrate to Murmur hash for shuffle and bucketing
 Key: HIVE-18910
 URL: https://issues.apache.org/jira/browse/HIVE-18910
 Project: Hive
  Issue Type: Bug
Reporter: Deepak Jaiswal
Assignee: Deepak Jaiswal


Hive uses the Java hash, which gives a worse distribution and lower efficiency 
than murmur when bucketing a table.

Migrate to murmur hash, but keep backward compatibility for existing users so 
that they don't have to reload their existing tables.
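
To illustrate the distribution concern: Java's hash of an int is the int 
itself, so keys that share a stride with the bucket count pile into one bucket. 
The sketch below compares that with a mixed hash. It uses only Murmur3's 32-bit 
finalizer (fmix32) as a small stand-in for a full Murmur3 hash, and the bucket 
mapping (clear the sign bit, then take the remainder) is an assumption made for 
the example, as is the class name BucketHashDemo.

```java
import java.util.Arrays;

// Illustration only; fmix32 is the Murmur3 finalizer, not a full Murmur3 hash.
public class BucketHashDemo {
  public static int fmix32(int h) {
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }

  // Assumed bucket mapping: clear the sign bit, then take the remainder.
  public static int bucket(int hash, int numBuckets) {
    return (hash & Integer.MAX_VALUE) % numBuckets;
  }

  // Counts keys per bucket for the keys 0, stride, 2*stride, ...
  public static int[] histogram(int count, int stride, int numBuckets, boolean mix) {
    int[] counts = new int[numBuckets];
    for (int i = 0; i < count; i++) {
      int key = i * stride;
      int hash = mix ? fmix32(key) : Integer.hashCode(key);
      counts[bucket(hash, numBuckets)]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    // With stride 8 and 8 buckets, the plain Java hash sends every key to bucket 0.
    System.out.println("java hash: " + Arrays.toString(histogram(100, 8, 8, false)));
    System.out.println("mixed:     " + Arrays.toString(histogram(100, 8, 8, true)));
  }
}
```

Because the two functions assign different buckets to the same key, tables 
bucketed with the old hash cannot simply be read with the new one, which is 
why the backward compatibility requirement matters.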





Re: Review Request 65988: HIVE-18907: Create utility to fix acid key index issue from HIVE-18817

2018-03-08 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65988/
---

(Updated March 8, 2018, 8:09 p.m.)


Review request for hive, Eugene Koifman and Prasanth_J.


Changes
---

updates per review comments


Bugs: HIVE-18907
https://issues.apache.org/jira/browse/HIVE-18907


Repository: hive-git


Description
---

Create utility similar to orcfiledump to check/fix this particular acid key 
index issue.


Diffs (updated)
-

  bin/ext/fixacidkeyindex.sh PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/FixAcidKeyIndex.java 
PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFixAcidKeyIndex.java 
PRE-CREATION 


Diff: https://reviews.apache.org/r/65988/diff/2/

Changes: https://reviews.apache.org/r/65988/diff/1-2/


Testing
---


Thanks,

Jason Dere



Re: Review Request 65988: HIVE-18907: Create utility to fix acid key index issue from HIVE-18817

2018-03-08 Thread Jason Dere


> On March 8, 2018, 7:10 p.m., Prasanth_J wrote:
> > bin/ext/fixacidkeyindex.sh
> > Lines 32 (patched)
> > 
> >
> > nit: whitespace

Thanks, will fix.


> On March 8, 2018, 7:10 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/FixAcidKeyIndex.java
> > Lines 227 (patched)
> > 
> >
> > what is the contract for these 3 fields?
> > is rowId always monotonically increasing?
> > can transactionId change within a stripe? if so is there any ordering?
> > is bucket number constant within a file?
> > 
> > based on the ordering guarantees, we may be able to use the max stats 
> > value of individual columns as opposed to seeking. if there are no 
> > ordering guarantees, don't we have to look at the entire last stripe to 
> > figure out the max value of the triple?

Eugene, any comments about this?


> On March 8, 2018, 7:10 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/FixAcidKeyIndex.java
> > Lines 254 (patched)
> > 
> >
> > "+= +" is that an additional + at the end? intended?

Oh, that is not intended, thanks for catching.


> On March 8, 2018, 7:10 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/FixAcidKeyIndex.java
> > Lines 257 (patched)
> > 
> >
> > put this inside try..finally (or try..with..resource) to close the 
> > resource?

Ok, will change.
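
For reference, a minimal sketch of the try-with-resources form mentioned in the 
comment: the resource is closed even when the body throws. TrackedReader is a 
hypothetical stand-in for the ORC reader, used only so the example can observe 
whether close() ran.

```java
// Hypothetical illustration; TrackedReader is not an ORC or Hive class.
public class CloseDemo {
  // Resource that records whether close() ran.
  public static class TrackedReader implements AutoCloseable {
    public boolean closed;

    public int read(boolean fail) {
      if (fail) {
        throw new IllegalStateException("read failed");
      }
      return 42;
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  // Returns true if the reader was closed even though the body threw.
  public static boolean closedAfterFailure() {
    TrackedReader reader = new TrackedReader();
    try (TrackedReader r = reader) {
      r.read(true);
    } catch (IllegalStateException expected) {
      // r has already been closed by the time this catch block runs
    }
    return reader.closed;
  }

  public static void main(String[] args) {
    System.out.println("closed after failure: " + closedAfterFailure());
  }
}
```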


> On March 8, 2018, 7:10 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/FixAcidKeyIndex.java
> > Lines 259 (patched)
> > 
> >
> > comment doesn't seem to match code. looks like the following block is 
> > finalizing the rename (commit)?

The call to if (isAcidKeyIndexValid(newReader)) is the confirmation that the 
file is fixed - finalizing the rename only happens if that check passes.
I guess I can change the comment a bit.


- Jason


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65988/#review198895
---


On March 8, 2018, 5:38 p.m., Jason Dere wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65988/
> ---
> 
> (Updated March 8, 2018, 5:38 p.m.)
> 
> 
> Review request for hive, Eugene Koifman and Prasanth_J.
> 
> 
> Bugs: HIVE-18907
> https://issues.apache.org/jira/browse/HIVE-18907
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Create utility similar to orcfiledump to check/fix this particular acid key 
> index issue.
> 
> 
> Diffs
> -
> 
>   bin/ext/fixacidkeyindex.sh PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/FixAcidKeyIndex.java 
> PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFixAcidKeyIndex.java 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/65988/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jason Dere
> 
>



Re: Review Request 65988: HIVE-18907: Create utility to fix acid key index issue from HIVE-18817

2018-03-08 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65988/#review198895
---




bin/ext/fixacidkeyindex.sh
Lines 32 (patched)


nit: whitespace



ql/src/java/org/apache/hadoop/hive/ql/io/orc/FixAcidKeyIndex.java
Lines 227 (patched)


what is the contract for these 3 fields?
is rowId always monotonically increasing?
can transactionId change within a stripe? if so is there any ordering?
is bucket number constant within a file?

based on the ordering guarantees, we may be able to use the max stats value of 
individual columns as opposed to seeking. if there are no ordering guarantees, 
don't we have to look at the entire last stripe to figure out the max value of 
the triple?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/FixAcidKeyIndex.java
Lines 254 (patched)


"+= +" is that an additional + at the end? intended?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/FixAcidKeyIndex.java
Lines 257 (patched)


put this inside try..finally (or try..with..resource) to close the resource?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/FixAcidKeyIndex.java
Lines 259 (patched)


comment doesn't seem to match code. looks like the following block is 
finalizing the rename (commit)?


- Prasanth_J


On March 8, 2018, 5:38 p.m., Jason Dere wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65988/
> ---
> 
> (Updated March 8, 2018, 5:38 p.m.)
> 
> 
> Review request for hive, Eugene Koifman and Prasanth_J.
> 
> 
> Bugs: HIVE-18907
> https://issues.apache.org/jira/browse/HIVE-18907
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Create utility similar to orcfiledump to check/fix this particular acid key 
> index issue.
> 
> 
> Diffs
> -
> 
>   bin/ext/fixacidkeyindex.sh PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/FixAcidKeyIndex.java 
> PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFixAcidKeyIndex.java 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/65988/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jason Dere
> 
>



[jira] [Created] (HIVE-18909) Metrics for results cache

2018-03-08 Thread Jason Dere (JIRA)
Jason Dere created HIVE-18909:
-

 Summary: Metrics for results cache
 Key: HIVE-18909
 URL: https://issues.apache.org/jira/browse/HIVE-18909
 Project: Hive
  Issue Type: Sub-task
Reporter: Jason Dere








[jira] [Created] (HIVE-18908) Add support for FULL OUTER JOIN to MapJoin

2018-03-08 Thread Matt McCline (JIRA)
Matt McCline created HIVE-18908:
---

 Summary: Add support for FULL OUTER JOIN to MapJoin
 Key: HIVE-18908
 URL: https://issues.apache.org/jira/browse/HIVE-18908
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Currently, we do not support FULL OUTER JOIN in MapJoin.





Review Request 65988: HIVE-18907: Create utility to fix acid key index issue from HIVE-18817

2018-03-08 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65988/
---

Review request for hive, Eugene Koifman and Prasanth_J.


Bugs: HIVE-18907
https://issues.apache.org/jira/browse/HIVE-18907


Repository: hive-git


Description
---

Create utility similar to orcfiledump to check/fix this particular acid key 
index issue.


Diffs
-

  bin/ext/fixacidkeyindex.sh PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/FixAcidKeyIndex.java 
PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFixAcidKeyIndex.java 
PRE-CREATION 


Diff: https://reviews.apache.org/r/65988/diff/1/


Testing
---


Thanks,

Jason Dere



Re: Review Request 65716: HIVE-18696: The partition folders might not get cleaned up properly in the HiveMetaStore.add_partitions_core method if an exception occurs

2018-03-08 Thread Marta Kuczora via Review Board


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> >

Thanks a lot Sasha for the review!


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2887 (original), 2889 (patched)
> > 
> >
> > Here you can specify the length:
> > 
> > `List partitionsToAdd = new ArrayList<>(parts.size());
> > List partValWrappers = new ArrayList<>(parts.size());
> > `
> > 
> > But also why do we need this list at all? We can just use 
> > partValWrappers as a collection of partitions we care about.

You are right, we don't need this list. I fixed the code to use only the 
wrappers.


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2888 (original), 2890 (patched)
> > 
> >
> > You use this to check for duplicates and list is pretty bad structure 
> > for this - please use set instead.

Fixed it.
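
A minimal sketch of the set-based duplicate check suggested above, relying on 
Set.add returning false for an element that is already present. 
DuplicateCheckDemo is a hypothetical name, and plain lists of partition values 
stand in for the wrapper objects used in the patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; lists of values stand in for the partition wrappers.
public class DuplicateCheckDemo {
  // Returns the first value list that appears twice, or null if all are distinct.
  public static List<String> firstDuplicate(List<List<String>> partitionValues) {
    Set<List<String>> seen = new HashSet<>(partitionValues.size() * 2);
    for (List<String> vals : partitionValues) {
      // add() is an expected O(1) lookup plus insert; a List would need a linear scan.
      if (!seen.add(vals)) {
        return vals;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    List<List<String>> parts = Arrays.asList(
        Arrays.asList("2018", "03"),
        Arrays.asList("2018", "04"),
        Arrays.asList("2018", "03"));
    System.out.println("duplicate: " + firstDuplicate(parts));
  }
}
```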


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2891 (original), 2897 (patched)
> > 
> >
> > In cases like this it is also quite useful to know actual table and 
> > dbname that was supplied - it could help to figure out what wrong partition 
> > ended up here.

Fixed the error message.


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2898 (original), 2904 (patched)
> > 
> >
> > Can you fix this as well to use
> > 
> > `LOG.info("Not adding partition {} as it already exists", part)`

Fixed it.


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Lines 2922 (patched)
> > 
> >
> > Please use new ArrayList<>(partitionsToAdd.size())

Fixed it.


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2902 (original), 2925 (patched)
> > 
> >
> > ugi doesn't change in the loop, so it can be moved outside. Same goes 
> > for currentUser - it can be done just once outside the loop.

You are right, I moved it outside the loop.


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2906 (original), 2929 (patched)
> > 
> >
> > Why are we converting IO exception to RuntimeException? This doesn't 
> > look right.

I don't know why it was implemented like this. I didn't change this part. It 
got introduced like this in https://issues.apache.org/jira/browse/HIVE-15137


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2909 (original), 2932 (patched)
> > 
> >
> > I am wondering what is the motivation for doing this concurrently. I 
> > guess that if the list of partitions is huge, it may be useful, but for 
> > smaller lists it is probably just an overhead. This is outside your scope 
> > but definitely worth investigating.

I only know that the threads got introduced in 
https://issues.apache.org/jira/browse/HIVE-13901 because the folder creation 
was slow on some file systems.


> On March 7, 2018, 5:42 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
> > Line 2913 (original), 2936 (patched)
> > 
> >
> > Since we are using Java 8 we can use lambdas now, so this can become:
> > 
> >   partFutures.add(threadPool.submit(() -> {
> > if (failureOccurred.get()) {
> >   return null;
> > }
> > ugi.doAs((PrivilegedExceptionAction) () -> {
> >   try {
> > boolean madeDir = 
> > createLocationForAddedPartition(table, part);
> > addedPartitions.put(partWrapper, madeDir);
> > initializeAddedPartition(table, part, madeDir);
> >   } 

Re: Review Request 65716: HIVE-18696: The partition folders might not get cleaned up properly in the HiveMetaStore.add_partitions_core method if an exception occurs

2018-03-08 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65716/
---

(Updated March 8, 2018, 4:52 p.m.)


Review request for hive, Alexander Kolbasov, Peter Vary, and Adam Szita.


Changes
---

Fixed review findings.


Bugs: HIVE-18696
https://issues.apache.org/jira/browse/HIVE-18696


Repository: hive-git


Description
---

The idea behind the patch is

1) Separate the partition validation from starting the tasks which create the 
partition folders. 
Instead of checking the partitions and submitting the tasks in one loop, the 
validation is moved into its own loop. The first loop iterates through the 
partitions, validates the table/db names, and checks for duplicates. Only if 
all partitions are valid does the second loop submit the tasks that create the 
partition folders. If one of the partitions is invalid, the exception is thrown 
in the first loop, before any task is submitted, so no partition folder is 
created when the list contains an invalid partition.

2) Handle the exceptions which occur during the execution of the tasks 
differently.
Previously, if an exception occurred in one task, the remaining tasks were 
canceled, and the newly created partition folders were cleaned up in the 
finally block. The problem was that some tasks could still be in the middle of 
creating their folders while the others were being cleaned up, so folders could 
be left over. Testing showed that this case cannot be avoided completely when 
canceling the tasks.
This patch instead sets a flag when an exception is thrown in one of the tasks. 
The flag is visible to all tasks, and if its value is true, the partition 
folders won't be created. The caller then iterates through the remaining tasks 
and waits for them to finish. Tasks which started before the flag was set 
finish creating their partition folders. Tasks which start after the flag was 
set skip the folder creation, to avoid unnecessary work. This guarantees that 
all tasks have finished before entering the finally block where the partition 
folders are cleaned up.
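
The two steps can be sketched roughly as follows. This is a simplified 
illustration under stated assumptions, not the code in HiveMetaStore: 
AddPartitionsSketch is a hypothetical class, an in-memory set stands in for 
the file system, and the failingPart parameter only exists to simulate a 
failure.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified illustration; not the HiveMetaStore implementation.
public class AddPartitionsSketch {
  // Stand-in for the file system: the partition folders that currently exist.
  public static final Set<String> FOLDERS = ConcurrentHashMap.newKeySet();

  // Returns true if all folders were created; on failure every created folder is removed.
  public static boolean addPartitions(List<String> parts, String failingPart) {
    // Step 1: validate everything before any task is submitted.
    Set<String> unique = new HashSet<>();
    for (String p : parts) {
      if (p == null || !unique.add(p)) {
        throw new IllegalArgumentException("invalid or duplicate partition: " + p);
      }
    }

    // Step 2: create folders concurrently, guarded by a shared failure flag.
    Set<String> created = ConcurrentHashMap.newKeySet();
    AtomicBoolean failureOccurred = new AtomicBoolean(false);
    ExecutorService pool = Executors.newFixedThreadPool(4);
    boolean success = false;
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (String p : parts) {
        futures.add(pool.submit(() -> {
          if (failureOccurred.get()) {
            return null; // a task already failed: skip the unnecessary work
          }
          try {
            if (p.equals(failingPart)) {
              throw new IllegalStateException("cannot create " + p);
            }
            FOLDERS.add(p);
            created.add(p);
          } catch (RuntimeException e) {
            failureOccurred.set(true);
            throw e;
          }
          return null;
        }));
      }
      // Wait for every task instead of canceling, so none is still creating a folder.
      for (Future<?> f : futures) {
        try {
          f.get();
        } catch (ExecutionException e) {
          failureOccurred.set(true);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          failureOccurred.set(true);
        }
      }
      success = !failureOccurred.get();
      return success;
    } finally {
      pool.shutdown();
      if (!success) {
        FOLDERS.removeAll(created); // all tasks are done, so nothing is left behind
      }
    }
  }

  public static void main(String[] args) {
    List<String> parts = java.util.Arrays.asList("p=1", "p=2", "p=3", "p=4");
    System.out.println("ok = " + addPartitions(parts, "p=3") + ", folders = " + FOLDERS);
  }
}
```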


Diffs (updated)
-

  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 662de9a 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitions.java
 4d9cb1b 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestAddPartitionsFromPartSpec.java
 1122057 


Diff: https://reviews.apache.org/r/65716/diff/3/

Changes: https://reviews.apache.org/r/65716/diff/2-3/


Testing
---

Added some new test cases to the TestAddPartitions and 
TestAddPartitionsFromPartSpec tests.


Thanks,

Marta Kuczora



Review Request 65985: HIVE-18783: ALTER TABLE post-commit listener does not include the transactional listener responses

2018-03-08 Thread Sergio Pena via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65985/
---

Review request for hive, Alexander Kolbasov, Sahil Takiar, and Vihang 
Karajgaonkar.


Bugs: HIVE-18783
https://issues.apache.org/jira/browse/HIVE-18783


Repository: hive-git


Description
---

HIVE-16164 introduced a mechanism to pass HMS notification event IDs to the 
post-commit listeners for all DDL operations, but it didn't add it to the ALTER 
TABLE event. This patch adds the same behavior for ALTER TABLE events.


Diffs
-

  
itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/TestDbNotificationListener.java
 e0e29652da94bbdaca515a17955d1409824c1742 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
 89354a2d34249903a9ff13c4ed913a68de93057e 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 662de9a66767f27f31998f14c68f854e59993ab6 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/IHMSHandler.java
 e6de0013bc1be12b2772e2e97102ed476cf5 


Diff: https://reviews.apache.org/r/65985/diff/1/


Testing
---

All tests passed.


Thanks,

Sergio Pena



Re: Review Request 65943: HIVE-18888: Replace synchronizedMap with ConcurrentHashMap

2018-03-08 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65943/#review198869
---


Ship it!




Ship It!

- Peter Vary


On March 8, 2018, 1:19 a.m., Alexander Kolbasov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65943/
> ---
> 
> (Updated March 8, 2018, 1:19 a.m.)
> 
> 
> Review request for hive, Aihua Xu, Andrew Sherman, Janaki Lahorani, Peter 
> Vary, Sahil Takiar, and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-18888
> https://issues.apache.org/jira/browse/HIVE-18888
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-18888: Replace synchronizedMap with ConcurrentHashMap
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DynamicValueRegistryTez.java 
> 0bed22a8f805ed64ba1af260434a40667004e051 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
> c0be51e0b2c085896739a14096146822e4599d41 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  662de9a66767f27f31998f14c68f854e59993ab6 
> 
> 
> Diff: https://reviews.apache.org/r/65943/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Alexander Kolbasov
> 
>



[GitHub] hive pull request #317: HIVE-18751: ACID table scan through get_splits UDF d...

2018-03-08 Thread sankarh
GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/317

HIVE-18751: ACID table scan through get_splits UDF doesn't receive 
ValidWriteIdList configuration.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-18751

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/317.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #317


commit b8e481cc0a89a98c4654713af25b154a9262f5f9
Author: Sankar Hariappan 
Date:   2018-03-08T09:26:41Z

HIVE-18751: ACID table scan through get_splits UDF doesn't receive 
ValidWriteIdList configuration.




---


Re: Review Request 65952: HIVE-18898: Fix NPEs in HiveMetastore.dropPartition method

2018-03-08 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65952/
---

(Updated March 8, 2018, 9:40 a.m.)


Review request for hive, Peter Vary and Adam Szita.


Changes
---

Fixed stylecheck issue.


Bugs: HIVE-18898
https://issues.apache.org/jira/browse/HIVE-18898


Repository: hive-git


Description
---

The TestDropPartitions tests revealed that an NPE is thrown if the 
dropPartition(String db_name, String tbl_name, List<String> part_vals, 
PartitionDropOptions options) method is called with null options and with a 
part_vals list which contains null elements.


Diffs (updated)
-

  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 662de9a 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
 3128089 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestDropPartitions.java
 4d94ebf 


Diff: https://reviews.apache.org/r/65952/diff/3/

Changes: https://reviews.apache.org/r/65952/diff/2-3/


Testing
---

Run the TestDropPartitions tests.


Thanks,

Marta Kuczora



Re: Review Request 65952: HIVE-18898: Fix NPEs in HiveMetastore.dropPartition method

2018-03-08 Thread Marta Kuczora via Review Board


> On March 7, 2018, 5:06 p.m., Sahil Takiar wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
> > Lines 1006 (patched)
> > 
> >
> > Do the other parameters need to be checked?
> 
> Marta Kuczora wrote:
> You are right, we can check the other parameters too.

Added checks to HiveMetaStore for the db and table names and for the part_vals 
list.


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65952/#review198796
---


On March 8, 2018, 9:35 a.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65952/
> ---
> 
> (Updated March 8, 2018, 9:35 a.m.)
> 
> 
> Review request for hive, Peter Vary and Adam Szita.
> 
> 
> Bugs: HIVE-18898
> https://issues.apache.org/jira/browse/HIVE-18898
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The TestDropPartitions tests revealed that NPE is thrown if the 
> dropPartition(String db_name, String tbl_name, List part_vals, 
> PartitionDropOptions options) method is called with null options and with a 
> part_vals list which contains null elements.
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  662de9a 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
>  3128089 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestDropPartitions.java
>  4d94ebf 
> 
> 
> Diff: https://reviews.apache.org/r/65952/diff/2/
> 
> 
> Testing
> ---
> 
> Run the TestDropPartitions tests.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>



Re: Review Request 65952: HIVE-18898: Fix NPEs in HiveMetastore.dropPartition method

2018-03-08 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65952/
---

(Updated March 8, 2018, 9:35 a.m.)


Review request for hive, Peter Vary and Adam Szita.


Changes
---

Added checks for the db and table names and for the part value list.


Bugs: HIVE-18898
https://issues.apache.org/jira/browse/HIVE-18898


Repository: hive-git


Description
---

The TestDropPartitions tests revealed that an NPE is thrown if the 
dropPartition(String db_name, String tbl_name, List<String> part_vals, 
PartitionDropOptions options) method is called with null options and with a 
part_vals list which contains null elements.


Diffs (updated)
-

  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 662de9a 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
 3128089 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestDropPartitions.java
 4d94ebf 


Diff: https://reviews.apache.org/r/65952/diff/2/

Changes: https://reviews.apache.org/r/65952/diff/1-2/


Testing
---

Run the TestDropPartitions tests.


Thanks,

Marta Kuczora



[jira] [Created] (HIVE-18907) Create utility to fix acid key index issue from HIVE-18817

2018-03-08 Thread Jason Dere (JIRA)
Jason Dere created HIVE-18907:
-

 Summary: Create utility to fix acid key index issue from HIVE-18817
 Key: HIVE-18907
 URL: https://issues.apache.org/jira/browse/HIVE-18907
 Project: Hive
  Issue Type: Bug
Reporter: Jason Dere
Assignee: Jason Dere


While HIVE-18817 will keep new ORC ACID files from hitting the 
ArrayIndexOutOfBounds issue, existing files created before HIVE-18817 will 
still cause this issue. If there are delta directories, then one way to generate 
new files is to perform a major compaction. But this does not work if there are 
no delta directories for the table/partition.

Add a tool to fix the Acid ORC files directly in the case that a compaction 
cannot be performed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 65952: HIVE-18898: Fix NPEs in HiveMetastore.dropPartition method

2018-03-08 Thread Marta Kuczora via Review Board


> On March 8, 2018, 2:15 a.m., Alexander Kolbasov wrote:
> > Agree with Sahil - the checks should be on the server side.

Thanks a lot Sasha for the review!

I absolutely agree that it would be better to have the checks on the server 
side. However, these checks have to be done in the client, because the NPE 
occurs before HiveMetaStore is called.

As I wrote above, the PartitionDropOptions parameter is not sent to 
HiveMetaStore. HiveMetaStore only receives the boolean values which are set 
in PartitionDropOptions, and the NPE occurs in the client when it tries to 
read these values from a null options object.

With a part_vals list which has null values, the NPE occurs when thrift is 
trying to serialize it. So we should do the check before calling the thrift 
method.
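
To make the second case concrete, here is a minimal sketch of a client-side 
check that would run before the Thrift call. It is only an illustration, not 
the code from the patch: the class name DropPartitionArgs, the method 
validatePartVals and the choice of IllegalArgumentException are assumptions 
made for this example.

```java
import java.util.List;

// Hypothetical helper, not part of Hive: rejects a part_vals list that
// would otherwise fail later with a NullPointerException.
final class DropPartitionArgs {
  private DropPartitionArgs() {}

  static void validatePartVals(List<String> partVals) {
    if (partVals == null) {
      throw new IllegalArgumentException("part_vals must not be null");
    }
    // Index-based loop instead of contains(null), because some List
    // implementations throw an NPE from contains(null).
    for (int i = 0; i < partVals.size(); i++) {
      if (partVals.get(i) == null) {
        throw new IllegalArgumentException("part_vals[" + i + "] must not be null");
      }
    }
  }
}
```

Calling validatePartVals at the top of the client method would turn the 
serialization-time NPE into an error that names the offending element.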


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65952/#review198847
---


On March 7, 2018, 3:48 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65952/
> ---
> 
> (Updated March 7, 2018, 3:48 p.m.)
> 
> 
> Review request for hive, Peter Vary and Adam Szita.
> 
> 
> Bugs: HIVE-18898
> https://issues.apache.org/jira/browse/HIVE-18898
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The TestDropPartitions tests revealed that NPE is thrown if the 
> dropPartition(String db_name, String tbl_name, List part_vals, 
> PartitionDropOptions options) method is called with null options and with a 
> part_vals list which contains null elements.
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
>  3128089 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestDropPartitions.java
>  4d94ebf 
> 
> 
> Diff: https://reviews.apache.org/r/65952/diff/1/
> 
> 
> Testing
> ---
> 
> Run the TestDropPartitions tests.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>



Re: Review Request 65952: HIVE-18898: Fix NPEs in HiveMetastore.dropPartition method

2018-03-08 Thread Marta Kuczora via Review Board


> On March 7, 2018, 5:06 p.m., Sahil Takiar wrote:
> >

Thanks a lot Sahil for the review.


> On March 7, 2018, 5:06 p.m., Sahil Takiar wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
> > Lines 996 (patched)
> > 
> >
> > Why do this validation in `HiveMetaStoreClient` rather than 
> > `HiveMetaStore`?

I did it there, because the PartitionDropOptions parameter is not sent to 
HiveMetaStore, only the boolean values from it.
The dropPartition method in the HiveMetaStoreClient looked like this:

  public boolean dropPartition(String db_name, String tbl_name,
      List<String> part_vals, PartitionDropOptions options) throws TException {
    return dropPartition(db_name, tbl_name, part_vals, options.deleteData,
        options.purgeData ? getEnvironmentContextWithIfPurgeSet() : null);
  }
  
The NPE occurred when options.deleteData was read in that dropPartition call, 
because options was null.
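
One way to avoid this null dereference is to fall back to default options 
when the caller passes null. The following is a rough sketch rather than the 
actual change: DropOptions is a stand-in for PartitionDropOptions that carries 
only the two fields read above, and its default values, the NullSafeDrop class 
and the resolveFlags method are all assumptions for illustration.

```java
// Stand-in for PartitionDropOptions with only the two fields used above.
// The default values here are assumed for the example.
final class DropOptions {
  boolean deleteData = true;
  boolean purgeData = false;
}

// Hypothetical helper showing the null-safe lookup of the two flags.
final class NullSafeDrop {
  private NullSafeDrop() {}

  // Returns {deleteData, purgeData}, using defaults when options is null.
  static boolean[] resolveFlags(DropOptions options) {
    if (options == null) {
      options = new DropOptions(); // assumption: null means "use defaults"
    }
    return new boolean[] { options.deleteData, options.purgeData };
  }
}
```

Whether null options should silently mean "defaults" or should be rejected 
with an error is a design choice; the sketch only shows the first variant.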


> On March 7, 2018, 5:06 p.m., Sahil Takiar wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
> > Lines 1006 (patched)
> > 
> >
> > Do the other parameters need to be checked?

You are right, we can check the other parameters too.


- Marta


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65952/#review198796
---

