[jira] [Created] (HIVE-18928) HS2: Perflogger has a race condition

2018-03-09 Thread Gopal V (JIRA)
Gopal V created HIVE-18928:
--

 Summary: HS2: Perflogger has a race condition
 Key: HIVE-18928
 URL: https://issues.apache.org/jira/browse/HIVE-18928
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V


{code}
Caused by: java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) ~[?:1.8.0_112]
at java.util.HashMap$EntryIterator.next(HashMap.java:1471) ~[?:1.8.0_112]
at java.util.HashMap$EntryIterator.next(HashMap.java:1469) ~[?:1.8.0_112]
at java.util.AbstractCollection.toArray(AbstractCollection.java:196) ~[?:1.8.0_112]
at com.google.common.collect.Iterables.toArray(Iterables.java:316) ~[guava-19.0.jar:?]
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:342) ~[guava-19.0.jar:?]
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:327) ~[guava-19.0.jar:?]
at org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:218) ~[hive-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1561) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1498) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:198) ~[hive-service-3.0.0.3.0.0.2-132.jar:3.0.0.3.0.0.2-132]
{code}
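
The trace shows {{ImmutableMap.copyOf}} iterating a plain {{HashMap}} while another thread modifies it. A minimal sketch of one way to avoid that, assuming a hypothetical {{TimingLog}} class (this is not Hive's PerfLogger, and the actual fix may differ): keep the timings in a {{ConcurrentHashMap}}, whose iterators do not throw {{ConcurrentModificationException}}, and hand out a copy.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a per-query timing logger; not Hive's PerfLogger.
class TimingLog {
    // ConcurrentHashMap iterators are weakly consistent: they never throw
    // ConcurrentModificationException, even while other threads are writing.
    private final Map<String, Long> endTimes = new ConcurrentHashMap<>();

    void recordEnd(String method, long timestampMillis) {
        endTimes.put(method, timestampMillis);
    }

    // Returns an unmodifiable copy taken at call time.
    Map<String, Long> getEndTimes() {
        return Collections.unmodifiableMap(new HashMap<>(endTimes));
    }
}

class TimingLogDemo {
    public static void main(String[] args) throws InterruptedException {
        TimingLog log = new TimingLog();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                log.recordEnd("step-" + i, System.currentTimeMillis());
            }
        });
        writer.start();
        // Take snapshots repeatedly while the writer is still running.
        while (writer.isAlive()) {
            log.getEndTimes();
        }
        writer.join();
        System.out.println("entries: " + log.getEndTimes().size());
    }
}
```

The copy is a point-in-time view, so a caller may miss entries written while the copy was being made; whether that is acceptable for PerfLogger is a design question for the real patch.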




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18927) Hive "insert overwrite" doesn't replace the destination files if no partition in metastore for the files

2018-03-09 Thread wangzhihao (JIRA)
wangzhihao created HIVE-18927:
-

 Summary: Hive "insert overwrite" doesn't replace the destination 
files if no partition in metastore for the files
 Key: HIVE-18927
 URL: https://issues.apache.org/jira/browse/HIVE-18927
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: wangzhihao


[This post|http://www.ericlin.me/2015/05/hive-insert-overwrite-does-not-remove-existing-data/] describes a way to reproduce this issue:
{noformat}
# Add some files into file system but no partition in metastore to track it.
hdfs dfs -put test.txt test/p=p1

# Insert overwrite the partition(p = p1)
DROP TABLE IF EXISTS partition_test;
CREATE EXTERNAL TABLE partition_test (a int) PARTITIONED BY (p string);
INSERT OVERWRITE TABLE partition_test PARTITION (p = 'p1') SELECT 123;

# Verify that test.txt is not removed.
hdfs dfs -ls test/p=p1
Found 2 items
-rwxr-xr-x   3 hdfs supergroup 194965 2015-05-05 00:15 test/p=p1/00_0
-rw-r--r--   3 hdfs supergroup  8 2015-05-05 00:10 test/p=p1/test.txt
{noformat}
The reason is that [Hive.loadPartition|https://github.com/apache/hive/blob/9b36ffa92cc4e0f47ea03d8d167debe743342f5b/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1652] calls {{replaceFiles}} only if {{oldPath}} exists. Since the metastore has no partition for the files, {{oldPath}} is null, so the files never get cleaned up. We should also clean {{destf}} in the method [Hive.replaceFiles|https://github.com/apache/hive/blob/b362de3871764731d8371657b07140e37a3c5105/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3817] to fix the issue.
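
The proposed behavior can be sketched on the local file system, assuming a hypothetical {{overwrite}} helper (Hive itself works against the Hadoop FileSystem API, and the class, method, and file names here are illustrative only): the destination directory is cleaned whether or not any partition metadata points at it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Local-file-system illustration only; not Hive's replaceFiles.
class OverwriteSketch {
    // Hypothetical helper: replace the contents of destDir with newFile.
    // Existing files are deleted even if no partition record tracks them.
    static void overwrite(Path destDir, Path newFile) throws IOException {
        Files.createDirectories(destDir);
        try (DirectoryStream<Path> existing = Files.newDirectoryStream(destDir)) {
            for (Path p : existing) {
                if (Files.isRegularFile(p)) {
                    Files.delete(p);
                }
            }
        }
        Files.move(newFile, destDir.resolve(newFile.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
    }

    // Recreates the scenario: an untracked test.txt already sits in the target.
    static List<String> demo() {
        try {
            Path dest = Files.createTempDirectory("partition");
            Files.write(dest.resolve("test.txt"), "stale".getBytes());
            Path staging = Files.createTempDirectory("staging");
            Path newFile = Files.write(staging.resolve("000000_0"), "123".getBytes());
            overwrite(dest, newFile);
            List<String> names = new ArrayList<>();
            try (DirectoryStream<Path> left = Files.newDirectoryStream(dest)) {
                for (Path p : left) {
                    names.add(p.getFileName().toString());
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [000000_0]
    }
}
```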





[GitHub] hive pull request #319: HIVE-18925 Prevent failure when JVM zone had change ...

2018-03-09 Thread findepi
GitHub user findepi opened a pull request:

https://github.com/apache/hive/pull/319

HIVE-18925 Prevent failure when JVM zone had change on 1970-01-01

This prevents static initializer failure if JVM zone did not observe
1970-01-01 00:00:00, while retaining original behavior for all other
zones.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/findepi/hive fix-startup-banderas

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/319.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #319


commit 98b8becb0571da00bf90e073f13f70b4a7698472
Author: Piotr Findeisen 
Date:   2018-03-09T22:52:40Z

HIVE-18925 Prevent failure when JVM zone had change on 1970-01-01

This prevents static initializer failure if JVM zone did not observe
1970-01-01 00:00:00, while retaining original behavior for all other
zones.




---


Review Request 66005: HIVE-18846 Query results cache: Allow queries to refer to the pending results of a query that has not finished yet

2018-03-09 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66005/
---

Review request for hive, Gopal V and Jesús Camacho Rodríguez.


Bugs: HIVE-18846
https://issues.apache.org/jira/browse/HIVE-18846


Repository: hive-git


Description
---

This patch makes the query wait for the pending results by blocking during query compilation.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 04b8c4b
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java d789ed0
  ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java 88a056b
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java d1609e1
  ql/src/test/queries/clientpositive/results_cache_empty_result.q PRE-CREATION
  ql/src/test/results/clientpositive/llap/results_cache_empty_result.q.out PRE-CREATION
  ql/src/test/results/clientpositive/results_cache_empty_result.q.out PRE-CREATION


Diff: https://reviews.apache.org/r/66005/diff/1/


Testing
---


Thanks,

Jason Dere



Re: Review Request 65634: HIVE-18264: CachedStore: Store cached partitions/col stats within the table cache

2018-03-09 Thread Vaibhav Gumashta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65634/
---

(Updated March 9, 2018, 9 p.m.)


Review request for hive, Daniel Dai and Thejas Nair.


Bugs: HIVE-18264
https://issues.apache.org/jira/browse/HIVE-18264


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-18264


Diffs (updated)
-

  
  itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java a3725c5395
  service/src/java/org/apache/hive/service/server/HiveServer2.java 86c9c2b33c
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ac71d0882f
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 7b44df4128
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java f500d63725
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java f0f650ddcf
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 0d132f2074
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java 32ea17495f
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java 50f873a013
  standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 75ea8c4a77
  standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 207d842f94
  standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java ab6feb6f0b
  standalone-metastore/src/test/resources/log4j2.properties 365687e1c9


Diff: https://reviews.apache.org/r/65634/diff/5/

Changes: https://reviews.apache.org/r/65634/diff/4-5/


Testing
---


Thanks,

Vaibhav Gumashta



Re: Review Request 65634: HIVE-18264: CachedStore: Store cached partitions/col stats within the table cache

2018-03-09 Thread Vaibhav Gumashta


> On March 2, 2018, 7:54 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java
> > Lines 63 (patched)
> > 
> >
> > It would be cleaner and easier to read to rewrite this as
> > 
> > ```
> >   public static String buildKey(List<String> partVals) {
> >     if (partVals == null || partVals.isEmpty()) {
> >       return "";
> >     }
> >     return String.join(delimit, partVals);
> >   }
> > ```

I have made the change based on your suggestion, but I think it's just a 
preference of style.


> On March 2, 2018, 7:54 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java
> > Lines 70 (patched)
> > 
> >
> > 1) Please add Javadoc comment, explaining what this function does.
> > 2) Is overloading really useful here?

I agree, it makes sense to have a new method rather than an overload, since the 
params don't really convey the intent if you overlook the param name. I'll make 
changes to other methods as well.


> On March 2, 2018, 7:54 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java
> > Lines 71 (patched)
> > 
> >
> > why not just 
> > 
> > `return buildKey(partVals) + delimit + colName`
> > 
> > can colName be empty here or not?

colName won't be empty here.
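
For reference, the two key-building suggestions above can be put together roughly as follows. This is only a sketch: the delimiter value, the class name, and the name `buildKeyWithColumn` are assumptions made for illustration, since the thread shows neither the value of `delimit` nor the final method names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the actual CacheUtils class.
class KeySketch {
    // Assumed delimiter; the real value of `delimit` is not shown in this thread.
    static final String DELIMIT = "\u0001";

    // Joins partition values into one key; null or empty input yields "".
    static String buildKey(List<String> partVals) {
        if (partVals == null || partVals.isEmpty()) {
            return "";
        }
        return String.join(DELIMIT, partVals);
    }

    // A separately named method rather than an overload, so the intent is
    // visible at the call site. colName is expected to be non-empty.
    static String buildKeyWithColumn(List<String> partVals, String colName) {
        return buildKey(partVals) + DELIMIT + colName;
    }

    public static void main(String[] args) {
        String key = buildKeyWithColumn(Arrays.asList("2018", "03"), "start_date");
        System.out.println(key.replace(DELIMIT, "|")); // prints 2018|03|start_date
    }
}
```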


> On March 2, 2018, 7:54 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
> > Lines 128 (patched)
> > 
> >
> > 1) Please add units in the name and use constant for the default value.
> > 2) Please document what is `cacheRefreshPeriod`.

It is already documented as part of the Conf class. Adding the same note here.


> On March 2, 2018, 7:54 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
> > Line 226 (original), 140 (patched)
> > 
> >
> > Please document this method - in particular how it gets the cache 
> > implementation from config.

Actually I don't think this method is needed now (have removed it). This was 
introduced in HIVE-17629, when prewarm was a blocking call. In this patch, I 
have made it non-blocking and it runs in a background thread so that metastore 
can remain usable during cache prewarm.


> On March 2, 2018, 7:54 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
> > Line 262 (original), 175 (patched)
> > 
> >
> > This doesn't look correct:
> > 
> > 1) initBlackListWhiteList() will not update any existing whitelist or 
> > blacklist, only add one if it wasn't there.
> > 2) initBlackListWhiteList() is calling 
> > `Collections.reverse(blacklistPatterns)` which doesn't make sense when 
> > configuration is set to a new value.

This is picked up only once, when the metastore server is started, and then 
used thereafter. Removed Collections.reverse.


> On March 2, 2018, 7:54 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
> > Lines 176 (patched)
> > 
> >
> > Looks like every time someone calls setConf() a new thread is started - 
> > isn't it a thread leak?
> > In general it isn't a good practice to add such side-effects for config 
> > changes like setConf - it is better to explicitly call a method which will 
> > do whatever is needed after conf update.

No, a new thread is started only when cacheUpdateMaster is null or from the 
Unit tests (where we explicitly try to control start and stop, and where the 
thread dies after one run).
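
A rough sketch of that guard, for readers following along. Only the field name cacheUpdateMaster comes from the discussion; its type, the class name, and the method names are assumptions, and the real CachedStore code may be structured differently.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the actual CachedStore code.
class CacheUpdaterSketch {
    // Assumed type; guarded by the class lock.
    private static ScheduledExecutorService cacheUpdateMaster;

    // Starts the background refresh only if none is running yet.
    // Returns true when this call actually started it.
    static synchronized boolean startIfNeeded(Runnable refresh, long periodMillis) {
        if (cacheUpdateMaster != null) {
            return false;
        }
        cacheUpdateMaster = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cache-updater");
            t.setDaemon(true);
            return t;
        });
        cacheUpdateMaster.scheduleAtFixedRate(refresh, 0L, periodMillis, TimeUnit.MILLISECONDS);
        return true;
    }

    // Lets tests control the lifecycle explicitly.
    static synchronized void stop() {
        if (cacheUpdateMaster != null) {
            cacheUpdateMaster.shutdownNow();
            cacheUpdateMaster = null;
        }
    }
}
```

With a guard like this, repeated setConf() calls would not create additional threads, although it does not address the reviewer's broader point about side effects in setConf().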


> On March 2, 2018, 7:54 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
> > Line 266 (original), 202 (patched)
> > 
> >
> > Please document this method. Among other things - can prewarm() be 
> > called multiple times? If not, should it be somehow enforced?

Have enforced that. Currently it was being called just once from the background 
thread.
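
One common way to enforce a run-once rule is an atomic flag. The sketch below illustrates that technique with a hypothetical PrewarmGuard class; it is not necessarily how the patch enforces it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a run-once guard; not the actual CachedStore code.
class PrewarmGuard {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final Runnable prewarmWork;

    PrewarmGuard(Runnable prewarmWork) {
        this.prewarmWork = prewarmWork;
    }

    // Runs the prewarm work on the first call only; later calls return false
    // without doing anything, even when they race with the first call.
    boolean prewarm() {
        if (!started.compareAndSet(false, true)) {
            return false;
        }
        prewarmWork.run();
        return true;
    }
}
```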


> On March 2, 2018, 7:54 a.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
> > Line 271 (original), 207 

[jira] [Created] (HIVE-18926) Improve operator-tree matching

2018-03-09 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-18926:
---

 Summary: Improve operator-tree matching
 Key: HIVE-18926
 URL: https://issues.apache.org/jira/browse/HIVE-18926
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


Currently, joins are not matched.





[jira] [Created] (HIVE-18925) Hive doesn't start when JVM is America/Bahia_Banderas time zone

2018-03-09 Thread Piotr Findeisen (JIRA)
Piotr Findeisen created HIVE-18925:
--

 Summary: Hive doesn't start when JVM is America/Bahia_Banderas 
time zone
 Key: HIVE-18925
 URL: https://issues.apache.org/jira/browse/HIVE-18925
 Project: Hive
  Issue Type: Bug
 Environment: JVM in America/Bahia_Banderas zone
Reporter: Piotr Findeisen


Hive Server2 doesn't work if started with {{-Duser.timezone=America/Bahia_Banderas}}

 

Steps to reproduce
 # use [https://github.com/big-data-europe/docker-hive]
 # Add {{HADOOP_CLIENT_OPTS: '-Duser.timezone=America/Bahia_Banderas'}} to 
{{hive-server}} docker container environment configuration
 # {{docker-compose up}}
 # 
{code:java}
host# docker-compose exec hive-server bash
container# /opt/hive/bin/beeline -u jdbc:hive2://localhost:1
...
jdbc:hive2://localhost:1> select 1;
Error: java.lang.IllegalStateException: Can't overwrite cause with 
org.joda.time.IllegalInstantException: Illegal instant due to time zone offset 
transition (daylight savings time 'gap'): 1970-01-01T00:00:00.000 
(America/Bahia_Banderas) (state=08S01,code=0){code}
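
The error says that local midnight on 1970-01-01 falls into an offset-transition gap in this zone. The sketch below shows how such a gap can be detected, and resolved without an exception, using {{java.time}}. Hive's failure comes from Joda-Time, so this illustrates the underlying condition rather than the code in the patch; the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.zone.ZoneOffsetTransition;

// Hypothetical helper; not part of Hive.
class EpochGapCheck {
    // True when the wall-clock time does not exist in the zone because it
    // falls inside a forward offset transition (a "gap").
    static boolean isInGap(ZoneId zone, LocalDateTime local) {
        ZoneOffsetTransition transition = zone.getRules().getTransition(local);
        return transition != null && transition.isGap();
    }

    // ZonedDateTime.of does not throw for a local time inside a gap;
    // it moves the time later by the length of the gap.
    static ZonedDateTime resolve(ZoneId zone, LocalDateTime local) {
        return ZonedDateTime.of(local, zone);
    }

    public static void main(String[] args) {
        LocalDateTime epochLocal = LocalDateTime.of(1970, 1, 1, 0, 0);
        ZoneId zone = ZoneId.of("America/Bahia_Banderas");
        // The result depends on the time zone data shipped with the JVM.
        System.out.println("in gap: " + isInGap(zone, epochLocal));
        System.out.println("resolved: " + resolve(zone, epochLocal));
    }
}
```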





[jira] [Created] (HIVE-18924) Wrong column selected on insert overwrite of partitioned table

2018-03-09 Thread Sudip Hazra Choudhury (JIRA)
Sudip Hazra Choudhury created HIVE-18924:


 Summary: Wrong column selected on insert overwrite of partitioned 
table 
 Key: HIVE-18924
 URL: https://issues.apache.org/jira/browse/HIVE-18924
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Sudip Hazra Choudhury
 Attachments: dataset.csv, queries_submitted.hql

 

Issue ::

I have table A with dt as the partition column, and start_date and end_date of type timestamp as the other columns.

e.g.,
{code:java}
Table A { start_date timestamp, end_date timestamp & PARTITIONED BY ( dt string ) }{code}
 

I also have another table B, which has dt and hr as partition columns, and start_date and end_date of type timestamp as the other columns.

e.g.,
{code:java}
Table B { start_date timestamp, end_date timestamp & PARTITIONED BY ( dt string, hr string ) }{code}
 

Now I do an insert overwrite into table B, selecting from table A. While selecting from table A, I apply the date_format function to the start_date column and alias the result as dt.

e.g.,
{code:java}
insert overwrite table B
PARTITION(dt, hr)
select
 UNIX_TIMESTAMP(a.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as start,
 UNIX_TIMESTAMP(a.end_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as end,
 date_format(cast(UNIX_TIMESTAMP(a.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"yyyy-MM-dd") as dt,
 date_format(cast(UNIX_TIMESTAMP(a.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"HH") as hr
from A a
 where a.dt = '2018-03-10';{code}
 

 

We expect the value of date_format(cast(UNIX_TIMESTAMP(a.start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"yyyy-MM-dd") to be used as the value of the partition column dt in table B.

Instead, the partition column dt of table B takes the value of the partition column dt of table A, which is 2018-03-10 in this example.

 

But when we run only the select query on table A, *without the insert overwrite* into table B, we get the value as expected.

 

 

*Steps to reproduce ::*

 
{code:java}
hadoop fs -mkdir -p /user/sudip.hc/hive-bug/stage/dt=2018-03-10/
hadoop fs -mkdir -p /user/sudip.hc/hive-bug/prod/
hadoop fs -put dataset.csv /user/sudip.hc/hive-bug/stage/dt=2018-03-10/
hive -f queries.hql{code}
 

 

Data ::

 
{code:java}
2016-03-01T06:08:44Z,2017-02-01T07:08:44Z
2016-03-01T06:04:46Z,2017-02-01T07:04:46Z
2016-03-01T06:10:34Z,2017-02-01T07:10:34Z
2016-03-01T06:04:46Z,2017-02-01T07:04:46Z
2016-03-01T06:04:45Z,2017-02-01T07:04:45Z{code}
 

 

Now Execute to check the differences::

1. hive -e "set hive.cli.print.header=true; select * from bug_stage;"
{code:java}
bug_stage.start_date bug_stage.end_date bug_stage.dt
2016-03-01T06:08:44Z 2017-02-01T07:08:44Z 2018-03-10
2016-03-01T06:04:46Z 2017-02-01T07:04:46Z 2018-03-10
2016-03-01T06:10:34Z 2017-02-01T07:10:34Z 2018-03-10
2016-03-01T06:04:46Z 2017-02-01T07:04:46Z 2018-03-10
2016-03-01T06:04:45Z 2017-02-01T07:04:45Z 2018-03-10{code}
2. hive -e "set hive.cli.print.header=true; select start_date, end_date, date_format(cast(UNIX_TIMESTAMP(start_date, "yyyy-MM-dd'T'hh:mm:ss'Z'") * 1000 as timestamp),"yyyy-MM-dd") as dt from bug_stage;"
{code:java}
start_date end_date dt
2016-03-01T06:08:44Z 2017-02-01T07:08:44Z 2016-03-01
2016-03-01T06:04:46Z 2017-02-01T07:04:46Z 2016-03-01
2016-03-01T06:10:34Z 2017-02-01T07:10:34Z 2016-03-01
2016-03-01T06:04:46Z 2017-02-01T07:04:46Z 2016-03-01
2016-03-01T06:04:45Z 2017-02-01T07:04:45Z 2016-03-01{code}
3. hive -e "set hive.cli.print.header=true; select * from bug_prod;"
{code:java}
bug_prod.start_date bug_prod.end_date bug_prod.dt bug_prod.hr
2016-03-01 06:08:44 2017-02-01 07:08:44 2018-03-10 06
2016-03-01 06:04:46 2017-02-01 07:04:46 2018-03-10 06
2016-03-01 06:10:34 2017-02-01 07:10:34 2018-03-10 06
2016-03-01 06:04:46 2017-02-01 07:04:46 2018-03-10 06
2016-03-01 06:04:45 2017-02-01 07:04:45 2018-03-10 06{code}
 

 

 
 





[jira] [Created] (HIVE-18923) ValidWriteIdList snapshot per table can be cached for multi-statement transactions.

2018-03-09 Thread Sankar Hariappan (JIRA)
Sankar Hariappan created HIVE-18923:
---

 Summary: ValidWriteIdList snapshot per table can be cached for 
multi-statement transactions.
 Key: HIVE-18923
 URL: https://issues.apache.org/jira/browse/HIVE-18923
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Sankar Hariappan
Assignee: Sankar Hariappan
 Fix For: 3.0.0


Currently, each query within a multi-statement transaction requests the 
metastore/TxnHandler to build a ValidWriteIdList snapshot for the given table. 
But the snapshot won't change for the duration of the transaction, so it makes 
sense to cache it within QueryTxnManager.

However, each txn should be able to view its own written rows. So, when a 
transaction allocates a writeId to write to a table, the cached 
ValidWriteIdList for that table should be recalculated as follows.

Original ValidWriteIdList: \{hwm=10, open/aborted=5,6} – (10 is allocated by 
txn < current txn_id).

Allocated writeId for this txn: 13 – (11 and 12 are taken by some other txn > 
current txn_id)

New ValidWriteIdList: \{hwm=12, open/aborted=5,6,11, 12} – (11, 12 are added to 
the invalid list, so the snapshot remains the same).
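
The recalculation can be sketched as below, taking the numbers in the example literally. {{WriteIdSnapshot}} is a hypothetical, simplified class and not Hive's ValidWriteIdList; tracking the transaction's own writeId separately so that its rows stay visible is an assumption of this sketch, not something stated above.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical simplified snapshot; not Hive's ValidWriteIdList.
class WriteIdSnapshot {
    final long hwm;            // high-water mark
    final Set<Long> invalid;   // open or aborted write IDs
    final long ownWriteId;     // -1 until this txn allocates one (assumption)

    WriteIdSnapshot(long hwm, Set<Long> invalid, long ownWriteId) {
        this.hwm = hwm;
        this.invalid = new TreeSet<>(invalid);
        this.ownWriteId = ownWriteId;
    }

    // IDs between the old hwm and the newly allocated ID belong to other
    // transactions, so they are added to the invalid list. The hwm is raised
    // to just below the allocated ID, as in the example in the description.
    WriteIdSnapshot afterAllocating(long allocated) {
        Set<Long> newInvalid = new TreeSet<>(invalid);
        for (long id = hwm + 1; id < allocated; id++) {
            newInvalid.add(id);
        }
        return new WriteIdSnapshot(Math.max(hwm, allocated - 1), newInvalid, allocated);
    }

    // A row is visible if this txn wrote it, or if its write ID is at or
    // below the hwm and not in the invalid list.
    boolean isVisible(long writeId) {
        if (writeId == ownWriteId) {
            return true;
        }
        return writeId <= hwm && !invalid.contains(writeId);
    }

    public static void main(String[] args) {
        WriteIdSnapshot original = new WriteIdSnapshot(10, new TreeSet<>(Arrays.asList(5L, 6L)), -1);
        WriteIdSnapshot updated = original.afterAllocating(13);
        System.out.println("hwm=" + updated.hwm + ", invalid=" + updated.invalid);
        // prints hwm=12, invalid=[5, 6, 11, 12]
    }
}
```

Visibility of every ID written by other transactions is the same before and after the recalculation; only the transaction's own writeId becomes visible.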


