[jira] [Created] (HIVE-21223) CachedStore returns null partition when partition does not exist

2019-02-05 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-21223:


 Summary: CachedStore returns null partition when partition does 
not exist
 Key: HIVE-21223
 URL: https://issues.apache.org/jira/browse/HIVE-21223
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 4.0.0, 3.2.0
Reporter: Prasanth Jayachandran


CachedStore can return a null partition from getPartitionWithAuth() when the partition 
does not exist. Serializing a null value over Thrift breaks the connection. Instead, 
if the partition does not exist, getPartitionWithAuth() should throw 
NoSuchObjectException.

Clients will see this exception:
{code:java}
org.apache.thrift.TApplicationException: get_partition_with_auth failed: 
unknown result
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition_with_auth(ThriftHiveMetastore.java:3017)
 ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266]
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition_with_auth(ThriftHiveMetastore.java:2990)
 ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266]
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionWithAuthInfo(HiveMetaStoreClient.java:1679)
 ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266]
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionWithAuthInfo(HiveMetaStoreClient.java:1671)
 ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_181]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_181]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
 ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266]
at com.sun.proxy.$Proxy36.getPartitionWithAuthInfo(Unknown Source) ~[?:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_181]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_181]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2976)
 ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266]
at com.sun.proxy.$Proxy36.getPartitionWithAuthInfo(Unknown Source) ~[?:?]
at 
org.apache.hadoop.hive.metastore.SynchronizedMetaStoreClient.getPartitionWithAuthInfo(SynchronizedMetaStoreClient.java:101)
 ~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266]
at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2870) 
~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266]
at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2835) 
~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266]
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1950) 
~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266]
at org.apache.hadoop.hive.ql.metadata.Hive$4.call(Hive.java:2490) 
~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266]
at org.apache.hadoop.hive.ql.metadata.Hive$4.call(Hive.java:2481) 
~[hive-exec-3.1.0.3.0.100.0-266.jar:3.1.0.3.0.100.0-266]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_181]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_181]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]{code}
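
A minimal sketch of the intended behavior, using a hypothetical in-memory cache and a 
stand-in exception class rather than the real CachedStore and metastore Thrift types:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for org.apache.hadoop.hive.metastore.api.NoSuchObjectException.
class NoSuchObjectException extends Exception {
  NoSuchObjectException(String msg) { super(msg); }
}

public class PartitionCacheSketch {
  // Cache keyed by "db.table/partName"; values are opaque partition objects here.
  private final Map<String, Object> cache = new ConcurrentHashMap<>();

  // Instead of returning null (which Thrift cannot serialize as a struct result),
  // a cache miss surfaces as NoSuchObjectException, as proposed above.
  public Object getPartitionWithAuth(String db, String table, String partName)
      throws NoSuchObjectException {
    Object part = cache.get(db + "." + table + "/" + partName);
    if (part == null) {
      throw new NoSuchObjectException(
          "partition values=[" + partName + "] for " + db + "." + table);
    }
    return part;
  }
}
{code}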
 





Re: Review Request 69903: HIVE-21214

2019-02-05 Thread Deepak Jaiswal


> On Feb. 5, 2019, 11:50 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
> > Lines 1876 (patched)
> > 
> >
> > nit: add the filenames to the error message

will do.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69903/#review212580
---


On Feb. 5, 2019, 10:10 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69903/
> ---
> 
> (Updated Feb. 5, 2019, 10:10 p.m.)
> 
> 
> Review request for hive and Jason Dere.
> 
> 
> Bugs: HIVE-21214
> https://issues.apache.org/jira/browse/HIVE-21214
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> MoveTask : Use attemptId instead of file size for deduplication of files 
> compareTempOrDuplicateFiles()
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 8937b43811 
> 
> 
> Diff: https://reviews.apache.org/r/69903/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



Re: Review Request 69903: HIVE-21214

2019-02-05 Thread Deepak Jaiswal


> On Feb. 5, 2019, 11:53 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
> > Line 1829 (original), 1838 (patched)
> > 
> >
> > No "if" - this dedup strategy does not work with speculative execution 
> > enabled.

Based on my understanding, these are the two scenarios:

1. The speculative execution succeeds; it has attempt ID 1 while the original attempt 
has ID 0. The logic picks the speculative attempt regardless of the original attempt's 
outcome. This works fine.
2. The speculative execution fails and throws an exception.

Let me know if I am getting this wrong.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69903/#review212581
---


On Feb. 5, 2019, 10:10 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69903/
> ---
> 
> (Updated Feb. 5, 2019, 10:10 p.m.)
> 
> 
> Review request for hive and Jason Dere.
> 
> 
> Bugs: HIVE-21214
> https://issues.apache.org/jira/browse/HIVE-21214
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> MoveTask : Use attemptId instead of file size for deduplication of files 
> compareTempOrDuplicateFiles()
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 8937b43811 
> 
> 
> Diff: https://reviews.apache.org/r/69903/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



Re: Review Request 69903: HIVE-21214

2019-02-05 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69903/#review212581
---




ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
Line 1829 (original), 1838 (patched)


No "if" - this dedup strategy does not work with speculative execution 
enabled.


- Jason Dere


On Feb. 5, 2019, 10:10 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69903/
> ---
> 
> (Updated Feb. 5, 2019, 10:10 p.m.)
> 
> 
> Review request for hive and Jason Dere.
> 
> 
> Bugs: HIVE-21214
> https://issues.apache.org/jira/browse/HIVE-21214
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> MoveTask : Use attemptId instead of file size for deduplication of files 
> compareTempOrDuplicateFiles()
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 8937b43811 
> 
> 
> Diff: https://reviews.apache.org/r/69903/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



Re: Review Request 69903: HIVE-21214

2019-02-05 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69903/#review212580
---




ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
Lines 1876 (patched)


nit: add the filenames to the error message


- Jason Dere


On Feb. 5, 2019, 10:10 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69903/
> ---
> 
> (Updated Feb. 5, 2019, 10:10 p.m.)
> 
> 
> Review request for hive and Jason Dere.
> 
> 
> Bugs: HIVE-21214
> https://issues.apache.org/jira/browse/HIVE-21214
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> MoveTask : Use attemptId instead of file size for deduplication of files 
> compareTempOrDuplicateFiles()
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 8937b43811 
> 
> 
> Diff: https://reviews.apache.org/r/69903/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



[jira] [Created] (HIVE-21222) ACID: When there are no delete deltas skip finding min max keys

2019-02-05 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-21222:


 Summary: ACID: When there are no delete deltas skip finding min 
max keys
 Key: HIVE-21222
 URL: https://issues.apache.org/jira/browse/HIVE-21222
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0, 3.2.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


We create an ORC reader in VectorizedOrcAcidRowBatchReader.findMinMaxKeys 
(which reads the 16K footer) even for cases where no delete deltas exist.
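
A minimal sketch of the proposed short-circuit, with hypothetical types standing in for 
the real VectorizedOrcAcidRowBatchReader internals:
{code:java}
import java.util.List;

public class MinMaxKeySketch {
  // Stand-in for the (min key, max key) interval used to prune delete-delta lookups.
  static final class KeyInterval {
    final Long minKey;
    final Long maxKey;
    KeyInterval(Long min, Long max) { this.minKey = min; this.maxKey = max; }
  }

  // When there are no delete deltas the interval is never consulted, so return an
  // unbounded interval up front instead of opening an ORC reader (and paying for
  // the footer read) just to compute it.
  static KeyInterval findMinMaxKeys(List<String> deleteDeltaDirs) {
    if (deleteDeltaDirs == null || deleteDeltaDirs.isEmpty()) {
      return new KeyInterval(null, null);   // nothing to filter against
    }
    return readKeysFromFooter(deleteDeltaDirs);
  }

  // Placeholder for the footer/stripe-statistics based computation.
  private static KeyInterval readKeysFromFooter(List<String> dirs) {
    return new KeyInterval(0L, Long.MAX_VALUE);
  }
}
{code}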





[jira] [Created] (HIVE-21221) Make HS2 and LLAP consistent - Bring up LLAP WebUI in test mode if WebUI port is configured

2019-02-05 Thread Oliver Draese (JIRA)
Oliver Draese created HIVE-21221:


 Summary: Make HS2 and LLAP consistent - Bring up LLAP WebUI in 
test mode if WebUI port is configured
 Key: HIVE-21221
 URL: https://issues.apache.org/jira/browse/HIVE-21221
 Project: Hive
  Issue Type: Improvement
  Components: llap
Reporter: Oliver Draese
Assignee: Oliver Draese


When HiveServer2 comes up, it skips starting the WebUI if
1) hive.in.test is set to true
AND
2) the WebUI port (hive.server2.webui.port) is not specified or is left at its default.
 
Right now, on LLAP daemon start, only condition 1) above (whether Hive is in test mode) 
is checked.
 
The LLAP daemon startup code (which decides whether to skip WebUI creation) should be 
consistent with HS2: if a port other than the default is specified, the WebUI should 
also be started in test mode.
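
A minimal sketch of the consistent rule, written against illustrative parameters rather 
than real HiveConf lookups (the 10002 default below is only an example):
{code:java}
public class WebUiStartupSketch {
  // Returns true when WebUI startup should be skipped: only if we are in test mode
  // AND the port was left unspecified/at its default. An explicitly configured,
  // non-default port brings the WebUI up even in test mode, for both HS2 and LLAP.
  static boolean skipWebUi(boolean inTest, int configuredPort, int defaultPort) {
    boolean portLeftAtDefault = configuredPort <= 0 || configuredPort == defaultPort;
    return inTest && portLeftAtDefault;
  }

  public static void main(String[] args) {
    System.out.println(skipWebUi(true, 10002, 10002));  // true:  test mode, default port
    System.out.println(skipWebUi(true, 19999, 10002));  // false: port explicitly set
    System.out.println(skipWebUi(false, 10002, 10002)); // false: not in test mode
  }
}
{code}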





Review Request 69903: HIVE-21214

2019-02-05 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69903/
---

Review request for hive and Jason Dere.


Bugs: HIVE-21214
https://issues.apache.org/jira/browse/HIVE-21214


Repository: hive-git


Description
---

MoveTask : Use attemptId instead of file size for deduplication of files 
compareTempOrDuplicateFiles()


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 8937b43811 


Diff: https://reviews.apache.org/r/69903/diff/1/


Testing
---


Thanks,

Deepak Jaiswal



[jira] [Created] (HIVE-21220) add_partitions in HiveMetaStoreClient should not filter the response

2019-02-05 Thread Na Li (JIRA)
Na Li created HIVE-21220:


 Summary: add_partitions in HiveMetaStoreClient should not filter 
the response
 Key: HIVE-21220
 URL: https://issues.apache.org/jira/browse/HIVE-21220
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 4.0.0
Reporter: Na Li


The response from the HMS server for add_partitions in HiveMetaStoreClient.java is 
filtered. That can cause ghost partitions or failures that are hard to understand.
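
A hypothetical illustration (not the real HiveMetaStoreClient code) of why filtering the 
add_partitions response confuses callers:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class AddPartitionsFilterSketch {
  // The server successfully creates every requested partition, but a client-side
  // filter is applied to the returned list before the caller sees it.
  static List<String> addPartitions(List<String> requested, Predicate<String> clientFilter) {
    List<String> created = new ArrayList<>(requested); // all of them exist on the server
    created.removeIf(p -> !clientFilter.test(p));      // response filtered on the client
    return created;
  }

  public static void main(String[] args) {
    List<String> requested = List.of("ds=2019-02-04", "ds=2019-02-05");
    // The filter hides one partition: the caller sees a single result even though
    // both partitions exist -- a "ghost" partition from the caller's point of view.
    List<String> visible = addPartitions(requested, p -> p.endsWith("05"));
    System.out.println("requested=" + requested + ", visible=" + visible);
  }
}
{code}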





[jira] [Created] (HIVE-21219) Cleanup pom.xml remote repository references

2019-02-05 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-21219:
---

 Summary: Cleanup pom.xml remote repository references
 Key: HIVE-21219
 URL: https://issues.apache.org/jira/browse/HIVE-21219
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


Some of them no longer seem to be needed, and if I enable "cache clearing" for 
ptest, the datanucleus repository sometimes returns errors like:

{code}
[ERROR] Failed to execute goal on project hive-shims-common: Could not resolve 
dependencies for project 
org.apache.hive.shims:hive-shims-common:jar:4.0.0-SNAPSHOT: The following 
artifacts could not be resolved: 
org.codehaus.jackson:jackson-core-asl:jar:1.9.13, 
org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13: Could not find artifact 
org.codehaus.jackson:jackson-core-asl:jar:1.9.13 in datanucleus 
(http://www.datanucleus.org/downloads/maven2) -> [Help 1]
{code}

It happens for different artifacts, but always with the "datanucleus" remote 
repository.

https://issues.apache.org/jira/browse/HIVE-21001?focusedCommentId=16760283&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16760283






[jira] [Created] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2019-02-05 Thread Milan Baran (JIRA)
Milan Baran created HIVE-21218:
--

 Summary: KafkaSerDe doesn't support topics created via Confluent 
Avro serializer
 Key: HIVE-21218
 URL: https://issues.apache.org/jira/browse/HIVE-21218
 Project: Hive
  Issue Type: Bug
  Components: kafka integration, Serializers/Deserializers
Affects Versions: 3.1.1
Reporter: Milan Baran


According to [Google 
groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A], 
the Confluent Avro serializer uses a proprietary format for the Kafka value - 
<4 bytes of schema ID>. 

This format does not cause any problem for the Confluent Kafka deserializer, which 
respects the format. However, for the Hive Kafka handler it is a bit of a problem to 
correctly deserialize the Kafka value, because Hive uses a custom deserializer from 
bytes to objects and ignores the Kafka consumer ser/deser classes provided via 
table properties.

It would be nice to support the Confluent format with the magic byte.

It would also be great to support Schema Registry.
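
A minimal sketch, in plain Java with no Confluent libraries, of stripping the wire-format 
header before Avro decoding; the 1-byte magic plus 4-byte big-endian schema ID layout is 
Confluent's documented convention:
{code:java}
import java.nio.ByteBuffer;

public class ConfluentWireFormatSketch {
  static final byte MAGIC_BYTE = 0x0;

  // Returns a buffer positioned at the Avro-encoded payload. A real implementation
  // would resolve the schema ID against Schema Registry instead of printing it.
  static ByteBuffer stripHeader(byte[] kafkaValue) {
    ByteBuffer buf = ByteBuffer.wrap(kafkaValue);
    if (buf.get() != MAGIC_BYTE) {
      throw new IllegalArgumentException("Value is not in Confluent wire format");
    }
    int schemaId = buf.getInt();           // 4 bytes, big-endian
    System.out.println("schema id = " + schemaId);
    return buf.slice();                    // remaining bytes: the Avro record
  }
}
{code}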





[jira] [Created] (HIVE-21217) Optimize range calculation for PTF

2019-02-05 Thread Adam Szita (JIRA)
Adam Szita created HIVE-21217:
-

 Summary: Optimize range calculation for PTF
 Key: HIVE-21217
 URL: https://issues.apache.org/jira/browse/HIVE-21217
 Project: Hive
  Issue Type: Improvement
Reporter: Adam Szita
Assignee: Adam Szita


During window function execution, Hive has to iterate over the rows neighbouring the 
current row to find the beginning and end of the proper range (on which the 
aggregation will be executed).

When we're using range-based windows and have many rows with a certain key value, 
this can take a lot of time. (e.g. a partition size of 80M in which we have 2 ranges 
of 40M rows according to the order-by column: within these 40M-row sets we're doing 
40M x 40M/2 steps, which is O(n^2) time complexity)

I propose to introduce a cache that keeps track of already calculated range ends so 
they can be reused in future scans.
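
A minimal sketch of the idea with a hypothetical helper; the real implementation would 
live in the PTF/windowing operator code:
{code:java}
import java.util.HashMap;
import java.util.Map;

public class RangeEndCacheSketch {
  // For a RANGE-based window over sorted data, the end of the range for a row depends
  // only on its order-by value, so rows sharing that value share the same range end.
  // Caching the computed end avoids re-scanning the same run for every row inside it.
  private final Map<Long, Integer> rangeEndByOrderKey = new HashMap<>();

  int findRangeEnd(long[] orderByColumn, int currentRow) {
    long key = orderByColumn[currentRow];
    Integer cached = rangeEndByOrderKey.get(key);
    if (cached != null) {
      return cached;                       // reuse a previously computed end
    }
    int end = currentRow;
    while (end < orderByColumn.length && orderByColumn[end] == key) {
      end++;                               // linear scan, done once per distinct key
    }
    rangeEndByOrderKey.put(key, end);
    return end;
  }
}
{code}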





[jira] [Created] (HIVE-21215) Read Parquet INT64 timestamp

2019-02-05 Thread Karen Coppage (JIRA)
Karen Coppage created HIVE-21215:


 Summary: Read Parquet INT64 timestamp
 Key: HIVE-21215
 URL: https://issues.apache.org/jira/browse/HIVE-21215
 Project: Hive
  Issue Type: New Feature
Reporter: Karen Coppage
Assignee: Marta Kuczora


[WIP]
This patch enables Hive to start reading timestamps from Parquet written with 
the new semantics:

With Parquet version 1.11, a new timestamp LogicalType with base INT64 and the 
following metadata is introduced:
* boolean isAdjustedToUtc: marks whether the timestamp is converted to UTC (a.k.a. 
Instant semantics) or not (LocalDateTime semantics).
* enum TimeUnit (NANOS, MICROS, MILLIS): the granularity of the timestamp.

Upon reading, the semantics of these new timestamps will be determined by their 
metadata, while the semantics of INT96 timestamps will continue to be deduced 
from the writer metadata.
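
A minimal sketch of the unit handling on the read side, using a local TimeUnit enum 
rather than the actual Parquet API types:
{code:java}
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class Int64TimestampReadSketch {
  enum TimeUnit { MILLIS, MICROS, NANOS }   // mirrors the logical type's unit

  // Converts a raw INT64 value into an Instant according to the declared unit.
  static Instant toInstant(long value, TimeUnit unit) {
    switch (unit) {
      case MILLIS: return Instant.ofEpochMilli(value);
      case MICROS: return Instant.ofEpochSecond(Math.floorDiv(value, 1_000_000L),
                                                Math.floorMod(value, 1_000_000L) * 1_000L);
      case NANOS:  return Instant.ofEpochSecond(Math.floorDiv(value, 1_000_000_000L),
                                                Math.floorMod(value, 1_000_000_000L));
      default:     throw new IllegalArgumentException("unknown unit: " + unit);
    }
  }

  // For isAdjustedToUtc=false (LocalDateTime semantics) the value is a wall-clock
  // timestamp, so it is materialized at a fixed offset with no zone conversion.
  static LocalDateTime toLocalDateTime(long value, TimeUnit unit) {
    Instant i = toInstant(value, unit);
    return LocalDateTime.ofEpochSecond(i.getEpochSecond(), i.getNano(), ZoneOffset.UTC);
  }
}
{code}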





[jira] [Created] (HIVE-21216) Write Parquet INT64 timestamp

2019-02-05 Thread Karen Coppage (JIRA)
Karen Coppage created HIVE-21216:


 Summary: Write Parquet INT64 timestamp
 Key: HIVE-21216
 URL: https://issues.apache.org/jira/browse/HIVE-21216
 Project: Hive
  Issue Type: New Feature
  Components: Hive
Reporter: Karen Coppage
Assignee: Karen Coppage


[WIP]
This patch enables Hive to start writing int64 timestamps in Parquet.

With Parquet version 1.11, a new timestamp LogicalType with base INT64 and the 
following metadata is introduced:
* boolean isAdjustedToUtc: marks whether the timestamp is converted to UTC (a.k.a. 
Instant semantics) or not (LocalDateTime semantics).
* enum TimeUnit (NANOS, MICROS, MILLIS): the granularity of the timestamp.

The timestamp will have LocalDateTime semantics (not converted to UTC).
The time unit (granularity) will be determined by the user; the default is milliseconds.
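
A minimal sketch of the encoding on the write side at the chosen granularity; the 
TimeUnit enum is again a local stand-in, not the Parquet API type:
{code:java}
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class Int64TimestampWriteSketch {
  enum TimeUnit { MILLIS, MICROS, NANOS }

  // Encodes a wall-clock timestamp with LocalDateTime semantics (isAdjustedToUtc=false):
  // the value is taken at a fixed offset, so no time-zone conversion happens on write.
  static long toInt64(LocalDateTime ts, TimeUnit unit) {
    long seconds = ts.toEpochSecond(ZoneOffset.UTC);
    int nanos = ts.getNano();
    switch (unit) {
      case MILLIS: return seconds * 1_000L + nanos / 1_000_000L;
      case MICROS: return seconds * 1_000_000L + nanos / 1_000L;
      case NANOS:  return seconds * 1_000_000_000L + nanos;
      default:     throw new IllegalArgumentException("unknown unit: " + unit);
    }
  }

  public static void main(String[] args) {
    LocalDateTime ts = LocalDateTime.of(2019, 2, 5, 12, 34, 56, 789_000_000);
    System.out.println(toInt64(ts, TimeUnit.MILLIS));  // default granularity: millis
    System.out.println(toInt64(ts, TimeUnit.MICROS));
  }
}
{code}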





[jira] [Created] (HIVE-21214) MoveTask : Use attemptId instead of file size for deduplication of files compareTempOrDuplicateFiles()

2019-02-05 Thread Deepak Jaiswal (JIRA)
Deepak Jaiswal created HIVE-21214:
-

 Summary: MoveTask : Use attemptId instead of file size for 
deduplication of files compareTempOrDuplicateFiles()
 Key: HIVE-21214
 URL: https://issues.apache.org/jira/browse/HIVE-21214
 Project: Hive
  Issue Type: Bug
Reporter: Deepak Jaiswal
Assignee: Deepak Jaiswal


For a given task, if there is more than one attempt, then deduplication logic 
kicks in.
{noformat}
Utilities.compareTempOrDuplicateFiles(){noformat}
The logic uses file size and picks the file with the largest size. This logic is 
very fragile.

Ideally, it should pick the successful attempt's file.

However, a simpler solution is to pick the newest attempt and also check that the 
newest attempt's file size is the largest.

If not, throw an exception.
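
A minimal sketch of the attempt-ID based pick, using a hypothetical 
"<taskId>_<attemptId>" file-name pattern (the real Hive naming scheme is engine-specific):
{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttemptIdDedupSketch {
  // Hypothetical file name layout, e.g. "000000_1" -> taskId 000000, attemptId 1.
  private static final Pattern NAME = Pattern.compile("(\\d+)_(\\d+)");

  static int attemptId(String fileName) {
    Matcher m = NAME.matcher(fileName);
    if (!m.matches()) {
      throw new IllegalArgumentException("unexpected file name: " + fileName);
    }
    return Integer.parseInt(m.group(2));
  }

  // Keep the file from the newest attempt, and additionally require that it is at
  // least as large as the older duplicate; otherwise fail loudly instead of silently
  // keeping a possibly truncated file.
  static String pickFile(String fileA, long sizeA, String fileB, long sizeB) {
    boolean aIsNewer = attemptId(fileA) >= attemptId(fileB);
    String newer = aIsNewer ? fileA : fileB;
    long newerSize = aIsNewer ? sizeA : sizeB;
    long otherSize = aIsNewer ? sizeB : sizeA;
    if (newerSize < otherSize) {
      throw new IllegalStateException("Newest attempt " + newer
          + " is smaller than an older duplicate; refusing to deduplicate");
    }
    return newer;
  }
}
{code}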

 

cc [~gopalv] [~thejas] [~jdere] [~ekoifman]


