[jira] [Commented] (HIVE-22760) Add Clock caching eviction based strategy

2020-03-30 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071066#comment-17071066
 ] 

Slim Bouguerra commented on HIVE-22760:
---

[~prasanth_j] can you take a final look at this?
https://github.com/apache/hive/pull/944
[~odraese] has already taken a look at this. 
Thanks

> Add Clock caching eviction based strategy
> -
>
> Key: HIVE-22760
> URL: https://issues.apache.org/jira/browse/HIVE-22760
> Project: Hive
>  Issue Type: New Feature
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22760.2.patch, HIVE-22760.3.patch, 
> HIVE-22760.3.patch, HIVE-22760.patch, HIVE-22760.patch
>
>
> LRFU is the current default policy.
> The main issue with this strategy is that it has a very high memory overhead; 
> in addition, most of the accounting has to happen under locks and can 
> therefore be a source of contention.
> Adding a simpler policy such as Clock can help with both issues.
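
For context, a minimal sketch of the Clock (second-chance) eviction idea, assuming one reference bit per cached entry and a fully populated ring; this illustrates the general algorithm only, not the cache policy implementation in the attached patches.

{code}
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative Clock (second-chance) eviction over a fixed ring of cache entries. */
final class ClockSketch<T> {
  static final class Entry<T> {
    final T buffer;
    final AtomicBoolean referenced = new AtomicBoolean(true); // set to true on every cache hit
    Entry(T buffer) { this.buffer = buffer; }
  }

  private final Entry<T>[] ring; // assumed fully populated by the cache
  private int hand;              // current clock-hand position

  ClockSketch(Entry<T>[] ring) { this.ring = ring; }

  /** Advances the hand, clearing reference bits, until an entry with a clear bit is found. */
  T pickVictim() {
    while (true) {
      Entry<T> e = ring[hand];
      hand = (hand + 1) % ring.length;
      if (!e.referenced.getAndSet(false)) {
        return e.buffer; // its second chance was already spent: evict this buffer
      }
    }
  }
}
{code}

Unlike LRFU's heap bookkeeping, the only per-entry state is a single reference bit, and marking an entry on access does not require taking the policy lock.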



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22760) Add Clock caching eviction based strategy

2020-03-26 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22760:
--
Attachment: HIVE-22760.3.patch

> Add Clock caching eviction based strategy
> -
>
> Key: HIVE-22760
> URL: https://issues.apache.org/jira/browse/HIVE-22760
> Project: Hive
>  Issue Type: New Feature
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22760.2.patch, HIVE-22760.3.patch, 
> HIVE-22760.3.patch, HIVE-22760.patch, HIVE-22760.patch
>
>
> LRFU is the current default policy.
> The main issue with this strategy is that it has a very high memory overhead; 
> in addition, most of the accounting has to happen under locks and can 
> therefore be a source of contention.
> Adding a simpler policy such as Clock can help with both issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22760) Add Clock caching eviction based strategy

2020-03-23 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22760:
--
Attachment: HIVE-22760.3.patch

> Add Clock caching eviction based strategy
> -
>
> Key: HIVE-22760
> URL: https://issues.apache.org/jira/browse/HIVE-22760
> Project: Hive
>  Issue Type: New Feature
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22760.2.patch, HIVE-22760.3.patch, 
> HIVE-22760.patch, HIVE-22760.patch
>
>
> LRFU is the current default policy.
> The main issue with this strategy is that it has a very high memory overhead; 
> in addition, most of the accounting has to happen under locks and can 
> therefore be a source of contention.
> Adding a simpler policy such as Clock can help with both issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2020-03-17 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.9.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.6.patch, HIVE-22476.7.patch, 
> HIVE-22476.7.patch, HIVE-22476.8.patch, HIVE-22476.8.patch, 
> HIVE-22476.8.patch, HIVE-22476.9.patch
>
>
> The actual issue stems from the different date parsers used by various parts of 
> the engine.
> The fetch task uses udfdatediff via {code} 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized llap execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more support to 
> GenericUDFToDate by enhancing the parser.
> In the longer term it would be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example
> {code} 
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}
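
For illustration only (this is not the actual GenericUDFToDate change), a sketch of a lenient parser that accepts both input shapes appearing in the repro, a plain date and an ISO timestamp with a zone offset; the class name is hypothetical.

{code}
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;

final class LenientDateParser {
  /** Tries the ISO offset-timestamp form first, then falls back to a plain yyyy-MM-dd date. */
  static LocalDate parse(String text) {
    try {
      return OffsetDateTime.parse(text).toLocalDate(); // e.g. 2019-09-09T10:45:49+02:00
    } catch (DateTimeParseException e) {
      return LocalDate.parse(text);                    // e.g. 2019-07-24
    }
  }
}
{code}

A datediff that normalizes both operands through one such parser returns the same result whether it runs in the fetch task or in the vectorized path.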



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23031) Add option to enable transparent rewrite of count(distinct) into sketch functions

2020-03-16 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060299#comment-17060299
 ] 

Slim Bouguerra commented on HIVE-23031:
---

I do not see how this can work well, given the following:
First, sketches return an approximate result, while the user may want exact reporting.
Second, how will you map the sketching implementation to the actual execution, 
given that there are multiple sketch algorithms?
Finally, each sketch algorithm has parameters such as the number of buckets; 
how are you going to let the user inject those?

In a nutshell, I am saying let's treat whatever sketch you have in mind as a UDF, 
and maybe add some default UDFs that are trusted by the system.

> Add option to enable transparent rewrite of count(distinct) into sketch 
> functions
> -
>
> Key: HIVE-23031
> URL: https://issues.apache.org/jira/browse/HIVE-23031
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-12 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-21218:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~mcginnda]
https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=48bc9e389c9b88a6e25bec3fd19ed29130f3f156

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-HIVE-21818-Adding-ability-for-Kafka-Handler-to-proce.patch, 
> HIVE-21218.10.patch, HIVE-21218.11.patch, HIVE-21218.12.patch, 
> HIVE-21218.13.patch, HIVE-21218.2.patch, HIVE-21218.3.patch, 
> HIVE-21218.4.patch, HIVE-21218.5.patch, HIVE-21218.6.patch, 
> HIVE-21218.7.patch, HIVE-21218.8.patch, HIVE-21218.9.patch, HIVE-21218.patch
>
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value: 
> <magic byte> <4 bytes of schema ID> <Avro-serialized object that conforms to 
> the schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, 
> which respects the format. For the Hive Kafka handler, however, it is a bit of 
> a problem to correctly deserialize the Kafka value, because Hive uses a custom 
> deserializer from bytes to objects and ignores the Kafka consumer ser/deser 
> classes provided via table properties.
> It would be nice to support the Confluent format with the magic byte.
> Also, it would be great to support Schema Registry as well.
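
For reference, a minimal sketch of splitting that wire format (one magic byte, a 4-byte big-endian schema ID, then the Avro payload) before handing the bytes to an Avro decoder; the class and method names are illustrative, not the actual KafkaSerDe API.

{code}
import java.nio.ByteBuffer;

/** Illustrative helper for the Confluent wire format: magic byte + schema id + Avro payload. */
final class ConfluentWireFormat {
  private static final byte MAGIC_BYTE = 0x0;

  /** Reads the schema id and leaves the buffer positioned at the start of the Avro payload. */
  static int readSchemaId(ByteBuffer value) {
    if (value.remaining() < 5 || value.get() != MAGIC_BYTE) {
      throw new IllegalArgumentException("Not a Confluent-framed Avro record");
    }
    return value.getInt(); // 4-byte big-endian Schema Registry id
  }
}
{code}

With the header stripped, the remaining bytes can be decoded against the Avro schema looked up by that id (or against the schema declared on the Hive table).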



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21218) KafkaSerDe doesn't support topics created via Confluent Avro serializer

2020-03-11 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057394#comment-17057394
 ] 

Slim Bouguerra commented on HIVE-21218:
---

[~mcginnda], I do not see a green run for the last patch, i.e. 
https://issues.apache.org/jira/secure/attachment/12996112/HIVE-21218.12.patch 
Can you please resubmit it to get a new test run?

> KafkaSerDe doesn't support topics created via Confluent Avro serializer
> ---
>
> Key: HIVE-21218
> URL: https://issues.apache.org/jira/browse/HIVE-21218
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration, Serializers/Deserializers
>Affects Versions: 3.1.1
>Reporter: Milan Baran
>Assignee: David McGinnis
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-HIVE-21818-Adding-ability-for-Kafka-Handler-to-proce.patch, 
> HIVE-21218.10.patch, HIVE-21218.11.patch, HIVE-21218.12.patch, 
> HIVE-21218.2.patch, HIVE-21218.3.patch, HIVE-21218.4.patch, 
> HIVE-21218.5.patch, HIVE-21218.6.patch, HIVE-21218.7.patch, 
> HIVE-21218.8.patch, HIVE-21218.9.patch, HIVE-21218.patch
>
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> According to [Google 
> groups|https://groups.google.com/forum/#!topic/confluent-platform/JYhlXN0u9_A]
>  the Confluent Avro serializer uses a proprietary format for the Kafka value: 
> <magic byte> <4 bytes of schema ID> <Avro-serialized object that conforms to 
> the schema>. 
> This format does not cause any problem for the Confluent Kafka deserializer, 
> which respects the format. For the Hive Kafka handler, however, it is a bit of 
> a problem to correctly deserialize the Kafka value, because Hive uses a custom 
> deserializer from bytes to objects and ignores the Kafka consumer ser/deser 
> classes provided via table properties.
> It would be nice to support the Confluent format with the magic byte.
> Also, it would be great to support Schema Registry as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22760) Add Clock caching eviction based strategy

2020-03-07 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22760:
--
Attachment: HIVE-22760.2.patch

> Add Clock caching eviction based strategy
> -
>
> Key: HIVE-22760
> URL: https://issues.apache.org/jira/browse/HIVE-22760
> Project: Hive
>  Issue Type: New Feature
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22760.2.patch, HIVE-22760.patch, HIVE-22760.patch
>
>
> LRFU is the current default policy.
> The main issue with this strategy is that it has a very high memory overhead; 
> in addition, most of the accounting has to happen under locks and can 
> therefore be a source of contention.
> Adding a simpler policy such as Clock can help with both issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22760) Add Clock caching eviction based strategy

2020-03-07 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22760:
--
Attachment: HIVE-22760.patch

> Add Clock caching eviction based strategy
> -
>
> Key: HIVE-22760
> URL: https://issues.apache.org/jira/browse/HIVE-22760
> Project: Hive
>  Issue Type: New Feature
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22760.patch, HIVE-22760.patch
>
>
> LRFU is the current default policy.
> The main issue with this strategy is that it has a very high memory overhead; 
> in addition, most of the accounting has to happen under locks and can 
> therefore be a source of contention.
> Adding a simpler policy such as Clock can help with both issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22760) Add Clock caching eviction based strategy

2020-03-06 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22760:
--
Attachment: HIVE-22760.patch

> Add Clock caching eviction based strategy
> -
>
> Key: HIVE-22760
> URL: https://issues.apache.org/jira/browse/HIVE-22760
> Project: Hive
>  Issue Type: New Feature
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22760.patch
>
>
> LRFU is the current default policy.
> The main issue with this strategy is that it has a very high memory overhead; 
> in addition, most of the accounting has to happen under locks and can 
> therefore be a source of contention.
> Adding a simpler policy such as Clock can help with both issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22760) Add Clock caching eviction based strategy

2020-03-06 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22760:
--
Status: Patch Available  (was: Open)

> Add Clock caching eviction based strategy
> -
>
> Key: HIVE-22760
> URL: https://issues.apache.org/jira/browse/HIVE-22760
> Project: Hive
>  Issue Type: New Feature
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22760.patch
>
>
> LRFU is the current default policy.
> The main issue with this strategy is that it has a very high memory overhead; 
> in addition, most of the accounting has to happen under locks and can 
> therefore be a source of contention.
> Adding a simpler policy such as Clock can help with both issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22760) Add Clock caching eviction based strategy

2020-03-06 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053844#comment-17053844
 ] 

Slim Bouguerra commented on HIVE-22760:
---

[~szita] can you take a look at this ?

> Add Clock caching eviction based strategy
> -
>
> Key: HIVE-22760
> URL: https://issues.apache.org/jira/browse/HIVE-22760
> Project: Hive
>  Issue Type: New Feature
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22760.patch
>
>
> LRFU is the current default policy.
> The main issue with this strategy is that it has a very high memory overhead; 
> in addition, most of the accounting has to happen under locks and can 
> therefore be a source of contention.
> Adding a simpler policy such as Clock can help with both issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-22760) Add Clock caching eviction based strategy

2020-03-06 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053844#comment-17053844
 ] 

Slim Bouguerra edited comment on HIVE-22760 at 3/7/20, 1:50 AM:


[~szita] can you please take a look at this?


was (Author: bslim):
[~szita] can you take a look at this ?

> Add Clock caching eviction based strategy
> -
>
> Key: HIVE-22760
> URL: https://issues.apache.org/jira/browse/HIVE-22760
> Project: Hive
>  Issue Type: New Feature
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22760.patch
>
>
> LRFU is the current default policy.
> The main issue with this strategy is that it has a very high memory overhead; 
> in addition, most of the accounting has to happen under locks and can 
> therefore be a source of contention.
> Adding a simpler policy such as Clock can help with both issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22437) LLAP Metadata cache NPE on locking metadata.

2020-03-06 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22437:
--
Attachment: HIVE-22437.patch

> LLAP Metadata cache NPE on locking metadata.
> 
>
> Key: HIVE-22437
> URL: https://issues.apache.org/jira/browse/HIVE-22437
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22437.patch, HIVE-22437.patch
>
>
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.unlockSingleBuffer(MetadataCache.java:464)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.lockBuffer(MetadataCache.java:409)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.lockOldVal(MetadataCache.java:314)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putInternal(MetadataCache.java:287)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:199)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22988) LLAP: If consistent splits is disabled ordering instances is not required

2020-03-06 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053559#comment-17053559
 ] 

Slim Bouguerra commented on HIVE-22988:
---

+1

> LLAP: If consistent splits is disabled ordering instances is not required
> -
>
> Key: HIVE-22988
> URL: https://issues.apache.org/jira/browse/HIVE-22988
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22988.1.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> LlapTaskSchedulerService always gets consistent ordered list of all LLAP 
> instances even if consistent splits is disabled. When consistent split is 
> disabled ordering isn't really useful as there is no cache locality. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22937) LLAP : Use unique names for the zip and tarball bundle for LLAP

2020-03-05 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22937:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=1fe0bd2298ece4eb37a89c5d9e983d597e2b93eb

> LLAP : Use unique names for the zip and tarball bundle for LLAP
> ---
>
> Key: HIVE-22937
> URL: https://issues.apache.org/jira/browse/HIVE-22937
> Project: Hive
>  Issue Type: Bug
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
> Attachments: HIVE-22937.1.patch
>
>
> LLAP : Use unique names for the zip and tarball bundle for LLAP



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22583) LLAP cache always misses with non-vectorized serde readers such as OpenCSV

2020-02-28 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047856#comment-17047856
 ] 

Slim Bouguerra commented on HIVE-22583:
---

Sorry, I think I was missing the fact that that file is not yours.
+1 

> LLAP cache always misses with non-vectorized serde readers such as OpenCSV
> --
>
> Key: HIVE-22583
> URL: https://issues.apache.org/jira/browse/HIVE-22583
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
> Attachments: HIVE-22583.0.patch, HIVE-22583.1.patch, 
> HIVE-22583.2.patch, HIVE-22583.3.patch
>
>
> Although after the first read the LLAP cache stores data for tables that are 
> not using the LazySimple serde, the stored data is never used by subsequent 
> queries, causing a full cache miss and re-read each time.
> The problem is rooted in SerdeEncodedDataReader#cacheFileData not taking care 
> of creating an entry for the root/struct column of the table. The only case 
> this is taken care of is when a vectorized reader is used _(e.g. 
> LazySimpleSerde's LazySimpleDeserializeRead)_, where 
> SerdeEncodedDataReader#processAsyncCacheData takes care of this.
> This can be reproduced by either using a custom serde, like OpenCSV, or using 
> LazySimpleSerde but turning off _hive.llap.io.encode.vector.serde.enabled_.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22583) LLAP cache always misses with non-vectorized serde readers such as OpenCSV

2020-02-28 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047846#comment-17047846
 ] 

Slim Bouguerra commented on HIVE-22583:
---

[~szita] can you please remove the use of the counters from the q files? I 
thought we had added an option to fail if the cache has a miss?

> LLAP cache always misses with non-vectorized serde readers such as OpenCSV
> --
>
> Key: HIVE-22583
> URL: https://issues.apache.org/jira/browse/HIVE-22583
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
> Attachments: HIVE-22583.0.patch, HIVE-22583.1.patch, 
> HIVE-22583.2.patch, HIVE-22583.3.patch
>
>
> Although after the first read the LLAP cache stores data for tables that are 
> not using the LazySimple serde, the stored data is never used by subsequent 
> queries, causing a full cache miss and re-read each time.
> The problem is rooted in SerdeEncodedDataReader#cacheFileData not taking care 
> of creating an entry for the root/struct column of the table. The only case 
> this is taken care of is when a vectorized reader is used _(e.g. 
> LazySimpleSerde's LazySimpleDeserializeRead)_, where 
> SerdeEncodedDataReader#processAsyncCacheData takes care of this.
> This can be reproduced by either using a custom serde, like OpenCSV, or using 
> LazySimpleSerde but turning off _hive.llap.io.encode.vector.serde.enabled_.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22933) Allow connecting kerberos-enabled Hive to connect to a non-kerberos druid cluster

2020-02-28 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047845#comment-17047845
 ] 

Slim Bouguerra commented on HIVE-22933:
---

Looks good to me; please fix the unused import.

> Allow connecting kerberos-enabled Hive to connect to a non-kerberos druid 
> cluster
> -
>
> Key: HIVE-22933
> URL: https://issues.apache.org/jira/browse/HIVE-22933
> Project: Hive
>  Issue Type: Bug
>Reporter: Nishant Bangarwa
>Assignee: Nishant Bangarwa
>Priority: Major
> Attachments: HIVE-22933.patch
>
>
> Currently, if Kerberos is enabled for Hive, it can only connect to external 
> Druid clusters that are Kerberos-enabled, since the Druid client used to 
> connect to Druid is always KerberosHTTPClient. This task is to allow a 
> Kerberos-enabled HiveServer2 to connect to a non-Kerberized Druid cluster. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22934) Hive server interactive log counters to error stream

2020-02-28 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22934:
--
Attachment: HIVE-22934.patch

> Hive server interactive log counters to error stream
> 
>
> Key: HIVE-22934
> URL: https://issues.apache.org/jira/browse/HIVE-22934
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22934.patch
>
>
> HiveServer2 is logging the console output to the system error stream.
> This needs to be fixed because:
> First, we do not roll the file.
> Second, writes to such a file are sequential and can lead to throttling/poor 
> performance.
> {code}
> -rw-r--r--  1 hive hadoop 9.5G Feb 26 17:22 hive-server2-interactive.err
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22934) Hive server interactive log counters to error stream

2020-02-28 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22934:
--
Assignee: Slim Bouguerra
  Status: Patch Available  (was: Open)

> Hive server interactive log counters to error stream
> 
>
> Key: HIVE-22934
> URL: https://issues.apache.org/jira/browse/HIVE-22934
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
>
> HiveServer2 is logging the console output to the system error stream.
> This needs to be fixed because:
> First, we do not roll the file.
> Second, writes to such a file are sequential and can lead to throttling/poor 
> performance.
> {code}
> -rw-r--r--  1 hive hadoop 9.5G Feb 26 17:22 hive-server2-interactive.err
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22937) LLAP : Use unique names for the zip and tarball bundle for LLAP

2020-02-26 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17046059#comment-17046059
 ] 

Slim Bouguerra commented on HIVE-22937:
---

+1

> LLAP : Use unique names for the zip and tarball bundle for LLAP
> ---
>
> Key: HIVE-22937
> URL: https://issues.apache.org/jira/browse/HIVE-22937
> Project: Hive
>  Issue Type: Bug
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
> Attachments: HIVE-22937.1.patch
>
>
> LLAP : Use unique names for the zip and tarball bundle for LLAP



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22359) LLAP: when a node restarts with the exact same host/port in kubernetes it is not detected as a task failure

2020-02-18 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17039451#comment-17039451
 ] 

Slim Bouguerra commented on HIVE-22359:
---

LGTM +1 

> LLAP: when a node restarts with the exact same host/port in kubernetes it is 
> not detected as a task failure
> ---
>
> Key: HIVE-22359
> URL: https://issues.apache.org/jira/browse/HIVE-22359
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal Vijayaraghavan
>Assignee: Prasanth Jayachandran
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22359.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> │ <14>1 2019-10-16T22:16:39.233Z 
> query-coordinator-0-5.query-coordinator-0-service.compute-1569601454-l2x9.svc.cluster.local
>  query-coordinator 1 461e5ad9-f05f-11e9-85f7-06e84765763e [mdc@18060 
> class="te │
> │ zplugins.LlapTaskCommunicator" level="INFO" thread="IPC Server handler 4 on 
> 3"] The tasks we expected to be on the node are not there: 
> attempt_1569601631911__1_04_34_0, attempt_15696016319 │
> │ 11__1_04_71_0, attempt_1569601631911__1_04_000191_0, 
> attempt_1569601631911__1_04_000211_0, 
> attempt_1569601631911__1_04_000229_0, 
> attempt_1569601631911__1_04_000231_0, attempt_1 │
> │ 569601631911__1_04_000235_0, attempt_1569601631911__1_04_000242_0, 
> attempt_1569601631911__1_04_000160_1, 
> attempt_1569601631911__1_04_12_2, 
> attempt_1569601631911__1_04_03_2, │
> │  attempt_1569601631911__1_04_56_2, 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22821) Add necessary endpoints for proactive cache eviction

2020-02-12 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035643#comment-17035643
 ] 

Slim Bouguerra commented on HIVE-22821:
---

Hi [~szita], can you please submit a pull request?
My first comment is to avoid mangling the LLAP caching code with the protobuf 
stuff; I recommend something like this:
{code} 
/**
 * @param predicate filter selecting the buffers to be evicted
 * @return amount of evicted bytes
 */
long evict(Predicate predicate);

{code}
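
To make the suggestion concrete, a minimal sketch of what such a predicate-driven eviction walk could look like; CacheBuffer and the buffer collection are placeholders, not the actual LLAP cache types.

{code}
import java.util.Iterator;
import java.util.function.Predicate;

/** Illustrative predicate-driven eviction: no protobuf types leak into the cache layer. */
final class ProactiveEvictionSketch {
  static final class CacheBuffer {
    final String tableKey;   // e.g. "db.table" the buffer belongs to
    final long sizeBytes;
    CacheBuffer(String tableKey, long sizeBytes) { this.tableKey = tableKey; this.sizeBytes = sizeBytes; }
  }

  /** Evicts every buffer matching the predicate and returns the number of bytes released. */
  static long evict(Iterable<CacheBuffer> buffers, Predicate<CacheBuffer> predicate) {
    long released = 0;
    for (Iterator<CacheBuffer> it = buffers.iterator(); it.hasNext(); ) {
      CacheBuffer b = it.next();
      if (predicate.test(b)) {
        it.remove();               // placeholder for the real deallocation path
        released += b.sizeBytes;
      }
    }
    return released;
  }
}
{code}

The protobuf request (drop db/table/partition) would then be translated into a Predicate only at the endpoint boundary, e.g. a predicate matching buffers whose tableKey equals the dropped table.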

> Add necessary endpoints for proactive cache eviction
> 
>
> Key: HIVE-22821
> URL: https://issues.apache.org/jira/browse/HIVE-22821
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
> Attachments: HIVE-22821.0.patch, HIVE-22821.1.patch, 
> HIVE-22821.2.patch
>
>
> Implement the parts required for iHS2 -> LLAP daemons communication:
>  * protobuf message schema and endpoints
>  * Hive configuration
>  * for use cases:
>  ** dropping db
>  ** dropping table
>  ** dropping partition from a table



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22760) Add Clock caching eviction based strategy

2020-01-22 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra reassigned HIVE-22760:
-


> Add Clock caching eviction based strategy
> -
>
> Key: HIVE-22760
> URL: https://issues.apache.org/jira/browse/HIVE-22760
> Project: Hive
>  Issue Type: New Feature
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
>
> LRFU is the current default policy.
> The main issue with this strategy is that it has a very high memory overhead; 
> in addition, most of the accounting has to happen under locks and can 
> therefore be a source of contention.
> Adding a simpler policy such as Clock can help with both issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22437) LLAP Metadata cache NPE on locking metadata.

2020-01-22 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22437:
--
Attachment: HIVE-22437.patch

> LLAP Metadata cache NPE on locking metadata.
> 
>
> Key: HIVE-22437
> URL: https://issues.apache.org/jira/browse/HIVE-22437
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22437.patch
>
>
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.unlockSingleBuffer(MetadataCache.java:464)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.lockBuffer(MetadataCache.java:409)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.lockOldVal(MetadataCache.java:314)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putInternal(MetadataCache.java:287)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:199)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22437) LLAP Metadata cache NPE on locking metadata.

2020-01-22 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22437:
--
Status: Patch Available  (was: Open)

> LLAP Metadata cache NPE on locking metadata.
> 
>
> Key: HIVE-22437
> URL: https://issues.apache.org/jira/browse/HIVE-22437
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
>
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.unlockSingleBuffer(MetadataCache.java:464)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.lockBuffer(MetadataCache.java:409)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.lockOldVal(MetadataCache.java:314)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putInternal(MetadataCache.java:287)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:199)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22437) LLAP Metadata cache NPE on locking metadata.

2020-01-22 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22437:
--
Component/s: llap

> LLAP Metadata cache NPE on locking metadata.
> 
>
> Key: HIVE-22437
> URL: https://issues.apache.org/jira/browse/HIVE-22437
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
>
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.unlockSingleBuffer(MetadataCache.java:464)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.lockBuffer(MetadataCache.java:409)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.lockOldVal(MetadataCache.java:314)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putInternal(MetadataCache.java:287)
>   at 
> org.apache.hadoop.hive.llap.io.metadata.MetadataCache.putFileMetadata(MetadataCache.java:199)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21334) Eviction of blocks is major source of blockage for allocation request. Allocation path need to be lock-free.

2020-01-21 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020750#comment-17020750
 ] 

Slim Bouguerra commented on HIVE-21334:
---

The idea to amortize this operation is to keep a per-IO-thread stash of buffers 
for the most frequent allocation sizes.
In practice we should take advantage of the fact that all allocations from a 
given split will be the same size, since the ORC file buffer size is the same 
for all the tiles.
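
A minimal sketch of that idea, assuming fixed-size buffers can be parked per IO thread and reused before falling back to the shared allocator; the class and the allocateDirect fallback are illustrative, not the BuddyAllocator API.

{code}
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

/** Illustrative per-thread stash of same-sized buffers to avoid hitting the shared allocator. */
final class ThreadLocalBufferStash {
  private static final ThreadLocal<Map<Integer, ArrayDeque<ByteBuffer>>> STASH =
      ThreadLocal.withInitial(HashMap::new);

  /** Reuses a parked buffer of the requested size if one exists, otherwise allocates a new one. */
  static ByteBuffer allocate(int size) {
    ArrayDeque<ByteBuffer> free = STASH.get().computeIfAbsent(size, s -> new ArrayDeque<>());
    ByteBuffer b = free.poll();
    return b != null ? b : ByteBuffer.allocateDirect(size); // stand-in for the shared, locked path
  }

  /** Parks a buffer for reuse by the same IO thread instead of returning it to the allocator. */
  static void release(ByteBuffer buffer) {
    buffer.clear();
    STASH.get().computeIfAbsent(buffer.capacity(), s -> new ArrayDeque<>()).push(buffer);
  }
}
{code}

Since a split's reads all use the same buffer size, the stash hits on nearly every allocation after the first, so the lock-protected eviction path is taken far less often.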

> Eviction of blocks is major source of blockage for allocation request. 
> Allocation path need to be lock-free.
> 
>
> Key: HIVE-21334
> URL: https://issues.apache.org/jira/browse/HIVE-21334
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: lock_profile.png
>
>
> Eviction is getting in the way of memory allocation when the query fragment 
> has no cache entry.
> This causes a major bottleneck and wastes a lot of CPU cycles.
> To fix this, we can first batch the evictions to avoid taking the lock 
> multiple times.
> The memory manager needs to be able to anticipate such issues and keep some 
> spare space for queries that do not have any cache hit.
> {code}
> IO-Elevator-Thread-12  Blocked CPU usage on sample: 692ms
>   
> org.apache.hadoop.hive.llap.cache.LowLevelLrfuCachePolicy.evictSomeBlocks(long)
>  LowLevelLrfuCachePolicy.java:264
>   
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.evictSomeBlocks(long) 
> CacheContentsTracker.java:194
>   
> org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(long,
>  boolean, AtomicBoolean) LowLevelCacheMemoryManager.java:87
>   
> org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(long,
>  AtomicBoolean) LowLevelCacheMemoryManager.java:63
>   
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(MemoryBuffer[],
>  int, Allocator$BufferObjectFactory, AtomicBoolean) BuddyAllocator.java:263
>   
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.allocateMultiple(MemoryBuffer[],
>  int) EncodedReaderImpl.java:1295
>   
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(long,
>  DiskRangeList, long, long, EncodedColumnBatch$ColumnStreamData, long, long, 
> IdentityHashMap) EncodedReaderImpl.java:923
>   
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(int,
>  StripeInformation, OrcProto$RowIndex[], List, List, boolean[], boolean[], 
> Consumer) EncodedReaderImpl.java:501
>   
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead() 
> OrcEncodedDataReader.java:407
>   org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run() 
> OrcEncodedDataReader.java:266
>   org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run() 
> OrcEncodedDataReader.java:263
>   java.security.AccessController.doPrivileged(PrivilegedExceptionAction, 
> AccessControlContext) AccessController.java (native)
>   javax.security.auth.Subject.doAs(Subject, PrivilegedExceptionAction) 
> Subject.java:422
>   
> org.apache.hadoop.security.UserGroupInformation.doAs(PrivilegedExceptionAction)
>  UserGroupInformation.java:1688
>   org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal() 
> OrcEncodedDataReader.java:263
>   org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal() 
> OrcEncodedDataReader.java:110
>   org.apache.tez.common.CallableWithNdc.call() CallableWithNdc.java:36
>   
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call()
>  StatsRecordingThreadPool.java:110
>   java.util.concurrent.FutureTask.run() FutureTask.java:266
>   
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1142
>   java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:617
>   java.lang.Thread.run() Thread.java:745 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21334) Eviction of blocks is major source of blockage for allocation request. Allocation path need to be lock-free.

2020-01-21 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-21334:
--
Component/s: llap

> Eviction of blocks is major source of blockage for allocation request. 
> Allocation path need to be lock-free.
> 
>
> Key: HIVE-21334
> URL: https://issues.apache.org/jira/browse/HIVE-21334
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: lock_profile.png
>
>
> Eviction is getting in the way of memory allocation when the query fragment 
> has no cache entry.
> This causes a major bottleneck and wastes a lot of CPU cycles.
> To fix this, we can first batch the evictions to avoid taking the lock 
> multiple times.
> The memory manager needs to be able to anticipate such issues and keep some 
> spare space for queries that do not have any cache hit.
> {code}
> IO-Elevator-Thread-12  Blocked CPU usage on sample: 692ms
>   
> org.apache.hadoop.hive.llap.cache.LowLevelLrfuCachePolicy.evictSomeBlocks(long)
>  LowLevelLrfuCachePolicy.java:264
>   
> org.apache.hadoop.hive.llap.cache.CacheContentsTracker.evictSomeBlocks(long) 
> CacheContentsTracker.java:194
>   
> org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(long,
>  boolean, AtomicBoolean) LowLevelCacheMemoryManager.java:87
>   
> org.apache.hadoop.hive.llap.cache.LowLevelCacheMemoryManager.reserveMemory(long,
>  AtomicBoolean) LowLevelCacheMemoryManager.java:63
>   
> org.apache.hadoop.hive.llap.cache.BuddyAllocator.allocateMultiple(MemoryBuffer[],
>  int, Allocator$BufferObjectFactory, AtomicBoolean) BuddyAllocator.java:263
>   
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.allocateMultiple(MemoryBuffer[],
>  int) EncodedReaderImpl.java:1295
>   
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(long,
>  DiskRangeList, long, long, EncodedColumnBatch$ColumnStreamData, long, long, 
> IdentityHashMap) EncodedReaderImpl.java:923
>   
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(int,
>  StripeInformation, OrcProto$RowIndex[], List, List, boolean[], boolean[], 
> Consumer) EncodedReaderImpl.java:501
>   
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead() 
> OrcEncodedDataReader.java:407
>   org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run() 
> OrcEncodedDataReader.java:266
>   org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run() 
> OrcEncodedDataReader.java:263
>   java.security.AccessController.doPrivileged(PrivilegedExceptionAction, 
> AccessControlContext) AccessController.java (native)
>   javax.security.auth.Subject.doAs(Subject, PrivilegedExceptionAction) 
> Subject.java:422
>   
> org.apache.hadoop.security.UserGroupInformation.doAs(PrivilegedExceptionAction)
>  UserGroupInformation.java:1688
>   org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal() 
> OrcEncodedDataReader.java:263
>   org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal() 
> OrcEncodedDataReader.java:110
>   org.apache.tez.common.CallableWithNdc.call() CallableWithNdc.java:36
>   
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call()
>  StatsRecordingThreadPool.java:110
>   java.util.concurrent.FutureTask.run() FutureTask.java:266
>   
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1142
>   java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:617
>   java.lang.Thread.run() Thread.java:745 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22745) Config option to turn off read locks

2020-01-21 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020643#comment-17020643
 ] 

Slim Bouguerra commented on HIVE-22745:
---

[~gopalv] done thanks.

> Config option to turn off read locks
> 
>
> Key: HIVE-22745
> URL: https://issues.apache.org/jira/browse/HIVE-22745
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Ashutosh Chauhan
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22745.2.patch, HIVE-22745.3.patch, 
> HIVE-22745.4.patch, HIVE-22745.patch
>
>
> Although it is not recommended, in perf-critical scenarios this option may be 
> exercised. We have observed lock acquisition taking a long time in heavily 
> loaded systems. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22745) Config option to turn off read locks

2020-01-21 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22745:
--
Attachment: HIVE-22745.4.patch

> Config option to turn off read locks
> 
>
> Key: HIVE-22745
> URL: https://issues.apache.org/jira/browse/HIVE-22745
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Ashutosh Chauhan
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22745.2.patch, HIVE-22745.3.patch, 
> HIVE-22745.4.patch, HIVE-22745.patch
>
>
> Although it is not recommended, in perf-critical scenarios this option may be 
> exercised. We have observed lock acquisition taking a long time in heavily 
> loaded systems. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22755) Cleaner/Compaction can skip the read locks and use the min open txn id

2020-01-21 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020594#comment-17020594
 ] 

Slim Bouguerra commented on HIVE-22755:
---

cc [~t3rmin4t0r] please feel free to add more insights about your idea on how 
the Cleaner can skip the read lock.

> Cleaner/Compaction can skip the read locks and use the min open txn id
> --
>
> Key: HIVE-22755
> URL: https://issues.apache.org/jira/browse/HIVE-22755
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Slim Bouguerra
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22755) Cleaner/Compaction can skip the read locks and use the min open txn id

2020-01-21 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22755:
--
Fix Version/s: 4.0.0

> Cleaner/Compaction can skip the read locks and use the min open txn id
> --
>
> Key: HIVE-22755
> URL: https://issues.apache.org/jira/browse/HIVE-22755
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Slim Bouguerra
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22755) Cleaner/Compaction can skip the read locks and use the min open txn id

2020-01-21 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22755:
--
Component/s: Transactions

> Cleaner/Compaction can skip the read locks and use the min open txn id
> --
>
> Key: HIVE-22755
> URL: https://issues.apache.org/jira/browse/HIVE-22755
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Slim Bouguerra
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22754) Trim some extra HDFS find file name calls that can be deduced using current TXN watermark

2020-01-21 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22754:
--
Attachment: HIVE-22754.patch

> Trim some extra HDFS find file name calls that can be deduced using current 
> TXN watermark
> -
>
> Key: HIVE-22754
> URL: https://issues.apache.org/jira/browse/HIVE-22754
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22754.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22754) Trim some extra HDFS find file name calls that can be deduced using current TXN watermark

2020-01-21 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22754:
--
Status: Patch Available  (was: Open)

> Trim some extra HDFS find file name calls that can be deduced using current 
> TXN watermark
> -
>
> Key: HIVE-22754
> URL: https://issues.apache.org/jira/browse/HIVE-22754
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22754.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22754) Trim some extra HDFS find file name calls that can be deduced using current TXN watermark

2020-01-21 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22754:
--
Summary: Trim some extra HDFS find file name calls that can be deduced 
using current TXN watermark  (was: Trim some extra HDFS find file name calls 
that can be deduced using current TX watermark)

> Trim some extra HDFS find file name calls that can be deduced using current 
> TXN watermark
> -
>
> Key: HIVE-22754
> URL: https://issues.apache.org/jira/browse/HIVE-22754
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22754) Trim some extra HDFS find file name calls that can be deduced using current TX watermark

2020-01-21 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra reassigned HIVE-22754:
-


> Trim some extra HDFS find file name calls that can be deduced using current 
> TX watermark
> 
>
> Key: HIVE-22754
> URL: https://issues.apache.org/jira/browse/HIVE-22754
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22745) Config option to turn off read locks

2020-01-21 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22745:
--
Attachment: HIVE-22745.3.patch

> Config option to turn off read locks
> 
>
> Key: HIVE-22745
> URL: https://issues.apache.org/jira/browse/HIVE-22745
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Ashutosh Chauhan
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22745.2.patch, HIVE-22745.3.patch, HIVE-22745.patch
>
>
> Although it is not recommended, in perf-critical scenarios this option may be 
> exercised. We have observed lock acquisition taking a long time in heavily 
> loaded systems. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22629) AST Node Children can be quite expensive to build due to List resizing

2020-01-21 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22629:
--
Attachment: HIVE-22629.3.patch

> AST Node Children can be quite expensive to build due to List resizing
> --
>
> Key: HIVE-22629
> URL: https://issues.apache.org/jira/browse/HIVE-22629
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22629.1.patch, HIVE-22629.2.patch, 
> HIVE-22629.3.patch, HIVE-22629.patch, 
> noETLs_ETLs_profile-kc-hdp-mstr06-p.servicemanagement.com-interactive-166620-t-e-cpu-1576029590.svg
>
>
> As per the attached profile, the AST node children list can be a major source 
> of CPU and memory churn, due to ArrayList resizing and copying.
> In my opinion this can be amortized by providing the actual size up front.
> [~jcamachorodriguez] / [~vgarg] 
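
For illustration, a minimal sketch of the pre-sizing idea (the exact ASTNode change may differ): allocating the children list at its final capacity avoids the repeated grow-and-copy cycles that show up in the profile.

{code}
import java.util.ArrayList;
import java.util.List;

final class PreSizedChildren {
  /** Builds a children list with an exact initial capacity: one backing-array allocation, no resizing. */
  static <T> List<T> childrenOf(List<T> knownChildren) {
    List<T> children = new ArrayList<>(knownChildren.size());
    children.addAll(knownChildren);
    return children;
  }
}
{code}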



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22753) Fix gradual mem leak: Operationlog related appenders should be cleared up on errors

2020-01-21 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020492#comment-17020492
 ] 

Slim Bouguerra commented on HIVE-22753:
---

For the record, this is a duplicate of 
https://issues.apache.org/jira/browse/HIVE-22127. 

> Fix gradual mem leak: Operationlog related appenders should be cleared up on 
> errors 
> 
>
> Key: HIVE-22753
> URL: https://issues.apache.org/jira/browse/HIVE-22753
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-22753.1.patch, image-2020-01-21-11-14-37-911.png, 
> image-2020-01-21-11-17-59-279.png, image-2020-01-21-11-18-37-294.png
>
>
> In case of an exception in SQLOperation, the operation log does not get cleaned 
> up. This causes a gradual build-up of HushableRandomAccessFileAppender 
> instances, causing HS2 to OOM after some time.
> !image-2020-01-21-11-14-37-911.png|width=431,height=267!
>  
> Allocation tree
> !image-2020-01-21-11-18-37-294.png|width=425,height=178!
>  
> Prod instance mem
> !image-2020-01-21-11-17-59-279.png|width=698,height=209!
>  
> Each HushableRandomAccessFileAppender holds an internal reference to a 
> RandomAccessFileAppender, which holds a 256 KB ByteBuffer, causing the memory 
> leak.
> Related ticket: HIVE-18820



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22745) Config option to turn off read locks

2020-01-20 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22745:
--
Attachment: HIVE-22745.2.patch

> Config option to turn off read locks
> 
>
> Key: HIVE-22745
> URL: https://issues.apache.org/jira/browse/HIVE-22745
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Ashutosh Chauhan
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22745.2.patch, HIVE-22745.patch
>
>
> Although it is not recommended, in perf-critical scenarios this option may be 
> exercised. We have observed lock acquisition taking a long time in heavily 
> loaded systems. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22745) Config option to turn off read locks

2020-01-18 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018643#comment-17018643
 ] 

Slim Bouguerra commented on HIVE-22745:
---

[~gopalv] I see that HIVE-20801 is not merged yet; any idea how far it is 
from the finish line?

> Config option to turn off read locks
> 
>
> Key: HIVE-22745
> URL: https://issues.apache.org/jira/browse/HIVE-22745
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Ashutosh Chauhan
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22745.patch
>
>
> Although it is not recommended, in perf-critical scenarios this option may be 
> exercised. We have observed lock acquisition taking a long time in heavily 
> loaded systems. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22745) Config option to turn off read locks

2020-01-17 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22745:
--
Attachment: HIVE-22745.patch

> Config option to turn off read locks
> 
>
> Key: HIVE-22745
> URL: https://issues.apache.org/jira/browse/HIVE-22745
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Ashutosh Chauhan
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22745.patch
>
>
> Although it is not recommended, in perf-critical scenarios this option may be 
> exercised. We have observed lock acquisition taking a long time in heavily 
> loaded systems. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22745) Config option to turn off read locks

2020-01-17 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22745:
--
Status: Patch Available  (was: Open)

> Config option to turn off read locks
> 
>
> Key: HIVE-22745
> URL: https://issues.apache.org/jira/browse/HIVE-22745
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Ashutosh Chauhan
>Assignee: Slim Bouguerra
>Priority: Major
>
> Although it is not recommended, in perf-critical scenarios this option may be 
> exercised. We have observed lock acquisition taking a long time in heavily 
> loaded systems. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22745) Config option to turn off read locks

2020-01-17 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra reassigned HIVE-22745:
-

Assignee: Slim Bouguerra

> Config option to turn off read locks
> 
>
> Key: HIVE-22745
> URL: https://issues.apache.org/jira/browse/HIVE-22745
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Ashutosh Chauhan
>Assignee: Slim Bouguerra
>Priority: Major
>
> Although it's not recommended, in perf-critical scenarios this option may be
> exercised. We have observed lock acquisition taking a long time in heavily
> loaded systems.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22743) Enable Fast LLAP-IO path for tables with schema evolution case appending columns.

2020-01-17 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra reassigned HIVE-22743:
-


> Enable Fast LLAP-IO path for tables with schema evolution case appending 
> columns.
> -
>
> Key: HIVE-22743
> URL: https://issues.apache.org/jira/browse/HIVE-22743
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22583) LLAP cache always misses with non-vectorized serde readers such as OpenCSV

2020-01-08 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011048#comment-17011048
 ] 

Slim Bouguerra commented on HIVE-22583:
---

[~szita] I think that might be the same thing. In fact the Tez counters depend
on the HDFS counters, which are tied to the file format; since the format can
change, the byte counts can change as well.
Think of it this way: the bytes read or missed by the cache are relative to the
ORC file format.
As I said, I think for now we can avoid this flaky test case and instead work
on a query that can run against the cache only; that's more robust IMO.

> LLAP cache always misses with non-vectorized serde readers such as OpenCSV
> --
>
> Key: HIVE-22583
> URL: https://issues.apache.org/jira/browse/HIVE-22583
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
> Attachments: HIVE-22583.0.patch, HIVE-22583.1.patch, 
> HIVE-22583.2.patch
>
>
> Although after the first read the LLAP cache stores data for tables that are
> not using the LazySimple serde, the stored data is then never used in
> subsequent queries, causing a full cache miss and re-read each time.
> The problem is rooted in SerdeEncodedDataReader#cacheFileData not taking care
> of creating an entry for the root/struct column of the table. The only case
> this is taken care of is when a vectorized reader is used _(e.g.
> LazySimpleSerde's LazySimpleDeserializeRead)_, where
> SerdeEncodedDataReader#processAsyncCacheData takes care of this.
> This can be reproduced by either using a custom serde, like OpenCSV, or using
> LazySimpleSerde but turning off _hive.llap.io.encode.vector.serde.enabled_.
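
A simplified, self-contained sketch of the miss pattern (the class and keying scheme
are illustrative, not LLAP's actual cache API): if no entry is ever registered for the
root/struct column, a reader that probes the root first always concludes the file is absent.
{code}
// Illustration only: a toy cache keyed by (fileId, columnId).
import java.util.HashMap;
import java.util.Map;

public class RootColumnCacheSketch {
  private final Map<String, byte[]> cache = new HashMap<>();

  private static String key(long fileId, int columnId) {
    return fileId + ":" + columnId;
  }

  void cacheFileData(long fileId, Map<Integer, byte[]> encodedColumns, boolean includeRoot) {
    for (Map.Entry<Integer, byte[]> e : encodedColumns.entrySet()) {
      cache.put(key(fileId, e.getKey()), e.getValue());
    }
    if (includeRoot) {
      cache.put(key(fileId, 0), new byte[0]); // marker entry for the root/struct column
    }
  }

  boolean isFileCached(long fileId) {
    // Readers probe the root column first; without the marker, every lookup misses.
    return cache.containsKey(key(fileId, 0));
  }
}
{code}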



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22629) AST Node Children can be quite expensive to build due to List resizing

2019-12-11 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22629:
--
Attachment: HIVE-22629.2.patch

> AST Node Children can be quite expensive to build due to List resizing
> --
>
> Key: HIVE-22629
> URL: https://issues.apache.org/jira/browse/HIVE-22629
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22629.1.patch, HIVE-22629.2.patch, 
> HIVE-22629.patch, 
> noETLs_ETLs_profile-kc-hdp-mstr06-p.servicemanagement.com-interactive-166620-t-e-cpu-1576029590.svg
>
>
> As per the attached profile, the AST node children can be a major source of
> CPU and memory churn, due to the ArrayList resizing and copying.
> In my opinion this can be amortized by providing the actual size.
> [~jcamachorodriguez] / [~vgarg] 
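
To illustrate the suggested amortization, here is a minimal generic sketch (plain Java,
not the ASTNode code itself): constructing the children list with its known final size
allocates the backing array once instead of repeatedly growing and copying it.
{code}
// Illustration only: pre-sizing the list avoids ArrayList grow-and-copy churn.
import java.util.ArrayList;
import java.util.List;

public class PresizedChildrenSketch {
  static <T> List<T> copyChildren(List<T> parserChildren) {
    // new ArrayList<>() would start at the default capacity and resize as it grows;
    // passing the known size allocates the backing array exactly once.
    List<T> children = new ArrayList<>(parserChildren.size());
    children.addAll(parserChildren);
    return children;
  }
}
{code}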



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22629) AST Node Children can be quite expensive to build due to List resizing

2019-12-11 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994111#comment-16994111
 ] 

Slim Bouguerra commented on HIVE-22629:
---

[~belugabehr] thanks, sounds good to me.

> AST Node Children can be quite expensive to build due to List resizing
> --
>
> Key: HIVE-22629
> URL: https://issues.apache.org/jira/browse/HIVE-22629
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22629.1.patch, HIVE-22629.patch, 
> noETLs_ETLs_profile-kc-hdp-mstr06-p.servicemanagement.com-interactive-166620-t-e-cpu-1576029590.svg
>
>
> As per the attached profile, the AST node children can be a major source of
> CPU and memory churn, due to the ArrayList resizing and copying.
> In my opinion this can be amortized by providing the actual size.
> [~jcamachorodriguez] / [~vgarg] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22629) AST Node Children can be quite expensive to build due to List resizing

2019-12-11 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993886#comment-16993886
 ] 

Slim Bouguerra commented on HIVE-22629:
---

[~belugabehr] what you are saying makes sense, but it is out of scope for this
patch.
You would need to dive deeper and see whether this list is mutated later and
what null is used for...

> AST Node Children can be quite expensive to build due to List resizing
> --
>
> Key: HIVE-22629
> URL: https://issues.apache.org/jira/browse/HIVE-22629
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22629.1.patch, HIVE-22629.patch, 
> noETLs_ETLs_profile-kc-hdp-mstr06-p.servicemanagement.com-interactive-166620-t-e-cpu-1576029590.svg
>
>
> As per the attached profile, the AST node children can be a major source of
> CPU and memory churn, due to the ArrayList resizing and copying.
> In my opinion this can be amortized by providing the actual size.
> [~jcamachorodriguez] / [~vgarg] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22629) AST Node Children can be quite expensive to build due to List resizing

2019-12-11 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22629:
--
Attachment: HIVE-22629.1.patch

> AST Node Children can be quite expensive to build due to List resizing
> --
>
> Key: HIVE-22629
> URL: https://issues.apache.org/jira/browse/HIVE-22629
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22629.1.patch, HIVE-22629.patch, 
> noETLs_ETLs_profile-kc-hdp-mstr06-p.servicemanagement.com-interactive-166620-t-e-cpu-1576029590.svg
>
>
> As per the attached profile, the AST node children can be a major source of
> CPU and memory churn, due to the ArrayList resizing and copying.
> In my opinion this can be amortized by providing the actual size.
> [~jcamachorodriguez] / [~vgarg] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22629) AST Node Children can be quite expensive to build due to List resizing

2019-12-11 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993885#comment-16993885
 ] 

Slim Bouguerra commented on HIVE-22629:
---

Also, this is happening in SemanticAnalyzer: {code}processPositionAlias{code}

> AST Node Children can be quite expensive to build due to List resizing
> --
>
> Key: HIVE-22629
> URL: https://issues.apache.org/jira/browse/HIVE-22629
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22629.patch, 
> noETLs_ETLs_profile-kc-hdp-mstr06-p.servicemanagement.com-interactive-166620-t-e-cpu-1576029590.svg
>
>
> As per the attached profile, the AST node children can be a major source of
> CPU and memory churn, due to the ArrayList resizing and copying.
> In my opinion this can be amortized by providing the actual size.
> [~jcamachorodriguez] / [~vgarg] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22629) AST Node Children can be quite expensive to build due to List resizing

2019-12-11 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22629:
--
Attachment: HIVE-22629.patch

> AST Node Children can be quite expensive to build due to List resizing
> --
>
> Key: HIVE-22629
> URL: https://issues.apache.org/jira/browse/HIVE-22629
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22629.patch, 
> noETLs_ETLs_profile-kc-hdp-mstr06-p.servicemanagement.com-interactive-166620-t-e-cpu-1576029590.svg
>
>
> As per the attached profile, the AST node children can be a major source of
> CPU and memory churn, due to the ArrayList resizing and copying.
> In my opinion this can be amortized by providing the actual size.
> [~jcamachorodriguez] / [~vgarg] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22629) AST Node Children can be quite expensive to build due to List resizing

2019-12-11 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22629:
--
Status: Patch Available  (was: Open)

> AST Node Children can be quite expensive to build due to List resizing
> --
>
> Key: HIVE-22629
> URL: https://issues.apache.org/jira/browse/HIVE-22629
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: 
> noETLs_ETLs_profile-kc-hdp-mstr06-p.servicemanagement.com-interactive-166620-t-e-cpu-1576029590.svg
>
>
> As per the attached profile, the AST node children can be a major source of
> CPU and memory churn, due to the ArrayList resizing and copying.
> In my opinion this can be amortized by providing the actual size.
> [~jcamachorodriguez] / [~vgarg] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22629) AST Node Children can be quite expensive to build due to List resizing

2019-12-11 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22629:
--
Attachment: 
noETLs_ETLs_profile-kc-hdp-mstr06-p.servicemanagement.com-interactive-166620-t-e-cpu-1576029590.svg

> AST Node Children can be quite expensive to build due to List resizing
> --
>
> Key: HIVE-22629
> URL: https://issues.apache.org/jira/browse/HIVE-22629
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: 
> noETLs_ETLs_profile-kc-hdp-mstr06-p.servicemanagement.com-interactive-166620-t-e-cpu-1576029590.svg
>
>
> As per the attached profile, the AST node children can be a major source of
> CPU and memory churn, due to the ArrayList resizing and copying.
> In my opinion this can be amortized by providing the actual size.
> [~jcamachorodriguez] / [~vgarg] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22629) AST Node Children can be quite expensive to build due to List resizing

2019-12-11 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra reassigned HIVE-22629:
-


> AST Node Children can be quite expensive to build due to List resizing
> --
>
> Key: HIVE-22629
> URL: https://issues.apache.org/jira/browse/HIVE-22629
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
>
> As per the attached profile, the AST node children can be a major source of
> CPU and memory churn, due to the ArrayList resizing and copying.
> In my opinion this can be amortized by providing the actual size.
> [~jcamachorodriguez] / [~vgarg] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22558) Metastore: Passwords jceks should be read lazily, in case of connection pools

2019-12-10 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22558:
--
Attachment: HIVE-22558.1.patch

> Metastore: Passwords jceks should be read lazily, in case of connection pools
> -
>
> Key: HIVE-22558
> URL: https://issues.apache.org/jira/browse/HIVE-22558
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Reporter: Gopal Vijayaraghavan
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22558.1.patch, getDatabase-password-md5-hotpath.png
>
>
> The jceks file is parsed for every instance of the metastore conf to populate 
> the password in plain-text, which is irrelevant for the scenario where the DB 
> connection pool is already active.
>   !getDatabase-password-md5-hotpath.png|width=640!
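
A rough sketch of the lazy pattern being described (names are illustrative, not the
metastore code): resolve the jceks-backed password at most once, and only when a new
connection actually needs it, instead of eagerly for every conf instance.
{code}
// Illustration only: memoized, lazily-resolved secret.
import java.util.function.Supplier;

public class LazyPasswordSketch {
  private final Supplier<String> jceksLookup; // the expensive credential-provider read
  private volatile String password;           // resolved at most once

  LazyPasswordSketch(Supplier<String> jceksLookup) {
    this.jceksLookup = jceksLookup;
  }

  String getPassword() {
    String p = password;
    if (p == null) {
      synchronized (this) {
        if (password == null) {
          // Only reached when the connection pool really needs a new connection.
          password = jceksLookup.get();
        }
        p = password;
      }
    }
    return p;
  }
}
{code}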



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22558) Metastore: Passwords jceks should be read lazily, in case of connection pools

2019-12-10 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22558:
--
Status: Patch Available  (was: Open)

> Metastore: Passwords jceks should be read lazily, in case of connection pools
> -
>
> Key: HIVE-22558
> URL: https://issues.apache.org/jira/browse/HIVE-22558
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Reporter: Gopal Vijayaraghavan
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: getDatabase-password-md5-hotpath.png
>
>
> The jceks file is parsed for every instance of the metastore conf to populate 
> the password in plain-text, which is irrelevant for the scenario where the DB 
> connection pool is already active.
>   !getDatabase-password-md5-hotpath.png|width=640!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22558) Metastore: Passwords jceks should be read lazily, in case of connection pools

2019-12-10 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra reassigned HIVE-22558:
-

Assignee: Slim Bouguerra

> Metastore: Passwords jceks should be read lazily, in case of connection pools
> -
>
> Key: HIVE-22558
> URL: https://issues.apache.org/jira/browse/HIVE-22558
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore
>Reporter: Gopal Vijayaraghavan
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: getDatabase-password-md5-hotpath.png
>
>
> The jceks file is parsed for every instance of the metastore conf to populate 
> the password in plain-text, which is irrelevant for the scenario where the DB 
> connection pool is already active.
>   !getDatabase-password-md5-hotpath.png|width=640!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22583) LLAP cache always misses with non-vectorized serde readers such as OpenCSV

2019-12-10 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993009#comment-16993009
 ] 

Slim Bouguerra commented on HIVE-22583:
---

The fix looks good to me.
I try to avoid using Tez counters as part of the test validation since they
can change quite drastically if the plan changes, but I understand that it is
the only way for now to test the data in the cache.

One of the ideas I want to implement is a query flag that tells LLAP to run a
query against the cache content only; that would enable such tests and avoid
relying on the Tez counters.

> LLAP cache always misses with non-vectorized serde readers such as OpenCSV
> --
>
> Key: HIVE-22583
> URL: https://issues.apache.org/jira/browse/HIVE-22583
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
> Attachments: HIVE-22583.0.patch, HIVE-22583.1.patch, 
> HIVE-22583.2.patch
>
>
> Although after the first read the LLAP cache stores data for tables that are
> not using the LazySimple serde, the stored data is then never used in
> subsequent queries, causing a full cache miss and re-read each time.
> The problem is rooted in SerdeEncodedDataReader#cacheFileData not taking care
> of creating an entry for the root/struct column of the table. The only case
> this is taken care of is when a vectorized reader is used _(e.g.
> LazySimpleSerde's LazySimpleDeserializeRead)_, where
> SerdeEncodedDataReader#processAsyncCacheData takes care of this.
> This can be reproduced by either using a custom serde, like OpenCSV, or using
> LazySimpleSerde but turning off _hive.llap.io.encode.vector.serde.enabled_.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22499) LLAP: Add an EncodedReaderOptions to extend ORC impl for options

2019-12-03 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987326#comment-16987326
 ] 

Slim Bouguerra commented on HIVE-22499:
---

+1 

> LLAP: Add an EncodedReaderOptions to extend ORC impl for options
> 
>
> Key: HIVE-22499
> URL: https://issues.apache.org/jira/browse/HIVE-22499
> Project: Hive
>  Issue Type: Bug
>  Components: llap, ORC
>Reporter: Gopal Vijayaraghavan
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22499.2.patch, HIVE-22499.2.patch, 
> HIVE-22499.3.patch, HIVE-22499.4.patch, HIVE-22499.5.patch, 
> HIVE-22499.6.patch, HIVE-22499.7.patch, HIVE-22499.WIP.patch, HIVE-22499.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ORC-570 is an ABI change to the way getFileSystem() works, adding another
> exception to the implementation.
> Accepting and using that change requires waiting for an ORC release, while
> this patch serves the same purpose, though it falls back to a retry with
> FileSystem.get() in case the supplier fails at runtime.
> Also as a side-note, the FS.get() call is always used in the cases where the 
> file is not being read from a cache such as EncodedOrcFile (so the upstream 
> API change might be overkill).
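
A hedged sketch of that fallback, using only standard Hadoop FileSystem APIs (this is
not the EncodedReaderOptions implementation): prefer the supplied FileSystem, and retry
with a direct lookup against the configuration if the supplier fails at runtime.
{code}
// Illustration only: supplier-first resolution with a FileSystem.get()-style retry.
import java.io.IOException;
import java.util.function.Supplier;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsSupplierFallbackSketch {
  static FileSystem resolve(Supplier<FileSystem> fsSupplier, Path path, Configuration conf)
      throws IOException {
    try {
      FileSystem fs = fsSupplier.get();
      if (fs != null) {
        return fs;
      }
    } catch (RuntimeException e) {
      // fall through to the retry below
    }
    return path.getFileSystem(conf); // same as FileSystem.get(path.toUri(), conf)
  }
}
{code}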



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22523) The error handler in LlapRecordReader might block if its queue is full

2019-11-26 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22523:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=9c06b2c32329824c202a6a45fb11f04820d827d5

> The error handler in LlapRecordReader might block if its queue is full
> --
>
> Key: HIVE-22523
> URL: https://issues.apache.org/jira/browse/HIVE-22523
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22523.1.patch, HIVE-22523.2.patch
>
>
> In setError() we set the value of an atomic reference (pendingError) and we 
> also put the error in a queue. The latter seems not just unnecessary but it 
> might block the caller of the handler if the queue is full. Also, closing of
> the reader might not be properly handled, as some of the flags are not
> volatile.
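
A minimal sketch of the non-blocking handoff described above (illustrative names, not
the LlapRecordReader code): keep only the first error in the AtomicReference and wake
the consumer, instead of also putting the error into the bounded data queue.
{code}
// Illustration only: error reporting that can never block the caller.
import java.util.concurrent.atomic.AtomicReference;

public class NonBlockingErrorSketch {
  private final AtomicReference<Throwable> pendingError = new AtomicReference<>();
  private final Object consumerWakeup = new Object();

  void setError(Throwable t) {
    pendingError.compareAndSet(null, t); // never blocks; keeps the first failure
    synchronized (consumerWakeup) {
      consumerWakeup.notifyAll(); // consumer re-checks pendingError when woken
    }
  }

  Throwable getPendingError() {
    return pendingError.get();
  }
}
{code}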



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22523) The error handler in LlapRecordReader might block if its queue is full

2019-11-25 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16981764#comment-16981764
 ] 

Slim Bouguerra commented on HIVE-22523:
---

+1 thanks

> The error handler in LlapRecordReader might block if its queue is full
> --
>
> Key: HIVE-22523
> URL: https://issues.apache.org/jira/browse/HIVE-22523
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22523.1.patch, HIVE-22523.2.patch
>
>
> In setError() we set the value of an atomic reference (pendingError) and we 
> also put the error in a queue. The latter seems not just unnecessary but it 
> might block the caller of the handler if the queue is full. Also, closing of
> the reader might not be properly handled, as some of the flags are not
> volatile.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-25 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Status: Open  (was: Patch Available)

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.6.patch, HIVE-22476.7.patch, 
> HIVE-22476.7.patch, HIVE-22476.8.patch, HIVE-22476.8.patch
>
>
> The actual issue stems from the different date parsers used by various parts
> of the engine.
> Fetch task uses udfdatediff via {code} 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized llap execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be not very intrusive and will add more support to the 
> GenericUDFToDate by enhancing the parser.
> For the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example
> {code} 
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}
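
For illustration, a self-contained sketch of the "one lenient parser" direction using
java.time (this is not GenericUDFToDate or the vectorized code): both repro values
normalize to a LocalDate before the day difference is computed.
{code}
// Illustration only: one parser that accepts both an ISO offset timestamp and a plain date.
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

public class LenientDateDiffSketch {
  static LocalDate parseToDate(String s) {
    try {
      return OffsetDateTime.parse(s).toLocalDate(); // e.g. 2019-09-09T10:45:49+02:00
    } catch (DateTimeParseException e) {
      return LocalDate.parse(s.substring(0, Math.min(10, s.length()))); // e.g. 2019-07-24
    }
  }

  static long dateDiff(String later, String earlier) {
    return ChronoUnit.DAYS.between(parseToDate(earlier), parseToDate(later));
  }
}
{code}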



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-25 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Status: Patch Available  (was: Open)

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.6.patch, HIVE-22476.7.patch, 
> HIVE-22476.7.patch, HIVE-22476.8.patch, HIVE-22476.8.patch
>
>
> The actual issue stems from the different date parsers used by various parts
> of the engine.
> Fetch task uses udfdatediff via {code} 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized llap execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be not very intrusive and will add more support to the 
> GenericUDFToDate by enhancing the parser.
> For the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example
> {code} 
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-25 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.8.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.6.patch, HIVE-22476.7.patch, 
> HIVE-22476.7.patch, HIVE-22476.8.patch, HIVE-22476.8.patch
>
>
> The actual issue stems from the different date parsers used by various parts
> of the engine.
> Fetch task uses udfdatediff via {code} 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized llap execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be not very intrusive and will add more support to the 
> GenericUDFToDate by enhancing the parser.
> For the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example
> {code} 
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22523) The error handler in LlapRecordReader might block if its queue is full

2019-11-22 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980380#comment-16980380
 ] 

Slim Bouguerra commented on HIVE-22523:
---

[~amagyar] then please leave the {code}enqueueInternal{code} call as is, since
it is non-blocking by design and it is not a good idea to remove it without a
reason.
As I said, I do not think this is why we get the OOM.

> The error handler in LlapRecordReader might block if its queue is full
> --
>
> Key: HIVE-22523
> URL: https://issues.apache.org/jira/browse/HIVE-22523
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22523.1.patch
>
>
> In setError() we set the value of an atomic reference (pendingError) and we 
> also put the error in a queue. The latter seems not just unnecessary but it 
> might block the caller of the handler if the queue is full. Also, closing of
> the reader might not be properly handled, as some of the flags are not
> volatile.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22523) The error handler in LlapRecordReader might block if its queue is full

2019-11-21 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979550#comment-16979550
 ] 

Slim Bouguerra commented on HIVE-22523:
---

As per the code, it will wait for 100ms and then the next round should exit if
one of the flags is set.
{code} 
  private void enqueueInternal(Object o) throws InterruptedException {
    // We need to loop here to handle the case where consumer goes away.
    do {} while (!isClosed && !isInterrupted && !queue.offer(o, 100, TimeUnit.MILLISECONDS));
  }
{code}

Are you saying that in some cases the flags are not set, or that the update is
not visible to the thread?
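
For reference, a standalone version of the same pattern with the flags declared volatile
(illustrative, not the actual LlapRecordReader fields): the producer re-checks the
shutdown flags on every 100ms round, so it can only block indefinitely if the flag
writes are not visible to it.
{code}
// Illustration only: bounded offer with a timeout, re-checking volatile flags each round.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedEnqueueSketch {
  private final BlockingQueue<Object> queue = new ArrayBlockingQueue<>(1024);
  private volatile boolean isClosed;      // volatile so the producer sees the consumer going away
  private volatile boolean isInterrupted; // ditto

  void enqueueInternal(Object o) throws InterruptedException {
    // Loop so that closed/interrupted flipping unblocks the producer within ~100ms.
    do {
    } while (!isClosed && !isInterrupted && !queue.offer(o, 100, TimeUnit.MILLISECONDS));
  }
}
{code}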

> The error handler in LlapRecordReader might block if its queue is full
> --
>
> Key: HIVE-22523
> URL: https://issues.apache.org/jira/browse/HIVE-22523
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22523.1.patch
>
>
> In setError() we set the value of an atomic reference (pendingError) and we 
> also put the error in a queue. The latter seems not just unnecessary but it 
> might block the caller of the handler if the queue is full. Also, closing of
> the reader might not be properly handled, as some of the flags are not
> volatile.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22523) The error handler in LlapRecordReader might block if its queue is full

2019-11-21 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979516#comment-16979516
 ] 

Slim Bouguerra commented on HIVE-22523:
---

[~amagyar] {code}
org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader#enqueueInternal{code}
is not blocking; can you please explain more about what the issue is? Is it a
variable read visibility issue?

> The error handler in LlapRecordReader might block if its queue is full
> --
>
> Key: HIVE-22523
> URL: https://issues.apache.org/jira/browse/HIVE-22523
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22523.1.patch
>
>
> In setError() we set the value of an atomic reference (pendingError) and we 
> also put the error in a queue. The latter seems not just unnecessary but it 
> might block the caller of the handler if the queue is full. Also, closing of
> the reader might not be properly handled, as some of the flags are not
> volatile.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-21 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.8.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.6.patch, HIVE-22476.7.patch, 
> HIVE-22476.7.patch, HIVE-22476.8.patch
>
>
> The actual issue stems from the different date parsers used by various parts
> of the engine.
> Fetch task uses udfdatediff via {code} 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized llap execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be not very intrusive and will add more support to the 
> GenericUDFToDate by enhancing the parser.
> For the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example
> {code} 
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-20 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.7.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.6.patch, HIVE-22476.7.patch, HIVE-22476.7.patch
>
>
> The actual issue stems from the different date parsers used by various parts
> of the engine.
> Fetch task uses udfdatediff via {code} 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized llap execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be not very intrusive and will add more support to the 
> GenericUDFToDate by enhancing the parser.
> For the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example
> {code} 
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22514) HiveProtoLoggingHook might leak memory

2019-11-19 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977921#comment-16977921
 ] 

Slim Bouguerra commented on HIVE-22514:
---

+1 pending tests, thanks [~amagyar]

> HiveProtoLoggingHook might leak memory
> --
>
> Key: HIVE-22514
> URL: https://issues.apache.org/jira/browse/HIVE-22514
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22514.1.patch, Screen Shot 2019-11-18 at 2.19.24 
> PM.png
>
>
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses an unbounded queue which cannot be replaced
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> !Screen Shot 2019-11-18 at 2.19.24 PM.png|width=650,height=101!
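
A generic sketch of a bounded alternative (not the HiveProtoLoggingHook patch): a single
writer thread over a bounded queue with an explicit rejection policy, so a burst of
events cannot grow the queue without limit; rollover would still need its own timer.
{code}
// Illustration only: bounded work queue with a drop-oldest policy instead of unbounded growth.
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedEventExecutorSketch {
  static ThreadPoolExecutor create(int capacity) {
    return new ThreadPoolExecutor(
        1, 1, 0L, TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<>(capacity),            // bounded, unlike the scheduled executor
        new ThreadPoolExecutor.DiscardOldestPolicy());  // shed load instead of growing the queue
  }
}
{code}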



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-19 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Status: Patch Available  (was: Open)

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.6.patch, HIVE-22476.7.patch
>
>
> The actual issue stems from the different date parsers used by various parts
> of the engine.
> Fetch task uses udfdatediff via {code} 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized llap execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be not very intrusive and will add more support to the 
> GenericUDFToDate by enhancing the parser.
> For the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example
> {code} 
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-19 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.7.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.6.patch, HIVE-22476.7.patch
>
>
> The actual issue stems from the different date parsers used by various parts
> of the engine.
> Fetch task uses udfdatediff via {code} 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized llap execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be not very intrusive and will add more support to the 
> GenericUDFToDate by enhancing the parser.
> For the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example
> {code} 
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-19 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: (was: HIVE-22476.7.patch)

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.6.patch
>
>
> The actual issue stems from the different date parsers used by various parts
> of the engine.
> Fetch task uses udfdatediff via {code} 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized llap execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be not very intrusive and will add more support to the 
> GenericUDFToDate by enhancing the parser.
> For the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example
> {code} 
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-19 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Status: Open  (was: Patch Available)

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.6.patch
>
>
> The actual issue stems from the different date parsers used by various parts
> of the engine.
> Fetch task uses udfdatediff via {code} 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized llap execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be not very intrusive and will add more support to the 
> GenericUDFToDate by enhancing the parser.
> For the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example
> {code} 
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-18 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.7.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.6.patch, HIVE-22476.7.patch
>
>
> The actual issue stems from the different date parsers used by various parts
> of the engine.
> Fetch task uses udfdatediff via {code} 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized llap execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be not very intrusive and will add more support to the 
> GenericUDFToDate by enhancing the parser.
> For the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example
> {code} 
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-18 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22492:
--
Comment: was deleted

(was: 
https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=0781cf2c5104dafd0c5496631cafabac9d59df67)

> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22492.2.patch, HIVE-22492.patch, 
> llap-lock-contention.svg
>
>
> LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the attached profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking
> contention.
> The trick is a common way to amortize locking, as explained here
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf
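
A rough, standalone sketch of the batching idea from the BP-Wrapper paper (not the Hive
implementation): buffer accesses per thread and take the shared policy lock only once
the buffer fills, so a single lock acquisition is amortized over many touches.
{code}
// Illustrative only: a per-thread batching wrapper around a lock-protected eviction policy.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class BatchingPolicyWrapperSketch<T> {
  private static final int BATCH_SIZE = 64;

  private final ReentrantLock policyLock = new ReentrantLock();
  private final ThreadLocal<List<T>> pendingAccesses = ThreadLocal.withInitial(ArrayList::new);

  public void notifyAccess(T entry) {
    List<T> local = pendingAccesses.get();
    local.add(entry);
    if (local.size() < BATCH_SIZE) {
      return; // cheap path: no shared state touched, no lock taken
    }
    policyLock.lock(); // one lock acquisition amortized over BATCH_SIZE accesses
    try {
      for (T e : local) {
        applyAccessUnderLock(e);
      }
    } finally {
      policyLock.unlock();
    }
    local.clear();
  }

  private void applyAccessUnderLock(T entry) {
    // update the LRFU bookkeeping (heap/frequency state) here; runs only under policyLock
  }
}
{code}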



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-18 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976772#comment-16976772
 ] 

Slim Bouguerra commented on HIVE-22492:
---

https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=0781cf2c5104dafd0c5496631cafabac9d59df67

> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22492.2.patch, HIVE-22492.patch, 
> llap-lock-contention.svg
>
>
> LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the attached profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking
> contention.
> The trick is a common way to amortize locking, as explained here
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-18 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22492:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=0781cf2c5104dafd0c5496631cafabac9d59df67

> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22492.2.patch, HIVE-22492.patch, 
> llap-lock-contention.svg
>
>
> LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the attached profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking
> contention.
> The trick is a common way to amortize locking, as explained here
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22493) Scheduled Query Execution Failure in Tests

2019-11-15 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975354#comment-16975354
 ] 

Slim Bouguerra commented on HIVE-22493:
---

+1

> Scheduled Query Execution Failure in Tests
> --
>
> Key: HIVE-22493
> URL: https://issues.apache.org/jira/browse/HIVE-22493
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Zoltan Haindrich
>Priority: Critical
> Attachments: HIVE-22493.01.patch
>
>
> {code:none}
> org.apache.hadoop.hive.schq.TestScheduledQueryIntegration.testScheduledQueryExecutionImpersonation
>  (batchId=279)
> org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=284)
> org.apache.hive.jdbc.TestSSL.testMetastoreWithSSL (batchId=284)
> {code}
> {code:none}
> 2019-11-12T18:11:00,181  INFO [pool-20-thread-10] HiveMetaStore.audit: 
> ugi=hiveptest  ip=127.0.0.1cmd=source:127.0.0.1 scheduled_query_poll  
>  
> 2019-11-12T18:11:00,182  INFO [pool-20-thread-10] metastore.HiveMetaStore: 
> 25: Opening raw store with implementation 
> class:org.apache.hadoop.hive.metastore.ObjectStore
> 2019-11-12T18:11:00,183  INFO [pool-20-thread-10] 
> metastore.PersistenceManagerProvider: Updating the pmf due to property change
> 2019-11-12T18:11:00,184 ERROR [pool-20-thread-10] metastore.HiveMetaStore: 
> Caught exception
> javax.jdo.JDOUserException: Cant close PersistenceManagerFactory while we 
> have active transactions.
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.close(JDOPersistenceManagerFactory.java:603)
>  ~[datanucleus-api-jdo-4.2.4.jar:?]
>   at 
> org.apache.hadoop.hive.metastore.PersistenceManagerProvider.updatePmfProperties(PersistenceManagerProvider.java:199)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:213) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:77) 
> ~[hadoop-common-3.1.0.jar:?]
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137) 
> ~[hadoop-common-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:59) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:852)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:820)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:814)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.scheduled_query_poll(HiveMetaStore.java:9660)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_102]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_102]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_102]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_102]
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at com.sun.proxy.$Proxy46.scheduled_query_poll(Unknown Source) [?:?]
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$scheduled_query_poll.getResult(ThriftHiveMetastore.java:21561)
>  [hive-standalone-metastore-common-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$scheduled_query_poll.getResult(ThriftHiveMetastore.java:21545)
>  [hive-standalone-metastore-common-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> [libthrift-0.9.3-1.jar:0.9.3-1]
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>   at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_102]
>   at 

[jira] [Updated] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-15 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22492:
--
Attachment: HIVE-22492.2.patch

> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22492.2.patch, HIVE-22492.patch, 
> llap-lock-contention.svg
>
>
> LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the attached profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking
> contention.
> The trick is a common way to amortize locking, as explained here
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-15 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Status: Patch Available  (was: Open)

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.6.patch
>
>
> The actual issue stems from the different date parsers used by various parts
> of the engine.
> Fetch task uses udfdatediff via {code} 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized llap execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be not very intrusive and will add more support to the 
> GenericUDFToDate by enhancing the parser.
> For the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example
> {code} 
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-15 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.6.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.6.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses the datediff UDF via 
> {code}org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more support to 
> GenericUDFToDate by enhancing its parser.
> In the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-15 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: (was: HIVE-22476.patch)

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses the datediff UDF via 
> {code}org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more support to 
> GenericUDFToDate by enhancing its parser.
> In the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-15 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Status: Open  (was: Patch Available)

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses the datediff UDF via 
> {code}org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more support to 
> GenericUDFToDate by enhancing its parser.
> In the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-13 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: (was: HIVE-22476.4.patch)

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses the datediff UDF via 
> {code}org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more support to 
> GenericUDFToDate by enhancing its parser.
> In the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-13 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.5.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.5.patch, HIVE-22476.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses the datediff UDF via 
> {code}org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more support to 
> GenericUDFToDate by enhancing its parser.
> In the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-13 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.4.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, 
> HIVE-22476.4.patch, HIVE-22476.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses the datediff UDF via 
> {code}org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more support to 
> GenericUDFToDate by enhancing its parser.
> In the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-13 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22492:
--
Attachment: HIVE-22492.patch

> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22492.patch, llap-lock-contention.svg
>
>
> The LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the following profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking 
> contention.
> This trick is a common way to amortize locking, as explained in 
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-13 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973593#comment-16973593
 ] 

Slim Bouguerra commented on HIVE-22492:
---

[~t3rmin4t0r] can you take a look at this? It seems to be very low-hanging fruit.

> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22492.patch, llap-lock-contention.svg
>
>
> The LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the following profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking 
> contention.
> This trick is a common way to amortize locking, as explained in 
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-13 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22492:
--
Status: Patch Available  (was: Open)

> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22492.patch, llap-lock-contention.svg
>
>
> The LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the following profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking 
> contention.
> This trick is a common way to amortize locking, as explained in 
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-13 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.3.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.3.patch, HIVE-22476.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses the datediff UDF via 
> {code}org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more support to 
> GenericUDFToDate by enhancing its parser.
> In the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-13 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22492:
--
Attachment: llap-lock-contention.svg

> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: llap-lock-contention.svg
>
>
> The LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the following profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking 
> contention.
> This trick is a common way to amortize locking, as explained in 
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-13 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973569#comment-16973569
 ] 

Slim Bouguerra commented on HIVE-22492:
---

{code}
IO-Elevator-Thread-9  Blocked CPU usage on sample: 609ms
  org.apache.hadoop.hive.llap.cache.LowLevelLrfuCachePolicy.notifyUnlock(LlapCacheableBuffer) LowLevelLrfuCachePolicy.java:125
  org.apache.hadoop.hive.llap.cache.CacheContentsTracker.notifyUnlock(LlapCacheableBuffer) CacheContentsTracker.java:173
  org.apache.hadoop.hive.llap.cache.LowLevelCacheImpl.unlockBuffer(LlapDataBuffer, boolean) LowLevelCacheImpl.java:391
  org.apache.hadoop.hive.llap.cache.LowLevelCacheImpl.decRefBuffers(List) LowLevelCacheImpl.java:379
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.returnData(Reader$OrcEncodedColumnBatch) OrcEncodedDataReader.java:759
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.returnData(Object) OrcEncodedDataReader.java:110
  org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.returnSourceData(EncodedColumnBatch) EncodedDataConsumer.java:100
  org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedColumnBatch) EncodedDataConsumer.java:92
  org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(Object) EncodedDataConsumer.java:34
  org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(int, StripeInformation, OrcProto$RowIndex[], List, List, boolean[], boolean[], Consumer) EncodedReaderImpl.java:532
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead() OrcEncodedDataReader.java:407
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run() OrcEncodedDataReader.java:266
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run() OrcEncodedDataReader.java:263
  java.security.AccessController.doPrivileged(PrivilegedExceptionAction, AccessControlContext) AccessController.java (native)
  javax.security.auth.Subject.doAs(Subject, PrivilegedExceptionAction) Subject.java:422
  org.apache.hadoop.security.UserGroupInformation.doAs(PrivilegedExceptionAction) UserGroupInformation.java:1688
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal() OrcEncodedDataReader.java:263
  org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal() OrcEncodedDataReader.java:110
  org.apache.tez.common.CallableWithNdc.call() CallableWithNdc.java:36
  org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call() StatsRecordingThreadPool.java:110
  java.util.concurrent.FutureTask.run() FutureTask.java:266
  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1142
  java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:617
  java.lang.Thread.run() Thread.java:745
{code}
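
To make the batching idea concrete, here is a minimal, illustrative Java sketch of a BP-Wrapper-style wrapper. It is not the actual HIVE-22492 patch: the class name, the single shared queue, and the Consumer callback are assumptions made for brevity (a real implementation would use per-thread or striped buffers so the queue itself does not become a contention point). Each access is recorded cheaply; the LRFU accounting only runs when a batch is drained, which happens either when the buffer fills up or when the policy lock happens to be free.

{code:java}
// Illustrative BP-Wrapper-style batching (hypothetical names, not the actual patch).
// Accesses are queued cheaply and applied to the eviction policy in batches,
// so the policy lock is taken once per batch instead of once per access.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

public class BatchingPolicyWrapper<T> {
  private final ArrayBlockingQueue<T> pending;
  private final ReentrantLock policyLock = new ReentrantLock();
  private final Consumer<T> policyNotify; // e.g. the LRFU notifyUnlock accounting

  public BatchingPolicyWrapper(int batchSize, Consumer<T> policyNotify) {
    this.pending = new ArrayBlockingQueue<>(batchSize);
    this.policyNotify = policyNotify;
  }

  /** Called on every buffer unlock; cheap in the common case. */
  public void notifyUnlock(T buffer) {
    while (!pending.offer(buffer)) {
      drain(); // buffer full: this thread pays the cost of one batched update
    }
    if (policyLock.tryLock()) {
      try {
        drainLocked(); // lock happened to be free: drain opportunistically
      } finally {
        policyLock.unlock();
      }
    }
  }

  private void drain() {
    policyLock.lock();
    try {
      drainLocked();
    } finally {
      policyLock.unlock();
    }
  }

  private void drainLocked() {
    T item;
    while ((item = pending.poll()) != null) {
      policyNotify.accept(item); // LRFU bookkeeping runs here, single-threaded
    }
  }
}
{code}

The key property of this sketch is that the per-access cost is an offer into a bounded queue, while the heavier LRFU bookkeeping is paid once per batch under the lock.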

> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
>
> The LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the following profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking 
> contention.
> This trick is a common way to amortize locking, as explained in 
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-22492) Amortize lock contention due to LRFU accounting

2019-11-13 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra reassigned HIVE-22492:
-


> Amortize lock contention due to LRFU accounting
> ---
>
> Key: HIVE-22492
> URL: https://issues.apache.org/jira/browse/HIVE-22492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
>
> The LRFU eviction policy can be a major source of contention under high load.
> This can be seen in the following profiles.
> To fix this, the idea is to use a batching wrapper to amortize the locking 
> contention.
> This trick is a common way to amortize locking, as explained in 
> http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/ding-09-BP-Wrapper.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22476) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-12 Thread Slim Bouguerra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slim Bouguerra updated HIVE-22476:
--
Attachment: HIVE-22476.2.patch

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22476
> URL: https://issues.apache.org/jira/browse/HIVE-22476
> Project: Hive
>  Issue Type: Bug
>Reporter: Slim Bouguerra
>Assignee: Slim Bouguerra
>Priority: Major
> Attachments: HIVE-22476.2.patch, HIVE-22476.patch
>
>
> The actual issue stems from the different date parsers used by various parts 
> of the engine.
> The fetch task uses the datediff UDF via 
> {code}org.apache.hadoop.hive.ql.udf.generic.GenericUDFToDate{code} while the 
> vectorized LLAP execution uses {code}VectorUDFDateDiffScalarCol{code}.
> This fix is meant to be minimally intrusive and adds more support to 
> GenericUDFToDate by enhancing its parser.
> In the longer term it will be better to use one parser for all the operators.
> Thanks [~Rajkumar Singh] for the repro example:
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22459) Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none

2019-11-11 Thread Slim Bouguerra (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972033#comment-16972033
 ] 

Slim Bouguerra commented on HIVE-22459:
---

I think this is essentially the same issue; working on it.

> Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to none
> ---
>
> Key: HIVE-22459
> URL: https://issues.apache.org/jira/browse/HIVE-22459
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: Chiran Ravani
>Priority: Critical
>
> The Hive datediff function provided inconsistent results when 
> hive.fetch.task.conversion is set to more.
> Below is the output, whereas in Hive 1.2 the results are consistent.
> Note: the same query works well on Hive 3 when hive.fetch.task.conversion is 
> set to none.
>  Steps to reproduce the problem.
> {code:java}
> 0: jdbc:hive2://c1113-node2.squadron.support.> select datetimecol from 
> testdatediff where datediff(cast(current_timestamp as string), 
> datetimecol)<183;
> INFO : Compiling 
> command(queryId=hive_20191105103636_1dff22a1-02f3-48a8-b076-0b91272f2268): 
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:datetimecol, type:string, 
> comment:null)], properties:null)
> INFO : Completed compiling 
> command(queryId=hive_20191105103636_1dff22a1-02f3-48a8-b076-0b91272f2268); 
> Time taken: 0.479 seconds
> INFO : Executing 
> command(queryId=hive_20191105103636_1dff22a1-02f3-48a8-b076-0b91272f2268): 
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183
> INFO : Completed executing 
> command(queryId=hive_20191105103636_1dff22a1-02f3-48a8-b076-0b91272f2268); 
> Time taken: 0.013 seconds
> INFO : OK
> +--------------+
> | datetimecol  |
> +--------------+
> | 2019-07-24   |
> +--------------+
> 1 row selected (0.797 seconds)
> 0: jdbc:hive2://c1113-node2.squadron.support.>
> {code}
> After setting fetch task conversion as none.
> {code:java}
> 0: jdbc:hive2://c1113-node2.squadron.support.> set 
> hive.fetch.task.conversion=none;
> No rows affected (0.017 seconds)
> 0: jdbc:hive2://c1113-node2.squadron.support.> set hive.fetch.task.conversion;
> +----------------------------------+
> | set                              |
> +----------------------------------+
> | hive.fetch.task.conversion=none  |
> +----------------------------------+
> 1 row selected (0.015 seconds)
> 0: jdbc:hive2://c1113-node2.squadron.support.> select datetimecol from 
> testdatediff where datediff(cast(current_timestamp as string), 
> datetimecol)<183;
> INFO : Compiling 
> command(queryId=hive_20191105103709_0c38e446-09cf-45dd-9553-365146f42452): 
> select datetimecol from testdatediff where datediff(cast(current_timestamp as 
> string), datetimecol)<183
> +----------------------------+
> | datetimecol                |
> +----------------------------+
> | 2019-09-09T10:45:49+02:00  |
> | 2019-07-24                 |
> +----------------------------+
> 2 rows selected (5.327 seconds)
> 0: jdbc:hive2://c1113-node2.squadron.support.>
> {code}
> Steps to reproduce
> {code}
> create external table testdatediff(datetimecol string) stored as orc;
> insert into testdatediff values ('2019-09-09T10:45:49+02:00'),('2019-07-24');
> select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183;
> set hive.fetch.task.conversion=none;
> select datetimecol from testdatediff where datediff(cast(current_timestamp as string), datetimecol)<183;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

