Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Harish Jaiprakash

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218710
---




ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java
Lines 176 (patched)


We expect the dequeue not to have happened by this time, but there is no 
guarantee, since it is another thread. Can we at least add a comment that this 
test can fail intermittently?


- Harish Jaiprakash


On Nov. 19, 2019, 9:13 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 9:13 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses an unbounded queue, which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Harish Jaiprakash

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218709
---



Thanks for the change. This does solve the memory problem and it looks good to 
me.

We need a follow-up JIRA to address why the queue size was 17,000 events. Was 
this HDFS or s3fs? In either case we should add some more optimizations, like:
* if there are a lot of events, batch the flushes to HDFS;
* if it is the one-event-per-file mode, increase parallelism, since writes are not 
happening in different files.

FYI, the dropped events are lost since they are not written to the HDFS file, and DAS will 
not get these events. But that is better than crashing HiveServer2.

- Harish Jaiprakash


On Nov. 19, 2019, 9:13 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 9:13 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses an unbounded queue, which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



[jira] [Created] (HIVE-22515) Support cast to decimal64 in Vectorization

2019-11-19 Thread Ramesh Kumar Thangarajan (Jira)
Ramesh Kumar Thangarajan created HIVE-22515:
---

 Summary: Support cast to decimal64 in Vectorization
 Key: HIVE-22515
 URL: https://issues.apache.org/jira/browse/HIVE-22515
 Project: Hive
  Issue Type: Bug
Reporter: Ramesh Kumar Thangarajan


Support cast to decimal64 in Vectorization



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Slim Bouguerra


> On Nov. 19, 2019, 10:38 p.m., Slim Bouguerra wrote:
> > I recommend using a bounded queue instead of checking the size and doing 
> > the if/else every time.
> > Something like this might work:
> > ```java
> > BlockingQueue<Runnable> linkedBlockingDeque = new LinkedBlockingDeque<>(1);
> > ExecutorService service = new ThreadPoolExecutor(1, 1, 30, TimeUnit.SECONDS,
> >     linkedBlockingDeque, new ThreadPoolExecutor.DiscardPolicy());
> > ```
> 
> Attila Magyar wrote:
> We also have a periodic event which is scheduled with a fixed delay by the 
> ScheduledThreadPoolExecutor. The ThreadPoolExecutor can't do this scheduling; 
> it would require either using two executors (a ThreadPoolExecutor and a 
> ScheduledThreadPoolExecutor) or one executor plus a Timer, plus some 
> synchronization. The queue size check looked like the simplest solution I 
> could find.

I see. Then this makes sense.


- Slim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218697
---


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses an unbounded queue, which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Attila Magyar


> On Nov. 19, 2019, 10:38 p.m., Slim Bouguerra wrote:
> > I recommend using a bounded queue instead of checking the size and doing 
> > the if/else every time.
> > Something like this might work:
> > ```java
> > BlockingQueue<Runnable> linkedBlockingDeque = new LinkedBlockingDeque<>(1);
> > ExecutorService service = new ThreadPoolExecutor(1, 1, 30, TimeUnit.SECONDS,
> >     linkedBlockingDeque, new ThreadPoolExecutor.DiscardPolicy());
> > ```

We also have a periodic event which is scheduled with a fixed delay by the 
ScheduledThreadPoolExecutor. The ThreadPoolExecutor can't do this scheduling; 
it would require either using two executors (a ThreadPoolExecutor and a 
ScheduledThreadPoolExecutor) or one executor plus a Timer, plus some 
synchronization. The queue size check looked like the simplest solution I 
could find.
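
For illustration only, here is a minimal sketch of that kind of manual check (the class, field names, and limit below are placeholders, not the actual patch):

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;

class WriterQueueSketch {
  private final ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
  private final int maxQueueSize = 10_000; // placeholder limit for illustration

  void submitWriterTask(Runnable writerTask) {
    // Manual capacity check: drop the event instead of letting the unbounded queue grow.
    if (executor.getQueue().size() >= maxQueueSize) {
      System.err.println("Writer queue is full, dropping event");
      return;
    }
    executor.submit(writerTask);
  }
}
```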


- Attila


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218697
---


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses an unbounded queue, which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Attila Magyar


> On Nov. 19, 2019, 10:24 p.m., Panos Garefalakis wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
> > Lines 192 (patched)
> > 
> >
> > The solution makes sense to me; however, maybe we need to investigate 
> > further why the queueCapacity change (which is similar to what you are 
> > proposing) was reverted in HIVE-20746?
> > 
> > Also, this log writer is used to track all protobuf messages, right? Is it 
> > acceptable to drop messages here?
> 
> Panos Garefalakis wrote:
> Update: It seems like HIVE-20746 only makes sure that the logFile is closed at 
> the end of the day (even when no events are triggered) -- so the remaining 
> question is whether it is acceptable to start dropping messages here (because even 
> if we drop the messages, the events are still going to happen).

@harishjp is already added to the review, but he is on vacation, so I don't know 
whether he'll respond or not. But my impression is that removing the capacity limit was 
not the goal but a side effect of adding the ScheduledThreadPoolExecutor. 
Before HIVE-20746 we were dropping events; HIVE-20746 was about making sure 
that the file is closed, not about making sure that there are no drops. As far 
as I understand, these events are for external systems like DAS. The event is 
still visible in the log, so they're not lost. But @harishjp will hopefully 
confirm or refute this.


- Attila


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218694
---


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses an unbounded queue, which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Slim Bouguerra

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218697
---



I recommend using a bounded queue instead of checking the size and doing the 
if/else every time. Something like this might work:
```java
BlockingQueue<Runnable> linkedBlockingDeque = new LinkedBlockingDeque<>(1);
ExecutorService service = new ThreadPoolExecutor(1, 1, 30, TimeUnit.SECONDS,
    linkedBlockingDeque, new ThreadPoolExecutor.DiscardPolicy());
```

- Slim Bouguerra


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses an unbounded queue, which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Panos Garefalakis via Review Board


> On Nov. 19, 2019, 10:24 p.m., Panos Garefalakis wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
> > Lines 192 (patched)
> > 
> >
> > The solution makes sense to me; however, maybe we need to investigate 
> > further why the queueCapacity change (which is similar to what you are 
> > proposing) was reverted in HIVE-20746?
> > 
> > Also, this log writer is used to track all protobuf messages, right? Is it 
> > acceptable to drop messages here?

Update: It seems like HIVE-20746 only makes sure that the logFile is closed at the end 
of the day (even when no events are triggered) -- so the remaining question is 
whether it is acceptable to start dropping messages here (because even if we drop the 
messages, the events are still going to happen).


- Panos


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218694
---


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses an unbounded queue, which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Panos Garefalakis via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218694
---




ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
Lines 192 (patched)


The solution makes sense to me; however, maybe we need to investigate 
further why the queueCapacity change (which is similar to what you are 
proposing) was reverted in HIVE-20746?

Also, this log writer is used to track all protobuf messages, right? Is it 
acceptable to drop messages here?


- Panos Garefalakis


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses an unbounded queue, which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Slim Bouguerra


> On Nov. 19, 2019, 6:53 p.m., Slim Bouguerra wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
> > Lines 279 (patched)
> > 
> >
> > I am still not sure how this is going to work.
> > The original code was dropping events when the queue is full; that is 
> > the case where you see the `RejectedExecutionException`.
> 
> Attila Magyar wrote:
> RejectedExecutionException was never thrown with the original code 
> because of the unbounded queue. The queue just kept growing. In the heap 
> dump there were 17,000 elements in the queue in total, taking about 2.5 GB of space.
> 
> Slim Bouguerra wrote:
> Very strange,
> As per the java doc it says ` Executor uses finite bounds for both 
> maximum threads and work queue capacity, and is saturated`
> 
> https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadPoolExecutor.html
> 
> ```
> New tasks submitted in method execute(Runnable) will be rejected when the 
> Executor has been shut down, and also when the Executor uses finite bounds 
> for both maximum threads and work queue capacity, and is saturated. In either 
> case, the execute method invokes the 
> RejectedExecutionHandler.rejectedExecution(Runnable, ThreadPoolExecutor) 
> method of its RejectedExecutionHandler. Four predefined handler policies are 
> provided:
> In the default ThreadPoolExecutor.AbortPolicy, the handler throws a 
> runtime RejectedExecutionException upon rejection.
> In ThreadPoolExecutor.CallerRunsPolicy, the thread that invokes execute 
> itself runs the task. This provides a simple feedback control mechanism that 
> will slow down the rate that new tasks are submitted.
> In ThreadPoolExecutor.DiscardPolicy, a task that cannot be executed is 
> simply dropped.
> In ThreadPoolExecutor.DiscardOldestPolicy, if the executor is not shut 
> down, the task at the head of the work queue is dropped, and then execution 
> is retried (which can fail again, causing this to be repeated.)
> It is possible to define and use other kinds of RejectedExecutionHandler 
> classes. Doing so requires some care especially when policies are designed to 
> work only under particular capacity or queuing policies.
> Hook methods
> ```

You are correct. I did run some tests and I do not see any rejection. Thanks.


- Slim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218679
---


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses an unbounded queue, which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Attila Magyar


> On Nov. 19, 2019, 6:53 p.m., Slim Bouguerra wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
> > Lines 274 (patched)
> > 
> >
> > Looking at the Java code, I see that it is using a bounded queue, so I am not 
> > sure what you mean by unbounded?
> > Can you please clarify?
> 
> Attila Magyar wrote:
> No, unfortunately it uses an unbounded DelayedWorkQueue internally, and 
> it cannot be changed.
> 
> Slim Bouguerra wrote:
> ```java
> /**
>  * Specialized delay queue. To mesh with TPE declarations, this
>  * class must be declared as a BlockingQueue<Runnable> even though
>  * it can only hold RunnableScheduledFutures.
>  */
> static class DelayedWorkQueue extends AbstractQueue<Runnable>
>         implements BlockingQueue<Runnable> {
> ```
> But it is blocking, which means that it should block and therefore we get the 
> RejectedExecutionException.

Yes, but a BlockingQueue can be created without a size limit. It will still 
block if I want to take an element out of an empty queue, but it will never be full, so 
it'll never block when adding an element.

(The implementation might use Integer.MAX_VALUE as a size limit, which is 
practically the same: it'll likely never block, or it'll block too late.)
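
As a quick plain-JDK illustration of that point (not Hive code): a LinkedBlockingQueue created without a capacity reports Integer.MAX_VALUE remaining capacity, so adding elements effectively never blocks.

```java
import java.util.concurrent.LinkedBlockingQueue;

public class UnboundedQueueDemo {
  public static void main(String[] args) {
    LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(); // no capacity given
    System.out.println(queue.remainingCapacity()); // 2147483647 (Integer.MAX_VALUE)
    System.out.println(queue.offer("event"));      // true; offer never blocks in practice
  }
}
```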


- Attila


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218679
---


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses an unbounded queue, which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Slim Bouguerra


> On Nov. 19, 2019, 6:53 p.m., Slim Bouguerra wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
> > Lines 274 (patched)
> > 
> >
> > Looking at the Java code, I see that it is using a bounded queue, so I am not 
> > sure what you mean by unbounded?
> > Can you please clarify?
> 
> Attila Magyar wrote:
> No, unfortunately it uses an unbounded DelayedWorkQueue internally, and 
> it cannot be changed.

```java
/**
 * Specialized delay queue. To mesh with TPE declarations, this
 * class must be declared as a BlockingQueue<Runnable> even though
 * it can only hold RunnableScheduledFutures.
 */
static class DelayedWorkQueue extends AbstractQueue<Runnable>
        implements BlockingQueue<Runnable> {
```
But it is blocking, which means that it should block and therefore we get the 
RejectedExecutionException.


> On Nov. 19, 2019, 6:53 p.m., Slim Bouguerra wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
> > Lines 279 (patched)
> > 
> >
> > I am still not sure how this is going to work.
> > The original code was dropping events when the queue is full; that is 
> > the case where you see the `RejectedExecutionException`.
> 
> Attila Magyar wrote:
> RejectedExecutionException was never thrown with the original code 
> because of the unbounded queue. The queue just kept growing. In the heap 
> dump there were 17,000 elements in the queue in total, taking about 2.5 GB of space.

Very strange.
As per the Java doc, it says `Executor uses finite bounds for both maximum 
threads and work queue capacity, and is saturated`:
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadPoolExecutor.html

```
New tasks submitted in method execute(Runnable) will be rejected when the 
Executor has been shut down, and also when the Executor uses finite bounds for 
both maximum threads and work queue capacity, and is saturated. In either case, 
the execute method invokes the 
RejectedExecutionHandler.rejectedExecution(Runnable, ThreadPoolExecutor) method 
of its RejectedExecutionHandler. Four predefined handler policies are provided:
In the default ThreadPoolExecutor.AbortPolicy, the handler throws a runtime 
RejectedExecutionException upon rejection.
In ThreadPoolExecutor.CallerRunsPolicy, the thread that invokes execute itself 
runs the task. This provides a simple feedback control mechanism that will slow 
down the rate that new tasks are submitted.
In ThreadPoolExecutor.DiscardPolicy, a task that cannot be executed is simply 
dropped.
In ThreadPoolExecutor.DiscardOldestPolicy, if the executor is not shut down, 
the task at the head of the work queue is dropped, and then execution is 
retried (which can fail again, causing this to be repeated.)
It is possible to define and use other kinds of RejectedExecutionHandler 
classes. Doing so requires some care especially when policies are designed to 
work only under particular capacity or queuing policies.
Hook methods
```


- Slim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218679
---


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses an unbounded queue, which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Attila Magyar


> On Nov. 19, 2019, 6:58 p.m., Slim Bouguerra wrote:
> > The description is kind of confusing: is this a leak, or do we have a spike of 
> > overload?
> > A leak means we are not cleaning up resources, and that is why we have an 
> > OOM.
> > What you are describing seems to be a system overload that causes a memory 
> > spike.

Yeah, that's right, calling it an overload might be better than a leak; I'll modify 
the description.


- Attila


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218680
---


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses an unbounded queue, which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Attila Magyar


> On Nov. 19, 2019, 6:53 p.m., Slim Bouguerra wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
> > Line 217 (original), 219 (patched)
> > 
> >
> > How can this solve the issue?
> > It seems like it is doing the same thing:
> > 
> > `Executors.newSingleThreadScheduledExecutor`
> > is the same as what you are doing:
> > ```
> > public static ScheduledExecutorService 
> > newSingleThreadScheduledExecutor(ThreadFactory threadFactory) {
> > return new DelegatedScheduledExecutorService
> > (new ScheduledThreadPoolExecutor(1, threadFactory));
> > }
> > ```

This is not the fix. This change is only needed to get back the proper type so 
that I can invoke the getQueue() method. I can't do that on a 
ScheduledExecutorService, which is the interface returned by 
Executors.newSingleThreadScheduledExecutor. I need the concrete class 
(ScheduledThreadPoolExecutor).
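
Roughly like this (a sketch only, with made-up names):

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;

class QueueSizeSketch {
  // getQueue() is defined on ThreadPoolExecutor, not on the ScheduledExecutorService
  // interface returned by Executors.newSingleThreadScheduledExecutor, so keep the
  // concrete type in the field.
  private final ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);

  int pendingTasks() {
    return executor.getQueue().size();
  }

  ScheduledExecutorService asService() {
    return executor; // still usable through the interface elsewhere
  }
}
```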


> On Nov. 19, 2019, 6:53 p.m., Slim Bouguerra wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
> > Lines 274 (patched)
> > 
> >
> > Looking at the Java code, I see that it is using a bounded queue, so I am not 
> > sure what you mean by unbounded?
> > Can you please clarify?

No, unfortunately it uses an unbounded DelayedWorkQueue internally, and it 
cannot be changed.


> On Nov. 19, 2019, 6:53 p.m., Slim Bouguerra wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
> > Lines 279 (patched)
> > 
> >
> > I am still not sure how this is going to work.
> > The original code was dropping events when the queue is full; that is 
> > the case where you see the `RejectedExecutionException`.

RejectedExecutionException was never thrown with the original code because of 
the unbounded queue. The queue just kept growing. In the heap dump there were 
17,000 elements in the queue in total, taking about 2.5 GB of space.


- Attila


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218679
---


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses an unbounded queue, which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Slim Bouguerra

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218680
---



The description is kind of confusing: is this a leak, or do we have a spike of 
overload?
A leak means we are not cleaning up resources, and that is why we have an OOM.
What you are describing seems to be a system overload that causes a memory spike.

- Slim Bouguerra


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses an unbounded queue, which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Slim Bouguerra

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/#review218679
---




ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
Line 217 (original), 219 (patched)


How can this solve the issue?
It seems like it is doing the same thing:

`Executors.newSingleThreadScheduledExecutor`
is the same as what you are doing:
```java
public static ScheduledExecutorService newSingleThreadScheduledExecutor(ThreadFactory threadFactory) {
    return new DelegatedScheduledExecutorService(
        new ScheduledThreadPoolExecutor(1, threadFactory));
}
```



ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
Lines 274 (patched)


Looking at the Java code, I see that it is using a bounded queue, so I am not sure 
what you mean by unbounded?
Can you please clarify?



ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
Lines 279 (patched)


I am still not sure how this is going to work.
The original code was dropping events when the queue is full; that is the 
case where you see the `RejectedExecutionException`.


- Slim Bouguerra


On Nov. 19, 2019, 3:43 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71784/
> ---
> 
> (Updated Nov. 19, 2019, 3:43 p.m.)
> 
> 
> Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
> Panos Garefalakis.
> 
> 
> Bugs: HIVE-22514
> https://issues.apache.org/jira/browse/HIVE-22514
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer 
> tasks and to periodically handle rollover. The builtin 
> ScheduledThreadPoolExecutor uses an unbounded queue, which cannot be replaced 
> from the outside. If log events are generated at a very fast rate this queue 
> can grow large.
> 
> Since ScheduledThreadPoolExecutor does not support changing the default 
> unbounded queue to a bounded one, the queue capacity is checked manually by 
> the patch.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 8eab54859bf 
>   ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
> 450a0b544d6 
> 
> 
> Diff: https://reviews.apache.org/r/71784/diff/1/
> 
> 
> Testing
> ---
> 
> unittest
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>



Re: Review Request 71708: HIVE-22435

2019-11-19 Thread Krisztian Kasa

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71708/
---

(Updated Nov. 19, 2019, 4:25 p.m.)


Review request for hive, Gopal V, Jesús Camacho Rodríguez, Zoltan Haindrich, 
and Rajesh Balamohan.


Bugs: HIVE-20148 and HIVE-22435
https://issues.apache.org/jira/browse/HIVE-20148
https://issues.apache.org/jira/browse/HIVE-22435


Repository: hive-git


Description
---

Exception when using VectorTopNKeyOperator operator
===

VectorTopNKeyOperator extends TopNKeyOperator and calls its 
super.initializeOp method
https://github.com/apache/hive/blob/5c8392468cb581f53b6cb55d201fc933dca025e3/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java#L71

which focuses on non-vectorized execution.

Fix: Derive VectorTopNKeyOperator from Operator instead of TopNKeyOperator and 
do the initialization there:
- Use VectorHashKeyWrapperBatch to store the current batch keys
- Add support for NULLS FIRST/LAST (a comparator sketch follows this list)
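
For illustration, a generic sketch of the NULLS FIRST/LAST handling (not the actual VectorHashKeyWrapperGeneralComparator; the names are made up):

```java
import java.util.Comparator;

/** Sketch: wrap a comparator so that nulls sort first or last. */
final class NullOrderingComparator<T> implements Comparator<T> {
  private final Comparator<T> delegate;
  private final boolean nullsFirst;

  NullOrderingComparator(Comparator<T> delegate, boolean nullsFirst) {
    this.delegate = delegate;
    this.nullsFirst = nullsFirst;
  }

  @Override
  public int compare(T a, T b) {
    if (a == null && b == null) return 0;
    if (a == null) return nullsFirst ? -1 : 1;
    if (b == null) return nullsFirst ? 1 : -1;
    return delegate.compare(a, b);
  }
}
```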


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 2918a6852c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java d16500ef05 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 
c80bc804a2 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperBatch.java
 dd31991d03 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneralComparator.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
4cc02b4975 
  ql/src/test/queries/clientpositive/topnkey.q 283f426f18 
  ql/src/test/queries/clientpositive/topnkey_order_null.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vector_topnkey.q e1b7d26afe 
  ql/src/test/results/clientpositive/llap/topnkey.q.out cd47e9d223 
  ql/src/test/results/clientpositive/llap/topnkey_order_null.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/llap/vector_topnkey.q.out d859270ff0 
  ql/src/test/results/clientpositive/tez/topnkey.q.out 206c0c805d 
  ql/src/test/results/clientpositive/tez/topnkey_order_null.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/tez/vector_topnkey.q.out b6760db156 
  ql/src/test/results/clientpositive/topnkey.q.out 66d0eca989 
  ql/src/test/results/clientpositive/topnkey_order_null.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vector_topnkey.q.out 3438be2dc0 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectComparator.java
 PRE-CREATION 


Diff: https://reviews.apache.org/r/71708/diff/7/

Changes: https://reviews.apache.org/r/71708/diff/6-7/


Testing
---

run q test: vector_topnkey and limit_pushdown3 after applying the patch for 
TopNKey pushdown 
(https://issues.apache.org/jira/secure/attachment/12984389/HIVE-20150.15.patch)


Thanks,

Krisztian Kasa



Review Request 71784: HiveProtoLoggingHook might leak memory

2019-11-19 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/
---

Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
Panos Garefalakis.


Bugs: HIVE-22514
https://issues.apache.org/jira/browse/HIVE-22514


Repository: hive-git


Description
---

HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer tasks 
and to periodically handle rollover. The builtin ScheduledThreadPoolExecutor 
uses an unbounded queue, which cannot be replaced from the outside. If log events 
are generated at a very fast rate this queue can grow large.

Since ScheduledThreadPoolExecutor does not support changing the default 
unbounded queue to a bounded one, the queue capacity is checked manually by the 
patch.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
  ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
8eab54859bf 
  ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
450a0b544d6 


Diff: https://reviews.apache.org/r/71784/diff/1/


Testing
---

unittest


Thanks,

Attila Magyar



[jira] [Created] (HIVE-22514) HiveProtoLoggingHook might leak memory

2019-11-19 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-22514:


 Summary: HiveProtoLoggingHook might leak memory
 Key: HIVE-22514
 URL: https://issues.apache.org/jira/browse/HIVE-22514
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 4.0.0
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0
 Attachments: Screen Shot 2019-11-18 at 2.19.24 PM.png

HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer tasks 
and to periodically handle rollover. The builtin ScheduledThreadPoolExecutor 
uses an unbounded queue, which cannot be replaced from the outside. If log events 
are generated at a very fast rate this queue can grow large.

!Screen Shot 2019-11-18 at 2.19.24 PM.png|width=650,height=101!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22513) Constant propagation of casted column in filter ops can cause incorrect results

2019-11-19 Thread Jira
Ádám Szita created HIVE-22513:
-

 Summary: Constant propagation of casted column in filter ops can 
cause incorrect results
 Key: HIVE-22513
 URL: https://issues.apache.org/jira/browse/HIVE-22513
 Project: Hive
  Issue Type: Bug
Reporter: Ádám Szita
Assignee: Ádám Szita


This issue happens if CBO is disabled.

We should not be propagating constants if the corresponding ExprNodeColumnDesc 
instance is wrapped inside a CAST operator as casting might truncate 
information from the original column.

This can happen if we're using CAST in a WHERE clause, which will cause the 
projected columns to be replaced in a SELECT operator. Their new value will be 
the result of casting which could be a different value compared to that in the 
original column:
{code:java}
set hive.cbo.enable=false;
set hive.fetch.task.conversion=more; --just for testing convenience
create table testtb (id string);
insert into testtb values('2019-11-05 01:01:11');
select id, CAST(id AS VARCHAR(10)) from testtb where CAST(id AS VARCHAR(9)) = 
'2019-11-0';

+------------+------------+
|     id     |    _c1     |
+------------+------------+
| 2019-11-0  | 2019-11-0  |
+------------+------------+
1 row selected (0.168 seconds)

-- VS expected: 2019-11-05 01:01:11 | 2019-11-05 {code}
Which types of casting (from and to which types) cause information loss is 
hard to properly keep track of, and I don't think it should be taken into 
consideration when deciding whether or not to propagate a constant. Rather than 
adding a big and potentially convoluted and fragile check for this, I propose 
to prevent constant mappings from being spawned out of CASTed columns.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: ORC: duplicate record - rowid meaning ?

2019-11-19 Thread David Morin
Here are more details about the ORC content and the fact that we have duplicate
rows:

/delta_0011365_0011365_/bucket_3

{"operation":0,"originalTransaction":11365,"bucket":3,"rowId":0,"currentTransaction":11365,"row":{"TS":1574156027915254212,"cle":5218,...}}
{"operation":0,"originalTransaction":11365,"bucket":3,"rowId":1,"currentTransaction":11365,"row":{"TS":1574156027915075038,"cle":5216,...}}


/delta_0011368_0011368_/bucket_3

{"operation":2,"originalTransaction":11365,"bucket":3,"rowId":1,"currentTransaction":11368,"row":null}
{"operation":2,"originalTransaction":11365,"bucket":3,"rowId":0,"currentTransaction":11368,"row":null}

/delta_0011369_0011369_/bucket_3

{"operation":0,"originalTransaction":11369,"bucket":3,"rowId":1,"currentTransaction":11369,"row":{"TS":1574157407855174144,"cle":5216,...}}
{"operation":0,"originalTransaction":11369,"bucket":3,"rowId":0,"currentTransaction":11369,"row":{"TS":1574157407855265906,"cle":5218,...}}

+-+---+--+
| row__id |  cle  |
+-+---+--+
| {"transactionid":11367,"bucketid":0,"rowid":0}  | 5209  |
| {"transactionid":11369,"bucketid":0,"rowid":0}  | 5211  |
| {"transactionid":11369,"bucketid":1,"rowid":0}  | 5210  |
| {"transactionid":11369,"bucketid":2,"rowid":0}  | 5214  |
| {"transactionid":11369,"bucketid":2,"rowid":1}  | 5215  |
| {"transactionid":11365,"bucketid":3,"rowid":0}  | *5218*  |
| {"transactionid":11365,"bucketid":3,"rowid":1}  | *5216*  |
| {"transactionid":11369,"bucketid":3,"rowid":1}  | *5216*  |
| {"transactionid":11369,"bucketid":3,"rowid":0}  | *5218*  |
| {"transactionid":11369,"bucketid":4,"rowid":0}  | 5217  |
| {"transactionid":11369,"bucketid":4,"rowid":1}  | 5213  |
| {"transactionid":11369,"bucketid":7,"rowid":0}  | 5212  |
+-+---+--+

As you can see, we have duplicate rows for column "cle" values 5216 and 5218.
Do we have to keep the rowids ordered? Because this is the only difference
I have noticed based on some tests with beeline.

Thanks



On Tue, Nov. 19, 2019 at 00:18, David Morin wrote:

> Hello,
>
> I'm trying to understand the purpose of the rowid column inside ORC delta
> file
> {"transactionid":11359,"bucketid":5,"*rowid*":0}
> Orc view: {"operation":0,"originalTransaction":11359,"bucket":5,"*rowId*
> ":0,"currentTransaction":11359,"row":...}
> I use HDP 2.6 => Hive 2
>
> If I want to be idempotent with INSERT / DELETE / INSERT,
> do we have to keep the same rowid?
> It seems that when the rowid is changed during the second INSERT, I have a
> duplicate row.
> I can create a new rowid for the new transaction during the second
> INSERT, but that seems to generate duplicate records.
>
> Regards,
> David
>
>
>
>


[jira] [Created] (HIVE-22512) Use direct SQL to fetch column privileges in refreshPrivileges

2019-11-19 Thread Ashutosh Bapat (Jira)
Ashutosh Bapat created HIVE-22512:
-

 Summary: Use direct SQL to fetch column privileges in 
refreshPrivileges
 Key: HIVE-22512
 URL: https://issues.apache.org/jira/browse/HIVE-22512
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 4.0.0
Reporter: Ashutosh Bapat
Assignee: Ashutosh Bapat


refreshPrivileges() calls listTableAllColumnGrants() to fetch the column-level 
privileges. The latter function retrieves the individual column objects by 
firing one query per column privilege object, thus causing the backend DB to be 
swamped by these queries when PrivilegeSynchronizer is run. 
PrivilegeSynchronizer synchronizes the privileges of all databases, tables and 
columns, and thus the backend DB can get swamped really badly when there are 
thousands of tables with hundreds of columns.

The output of listTableAllColumnGrants() is not used completely, so all the 
columns the PM has retrieved go to waste anyway.

Fix this by using direct SQL to fetch column privileges.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22511) Fix case of Month token in datetime to string conversion

2019-11-19 Thread Gabor Kaszab (Jira)
Gabor Kaszab created HIVE-22511:
---

 Summary: Fix case of Month token in datetime to string conversion
 Key: HIVE-22511
 URL: https://issues.apache.org/jira/browse/HIVE-22511
 Project: Hive
  Issue Type: Bug
Reporter: Gabor Kaszab


Currently Hive doesn't allow month tokens with mixed-case spellings like 'MONth', 
'mONTH', etc. However, Oracle does, and Hive should follow that approach.

The rules:
- If the first letter is lowercase then the output is lowercase: 'mONTH' -> 
'may'
- If the first two letters are uppercase then the output is uppercase: 'MOnth' 
-> 'MAY'
- If the first letter is uppercase and the second is lowercase then the output 
is capitalized: 'Month' -> 'May'.

Oracle:
{code:java}
select to_char(to_timestamp('2019-05-10', 'YYYY-MM-DD'), 'MOnth') from DUAL;
MAY  2019

select to_char(to_timestamp('2019-05-10', 'YYYY-MM-DD'), 'mONTH') from DUAL;
may  2019

select to_char(to_timestamp('2019-05-10', 'YYYY-MM-DD'), 'MoNTH') from DUAL;
May  2019
{code}

Please check the same for 'Name of the day' tokens.
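
For illustration, the case rule above could be implemented roughly like this (a sketch only, not Hive's actual formatter code; monthName is assumed to already be capitalized, e.g. "May"):
{code:java}
// Sketch: derive the output case of the month name from the pattern token's first two letters.
static String applyTokenCase(String token, String monthName) {
  char first = token.charAt(0);
  char second = token.length() > 1 ? token.charAt(1) : first;
  if (Character.isLowerCase(first)) {
    return monthName.toLowerCase();   // 'mONTH' -> "may"
  }
  if (Character.isUpperCase(second)) {
    return monthName.toUpperCase();   // 'MOnth' -> "MAY"
  }
  return monthName;                   // 'Month' -> "May"
}
{code}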




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 71708: HIVE-22435

2019-11-19 Thread Krisztian Kasa

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71708/
---

(Updated Nov. 19, 2019, 8:39 a.m.)


Review request for hive, Gopal V, Jesús Camacho Rodríguez, Zoltan Haindrich, 
and Rajesh Balamohan.


Bugs: HIVE-20148 and HIVE-22435
https://issues.apache.org/jira/browse/HIVE-20148
https://issues.apache.org/jira/browse/HIVE-22435


Repository: hive-git


Description
---

Exception when using VectorTopNKeyOperator operator
===

VectorTopNKeyOperator extends TopNKeyOperator and calls its 
super.initializeOp method
https://github.com/apache/hive/blob/5c8392468cb581f53b6cb55d201fc933dca025e3/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java#L71

which focuses on non-vectorized execution.

Fix: Derive VectorTopNKeyOperator from Operator instead of TopNKeyOperator and 
do the initialization there:
- Use VectorHashKeyWrapperBatch to store the current batch keys
- Add support for NULLS FIRST/LAST


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 2918a6852c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyFilter.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java d16500ef05 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 
c80bc804a2 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperBatch.java
 dd31991d03 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneralComparator.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
4cc02b4975 
  ql/src/test/queries/clientpositive/topnkey.q 283f426f18 
  ql/src/test/queries/clientpositive/topnkey_order_null.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vector_topnkey.q e1b7d26afe 
  ql/src/test/results/clientpositive/llap/topnkey.q.out cd47e9d223 
  ql/src/test/results/clientpositive/llap/topnkey_order_null.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/llap/vector_topnkey.q.out d859270ff0 
  ql/src/test/results/clientpositive/tez/topnkey.q.out 206c0c805d 
  ql/src/test/results/clientpositive/tez/topnkey_order_null.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/tez/vector_topnkey.q.out b6760db156 
  ql/src/test/results/clientpositive/topnkey.q.out 66d0eca989 
  ql/src/test/results/clientpositive/topnkey_order_null.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vector_topnkey.q.out 3438be2dc0 
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectComparator.java
 PRE-CREATION 


Diff: https://reviews.apache.org/r/71708/diff/6/

Changes: https://reviews.apache.org/r/71708/diff/5-6/


Testing
---

run q test: vector_topnkey and limit_pushdown3 after applying the patch for 
TopNKey pushdown 
(https://issues.apache.org/jira/secure/attachment/12984389/HIVE-20150.15.patch)


Thanks,

Krisztian Kasa