Re: Review Request 69107: HIVE-20512

2018-11-08 Thread Bharathkrishna Guruvayoor Murali via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/
---

(Updated Nov. 8, 2018, 5:16 p.m.)


Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang Karajgaonkar.


Changes
---

Adding awaitTermination and shutDownNow after cancelling the thread in close().


Repository: hive-git


Description
---

Improve record and memory usage logging in SparkRecordHandler


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
88dd12c05ade417aca4cdaece4448d31d4e1d65f 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
 8880bb604e088755dcfb0bcb39689702fab0cb77 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 
20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 


Diff: https://reviews.apache.org/r/69107/diff/7/

Changes: https://reviews.apache.org/r/69107/diff/6-7/


Testing
---


Thanks,

Bharathkrishna Guruvayoor Murali



Re: Review Request 69107: HIVE-20512

2018-11-07 Thread Bharathkrishna Guruvayoor Murali via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/
---

(Updated Nov. 7, 2018, 8:52 p.m.)


Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang Karajgaonkar.


Changes
---

Adding scheduledFuture.cancel with shutDown


Repository: hive-git


Description
---

Improve record and memory usage logging in SparkRecordHandler


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
88dd12c05ade417aca4cdaece4448d31d4e1d65f 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
 8880bb604e088755dcfb0bcb39689702fab0cb77 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 
20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 


Diff: https://reviews.apache.org/r/69107/diff/6/

Changes: https://reviews.apache.org/r/69107/diff/5-6/


Testing
---


Thanks,

Bharathkrishna Guruvayoor Murali



Re: Review Request 69107: HIVE-20512

2018-11-05 Thread Bharathkrishna Guruvayoor Murali via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/
---

(Updated Nov. 6, 2018, 5:08 a.m.)


Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang Karajgaonkar.


Changes
---

Adding shutDown and awaitTermination. Addressed the review comment.


Repository: hive-git


Description
---

Improve record and memory usage logging in SparkRecordHandler


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
88dd12c05ade417aca4cdaece4448d31d4e1d65f 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
 8880bb604e088755dcfb0bcb39689702fab0cb77 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 
20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 


Diff: https://reviews.apache.org/r/69107/diff/5/

Changes: https://reviews.apache.org/r/69107/diff/4-5/


Testing
---


Thanks,

Bharathkrishna Guruvayoor Murali



Re: Review Request 69107: HIVE-20512

2018-11-05 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review210338
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Lines 135 (patched)


Should call `Thread.currentThread().interrupt();` after this - see 
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html


- Sahil Takiar


On Oct. 31, 2018, 11:15 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated Oct. 31, 2018, 11:15 p.m.)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/4/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 69107: HIVE-20512

2018-10-31 Thread Bharathkrishna Guruvayoor Murali via Review Board


> On Oct. 31, 2018, 5:29 p.m., Antal Sinkovits wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
> > Line 62 (original), 83 (patched)
> > 
> >
> > Probably these three rows can move to the MemoryInfoLogger class  (as 
> > static method), as from now on its his responsibility.
> > 
> > Something like MemoryInfoLogger.start() or schedule

It is used by the close method as well, so I will probably leave it there 
itself.


> On Oct. 31, 2018, 5:29 p.m., Antal Sinkovits wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
> > Line 574 (original), 572 (patched)
> > 
> >
> > Is this correct? In batch processing increasing it with only one?

I see that when I go to implementations of reducer.process(batch, 0);, it 
considers the batch as a single row. And there too they are incrementing 
runTimeRows++
So that is why it is probably done in such a way to increment row count by 1 
itself.


- Bharathkrishna


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review210230
---


On Oct. 26, 2018, 5:13 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated Oct. 26, 2018, 5:13 p.m.)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 69107: HIVE-20512

2018-10-31 Thread Bharathkrishna Guruvayoor Murali via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/
---

(Updated Oct. 31, 2018, 11:15 p.m.)


Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang Karajgaonkar.


Changes
---

Added change for threadfactory to have daemon threads.
Addressed review comments. I am using shutDownNow instead of shutDown because 
we really do not need awaitTermination as shutDownNow will anyways interrupt 
the existing threads, and will not accept new tasks.
For awaitTermination, the main thread has to wait for a fixed amount of time 
which I feel is not really useful here, as we are not executing any critical 
task, we do not care about the logger thread as we will anyways log final count 
and memory usage in close method.


Repository: hive-git


Description
---

Improve record and memory usage logging in SparkRecordHandler


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
88dd12c05ade417aca4cdaece4448d31d4e1d65f 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
 8880bb604e088755dcfb0bcb39689702fab0cb77 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 
20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 


Diff: https://reviews.apache.org/r/69107/diff/4/

Changes: https://reviews.apache.org/r/69107/diff/3-4/


Testing
---


Thanks,

Bharathkrishna Guruvayoor Murali



Re: Review Request 69107: HIVE-20512

2018-10-31 Thread Antal Sinkovits via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review210230
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
Line 93 (original), 93 (patched)


I'm not sure, but shouldnt we call incrementRowNumber from here as well?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Lines 67 (patched)


I think it should be deamon. What do you think?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Line 62 (original), 83 (patched)


Probably these three rows can move to the MemoryInfoLogger class  (as 
static method), as from now on its his responsibility.

Something like MemoryInfoLogger.start() or schedule



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
Line 574 (original), 572 (patched)


Is this correct? In batch processing increasing it with only one?


- Antal Sinkovits


On okt. 26, 2018, 5:13 du, Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated okt. 26, 2018, 5:13 du)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 69107: HIVE-20512

2018-10-26 Thread Bharathkrishna Guruvayoor Murali via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/
---

(Updated Oct. 26, 2018, 5:13 p.m.)


Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang Karajgaonkar.


Repository: hive-git


Description
---

Improve record and memory usage logging in SparkRecordHandler


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
88dd12c05ade417aca4cdaece4448d31d4e1d65f 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
 8880bb604e088755dcfb0bcb39689702fab0cb77 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 
20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 


Diff: https://reviews.apache.org/r/69107/diff/3/

Changes: https://reviews.apache.org/r/69107/diff/2-3/


Testing
---


Thanks,

Bharathkrishna Guruvayoor Murali



Re: Review Request 69107: HIVE-20512

2018-10-25 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review210041
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Line 63 (original), 67 (patched)


intialize this above and mark it as final, since its accessed by the 
MemoryInfoLogger thread it needs to be thread safe.

use a custom `ThreadFactory` for the pool. You can use Guava's 
`ThreadFactoryBuilder` - the pool should use daemon threads, specify a name 
format that includes something like `MemoryAndRowLogger`, and a customer 
uncaught exception handler that should just log any exceptions that are caught



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Lines 113 (patched)


instead of just calling `shutdownNow` you should call `shutdown` and then 
run `awaitTermination` with a wait time of say 30 seconds, and then call 
`shutdownNow`. This allows for orderly shutdown of the executor. All in 
progress tasks are allowed to complete.

this will also require handling the race condition where the 
`MemoryInfoLogger` is tries to schedule a task on a shutdown executor. You will 
probably have to use a a custom `RejectedExecutionHandler` - probably the 
`ThreadPoolExecutor.DiscardPolicy`


- Sahil Takiar


On Oct. 24, 2018, 8:55 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated Oct. 24, 2018, 8:55 p.m.)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 69107: HIVE-20512

2018-10-25 Thread Sahil Takiar


> On Oct. 23, 2018, 7:50 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
> > Line 49 (original), 52 (patched)
> > 
> >
> > i think volatile long is sufficient here and is probably cheaper. 
> > atomics might be expensive when done per row
> 
> Bharathkrishna Guruvayoor Murali wrote:
> I first used volatile, but I replaced it with AtomicLong because the 
> rowNumber needs to be incremented and rowNumber++ on a volatile variable is 
> not considered a safe operation. What do you think about that?

i think volatile should still be fine because there is no contention on the 
variable - e.g. it is only updated by a single thread at a time. as long we 
maintain that invariant we should be fine. would be good to add some javadocs 
saying that we only expect this variable to be updated by a single thread at a 
time.


> On Oct. 23, 2018, 7:50 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
> > Line 50 (original), 53 (patched)
> > 
> >
> > this need to be volatile since it is modified by the timer task
> 
> Bharathkrishna Guruvayoor Murali wrote:
> This variable is also used as 
> logThresholdInterval = Math.min(maxLogThresholdInterval, 2 * 
> logThresholdInterval);
> 
> Non-atomic operation. So should I make this variable atomic as well?

same as above, i think volatile should be ok as long as a single thread access 
logThresholdInterval at a time.


- Sahil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review209935
---


On Oct. 24, 2018, 8:55 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated Oct. 24, 2018, 8:55 p.m.)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 69107: HIVE-20512

2018-10-25 Thread Sahil Takiar


> On Oct. 24, 2018, 8:58 p.m., Bharathkrishna Guruvayoor Murali wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
> > Line 67 (original), 67 (patched)
> > 
> >
> > Creating this as a threadPool of size 1. I guess that is fine, as we 
> > know only one thread will be used at any point?

yes the size should be 1


- Sahil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review209986
---


On Oct. 24, 2018, 8:55 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated Oct. 24, 2018, 8:55 p.m.)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 69107: HIVE-20512

2018-10-24 Thread Bharathkrishna Guruvayoor Murali via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review209986
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Line 67 (original), 67 (patched)


Creating this as a threadPool of size 1. I guess that is fine, as we know 
only one thread will be used at any point?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Line 116 (original), 113 (patched)


Using shutDownNow instead of shutDown to cancel pending tasks


- Bharathkrishna Guruvayoor Murali


On Oct. 24, 2018, 8:55 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated Oct. 24, 2018, 8:55 p.m.)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 69107: HIVE-20512

2018-10-24 Thread Bharathkrishna Guruvayoor Murali via Review Board


> On Oct. 24, 2018, 8:51 a.m., Antal Sinkovits wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
> > Lines 56 (patched)
> > 
> >
> > Any reason why we use Timer?
> > 
> > From the timer java docs: 
> > 
> > "Java 5.0 introduced the java.util.concurrent package and one of the 
> > concurrency utilities therein is the ScheduledThreadPoolExecutor which is a 
> > thread pool for repeatedly executing tasks at a given rate or delay. It is 
> > effectively a more versatile replacement for the Timer/TimerTask 
> > combination, as it allows multiple service threads, accepts various time 
> > units, and doesn't require subclassing TimerTask (just implement Runnable). 
> > Configuring ScheduledThreadPoolExecutor with one thread makes it equivalent 
> > to Timer."

I used timer as there was only one thread, bbut I have changed it to 
ScheduledThreadPoolExecutor, as it looks more extensible


- Bharathkrishna


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review209950
---


On Oct. 24, 2018, 8:55 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated Oct. 24, 2018, 8:55 p.m.)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 69107: HIVE-20512

2018-10-24 Thread Bharathkrishna Guruvayoor Murali via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/
---

(Updated Oct. 24, 2018, 8:55 p.m.)


Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang Karajgaonkar.


Changes
---

Using ThreadPoolExecutor
Addressed some review comments, and have added some comments which I want to be 
specifically reviewed.


Repository: hive-git


Description
---

Improve record and memory usage logging in SparkRecordHandler


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
88dd12c05ade417aca4cdaece4448d31d4e1d65f 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
 8880bb604e088755dcfb0bcb39689702fab0cb77 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 
20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 


Diff: https://reviews.apache.org/r/69107/diff/2/

Changes: https://reviews.apache.org/r/69107/diff/1-2/


Testing
---


Thanks,

Bharathkrishna Guruvayoor Murali



Re: Review Request 69107: HIVE-20512

2018-10-24 Thread Bharathkrishna Guruvayoor Murali via Review Board


> On Oct. 23, 2018, 7:50 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
> > Line 50 (original), 53 (patched)
> > 
> >
> > this need to be volatile since it is modified by the timer task

This variable is also used as 
logThresholdInterval = Math.min(maxLogThresholdInterval, 2 * 
logThresholdInterval);

Non-atomic operation. So should I make this variable atomic as well?


- Bharathkrishna


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review209935
---


On Oct. 20, 2018, 7:13 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated Oct. 20, 2018, 7:13 p.m.)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 69107: HIVE-20512

2018-10-24 Thread Bharathkrishna Guruvayoor Murali via Review Board


> On Oct. 23, 2018, 7:50 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
> > Line 49 (original), 52 (patched)
> > 
> >
> > i think volatile long is sufficient here and is probably cheaper. 
> > atomics might be expensive when done per row

I first used volatile, but I replaced it with AtomicLong because the rowNumber 
needs to be incremented and rowNumber++ on a volatile variable is not 
considered a safe operation. What do you think about that?


- Bharathkrishna


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review209935
---


On Oct. 20, 2018, 7:13 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated Oct. 20, 2018, 7:13 p.m.)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 69107: HIVE-20512

2018-10-24 Thread Antal Sinkovits via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review209950
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Lines 56 (patched)


Any reason why we use Timer?

From the timer java docs: 

"Java 5.0 introduced the java.util.concurrent package and one of the 
concurrency utilities therein is the ScheduledThreadPoolExecutor which is a 
thread pool for repeatedly executing tasks at a given rate or delay. It is 
effectively a more versatile replacement for the Timer/TimerTask combination, 
as it allows multiple service threads, accepts various time units, and doesn't 
require subclassing TimerTask (just implement Runnable). Configuring 
ScheduledThreadPoolExecutor with one thread makes it equivalent to Timer."


- Antal Sinkovits


On okt. 20, 2018, 7:13 du, Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated okt. 20, 2018, 7:13 du)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 69107: HIVE-20512

2018-10-23 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review209935
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java
Line 171 (original), 170 (patched)


move to first line of method



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Line 49 (original), 52 (patched)


i think volatile long is sufficient here and is probably cheaper. atomics 
might be expensive when done per row



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Line 50 (original), 53 (patched)


this need to be volatile since it is modified by the timer task



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Line 52 (original), 55 (patched)


Lets set the max to 15 minutes



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Lines 104 (patched)


I think you can remove these debug statements. They don't look like they 
add much value.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Lines 105 (patched)


might make more sense to schedule the task at the end of the method



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Lines 117 (patched)


nit: remove this



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Lines 105-106 (original), 129-130 (patched)


this looks like the same code that is called in the `MemoryInfoLogger` can 
it be abstracted into its own method?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
Line 601 (original), 597 (patched)


move to top of method


- Sahil Takiar


On Oct. 20, 2018, 7:13 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated Oct. 20, 2018, 7:13 p.m.)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Review Request 69107: HIVE-20512

2018-10-20 Thread Bharathkrishna Guruvayoor Murali via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/
---

Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang Karajgaonkar.


Repository: hive-git


Description
---

Improve record and memory usage logging in SparkRecordHandler


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
88dd12c05ade417aca4cdaece4448d31d4e1d65f 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
 8880bb604e088755dcfb0bcb39689702fab0cb77 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 
20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 


Diff: https://reviews.apache.org/r/69107/diff/1/


Testing
---


Thanks,

Bharathkrishna Guruvayoor Murali