Re: [ANNOUNCE] New committer: Bharathkrishna Guruvayoor Murali

2018-12-02 Thread Sahil Takiar
Congrats Bharath!

On Sun, Dec 2, 2018 at 11:14 AM Andrew Sherman
 wrote:

> Congratulations Bharath!
>
> On Sat, Dec 1, 2018 at 10:26 AM Ashutosh Chauhan 
> wrote:
>
> > Apache Hive's Project Management Committee (PMC) has invited
> > Bharathkrishna
> > Guruvayoor Murali to become a committer, and we are pleased to announce
> > that
> > he has accepted.
> >
> > Bharath, welcome, thank you for your contributions, and we look forward to
> > your further interactions with the community!
> >
> > Ashutosh Chauhan (on behalf of the Apache Hive PMC)
> >
>


-- 
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309


Re: Review Request 69107: HIVE-20512

2018-11-05 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review210338
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Lines 135 (patched)
<https://reviews.apache.org/r/69107/#comment294997>

Should call `Thread.currentThread().interrupt();` after this - see 
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html
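
For illustration, a minimal sketch of re-asserting the interrupt flag around an `ExecutorService` wait; the `InterruptExample` class and `awaitQuietly` helper names are hypothetical, not from the patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class InterruptExample {

    /**
     * Waits for the executor to terminate. If the wait is interrupted, the
     * interrupt status is restored so callers further up the stack can see it.
     */
    public static boolean awaitQuietly(ExecutorService executor, long seconds) {
        try {
            return executor.awaitTermination(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            // Re-assert the flag that awaitTermination cleared when it threw.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.shutdown();
        System.out.println(awaitQuietly(pool, 1));  // true: nothing left to run
    }
}
```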


- Sahil Takiar


On Oct. 31, 2018, 11:15 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated Oct. 31, 2018, 11:15 p.m.)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/4/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



[jira] [Created] (HIVE-20828) Upgrade to Spark 2.4.0

2018-10-29 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20828:
---

 Summary: Upgrade to Spark 2.4.0
 Key: HIVE-20828
 URL: https://issues.apache.org/jira/browse/HIVE-20828
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Spark is in the process of releasing Spark 2.4.0. We should do some testing 
with the release candidates and then upgrade once the release is finalized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20809) Parse Spark error blacklist errors

2018-10-25 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20809:
---

 Summary: Parse Spark error blacklist errors
 Key: HIVE-20809
 URL: https://issues.apache.org/jira/browse/HIVE-20809
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar


Spark has an executor blacklist feature that throws errors similar to the 
following:

{code}
Aborting TaskSet 52.0 because task 0 (partition 0) cannot run anywhere due to 
node and executor blacklist.  Blacklisting behavior can be configured via 
spark.blacklist.*.
{code}

I think the message changed in Spark 2.4.0, but it's similar to the one above.

It would be good to have some custom parsing logic and a custom {{ErrorMsg}} for 
Spark blacklist errors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 69107: HIVE-20512

2018-10-25 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review210041
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Line 63 (original), 67 (patched)
<https://reviews.apache.org/r/69107/#comment294681>

Initialize this above and mark it as final; since it's accessed by the 
MemoryInfoLogger thread, it needs to be thread safe.

Use a custom `ThreadFactory` for the pool. You can use Guava's 
`ThreadFactoryBuilder` - the pool should use daemon threads, specify a name 
format that includes something like `MemoryAndRowLogger`, and a custom 
uncaught exception handler that just logs any exceptions that are caught.
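
Guava isn't required to see the shape of such a factory; here is a stdlib-only sketch with the same three properties. The `LoggerThreadFactory` class name is made up for illustration - only the `MemoryAndRowLogger` name format comes from the comment above:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class LoggerThreadFactory implements ThreadFactory {
    private final AtomicInteger count = new AtomicInteger();

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, "MemoryAndRowLogger-" + count.getAndIncrement());
        // Daemon threads never keep the JVM alive if the handler forgets to shut down.
        t.setDaemon(true);
        // Log (here: print) anything that escapes the task instead of dying silently.
        t.setUncaughtExceptionHandler((thread, ex) ->
                System.err.println("Uncaught exception in " + thread.getName() + ": " + ex));
        return t;
    }

    public static void main(String[] args) {
        ScheduledExecutorService pool =
                Executors.newSingleThreadScheduledExecutor(new LoggerThreadFactory());
        pool.submit(() -> System.out.println(Thread.currentThread().getName()));
        pool.shutdown();
    }
}
```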



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Lines 113 (patched)
<https://reviews.apache.org/r/69107/#comment294682>

Instead of just calling `shutdownNow`, you should call `shutdown`, then run 
`awaitTermination` with a wait time of, say, 30 seconds, and then call 
`shutdownNow`. This allows for an orderly shutdown of the executor: all 
in-progress tasks are allowed to complete.

This will also require handling the race condition where the 
`MemoryInfoLogger` tries to schedule a task on a shut-down executor. You will 
probably have to use a custom `RejectedExecutionHandler` - probably 
`ThreadPoolExecutor.DiscardPolicy`.
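
The shutdown sequence described above can be sketched as follows; the `OrderlyShutdown` class and `close` method names are illustrative, not from the patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class OrderlyShutdown {

    /** Shut down gracefully: stop accepting work, wait, then force-cancel stragglers. */
    public static void close(ExecutorService executor) {
        executor.shutdown();                 // no new tasks accepted, running ones continue
        try {
            if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
                executor.shutdownNow();      // interrupt tasks still running after the grace period
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // DiscardPolicy silently drops tasks submitted after shutdown instead of
        // throwing RejectedExecutionException - this handles the race where the
        // logger tries to reschedule itself on a closed executor.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(), new ThreadPoolExecutor.DiscardPolicy());
        close(pool);
        pool.execute(() -> System.out.println("discarded"));  // dropped, no exception
        System.out.println(pool.isShutdown());
    }
}
```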


- Sahil Takiar


On Oct. 24, 2018, 8:55 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated Oct. 24, 2018, 8:55 p.m.)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 69107: HIVE-20512

2018-10-25 Thread Sahil Takiar


> On Oct. 23, 2018, 7:50 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
> > Line 49 (original), 52 (patched)
> > <https://reviews.apache.org/r/69107/diff/1/?file=2101701#file2101701line52>
> >
> > i think volatile long is sufficient here and is probably cheaper. 
> > atomics might be expensive when done per row
> 
> Bharathkrishna Guruvayoor Murali wrote:
> I first used volatile, but I replaced it with AtomicLong because the 
> rowNumber needs to be incremented and rowNumber++ on a volatile variable is 
> not considered a safe operation. What do you think about that?

i think volatile should still be fine because there is no contention on the 
variable - i.e. it is only updated by a single thread at a time. as long as we 
maintain that invariant we should be fine. it would be good to add some javadocs 
saying that we only expect this variable to be updated by a single thread at a 
time.
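
The single-writer pattern being discussed looks roughly like this sketch; `RowCounter` is a made-up illustration, not the actual `SparkRecordHandler` code:

```java
public class RowCounter {

    /**
     * Number of rows processed. Only the record-processing thread updates this;
     * the logger thread only reads it, so a volatile long is enough - no atomic
     * read-modify-write ever happens across threads.
     */
    private volatile long rowNumber = 0;

    void processRow() {
        rowNumber++;  // safe: single writer; volatile guarantees visibility to readers
    }

    long getRowNumber() {
        return rowNumber;
    }

    public static void main(String[] args) {
        RowCounter c = new RowCounter();
        for (int i = 0; i < 1000; i++) {
            c.processRow();
        }
        System.out.println(c.getRowNumber());  // 1000
    }
}
```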


> On Oct. 23, 2018, 7:50 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
> > Line 50 (original), 53 (patched)
> > <https://reviews.apache.org/r/69107/diff/1/?file=2101701#file2101701line53>
> >
> > this needs to be volatile since it is modified by the timer task
> 
> Bharathkrishna Guruvayoor Murali wrote:
> This variable is also used as 
> logThresholdInterval = Math.min(maxLogThresholdInterval, 2 * 
> logThresholdInterval);
> 
> Non-atomic operation. So should I make this variable atomic as well?

same as above, i think volatile should be ok as long as a single thread 
accesses logThresholdInterval at a time.


- Sahil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review209935
---


On Oct. 24, 2018, 8:55 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> -------
> 
> (Updated Oct. 24, 2018, 8:55 p.m.)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 69107: HIVE-20512

2018-10-25 Thread Sahil Takiar


> On Oct. 24, 2018, 8:58 p.m., Bharathkrishna Guruvayoor Murali wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
> > Line 67 (original), 67 (patched)
> > <https://reviews.apache.org/r/69107/diff/1-2/?file=2101701#file2101701line67>
> >
> > Creating this as a threadPool of size 1. I guess that is fine, as we 
> > know only one thread will be used at any point?

yes the size should be 1


- Sahil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review209986
---


On Oct. 24, 2018, 8:55 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated Oct. 24, 2018, 8:55 p.m.)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



[jira] [Created] (HIVE-20790) SparkSession should be able to close a session while it is being opened

2018-10-23 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20790:
---

 Summary: SparkSession should be able to close a session while it 
is being opened
 Key: HIVE-20790
 URL: https://issues.apache.org/jira/browse/HIVE-20790
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar


In HIVE-14162 we added locks to {{SparkSessionImpl}} to support scenarios 
where we want to close the session due to a timeout. However, the locks remove 
the ability to close a session while it is being opened. This is important to 
allow cancelling a session while it is being set up, which can be useful on 
busy clusters where there may not be enough YARN containers to set up the 
Spark Remote Driver.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 69107: HIVE-20512

2018-10-23 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69107/#review209935
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java
Line 171 (original), 170 (patched)
<https://reviews.apache.org/r/69107/#comment294574>

move to first line of method



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Line 49 (original), 52 (patched)
<https://reviews.apache.org/r/69107/#comment294578>

i think volatile long is sufficient here and is probably cheaper. atomics 
might be expensive when done per row



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Line 50 (original), 53 (patched)
<https://reviews.apache.org/r/69107/#comment294580>

this needs to be volatile since it is modified by the timer task



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Line 52 (original), 55 (patched)
<https://reviews.apache.org/r/69107/#comment294575>

Lets set the max to 15 minutes



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Lines 104 (patched)
<https://reviews.apache.org/r/69107/#comment294576>

I think you can remove these debug statements. They don't look like they 
add much value.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Lines 105 (patched)
<https://reviews.apache.org/r/69107/#comment294579>

might make more sense to schedule the task at the end of the method



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Lines 117 (patched)
<https://reviews.apache.org/r/69107/#comment294581>

nit: remove this



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java
Lines 105-106 (original), 129-130 (patched)
<https://reviews.apache.org/r/69107/#comment294582>

this looks like the same code that is called in the `MemoryInfoLogger` - can 
it be abstracted into its own method?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
Line 601 (original), 597 (patched)
<https://reviews.apache.org/r/69107/#comment294577>

move to top of method


- Sahil Takiar


On Oct. 20, 2018, 7:13 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69107/
> ---
> 
> (Updated Oct. 20, 2018, 7:13 p.m.)
> 
> 
> Review request for hive, Antal Sinkovits, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Improve record and memory usage logging in SparkRecordHandler
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 
> 88dd12c05ade417aca4cdaece4448d31d4e1d65f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMergeFileRecordHandler.java
>  8880bb604e088755dcfb0bcb39689702fab0cb77 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 
> cb5bd7ada2d5ad4f1f654cf80ddaf4504be5d035 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
>  20e7ea0f4e8d4ff79dddeaab0406fc7350d22bd7 
> 
> 
> Diff: https://reviews.apache.org/r/69107/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 69022: HIVE-20737: Local SparkContext is shared between user sessions and should be closed only when there is no active

2018-10-18 Thread Sahil Takiar


> On Oct. 16, 2018, 1:47 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java
> > Line 73 (original), 73 (patched)
> > <https://reviews.apache.org/r/69022/diff/2/?file=2097990#file2097990line75>
> >
> > if we expect multiple sessions to access this, should we make this 
> > `volatile`?
> 
> denys kuzmenko wrote:
> it's being accessed only inside of the critical section (within the lock 
> boundaries)
> 
> Sahil Takiar wrote:
> does java guarantee that non-volatile variables accessed inside a 
> critical section are not cached locally by a CPU?
> 
> denys kuzmenko wrote:
> In short - yes.
> 
> JSR 133 (Java Memory Model)
> 
> Synchronization ensures that memory writes by a thread before or during a 
> synchronized block are made visible in a predictable manner to other threads 
> which synchronize on the same monitor. After we exit a synchronized block, we 
> release the monitor, which has the effect of flushing the cache to main 
> memory, so that writes made by this thread can be visible to other threads. 
> Before we can enter a synchronized block, we acquire the monitor, which has 
> the effect of invalidating the local processor cache so that variables will 
> be reloaded from main memory. We will then be able to see all of the writes 
> made visible by the previous release.

makes sense
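
The monitor acquire/release visibility guarantee quoted above can be demonstrated with a minimal sketch; the class name is illustrative:

```java
public class VisibilityExample {
    private final Object lock = new Object();
    private int sharedValue;  // non-volatile, but only touched inside synchronized blocks

    void write(int v) {
        synchronized (lock) {   // releasing the monitor publishes the write
            sharedValue = v;
        }
    }

    int read() {
        synchronized (lock) {   // acquiring the same monitor sees prior published writes
            return sharedValue;
        }
    }

    public static void main(String[] args) throws Exception {
        VisibilityExample e = new VisibilityExample();
        Thread writer = new Thread(() -> e.write(42));
        writer.start();
        writer.join();
        System.out.println(e.read());  // 42
    }
}
```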


> On Oct. 16, 2018, 1:47 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java
> > Line 125 (original), 120 (patched)
> > <https://reviews.apache.org/r/69022/diff/2/?file=2097991#file2097991line125>
> >
> > we might need to re-think how we are synchronizing this method a bit. I 
> > think we want to support the use case where we call `close()` while 
> > `open()` is being run. This offers a way for the user to cancel a session 
> > while it is being opened, which can be useful if opening a session takes a 
> > long time, which can happen on a busy cluster where there aren't enough 
> > resources to open a session.
> > 
> > fixing that might be out of the scope of this JIRA, so I would 
> > recommend using a separate lock to guard against multiple users calling 
> > open on the same session.
> 
> Sahil Takiar wrote:
> Tracking the aforementioned fix in HIVE-20519, unless you want to fix it 
> in this patch.
> 
> denys kuzmenko wrote:
> i think it should be addressed in another JIRA, right now we need to have 
> working at least basic use-case
> 
> Sahil Takiar wrote:
> okay, still recommend using a separate lock
> 
> denys kuzmenko wrote:
> open() and close() both manipulate the shared variable (isOpen), so 
> they have to be synchronized on the same monitor (at least in the current approach).
> I am not sure whether SparkContext supports instant interruption 
> (Thread.interrupt or sc.stop()). However when closing session that is in 
> progress, you have to take care of SparkContext.

yes, but we need to support calling `close()` while `open()` is being run, as I 
described in my first comment. the (2) bullet point in the RB description 
states that you want to guard against multiple callers invoking the `open()` 
method, so logically you should just have a single lock to handle this 
behavior. i suggest you use a separate lock for this scenario because the 
`closeLock` is meant to handle synchronization of the `close()` method, it was 
not meant to handle synchronization of the `open()` method by multiple callers. 
IMO by re-using the lock we make the code harder to understand because we are 
using the `closeLock` for functionality that should be outside its scope.

I don't feel strongly about this though given we will need to re-factor this 
code later anyway to support calling `close()` while `open()` is running, so 
I'll leave it up to you.


> On Oct. 16, 2018, 1:47 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java
> > Line 352 (original)
> > <https://reviews.apache.org/r/69022/diff/2/?file=2097991#file2097991line361>
> >
> > why remove this?
> 
> denys kuzmenko wrote:
> it's not required. close() method is covered with the lock, and 
> activeJobs is a concurrent collection
> 
> Sahil Takiar wrote:
> what happens if a job is submitted after `hasTimedOut` returns true?
> 
> denys kuzmenko wrote:
> I see. However, the existing lock won't help as it doesn't prevent other 
> threads from adding new queries. 
> 
> public void onQuerySubmission(String queryId) {
> activeJobs.add(queryId);
> }
> 
> we might

Re: Review Request 68474: HIVE-20440

2018-10-16 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209628
---



Could we add some more E2E integration tests? I'm thinking they could be at the 
granularity of a `MapJoinOperator` - for example, confirming that starting a new 
query actually evicts everything from the cache. We want to make sure we aren't 
accidentally leaking small tables.

- Sahil Takiar


On Oct. 10, 2018, 1:20 p.m., Antal Sinkovits wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68474/
> ---
> 
> (Updated Oct. 10, 2018, 1:20 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu 
> Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> I've modified the SmallTableCache to use guava cache, with soft references.
> By using a value loader, I've also eliminated the synchronization on the 
> intern-ed string of the path.
> 
> 
> Diffs
> -
> 
>   ql/pom.xml d73deba440702ec39fc5610df28e0fe54baef025 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 
> da1dd426c9155290e30fd1e3ae7f19a5479a8967 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java
>  9e65fd98d6e4451421641b1429ccf334fe9a9586 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java
>  54377428eafdb79e1bbdc8a182eafb46f8febd23 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java
>  0e4b8df036724bd83e85fc3cc70f534272dab4c4 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java
>  74e0b120ea3560a6a2a0074e6c0026b4874b3d5e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java
>  24b8fea33815867ce544fd284437c4d02a21f1a3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java 
> cf27e92bafdc63096ec0fa8c3106657bab52f370 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 
> 3293100af96dc60408c53065fa89143ead98f818 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java
>  e8dcbf18cb09b190536f920a53d6e9fa870ce33b 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68474/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Antal Sinkovits
> 
>



Re: Review Request 68474: HIVE-20440

2018-10-16 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209626
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 117 (patched)
<https://reviews.apache.org/r/68474/#comment294162>

nit: if you want to leave the `@return` section empty, then just remove it 
entirely



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 127 (patched)
<https://reviews.apache.org/r/68474/#comment294163>

nit: same as above



ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java
Lines 178-190 (patched)
<https://reviews.apache.org/r/68474/#comment294161>

what about changing this to something like `getKey()` and just returning a 
`String`? I don't think the interface needs to be tied to reading data from a 
folder on HDFS.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 131 (patched)
<https://reviews.apache.org/r/68474/#comment294165>

why do we run the action just for the l2 cache?


- Sahil Takiar


On Oct. 10, 2018, 1:20 p.m., Antal Sinkovits wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68474/
> ---
> 
> (Updated Oct. 10, 2018, 1:20 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu 
> Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> I've modified the SmallTableCache to use guava cache, with soft references.
> By using a value loader, I've also eliminated the synchronization on the 
> intern-ed string of the path.
> 
> 
> Diffs
> -
> 
>   ql/pom.xml d73deba440702ec39fc5610df28e0fe54baef025 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 
> da1dd426c9155290e30fd1e3ae7f19a5479a8967 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java
>  9e65fd98d6e4451421641b1429ccf334fe9a9586 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java
>  54377428eafdb79e1bbdc8a182eafb46f8febd23 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java
>  0e4b8df036724bd83e85fc3cc70f534272dab4c4 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java
>  74e0b120ea3560a6a2a0074e6c0026b4874b3d5e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java
>  24b8fea33815867ce544fd284437c4d02a21f1a3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java 
> cf27e92bafdc63096ec0fa8c3106657bab52f370 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 
> 3293100af96dc60408c53065fa89143ead98f818 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java
>  e8dcbf18cb09b190536f920a53d6e9fa870ce33b 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68474/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Antal Sinkovits
> 
>



Re: Review Request 69022: HIVE-20737: Local SparkContext is shared between user sessions and should be closed only when there is no active

2018-10-16 Thread Sahil Takiar


> On Oct. 16, 2018, 1:47 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java
> > Line 73 (original), 73 (patched)
> > <https://reviews.apache.org/r/69022/diff/2/?file=2097990#file2097990line75>
> >
> > if we expect multiple sessions to access this, should we make this 
> > `volatile`?
> 
> denys kuzmenko wrote:
> it's being accessed only inside of the critical section (within the lock 
> boundaries)

does java guarantee that non-volatile variables accessed inside a critical 
section are not cached locally by a CPU?


> On Oct. 16, 2018, 1:47 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java
> > Lines 112-116 (original)
> > <https://reviews.apache.org/r/69022/diff/2/?file=2097991#file2097991line112>
> >
> > do we have unit tests that cover this?
> 
> denys kuzmenko wrote:
> queryCompleted and (lastSparkJobCompletionTime = 0) are complementary 
> conditions that are checked and set at the same place
> 
> denys kuzmenko wrote:
> queryCompleted and (lastSparkJobCompletionTime > 0)
> 
> denys kuzmenko wrote:
> we do have bunch of tests (TestSparkSession*, TestLocalSparkClient) that 
> are covering this

i don't think we have a test that explicitly checks what happens when a timeout 
is triggered before the first HoS query is run, but i think i added some in 
HIVE-20519 already anyway


> On Oct. 16, 2018, 1:47 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java
> > Line 125 (original), 120 (patched)
> > <https://reviews.apache.org/r/69022/diff/2/?file=2097991#file2097991line125>
> >
> > we might need to re-think how we are synchronizing this method a bit. I 
> > think we want to support the use case where we call `close()` while 
> > `open()` is being run. This offers a way for the user to cancel a session 
> > while it is being opened, which can be useful if opening a session takes a 
> > long time, which can happen on a busy cluster where there aren't enough 
> > resources to open a session.
> > 
> > fixing that might be out of the scope of this JIRA, so I would 
> > recommend using a separate lock to guard against multiple users calling 
> > open on the same session.
> 
> Sahil Takiar wrote:
> Tracking the aforementioned fix in HIVE-20519, unless you want to fix it 
> in this patch.
> 
> denys kuzmenko wrote:
> i think it should be addressed in another JIRA, right now we need to have 
> working at least basic use-case

okay, still recommend using a separate lock


> On Oct. 16, 2018, 1:47 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java
> > Line 352 (original)
> > <https://reviews.apache.org/r/69022/diff/2/?file=2097991#file2097991line361>
> >
> > why remove this?
> 
> denys kuzmenko wrote:
> it's not required. close() method is covered with the lock, and 
> activeJobs is a concurrent collection

what happens if a job is submitted after `hasTimedOut` returns true?


- Sahil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69022/#review209617
---


On Oct. 15, 2018, 7:21 p.m., denys kuzmenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69022/
> ---
> 
> (Updated Oct. 15, 2018, 7:21 p.m.)
> 
> 
> Review request for hive, Sahil Takiar and Adam Szita.
> 
> 
> Bugs: HIVE-20737
> https://issues.apache.org/jira/browse/HIVE-20737
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> 1. Local SparkContext is shared between user sessions and should be closed 
> only when there is no active session. 
> 2. Possible race condition in SparkSession.open() in case when user queries 
> run in parallel within the same session.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 
> 72ff53e3bd 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java
>  bb50129518 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestLocalHiveSparkClient.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/69022/diff/2/
> 
> 
> Testing
> ---
> 
> Added TestLocalHiveSparkClient test
> 
> 
> File Attachments
> 
> 
> HIVE-20737.7.patch
>   
> https://reviews.apache.org/media/uploaded/files/2018/10/15/9cf8a2b3-9ec1-4316-81d0-3cd124b1a9fd__HIVE-20737.7.patch
> 
> 
> Thanks,
> 
> denys kuzmenko
> 
>



Re: Review Request 69022: HIVE-20737: Local SparkContext is shared between user sessions and should be closed only when there is no active

2018-10-16 Thread Sahil Takiar


> On Oct. 16, 2018, 1:47 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java
> > Line 125 (original), 120 (patched)
> > <https://reviews.apache.org/r/69022/diff/2/?file=2097991#file2097991line125>
> >
> > we might need to re-think how we are synchronizing this method a bit. I 
> > think we want to support the use case where we call `close()` while 
> > `open()` is being run. This offers a way for the user to cancel a session 
> > while it is being opened, which can be useful if opening a session takes a 
> > long time, which can happen on a busy cluster where there aren't enough 
> > resources to open a session.
> > 
> > fixing that might be out of the scope of this JIRA, so I would 
> > recommend using a separate lock to guard against multiple users calling 
> > open on the same session.

Tracking the aforementioned fix in HIVE-20519, unless you want to fix it in this 
patch.


- Sahil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69022/#review209617
---


On Oct. 15, 2018, 7:21 p.m., denys kuzmenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69022/
> -------
> 
> (Updated Oct. 15, 2018, 7:21 p.m.)
> 
> 
> Review request for hive, Sahil Takiar and Adam Szita.
> 
> 
> Bugs: HIVE-20737
> https://issues.apache.org/jira/browse/HIVE-20737
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> 1. Local SparkContext is shared between user sessions and should be closed 
> only when there is no active session. 
> 2. Possible race condition in SparkSession.open() in case when user queries 
> run in parallel within the same session.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 
> 72ff53e3bd 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java
>  bb50129518 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestLocalHiveSparkClient.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/69022/diff/2/
> 
> 
> Testing
> ---
> 
> Added TestLocalHiveSparkClient test
> 
> 
> File Attachments
> 
> 
> HIVE-20737.7.patch
>   
> https://reviews.apache.org/media/uploaded/files/2018/10/15/9cf8a2b3-9ec1-4316-81d0-3cd124b1a9fd__HIVE-20737.7.patch
> 
> 
> Thanks,
> 
> denys kuzmenko
> 
>



Re: Review Request 69022: HIVE-20737: Local SparkContext is shared between user sessions and should be closed only when there is no active

2018-10-16 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69022/#review209617
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java
Line 73 (original), 73 (patched)
<https://reviews.apache.org/r/69022/#comment294142>

if we expect multiple sessions to access this, should we make this 
`volatile`?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java
Lines 74 (patched)
<https://reviews.apache.org/r/69022/#comment294141>

should probably make this `volatile` in case multiple threads try to get an 
instance



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java
Lines 112-116 (original)
<https://reviews.apache.org/r/69022/#comment294146>

do we have unit tests that cover this?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java
Line 125 (original), 120 (patched)
<https://reviews.apache.org/r/69022/#comment294143>

we might need to re-think how we are synchronizing this method a bit. I 
think we want to support the use case where we call `close()` while `open()` is 
being run. This offers a way for the user to cancel a session while it is being 
opened, which can be useful if opening a session takes a long time, which can 
happen on a busy cluster where there aren't enough resources to open a session.

fixing that might be out of the scope of this JIRA, so I would recommend 
using a separate lock to guard against multiple users calling open on the same 
session.
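
A minimal sketch of the separate-lock suggestion (class and field names are illustrative, not the actual `SparkSessionImpl` code): a dedicated lock serializes concurrent `open()` calls without blocking `close()`.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch: a dedicated lock guards open() only, while close() stays
// independently synchronized, so a session can still be closed while
// another thread is inside open().
final class SessionSketch {
  private final ReentrantLock openLock = new ReentrantLock();
  private volatile boolean opened;

  /** Returns true if this call actually opened the session. */
  public boolean open() {
    openLock.lock(); // only serializes concurrent open() calls
    try {
      if (!opened) {
        opened = true; // placeholder for the expensive session startup
        return true;
      }
      return false;
    } finally {
      openLock.unlock();
    }
  }

  public synchronized void close() {
    opened = false; // does not take openLock, so it can interleave with open()
  }
}
```

The point of the design is that cancellation (`close()`) is not forced to wait for a slow `open()` to finish.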



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java
Line 352 (original)
<https://reviews.apache.org/r/69022/#comment294145>

    why remove this?


- Sahil Takiar


On Oct. 15, 2018, 7:21 p.m., denys kuzmenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69022/
> ---
> 
> (Updated Oct. 15, 2018, 7:21 p.m.)
> 
> 
> Review request for hive, Sahil Takiar and Adam Szita.
> 
> 
> Bugs: HIVE-20737
> https://issues.apache.org/jira/browse/HIVE-20737
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> 1. Local SparkContext is shared between user sessions and should be closed 
> only when there is no active session. 
> 2. Possible race condition in SparkSession.open() when user queries 
> run in parallel within the same session.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 
> 72ff53e3bd 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/session/SparkSessionImpl.java
>  bb50129518 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestLocalHiveSparkClient.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/69022/diff/2/
> 
> 
> Testing
> ---
> 
> Added TestLocalHiveSparkClient test
> 
> 
> File Attachments
> 
> 
> HIVE-20737.7.patch
>   
> https://reviews.apache.org/media/uploaded/files/2018/10/15/9cf8a2b3-9ec1-4316-81d0-3cd124b1a9fd__HIVE-20737.7.patch
> 
> 
> Thanks,
> 
> denys kuzmenko
> 
>



Re: Review Request 68474: HIVE-20440: Create better cache eviction policy for SmallTableCache

2018-10-01 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209130
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Line 60 (original), 69 (patched)
<https://reviews.apache.org/r/68474/#comment293420>

keep the explicit cache method and call it in `MapJoinOperator#closeOp`. 
This way when a task finishes, we still keep the small table around for at 
least 30 seconds, which gives any tasks scheduled in the future a chance to 
re-use the small table.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 75 (patched)
<https://reviews.apache.org/r/68474/#comment293419>

can u add some javadocs to this class explaining what it is doing



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 82 (patched)
<https://reviews.apache.org/r/68474/#comment293416>

rename to something like `cleanupService`



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 90 (patched)
<https://reviews.apache.org/r/68474/#comment293417>

nit: make `INTEGER_ONE` a static import



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 91 (patched)
<https://reviews.apache.org/r/68474/#comment293415>

"SmallTableCache maintenance thread" -> "SmallTableCache Cleanup Thread"



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 117 (patched)
<https://reviews.apache.org/r/68474/#comment293418>

replace with `cacheL1.get(key, valueLoader)` where `valueLoader` loads from 
`cacheL2`


- Sahil Takiar


On Sept. 19, 2018, 11:14 p.m., Antal Sinkovits wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68474/
> ---
> 
> (Updated Sept. 19, 2018, 11:14 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu 
> Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> I've modified the SmallTableCache to use guava cache, with soft references.
> By using a value loader, I've also eliminated the synchronization on the 
> intern-ed string of the path.
> 
> 
> Diffs
> -
> 
>   ql/pom.xml d73deba440702ec39fc5610df28e0fe54baef025 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java 
> cf27e92bafdc63096ec0fa8c3106657bab52f370 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 
> 3293100af96dc60408c53065fa89143ead98f818 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68474/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Antal Sinkovits
> 
>



[jira] [Created] (HIVE-20519) Remove 30m min value for hive.spark.session.timeout

2018-09-07 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20519:
---

 Summary: Remove 30m min value for hive.spark.session.timeout
 Key: HIVE-20519
 URL: https://issues.apache.org/jira/browse/HIVE-20519
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar


In HIVE-14162 we added the config {{hive.spark.session.timeout}} which 
provided a way to time out Spark sessions that are active for a long period of 
time. The config has a lower bound of 30m which we should remove. It should be 
possible for users to configure this value so the HoS session is closed as soon 
as the query is complete.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-09-06 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20512:
---

 Summary: Improve record and memory usage logging in 
SparkRecordHandler
 Key: HIVE-20512
 URL: https://issues.apache.org/jira/browse/HIVE-20512
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar


We currently log memory usage and # of records processed in Spark tasks, but we 
should improve the methodology for how frequently we log this info. Currently 
we use the following code:

{code:java}
private long getNextLogThreshold(long currentThreshold) {
  // A very simple counter to keep track of number of rows processed by the
  // reducer. It dumps every 1 million times, and quickly before that
  if (currentThreshold >= 1000000) {
    return currentThreshold + 1000000;
  }
  return 10 * currentThreshold;
}
{code}

The issue is that after a while, the increase by 10x factor means that you have 
to process a huge # of records before this gets triggered.

A better approach would be to log this info at a given interval. This would 
help in debugging tasks that are seemingly hung.
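
A minimal sketch of the interval-based alternative (the class and interval are hypothetical, not the eventual Hive implementation): log when a fixed wall-clock interval has elapsed, regardless of how many records were processed.

```java
// Sketch: time-based progress logging. Instead of logging at record
// counts that grow 10x each time, log at most once per fixed interval.
final class IntervalLogger {
  private final long intervalMs;
  private long rowCount = 0;
  private long lastLogTime = System.currentTimeMillis();

  IntervalLogger(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  /** Counts a row; logs and returns true if the interval has elapsed. */
  public boolean processRow() {
    rowCount++;
    long now = System.currentTimeMillis();
    if (now - lastLogTime >= intervalMs) {
      System.out.println("processed " + rowCount + " rows so far");
      lastLogTime = now;
      return true;
    }
    return false;
  }
}
```

With this scheme a seemingly hung task still produces a log line every interval, no matter how slowly records arrive.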



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New committer: Andrew Sherman

2018-09-04 Thread Sahil Takiar
Congrats Andrew!

On Tue, Sep 4, 2018 at 12:02 AM Antal Sinkovits
 wrote:

> Congratulations Andrew!
>
> Deepak Jaiswal  ezt írta (időpont: 2018. szept.
> 4., K 7:34):
>
> > Congratulation Andrew.
> >
> > Deepak
> >
> > On 9/3/18, 10:17 PM, "Zoltan Haindrich"  wrote:
> >
> > Congratulations Andrew!
> >
> > On 2 September 2018 04:49:00 CEST, Lefty Leverenz <
> > leftylever...@gmail.com> wrote:
> > >Congratulations Andrew!
> > >
> > >-- Lefty
> > >
> > >
> > >On Tue, Aug 28, 2018 at 11:36 AM Ashutosh Chauhan
> > >
> > >wrote:
> > >
> > >> Apache Hive's Project Management Committee (PMC) has invited
> Andrew
> > >Sherman
> > >> to become a committer, and we are pleased to announce that he has
> > >accepted.
> > >>
> > >> Andrew, welcome, thank you for your contributions, and we look
> > >forward to
> > >> your
> > >> further interactions with the community!
> > >>
> > >> Ashutosh Chauhan (on behalf of the Apache Hive PMC)
> > >>
> >
> >
> >
>


-- 
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309


[jira] [Created] (HIVE-20495) Set hive.spark.dynamic.partition.pruning.map.join.only to true by default

2018-08-31 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20495:
---

 Summary: Set hive.spark.dynamic.partition.pruning.map.join.only to 
true by default
 Key: HIVE-20495
 URL: https://issues.apache.org/jira/browse/HIVE-20495
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20488) SparkSubmitSparkClient#launchDriver should parse exceptions, not just errors

2018-08-30 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20488:
---

 Summary: SparkSubmitSparkClient#launchDriver should parse 
exceptions, not just errors
 Key: HIVE-20488
 URL: https://issues.apache.org/jira/browse/HIVE-20488
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar


In {{SparkSubmitSparkClient#launchDriver}} we parse the stdout / stderr of 
{{bin/spark-submit}} for strings that contain "Error", but we should also look 
for "Exception".
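
A sketch of the broadened check (the pattern and class name are illustrative; the real `SparkSubmitSparkClient` parsing may differ):

```java
import java.util.regex.Pattern;

// Sketch: treat output lines mentioning either "Error" or "Exception"
// as potential driver failures.
final class DriverOutputFilter {
  private static final Pattern PROBLEM = Pattern.compile("\\b(Error|Exception)\\b");

  private DriverOutputFilter() {}

  static boolean looksLikeFailure(String line) {
    return PROBLEM.matcher(line).find();
  }
}
```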



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20280) JobResultSerializer uses wrong registration id in KyroMessageCodec

2018-07-31 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20280:
---

 Summary: JobResultSerializer uses wrong registration id in 
KyroMessageCodec
 Key: HIVE-20280
 URL: https://issues.apache.org/jira/browse/HIVE-20280
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Inside {{KryoMessageCodec}} the code:

{code}
  Kryo kryo = new Kryo();
  int count = 0;
  for (Class klass : messages) {
kryo.register(klass, REG_ID_BASE + count);
count++;
  }
  kryo.register(BaseProtocol.JobResult.class, new JobResultSerializer(), 
count);
{code}

Uses the wrong registration id for the {{JobResultSerializer}} it should be 
{{REG_ID_BASE + count}} not {{count}}
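
The effect of the bug can be shown with a small sketch (the `REG_ID_BASE` value of 16 is an assumption for illustration; Kryo reserves the low ids for its built-in registrations):

```java
// Sketch: message classes are registered at REG_ID_BASE, REG_ID_BASE + 1, ...
// so the serializer registered after the loop must continue that sequence.
// Registering it at plain `count` would land in Kryo's reserved low-id range.
final class RegIdDemo {
  static final int REG_ID_BASE = 16; // assumed value, for illustration only

  private RegIdDemo() {}

  /** Correct id for the next registration after `registeredCount` classes. */
  static int nextIdAfter(int registeredCount) {
    return REG_ID_BASE + registeredCount;
  }
}
```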



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20273) Spark jobs aren't cancelled if getSparkJobInfo or getSparkStagesInfo

2018-07-30 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20273:
---

 Summary: Spark jobs aren't cancelled if getSparkJobInfo or 
getSparkStagesInfo
 Key: HIVE-20273
 URL: https://issues.apache.org/jira/browse/HIVE-20273
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar


HIVE-19053 and HIVE-19733 add handling of {{InterruptedException}} to 
{{#getSparkJobInfo}} and {{#getSparkStagesInfo}} in {{RemoteSparkJobStatus}}, 
but that means the {{InterruptedException}} is wrapped in a {{HiveException}} 
and then thrown. The {{HiveException}} is then cause in 
{{RemoteSparkJobMonitor}} and then wrapped in another Hive exception. The 
double nesting of hive exception causes the logic in 
{{SparkTask#setSparkException}} to break, and it doesn't kill the job if an 
interrupted exception is thrown.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20271) Improve HoS query cancellation handling

2018-07-30 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20271:
---

 Summary: Improve HoS query cancellation handling
 Key: HIVE-20271
 URL: https://issues.apache.org/jira/browse/HIVE-20271
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Uber-JIRA for improving HoS query cancellation



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20270) Don't serialize hashCode for groupByKey

2018-07-30 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20270:
---

 Summary: Don't serialize hashCode for groupByKey
 Key: HIVE-20270
 URL: https://issues.apache.org/jira/browse/HIVE-20270
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Similar to HIVE-20032, but for {{groupByKey}}. The tricky part with 
{{groupByKey}} is we need to preserve the {{hashCode}} until the key gets 
partitioned (via the {{HashPartitioner}}) but after that we don't really need 
to preserve the {{hashCode}}. The {{groupByKey}} operator in Spark does require 
a {{hashCode}} since it puts everything in a map, but it can use a different 
hash-code than the one specified in {{HiveKey}}. The hashcode in {{HiveKey}} is 
only important for determining the partition the key should be assigned to.

The drawback is that computing the hashcode for each {{HiveKey}} might require 
more CPU resources, but we should profile it just in case.
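
The partitioning role of the hash code can be sketched as follows (this mirrors the non-negative-modulus scheme Spark's {{HashPartitioner}} uses; the helper itself is illustrative):

```java
// Sketch: the HiveKey hash only has to be stable up to this mapping --
// a key's partition is a non-negative modulus of its hashCode.
final class PartitionDemo {
  private PartitionDemo() {}

  static int partitionFor(int hashCode, int numPartitions) {
    int mod = hashCode % numPartitions;
    return mod < 0 ? mod + numPartitions : mod; // keep in [0, numPartitions)
  }
}
```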



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New PMC Member : Sahil Takiar

2018-07-30 Thread Sahil Takiar
Thanks!

On Mon, Jul 30, 2018 at 2:44 AM, Peter Vary 
wrote:

> Congratulations Sahil!
>
> > On Jul 29, 2018, at 22:32, Vineet Garg  wrote:
> >
> > Congratulations Sahil!
> >
> >> On Jul 26, 2018, at 11:28 AM, Ashutosh Chauhan 
> wrote:
> >>
> >> On behalf of the Hive PMC I am delighted to announce Sahil Takiar is
> >> joining Hive PMC.
> >> Thanks Sahil for all your contributions till now. Looking forward to
> many
> >> more.
> >>
> >> Welcome, Sahil!
> >>
> >> Thanks,
> >> Ashutosh
> >
>
>


-- 
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309


[jira] [Created] (HIVE-20243) Ptests spend an excessive amount of time in GC

2018-07-25 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20243:
---

 Summary: Ptests spend an excessive amount of time in GC
 Key: HIVE-20243
 URL: https://issues.apache.org/jira/browse/HIVE-20243
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Reporter: Sahil Takiar


While working on HIVE-17684, we found that ptests spend an excessive amount of 
time in GC. We should try to find a way to fix this; right now we are wasting 
resources with excessive GC pauses.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20211) DPP call to remove PartitionDescs from aliasToPartnInfo doesn't do anything

2018-07-19 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20211:
---

 Summary: DPP call to remove PartitionDescs from aliasToPartnInfo 
doesn't do anything
 Key: HIVE-20211
 URL: https://issues.apache.org/jira/browse/HIVE-20211
 Project: Hive
  Issue Type: Sub-task
Reporter: Sahil Takiar


Noticed this while working on HIVE-20056, the {{SparkPartitionPruner}} calls 
{{work.getPartitionDescs().remove(desc)}} but {{work.getPartitionDescs}} 
returns a copy of the underlying {{aliasToPartnInfo}} object. So this code 
doesn't actually do anything.
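
The no-op can be reproduced with a tiny sketch (the class name is illustrative; it only mirrors the copy-returning getter pattern):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: remove() on a defensive copy never touches the original list,
// which is the same pattern as work.getPartitionDescs().remove(desc).
final class CopyRemoveDemo {
  private final List<String> parts = new ArrayList<>(Arrays.asList("p1", "p2"));

  /** Returns a copy, like getPartitionDescs() copying aliasToPartnInfo. */
  List<String> getParts() {
    return new ArrayList<>(parts);
  }

  int size() {
    return parts.size();
  }
}
```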



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20141) Turn hive.spark.use.groupby.shuffle off by default

2018-07-11 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20141:
---

 Summary: Turn hive.spark.use.groupby.shuffle off by default
 Key: HIVE-20141
 URL: https://issues.apache.org/jira/browse/HIVE-20141
 Project: Hive
  Issue Type: Task
  Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar


[~xuefuz] any thoughts on this? I think it would provide better out of the box 
behavior for Hive-on-Spark users, especially for users who are migrating from 
Hive-on-MR to HoS. Wondering what your experience with this config has been?

I've done a bunch of performance profiling with this config turned on vs. off, 
and for TPC-DS queries it doesn't make a significant difference. The main 
difference I can see is that when a Spark stage has to spill to disk, 
{{repartitionAndSortWithinPartitions}} spills more data to disk than 
{{groupByKey}} - my guess is that this happens because {{groupByKey}} stores 
everything in Spark's {{ExternalAppendOnlyMap}} (which only stores a single 
copy of the key for potentially multiple values) whereas 
{{repartitionAndSortWithinPartitions}} uses Spark's {{ExternalSorter}} which 
sorts all the K, V pairs (and thus doesn't de-duplicate keys, which results in 
more data being spilled to disk).

My understanding is that using {{repartitionAndSortWithinPartitions}} for Hive 
GROUP BYs is similar to what Hive-on-MR does. So disabling this config would 
provide a similar experience to HoMR. Furthermore, last I checked, 
{{groupByKey}} still can't spill within a row group.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20134) Improve logging when HoS Driver is killed due to exceeding memory limits

2018-07-10 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20134:
---

 Summary: Improve logging when HoS Driver is killed due to 
exceeding memory limits
 Key: HIVE-20134
 URL: https://issues.apache.org/jira/browse/HIVE-20134
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar


This was improved in HIVE-18093, but more can be done. If a HoS Driver gets 
killed because it exceeds its memory limits, YARN will issue a SIGTERM on the 
process. The SIGTERM will cause the shutdown hook in the HoS Driver to be 
triggered. This causes the Driver to kill all running jobs, even if they are 
running. The user ends up seeing an error like the one below. Which isn't very 
informative. We should propagate the error from the Driver shutdown hook to the 
user.
{code:java}
INFO : 2018-07-09 17:48:42,580 Stage-64_0: 526/526 Finished Stage-65_0: 
1405/1405 Finished Stage-66_0: 0(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 
0/1099 Stage-69_0: 0/1
INFO : 2018-07-09 17:48:44,589 Stage-64_0: 526/526 Finished Stage-65_0: 
1405/1405 Finished Stage-66_0: 1(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 
0/1099 Stage-69_0: 0/1
INFO : 2018-07-09 17:48:45,591 Stage-64_0: 526/526 Finished Stage-65_0: 
1405/1405 Finished Stage-66_0: 2(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 
0/1099 Stage-69_0: 0/1
INFO : 2018-07-09 17:48:48,596 Stage-64_0: 526/526 Finished Stage-65_0: 
1405/1405 Finished Stage-66_0: 2(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 
0/1099 Stage-69_0: 0/1
ERROR : Spark job[23] failed
java.lang.InterruptedException: null
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
 ~[?:1.8.0_141]
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
 ~[?:1.8.0_141]
at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202) 
~[scala-library-2.11.8.jar:?]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) 
~[scala-library-2.11.8.jar:?]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153) 
~[scala-library-2.11.8.jar:?]
at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:125) 
~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:114) 
~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:222) 
~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:264) 
~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:277) 
~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:391)
 ~[hive-exec-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
at 
org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:352)
 ~[hive-exec-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_141]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_141]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
ERROR : FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask. null
INFO : Completed executing 
command(queryId=hive_20180709174140_0f64ee17-f793-441a-9a77-3ee0cd0a9c32); Time 
taken: 249.727 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.spark.SparkTask. null 
(state=08S01,code=1){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20125) Typo in MetricsCollection for OutputMetrics

2018-07-09 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20125:
---

 Summary: Typo in MetricsCollection for OutputMetrics
 Key: HIVE-20125
 URL: https://issues.apache.org/jira/browse/HIVE-20125
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar


When creating {{OutputMetrics}} in the {{aggregate}} method we check for 
{{hasInputMetrics}} instead of {{hasOutputMetrics}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20124) Re-add HIVE-19787: Log message when spark-submit has completed

2018-07-09 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20124:
---

 Summary: Re-add HIVE-19787: Log message when spark-submit has 
completed
 Key: HIVE-20124
 URL: https://issues.apache.org/jira/browse/HIVE-20124
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Think this accidentally got reverted when rebasing HIVE-18916



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20108) Investigate alternatives to groupByKey

2018-07-06 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20108:
---

 Summary: Investigate alternatives to groupByKey
 Key: HIVE-20108
 URL: https://issues.apache.org/jira/browse/HIVE-20108
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar


We use {{groupByKey}} for aggregations (or if 
{{hive.spark.use.groupby.shuffle}} is false we use 
{{repartitionAndSortWithinPartitions}}).

{{groupByKey}} has its drawbacks because it can't spill records within a single 
key group. It also seems to be doing some unnecessary work in Spark's 
{{Aggregator}} (not positive about this part).

{{repartitionAndSortWithinPartitions}} is better, but the sorting within 
partitions isn't necessary for aggregations.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 67815: HIVE-19733 : RemoteSparkJobStatus#getSparkStageProgress inefficient implementation

2018-07-06 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67815/#review205807
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java
Line 194 (original), 197 (patched)
<https://reviews.apache.org/r/67815/#comment288716>

I don't think this makes sense given that the exceptions thrown in this 
method are specific to running a `GetJobInfoJob`.


- Sahil Takiar


On July 5, 2018, 7:34 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67815/
> ---
> 
> (Updated July 5, 2018, 7:34 p.m.)
> 
> 
> Review request for hive, Peter Vary and Sahil Takiar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Adding GetSparkStagesInfoJob which gets the jobInfo and stageInfos in a 
> single job
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java
>  832832b325d7eb0e43fb34a4306d7ffa43ceaa78 
> 
> 
> Diff: https://reviews.apache.org/r/67815/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 67815: HIVE-19733 : RemoteSparkJobStatus#getSparkStageProgress inefficient implementation

2018-07-03 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67815/#review205681
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java
Line 215 (original), 218 (patched)
<https://reviews.apache.org/r/67815/#comment288580>

Can this be deleted? Same with `GetStageInfoJob`



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java
Lines 228 (patched)
<https://reviews.apache.org/r/67815/#comment288582>

Looks very similar to `getSparkJobInfo` should be re-factored



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java
Lines 256 (patched)
<https://reviews.apache.org/r/67815/#comment288581>

This looks very similar to `GetJobInfoJob`, can the two be re-factored to 
share common code?


- Sahil Takiar


On July 3, 2018, 6:32 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67815/
> ---
> 
> (Updated July 3, 2018, 6:32 p.m.)
> 
> 
> Review request for hive, Peter Vary and Sahil Takiar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Adding GetSparkStagesInfoJob which gets the jobInfo and stageInfos in a 
> single job
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java
>  832832b325d7eb0e43fb34a4306d7ffa43ceaa78 
> 
> 
> Diff: https://reviews.apache.org/r/67815/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



[jira] [Created] (HIVE-20056) SparkPartitionPruner shouldn't be triggered by Spark tasks

2018-07-02 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20056:
---

 Summary: SparkPartitionPruner shouldn't be triggered by Spark tasks
 Key: HIVE-20056
 URL: https://issues.apache.org/jira/browse/HIVE-20056
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar


It looks like {{SparkDynamicPartitionPruner}} is being called by every Spark 
task because it gets created whenever {{getRecordReader}} is called on the 
associated {{InputFormat}}.
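
One way to see the intended behavior is a once-per-process guard (purely illustrative; the actual fix may instead move the pruner off the task path entirely):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: run the expensive pruning work at most once per process,
// instead of once per getRecordReader() call.
final class OncePerProcess {
  private static final AtomicBoolean DONE = new AtomicBoolean(false);

  private OncePerProcess() {}

  /** Returns true only for the call that actually ran the work. */
  static boolean runOnce(Runnable work) {
    if (DONE.compareAndSet(false, true)) {
      work.run();
      return true;
    }
    return false;
  }
}
```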



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20054) Propagate ExecutionExceptions from the driver thread to the client

2018-07-02 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20054:
---

 Summary: Propagate ExecutionExceptions from the driver thread to 
the client
 Key: HIVE-20054
 URL: https://issues.apache.org/jira/browse/HIVE-20054
 Project: Hive
  Issue Type: Sub-task
Reporter: Sahil Takiar


In {{AbstractSparkClient}} we have the following code:

{code:java}
try {
  driverFuture.get();
} catch (InterruptedException ie) {
  // Give up.
  LOG.warn("Interrupted before driver thread was finished.", ie);
} catch (ExecutionException ee) {
  LOG.error("Driver thread failed", ee);
}
{code}

If the driver {{Future}} throws an {{ExecutionException}} the error is simply 
logged, and a {{RuntimeException}} is thrown with the generic message "Error 
while waiting for Remote Spark Driver to connect back to HiveServer2."

We should propagate the {{ExecutionException}} to the client.
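
A hedged sketch of the proposed change (names are illustrative, not the actual {{AbstractSparkClient}} code): rethrow the cause instead of only logging it.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Sketch: surface the driver thread's real failure to the caller
// instead of logging it and raising a generic RuntimeException.
final class DriverWait {
  private DriverWait() {}

  static void await(Future<?> driverFuture) throws Exception {
    try {
      driverFuture.get();
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt(); // restore interrupt status
      throw ie;
    } catch (ExecutionException ee) {
      Throwable cause = ee.getCause();
      if (cause instanceof Exception) {
        throw (Exception) cause; // propagate the underlying driver failure
      }
      throw ee;
    }
  }
}
```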



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 67636: HIVE-19176 : Add HoS support to progress bar on Beeline client.

2018-07-02 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67636/#review205643
---


Ship it!




Ship It!

- Sahil Takiar


On June 22, 2018, 8:01 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67636/
> ---
> 
> (Updated June 22, 2018, 8:01 p.m.)
> 
> 
> Review request for hive, Peter Vary, Sahil Takiar, and Vihang Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This logic is similar to the RenderStrategy used in Tez to print the progress 
> bar.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> e7f5fc0c6a3d527671c354db8ef2c9772aab6dd0 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 
> ad8d1a7f1cca3a763bb7c07335998ab7d39d7598 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/LocalSparkJobMonitor.java
>  2a6c33bfd4824c96e7004cd1ecce48c62c97d685 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RemoteSparkJobMonitor.java
>  004b50ba95934280cf302055a46a5d984b421e07 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RenderStrategy.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 
> 3531ac25a9959aacd5766a9a42316890c68a1cd5 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSparkTask.java 
> 368fa9f1fa3ccc8b78cc4f9e98acf352cbc1c4c3 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/spark/status/TestSparkJobMonitor.java
>  e66354f0869738bd3cf0eb831c13fa6af1eda256 
>   service/src/java/org/apache/hive/service/ServiceUtils.java 
> 226e43244df10c22143b91f92ef312e56739d036 
>   
> service/src/java/org/apache/hive/service/cli/SparkProgressMonitorStatusMapper.java
>  PRE-CREATION 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> 68fe8d8aa143fafbfc611253ce3a12065016a537 
> 
> 
> Diff: https://reviews.apache.org/r/67636/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



[jira] [Created] (HIVE-20032) Don

2018-06-28 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-20032:
---

 Summary: Don
 Key: HIVE-20032
 URL: https://issues.apache.org/jira/browse/HIVE-20032
 Project: Hive
  Issue Type: Improvement
Reporter: Sahil Takiar






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 67636: HIVE-19176 : Add HoS support to progress bar on Beeline client.

2018-06-21 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67636/#review205203
---




ql/src/test/org/apache/hadoop/hive/ql/exec/spark/status/TestSparkJobMonitor.java
Lines 96 (patched)
<https://reviews.apache.org/r/67636/#comment288136>

why does testOutput contain both formats?

also, there is an unnecessary pair of "()" here



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RenderStrategy.java
Lines 54 (patched)
<https://reviews.apache.org/r/67636/#comment288131>

since this is a constant, can you make it static? FindBugs is complaining 
about it: "Unread field: should this field be static? At RenderStrategy.java:[line 47]"



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RenderStrategy.java
Lines 152-154 (patched)
<https://reviews.apache.org/r/67636/#comment288133>

use `progressMap.entrySet` for this instead. FindBugs is complaining about 
it: 
"org.apache.hadoop.hive.ql.exec.spark.status.RenderStrategy$BaseUpdateFunction.isSameAsPreviousProgress(Map,
 Map) makes inefficient use of keySet iterator instead of entrySet iterator At 
RenderStrategy.java:[line 147]"
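As background on the FindBugs suggestion above: entrySet iteration avoids the per-key second lookup that keySet iteration forces. A simplified sketch of the pattern (types and names are illustrative, not the actual Hive code):

```java
import java.util.Map;

public class EntrySetDemo {
    // keySet iteration does map.get(key) for every key: two lookups per entry.
    // entrySet iteration reads key and value from one Map.Entry, so only the
    // *other* map needs a lookup.
    static boolean isSameAsPrevious(Map<String, Integer> current,
                                    Map<String, Integer> previous) {
        if (current.size() != previous.size()) {
            return false;
        }
        for (Map.Entry<String, Integer> e : current.entrySet()) {
            if (!e.getValue().equals(previous.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }
}
```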



ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSparkTask.java
Lines 108 (patched)
<https://reviews.apache.org/r/67636/#comment288134>

why is this necessary?


- Sahil Takiar


On June 21, 2018, 3:50 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67636/
> ---
> 
> (Updated June 21, 2018, 3:50 p.m.)
> 
> 
> Review request for hive, Peter Vary, Sahil Takiar, and Vihang Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This logic is similar to the RenderStrategy used in Tez to print the progress 
> bar.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> e7f5fc0c6a3d527671c354db8ef2c9772aab6dd0 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 
> ad8d1a7f1cca3a763bb7c07335998ab7d39d7598 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/LocalSparkJobMonitor.java
>  2a6c33bfd4824c96e7004cd1ecce48c62c97d685 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RemoteSparkJobMonitor.java
>  004b50ba95934280cf302055a46a5d984b421e07 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RenderStrategy.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 
> 3531ac25a9959aacd5766a9a42316890c68a1cd5 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSparkTask.java 
> 368fa9f1fa3ccc8b78cc4f9e98acf352cbc1c4c3 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/spark/status/TestSparkJobMonitor.java
>  e66354f0869738bd3cf0eb831c13fa6af1eda256 
>   service/src/java/org/apache/hive/service/ServiceUtils.java 
> 226e43244df10c22143b91f92ef312e56739d036 
>   
> service/src/java/org/apache/hive/service/cli/SparkProgressMonitorStatusMapper.java
>  PRE-CREATION 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> 68fe8d8aa143fafbfc611253ce3a12065016a537 
> 
> 
> Diff: https://reviews.apache.org/r/67636/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: New committer announcement : Marta Kuczora

2018-06-21 Thread Sahil Takiar
Congrats Marta!

On Thu, Jun 21, 2018 at 2:52 AM, Peter Vary 
wrote:

> Well done Marta!
> Congratulations!
>
> > On Jun 20, 2018, at 21:06, Ashutosh Chauhan 
> wrote:
> >
> > Apache Hive's Project Management Committee (PMC) has invited Marta
> > Kuczora to become a committer, and we are pleased to announce that she
> > has accepted.
> >
> > Marta, welcome, thank you for your contributions, and we look forward
> > to your further interactions with the community!
> >
> > Ashutosh Chauhan (on behalf of the Apache Hive PMC)
>
>




Re: [ANNOUNCE] New committer: Adam Szita

2018-06-21 Thread Sahil Takiar
Congrats Adam!

On Thu, Jun 21, 2018 at 2:52 AM, Peter Vary 
wrote:

> Well done Adam!
> Congratulations!
>
> > On Jun 20, 2018, at 21:02, Ashutosh Chauhan 
> wrote:
> >
> > Apache Hive's Project Management Committee (PMC) has invited Adam Szita
> > to become a committer, and we are pleased to announce that he has
> > accepted.
> >
> > Adam, welcome, thank you for your contributions, and we look forward
> > to your further interactions with the community!
> >
> > Ashutosh Chauhan (on behalf of the Apache Hive PMC)
>
>




Re: Review Request 67636: HIVE-19176 : Add HoS support to progress bar on Beeline client.

2018-06-19 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67636/#review205020
---



is there a way to add some unit tests for this?


ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RenderStrategy.java
Lines 37-40 (patched)
<https://reviews.apache.org/r/67636/#comment287897>

can u add some javadocs for this



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RenderStrategy.java
Lines 43 (patched)
<https://reviews.apache.org/r/67636/#comment287899>

how much of this code is newly written vs. copied from other classes?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RenderStrategy.java
Lines 209 (patched)
<https://reviews.apache.org/r/67636/#comment287898>

javadocs please


- Sahil Takiar


On June 18, 2018, 8:41 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67636/
> ---
> 
> (Updated June 18, 2018, 8:41 p.m.)
> 
> 
> Review request for hive, Peter Vary, Sahil Takiar, and Vihang Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This logic is similar to the RenderStrategy used in Tez to print the progress 
> bar.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> 933bda4ad01a6f7878019a7b4c971a0c39068ae2 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 
> ad8d1a7f1cca3a763bb7c07335998ab7d39d7598 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/LocalSparkJobMonitor.java
>  2a6c33bfd4824c96e7004cd1ecce48c62c97d685 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RemoteSparkJobMonitor.java
>  004b50ba95934280cf302055a46a5d984b421e07 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/RenderStrategy.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 
> 3531ac25a9959aacd5766a9a42316890c68a1cd5 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/spark/status/TestSparkJobMonitor.java
>  e66354f0869738bd3cf0eb831c13fa6af1eda256 
>   service/src/java/org/apache/hive/service/ServiceUtils.java 
> 226e43244df10c22143b91f92ef312e56739d036 
>   
> service/src/java/org/apache/hive/service/cli/SparkProgressMonitorStatusMapper.java
>  PRE-CREATION 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> 68fe8d8aa143fafbfc611253ce3a12065016a537 
> 
> 
> Diff: https://reviews.apache.org/r/67636/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Ptest timeouts

2018-06-19 Thread Sahil Takiar
I checked the master instance, and it doesn't seem to be an issue with disk
space. I can re-start the cluster and see if that helps, however, some Hive
QA jobs will fail as a result.

On Tue, Jun 19, 2018 at 12:44 PM, Thejas Nair  wrote:

> + Vihang
>
>
> On Tue, Jun 19, 2018 at 1:55 AM, Prasanth Jayachandran
>  wrote:
> > Precommit tests have started to timeout again. Could it be because of
> disk space issue? Should we need restart again? Sometimes >90% disk usage
> may also result in unhealthy node where no containers can be launched.
> >
> > Thanks
> > Prasanth
>





[jira] [Created] (HIVE-19937) Intern JobConf objects in Spark tasks

2018-06-18 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19937:
---

 Summary: Intern JobConf objects in Spark tasks
 Key: HIVE-19937
 URL: https://issues.apache.org/jira/browse/HIVE-19937
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar


When fixing HIVE-16395, we decided that each new Spark task should clone the 
{{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
being thrown. However, setting this variable comes at a cost of storing a 
duplicate {{JobConf}} object for each Spark task. These objects can take up a 
significant amount of memory, we should intern them so that Spark tasks running 
in the same JVM don't store duplicate copies.
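The interning idea can be sketched with a tiny canonicalizing cache. This is purely illustrative — a production interner (e.g. Guava's weak interner) would hold entries weakly so canonical instances stay collectable; this strong-reference version only shows the deduplication mechanics:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SimpleInterner {
    // Canonical-instance cache: equal objects collapse to one shared instance,
    // so duplicate copies (e.g. cloned JobConf-like objects) become garbage.
    // NOTE: strong references only; a real interner would use weak references.
    private static final ConcurrentMap<Object, Object> POOL = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    static <T> T intern(T candidate) {
        // putIfAbsent is atomic: the first caller wins and becomes canonical.
        T prior = (T) POOL.putIfAbsent(candidate, candidate);
        return prior != null ? prior : candidate;
    }
}
```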





Re: Review Request 67468: HIVE-18118: provide supportability support for Erasure Coding Update number of Erasure Coded Files in a directory as part of Basic (aka Quick) Stats This information is then

2018-06-15 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67468/#review204836
---



Why not show the # of EC files for regular explain plans too? To decrease the # 
of q file updates, it can be omitted if the # of EC files = 0


standalone-metastore/src/main/java/org/apache/hadoop/hive/common/StatsSetupConst.java
Line 116 (original), 122 (patched)
<https://reviews.apache.org/r/67468/#comment287582>

why change this from an array to a list?


- Sahil Takiar


On June 6, 2018, 12:46 a.m., Andrew Sherman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67468/
> ---
> 
> (Updated June 6, 2018, 12:46 a.m.)
> 
> 
> Review request for hive and Sahil Takiar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-18118: provide supportability support for Erasure Coding 
> [NOTE THIS REVIEW INITIALLY OMITS 200+ .q.out changes]
> Update number of Erasure Coded Files in a directory as part of Basic (aka 
> Quick) Stats 
> This information is then (mostly) available through 'EXPLAIN EXTENDED' and 
> 'DESCRIBE EXTENDED' 
> Extend the MiniHS2 Builder to allow configuring the number of datanodes. 
> Add a jdbc MiniHS2/Spark test that uses Erasure Coding. 
> There are some change to StatsSetupConst to make checkstyle happy.
> 
> 
> Diffs
> -
> 
>   
> itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniHS2.java 
> d7d7097336fc6be4c2f7a35cd6897e0375486e81 
>   
> itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniHS2ErasureCoding.java
>  PRE-CREATION 
>   itests/src/test/resources/testconfiguration.properties 
> 463fda1913f6d5b928fcee038f19e124b0239e96 
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 
> 2365fb76bd08f3a310e81ac3a19ca64971aeec8e 
>   itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java 
> 1700c08d3f37285de43b5d4fe5c77ef55c170235 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
> e06949928d179cfd9a4dcb7176203b885509 
>   
> ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
>  209fdfb287cabc5bb7cab2117d771f7907deb2b9 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsNoJobTask.java 
> d4d46a3671efdaaed32f63b7262b963cce00b94e 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java 
> 8c238871765b0d5312a459a0e7f68c81f3837c13 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> 982b18076180ba300094f30a7f87f025f993b265 
>   ql/src/test/queries/clientpositive/erasure_explain.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/erasure_simple.q 
> c08409c17787417b986d90a43104f5ddd456e600 
>   ql/src/test/results/clientpositive/erasurecoding/erasure_explain.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/erasurecoding/erasure_simple.q.out 
> 01f6015a346c1e4283fd6a8cf1eaa3b670450e20 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/common/StatsSetupConst.java
>  78ea01d9687fe043d63441430c46b30c25cd9756 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  77ed2b4de4569fa8aca23b16f2b362b187c7c4fc 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
>  9b36d09eb9fb332e913d442bb476628eca334b6e 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/utils/TestMetaStoreUtils.java
>  55ff1502d415dea52095cfdd523d01f1e49ce084 
> 
> 
> Diff: https://reviews.apache.org/r/67468/diff/1/
> 
> 
> Testing
> ---
> 
> Ran driver tests and new jdbc test
> 
> 
> Thanks,
> 
> Andrew Sherman
> 
>



[jira] [Created] (HIVE-19910) hive.spark.log.dir isn't honored for TestSparkCliDriver

2018-06-15 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19910:
---

 Summary: hive.spark.log.dir isn't honored for TestSparkCliDriver
 Key: HIVE-19910
 URL: https://issues.apache.org/jira/browse/HIVE-19910
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar


I haven't actually confirmed this, but I don't think {{hive.spark.log.dir}} is 
getting honored for any Spark test that sets {{spark.master}} to 
{{local-cluster}} because it adds {{hive.spark.log.dir}} as a system property 
via the {{spark.driver.extraJavaOptions}} configuration, but according to the 
Spark docs passing system properties via this parameter doesn't work in client 
mode, users have to use {{--driver-java-options}} instead.
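To illustrate the distinction the Spark docs draw: in client mode the driver JVM is already running by the time spark.driver.extraJavaOptions is read, so driver system properties must go through --driver-java-options. A hypothetical argument-builder sketch (not actual Hive test code):

```java
import java.util.ArrayList;
import java.util.List;

public class SparkSubmitArgs {
    // Builds a spark-submit command line for client mode, where driver JVM
    // flags must be passed with --driver-java-options rather than via the
    // spark.driver.extraJavaOptions configuration property.
    static List<String> buildClientModeArgs(String logDir) {
        List<String> args = new ArrayList<>();
        args.add("spark-submit");
        args.add("--master");
        args.add("local-cluster[2,2,1024]");
        args.add("--driver-java-options");
        args.add("-Dhive.spark.log.dir=" + logDir);
        return args;
    }
}
```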





[jira] [Created] (HIVE-19821) Distributed HiveServer2

2018-06-06 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19821:
---

 Summary: Distributed HiveServer2
 Key: HIVE-19821
 URL: https://issues.apache.org/jira/browse/HIVE-19821
 Project: Hive
  Issue Type: New Feature
  Components: HiveServer2
Reporter: Sahil Takiar
Assignee: Sahil Takiar


HS2 deployments often hit OOM issues due to a number of factors: (1) too many 
concurrent connections, (2) queries that scan a large number of partitions have 
to pull a lot of metadata into memory (e.g. a query reading thousands of 
partitions requires loading thousands of partitions into memory), (3) very 
large queries can take up a lot of heap space, especially during query parsing. 
There are a number of other factors that cause HiveServer2 to run out of 
memory; these are just some of the more common ones.

Distributed HS2 proposes to do all query parsing, compilation, planning, and 
execution coordination inside a dedicated container. This should significantly 
decrease memory pressure on HS2 and allow HS2 to scale to a larger number of 
concurrent users.

For HoS (and I think Hive-on-Tez) this just requires moving all query 
compilation, planning, etc. inside the application master for the corresponding 
Hive session.

The main benefit here is isolation. A poorly written Hive query cannot bring 
down an entire HiveServer2 instance and force all other queries to fail.





Re: Review Request 67263: HIVE-19602

2018-06-05 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67263/#review204332
---


Ship it!




Ship It!

- Sahil Takiar


On June 4, 2018, 6:34 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67263/
> ---
> 
> (Updated June 4, 2018, 6:34 p.m.)
> 
> 
> Review request for hive, Sahil Takiar and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-19602
> https://issues.apache.org/jira/browse/HIVE-19602
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Refactor inplace progress code in Hive-on-spark progress monitor to use 
> ProgressMonitor instance
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 
> 7afd8864075aa0d9708274eea8839c662324c732 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkProgressMonitor.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/67263/diff/4/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



[jira] [Created] (HIVE-19797) Enable Spark InProcessLauncher on secure clusters

2018-06-05 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19797:
---

 Summary: Enable Spark InProcessLauncher on secure clusters
 Key: HIVE-19797
 URL: https://issues.apache.org/jira/browse/HIVE-19797
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar


In HIVE-18533, we added the option to launch Spark app using Spark's 
{{InProcessLauncher}}. However, for the first version we decided not to support 
secure clusters. The goal of this JIRA is to get the {{InProcessLauncher}} to 
run on secure clusters.





[jira] [Created] (HIVE-19788) Flaky test: TestHCatLoaderComplexSchema

2018-06-04 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19788:
---

 Summary: Flaky test: TestHCatLoaderComplexSchema
 Key: HIVE-19788
 URL: https://issues.apache.org/jira/browse/HIVE-19788
 Project: Hive
  Issue Type: Sub-task
  Components: Test
Reporter: Sahil Takiar
Assignee: Sahil Takiar


{{TestHCatLoaderComplexSchema}} is still flaky because it's writing to {{/tmp/}} 
- HIVE-19731 was meant to fix this, and that fixes the tmp dir for any Hive 
queries, but these tests run a bunch of Pig queries too, and those queries 
write to {{/tmp/}} - we need to pass in custom configs to the embedded 
{{PigServer}} that is being created as part of these tests.





[jira] [Created] (HIVE-19787) Log message when spark-submit has completed

2018-06-04 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19787:
---

 Summary: Log message when spark-submit has completed
 Key: HIVE-19787
 URL: https://issues.apache.org/jira/browse/HIVE-19787
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar


If {{spark-submit}} runs successfully the "Driver" thread should log a message. 
Otherwise there is no way to know if {{spark-submit}} exited successfully. We 
should also rename the thread to something more informative than "Driver".

Without this, debugging timeout exceptions of the RemoteDriver -> HS2 
connection is difficult, because there is no way to know if {{spark-submit}} 
finished or not.
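Both suggestions — a descriptive thread name plus a completion message — could look roughly like this sketch (class and method names are hypothetical, not the actual fix):

```java
public class SparkSubmitRunner {
    // Run the spark-submit work in a descriptively named thread and log when
    // it returns, so "did spark-submit finish?" is answerable from the logs.
    static Thread launch(Runnable sparkSubmit, String queryId) {
        Thread driver = new Thread(() -> {
            sparkSubmit.run();
            // A real implementation would use the HS2 logger, not stdout.
            System.out.println("spark-submit completed for query " + queryId);
        }, "spark-submit-" + queryId);
        driver.start();
        return driver;
    }
}
```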





[jira] [Created] (HIVE-19786) RpcServer cancelTask log message is incorrect

2018-06-04 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19786:
---

 Summary: RpcServer cancelTask log message is incorrect
 Key: HIVE-19786
 URL: https://issues.apache.org/jira/browse/HIVE-19786
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar


The log message inside the {{cancelTask}} of the {{RpcServer}} 
{{ChannelInitializer}} is incorrect. It states its measuring the timeout for 
the "test" message to be sent (basically a "hello" message to test the 
connection works). However, the {{cancelTask}} is actually used to timeout the 
SASL negotiation between the client and the server.





[jira] [Created] (HIVE-19785) Race condition when timeout task is invoked during SASL negotation

2018-06-04 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19785:
---

 Summary: Race condition when timeout task is invoked during SASL 
negotation
 Key: HIVE-19785
 URL: https://issues.apache.org/jira/browse/HIVE-19785
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar


There is a race condition that leads to some extraneous exception messages when 
the timeout task is invoked in {{RpcServer}}.

If a timeout is triggered by {{RpcServer#registerClient}} the method will 
remove the {{clientId}} from {{pendingClients}}. However, if the SASL 
negotiation is in progress when the timeout task is invoked, then 
{{SaslServerHandler#update}} will throw an {{IllegalArgumentException}} 
complaining that it can't find the {{clientId}} in the map of 
{{pendingClients}}.

The timeout still succeeds, but the logging is confusing and multiple 
exceptions make this difficult to debug.
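One common way to make such a removal race benign (illustrative only, not the committed fix) is to branch on the atomic remove() result instead of asserting presence:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PendingClients {
    private final ConcurrentMap<String, Object> pending = new ConcurrentHashMap<>();

    void register(String clientId, Object callback) {
        pending.put(clientId, callback);
    }

    // remove() is atomic: exactly one of the timeout task and the SASL
    // handler observes the callback; the loser sees null and backs off
    // instead of throwing IllegalArgumentException.
    boolean complete(String clientId) {
        Object callback = pending.remove(clientId);
        if (callback == null) {
            return false; // already timed out or completed elsewhere; log quietly
        }
        return true;
    }
}
```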





[jira] [Created] (HIVE-19784) Regression test selection framework for ptest

2018-06-04 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19784:
---

 Summary: Regression test selection framework for ptest
 Key: HIVE-19784
 URL: https://issues.apache.org/jira/browse/HIVE-19784
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Reporter: Sahil Takiar


Regression test selection is a methodology for decreasing the number of tests 
that are run in regression test suites. The idea is that for a given change, 
only the tests relevant to that change are run, rather than all the 
tests.

For example, right now Hive QA runs all the {{standalone-metastore}} tests for 
every patch. However, most of the time this isn't necessary. If a patch is only 
modifying files in {{ql}} or {{common}} there is no need to run 
{{standalone-metastore}} tests as there is no dependency from the 
{{standalone-metastore}} to any other Hive module (except for 
{{storage-api}}).

RTS is commonly used for CI systems. Google has published some interesting info 
on how they do this
* 
http://google-engtools.blogspot.com/2011/06/testing-at-speed-and-scale-of-google.html
* https://drive.google.com/file/d/0Bx-FLr0Egz9zYXJfMEZ6NERTbkU/view
* [Bazel|https://bazel.build/] seems to provide some functionality to do this: 
http://code.hootsuite.com/faster-automated-tests-bazel/

There are other open-source projects that offer different ways of doing 
this, e.g. [Ekstazi|http://ekstazi.org/]

A short term solution would be to implement the following:
* Before each Hive QA, parse the Maven dependency graph
* Take the specified patch and check which Maven modules it modifies
* Run tests contained inside the modified modules and their dependent modules
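The "check which modules a patch modifies" step reduces to mapping changed file paths onto top-level module directories; a toy sketch (a real Maven multi-module layout would also need the dependency graph):

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ModuleSelector {
    // Map each changed file to its top-level Maven module: the first path
    // segment (e.g. "ql/src/java/Foo.java" -> "ql"). Root-level files are
    // kept as-is so callers can treat them as "run everything" triggers.
    static Set<String> modifiedModules(List<String> changedFiles) {
        Set<String> modules = new TreeSet<>();
        for (String path : changedFiles) {
            int slash = path.indexOf('/');
            modules.add(slash > 0 ? path.substring(0, slash) : path);
        }
        return modules;
    }
}
```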





Re: Review Request 67415: HIVE-19525 : Spark task logs print PLAN PATH excessive number of times

2018-06-01 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67415/#review204193
---


Ship it!




Ship It!

- Sahil Takiar


On June 1, 2018, 3:51 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67415/
> ---
> 
> (Updated June 1, 2018, 3:51 p.m.)
> 
> 
> Review request for hive, Sahil Takiar and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-19525
> https://issues.apache.org/jira/browse/HIVE-19525
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Changing path log level to debug.
> The code to check if base work is already present in the map is placed before 
> the logic to get kryo object and classloader.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 
> 406bea011da83aee55f385029a4df1af94400e4c 
> 
> 
> Diff: https://reviews.apache.org/r/67415/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



[jira] [Created] (HIVE-19765) Add Parquet specific tests to BlobstoreCliDriver

2018-06-01 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19765:
---

 Summary: Add Parquet specific tests to BlobstoreCliDriver
 Key: HIVE-19765
 URL: https://issues.apache.org/jira/browse/HIVE-19765
 Project: Hive
  Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Similar to what was done for RC and ORC files.





[jira] [Created] (HIVE-19764) Add --SORT_QUERY_RESULTS to hive-blobstore/map_join.q.out

2018-06-01 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19764:
---

 Summary: Add --SORT_QUERY_RESULTS to hive-blobstore/map_join.q.out
 Key: HIVE-19764
 URL: https://issues.apache.org/jira/browse/HIVE-19764
 Project: Hive
  Issue Type: Sub-task
  Components: Test
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Fixes flakiness with this test





[jira] [Created] (HIVE-19760) Flaky test: TestJdbcWithMiniLlapArrow.testDataTypes

2018-06-01 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19760:
---

 Summary: Flaky test: TestJdbcWithMiniLlapArrow.testDataTypes
 Key: HIVE-19760
 URL: https://issues.apache.org/jira/browse/HIVE-19760
 Project: Hive
  Issue Type: Sub-task
Reporter: Sahil Takiar


Error:

{code}
Error Message
expected:<2012-04-22 09:00:00.123456789> but was:<2012-04-22 09:00:00.123>
Stacktrace
java.lang.AssertionError: expected:<2012-04-22 09:00:00.123456789> but 
was:<2012-04-22 09:00:00.123>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow.testDataTypes(TestJdbcWithMiniLlapArrow.java:220)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}





[jira] [Created] (HIVE-19759) Flaky test: TestRpc#testServerPort

2018-06-01 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19759:
---

 Summary: Flaky test: TestRpc#testServerPort
 Key: HIVE-19759
 URL: https://issues.apache.org/jira/browse/HIVE-19759
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Conflict when opening ports:

{code}
java.io.IOException: Remote Spark Driver RPC Server cannot bind to any of the 
configured ports: [65535, 21, 22, 23]
at 
org.apache.hive.spark.client.rpc.RpcServer.bindServerPort(RpcServer.java:150)
at org.apache.hive.spark.client.rpc.RpcServer.<init>(RpcServer.java:117)
at 
org.apache.hive.spark.client.rpc.TestRpc.testServerPort(TestRpc.java:209)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
{code}
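A common way to deflake this kind of test (a sketch of the general strategy, not necessarily how RpcServer was fixed) is to fall back to an ephemeral port when every configured port fails to bind:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.List;

public class PortBinder {
    // Try each configured port in order; fall back to an ephemeral port (0)
    // so tests don't fail when a fixed port is already taken or privileged.
    static ServerSocket bindFirstAvailable(List<Integer> ports) throws IOException {
        for (int port : ports) {
            try {
                return new ServerSocket(port);
            } catch (IOException | IllegalArgumentException e) {
                // Port in use, privileged, or out of range; try the next one.
            }
        }
        return new ServerSocket(0); // last resort: let the OS pick a free port
    }
}
```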





[jira] [Created] (HIVE-19752) PerfLogger integration for critical Hive-on-S3 paths

2018-05-31 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19752:
---

 Summary: PerfLogger integration for critical Hive-on-S3 paths
 Key: HIVE-19752
 URL: https://issues.apache.org/jira/browse/HIVE-19752
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Reporter: Sahil Takiar
Assignee: Sahil Takiar


There are several areas where Hive performs a lot of S3 operations, it would be 
good to add PerfLogger statements around this so we can measure how long they 
take.
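The PerfLogger pattern is a begin/end bracket around the path being measured. A minimal self-contained analogue (the real org.apache.hadoop.hive.ql.log.PerfLogger API differs in its details):

```java
import java.util.HashMap;
import java.util.Map;

public class MiniPerfLogger {
    private final Map<String, Long> startTimes = new HashMap<>();
    private final Map<String, Long> durations = new HashMap<>();

    // Record the start of a measured code path, keyed by method name.
    void begin(String method) {
        startTimes.put(method, System.nanoTime());
    }

    // Close the bracket and remember the elapsed time for reporting.
    long end(String method) {
        long elapsed = System.nanoTime() - startTimes.remove(method);
        durations.put(method, elapsed);
        return elapsed;
    }

    Map<String, Long> getDurations() {
        return durations;
    }
}
```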





Re: Review Request 67263: HIVE-19602

2018-05-31 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67263/#review204126
---


Fix it, then Ship it!




minor comment otherwise LGTM


ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java
Line 238 (original), 245 (patched)
<https://reviews.apache.org/r/67263/#comment286518>

can be private


- Sahil Takiar


On May 29, 2018, 10:53 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67263/
> ---
> 
> (Updated May 29, 2018, 10:53 p.m.)
> 
> 
> Review request for hive, Sahil Takiar and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-19602
> https://issues.apache.org/jira/browse/HIVE-19602
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Refactor inplace progress code in Hive-on-spark progress monitor to use 
> ProgressMonitor instance
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 
> 7afd8864075aa0d9708274eea8839c662324c732 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkProgressMonitor.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/67263/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



[jira] [Created] (HIVE-19733) RemoteSparkJobStatus#getSparkStageProgress inefficient implementation

2018-05-29 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19733:
---

 Summary: RemoteSparkJobStatus#getSparkStageProgress inefficient 
implementation
 Key: HIVE-19733
 URL: https://issues.apache.org/jira/browse/HIVE-19733
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar


The implementation of {{RemoteSparkJobStatus#getSparkStageProgress}} is a bit 
inefficient. There is one RPC call to get the {{SparkJobInfo}} and then for 
every stage there is another RPC call to get each {{SparkStageInfo}}. This 
could all be done in a single RPC call.
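The proposed batching amounts to one RPC message carrying the job info plus all of its stage infos; a toy data-shape sketch (field names hypothetical, not the Hive classes):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedStatus {
    static class StageProgress {
        final int stageId, completedTasks, totalTasks;
        StageProgress(int stageId, int completedTasks, int totalTasks) {
            this.stageId = stageId;
            this.completedTasks = completedTasks;
            this.totalTasks = totalTasks;
        }
    }

    // One message carrying the job plus every stage replaces
    // 1 (job info) + numStages (stage info) round trips.
    static class JobProgress {
        final int jobId;
        final List<StageProgress> stages;
        JobProgress(int jobId, List<StageProgress> stages) {
            this.jobId = jobId;
            this.stages = new ArrayList<>(stages);
        }
    }
}
```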





Re: Review Request 67263: HIVE-19602

2018-05-29 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67263/#review203993
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java
Line 340 (original), 234 (patched)
<https://reviews.apache.org/r/67263/#comment286394>

add some javadocs for this class



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkProgressMonitor.java
Lines 12 (patched)
<https://reviews.apache.org/r/67263/#comment286393>

can you add some javadocs for this class



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkProgressMonitor.java
Lines 18 (patched)
<https://reviews.apache.org/r/67263/#comment286392>

this can be package private right?


- Sahil Takiar


On May 23, 2018, 5:32 a.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67263/
> ---
> 
> (Updated May 23, 2018, 5:32 a.m.)
> 
> 
> Review request for hive, Sahil Takiar and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-19602
> https://issues.apache.org/jira/browse/HIVE-19602
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Refactor inplace progress code in Hive-on-spark progress monitor to use 
> ProgressMonitor instance
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 
> 7afd8864075aa0d9708274eea8839c662324c732 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkProgressMonitor.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/67263/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 67263: HIVE-19602

2018-05-29 Thread Sahil Takiar


> On May 25, 2018, 4:43 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java
> > Line 70 (original), 67 (patched)
> > <https://reviews.apache.org/r/67263/diff/1/?file=2027540#file2027540line72>
> >
> > is this still used?
> 
> Bharathkrishna Guruvayoor Murali wrote:
> Will remove this if we do not need to pass the headers , footers etc.. to 
> ProgressMonitor.
> (ie. if the progress bar format shown in below comment is acceptable).

So what changes about the progress bar format if you do pass the headers and 
footers?


> On May 25, 2018, 4:43 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java
> > Line 304 (original), 231 (patched)
> > <https://reviews.apache.org/r/67263/diff/1/?file=2027540#file2027540line306>
> >
> > whats the point of this class?
> 
> Bharathkrishna Guruvayoor Murali wrote:
> I used this class to follow same pattern as in tez. I will add the logic 
> similar to RenderStrategy used in tez while adding beeline progress bar, so 
> this should be useful.

Ok, can this be a private class? Is it used outside `SparkJobMonitor`?


> On May 25, 2018, 4:43 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkProgressMonitor.java
> > Lines 27 (patched)
> > <https://reviews.apache.org/r/67263/diff/1/?file=2027541#file2027541line27>
> >
> > whats the impact of having an extra argument? does the formatting 
> > change at all?
> 
> Bharathkrishna Guruvayoor Murali wrote:
> It changes a bit,like this:
> 
> 
> --------------------------------------------------------------------------
>   STAGES   ATTEMPT    STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED
> --------------------------------------------------------------------------
> Stage-1          0  FINISHED      1          1        0        0       0
> --------------------------------------------------------------------------
> STAGES: 01/01  [==>>] 100%  ELAPSED TIME: 1.01 s
> --------------------------------------------------------------------------
> 
> Notice the bit of extra space at the end. But other than that, it looks 
> pretty much same.

Ok, thats probably fine for now.


- Sahil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67263/#review203891
---


On May 23, 2018, 5:32 a.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67263/
> ---
> 
> (Updated May 23, 2018, 5:32 a.m.)
> 
> 
> Review request for hive, Sahil Takiar and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-19602
> https://issues.apache.org/jira/browse/HIVE-19602
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Refactor inplace progress code in Hive-on-spark progress monitor to use 
> ProgressMonitor instance
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 
> 7afd8864075aa0d9708274eea8839c662324c732 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkProgressMonitor.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/67263/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



[jira] [Created] (HIVE-19716) Set spark.local.dir for a few more HoS integration tests

2018-05-25 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19716:
---

 Summary: Set spark.local.dir for a few more HoS integration tests
 Key: HIVE-19716
 URL: https://issues.apache.org/jira/browse/HIVE-19716
 Project: Hive
  Issue Type: Test
  Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar


There are a few more flaky tests that are failing because they run HoS queries 
that write some temp data to {{/tmp/}}. These tests are regular JUnit tests, 
so they weren't covered in the previous attempts to do this.
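As a sketch of the kind of fix involved: `spark.local.dir` is a real Spark property, but the `SparkLocalDirSketch` class and paths below are illustrative, not the actual Hive test code:

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class SparkLocalDirSketch {
    // Build the extra Spark settings a test harness could pass to spark-submit.
    // Pointing spark.local.dir at a directory owned by the test run keeps an
    // external /tmp cleaner from deleting scratch files mid-test.
    static Map<String, String> stableLocalDirConf(File testScratchDir) {
        Map<String, String> conf = new HashMap<>();
        File localDir = new File(testScratchDir, "spark-local-dir");
        localDir.mkdirs();
        conf.put("spark.local.dir", localDir.getAbsolutePath());
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = stableLocalDirConf(
            new File(System.getProperty("java.io.tmpdir"), "hive-qtest"));
        System.out.println(conf.get("spark.local.dir"));
    }
}
```

The same idea applies whether the setting is injected through `hive-site.xml`, a `SparkConf`, or `--conf` arguments to spark-submit: the scratch directory just needs to live somewhere the test run controls.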





Re: Review Request 67263: HIVE-19602

2018-05-25 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67263/#review203891
---



are there any logic changes, or is most of the code just copied into the new 
class?


ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java
Line 70 (original), 67 (patched)
<https://reviews.apache.org/r/67263/#comment286249>

is this still used?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java
Line 304 (original), 231 (patched)
<https://reviews.apache.org/r/67263/#comment286251>

whats the point of this class?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkProgressMonitor.java
Lines 27 (patched)
<https://reviews.apache.org/r/67263/#comment286252>

whats the impact of having an extra argument? does the formatting change at 
all?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkProgressMonitor.java
Lines 77 (patched)
<https://reviews.apache.org/r/67263/#comment286254>

the current states LGTM


- Sahil Takiar


On May 23, 2018, 5:32 a.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67263/
> ---
> 
> (Updated May 23, 2018, 5:32 a.m.)
> 
> 
> Review request for hive, Sahil Takiar and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-19602
> https://issues.apache.org/jira/browse/HIVE-19602
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Refactor inplace progress code in Hive-on-spark progress monitor to use 
> ProgressMonitor instance
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobMonitor.java 
> 7afd8864075aa0d9708274eea8839c662324c732 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkProgressMonitor.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/67263/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: May 2018 Hive User Group Meeting

2018-05-23 Thread Sahil Takiar
Wanted to thank everyone for attending the meetup a few weeks ago, and a
huge thanks to all of our speakers! Apologies for the delay, but we finally
have the recording uploaded to Youtube along with all the slides uploaded
to SlideShare. Below are the links:

Recording: https://youtu.be/gwX3KpHa2j0

   - Hive-on-Spark at Uber: Efficiency & Scale - Xuefu Zhang -
   https://www.slideshare.net/sahiltakiar/hive-on-spark-at-uber-scale
   - Hive-on-S3 Performance: Past, Present, and Future - Sahil Takiar -
   
https://www.slideshare.net/sahiltakiar/hive-ons3-performance-past-present-and-future
   - Dali: Data Access Layer at LinkedIn - Adwait Tumbde -
   https://www.slideshare.net/sahiltakiar/dali-data-access-layer
   - Parquet Vectorization in Hive - Vihang Karajgaonkar -
   https://www.slideshare.net/sahiltakiar/parquet-vectorization-in-hive
   - ORC Column Level Encryption - Owen O’Malley -
   https://www.slideshare.net/sahiltakiar/orc-column-encryption
   - Running Hive at Scale @ Lyft - Sharanya Santhanam, Rohit Menon -
   https://www.slideshare.net/sahiltakiar/running-hive-at-scale-lyft
   - Materialized Views in Hive - Jesus Camacho Rodriguez -
   
https://www.slideshare.net/sahiltakiar/accelerating-query-processing-with-materialized-views-in-apache-hive-98333641
   - Hive Metastore Caching - Daniel Dai -
   https://www.slideshare.net/sahiltakiar/hive-metastore-cache
   - Hive Metastore Separation - Alan Gates -
   https://www.slideshare.net/sahiltakiar/making-the-metastore-standalone
   - Customer Use Cases & Pain Points of (Big) Metadata - Rituparna Agrawal
   -
   
https://www.slideshare.net/sahiltakiar/customer-use-cases-pain-points-of-big-metadata

If you have any issues accessing the links, feel free to reach out to me.

Looking forward to our next Hive Meetup!

On Mon, May 14, 2018 at 8:45 AM, Sahil Takiar <takiar.sa...@gmail.com>
wrote:

> Hello,
>
> Yes, the meetup was recorded. We are in the process of getting it uploaded
> to Youtube. Once its publicly available I will send out the link on this
> email thread.
>
> Thanks
>
> --Sahil
>
> On Mon, May 14, 2018 at 6:04 AM, <roberto.tar...@stratebi.com> wrote:
>
>> Hi,
>>
>>
>>
>> If you have recorded the meeting share link please. I could not follow it
>> online for the schedule (I live in Spain).
>>
>>
>>
>> Kind Regards,
>>
>>
>>
>>
>>
>> *From:* Luis Figueroa [mailto:lef...@outlook.com]
>> *Sent:* miércoles, 9 de mayo de 2018 18:01
>> *To:* u...@hive.apache.org
>> *Cc:* dev@hive.apache.org
>> *Subject:* Re: May 2018 Hive User Group Meeting
>>
>>
>>
>> Hey everyone,
>>
>>
>>
>> Was the meeting recorded by any chance?
>>
>> Luis
>>
>>
>> On May 8, 2018, at 5:31 PM, Sahil Takiar <takiar.sa...@gmail.com> wrote:
>>
>> Hey Everyone,
>>
>>
>>
>> Almost time for the meetup! The live stream can be viewed on this link:
>> https://live.lifesizecloud.com/extension/2000992219?token=
>> 067078ac-a8df-45bc-b84c-4b371ecbc719==en
>> =Hive%20User%20Group%20Meetup
>>
>> The stream won't be live until the meetup starts.
>>
>> For those attending in person, there will be guest wifi:
>>
>> Login: HiveMeetup
>> Password: ClouderaHive
>>
>>
>>
>> On Mon, May 7, 2018 at 12:48 PM, Sahil Takiar <takiar.sa...@gmail.com>
>> wrote:
>>
>> Hey Everyone,
>>
>>
>>
>> The meetup is only a day away! Here
>> <https://docs.google.com/document/d/1v8iERias-LOq8-q4BrCNSUsOarCwOAYRwXYIF0DK5AU/edit?usp=sharing>
>> is a link to all the abstracts we have compiled thus far. Several of you
>> have asked about event streaming and recordings. The meetup will be both
>> streamed live and recorded. We will post the links on this thread and on
>> the meetup link tomorrow closer to the start of the meetup.
>>
>>
>>
>> The meetup will be at Cloudera HQ - 395 Page Mill Rd
>> <https://maps.google.com/?q=395+Page+Mill+Rd=gmail=g>. If
>> you have any trouble getting into the building, feel free to post on the
>> meetup link.
>>
>>
>>
>> Meetup Link: https://www.meetup.com/Hive-User-Group-Meeting/events/
>> 249641278/
>>
>>
>>
>> On Wed, May 2, 2018 at 7:48 AM, Sahil Takiar <takiar.sa...@gmail.com>
>> wrote:
>>
>> Hey Everyone,
>>
>>
>>
>> The agenda for the meetup has been set and I'm excited to say we have
>> lots of interesting talks scheduled! Below is final agenda, the full list
>> of abstracts will be sent out soon. If you are planning to attend, please
>> RSVP on the meetup

[jira] [Created] (HIVE-19676) Ability to selectively run tests in TestBlobstoreCliDriver

2018-05-23 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19676:
---

 Summary: Ability to selectively run tests in TestBlobstoreCliDriver
 Key: HIVE-19676
 URL: https://issues.apache.org/jira/browse/HIVE-19676
 Project: Hive
  Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


The {{TestBlobstoreCliDriver}} contains a {{testconfiguration.properties}}, but 
it doesn't seem to be used anywhere. It would be nice if it could be used to 
define which tests to run or which tests to exclude.





Re: Release branch updates

2018-05-22 Thread Sahil Takiar
Is there a reason we want to only release 3.x on branch-3? I don't think
this is what we did for the 2.1.0 and 2.2.0 releases.

Given that most of the changes in Hive go into master, why would we want to
upgrade it to 4.0 given that we just released 3.0?

On Wed, May 16, 2018 at 1:03 PM, Vineet Garg <vg...@hortonworks.com> wrote:

> Hello,
>
> An update on release branches:
>
> I have created a new branch branch-3.0 off branch-3 which is being used to
> release Hive 3.0. branch-3 will continue to be used for 3.x releases.
> I am preparing branch-3 for 3.1 release and is now open for commits.
> Please ping me if you would like to get your commit to branch-3 keeping in
> mind that we should target only minor features and bug fixes in there.
> Master will continue to be used for major features in 4.0.
>
>
> Thanks,
> Vineet Garg
>



-- 
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309


[jira] [Created] (HIVE-19603) Decrease batch size of TestMinimrCliDriver

2018-05-18 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19603:
---

 Summary: Decrease batch size of TestMinimrCliDriver
 Key: HIVE-19603
 URL: https://issues.apache.org/jira/browse/HIVE-19603
 Project: Hive
  Issue Type: Test
  Components: Tests
Reporter: Sahil Takiar
Assignee: Sahil Takiar


We have seen a lot of flakiness with the {{TestMinimrCliDriver}} - it keeps on 
timing out. I checked a recent Hive QA run and running the following tests 
locally takes my machine 1 hour:

{code}
mvn -B test -Dtest.groups= -Dtest=TestMinimrCliDriver 
-Dminimr.query.files=infer_bucket_sort_num_buckets.q,infer_bucket_sort_reducers_power_two.q,parallel_orderby.q,bucket_num_reducers_acid.q,scriptfile1.q,infer_bucket_sort_map_operators.q,infer_bucket_sort_merge.q,root_dir_external_table.q,infer_bucket_sort_dyn_part.q,udf_using.q
{code}

On ptest, the timeout is 40 minutes. I suggest we decrease the batch size from 
10 to 5.
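The effect of the batch size on the number of driver invocations can be sketched as follows; `QFileBatcher` is a hypothetical helper, not ptest's real batching code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QFileBatcher {
    // Split a list of .q files into fixed-size batches; each batch becomes one
    // driver invocation, so smaller batches mean shorter, less timeout-prone runs.
    static List<List<String>> batches(List<String> qFiles, int batchSize) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < qFiles.size(); i += batchSize) {
            out.add(new ArrayList<>(
                qFiles.subList(i, Math.min(i + batchSize, qFiles.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tests = Arrays.asList(
            "infer_bucket_sort_num_buckets.q", "infer_bucket_sort_reducers_power_two.q",
            "parallel_orderby.q", "bucket_num_reducers_acid.q", "scriptfile1.q",
            "infer_bucket_sort_map_operators.q", "infer_bucket_sort_merge.q",
            "root_dir_external_table.q", "infer_bucket_sort_dyn_part.q", "udf_using.q");
        System.out.println(batches(tests, 10).size()); // 1
        System.out.println(batches(tests, 5).size());  // 2
    }
}
```

With the ten minimr tests above, a batch size of 5 turns one hour-long invocation into two roughly half-hour ones, each comfortably under the 40-minute ptest timeout.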





Re: Review Request 67023: HIVE-18117: Add a new Test Driver "TestErasureCodingHDFSCliDriver" that can be used to run tests over hdfs directories that employ Erasure Coding.

2018-05-16 Thread Sahil Takiar


> On May 10, 2018, noon, Sahil Takiar wrote:
> > itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java
> > Lines 674 (patched)
> > <https://reviews.apache.org/r/67023/diff/1/?file=2018287#file2018287line674>
> >
> > is this necessary since you set the cluster type to mr above?
> 
> Andrew Sherman wrote:
> Ha good question. Yes it is necessary as setClusterType() does not always 
> set the cluster type :-( - it allows the cluster type to be overridden with 
> -Dclustermode=xxx
> 
> Sahil Takiar wrote:
> interesting, should we handle other cluster types like Spark or MR too?
> 
> Andrew Sherman wrote:
> looks like it does already
> 
> Sahil Takiar wrote:
> then is the tez specific code necessary?
> 
> Andrew Sherman wrote:
>     I think it is, as tez has its own hive-site.xml but I am not 100% sure
> 
> Sahil Takiar wrote:
> yes, tez has its own `hive-site.xml`, as does Spark; so do we need to do 
> this for Spark too?
> 
> Andrew Sherman wrote:
> That would make sense but then we have to decide if that is 
> data/conf/spark/local or data/conf/spark/local :-(
> These code paths are confusing becaus it is hard to anticipate how 
> they'll be used.

you can use `data/conf/spark/standalone` for `MiniClusterType.spark`; and 
`data/conf/spark/yarn-cluster` for `MiniClusterType.miniSparkOnYarn`. if u 
think this isn't worth adding support for, then i would suggest explicitly 
doing that rather than supporting certain execution engines and not others


- Sahil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67023/#review202831
---


On May 14, 2018, 9:44 p.m., Andrew Sherman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67023/
> -------
> 
> (Updated May 14, 2018, 9:44 p.m.)
> 
> 
> Review request for hive and Sahil Takiar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> TestErasureCodingHDFSCliDriver uses a test-only CommandProcessor 
> "ErasureProcessor"
> which allows .q files to contain Erasure Coding commands similar to those 
> provided
> by the hdfs ec command
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html.
> The Erasure Coding functionality is exposed through a new shim 
> "HdfsFileErasureCodingPolicy".
> At this stage there are two .q files:
> erasure_commands.q (a simple test to show ERASURE commands can run on local fs 
> via
> TestCliDriver or on hdfs via TestErasureCodingHDFSCliDriver), and
> erasure_simple.q (which does some trivial queries to demonstrate basic 
> functionality).
> More tests will come in future commits.
> 
> 
> Diffs
> -
> 
>   
> itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestErasureCodingHDFSCliDriver.java
>  PRE-CREATION 
>   itests/src/test/resources/testconfiguration.properties 
> cf6d19a5937c3f4a82e4ffe09201af8a79da2e3d 
>   
> itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java 
> 1814f0fa190e0ebcf327aebcdaf6f9967a5fd14f 
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 
> 16571b3ff3288dfb99fbca570452592cc1650f9a 
>   
> ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessorFactory.java 
> 3d47991b603c94c8da2106e67192c8513ef783a7 
>   ql/src/java/org/apache/hadoop/hive/ql/processors/ErasureProcessor.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/processors/HiveCommand.java 
> 56c7516ecfaf2421b0f3d3a188d05f38715b25b2 
>   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 
> 89129f99fe63f0384aff965ad665770d11e9af04 
>   
> ql/src/test/org/apache/hadoop/hive/ql/processors/TestCommandProcessorFactory.java
>  de43c2866f64e2ed5c74eab450de28f1a79248dc 
>   ql/src/test/queries/clientpositive/erasure_commands.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/erasure_simple.q PRE-CREATION 
>   ql/src/test/results/clientpositive/erasure_commands.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/erasurecoding/erasure_commands.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/erasurecoding/erasure_simple.q.out 
> PRE-CREATION 
>   shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java 
> ec06a88dc21346473bec6589c703167d50e3b367 
>   shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 
> b89081761780bf1f305d0196bb94bb0b54f7184f 
>   testutils/ptest2/conf/deployed/master-mr2.properties 
> 7edc307f85744d60d322ad8087164625677fc230 
> 
> 
> Diff: https://reviews.apache.org/r/67023/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Sherman
> 
>



[jira] [Created] (HIVE-19571) Ability to run multiple pre-commit jobs on a ptest server

2018-05-16 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19571:
---

 Summary: Ability to run multiple pre-commit jobs on a ptest server
 Key: HIVE-19571
 URL: https://issues.apache.org/jira/browse/HIVE-19571
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Reporter: Sahil Takiar
Assignee: Sahil Takiar


I've been taking a look at the Disk, Network, and CPU usage of the GCE 
instances we run ptest on, and it doesn't look like we are fully utilizing the 
machines. The resource usage is very up and down.

During each ptest execution, there is a large chunk of time (~20 min) where it's 
just the Jenkins job that is doing any work (checking out github repos, 
building code, figuring out test batches, etc.). During this time, the ptest 
nodes are mostly idle - the CPU and Disk I/O are almost zero.

Even when ptest is running, I think some of the resources are under-utilized. 
Network and disk usage spike at the beginning of the job, probably because 
ptest is distributing resources to each machine, each slave is downloading 
jars, etc. However, after that, when the actual tests run, there is almost 0 
network activity (which makes sense since tests run on a single node). For 
disk usage, there is activity, but not nearly as high as when the setup phase 
was occurring. CPU usage fluctuates between 40-80%.





Re: Review Request 67023: HIVE-18117: Add a new Test Driver "TestErasureCodingHDFSCliDriver" that can be used to run tests over hdfs directories that employ Erasure Coding.

2018-05-15 Thread Sahil Takiar


> On May 10, 2018, noon, Sahil Takiar wrote:
> > itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java
> > Lines 674 (patched)
> > <https://reviews.apache.org/r/67023/diff/1/?file=2018287#file2018287line674>
> >
> > is this necessary since you set the cluster type to mr above?
> 
> Andrew Sherman wrote:
> Ha good question. Yes it is necessary as setClusterType() does not always 
> set the cluster type :-( - it allows the cluster type to be overridden with 
> -Dclustermode=xxx
> 
> Sahil Takiar wrote:
> interesting, should we handle other cluster types like Spark or MR too?
> 
> Andrew Sherman wrote:
> looks like it does already
> 
> Sahil Takiar wrote:
> then is the tez specific code necessary?
> 
> Andrew Sherman wrote:
> I think it is, as tez has its own hive-site.xml but I am not 100% sure

yes, tez has its own `hive-site.xml`, as does Spark; so do we need to do this 
for Spark too?


- Sahil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67023/#review202831
---


On May 14, 2018, 9:44 p.m., Andrew Sherman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67023/
> -------
> 
> (Updated May 14, 2018, 9:44 p.m.)
> 
> 
> Review request for hive and Sahil Takiar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> TestErasureCodingHDFSCliDriver uses a test-only CommandProcessor 
> "ErasureProcessor"
> which allows .q files to contain Erasure Coding commands similar to those 
> provided
> by the hdfs ec command
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html.
> The Erasure Coding functionality is exposed through a new shim 
> "HdfsFileErasureCodingPolicy".
> At this stage there are two .q files:
> erasure_commands.q (a simple test to show ERASURE commands can run on local fs 
> via
> TestCliDriver or on hdfs via TestErasureCodingHDFSCliDriver), and
> erasure_simple.q (which does some trivial queries to demonstrate basic 
> functionality).
> More tests will come in future commits.
> 
> 
> Diffs
> -
> 
>   
> itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestErasureCodingHDFSCliDriver.java
>  PRE-CREATION 
>   itests/src/test/resources/testconfiguration.properties 
> cf6d19a5937c3f4a82e4ffe09201af8a79da2e3d 
>   
> itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java 
> 1814f0fa190e0ebcf327aebcdaf6f9967a5fd14f 
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 
> 16571b3ff3288dfb99fbca570452592cc1650f9a 
>   
> ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessorFactory.java 
> 3d47991b603c94c8da2106e67192c8513ef783a7 
>   ql/src/java/org/apache/hadoop/hive/ql/processors/ErasureProcessor.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/processors/HiveCommand.java 
> 56c7516ecfaf2421b0f3d3a188d05f38715b25b2 
>   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 
> 89129f99fe63f0384aff965ad665770d11e9af04 
>   
> ql/src/test/org/apache/hadoop/hive/ql/processors/TestCommandProcessorFactory.java
>  de43c2866f64e2ed5c74eab450de28f1a79248dc 
>   ql/src/test/queries/clientpositive/erasure_commands.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/erasure_simple.q PRE-CREATION 
>   ql/src/test/results/clientpositive/erasure_commands.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/erasurecoding/erasure_commands.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/erasurecoding/erasure_simple.q.out 
> PRE-CREATION 
>   shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java 
> ec06a88dc21346473bec6589c703167d50e3b367 
>   shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 
> b89081761780bf1f305d0196bb94bb0b54f7184f 
>   testutils/ptest2/conf/deployed/master-mr2.properties 
> 7edc307f85744d60d322ad8087164625677fc230 
> 
> 
> Diff: https://reviews.apache.org/r/67023/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Sherman
> 
>



Re: Review Request 67023: HIVE-18117: Add a new Test Driver "TestErasureCodingHDFSCliDriver" that can be used to run tests over hdfs directories that employ Erasure Coding.

2018-05-15 Thread Sahil Takiar


> On May 14, 2018, 4:57 p.m., Sahil Takiar wrote:
> > itests/src/test/resources/testconfiguration.properties
> > Line 1693 (original), 1693 (patched)
> > <https://reviews.apache.org/r/67023/diff/1-2/?file=2018286#file2018286line1693>
> >
> > why would we want the ec commands to work outside the 
> > `TestErasureCodingHDFSCliDriver`?
> 
> Andrew Sherman wrote:
> So you could run an ec test in TestCliDriver (sorry reviewboard lost my 
> earlier reply)
> 
> Sahil Takiar wrote:
> why would you want to do that? whats the use case? shouldn't 
> `TestErasureCodingHDFSCliDriver` encapsulate all EC-related tests? plus I'm 
> not sure how you could run any EC commands against a local filesystem, 
> wouldn't they all be no-ops?
> 
> Andrew Sherman wrote:
> They would. When I'm developing an EC test I may sometimes want to run 
> the test in TestCliDriver with erasure commands being no-ops as a way to 
> validate the script. Maybe this is weird, but I've been doing it.

I see, makes sense.


- Sahil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67023/#review203040
---


On May 14, 2018, 9:44 p.m., Andrew Sherman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67023/
> -------
> 
> (Updated May 14, 2018, 9:44 p.m.)
> 
> 
> Review request for hive and Sahil Takiar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> TestErasureCodingHDFSCliDriver uses a test-only CommandProcessor 
> "ErasureProcessor"
> which allows .q files to contain Erasure Coding commands similar to those 
> provided
> by the hdfs ec command
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html.
> The Erasure Coding functionality is exposed through a new shim 
> "HdfsFileErasureCodingPolicy".
> At this stage there are two .q files:
> erasure_commands.q (a simple test to show ERASURE commands can run on local fs 
> via
> TestCliDriver or on hdfs via TestErasureCodingHDFSCliDriver), and
> erasure_simple.q (which does some trivial queries to demonstrate basic 
> functionality).
> More tests will come in future commits.
> 
> 
> Diffs
> -
> 
>   
> itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestErasureCodingHDFSCliDriver.java
>  PRE-CREATION 
>   itests/src/test/resources/testconfiguration.properties 
> cf6d19a5937c3f4a82e4ffe09201af8a79da2e3d 
>   
> itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java 
> 1814f0fa190e0ebcf327aebcdaf6f9967a5fd14f 
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 
> 16571b3ff3288dfb99fbca570452592cc1650f9a 
>   
> ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessorFactory.java 
> 3d47991b603c94c8da2106e67192c8513ef783a7 
>   ql/src/java/org/apache/hadoop/hive/ql/processors/ErasureProcessor.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/processors/HiveCommand.java 
> 56c7516ecfaf2421b0f3d3a188d05f38715b25b2 
>   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 
> 89129f99fe63f0384aff965ad665770d11e9af04 
>   
> ql/src/test/org/apache/hadoop/hive/ql/processors/TestCommandProcessorFactory.java
>  de43c2866f64e2ed5c74eab450de28f1a79248dc 
>   ql/src/test/queries/clientpositive/erasure_commands.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/erasure_simple.q PRE-CREATION 
>   ql/src/test/results/clientpositive/erasure_commands.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/erasurecoding/erasure_commands.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/erasurecoding/erasure_simple.q.out 
> PRE-CREATION 
>   shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java 
> ec06a88dc21346473bec6589c703167d50e3b367 
>   shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 
> b89081761780bf1f305d0196bb94bb0b54f7184f 
>   testutils/ptest2/conf/deployed/master-mr2.properties 
> 7edc307f85744d60d322ad8087164625677fc230 
> 
> 
> Diff: https://reviews.apache.org/r/67023/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Sherman
> 
>



[jira] [Created] (HIVE-19562) Flaky test: TestMiniSparkOnYarn FileNotFoundException in spark-submit

2018-05-15 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19562:
---

 Summary: Flaky test: TestMiniSparkOnYarn FileNotFoundException in 
spark-submit
 Key: HIVE-19562
 URL: https://issues.apache.org/jira/browse/HIVE-19562
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar


Seeing sporadic failures during test setup. Specifically, when spark-submit 
runs, this error (or a similar one) gets thrown:

{code}
2018-05-15T10:55:02,112  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient: Exception in thread "main" 
java.io.FileNotFoundException: File 
file:/tmp/spark-56e217f7-b8a5-4c63-9a6b-d737a64f2820/__spark_libs__7371510645900072447.zip
 does not exist
2018-05-15T10:55:02,113  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient:  at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
2018-05-15T10:55:02,113  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient:  at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:867)
2018-05-15T10:55:02,113  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient:  at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
2018-05-15T10:55:02,113  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient:  at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
2018-05-15T10:55:02,113  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient:  at 
org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:365)
2018-05-15T10:55:02,113  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient:  at 
org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:316)
2018-05-15T10:55:02,113  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient:  at 
org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:356)
2018-05-15T10:55:02,113  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient:  at 
org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:478)
2018-05-15T10:55:02,113  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient:  at 
org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:565)
2018-05-15T10:55:02,113  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient:  at 
org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:863)
2018-05-15T10:55:02,113  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient:  at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:169)
2018-05-15T10:55:02,113  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient:  at 
org.apache.spark.deploy.yarn.Client.run(Client.scala:1146)
2018-05-15T10:55:02,113  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient:  at 
org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1518)
2018-05-15T10:55:02,113  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient:  at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
2018-05-15T10:55:02,113  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient:  at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
2018-05-15T10:55:02,113  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient:  at 
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
2018-05-15T10:55:02,113  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient:  at 
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
2018-05-15T10:55:02,113  INFO 
[RemoteDriver-stderr-redir-27d3dcfb-2a10-4118-9fae-c200d2e095a5 main] 
client.SparkSubmitSparkClient:  at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}

Essentially, Spark is writing some files for container localization to a tmp 
dir, and that tmp dir is getting deleted. We have seen a lot of issues with 
writing files to {{/tmp/}} in th

[jira] [Created] (HIVE-19559) SparkClientImpl shouldn't name redirector thread "RemoteDriver"

2018-05-15 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19559:
---

 Summary: SparkClientImpl shouldn't name redirector thread 
"RemoteDriver"
 Key: HIVE-19559
 URL: https://issues.apache.org/jira/browse/HIVE-19559
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar


The threads responsible for re-directing the stdout / stderr of the spark-submit 
process are named {{RemoteDriver...}}, which is misleading because these threads 
are not re-directing output from the {{RemoteDriver}}, just from the 
spark-submit stdout / stderr.
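A descriptive thread name makes logs like the stack trace above much easier to read. The sketch below shows the general pattern; the `spark-submit-*-redir` names and the `RedirectorNaming` helper are illustrative, not SparkClientImpl's actual code.

```java
// Minimal sketch of giving redirector threads a name that reflects what
// they actually do; the names used here are illustrative, not the real
// SparkClientImpl implementation.
public class RedirectorNaming {
    public static Thread redirector(String streamName, Runnable copyLoop) {
        // e.g. "spark-submit-stdout-redir" instead of the misleading "RemoteDriver..."
        Thread t = new Thread(copyLoop, "spark-submit-" + streamName + "-redir");
        t.setDaemon(true);  // don't keep the JVM alive just for log redirection
        return t;
    }

    public static void main(String[] args) {
        Thread t = redirector("stdout", () -> { });
        System.out.println(t.getName());  // spark-submit-stdout-redir
    }
}
```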



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] Stricter commit guidelines

2018-05-15 Thread Sahil Takiar
+1

On Tue, May 15, 2018 at 10:56 AM, Owen O'Malley <owen.omal...@gmail.com>
wrote:

> +1
>
> On Tue, May 15, 2018 at 8:55 AM, Peter Vary <pv...@cloudera.com> wrote:
>
> > +1 - Hoping for something like this for a long while! Thanks for taking
> > this up all!
> >
> > > On May 15, 2018, at 5:44 PM, Jesus Camacho Rodriguez <
> > jcama...@apache.org> wrote:
> > >
> > > Forgot to mention the length of the vote in original message.
> > >
> > > Let's leave the vote open for a shorter period than usual, for instance
> > 48 hours, i.e., till Wednesday 10pm PST. Situation can only get worse
> than
> > it is now if we do not take action for a longer period.
> > >
> > > As Alan suggested, vote passes if there is a lazy majority (at least 3
> > votes, more +1s than -1s).
> > >
> > > Thanks,
> > > Jesús
> > >
> > >
> > > On 5/15/18, 8:37 AM, "Andrew Sherman" <asher...@cloudera.com> wrote:
> > >
> > >+1
> > >
> > >On Tue, May 15, 2018 at 2:34 AM Rui Li <lirui.fu...@gmail.com>
> wrote:
> > >
> > >> +1
> > >>
> > >> On Tue, May 15, 2018 at 2:24 PM, Prasanth Jayachandran <
> > >> pjayachand...@hortonworks.com> wrote:
> > >>
> > >>> +1
> > >>>
> > >>>
> > >>>
> > >>> Thanks
> > >>> Prasanth
> > >>>
> > >>>
> > >>>
> > >>> On Mon, May 14, 2018 at 10:44 PM -0700, "Jesus Camacho Rodriguez" <
> > >>> jcama...@apache.org<mailto:jcama...@apache.org>> wrote:
> > >>>
> > >>>
> > >>> After work has been done to ignore most of the tests that were
> failing
> > >>> consistently/intermittently [1], I wanted to start this vote to
> gather
> > >>> support from the community to be stricter wrt committing patches to
> > Hive.
> > >>> The committers guide [2] already specifies that a +1 should be
> obtained
> > >>> before committing, but there is another clause that allows committing
> > >> under
> > >>> the presence of flaky tests (clause 4). Flaky tests are as good as
> > having
> > >>> no tests, hence I propose to remove clause 4 and enforce the +1 from
> > >>> testing infra before committing.
> > >>>
> > >>>
> > >>>
> > >>> As I see it, by enforcing that we always get a +1 from the testing
> > infra
> > >>> before committing, 1) we will have a more stable project, and 2) we
> > will
> > >>> have another incentive as a community to create a more robust testing
> > >>> infra, e.g., replacing flaky tests for similar unit tests that are
> not
> > >>> flaky, trying to decrease running time for tests, etc.
> > >>>
> > >>>
> > >>>
> > >>> Please, share your thoughts about this.
> > >>>
> > >>>
> > >>>
> > >>> Here is my +1.
> > >>>
> > >>>
> > >>>
> > >>> Thanks,
> > >>>
> > >>> Jes?s
> > >>>
> > >>>
> > >>>
> > >>> [1] http://mail-archives.apache.org/mod_mbox/hive-dev/201805.
> > >>> mbox/%3C63023673-AEE5-41A9-BA52-5A5DFB2078B6%40apache.org%3E
> > >>>
> > >>> [2] https://cwiki.apache.org/confluence/display/Hive/
> > >>> HowToCommit#HowToCommit-PreCommitruns,andcommittingpatches
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > >> --
> > >> Best regards!
> > >> Rui Li
> > >>
> > >
> > >
> > >
> >
> >
>



-- 
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309


Re: Review Request 67023: HIVE-18117: Add a new Test Driver "TestErasureCodingHDFSCliDriver" that can be used to run tests over hdfs directories that employ Erasure Coding.

2018-05-15 Thread Sahil Takiar


> On May 10, 2018, noon, Sahil Takiar wrote:
> > itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java
> > Lines 674 (patched)
> > <https://reviews.apache.org/r/67023/diff/1/?file=2018287#file2018287line674>
> >
> > is this necessary since you set the cluster type to mr above?
> 
> Andrew Sherman wrote:
> Ha, good question. Yes, it is necessary, as setClusterType() does not always 
> set the cluster type :-( - it allows the cluster type to be overridden with 
> -Dclustermode=xxx
> 
> Sahil Takiar wrote:
> interesting, should we handle other cluster types like Spark or MR too?
> 
> Andrew Sherman wrote:
> looks like it does already

then is the tez specific code necessary?


- Sahil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67023/#review202831
---


On May 14, 2018, 9:44 p.m., Andrew Sherman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67023/
> ---
> 
> (Updated May 14, 2018, 9:44 p.m.)
> 
> 
> Review request for hive and Sahil Takiar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> TestErasureCodingHDFSCliDriver uses a test-only CommandProcessor 
> "ErasureProcessor"
> which allows .q files to contain Erasure Coding commands similar to those 
> provided
> by the hdfs ec command
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html.
> The Erasure Coding functionality is exposed through a new shim 
> "HdfsFileErasureCodingPolicy".
> At this stage there are two .q files:
erasure_commands.q (a simple test to show ERASURE commands can run on local fs 
> via
> TestCliDriver or on hdfs via TestErasureCodingHDFSCliDriver), and
> erasure_simple.q (which does some trivial queries to demonstrate basic 
> functionality).
> More tests will come in future commits.
> 
> 
> Diffs
> -
> 
>   
> itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestErasureCodingHDFSCliDriver.java
>  PRE-CREATION 
>   itests/src/test/resources/testconfiguration.properties 
> cf6d19a5937c3f4a82e4ffe09201af8a79da2e3d 
>   
> itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java 
> 1814f0fa190e0ebcf327aebcdaf6f9967a5fd14f 
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 
> 16571b3ff3288dfb99fbca570452592cc1650f9a 
>   
> ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessorFactory.java 
> 3d47991b603c94c8da2106e67192c8513ef783a7 
>   ql/src/java/org/apache/hadoop/hive/ql/processors/ErasureProcessor.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/processors/HiveCommand.java 
> 56c7516ecfaf2421b0f3d3a188d05f38715b25b2 
>   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 
> 89129f99fe63f0384aff965ad665770d11e9af04 
>   
> ql/src/test/org/apache/hadoop/hive/ql/processors/TestCommandProcessorFactory.java
>  de43c2866f64e2ed5c74eab450de28f1a79248dc 
>   ql/src/test/queries/clientpositive/erasure_commands.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/erasure_simple.q PRE-CREATION 
>   ql/src/test/results/clientpositive/erasure_commands.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/erasurecoding/erasure_commands.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/erasurecoding/erasure_simple.q.out 
> PRE-CREATION 
>   shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java 
> ec06a88dc21346473bec6589c703167d50e3b367 
>   shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 
> b89081761780bf1f305d0196bb94bb0b54f7184f 
>   testutils/ptest2/conf/deployed/master-mr2.properties 
> 7edc307f85744d60d322ad8087164625677fc230 
> 
> 
> Diff: https://reviews.apache.org/r/67023/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Sherman
> 
>



Re: Review Request 67023: HIVE-18117: Add a new Test Driver "TestErasureCodingHDFSCliDriver" that can be used to run tests over hdfs directories that employ Erasure Coding.

2018-05-15 Thread Sahil Takiar


> On May 14, 2018, 4:57 p.m., Sahil Takiar wrote:
> > itests/src/test/resources/testconfiguration.properties
> > Line 1693 (original), 1693 (patched)
> > <https://reviews.apache.org/r/67023/diff/1-2/?file=2018286#file2018286line1693>
> >
> > why would we want the ec commands to work outside the 
> > `TestErasureCodingHDFSCliDriver`?
> 
> Andrew Sherman wrote:
> So you could run an EC test in TestCliDriver (sorry, reviewboard lost my 
> earlier reply)

why would you want to do that? whats the use case? shouldn't 
`TestErasureCodingHDFSCliDriver` encapsulate all EC-related tests? plus I'm not 
sure how you could run any EC commands against a local filesystem, wouldn't 
they all be no-ops?


- Sahil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67023/#review203040
---


On May 14, 2018, 9:44 p.m., Andrew Sherman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67023/
> ---
> 
> (Updated May 14, 2018, 9:44 p.m.)
> 
> 
> Review request for hive and Sahil Takiar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> TestErasureCodingHDFSCliDriver uses a test-only CommandProcessor 
> "ErasureProcessor"
> which allows .q files to contain Erasure Coding commands similar to those 
> provided
> by the hdfs ec command
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html.
> The Erasure Coding functionality is exposed through a new shim 
> "HdfsFileErasureCodingPolicy".
> At this stage there are two .q files:
> erasure_commands.q (a simple test to show ERASURE commands can run on local fs 
> via
> TestCliDriver or on hdfs via TestErasureCodingHDFSCliDriver), and
> erasure_simple.q (which does some trivial queries to demonstrate basic 
> functionality).
> More tests will come in future commits.
> 
> 
> Diffs
> -
> 
>   
> itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestErasureCodingHDFSCliDriver.java
>  PRE-CREATION 
>   itests/src/test/resources/testconfiguration.properties 
> cf6d19a5937c3f4a82e4ffe09201af8a79da2e3d 
>   
> itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java 
> 1814f0fa190e0ebcf327aebcdaf6f9967a5fd14f 
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 
> 16571b3ff3288dfb99fbca570452592cc1650f9a 
>   
> ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessorFactory.java 
> 3d47991b603c94c8da2106e67192c8513ef783a7 
>   ql/src/java/org/apache/hadoop/hive/ql/processors/ErasureProcessor.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/processors/HiveCommand.java 
> 56c7516ecfaf2421b0f3d3a188d05f38715b25b2 
>   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 
> 89129f99fe63f0384aff965ad665770d11e9af04 
>   
> ql/src/test/org/apache/hadoop/hive/ql/processors/TestCommandProcessorFactory.java
>  de43c2866f64e2ed5c74eab450de28f1a79248dc 
>   ql/src/test/queries/clientpositive/erasure_commands.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/erasure_simple.q PRE-CREATION 
>   ql/src/test/results/clientpositive/erasure_commands.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/erasurecoding/erasure_commands.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/erasurecoding/erasure_simple.q.out 
> PRE-CREATION 
>   shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java 
> ec06a88dc21346473bec6589c703167d50e3b367 
>   shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 
> b89081761780bf1f305d0196bb94bb0b54f7184f 
>   testutils/ptest2/conf/deployed/master-mr2.properties 
> 7edc307f85744d60d322ad8087164625677fc230 
> 
> 
> Diff: https://reviews.apache.org/r/67023/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Sherman
> 
>



Re: Review Request 67023: HIVE-18117: Add a new Test Driver "TestErasureCodingHDFSCliDriver" that can be used to run tests over hdfs directories that employ Erasure Coding.

2018-05-14 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67023/#review203040
---




itests/src/test/resources/testconfiguration.properties
Line 1693 (original), 1693 (patched)
<https://reviews.apache.org/r/67023/#comment285103>

why would we want the ec commands to work outside the 
`TestErasureCodingHDFSCliDriver`?



ql/src/test/queries/clientpositive/erasure_commands.q
Lines 2 (patched)
<https://reviews.apache.org/r/67023/#comment285102>

why would we want to run this on the local fs?


- Sahil Takiar


On May 11, 2018, 11:38 p.m., Andrew Sherman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67023/
> ---
> 
> (Updated May 11, 2018, 11:38 p.m.)
> 
> 
> Review request for hive and Sahil Takiar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> TestErasureCodingHDFSCliDriver uses a test-only CommandProcessor 
> "ErasureProcessor"
> which allows .q files to contain Erasure Coding commands similar to those 
> provided
> by the hdfs ec command
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html.
> The Erasure Coding functionality is exposed through a new shim 
> "HdfsFileErasureCodingPolicy".
> At this stage there are two .q files:
> erasure_commands.q (a simple test to show ERASURE commands can run on local fs 
> via
> TestCliDriver or on hdfs via TestErasureCodingHDFSCliDriver), and
> erasure_simple.q (which does some trivial queries to demonstrate basic 
> functionality).
> More tests will come in future commits.
> 
> 
> Diffs
> -
> 
>   
> itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestErasureCodingHDFSCliDriver.java
>  PRE-CREATION 
>   itests/src/test/resources/testconfiguration.properties 
> cf6d19a5937c3f4a82e4ffe09201af8a79da2e3d 
>   
> itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java 
> 6628336807b06cab49063673be0d8e9c5b5a7101 
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 
> 750fc69c5f210ca8f7bfe81b82ee9e3001fc07ba 
>   
> ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessorFactory.java 
> 3d47991b603c94c8da2106e67192c8513ef783a7 
>   ql/src/java/org/apache/hadoop/hive/ql/processors/ErasureProcessor.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/processors/HiveCommand.java 
> 56c7516ecfaf2421b0f3d3a188d05f38715b25b2 
>   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 
> 9f65a771f95a7c0bd3fdb4e56e47c0fc70235850 
>   
> ql/src/test/org/apache/hadoop/hive/ql/processors/TestCommandProcessorFactory.java
>  de43c2866f64e2ed5c74eab450de28f1a79248dc 
>   ql/src/test/queries/clientpositive/erasure_commands.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/erasure_simple.q PRE-CREATION 
>   ql/src/test/results/clientpositive/erasure_commands.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/erasurecoding/erasure_commands.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/erasurecoding/erasure_simple.q.out 
> PRE-CREATION 
>   shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java 
> ec06a88dc21346473bec6589c703167d50e3b367 
>   shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 
> b89081761780bf1f305d0196bb94bb0b54f7184f 
>   testutils/ptest2/conf/deployed/master-mr2.properties 
> 7edc307f85744d60d322ad8087164625677fc230 
> 
> 
> Diff: https://reviews.apache.org/r/67023/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrew Sherman
> 
>



Re: Review Request 67023: HIVE-18117: Add a new Test Driver "TestErasureCodingHDFSCliDriver" that can be used to run tests over hdfs directories that employ Erasure Coding.

2018-05-14 Thread Sahil Takiar


> On May 10, 2018, noon, Sahil Takiar wrote:
> > itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java
> > Lines 674 (patched)
> > <https://reviews.apache.org/r/67023/diff/1/?file=2018287#file2018287line674>
> >
> > is this necessary since you set the cluster type to mr above?
> 
> Andrew Sherman wrote:
> Ha, good question. Yes, it is necessary, as setClusterType() does not always 
> set the cluster type :-( - it allows the cluster type to be overridden with 
> -Dclustermode=xxx

interesting, should we handle other cluster types like Spark or MR too?


> On May 10, 2018, noon, Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/processors/ErasureProcessor.java
> > Lines 89 (patched)
> > <https://reviews.apache.org/r/67023/diff/1/?file=2018290#file2018290line89>
> >
> > an `echo` command seems like a useful feature for all q-tests, does it 
> > need to be EC specific?
> 
> Andrew Sherman wrote:
> I put this in as I want it. I did think about putting it somewhere else 
> but there is no generic test-only CommandProcessor. Do you see an obvious 
> place to put it?

if its not easy to add for others, then just leave it


> On May 10, 2018, noon, Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
> > Lines 545 (patched)
> > <https://reviews.apache.org/r/67023/diff/1/?file=2018292#file2018292line545>
> >
> > whats the cache for? can q-tests even specify custom URIs? whats the 
> > use case for supporting multiple fs URIs?
> 
> Andrew Sherman wrote:
> OK I admit I copied this code from the way hdfsEncryptionShims works 
> without fully understanding it.

can we delete it then? i didn't realize this would require modifying code 
outside of itests, so I think we should make any changes to core Hive as 
minimal as possible


> On May 10, 2018, noon, Sahil Takiar wrote:
> > shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java
> > Lines 699 (patched)
> > <https://reviews.apache.org/r/67023/diff/1/?file=2018300#file2018300line699>
> >
> > whats the use case for this dummy class? so we can run ec tests on 
> > hadoop versions that don't support ec? wouldn't be just disable the 
> > clidriver entirely for versions that don't support ec?
> 
> Andrew Sherman wrote:
> I'm imagining wanting to run a test on both EC and non-EC

i thought the `NoopHdfsErasureCodingShim` was used when "the hadoop version 
does not support hdfs Erasure Coding". u can still run on EC and non-EC folders 
without this, right?


> On May 10, 2018, noon, Sahil Takiar wrote:
> > shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java
> > Lines 741 (patched)
> > <https://reviews.apache.org/r/67023/diff/1/?file=2018300#file2018300line741>
> >
> > since we have a cache anyway, would it make more sense to just remove 
> > this and make it a loading cache?
> 
> Andrew Sherman wrote:
> I don't know, maybe. There is some advantage in keeping the same 
> structure as the HdfsEncryptionShim.

see comment above about removing the cache


> On May 10, 2018, noon, Sahil Takiar wrote:
> > testutils/ptest2/conf/deployed/master-mr2.properties
> > Line 75 (original), 75 (patched)
> > <https://reviews.apache.org/r/67023/diff/1/?file=2018301#file2018301line75>
> >
> > have you manually deployed these changes to the ptest server? this file 
> > is just a copy of whats already been deployed, so its just for reference
> > 
> > also, why skip batching?
> 
> Andrew Sherman wrote:
> OK I have no idea what ut.itests.qtest.skipBatching means I just copied 
> TestEncryptedHDFSCliDriver :-(
> 
> As for deploying these changes, I don't know what that means, my new 
> tests did appear to run. can you explain more?

anytime you add a new cli driver, you have to manually modify a file on the 
ptest master server, you have to modify the file 
`/usr/local/hiveptest/profiles/master-mr2.properties` you probably don't have 
permissions though, so let me know the final diff for the this file and I can 
deploy it.

can we check if batching can be used for this? i think batching means that 
q-tests get bundled together into a "batch" of q-tests that are run in a single 
`mvn test` command


- Sahil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67023/#review202831
-------


On May 11, 2018, 11:38 p.m., Andrew Sherman wrote:
> 
> 

Re: Review Request 66290: HIVE-14388 : Add number of rows inserted message after insert command in Beeline

2018-05-14 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66290/#review203035
---



I think you are safe to use the Java 8 specific method rather than copying it.

- Sahil Takiar


On May 12, 2018, 12:30 a.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66290/
> ---
> 
> (Updated May 12, 2018, 12:30 a.m.)
> 
> 
> Review request for hive, Sahil Takiar and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-14388
> https://issues.apache.org/jira/browse/HIVE-14388
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently, when you run an insert command in Beeline, it returns a message 
> saying "No rows affected .."
> A better and more intuitive message would be "xxx rows inserted (26.068 seconds)"
> 
> Added the numRows parameter as part of QueryState.
> Adding the numRows to the response as well to display in beeline.
> 
> Getting the count in FileSinkOperator and setting it in statsMap, when it 
> operates only on table specific rows for the particular operation. (so that 
> we can get only the insert to table count and avoid counting non-table 
> specific file-sink operations happening during query execution).
> 
> 
> Diffs
> -
> 
>   beeline/src/main/resources/BeeLine.properties 
> c41b3ed637e04d8d2d9800ad5e9284264f7e4055 
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 
> b217259553be472863cd33bb2259aa700e6c3528 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 
> 06542cee02e5dc4696f2621bb45cc4f24c67dfda 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
> 52799b30c39af2f192c4ae22ce7d68b403014183 
>   ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java 
> cf9c2273159c0d779ea90ad029613678fb0967a6 
>   ql/src/java/org/apache/hadoop/hive/ql/QueryState.java 
> 706c9ffa48b9c3b4a6fdaae78bab1d39c3d0efda 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 
> 01a5b4c9c328cb034a613a1539cea2584e122fb4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
> fcdc9967f12a454a9d3f31031e2261f264479118 
>   ql/src/test/results/clientpositive/llap/dp_counter_mm.q.out 
> 18f4c69a191bde3cae2d5efac5ef20fd0b1a9f0c 
>   ql/src/test/results/clientpositive/llap/dp_counter_non_mm.q.out 
> 28f376f8c4c2151383286e754447d1349050ef4e 
>   ql/src/test/results/clientpositive/llap/orc_ppd_basic.q.out 
> 96819f4e1c446f6de423f99c7697d548ff5dbe06 
>   ql/src/test/results/clientpositive/llap/tez_input_counters.q.out 
> d2fcdaa1bfba03e1f0e4191c8d056b05f334443d 
>   service-rpc/if/TCLIService.thrift 30f8af7f3e6e0598b410498782900ac27971aef0 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 
> 4321ad6d3c966d30f7a69552f91804cf2f1ba6c4 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 
> b2b62c71492b844f4439367364c5c81aa62f3908 
>   
> service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetOperationStatusResp.java
>  15e8220eb3eb12b72c7b64029410dced33bc0d72 
>   service-rpc/src/gen/thrift/gen-php/Types.php 
> abb7c1ff3a2c8b72dc97689758266b675880e32b 
>   service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py 
> 0f8fd0745be0f4ed9e96b7bbe0f092d03649bcdf 
>   service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 
> 60183dae9e9927bd09a9676e49eeb4aea2401737 
>   service/src/java/org/apache/hive/service/cli/CLIService.java 
> c9914ba9bf8653cbcbca7d6612e98a64058c0fcc 
>   service/src/java/org/apache/hive/service/cli/OperationStatus.java 
> 52cc3ae4f26b990b3e4edb52d9de85b3cc25f269 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 3706c72abc77ac8bd77947cc1c5d084ddf965e9f 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> c64c99120ad21ee98af81ec6659a2722e3e1d1c7 
> 
> 
> Diff: https://reviews.apache.org/r/66290/diff/9/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



[jira] [Created] (HIVE-19525) Spark task logs print PLAN PATH excessive number of times

2018-05-14 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19525:
---

 Summary: Spark task logs print PLAN PATH excessive number of times
 Key: HIVE-19525
 URL: https://issues.apache.org/jira/browse/HIVE-19525
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar


A ton of logs with this {{Utilities - PLAN PATH = 
hdfs://localhost:59527/.../apache-hive/itests/qtest-spark/target/tmp/scratchdir/stakiar/6ebceb49-7a76-4159-9082-5bba44391e30/hive_2018-05-14_07-28-44_672_8205774950452575544-1/-mr-10006/bf14c0b5-a014-4ee8-8ddf-fdb7453eb0f0/map.xml}}

Seems it prints multiple times per task execution; not sure where it is coming 
from, but it's too verbose. It should be changed to DEBUG level. Furthermore, 
given that we are using {{Utilities#getBaseWork}} anytime we need to access a 
{{MapWork}} or {{ReduceWork}} object, we should make the method slightly more 
efficient. Right now it borrows a {{Kryo}} from a pool and does a bunch of 
stuff to set the classloader, then it checks the cache to see if the work 
object has already been created. It should check the cache before doing any of 
that.
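The cache-before-deserialization ordering suggested above can be sketched as follows; `PlanCache`, `BaseWork`, and `deserializePlan` are simplified stand-ins for Hive's `Utilities#getBaseWork` internals, not the real code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the "check the cache first" pattern suggested
// above; the class and method names are placeholders, not Hive's actual
// internals.
public class PlanCache {
    static class BaseWork { }  // stand-in for MapWork/ReduceWork

    private final Map<String, BaseWork> cache = new ConcurrentHashMap<>();
    private int expensiveSetups = 0;  // counts Kryo/classloader setups

    public BaseWork getBaseWork(String planPath) {
        // Cheap lookup first: return a cached plan without touching Kryo.
        BaseWork cached = cache.get(planPath);
        if (cached != null) {
            return cached;
        }
        // Only on a miss do we pay for the expensive deserialization.
        BaseWork work = deserializePlan(planPath);
        cache.put(planPath, work);
        return work;
    }

    private BaseWork deserializePlan(String planPath) {
        expensiveSetups++;  // real code: borrow Kryo, set classloader, read plan
        return new BaseWork();
    }

    public int getExpensiveSetups() { return expensiveSetups; }

    public static void main(String[] args) {
        PlanCache pc = new PlanCache();
        pc.getBaseWork("map.xml");
        pc.getBaseWork("map.xml");  // second call is a cache hit
        System.out.println(pc.getExpensiveSetups());  // prints 1
    }
}
```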





Re: May 2018 Hive User Group Meeting

2018-05-14 Thread Sahil Takiar
Hello,

Yes, the meetup was recorded. We are in the process of getting it uploaded
to YouTube. Once it's publicly available, I will send out the link on this
email thread.

Thanks

--Sahil

On Mon, May 14, 2018 at 6:04 AM, <roberto.tar...@stratebi.com> wrote:

> Hi,
>
>
>
> If you have recorded the meeting, please share the link. I could not follow it
> online because of the schedule (I live in Spain).
>
>
>
> Kind Regards,
>
>
>
>
>
> *From:* Luis Figueroa [mailto:lef...@outlook.com]
> *Sent:* miércoles, 9 de mayo de 2018 18:01
> *To:* u...@hive.apache.org
> *Cc:* dev@hive.apache.org
> *Subject:* Re: May 2018 Hive User Group Meeting
>
>
>
> Hey everyone,
>
>
>
> Was the meeting recorded by any chance?
>
> Luis
>
>
> On May 8, 2018, at 5:31 PM, Sahil Takiar <takiar.sa...@gmail.com> wrote:
>
> Hey Everyone,
>
>
>
> Almost time for the meetup! The live stream can be viewed on this link:
> https://live.lifesizecloud.com/extension/2000992219?
> token=067078ac-a8df-45bc-b84c-4b371ecbc719==en&
> meeting=Hive%20User%20Group%20Meetup
>
> The stream won't be live until the meetup starts.
>
> For those attending in person, there will be guest wifi:
>
> Login: HiveMeetup
> Password: ClouderaHive
>
>
>
> On Mon, May 7, 2018 at 12:48 PM, Sahil Takiar <takiar.sa...@gmail.com>
> wrote:
>
> Hey Everyone,
>
>
>
> The meetup is only a day away! Here
> <https://docs.google.com/document/d/1v8iERias-LOq8-q4BrCNSUsOarCwOAYRwXYIF0DK5AU/edit?usp=sharing>
> is a link to all the abstracts we have compiled thus far. Several of you
> have asked about event streaming and recordings. The meetup will be both
> streamed live and recorded. We will post the links on this thread and on
> the meetup link tomorrow closer to the start of the meetup.
>
>
>
> The meetup will be at Cloudera HQ - 395 Page Mill Rd
> <https://maps.google.com/?q=395+Page+Mill+Rd=gmail=g>. If
> you have any trouble getting into the building, feel free to post on the
> meetup link.
>
>
>
> Meetup Link: https://www.meetup.com/Hive-User-Group-Meeting/
> events/249641278/
>
>
>
> On Wed, May 2, 2018 at 7:48 AM, Sahil Takiar <takiar.sa...@gmail.com>
> wrote:
>
> Hey Everyone,
>
>
>
> The agenda for the meetup has been set and I'm excited to say we have lots
> of interesting talks scheduled! Below is final agenda, the full list of
> abstracts will be sent out soon. If you are planning to attend, please RSVP
> on the meetup link so we can get an accurate headcount of attendees (
> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/).
>
>
> 6:30 - 7:00 PM Networking and Refreshments
>
> 7:00PM - 8:20 PM Lightning Talks (10 min each) - 8 talks total
>
> · What's new in Hive 3.0.0 - Ashutosh Chauhan
>
> · Hive-on-Spark at Uber: Efficiency & Scale - Xuefu Zhang
>
> · Hive-on-S3 Performance: Past, Present, and Future - Sahil Takiar
>
> · Dali: Data Access Layer at LinkedIn - Adwait Tumbde
>
> · Parquet Vectorization in Hive - Vihang Karajgaonkar
>
> · ORC Column Level Encryption - Owen O’Malley
>
> · Running Hive at Scale @ Lyft - Sharanya Santhanam, Rohit Menon
>
> · Materialized Views in Hive - Jesus Camacho Rodriguez
>
> 8:30 PM - 9:00 PM Hive Metastore Panel
>
> · Moderator: Vihang Karajgaonkar
>
> · Participants:
>
> oDaniel Dai - Hive Metastore Caching
>
> oAlan Gates - Hive Metastore Separation
>
> oRituparna Agrawal - Customer Use Cases & Pain Points of (Big)
> Metadata
>
> The Metastore panel will consist of a short presentation by each panelist
> followed by a Q session driven by the moderator.
>
>
>
> On Tue, Apr 24, 2018 at 2:53 PM, Sahil Takiar <takiar.sa...@gmail.com>
> wrote:
>
> We still have a few slots open for lightening talks, so if anyone is
> interested in giving a presentation don't hesitate to reach out!
>
>
>
> If you are planning to attend the meetup, please RSVP on the Meetup link (
> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/) so that
> we can get an accurate headcount for food.
>
>
>
> Thanks!
>
>
>
> --Sahil
>
>
>
> On Wed, Apr 11, 2018 at 5:08 PM, Sahil Takiar <takiar.sa...@gmail.com>
> wrote:
>
> Hi all,
>
> I'm happy to announce that the Hive community is organizing a Hive user
> group meeting in the Bay Area next month. The details can be found at
> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/
>
>
> The format of this meetup will be slightly different from previou

[jira] [Created] (HIVE-19515) TestRpc.testServerPort is consistently failing

2018-05-13 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19515:
---

 Summary: TestRpc.testServerPort is consistently failing
 Key: HIVE-19515
 URL: https://issues.apache.org/jira/browse/HIVE-19515
 Project: Hive
  Issue Type: Test
  Components: Spark
Reporter: Sahil Takiar
Assignee: Sahil Takiar


{{TestRpc.testServerPort}} is consistently failing due to HIVE-17838



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19508) SparkJobMonitor getReport doesn't print stage progress in order

2018-05-11 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19508:
---

 Summary: SparkJobMonitor getReport doesn't print stage progress in 
order
 Key: HIVE-19508
 URL: https://issues.apache.org/jira/browse/HIVE-19508
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar


You can end up with a progress output like this:

{code}
Stage-10_0: 0/29Stage-11_0: 0/44Stage-12_0: 0/11
Stage-13_0: 0/1 Stage-8_0: 258(+76)/468 Stage-9_0: 0/165
{code}
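A hedged sketch of one possible fix (illustrative only, not the actual SparkJobMonitor patch): sort the stage ids numerically before rendering the report, so that Stage-8 prints before Stage-10 instead of the lexicographic order shown above. The `Stage-N_A` id format is taken from the sample output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

public class StageOrderSketch {
    // Order stage ids by their numeric stage number rather than as strings,
    // so "Stage-8_0" sorts before "Stage-10_0".
    static List<String> ordered(Collection<String> stageIds) {
        List<String> out = new ArrayList<>(stageIds);
        out.sort(Comparator.comparingInt(
            s -> Integer.parseInt(s.substring("Stage-".length(), s.indexOf('_')))));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ordered(Arrays.asList(
            "Stage-10_0", "Stage-13_0", "Stage-8_0", "Stage-9_0")));
    }
}
```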



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 67073: HIVE-19370 : Retain time part in add_months function on timestamp datatype fields in hive

2018-05-11 Thread Sahil Takiar


> On May 11, 2018, 3:22 p.m., Sahil Takiar wrote:
> > common/src/java/org/apache/hive/common/util/DateUtils.java
> > Lines 39 (patched)
> > <https://reviews.apache.org/r/67073/diff/1/?file=2019551#file2019551line39>
> >
> > why does this need to be `ThreadLocal`? there doesn't seem to be 
> > anything thread specific to this object
> 
> Bharathkrishna Guruvayoor Murali wrote:
> The class DateUtils is mentioned to be thread-safe class (in javadoc 
> comment). and the existing date format is created as a threadlocal.
> So I thought this one too should be a threadlocal to keep it thread-safe.

make sense
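For readers following along, a minimal standalone sketch of the pattern under discussion (not Hive's actual DateUtils): `SimpleDateFormat` is not thread-safe, so giving each thread its own instance via `ThreadLocal` is what preserves the class's documented thread-safety. The UTC zone here is only to make the demo deterministic.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DateUtilsSketch {
    // Each thread lazily gets its own SimpleDateFormat instance, since the
    // class itself is not safe for concurrent use.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> {
            SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            f.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone for the demo
            return f;
        });

    public static String format(Date d) {
        return FORMAT.get().format(d);
    }

    public static void main(String[] args) {
        System.out.println(format(new Date(0L))); // the epoch, in UTC
    }
}
```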


- Sahil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67073/#review202942
---


On May 10, 2018, 9:55 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67073/
> ---
> 
> (Updated May 10, 2018, 9:55 p.m.)
> 
> 
> Review request for hive, Peter Vary, Sahil Takiar, and Vihang Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Adding support to retain the time part (HH:mm:ss) for add_months UDF when the 
> input is given as timestamp format.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hive/common/util/DateUtils.java 
> 65f3b9401916abdfa52fbf75d115ba6b61758fb0 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAddMonths.java 
> dae4b97b4a17e98122431e5fda655fd9f873fdb5 
>   
> ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFAddMonths.java
>  af9b6c43c7dafc69c4944eab02894786af306f35 
> 
> 
> Diff: https://reviews.apache.org/r/67073/diff/1/
> 
> 
> Testing
> ---
> 
> Added unit tests.
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 67073: HIVE-19370 : Retain time part in add_months function on timestamp datatype fields in hive

2018-05-11 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67073/#review202942
---




common/src/java/org/apache/hive/common/util/DateUtils.java
Lines 39 (patched)
<https://reviews.apache.org/r/67073/#comment285001>

why does this need to be `ThreadLocal`? there doesn't seem to be anything 
thread specific to this object



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAddMonths.java
Lines 49-50 (original), 51-52 (patched)
<https://reviews.apache.org/r/67073/#comment285002>

Perhaps we should change the docs too.


- Sahil Takiar


On May 10, 2018, 9:55 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67073/
> ---
> 
> (Updated May 10, 2018, 9:55 p.m.)
> 
> 
> Review request for hive, Peter Vary, Sahil Takiar, and Vihang Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Adding support to retain the time part (HH:mm:ss) for add_months UDF when the 
> input is given as timestamp format.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hive/common/util/DateUtils.java 
> 65f3b9401916abdfa52fbf75d115ba6b61758fb0 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAddMonths.java 
> dae4b97b4a17e98122431e5fda655fd9f873fdb5 
>   
> ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFAddMonths.java
>  af9b6c43c7dafc69c4944eab02894786af306f35 
> 
> 
> Diff: https://reviews.apache.org/r/67073/diff/1/
> 
> 
> Testing
> ---
> 
> Added unit tests.
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 66290: HIVE-14388 : Add number of rows inserted message after insert command in Beeline

2018-05-10 Thread Sahil Takiar


> On May 7, 2018, 6:07 p.m., Bharathkrishna Guruvayoor Murali wrote:
> > Added new version of patch.
> > Adding the result as "Unknown rows affected" for return value -1 from 
> > beeline.
> > Fixing test failures, and modifying tests to accommodate the change.
> > Further changes in this version are:
> >   - Using the waitForOperationToComplete method itself in 
> > HiveStatement#getUpdateCount, because in executeAsync mode it fails 
> > otherwise.
> >   - I converted the while loop to do-while in 
> > HiveStatement#waitForOperationToComplete, because otherwise some cases the 
> > response is never initialized.

why are the changes to the while loop in `HiveStatement` necessary?
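The initialization hazard described in the quoted reply can be shown in isolation (illustrative stand-in code, not the real HiveStatement): if the operation is already complete when the loop is entered, a plain `while` never runs its body and the response stays null, whereas a `do-while` fetches it at least once.

```java
public class PollSketch {
    // With a plain while loop, a pre-completed operation skips the body
    // and the response is never fetched.
    static String pollWhile(boolean alreadyComplete) {
        String status = null;
        boolean done = alreadyComplete;
        while (!done) {
            status = "FINISHED"; // stand-in for fetching the RPC response
            done = true;
        }
        return status; // null when alreadyComplete == true
    }

    // A do-while always fetches the response at least once.
    static String pollDoWhile(boolean alreadyComplete) {
        String status;
        boolean done = alreadyComplete;
        do {
            status = "FINISHED";
            done = true;
        } while (!done);
        return status; // always initialized
    }

    public static void main(String[] args) {
        System.out.println(pollWhile(true) + " vs " + pollDoWhile(true));
    }
}
```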


- Sahil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66290/#review202561
---


On May 9, 2018, 8:46 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66290/
> -------
> 
> (Updated May 9, 2018, 8:46 p.m.)
> 
> 
> Review request for hive, Sahil Takiar and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-14388
> https://issues.apache.org/jira/browse/HIVE-14388
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently, when you run insert command on beeline, it returns a message 
> saying "No rows affected .."
> A better and more intuitive msg would be "xxx rows inserted (26.068 seconds)"
> 
> Added the numRows parameter as part of QueryState.
> Adding the numRows to the response as well to display in beeline.
> 
> Getting the count in FileSinkOperator and setting it in statsMap, when it 
> operates only on table specific rows for the particular operation. (so that 
> we can get only the insert to table count and avoid counting non-table 
> specific file-sink operations happening during query execution).
> 
> 
> Diffs
> -
> 
>   beeline/src/main/resources/BeeLine.properties 
> c41b3ed637e04d8d2d9800ad5e9284264f7e4055 
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 
> b217259553be472863cd33bb2259aa700e6c3528 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 
> 06542cee02e5dc4696f2621bb45cc4f24c67dfda 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
> 52799b30c39af2f192c4ae22ce7d68b403014183 
>   ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java 
> cf9c2273159c0d779ea90ad029613678fb0967a6 
>   ql/src/java/org/apache/hadoop/hive/ql/QueryState.java 
> 706c9ffa48b9c3b4a6fdaae78bab1d39c3d0efda 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 
> 01a5b4c9c328cb034a613a1539cea2584e122fb4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
> fcdc9967f12a454a9d3f31031e2261f264479118 
>   ql/src/test/results/clientpositive/llap/dp_counter_mm.q.out 
> 18f4c69a191bde3cae2d5efac5ef20fd0b1a9f0c 
>   ql/src/test/results/clientpositive/llap/dp_counter_non_mm.q.out 
> 28f376f8c4c2151383286e754447d1349050ef4e 
>   ql/src/test/results/clientpositive/llap/orc_ppd_basic.q.out 
> 96819f4e1c446f6de423f99c7697d548ff5dbe06 
>   ql/src/test/results/clientpositive/llap/tez_input_counters.q.out 
> d2fcdaa1bfba03e1f0e4191c8d056b05f334443d 
>   service-rpc/if/TCLIService.thrift 30f8af7f3e6e0598b410498782900ac27971aef0 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 
> 4321ad6d3c966d30f7a69552f91804cf2f1ba6c4 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 
> b2b62c71492b844f4439367364c5c81aa62f3908 
>   
> service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetOperationStatusResp.java
>  15e8220eb3eb12b72c7b64029410dced33bc0d72 
>   service-rpc/src/gen/thrift/gen-php/Types.php 
> abb7c1ff3a2c8b72dc97689758266b675880e32b 
>   service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py 
> 0f8fd0745be0f4ed9e96b7bbe0f092d03649bcdf 
>   service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 
> 60183dae9e9927bd09a9676e49eeb4aea2401737 
>   service/src/java/org/apache/hive/service/cli/CLIService.java 
> c9914ba9bf8653cbcbca7d6612e98a64058c0fcc 
>   service/src/java/org/apache/hive/service/cli/OperationStatus.java 
> 52cc3ae4f26b990b3e4edb52d9de85b3cc25f269 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 3706c72abc77ac8bd77947cc1c5d084ddf965e9f 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> c64c99120ad21ee98af81ec6659a2722e3e1d1c7 
> 
> 
> Diff: https://reviews.apache.org/r/66290/diff/7/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: Review Request 66290: HIVE-14388 : Add number of rows inserted message after insert command in Beeline

2018-05-10 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66290/#review202835
---




ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
Lines 134 (patched)
<https://reviews.apache.org/r/66290/#comment284856>

remove `transient`; `static` fields can't be serialized anyway



service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java
Lines 23 (patched)
<https://reviews.apache.org/r/66290/#comment284857>

don't think these imports need to be re-ordered?


- Sahil Takiar


On May 9, 2018, 8:46 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66290/
> ---
> 
> (Updated May 9, 2018, 8:46 p.m.)
> 
> 
> Review request for hive, Sahil Takiar and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-14388
> https://issues.apache.org/jira/browse/HIVE-14388
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently, when you run insert command on beeline, it returns a message 
> saying "No rows affected .."
> A better and more intuitive msg would be "xxx rows inserted (26.068 seconds)"
> 
> Added the numRows parameter as part of QueryState.
> Adding the numRows to the response as well to display in beeline.
> 
> Getting the count in FileSinkOperator and setting it in statsMap, when it 
> operates only on table specific rows for the particular operation. (so that 
> we can get only the insert to table count and avoid counting non-table 
> specific file-sink operations happening during query execution).
> 
> 
> Diffs
> -
> 
>   beeline/src/main/resources/BeeLine.properties 
> c41b3ed637e04d8d2d9800ad5e9284264f7e4055 
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 
> b217259553be472863cd33bb2259aa700e6c3528 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 
> 06542cee02e5dc4696f2621bb45cc4f24c67dfda 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
> 52799b30c39af2f192c4ae22ce7d68b403014183 
>   ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java 
> cf9c2273159c0d779ea90ad029613678fb0967a6 
>   ql/src/java/org/apache/hadoop/hive/ql/QueryState.java 
> 706c9ffa48b9c3b4a6fdaae78bab1d39c3d0efda 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 
> 01a5b4c9c328cb034a613a1539cea2584e122fb4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
> fcdc9967f12a454a9d3f31031e2261f264479118 
>   ql/src/test/results/clientpositive/llap/dp_counter_mm.q.out 
> 18f4c69a191bde3cae2d5efac5ef20fd0b1a9f0c 
>   ql/src/test/results/clientpositive/llap/dp_counter_non_mm.q.out 
> 28f376f8c4c2151383286e754447d1349050ef4e 
>   ql/src/test/results/clientpositive/llap/orc_ppd_basic.q.out 
> 96819f4e1c446f6de423f99c7697d548ff5dbe06 
>   ql/src/test/results/clientpositive/llap/tez_input_counters.q.out 
> d2fcdaa1bfba03e1f0e4191c8d056b05f334443d 
>   service-rpc/if/TCLIService.thrift 30f8af7f3e6e0598b410498782900ac27971aef0 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 
> 4321ad6d3c966d30f7a69552f91804cf2f1ba6c4 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 
> b2b62c71492b844f4439367364c5c81aa62f3908 
>   
> service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetOperationStatusResp.java
>  15e8220eb3eb12b72c7b64029410dced33bc0d72 
>   service-rpc/src/gen/thrift/gen-php/Types.php 
> abb7c1ff3a2c8b72dc97689758266b675880e32b 
>   service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py 
> 0f8fd0745be0f4ed9e96b7bbe0f092d03649bcdf 
>   service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 
> 60183dae9e9927bd09a9676e49eeb4aea2401737 
>   service/src/java/org/apache/hive/service/cli/CLIService.java 
> c9914ba9bf8653cbcbca7d6612e98a64058c0fcc 
>   service/src/java/org/apache/hive/service/cli/OperationStatus.java 
> 52cc3ae4f26b990b3e4edb52d9de85b3cc25f269 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 3706c72abc77ac8bd77947cc1c5d084ddf965e9f 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> c64c99120ad21ee98af81ec6659a2722e3e1d1c7 
> 
> 
> Diff: https://reviews.apache.org/r/66290/diff/7/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



Re: May 2018 Hive User Group Meeting

2018-05-08 Thread Sahil Takiar
Hey Everyone,

Almost time for the meetup! The live stream can be viewed on this link:
https://live.lifesizecloud.com/extension/2000992219?token=067078ac-a8df-45bc-b84c-4b371ecbc719==en=Hive%20User%20Group%20Meetup

The stream won't be live until the meetup starts.

For those attending in person, there will be guest wifi:

Login: HiveMeetup
Password: ClouderaHive

On Mon, May 7, 2018 at 12:48 PM, Sahil Takiar <takiar.sa...@gmail.com>
wrote:

> Hey Everyone,
>
> The meetup is only a day away! Here
> <https://docs.google.com/document/d/1v8iERias-LOq8-q4BrCNSUsOarCwOAYRwXYIF0DK5AU/edit?usp=sharing>
> is a link to all the abstracts we have compiled thus far. Several of you
> have asked about event streaming and recordings. The meetup will be both
> streamed live and recorded. We will post the links on this thread and on
> the meetup link tomorrow closer to the start of the meetup.
>
> The meetup will be at Cloudera HQ - 395 Page Mill Rd. If you have any
> trouble getting into the building, feel free to post on the meetup link.
>
> Meetup Link: https://www.meetup.com/Hive-User-Group-Meeting/
> events/249641278/
>
> On Wed, May 2, 2018 at 7:48 AM, Sahil Takiar <takiar.sa...@gmail.com>
> wrote:
>
>> Hey Everyone,
>>
>> The agenda for the meetup has been set and I'm excited to say we have
>> lots of interesting talks scheduled! Below is final agenda, the full list
>> of abstracts will be sent out soon. If you are planning to attend, please
>> RSVP on the meetup link so we can get an accurate headcount of attendees (
>> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/).
>>
>> 6:30 - 7:00 PM Networking and Refreshments
>> 7:00PM - 8:20 PM Lightning Talks (10 min each) - 8 talks total
>>
>>- What's new in Hive 3.0.0 - Ashutosh Chauhan
>>- Hive-on-Spark at Uber: Efficiency & Scale - Xuefu Zhang
>>- Hive-on-S3 Performance: Past, Present, and Future - Sahil Takiar
>>- Dali: Data Access Layer at LinkedIn - Adwait Tumbde
>>- Parquet Vectorization in Hive - Vihang Karajgaonkar
>>- ORC Column Level Encryption - Owen O’Malley
>>- Running Hive at Scale @ Lyft - Sharanya Santhanam, Rohit Menon
>>- Materialized Views in Hive - Jesus Camacho Rodriguez
>>
>> 8:30 PM - 9:00 PM Hive Metastore Panel
>>
>>- Moderator: Vihang Karajgaonkar
>>- Participants:
>>   - Daniel Dai - Hive Metastore Caching
>>   - Alan Gates - Hive Metastore Separation
>>   - Rituparna Agrawal - Customer Use Cases & Pain Points of (Big)
>>   Metadata
>>
>> The Metastore panel will consist of a short presentation by each panelist
>> followed by a Q&A session driven by the moderator.
>>
>> On Tue, Apr 24, 2018 at 2:53 PM, Sahil Takiar <takiar.sa...@gmail.com>
>> wrote:
>>
>>> We still have a few slots open for lightning talks, so if anyone is
>>> interested in giving a presentation don't hesitate to reach out!
>>>
>>> If you are planning to attend the meetup, please RSVP on the Meetup link
>>> (https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/) so
>>> that we can get an accurate headcount for food.
>>>
>>> Thanks!
>>>
>>> --Sahil
>>>
>>> On Wed, Apr 11, 2018 at 5:08 PM, Sahil Takiar <takiar.sa...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm happy to announce that the Hive community is organizing a Hive user
>>>> group meeting in the Bay Area next month. The details can be found at
>>>> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/
>>>>
>>>> The format of this meetup will be slightly different from previous
>>>> ones. There will be one hour dedicated to lightning talks, followed by a
>>>> group discussion on the future of the Hive Metastore.
>>>>
>>>> We are inviting talk proposals from Hive users as well as developers at
>>>> this time. Please contact either myself (takiar.sa...@gmail.com),
>>>> Vihang Karajgaonkar (vih...@cloudera.com), or Peter Vary (
>>>> pv...@cloudera.com) with proposals. We currently have 5 openings.
>>>>
>>>> Please let me know if you have any questions or suggestions.
>>>>
>>>> Thanks,
>>>> Sahil
>>>>
>>>
>>>
>>>
>>> --
>>> Sahil Takiar
>>> Software Engineer
>>> takiar.sa...@gmail.com | (510) 673-0309
>>>
>>
>>
>>
>> --
>> Sahil Takiar
>> Software Engineer
>> takiar.sa...@gmail.com | (510) 673-0309
>>
>
>
>
> --
> Sahil Takiar
> Software Engineer
> takiar.sa...@gmail.com | (510) 673-0309
>



-- 
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309


Re: Integrating Yetus with Precommit job

2018-05-07 Thread Sahil Takiar
The FindBugs plugin for Yetus is now working. Yetus will give a -1 if it
finds any FindBugs warning in your patch. It gives a 0 for any patch
applied to a module that contains existing FindBugs warnings (e.g. ql has
2318 existing FindBugs issues).

On Mon, Nov 27, 2017 at 8:57 AM, Andrew Sherman <asher...@cloudera.com>
wrote:

> Thanks, this is going to be useful
>
> On Wed, Nov 22, 2017 at 11:28 AM, Vineet Garg <vg...@hortonworks.com>
> wrote:
>
> > Thanks Adam!
> >
> > > On Nov 22, 2017, at 5:46 AM, Adam Szita <sz...@cloudera.com> wrote:
> > >
> > > This is now done. Patch is committed and we deployed the new war file
> to
> > > the ptest server.
> > >
> > > Jobs that were waiting in queue at the time of ptest server restart
> have
> > > been retriggered in Jenkins.
> > >
> > > I hope this change will contribute to the overall code quality of Hive
> in
> > > our future patches to come :)
> > >
> > > On 21 November 2017 at 17:39, Adam Szita <sz...@cloudera.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> In the last days all prerequisites have been resolved for this:
> > >> -ASF headers are fixed
> > >> -checkstyle is upgraded to support Java8
> > >> -proper checkstyle configuration has been introduced to poms that are
> > >> disconnected from Hive's root pom
> > >>
> > >> Thanks Alan for reviewing these.
> > >>
> > >> Therefore we plan to move ahead with this tomorrow around 10:00AM CET,
> > do
> > >> the commit with Peter Vary and replace the war file among ptest
> servers
> > >> Tomcat webapps.
> > >>
> > >> Thanks,
> > >> Adam
> > >>
> > >> On 7 November 2017 at 18:42, Alan Gates <alanfga...@gmail.com> wrote:
> > >>
> > >>> I’ve put some feedback in HIVE-17995.  17996 and 17997 look good.
> I’ll
> > >>> commit them once the tests run.
> > >>>
> > >>> I think you’ll need to do similar patches for storage-api, as it is
> > also
> > >>> not connected to the hive pom anymore.
> > >>>
> > >>> Alan.
> > >>>
> > >>> On Tue, Nov 7, 2017 at 6:17 AM, Adam Szita <sz...@cloudera.com>
> wrote:
> > >>>
> > >>>> Thanks for all the replies.
> > >>>>
> > >>>> Vihang: Good idea on making everything green before turning this on.
> > For
> > >>>> this purpose I've filed a couple of jiras:
> > >>>> -HIVE-17995 <https://issues.apache.org/jira/browse/HIVE-17995> Run
> > >>>> checkstyle on standalone-metastore module with proper configuration
> > >>>> -HIVE-17996 <https://issues.apache.org/jira/browse/HIVE-17996> Fix
> > ASF
> > >>>> headers
> > >>>> -HIVE-17997 <https://issues.apache.org/jira/browse/HIVE-17997> Add
> > rat
> > >>>> plugin and configuration to standalone metastore pom
> > >>>>
> > >>>> Sahil: there is an umbrella jira (HIVE-13503
> > >>>> <https://issues.apache.org/jira/browse/HIVE-13503>) for test
> > >>> improvements,
> > >>>> the Yetus integration itself is also a subtask of it. I think any
> > >>> further
> > >>>> improvements on what Yetus features we want to enable should go here
> > >>> too.
> > >>>>
> > >>>> Adam
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> > >>
> >
> >
>



-- 
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309


Re: May 2018 Hive User Group Meeting

2018-05-07 Thread Sahil Takiar
Hey Everyone,

The meetup is only a day away! Here
<https://docs.google.com/document/d/1v8iERias-LOq8-q4BrCNSUsOarCwOAYRwXYIF0DK5AU/edit?usp=sharing>
is a link to all the abstracts we have compiled thus far. Several of you
have asked about event streaming and recordings. The meetup will be both
streamed live and recorded. We will post the links on this thread and on
the meetup link tomorrow closer to the start of the meetup.

The meetup will be at Cloudera HQ - 395 Page Mill Rd. If you have any
trouble getting into the building, feel free to post on the meetup link.

Meetup Link:
https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/

On Wed, May 2, 2018 at 7:48 AM, Sahil Takiar <takiar.sa...@gmail.com> wrote:

> Hey Everyone,
>
> The agenda for the meetup has been set and I'm excited to say we have lots
> of interesting talks scheduled! Below is final agenda, the full list of
> abstracts will be sent out soon. If you are planning to attend, please RSVP
> on the meetup link so we can get an accurate headcount of attendees (
> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/).
>
> 6:30 - 7:00 PM Networking and Refreshments
> 7:00PM - 8:20 PM Lightning Talks (10 min each) - 8 talks total
>
>- What's new in Hive 3.0.0 - Ashutosh Chauhan
>- Hive-on-Spark at Uber: Efficiency & Scale - Xuefu Zhang
>- Hive-on-S3 Performance: Past, Present, and Future - Sahil Takiar
>- Dali: Data Access Layer at LinkedIn - Adwait Tumbde
>- Parquet Vectorization in Hive - Vihang Karajgaonkar
>- ORC Column Level Encryption - Owen O’Malley
>- Running Hive at Scale @ Lyft - Sharanya Santhanam, Rohit Menon
>- Materialized Views in Hive - Jesus Camacho Rodriguez
>
> 8:30 PM - 9:00 PM Hive Metastore Panel
>
>- Moderator: Vihang Karajgaonkar
>- Participants:
>   - Daniel Dai - Hive Metastore Caching
>   - Alan Gates - Hive Metastore Separation
>   - Rituparna Agrawal - Customer Use Cases & Pain Points of (Big)
>   Metadata
>
> The Metastore panel will consist of a short presentation by each panelist
> followed by a Q&A session driven by the moderator.
>
> On Tue, Apr 24, 2018 at 2:53 PM, Sahil Takiar <takiar.sa...@gmail.com>
> wrote:
>
>> We still have a few slots open for lightning talks, so if anyone is
>> interested in giving a presentation don't hesitate to reach out!
>>
>> If you are planning to attend the meetup, please RSVP on the Meetup link (
>> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/) so
>> that we can get an accurate headcount for food.
>>
>> Thanks!
>>
>> --Sahil
>>
>> On Wed, Apr 11, 2018 at 5:08 PM, Sahil Takiar <takiar.sa...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I'm happy to announce that the Hive community is organizing a Hive user
>>> group meeting in the Bay Area next month. The details can be found at
>>> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/
>>>
>>> The format of this meetup will be slightly different from previous ones.
>>> There will be one hour dedicated to lightning talks, followed by a group
>>> discussion on the future of the Hive Metastore.
>>>
>>> We are inviting talk proposals from Hive users as well as developers at
>>> this time. Please contact either myself (takiar.sa...@gmail.com),
>>> Vihang Karajgaonkar (vih...@cloudera.com), or Peter Vary (
>>> pv...@cloudera.com) with proposals. We currently have 5 openings.
>>>
>>> Please let me know if you have any questions or suggestions.
>>>
>>> Thanks,
>>> Sahil
>>>
>>
>>
>>
>> --
>> Sahil Takiar
>> Software Engineer
>> takiar.sa...@gmail.com | (510) 673-0309
>>
>
>
>
> --
> Sahil Takiar
> Software Engineer
> takiar.sa...@gmail.com | (510) 673-0309
>



-- 
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309


Re: PTest Maintenance

2018-05-06 Thread Sahil Takiar
Maintenance has been completed. There should be no missed Hive QA runs.

On Sun, May 6, 2018 at 5:25 PM, Sahil Takiar <takiar.sa...@gmail.com> wrote:

> Will be performing some maintenance on PTest this evening to
> deploy HIVE-19212. If there are any issues, I will ping the individual
> JIRAs affected.
>
> --Sahil
>



-- 
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309


PTest Maintenance

2018-05-06 Thread Sahil Takiar
Will be performing some maintenance on PTest this evening to
deploy HIVE-19212. If there are any issues, I will ping the individual
JIRAs affected.

--Sahil


[jira] [Created] (HIVE-19422) Create Docker env for running HoS locally

2018-05-04 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19422:
---

 Summary: Create Docker env for running HoS locally
 Key: HIVE-19422
 URL: https://issues.apache.org/jira/browse/HIVE-19422
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Sahil Takiar


It's really hard to run HoS on a locally installed distribution of Hive built 
using {{mvn package}}. The only way developers can really run HoS is via the 
Spark CLI Drivers. However, there are occasions where devs need to run HoS on a 
proper Hive distribution in order to validate some behavior.

The docker image will also be useful to users who want to play around with HoS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19405) AddPartitionDesc should intern its fields

2018-05-03 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19405:
---

 Summary: AddPartitionDesc should intern its fields
 Key: HIVE-19405
 URL: https://issues.apache.org/jira/browse/HIVE-19405
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Sahil Takiar
Assignee: Sahil Takiar


A lot of heap is wasted on duplicate strings because we accumulate tons of 
{{AddPartitionDesc}} objects during operations such as msck.
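The proposed interning works because {{java.lang.String.intern()}} collapses equal runtime strings into one canonical pooled instance; a minimal demonstration (the partition-style value is just an example):

```java
public class InternSketch {
    public static void main(String[] args) {
        // Two equal strings built at runtime are distinct heap objects...
        String a = new StringBuilder("part=").append(2018).toString();
        String b = new StringBuilder("part=").append(2018).toString();
        System.out.println(a == b);                   // distinct objects
        // ...while interned copies share one canonical instance.
        System.out.println(a.intern() == b.intern()); // shared instance
    }
}
```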



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: May 2018 Hive User Group Meeting

2018-05-02 Thread Sahil Takiar
Hey Everyone,

The agenda for the meetup has been set and I'm excited to say we have lots
of interesting talks scheduled! Below is final agenda, the full list of
abstracts will be sent out soon. If you are planning to attend, please RSVP
on the meetup link so we can get an accurate headcount of attendees (
https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/).

6:30 - 7:00 PM Networking and Refreshments
7:00PM - 8:20 PM Lightning Talks (10 min each) - 8 talks total

   - What's new in Hive 3.0.0 - Ashutosh Chauhan
   - Hive-on-Spark at Uber: Efficiency & Scale - Xuefu Zhang
   - Hive-on-S3 Performance: Past, Present, and Future - Sahil Takiar
   - Dali: Data Access Layer at LinkedIn - Adwait Tumbde
   - Parquet Vectorization in Hive - Vihang Karajgaonkar
   - ORC Column Level Encryption - Owen O’Malley
   - Running Hive at Scale @ Lyft - Sharanya Santhanam, Rohit Menon
   - Materialized Views in Hive - Jesus Camacho Rodriguez

8:30 PM - 9:00 PM Hive Metastore Panel

   - Moderator: Vihang Karajgaonkar
   - Participants:
  - Daniel Dai - Hive Metastore Caching
  - Alan Gates - Hive Metastore Separation
  - Rituparna Agrawal - Customer Use Cases & Pain Points of (Big)
  Metadata

The Metastore panel will consist of a short presentation by each panelist
followed by a Q&A session driven by the moderator.

On Tue, Apr 24, 2018 at 2:53 PM, Sahil Takiar <takiar.sa...@gmail.com>
wrote:

> We still have a few slots open for lightning talks, so if anyone is
> interested in giving a presentation don't hesitate to reach out!
>
> If you are planning to attend the meetup, please RSVP on the Meetup link (
> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/) so that
> we can get an accurate headcount for food.
>
> Thanks!
>
> --Sahil
>
> On Wed, Apr 11, 2018 at 5:08 PM, Sahil Takiar <takiar.sa...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I'm happy to announce that the Hive community is organizing a Hive user
>> group meeting in the Bay Area next month. The details can be found at
>> https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/
>>
>> The format of this meetup will be slightly different from previous ones.
>> There will be one hour dedicated to lightning talks, followed by a group
>> discussion on the future of the Hive Metastore.
>>
>> We are inviting talk proposals from Hive users as well as developers at
>> this time. Please contact either myself (takiar.sa...@gmail.com), Vihang
>> Karajgaonkar (vih...@cloudera.com), or Peter Vary (pv...@cloudera.com)
>> with proposals. We currently have 5 openings.
>>
>> Please let me know if you have any questions or suggestions.
>>
>> Thanks,
>> Sahil
>>
>
>
>
> --
> Sahil Takiar
> Software Engineer
> takiar.sa...@gmail.com | (510) 673-0309
>



-- 
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309


Re: Review Request 66290: HIVE-14388 : Add number of rows inserted message after insert command in Beeline

2018-05-01 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66290/#review202193
---


Fix it, then Ship it!




One minor comment, otherwise LGTM.


jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java
Lines 713 (patched)
<https://reviews.apache.org/r/66290/#comment283931>

Should we modify beeline so if it gets a value of `-1` it says something 
like "Unknown number of rows affected", or just don't print anything at all. 
Right now it would print "No rows affected..." if an overflow happens, which 
would be odd.
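A hedged sketch of the behavior suggested above (a hypothetical helper, not the actual beeline patch): treat a `-1` update count — JDBC's "no result / unknown" value, which here can also signal overflow — specially instead of falling through to "No rows affected".

```java
public class RowCountMessageSketch {
    // Hypothetical formatter for beeline's post-query summary line.
    static String message(long numRows, double seconds) {
        if (numRows < 0) {
            // -1 means the count is unknown (e.g. long overflow upstream).
            return String.format("Unknown number of rows affected (%.3f seconds)", seconds);
        }
        return String.format("%d rows affected (%.3f seconds)", numRows, seconds);
    }

    public static void main(String[] args) {
        System.out.println(message(-1, 26.068));
        System.out.println(message(5, 26.068));
    }
}
```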


- Sahil Takiar


On April 27, 2018, 11:05 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66290/
> ---
> 
> (Updated April 27, 2018, 11:05 p.m.)
> 
> 
> Review request for hive, Sahil Takiar and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-14388
> https://issues.apache.org/jira/browse/HIVE-14388
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently, when you run insert command on beeline, it returns a message 
> saying "No rows affected .."
> A better and more intuitive msg would be "xxx rows inserted (26.068 seconds)"
> 
> Added the numRows parameter as part of QueryState.
> Adding the numRows to the response as well to display in beeline.
> 
> Getting the count in FileSinkOperator and setting it in statsMap, when it 
> operates only on table specific rows for the particular operation. (so that 
> we can get only the insert to table count and avoid counting non-table 
> specific file-sink operations happening during query execution).
> 
> 
> Diffs
> -
> 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 
> 06542cee02e5dc4696f2621bb45cc4f24c67dfda 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
> 41ad002abf3d2a6969ef0d1d48f7db22e096bb47 
>   ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java 
> cf9c2273159c0d779ea90ad029613678fb0967a6 
>   ql/src/java/org/apache/hadoop/hive/ql/QueryState.java 
> 706c9ffa48b9c3b4a6fdaae78bab1d39c3d0efda 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 
> c084fa054cb771bfdb033d244935713e3c7eb874 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
> fcdc9967f12a454a9d3f31031e2261f264479118 
>   service-rpc/if/TCLIService.thrift 30f8af7f3e6e0598b410498782900ac27971aef0 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.h 
> 4321ad6d3c966d30f7a69552f91804cf2f1ba6c4 
>   service-rpc/src/gen/thrift/gen-cpp/TCLIService_types.cpp 
> b2b62c71492b844f4439367364c5c81aa62f3908 
>   
> service-rpc/src/gen/thrift/gen-javabean/org/apache/hive/service/rpc/thrift/TGetOperationStatusResp.java
>  15e8220eb3eb12b72c7b64029410dced33bc0d72 
>   service-rpc/src/gen/thrift/gen-php/Types.php 
> abb7c1ff3a2c8b72dc97689758266b675880e32b 
>   service-rpc/src/gen/thrift/gen-py/TCLIService/ttypes.py 
> 0f8fd0745be0f4ed9e96b7bbe0f092d03649bcdf 
>   service-rpc/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 
> 60183dae9e9927bd09a9676e49eeb4aea2401737 
>   service/src/java/org/apache/hive/service/cli/CLIService.java 
> c9914ba9bf8653cbcbca7d6612e98a64058c0fcc 
>   service/src/java/org/apache/hive/service/cli/OperationStatus.java 
> 52cc3ae4f26b990b3e4edb52d9de85b3cc25f269 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 3706c72abc77ac8bd77947cc1c5d084ddf965e9f 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> c64c99120ad21ee98af81ec6659a2722e3e1d1c7 
> 
> 
> Diff: https://reviews.apache.org/r/66290/diff/5/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>


