[jira] [Updated] (HIVE-10474) LLAP: investigate why TPCH Q1 1k is slow

2015-12-17 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-10474:

Component/s: llap

> LLAP: investigate why TPCH Q1 1k is slow
> 
>
> Key: HIVE-10474
> URL: https://issues.apache.org/jira/browse/HIVE-10474
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>    Reporter: Sergey Shelukhin
> Attachments: llap-gc-pauses.png
>
>
> While most queries run faster in LLAP than just Tez with container reuse, 
> TPCH Q1 is much slower.
> On my run, on tez with container reuse (current default LLAP configuration 
> but mode == container and no daemons running)  runs 2-6 (out of 6 consecutive 
> runs in the same session) finished in 25.5sec average; with 16 LLAP daemons 
> in default config the average was 35.5sec; same w/o IO elevator (to rule out 
> its impact) it took 59.7sec w/strange distribution (later runs were slower 
> than earlier runs, still, fastest run was 49.5sec).
> So excluding IO elevator it's more than 2x degradation.
> We need to figure out why this is happening. Is it just slot discrepancy? 
> Regardless, this needs to be addressed.
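For reference, the ratios implied by the timings quoted above can be worked out directly. This is only an arithmetic sketch: the class and method names are made up for illustration, and the inputs are the averages reported in the description, not new measurements. Against the 25.5sec Tez baseline, the default LLAP config comes out around 1.4x, the no-IO-elevator average around 2.3x, and the fastest no-IO-elevator run around 1.9x.

```java
import java.util.Locale;

public class SlowdownRatio {
    // Ratio of a candidate timing to a baseline timing (both in seconds).
    // Values above 1.0 mean the candidate is slower than the baseline.
    public static double slowdown(double baselineSec, double candidateSec) {
        if (baselineSec <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return candidateSec / baselineSec;
    }

    public static void main(String[] args) {
        // Timings copied from the issue description.
        double tezContainer = 25.5;       // average of runs 2-6, Tez with container reuse
        double llapDefault = 35.5;        // average with 16 LLAP daemons, default config
        double llapNoElevator = 59.7;     // average with daemons, IO elevator off
        double llapNoElevatorBest = 49.5; // fastest run, IO elevator off

        System.out.printf(Locale.ROOT, "LLAP default:          %.2fx%n",
                slowdown(tezContainer, llapDefault));
        System.out.printf(Locale.ROOT, "LLAP w/o IO elevator:  %.2fx%n",
                slowdown(tezContainer, llapNoElevator));
        System.out.printf(Locale.ROOT, "LLAP w/o IO, best run: %.2fx%n",
                slowdown(tezContainer, llapNoElevatorBest));
    }
}
```

Note that the "more than 2x" figure holds for the average without the IO elevator; the fastest such run sits just under 2x.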



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10474) LLAP: investigate why TPCH Q1 1k is slow

2015-04-24 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10474:
---
Attachment: llap-gc-pauses.png

I restored the HADOOP-11772 fix on the cluster and re-ran this.

The GC pressure has gone way up since I tested this last - 20-25 full 
collections every minute.

!llap-gc-pauses.png!

Something has changed recently that made the tenured generation huge - the 
daemon slows down the longer you keep using it. This looks like a recent perf 
regression.
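One way to put a number on a collection rate like the one above is to sample the JVM's standard GarbageCollectorMXBean counters twice and scale the difference to a per-minute figure. The following is a minimal sketch, not how the attached chart was produced: the class name, helper names, and the 10-second interval are assumptions, it samples the JVM it runs in (so it would have to run inside the daemon process or be adapted to a remote JMX connection), and it counts all collectors rather than only full collections.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Locale;

public class GcRateSampler {
    // Sum of collection counts across all collectors in this JVM.
    // getCollectionCount() returns -1 when the count is undefined, so skip those.
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Scale the number of collections seen in an interval to collections per minute.
    public static double perMinute(long before, long after, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return (after - before) * 60_000.0 / elapsedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMillis = 10_000; // hypothetical sampling interval
        long before = totalCollections();
        Thread.sleep(intervalMillis);
        long after = totalCollections();
        System.out.printf(Locale.ROOT, "collections/min: %.1f%n",
                perMinute(before, after, intervalMillis));
    }
}
```

To isolate full collections, the loop could filter on `gc.getName()`, but the bean names depend on which collector the JVM is configured with, so that filter is left out here.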

> LLAP: investigate why TPCH Q1 1k is slow
> 
>
> Key: HIVE-10474
> URL: https://issues.apache.org/jira/browse/HIVE-10474
> Project: Hive
>  Issue Type: Sub-task
>    Reporter: Sergey Shelukhin
> Attachments: llap-gc-pauses.png
>
>
> While most queries run faster in LLAP than just Tez with container reuse, 
> TPCH Q1 is much slower.
> On my run, on tez with container reuse (current default LLAP configuration 
> but mode == container and no daemons running)  runs 2-6 (out of 6 consecutive 
> runs in the same session) finished in 25.5sec average; with 16 LLAP daemons 
> in default config the average was 35.5sec; same w/o IO elevator (to rule out 
> its impact) it took 59.7sec w/strange distribution (later runs were slower 
> than earlier runs, still, fastest run was 49.5sec).
> So excluding IO elevator it's more than 2x degradation.
> We need to figure out why this is happening. Is it just slot discrepancy? 
> Regardless, this needs to be addressed.





[jira] [Updated] (HIVE-10474) LLAP: investigate why TPCH Q1 1k is slow

2015-04-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10474:

Description: 
While most queries run faster in LLAP than just Tez with container reuse, TPCH 
Q1 is much slower.
On my run, on tez with container reuse (current default LLAP configuration but 
mode == container and no daemons running)  runs 2-6 (out of 6 consecutive runs 
in the same session) finished in 25.5sec average; with 16 LLAP daemons in 
default config the average was 35.5sec; same w/o IO elevator (to rule out its 
impact) it took 59.7sec w/strange distribution (later runs were slower than 
earlier runs, still, fastest run was 49.5sec).

So excluding IO elevator it's more than 2x degradation.

We need to figure out why this is happening. Is it just slot discrepancy? 
Regardless, this needs to be addressed.

  was:
While most queries run faster in LLAP than just Tez with container reuse, TPCH 
Q1 is much slower.
On my run, tez with container reuse (current default LLAP configuration but 
mode == container and no daemons running)  runs 2-6 (out of 6 consecutive runs 
in the same session) finished in 25.5sec average; with 16 LLAP daemons in 
default config the average was 35.5sec; same w/o IO elevator (to rule out its 
impact) it took 59.7sec w/strange distribution (later runs were slower than 
earlier runs, still, fastest run was 49.5sec).

So excluding IO elevator it's more than 2x degradation.

We need to figure out why this is happening. Is it just slot discrepancy? 
Regardless, this needs to be addressed.


> LLAP: investigate why TPCH Q1 1k is slow
> 
>
> Key: HIVE-10474
> URL: https://issues.apache.org/jira/browse/HIVE-10474
> Project: Hive
>  Issue Type: Sub-task
>    Reporter: Sergey Shelukhin
>
> While most queries run faster in LLAP than just Tez with container reuse, 
> TPCH Q1 is much slower.
> On my run, on tez with container reuse (current default LLAP configuration 
> but mode == container and no daemons running)  runs 2-6 (out of 6 consecutive 
> runs in the same session) finished in 25.5sec average; with 16 LLAP daemons 
> in default config the average was 35.5sec; same w/o IO elevator (to rule out 
> its impact) it took 59.7sec w/strange distribution (later runs were slower 
> than earlier runs, still, fastest run was 49.5sec).
> So excluding IO elevator it's more than 2x degradation.
> We need to figure out why this is happening. Is it just slot discrepancy? 
> Regardless, this needs to be addressed.





[jira] [Updated] (HIVE-10474) LLAP: investigate why TPCH Q1 1k is slow

2015-04-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10474:

Description: 
While most queries run faster in LLAP than just Tez with container reuse, TPCH 
Q1 is much slower.
On my run, tez with container reuse (current default LLAP configuration but 
mode == container and no daemons running)  runs 2-6 (out of 6 consecutive runs 
in the same session) finished in 25.5sec average; with 16 LLAP daemons in 
default config the average was 35.5sec; same w/o IO elevator (to rule out its 
impact) it took 59.7sec w/strange distribution (later runs were slower than 
earlier runs, still, fastest run was 49.5sec).

So excluding IO elevator it's more than 2x degradation.

We need to figure out why this is happening. Is it just slot discrepancy? 
Regardless, this needs to be addressed.

  was:
While most queries run faster in LLAP than just Tez with container reuse, TPCH 
Q1 is much slower.
On my run, tez with container reuse (current default LLAP configuration but 
mode == container and no daemons running)  run 2-6 (out of 6) finished in 
25.5sec average; with 16 LLAP daemons in default config it finished in 35.5sec; 
w/the daemons w/o IO elevator (to rule out its impact) it took 59.7sec 
w/strange distribution (later runs were slower than earlier runs, still, 
fastest run was 49.5sec).

We need to figure out why this is happening. Is it just slot discrepancy? 
Regardless, this needs to be addressed.


> LLAP: investigate why TPCH Q1 1k is slow
> 
>
> Key: HIVE-10474
> URL: https://issues.apache.org/jira/browse/HIVE-10474
> Project: Hive
>  Issue Type: Sub-task
>    Reporter: Sergey Shelukhin
>
> While most queries run faster in LLAP than just Tez with container reuse, 
> TPCH Q1 is much slower.
> On my run, tez with container reuse (current default LLAP configuration but 
> mode == container and no daemons running)  runs 2-6 (out of 6 consecutive 
> runs in the same session) finished in 25.5sec average; with 16 LLAP daemons 
> in default config the average was 35.5sec; same w/o IO elevator (to rule out 
> its impact) it took 59.7sec w/strange distribution (later runs were slower 
> than earlier runs, still, fastest run was 49.5sec).
> So excluding IO elevator it's more than 2x degradation.
> We need to figure out why this is happening. Is it just slot discrepancy? 
> Regardless, this needs to be addressed.





[jira] [Updated] (HIVE-10474) LLAP: investigate why TPCH Q1 1k is slow

2015-04-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10474:

Description: 
While most queries run faster in LLAP than just Tez with container reuse, TPCH 
Q1 is much slower.
On my run, tez with container reuse (current default LLAP configuration but 
mode == container and no daemons running)  run 2-6 (out of 6) finished in 
25.5sec average; with 16 LLAP daemons in default config it finished in 35.5sec; 
w/the daemons w/o IO elevator (to rule out its impact) it took 59.7sec 
w/strange distribution (later runs were slower than earlier runs, still, 
fastest run was 49.5sec).

We need to figure out why this is happening. Is it just slot discrepancy? 
Regardless, this needs to be addressed.

  was:
While most queries run faster in LLAP than just Tez with container reuse, TPCH 
Q1 is much slower.
On my run, tez with container reuse (current default LLAP configuration but 
mode == container and no daemons running) finished in 25.5sec average; with 16 
LLAP daemons in default config it finished in 35.5sec; w/the daemons w/o IO 
elevator (to rule out its impact) it took 59.7sec w/strange distribution (later 
runs were slower than earlier runs, still, fastest run was 49.5sec).

We need to figure out why this is happening. Is it just slot discrepancy? 
Regardless, this needs to be addressed.


> LLAP: investigate why TPCH Q1 1k is slow
> 
>
> Key: HIVE-10474
> URL: https://issues.apache.org/jira/browse/HIVE-10474
> Project: Hive
>  Issue Type: Sub-task
>    Reporter: Sergey Shelukhin
>
> While most queries run faster in LLAP than just Tez with container reuse, 
> TPCH Q1 is much slower.
> On my run, tez with container reuse (current default LLAP configuration but 
> mode == container and no daemons running)  run 2-6 (out of 6) finished in 
> 25.5sec average; with 16 LLAP daemons in default config it finished in 
> 35.5sec; w/the daemons w/o IO elevator (to rule out its impact) it took 
> 59.7sec w/strange distribution (later runs were slower than earlier runs, 
> still, fastest run was 49.5sec).
> We need to figure out why this is happening. Is it just slot discrepancy? 
> Regardless, this needs to be addressed.


