[jira] [Updated] (HIVE-10474) LLAP: investigate why TPCH Q1 1k is slow
[ https://issues.apache.org/jira/browse/HIVE-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Damien Carol updated HIVE-10474:
Component/s: llap

> LLAP: investigate why TPCH Q1 1k is slow
>
> Key: HIVE-10474
> URL: https://issues.apache.org/jira/browse/HIVE-10474
> Project: Hive
> Issue Type: Sub-task
> Components: llap
> Reporter: Sergey Shelukhin
> Attachments: llap-gc-pauses.png
>
> While most queries run faster in LLAP than in plain Tez with container reuse,
> TPCH Q1 is much slower.
> On my run, on Tez with container reuse (the current default LLAP configuration,
> but with mode == container and no daemons running), runs 2-6 (out of 6
> consecutive runs in the same session) finished in 25.5sec on average; with 16
> LLAP daemons in the default config, the average was 35.5sec; with the same
> setup but without the IO elevator (to rule out its impact), it took 59.7sec,
> with a strange distribution (later runs were slower than earlier runs; the
> fastest run was still 49.5sec).
> So, excluding the IO elevator, it's a more than 2x degradation.
> We need to figure out why this is happening. Is it just a slot discrepancy?
> Regardless, this needs to be addressed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
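To make the "more than 2x" claim concrete, here is a small Java sketch that turns the averages reported in the description into slowdown ratios against the Tez baseline. The class and method names (`Q1Slowdown`, `warmAverage`, `slowdown`) are hypothetical. `warmAverage` only illustrates the "runs 2-6 out of 6" methodology (drop the first, warm-up run and average the rest); the per-run timings themselves are not in the issue, so `main` works from the reported averages only.

```java
import java.util.Arrays;
import java.util.Locale;

public class Q1Slowdown {

    // Mean of runs 2..n: run 1 is discarded as warm-up, mirroring the
    // "runs 2-6 (out of 6)" methodology described in the issue.
    static double warmAverage(double[] runSeconds) {
        if (runSeconds.length < 2) {
            throw new IllegalArgumentException("need at least two runs");
        }
        return Arrays.stream(runSeconds, 1, runSeconds.length).average().getAsDouble();
    }

    // How many times slower the candidate setup is than the baseline.
    static double slowdown(double candidateSeconds, double baselineSeconds) {
        return candidateSeconds / baselineSeconds;
    }

    public static void main(String[] args) {
        // Figures as reported in HIVE-10474 (seconds).
        double tezContainerReuse = 25.5;       // Tez, container reuse, avg of runs 2-6
        double llapDefault = 35.5;             // 16 LLAP daemons, default config
        double llapNoIoElevator = 59.7;        // same, IO elevator disabled (average)
        double llapNoIoElevatorFastest = 49.5; // same, fastest single run

        System.out.printf(Locale.ROOT, "LLAP (default) vs Tez:          %.2fx%n",
                slowdown(llapDefault, tezContainerReuse));
        System.out.printf(Locale.ROOT, "LLAP (no IO elevator) vs Tez:   %.2fx%n",
                slowdown(llapNoIoElevator, tezContainerReuse));
        System.out.printf(Locale.ROOT, "fastest no-elevator run vs Tez: %.2fx%n",
                slowdown(llapNoIoElevatorFastest, tezContainerReuse));
    }
}
```

This prints ratios of 1.39x, 2.34x and 1.94x. The "more than 2x" figure holds for the no-elevator average, while even the fastest no-elevator run was nearly 2x slower than Tez.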
[jira] [Updated] (HIVE-10474) LLAP: investigate why TPCH Q1 1k is slow
Gopal V updated HIVE-10474:
Attachment: llap-gc-pauses.png

I restored the HADOOP-11772 fix on the cluster and re-ran this. GC pressure has gone way up since I last tested this: 20-25 full collections every minute.

[image: llap-gc-pauses.png]

Something has changed recently that made the tenured generation huge; the daemon slows down the longer you keep using it. This looks like a recent performance regression.
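A figure like "20-25 full collections every minute" can be checked from inside a JVM with the standard `java.lang.management` API. The sketch below is a minimal, hypothetical monitor (`OldGenGcRate` is not part of Hive or LLAP): it samples old-generation collection counts before and after a time window and scales the difference to a per-minute rate. The mapping of collector names to "old generation" is an assumption covering common HotSpot collectors and is only approximate; for example, with CMS it counts concurrent cycles rather than only stop-the-world full GCs. For an already-running daemon, the JDK's `jstat -gcutil <pid>` reports comparable counts externally (FGC column).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Locale;

public class OldGenGcRate {

    // Assumed mapping of HotSpot collector names to old-generation collectors.
    // With CMS this counts concurrent cycles, not only stop-the-world full GCs.
    static boolean isOldGenCollector(String name) {
        return name.equals("PS MarkSweep")
                || name.equals("ConcurrentMarkSweep")
                || name.equals("MarkSweepCompact")
                || name.equals("G1 Old Generation");
    }

    // Total old-generation collections so far in this JVM; a count of -1
    // means "undefined" for that collector and is skipped.
    static long oldGenCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0 && isOldGenCollector(gc.getName())) {
                total += count;
            }
        }
        return total;
    }

    // Scale a collection-count delta over an elapsed window to a per-minute rate.
    static double perMinute(long collections, long elapsedMillis) {
        return collections * 60_000.0 / Math.max(1L, elapsedMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        long windowMillis = args.length > 0 ? Long.parseLong(args[0]) : 60_000L;
        long before = oldGenCollections();
        long start = System.nanoTime();
        Thread.sleep(windowMillis); // other threads in this JVM run the workload meanwhile
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        long after = oldGenCollections();
        System.out.printf(Locale.ROOT, "old-gen collections per minute: %.1f%n",
                perMinute(after - before, elapsedMillis));
    }
}
```

Counting collections rather than pause time keeps the sketch simple. `GarbageCollectorMXBean.getCollectionTime()` could be sampled the same way to estimate how much of each minute is spent in GC.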
[jira] [Updated] (HIVE-10474) LLAP: investigate why TPCH Q1 1k is slow
Sergey Shelukhin updated HIVE-10474:
Description:
While most queries run faster in LLAP than just Tez with container reuse, TPCH Q1 is much slower.
On my run, on tez with container reuse (current default LLAP configuration but mode == container and no daemons running) runs 2-6 (out of 6 consecutive runs in the same session) finished in 25.5sec average; with 16 LLAP daemons in default config the average was 35.5sec; same w/o IO elevator (to rule out its impact) it took 59.7sec w/strange distribution (later runs were slower than earlier runs, still, fastest run was 49.5sec).
So excluding IO elevator it's more than 2x degradation.
We need to figure out why this is happening. Is it just slot discrepancy?
Regardless, this needs to be addressed.

was:
While most queries run faster in LLAP than just Tez with container reuse, TPCH Q1 is much slower.
On my run, tez with container reuse (current default LLAP configuration but mode == container and no daemons running) runs 2-6 (out of 6 consecutive runs in the same session) finished in 25.5sec average; with 16 LLAP daemons in default config the average was 35.5sec; same w/o IO elevator (to rule out its impact) it took 59.7sec w/strange distribution (later runs were slower than earlier runs, still, fastest run was 49.5sec).
So excluding IO elevator it's more than 2x degradation.
We need to figure out why this is happening. Is it just slot discrepancy?
Regardless, this needs to be addressed.
[jira] [Updated] (HIVE-10474) LLAP: investigate why TPCH Q1 1k is slow
Sergey Shelukhin updated HIVE-10474:
Description:
While most queries run faster in LLAP than just Tez with container reuse, TPCH Q1 is much slower.
On my run, tez with container reuse (current default LLAP configuration but mode == container and no daemons running) runs 2-6 (out of 6 consecutive runs in the same session) finished in 25.5sec average; with 16 LLAP daemons in default config the average was 35.5sec; same w/o IO elevator (to rule out its impact) it took 59.7sec w/strange distribution (later runs were slower than earlier runs, still, fastest run was 49.5sec).
So excluding IO elevator it's more than 2x degradation.
We need to figure out why this is happening. Is it just slot discrepancy?
Regardless, this needs to be addressed.

was:
While most queries run faster in LLAP than just Tez with container reuse, TPCH Q1 is much slower.
On my run, tez with container reuse (current default LLAP configuration but mode == container and no daemons running) run 2-6 (out of 6) finished in 25.5sec average; with 16 LLAP daemons in default config it finished in 35.5sec; w/the daemons w/o IO elevator (to rule out its impact) it took 59.7sec w/strange distribution (later runs were slower than earlier runs, still, fastest run was 49.5sec).
We need to figure out why this is happening. Is it just slot discrepancy?
Regardless, this needs to be addressed.
[jira] [Updated] (HIVE-10474) LLAP: investigate why TPCH Q1 1k is slow
Sergey Shelukhin updated HIVE-10474:
Description:
While most queries run faster in LLAP than just Tez with container reuse, TPCH Q1 is much slower.
On my run, tez with container reuse (current default LLAP configuration but mode == container and no daemons running) run 2-6 (out of 6) finished in 25.5sec average; with 16 LLAP daemons in default config it finished in 35.5sec; w/the daemons w/o IO elevator (to rule out its impact) it took 59.7sec w/strange distribution (later runs were slower than earlier runs, still, fastest run was 49.5sec).
We need to figure out why this is happening. Is it just slot discrepancy?
Regardless, this needs to be addressed.

was:
While most queries run faster in LLAP than just Tez with container reuse, TPCH Q1 is much slower.
On my run, tez with container reuse (current default LLAP configuration but mode == container and no daemons running) finished in 25.5sec average; with 16 LLAP daemons in default config it finished in 35.5sec; w/the daemons w/o IO elevator (to rule out its impact) it took 59.7sec w/strange distribution (later runs were slower than earlier runs, still, fastest run was 49.5sec).
We need to figure out why this is happening. Is it just slot discrepancy?
Regardless, this needs to be addressed.