[jira] [Commented] (IMPALA-6070) Speed up test execution
[ https://issues.apache.org/jira/browse/IMPALA-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481444#comment-16481444 ] ASF subversion and git services commented on IMPALA-6070: - Commit 85ed7ae88bcec17ffd45b1dd66d07818cb1d55b0 in impala's branch refs/heads/master from [~philip] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=85ed7ae ] IMPALA-6070: Adding ASAN, --tail to test-with-docker. * Adds -ASAN suites to test-with-docker. * Adds --tail flag, which starts a tail subprocess. This isn't pretty (there's potential for overlap), but it's a dead simple way to keep an eye on what's going on. * Fixes a bug wherein I could call "docker rm " twice simultaneously, which would make Docker fail the second call and then fail the related "docker rmi". It's better to serialize, and I did that with a simple lock. Change-Id: I51451cdf1352fc0f9516d729b9a77700488d993f Reviewed-on: http://gerrit.cloudera.org:8080/10319 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins > Speed up test execution > --- > > Key: IMPALA-6070 > URL: https://issues.apache.org/jira/browse/IMPALA-6070 > Project: IMPALA > Issue Type: Bug >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger >Priority: Major > Attachments: screenshot-1.png > > > Our tests (e.g., > https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/buildTimeTrend) tend > to take about 4 hours. This can be improved. > I'm opening this JIRA to track those changes. I'm currently looking at: > * Parallelizing multiple data-load steps: TPC-DS, TPC-H, and Functional take > ~65 minutes when serialized. They take 35 minutes when run in parallel. > * Parallelizing compute stats: this takes ~10 minutes; it can probably be faster. > The trickier thing is parallelizing fe tests, ee tests, and custom cluster > tests. The approach I'm taking is to create a docker container with > everything in it (including data load), and then running tests in parallel. 
> This is a bit messier, but I think it has some legs when it comes to using > machines with many cores. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
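The "docker rm" race mentioned in the commit above can be avoided by serializing cleanup behind a single lock. A minimal sketch of that idea follows; the function and argument names are illustrative, not the actual test-with-docker code:

```python
import subprocess
import threading

# One lock shared by all cleanup threads, so "docker rm" and the related
# "docker rmi" never run concurrently for two suites.
_docker_lock = threading.Lock()

def remove_container(container_id, image_id, runner=subprocess.check_call):
    """Remove a container and then its image under a single lock.

    `runner` is injectable so the sketch can be exercised without Docker.
    """
    with _docker_lock:
        runner(["docker", "rm", container_id])
        runner(["docker", "rmi", image_id])
```

Because both commands run under the same lock, a second thread blocks until the first has finished both the `rm` and the `rmi`, which is exactly the serialization the commit message describes.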
[jira] [Commented] (IMPALA-7035) Impala HDFS Encryption tests failing after OpenJDK update
[ https://issues.apache.org/jira/browse/IMPALA-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481443#comment-16481443 ] ASF subversion and git services commented on IMPALA-7035: - Commit 24207bbdde98baed8e0b48aadc606dfe89ad3b0a in impala's branch refs/heads/2.x from [~philip] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=24207bb ] IMPALA-7035: Configure jceks.key.serialFilter for KMS. Configures a Java property for KMS to account for JDK 8u171's security fixes. I was seeing impala-py.test tests/metadata/test_hdfs_encryption.py fail with the following error: AssertionError: Error creating encryption zone: RemoteException: Can't recover key for testkey1 from keystore file:/home/impdev/Impala/testdata/cluster/cdh6/node-1/data/kms.keystore The issue is described in HDFS-13494, and I imagine it'll be fixed in due time. In the meantime, setting this property seems to do the trick. Change-Id: I2d21c9cce3b91e8fd8b2b4f1cda75e3958c977d5 Reviewed-on: http://gerrit.cloudera.org:8080/10418 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins Reviewed-on: http://gerrit.cloudera.org:8080/10446 > Impala HDFS Encryption tests failing after OpenJDK update > - > > Key: IMPALA-7035 > URL: https://issues.apache.org/jira/browse/IMPALA-7035 > Project: IMPALA > Issue Type: Task >Reporter: Philip Zeyliger >Assignee: Philip Zeyliger >Priority: Major > > I have seen {{impala-py.test tests/metadata/test_hdfs_encryption.py}} fail > with the following error: > {{E AssertionError: Error creating encryption zone: RemoteException: Can't > recover key for testkey1 from keystore > [file:/home/impdev/Impala/testdata/cluster/cdh6/node-1/data/kms.keystore|file:///home/impdev/Impala/testdata/cluster/cdh6/node-1/data/kms.keystore]}} > I believe what's going on is described in > https://issues.apache.org/jira/browse/HDFS-13494. In short, the JDK now has a > special whitelist for an API as a result of a security vulnerability. 
> A workaround in the KMS init script to configure $HADOOP_OPTS seems to do the > trick. >
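The $HADOOP_OPTS workaround amounts to appending a `-Djceks.key.serialFilter=...` Java property before starting KMS. A hedged sketch follows; the filter value is the one discussed in HDFS-13494 for the JavaKeyStoreProvider, and should be verified against your Hadoop version before use:

```python
import os

# Serialization filter for the jceks keystore, per HDFS-13494 (verify this
# value against your Hadoop version; it is an assumption here).
SERIAL_FILTER = (
    "java.lang.Enum;java.security.KeyRep;java.security.KeyRep$Type;"
    "javax.crypto.spec.SecretKeySpec;"
    "org.apache.hadoop.crypto.key.JavaKeyStoreProvider$KeyMetadata;!*"
)

def kms_env(base_env=None):
    """Return an environment dict with the serialFilter appended to HADOOP_OPTS,
    preserving any options that were already set."""
    env = dict(base_env if base_env is not None else os.environ)
    prop = "-Djceks.key.serialFilter=" + SERIAL_FILTER
    env["HADOOP_OPTS"] = (env.get("HADOOP_OPTS", "") + " " + prop).strip()
    return env
```

The resulting environment would then be passed to the KMS start command, mirroring what an init-script change to $HADOOP_OPTS achieves.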
[jira] [Created] (IMPALA-7051) Concurrent Maven invocations can break build
Philip Zeyliger created IMPALA-7051: --- Summary: Concurrent Maven invocations can break build Key: IMPALA-7051 URL: https://issues.apache.org/jira/browse/IMPALA-7051 Project: IMPALA Issue Type: Task Components: Infrastructure Reporter: Philip Zeyliger Assignee: Philip Zeyliger Rarely, I've seen our build fail when executing two Maven targets simultaneously. Maven isn't really safe for concurrent execution (e.g., ~/.m2/repository has no locking).
[jira] [Commented] (IMPALA-7050) Impala Doc: Document inc_stats_size_limit_bytes command line option for Impalad
[ https://issues.apache.org/jira/browse/IMPALA-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481391#comment-16481391 ] Alex Rodoni commented on IMPALA-7050: - https://gerrit.cloudera.org/#/c/10457/ > Impala Doc: Document inc_stats_size_limit_bytes command line option for > Impalad > --- > > Key: IMPALA-7050 > URL: https://issues.apache.org/jira/browse/IMPALA-7050 > Project: IMPALA > Issue Type: Task > Components: Docs >Affects Versions: Impala 2.8.0 >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > > Information - > == > -inc_stats_size_limit_bytes (Maximum size of incremental stats the catalog > is allowed to serialize per table. This limit is set as a safety check, > to prevent the JVM from hitting a maximum array limit of 1GB (or OOM) > while building the thrift objects to send to impalads. By default, it's > set to 200MB) type: int64 default: 209715200 > == > Way to modify: > > You can change the inc_stats_size_limit_bytes value using the steps below: > 1. CM > Impala Service > Configuration > Search Impala Command Line Argument > Advanced Configuration Snippet (Safety Valve) > 2. Please input -inc_stats_size_limit_bytes= > Please note that you have to input the integer value in bytes. > For example, if you want to set 1GB, please input 1073741824 (1024*1024*1024). > 3. Please save it and restart the Impala service
[jira] [Created] (IMPALA-7050) Impala Doc: Document inc_stats_size_limit_bytes command line option for Impalad
Alex Rodoni created IMPALA-7050: --- Summary: Impala Doc: Document inc_stats_size_limit_bytes command line option for Impalad Key: IMPALA-7050 URL: https://issues.apache.org/jira/browse/IMPALA-7050 Project: IMPALA Issue Type: Task Components: Docs Affects Versions: Impala 2.8.0 Reporter: Alex Rodoni Assignee: Alex Rodoni Information - == -inc_stats_size_limit_bytes (Maximum size of incremental stats the catalog is allowed to serialize per table. This limit is set as a safety check, to prevent the JVM from hitting a maximum array limit of 1GB (or OOM) while building the thrift objects to send to impalads. By default, it's set to 200MB) type: int64 default: 209715200 == Way to modify: You can change the inc_stats_size_limit_bytes value using the steps below: 1. CM > Impala Service > Configuration > Search Impala Command Line Argument Advanced Configuration Snippet (Safety Valve) 2. Please input -inc_stats_size_limit_bytes= Please note that you have to input the integer value in bytes. For example, if you want to set 1GB, please input 1073741824 (1024*1024*1024). 3. Please save it and restart the Impala service
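Because the flag takes a raw byte count, it is easy to mistype the value. A small helper sketch (the function name is illustrative, only the flag name comes from the issue) that formats the flag from a limit in gibibytes:

```python
def inc_stats_flag(gib):
    """Format the impalad flag for an incremental-stats limit of `gib` GiB.

    1 GiB = 1024 * 1024 * 1024 = 1073741824 bytes.
    """
    return "-inc_stats_size_limit_bytes=%d" % (gib * 1024 * 1024 * 1024)
```

For example, `inc_stats_flag(1)` yields the string to paste into the CM safety valve for a 1 GiB limit.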
[jira] [Commented] (IMPALA-6714) Impala 2.13 Doc: ORC file format support
[ https://issues.apache.org/jira/browse/IMPALA-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481363#comment-16481363 ] Alex Rodoni commented on IMPALA-6714: - Thank you [~stiga-huang]! > Impala 2.13 Doc: ORC file format support > > > Key: IMPALA-6714 > URL: https://issues.apache.org/jira/browse/IMPALA-6714 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Affects Versions: Impala 2.13.0 >Reporter: Alex Rodoni >Assignee: Quanlong Huang >Priority: Major > Labels: future_release_doc > > ORC is a columnar file format much like Parquet. Docs: > [https://orc.apache.org/docs/index.html] > Currently, we only support reading primitive types in ORC files (ongoing > work is tracked in IMPALA-6943). It is still an experimental feature, which > can be disabled by setting the startup option --enable_orc_scanner to false. >
[jira] [Commented] (IMPALA-6714) Impala 2.13 Doc: ORC file format support
[ https://issues.apache.org/jira/browse/IMPALA-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481358#comment-16481358 ] Quanlong Huang commented on IMPALA-6714: Hi [~arodoni_cloudera], feel free to reassign this to me if you don't have enough capacity. :) > Impala 2.13 Doc: ORC file format support > > > Key: IMPALA-6714 > URL: https://issues.apache.org/jira/browse/IMPALA-6714 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Affects Versions: Impala 2.13.0 >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc > > ORC is a columnar file format much like Parquet. Docs: > [https://orc.apache.org/docs/index.html] > Currently, we only support reading primitive types in ORC files (ongoing > work is tracked in IMPALA-6943). It is still an experimental feature, which > can be disabled by setting the startup option --enable_orc_scanner to false. >
[jira] [Resolved] (IMPALA-7049) Scan node reservation calculation seems off
[ https://issues.apache.org/jira/browse/IMPALA-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Ho resolved IMPALA-7049. Resolution: Not A Bug Fix Version/s: Not Applicable Turns out that the new build wasn't deployed correctly. Sorry for the confusion. > Scan node reservation calculation seems off > --- > > Key: IMPALA-7049 > URL: https://issues.apache.org/jira/browse/IMPALA-7049 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.13.0, Impala 3.1.0 >Reporter: Michael Ho >Assignee: Tim Armstrong >Priority: Critical > Fix For: Not Applicable > > Attachments: profile.txt > > > Running the query TPC-DS Q77a with a memory limit, we ran into the error > *HDFS scan min reservation 0 must be >= min buffer size 8192*: > {noformat} > Query Type: QUERY > Query State: EXCEPTION > Query Status: HDFS scan min reservation 0 must be >= min buffer size 8192 > Sql Statement: /* Mem: 2375 MB. Coordinator: machine. */ > -- RESULT MISMATCH FROM ORIGINAL > -- FIXED. 
TAKE ACTUAL RESULT AS EXPECTED > with ss as > (select s_store_sk, > sum(ss_ext_sales_price) as sales, > sum(ss_net_profit) as profit > from store_sales, > date_dim, > store > where ss_sold_date_sk = d_date_sk >and cast(d_date as timestamp) between cast('2000-08-23' as timestamp) > and (cast('2000-08-23' as timestamp) + interval 30 days) >and ss_store_sk = s_store_sk > group by s_store_sk) > , > sr as > (select s_store_sk, > sum(sr_return_amt) as return_amt, > sum(sr_net_loss) as profit_loss > from store_returns, > date_dim, > store > where sr_returned_date_sk = d_date_sk >and cast(d_date as timestamp) between cast('2000-08-23' as timestamp) > and (cast('2000-08-23' as timestamp) + interval 30 days) >and sr_store_sk = s_store_sk > group by s_store_sk), > cs as > (select cs_call_center_sk, > sum(cs_ext_sales_price) as sales, > sum(cs_net_profit) as profit > from catalog_sales, > date_dim > where cs_sold_date_sk = d_date_sk >and cast(d_date as timestamp) between cast('2000-08-23' as timestamp) > and (cast('2000-08-23' as timestamp) + interval 30 days) > group by cs_call_center_sk > ), > cr as > (select cr_call_center_sk, > sum(cr_return_amount) as return_amt, > sum(cr_net_loss) as profit_loss > from catalog_returns, > date_dim > where cr_returned_date_sk = d_date_sk >and cast(d_date as timestamp) between cast('2000-08-23' as timestamp) > and (cast('2000-08-23' as timestamp) + interval 30 days) > group by cr_call_center_sk > ), > ws as > ( select wp_web_page_sk, > sum(ws_ext_sales_price) as sales, > sum(ws_net_profit) as profit > from web_sales, > date_dim, > web_page > where ws_sold_date_sk = d_date_sk >and cast(d_date as timestamp) between cast('2000-08-23' as timestamp) > and (cast('2000-08-23' as timestamp) + interval 30 days) >and ws_web_page_sk = wp_web_page_sk > group by wp_web_page_sk), > wr as > (select wp_web_page_sk, > sum(wr_return_amt) as return_amt, > sum(wr_net_loss) as profit_loss > from web_returns, > date_dim, > web_page > where wr_returned_date_sk = 
d_date_sk >and cast(d_date as timestamp) between cast('2000-08-23' as timestamp) > and (cast('2000-08-23' as timestamp) + interval 30 days) >and wr_web_page_sk = wp_web_page_sk > group by wp_web_page_sk) > , > results as > (select channel > , id > , sum(sales) as sales > , sum(return_amt) as return_amt > , sum(profit) as profit > from > (select 'store channel' as channel > , ss.s_store_sk as id > , sales > , coalesce(return_amt, 0) as return_amt > , (profit - coalesce(profit_loss,0)) as profit > from ss left join sr > on ss.s_store_sk = sr.s_store_sk > union all > select 'catalog channel' as channel > , cs_call_center_sk as id > , sales > , return_amt > , (profit - profit_loss) as profit > from cs >, cr > union all > select 'web channel' as channel > , ws.wp_web_page_sk as id > , sales > , coalesce(return_amt, 0) return_amt > , (profit - coalesce(profit_loss,0)) as profit > from ws left join wr > on ws.wp_web_page_sk = wr.wp_web_page_sk > ) x > group by channel, id ) > select * > from ( > select channel, id, sales, return_amt, profit from results > union > select channel, NULL AS id, sum(sales) as sales, sum(return_amt) as > return_amt, sum(profit) as
[jira] [Commented] (IMPALA-7049) Scan node reservation calculation seems off
[ https://issues.apache.org/jira/browse/IMPALA-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481320#comment-16481320 ] Tim Armstrong commented on IMPALA-7049: --- Attached a profile from an affected query. It doesn't really make sense to me - the plan looks like it's from before the below commit, but that's the commit that added the error message: {noformat} commit fb5dc9eb484e54cf9f37d06168392c5bc2a0f4fe Author: Tim ArmstrongDate: Sun Oct 29 12:38:47 2017 -0700 IMPALA-4835: switch I/O buffers to buffer pool {noformat} > Scan node reservation calculation seems off > --- > > Key: IMPALA-7049 > URL: https://issues.apache.org/jira/browse/IMPALA-7049 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.13.0, Impala 3.1.0 >Reporter: Michael Ho >Assignee: Tim Armstrong >Priority: Critical > Attachments: profile.txt > > > Running the query TPC-DS Q77a with a memory limit, we ran into the error > *HDFS scan min reservation 0 must be >= min buffer size 8192*: > {noformat} > Query Type: QUERY > Query State: EXCEPTION > Query Status: HDFS scan min reservation 0 must be >= min buffer size 8192 > Sql Statement: /* Mem: 2375 MB. Coordinator: machine. */ > -- RESULT MISMATCH FROM ORIGINAL > -- FIXED. 
TAKE ACTUAL RESULT AS EXPECTED > with ss as > (select s_store_sk, > sum(ss_ext_sales_price) as sales, > sum(ss_net_profit) as profit > from store_sales, > date_dim, > store > where ss_sold_date_sk = d_date_sk >and cast(d_date as timestamp) between cast('2000-08-23' as timestamp) > and (cast('2000-08-23' as timestamp) + interval 30 days) >and ss_store_sk = s_store_sk > group by s_store_sk) > , > sr as > (select s_store_sk, > sum(sr_return_amt) as return_amt, > sum(sr_net_loss) as profit_loss > from store_returns, > date_dim, > store > where sr_returned_date_sk = d_date_sk >and cast(d_date as timestamp) between cast('2000-08-23' as timestamp) > and (cast('2000-08-23' as timestamp) + interval 30 days) >and sr_store_sk = s_store_sk > group by s_store_sk), > cs as > (select cs_call_center_sk, > sum(cs_ext_sales_price) as sales, > sum(cs_net_profit) as profit > from catalog_sales, > date_dim > where cs_sold_date_sk = d_date_sk >and cast(d_date as timestamp) between cast('2000-08-23' as timestamp) > and (cast('2000-08-23' as timestamp) + interval 30 days) > group by cs_call_center_sk > ), > cr as > (select cr_call_center_sk, > sum(cr_return_amount) as return_amt, > sum(cr_net_loss) as profit_loss > from catalog_returns, > date_dim > where cr_returned_date_sk = d_date_sk >and cast(d_date as timestamp) between cast('2000-08-23' as timestamp) > and (cast('2000-08-23' as timestamp) + interval 30 days) > group by cr_call_center_sk > ), > ws as > ( select wp_web_page_sk, > sum(ws_ext_sales_price) as sales, > sum(ws_net_profit) as profit > from web_sales, > date_dim, > web_page > where ws_sold_date_sk = d_date_sk >and cast(d_date as timestamp) between cast('2000-08-23' as timestamp) > and (cast('2000-08-23' as timestamp) + interval 30 days) >and ws_web_page_sk = wp_web_page_sk > group by wp_web_page_sk), > wr as > (select wp_web_page_sk, > sum(wr_return_amt) as return_amt, > sum(wr_net_loss) as profit_loss > from web_returns, > date_dim, > web_page > where wr_returned_date_sk = 
d_date_sk >and cast(d_date as timestamp) between cast('2000-08-23' as timestamp) > and (cast('2000-08-23' as timestamp) + interval 30 days) >and wr_web_page_sk = wp_web_page_sk > group by wp_web_page_sk) > , > results as > (select channel > , id > , sum(sales) as sales > , sum(return_amt) as return_amt > , sum(profit) as profit > from > (select 'store channel' as channel > , ss.s_store_sk as id > , sales > , coalesce(return_amt, 0) as return_amt > , (profit - coalesce(profit_loss,0)) as profit > from ss left join sr > on ss.s_store_sk = sr.s_store_sk > union all > select 'catalog channel' as channel > , cs_call_center_sk as id > , sales > , return_amt > , (profit - profit_loss) as profit > from cs >, cr > union all > select 'web channel' as channel > , ws.wp_web_page_sk as id > , sales > , coalesce(return_amt, 0) return_amt > , (profit - coalesce(profit_loss,0)) as profit > from ws left join wr > on
[jira] [Created] (IMPALA-7049) Scan node reservation calculation seems off
Michael Ho created IMPALA-7049: -- Summary: Scan node reservation calculation seems off Key: IMPALA-7049 URL: https://issues.apache.org/jira/browse/IMPALA-7049 Project: IMPALA Issue Type: Bug Components: Backend Affects Versions: Impala 2.13.0, Impala 3.1.0 Reporter: Michael Ho Assignee: Tim Armstrong Running the query TPC-DS Q77a with a memory limit, we ran into the error *HDFS scan min reservation 0 must be >= min buffer size 8192*: {noformat} Query Type: QUERY Query State: EXCEPTION Query Status: HDFS scan min reservation 0 must be >= min buffer size 8192 Sql Statement: /* Mem: 2375 MB. Coordinator: machine. */ -- RESULT MISMATCH FROM ORIGINAL -- FIXED. TAKE ACTUAL RESULT AS EXPECTED with ss as (select s_store_sk, sum(ss_ext_sales_price) as sales, sum(ss_net_profit) as profit from store_sales, date_dim, store where ss_sold_date_sk = d_date_sk and cast(d_date as timestamp) between cast('2000-08-23' as timestamp) and (cast('2000-08-23' as timestamp) + interval 30 days) and ss_store_sk = s_store_sk group by s_store_sk) , sr as (select s_store_sk, sum(sr_return_amt) as return_amt, sum(sr_net_loss) as profit_loss from store_returns, date_dim, store where sr_returned_date_sk = d_date_sk and cast(d_date as timestamp) between cast('2000-08-23' as timestamp) and (cast('2000-08-23' as timestamp) + interval 30 days) and sr_store_sk = s_store_sk group by s_store_sk), cs as (select cs_call_center_sk, sum(cs_ext_sales_price) as sales, sum(cs_net_profit) as profit from catalog_sales, date_dim where cs_sold_date_sk = d_date_sk and cast(d_date as timestamp) between cast('2000-08-23' as timestamp) and (cast('2000-08-23' as timestamp) + interval 30 days) group by cs_call_center_sk ), cr as (select cr_call_center_sk, sum(cr_return_amount) as return_amt, sum(cr_net_loss) as profit_loss from catalog_returns, date_dim where cr_returned_date_sk = d_date_sk and cast(d_date as timestamp) between cast('2000-08-23' as timestamp) and (cast('2000-08-23' as timestamp) + interval 30 days) 
group by cr_call_center_sk ), ws as ( select wp_web_page_sk, sum(ws_ext_sales_price) as sales, sum(ws_net_profit) as profit from web_sales, date_dim, web_page where ws_sold_date_sk = d_date_sk and cast(d_date as timestamp) between cast('2000-08-23' as timestamp) and (cast('2000-08-23' as timestamp) + interval 30 days) and ws_web_page_sk = wp_web_page_sk group by wp_web_page_sk), wr as (select wp_web_page_sk, sum(wr_return_amt) as return_amt, sum(wr_net_loss) as profit_loss from web_returns, date_dim, web_page where wr_returned_date_sk = d_date_sk and cast(d_date as timestamp) between cast('2000-08-23' as timestamp) and (cast('2000-08-23' as timestamp) + interval 30 days) and wr_web_page_sk = wp_web_page_sk group by wp_web_page_sk) , results as (select channel , id , sum(sales) as sales , sum(return_amt) as return_amt , sum(profit) as profit from (select 'store channel' as channel , ss.s_store_sk as id , sales , coalesce(return_amt, 0) as return_amt , (profit - coalesce(profit_loss,0)) as profit from ss left join sr on ss.s_store_sk = sr.s_store_sk union all select 'catalog channel' as channel , cs_call_center_sk as id , sales , return_amt , (profit - profit_loss) as profit from cs , cr union all select 'web channel' as channel , ws.wp_web_page_sk as id , sales , coalesce(return_amt, 0) return_amt , (profit - coalesce(profit_loss,0)) as profit from ws left join wr on ws.wp_web_page_sk = wr.wp_web_page_sk ) x group by channel, id ) select * from ( select channel, id, sales, return_amt, profit from results union select channel, NULL AS id, sum(sales) as sales, sum(return_amt) as return_amt, sum(profit) as profit from results group by channel union select NULL AS channel, NULL AS id, sum(sales) as sales, sum(return_amt) as return_amt, sum(profit) as profit from results ) foo order by channel, id limit 100; Coordinator: machine Query Options (set by configuration): ABORT_ON_ERROR=1,MEM_LIMIT=2490368000 Query Options (set by configuration and planner): 
ABORT_ON_ERROR=1,MEM_LIMIT=2490368000,MT_DOP=0 Plan: {noformat} According to the code, the reservation for the scan node is supposed to be computed correctly in the FE but this doesn't appear to be the case {noformat} // Check if reservation was enough to allocate at least one
[jira] [Created] (IMPALA-7048) Failed test: query_test.test_parquet_page_index.TestHdfsParquetTableIndexWriter.test_write_index_many_columns_tables
Dimitris Tsirogiannis created IMPALA-7048: - Summary: Failed test: query_test.test_parquet_page_index.TestHdfsParquetTableIndexWriter.test_write_index_many_columns_tables Key: IMPALA-7048 URL: https://issues.apache.org/jira/browse/IMPALA-7048 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Dimitris Tsirogiannis Assignee: Zoltán Borók-Nagy The following test fails when the filesystem is LOCAL: {code:java} query_test.test_parquet_page_index.TestHdfsParquetTableIndexWriter.test_write_index_many_columns_tables[exec_option: \{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] (from pytest) {code} Zoltan, assigning to you since this looks suspiciously related to the fix for IMPALA-5842.
[jira] [Updated] (IMPALA-6909) Impala 2.13 Doc: SET ROW FORMAT in ALTER TABLE
[ https://issues.apache.org/jira/browse/IMPALA-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni updated IMPALA-6909: Affects Version/s: (was: Impala 2.13.0) Impala 2.12.0 > Impala 2.13 Doc: SET ROW FORMAT in ALTER TABLE > -- > > Key: IMPALA-6909 > URL: https://issues.apache.org/jira/browse/IMPALA-6909 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Affects Versions: Impala 2.12.0 >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > > With IMPALA-4323 the ALTER TABLE statement has a new option to set "ROW > FORMAT". The format is the same as the ROW FORMAT parameters that currently > exist with the CREATE TABLE statement. The documents need to be updated for > ALTER TABLE to reflect the new functionality.
[jira] [Updated] (IMPALA-6909) Impala 2.13 Doc: SET ROW FORMAT in ALTER TABLE
[ https://issues.apache.org/jira/browse/IMPALA-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni updated IMPALA-6909: Description: With IMPALA-4323 the ALTER TABLE statement has a new option to set "ROW FORMAT". The format is the same as the ROW FORMAT parameters that currently exist with the CREATE TABLE statement. The documents need to be updated for ALTER TABLE to reflect the new functionality. > Impala 2.13 Doc: SET ROW FORMAT in ALTER TABLE > -- > > Key: IMPALA-6909 > URL: https://issues.apache.org/jira/browse/IMPALA-6909 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Affects Versions: Impala 2.13.0 >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc > > With IMPALA-4323 the ALTER TABLE statement has a new option to set "ROW > FORMAT". The format is the same as the ROW FORMAT parameters that currently > exist with the CREATE TABLE statement. The documents need to be updated for > ALTER TABLE to reflect the new functionality.
[jira] [Commented] (IMPALA-6994) Avoid reloading a table's HMS data for file-only operations
[ https://issues.apache.org/jira/browse/IMPALA-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481014#comment-16481014 ] Pranay Singh commented on IMPALA-6994: -- TUpdateCatalogResponse updateCatalog(TUpdateCatalogRequest update) currently calls loadTableMetadata() with the false option. loadTableMetadata calls ((HdfsTable) tbl).load, which in turn calls updatePartitionsFromHms(), which calls loadMetadataAndDiskIds() to load file metadata via HDFS and also makes a call to HMS to reload the partition. Now, for the case where a partition is created/recreated/dropped, we need to call HMS, so I think this is the place where we can skip calling HMS when only an existing partition is being updated. -Pranay > Avoid reloading a table's HMS data for file-only operations > --- > > Key: IMPALA-6994 > URL: https://issues.apache.org/jira/browse/IMPALA-6994 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Affects Versions: Impala 2.12.0 >Reporter: Balazs Jeszenszky >Assignee: Pranay Singh >Priority: Major > > Reloading file metadata for HDFS tables (e.g. as a final step in an 'insert') > is done via > https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L628 > , which calls > https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1243 > HdfsTable.load has no option to only load file metadata. HMS metadata will > also be reloaded every time, which is an unnecessary overhead (and potential > point of failure) when adding files to existing locations.
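The proposal in the comment above can be sketched abstractly: always refresh file metadata, but gate the HMS round-trip behind a flag that is only set when the partition itself changed. This is purely illustrative; none of these names exist in Impala's code base.

```python
class Partition:
    """Toy stand-in for an HDFS table partition's cached metadata."""
    def __init__(self, name, location):
        self.name = name
        self.location = location
        self.files = []      # file metadata, loaded from the filesystem
        self.hms_info = None # partition metadata, loaded from HMS

def reload_partition(partition, files_only, hms_client, fs_client):
    # File metadata is always refreshed (e.g. after an insert adds files).
    partition.files = fs_client.list_files(partition.location)
    if not files_only:
        # Only needed when the partition was created/recreated/dropped;
        # skipping this is the optimization discussed in the comment.
        partition.hms_info = hms_client.get_partition(partition.name)
```

With fake clients, one can check that `files_only=True` never touches the HMS client, which is the behavior the optimization is after.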
[jira] [Commented] (IMPALA-6994) Avoid reloading a table's HMS data for file-only operations
[ https://issues.apache.org/jira/browse/IMPALA-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480978#comment-16480978 ] bharath v commented on IMPALA-6994: --- Which exact loadTableMetadata() (line number/stack) are we trying to optimize here? Isn't the fix as simple as setting reloadTableSchema=false, or am I missing something here? > Avoid reloading a table's HMS data for file-only operations > --- > > Key: IMPALA-6994 > URL: https://issues.apache.org/jira/browse/IMPALA-6994 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Affects Versions: Impala 2.12.0 >Reporter: Balazs Jeszenszky >Assignee: Pranay Singh >Priority: Major > > Reloading file metadata for HDFS tables (e.g. as a final step in an 'insert') > is done via > https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L628 > , which calls > https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1243 > HdfsTable.load has no option to only load file metadata. HMS metadata will > also be reloaded every time, which is an unnecessary overhead (and potential > point of failure) when adding files to existing locations.