[jira] [Commented] (IMPALA-6070) Speed up test execution

2018-05-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481444#comment-16481444
 ] 

ASF subversion and git services commented on IMPALA-6070:
-

Commit 85ed7ae88bcec17ffd45b1dd66d07818cb1d55b0 in impala's branch 
refs/heads/master from [~philip]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=85ed7ae ]

IMPALA-6070: Adding ASAN, --tail to test-with-docker.

* Adds -ASAN suites to test-with-docker.
* Adds --tail flag, which starts a tail subprocess. This
  isn't pretty (there's potential for overlap), but it's a dead simple
  way to keep an eye on what's going on.
* Fixes a bug wherein I could call "docker rm " twice
  simultaneously, which would make Docker fail the second call,
  and then fail the related "docker rmi". It's better to serialize,
  and I did that with a simple lock.
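
A minimal sketch of the serialization described in the last bullet, assuming a single module-level lock. The names `_docker_rm_lock` and `remove_container`, and the injectable `run` parameter, are illustrative and are not taken from test-with-docker itself:

```python
import subprocess
import threading

# Hypothetical module-level lock: every "docker rm" goes through it.
_docker_rm_lock = threading.Lock()


def remove_container(container_id, run=subprocess.check_call):
    """Remove a container, allowing only one "docker rm" at a time.

    `run` is injectable so the function can be exercised without Docker.
    """
    with _docker_rm_lock:
        run(["docker", "rm", container_id])
```

Holding the lock for the whole call trades a little parallelism for never having two removals in flight at once.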

Change-Id: I51451cdf1352fc0f9516d729b9a77700488d993f
Reviewed-on: http://gerrit.cloudera.org:8080/10319
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 


> Speed up test execution
> ---
>
> Key: IMPALA-6070
> URL: https://issues.apache.org/jira/browse/IMPALA-6070
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Philip Zeyliger
>Assignee: Philip Zeyliger
>Priority: Major
> Attachments: screenshot-1.png
>
>
> Our tests (e.g., 
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/buildTimeTrend) tend 
> to take about 4 hours. This can be improved.
> I'm opening this JIRA to track those changes. I'm currently looking at:
> * Parallelizing multiple data-load steps: TPC-DS, TPC-H, and Functional take 
> ~65 minutes when serialized. They take 35 minutes if running in parallel.
> * Parallelizing compute stats: this takes ~10 minutes; probably can be faster.
> The trickier thing is parallelizing fe tests, ee tests, and custom cluster 
> tests. The approach I'm taking is to create a docker container with 
> everything in it (including data load), and then running tests in parallel. 
> This is a bit messier, but I think it has some legs when it comes to using 
> machines with many cores.
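
As a rough illustration of why parallelizing independent data-load steps helps: wall-clock time approaches that of the slowest step instead of the sum of all steps. The sketch below assumes the steps do not depend on each other; `run_steps_in_parallel` and `fake_load` are hypothetical names, not part of Impala's scripts:

```python
from concurrent.futures import ThreadPoolExecutor
import time


def run_steps_in_parallel(steps):
    """Run independent zero-argument callables concurrently.

    `steps` maps a step name to a callable. Returns a dict mapping each
    name to that callable's result.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(steps))) as pool:
        futures = {name: pool.submit(fn) for name, fn in steps.items()}
        # result() re-raises any exception from the step, so failures surface.
        return {name: future.result() for name, future in futures.items()}


def fake_load(name, seconds):
    """Stand-in for a real data-load step; it only sleeps."""
    time.sleep(seconds)
    return name + " loaded"
```

Three stand-in steps of 0.2 seconds each should finish in roughly 0.2 seconds total rather than 0.6.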



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7035) Impala HDFS Encryption tests failing after OpenJDK update

2018-05-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481443#comment-16481443
 ] 

ASF subversion and git services commented on IMPALA-7035:
-

Commit 24207bbdde98baed8e0b48aadc606dfe89ad3b0a in impala's branch 
refs/heads/2.x from [~philip]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=24207bb ]

IMPALA-7035: Configure jceks.key.serialFilter for KMS.

Configures a Java property for KMS to account for JDK 8u171's security fixes. I
was seeing impala-py.test tests/metadata/test_hdfs_encryption.py fail with the
following error:

  AssertionError: Error creating encryption zone: RemoteException: Can't 
recover key for testkey1 from keystore 
file:/home/impdev/Impala/testdata/cluster/cdh6/node-1/data/kms.keystore

The issue is described in HDFS-13494, and I imagine it'll be fixed in due time.
In the meantime, setting this property seems to do the trick.
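
A hedged sketch of what adding such a property to HADOOP_OPTS could look like. The filter value is an assumption (the JDK 8u171 default pattern plus Hadoop's `JavaKeyStoreProvider$KeyMetadata` class); the value used by the actual commit is not shown in this message, and `with_serial_filter` is a hypothetical helper:

```python
# Assumed filter value, NOT taken from the commit: the JDK default pattern
# plus Hadoop's key metadata class. Verify against HDFS-13494 before use.
ASSUMED_SERIAL_FILTER = (
    "java.lang.Enum;java.security.KeyRep;java.security.KeyRep$Type;"
    "javax.crypto.spec.SecretKeySpec;"
    "org.apache.hadoop.crypto.key.JavaKeyStoreProvider$KeyMetadata;!*"
)

PROPERTY_PREFIX = "-Djceks.key.serialFilter="


def with_serial_filter(hadoop_opts, serial_filter=ASSUMED_SERIAL_FILTER):
    """Return hadoop_opts with the serial filter property appended once."""
    if PROPERTY_PREFIX in hadoop_opts:
        return hadoop_opts  # already configured; leave it alone
    return (hadoop_opts + " " + PROPERTY_PREFIX + serial_filter).strip()
```

If the resulting string is written into a shell script, it needs quoting, since the pattern contains `$`, `;`, and `!`.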

Change-Id: I2d21c9cce3b91e8fd8b2b4f1cda75e3958c977d5
Reviewed-on: http://gerrit.cloudera.org:8080/10418
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 
Reviewed-on: http://gerrit.cloudera.org:8080/10446


> Impala HDFS Encryption tests failing after OpenJDK update
> -
>
> Key: IMPALA-7035
> URL: https://issues.apache.org/jira/browse/IMPALA-7035
> Project: IMPALA
>  Issue Type: Task
>Reporter: Philip Zeyliger
>Assignee: Philip Zeyliger
>Priority: Major
>
> I have seen {{impala-py.test tests/metadata/test_hdfs_encryption.py}} fail 
> with the following error:
> {{E AssertionError: Error creating encryption zone: RemoteException: Can't 
> recover key for testkey1 from keystore 
> [file:/home/impdev/Impala/testdata/cluster/cdh6/node-1/data/kms.keystore|file:///home/impdev/Impala/testdata/cluster/cdh6/node-1/data/kms.keystore]}}
> I believe what's going on is described in 
> https://issues.apache.org/jira/browse/HDFS-13494. In short, the JDK now has a 
> special whitelist for an API as a result of a security vulnerability.
> A workaround in the KMS init script to configure $HADOOP_OPTS seems to do the 
> trick.
>  






[jira] [Created] (IMPALA-7051) Concurrent Maven invocations can break build

2018-05-18 Thread Philip Zeyliger (JIRA)
Philip Zeyliger created IMPALA-7051:
---

 Summary: Concurrent Maven invocations can break build
 Key: IMPALA-7051
 URL: https://issues.apache.org/jira/browse/IMPALA-7051
 Project: IMPALA
  Issue Type: Task
  Components: Infrastructure
Reporter: Philip Zeyliger
Assignee: Philip Zeyliger


On rare occasions, I've seen our build fail when two Maven targets execute 
simultaneously. Maven isn't really safe for concurrent execution (e.g., 
~/.m2/repository has no locking).
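
One possible mitigation, sketched under assumptions, is to serialize Maven invocations behind an advisory file lock. This is not necessarily how the Impala build addresses the problem; `exclusive_lock` and the lock path in the usage comment are hypothetical, and the approach is POSIX-only:

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the with-block.

    Uses flock(2), so it only coordinates processes that take this same lock.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no one else holds it
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


# Hypothetical usage:
#   with exclusive_lock("/tmp/mvn.lock"):
#       subprocess.check_call(["mvn", "install"])
```

Because the lock is advisory, any Maven invocation that bypasses the wrapper can still race on ~/.m2/repository.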






[jira] [Commented] (IMPALA-7050) Impala Doc: Document inc_stats_size_limit_bytes command line option for Impalad

2018-05-18 Thread Alex Rodoni (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481391#comment-16481391
 ] 

Alex Rodoni commented on IMPALA-7050:
-

https://gerrit.cloudera.org/#/c/10457/

> Impala Doc: Document inc_stats_size_limit_bytes command line option for 
> Impalad
> ---
>
> Key: IMPALA-7050
> URL: https://issues.apache.org/jira/browse/IMPALA-7050
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Affects Versions: Impala 2.8.0
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>
> Information -
> ==
> -inc_stats_size_limit_bytes (Maximum size of incremental stats the catalog
> is allowed to serialize per table. This limit is set as a safety check,
> to prevent the JVM from hitting a maximum array limit of 1GB (or OOM)
> while building the thrift objects to send to impalads. By default, it's
> set to 200MB) type: int64 default: 209715200
> ==
> Way to modify:
> 
> You can change the inc_stats_size_limit_bytes value using the steps below:
> 1. CM > Impala Service > Configuration > Search Impala Command Line Argument 
> Advanced Configuration Snippet (Safety Valve)
> 2. Enter -inc_stats_size_limit_bytes= followed by the new limit.
> Note that the value must be an integer number of bytes.
> For example, to set 1GB, enter 1073741824 (1024*1024*1024).
> 3. Save it and restart the Impala service.






[jira] [Created] (IMPALA-7050) Impala Doc: Document inc_stats_size_limit_bytes command line option for Impalad

2018-05-18 Thread Alex Rodoni (JIRA)
Alex Rodoni created IMPALA-7050:
---

 Summary: Impala Doc: Document inc_stats_size_limit_bytes command 
line option for Impalad
 Key: IMPALA-7050
 URL: https://issues.apache.org/jira/browse/IMPALA-7050
 Project: IMPALA
  Issue Type: Task
  Components: Docs
Affects Versions: Impala 2.8.0
Reporter: Alex Rodoni
Assignee: Alex Rodoni


Information -
==
-inc_stats_size_limit_bytes (Maximum size of incremental stats the catalog
is allowed to serialize per table. This limit is set as a safety check,
to prevent the JVM from hitting a maximum array limit of 1GB (or OOM)
while building the thrift objects to send to impalads. By default, it's
set to 200MB) type: int64 default: 209715200
==

Way to modify:

You can change the inc_stats_size_limit_bytes value using the steps below:
1. CM > Impala Service > Configuration > Search Impala Command Line Argument 
Advanced Configuration Snippet (Safety Valve)
2. Enter -inc_stats_size_limit_bytes= followed by the new limit.
Note that the value must be an integer number of bytes.
For example, to set 1GB, enter 1073741824 (1024*1024*1024).
3. Save it and restart the Impala service.
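
The byte values for this flag are easy to get wrong by hand, so a short calculation may help. This is only a sketch: `flag_for` is a hypothetical helper, and it assumes binary units (1MB = 1024*1024 bytes), which matches the documented default of 209715200 for 200MB:

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

DEFAULT_LIMIT = 200 * MIB  # the documented default of 200MB
ONE_GIB_LIMIT = 1 * GIB    # 1024*1024*1024 bytes


def flag_for(limit_bytes):
    """Format the startup flag for a limit given in bytes."""
    return "-inc_stats_size_limit_bytes={}".format(int(limit_bytes))


if __name__ == "__main__":
    print(flag_for(DEFAULT_LIMIT))
    print(flag_for(ONE_GIB_LIMIT))
```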






[jira] [Commented] (IMPALA-6714) Impala 2.13 Doc: ORC file format support

2018-05-18 Thread Alex Rodoni (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481363#comment-16481363
 ] 

Alex Rodoni commented on IMPALA-6714:
-

Thank you [~stiga-huang]!

> Impala 2.13 Doc: ORC file format support
> 
>
> Key: IMPALA-6714
> URL: https://issues.apache.org/jira/browse/IMPALA-6714
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Affects Versions: Impala 2.13.0
>Reporter: Alex Rodoni
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: future_release_doc
>
> ORC is a columnar file format much like Parquet. Docs: 
> [https://orc.apache.org/docs/index.html]
> Currently, we only support reading primitive types in ORC files (ongoing 
> work is tracked in IMPALA-6943). It's still an experimental feature, which 
> can be disabled by setting the startup option --enable_orc_scanner to false.
>  






[jira] [Commented] (IMPALA-6714) Impala 2.13 Doc: ORC file format support

2018-05-18 Thread Quanlong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481358#comment-16481358
 ] 

Quanlong Huang commented on IMPALA-6714:


Hi [~arodoni_cloudera], feel free to reassign this to me if you don't have 
enough capacity. :)

> Impala 2.13 Doc: ORC file format support
> 
>
> Key: IMPALA-6714
> URL: https://issues.apache.org/jira/browse/IMPALA-6714
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Affects Versions: Impala 2.13.0
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc
>
> ORC is a columnar file format much like Parquet. Docs: 
> [https://orc.apache.org/docs/index.html]
> Currently, we only support reading primitive types in ORC files (ongoing 
> work is tracked in IMPALA-6943). It's still an experimental feature, which 
> can be disabled by setting the startup option --enable_orc_scanner to false.
>  






[jira] [Resolved] (IMPALA-7049) Scan node reservation calculation seems off

2018-05-18 Thread Michael Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho resolved IMPALA-7049.

   Resolution: Not A Bug
Fix Version/s: Not Applicable

Turns out that the new build wasn't deployed correctly. Sorry for the confusion.

> Scan node reservation calculation seems off
> ---
>
> Key: IMPALA-7049
> URL: https://issues.apache.org/jira/browse/IMPALA-7049
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.13.0, Impala 3.1.0
>Reporter: Michael Ho
>Assignee: Tim Armstrong
>Priority: Critical
> Fix For: Not Applicable
>
> Attachments: profile.txt
>
>
> Running the query TPC-DS Q77a with a memory limit, we ran into the error 
> *HDFS scan min reservation 0 must be >= min buffer size 8192*:
> {noformat}
> Query Type: QUERY
> Query State: EXCEPTION
> Query Status: HDFS scan min reservation 0 must be >= min buffer size 8192
> Sql Statement: /* Mem: 2375 MB. Coordinator: machine. */
> -- RESULT MISMATCH FROM ORIGINAL
> -- FIXED. TAKE ACTUAL RESULT AS EXPECTED
> with ss as
>  (select s_store_sk,
>  sum(ss_ext_sales_price) as sales,
>  sum(ss_net_profit) as profit
>  from store_sales,
>   date_dim,
>   store
>  where ss_sold_date_sk = d_date_sk
>and cast(d_date as timestamp) between cast('2000-08-23' as timestamp)
>   and (cast('2000-08-23' as timestamp) + interval 30 days)
>and ss_store_sk = s_store_sk
>  group by s_store_sk)
>  ,
>  sr as
>  (select s_store_sk,
>  sum(sr_return_amt) as return_amt,
>  sum(sr_net_loss) as profit_loss
>  from store_returns,
>   date_dim,
>   store
>  where sr_returned_date_sk = d_date_sk
>and cast(d_date as timestamp) between cast('2000-08-23' as timestamp)
>   and (cast('2000-08-23' as timestamp) + interval 30 days)
>and sr_store_sk = s_store_sk
>  group by s_store_sk),
>  cs as
>  (select cs_call_center_sk,
> sum(cs_ext_sales_price) as sales,
> sum(cs_net_profit) as profit
>  from catalog_sales,
>   date_dim
>  where cs_sold_date_sk = d_date_sk
>and cast(d_date as timestamp) between cast('2000-08-23' as timestamp)
>   and (cast('2000-08-23' as timestamp) + interval 30 days)
>  group by cs_call_center_sk
>  ),
>  cr as
>  (select cr_call_center_sk,
>  sum(cr_return_amount) as return_amt,
>  sum(cr_net_loss) as profit_loss
>  from catalog_returns,
>   date_dim
>  where cr_returned_date_sk = d_date_sk
>and cast(d_date as timestamp) between cast('2000-08-23' as timestamp)
>   and (cast('2000-08-23' as timestamp) + interval 30 days)
>  group by cr_call_center_sk
>  ),
>  ws as
>  ( select wp_web_page_sk,
> sum(ws_ext_sales_price) as sales,
> sum(ws_net_profit) as profit
>  from web_sales,
>   date_dim,
>   web_page
>  where ws_sold_date_sk = d_date_sk
>and cast(d_date as timestamp) between cast('2000-08-23' as timestamp)
>   and (cast('2000-08-23' as timestamp) + interval 30 days)
>and ws_web_page_sk = wp_web_page_sk
>  group by wp_web_page_sk),
>  wr as
>  (select wp_web_page_sk,
> sum(wr_return_amt) as return_amt,
> sum(wr_net_loss) as profit_loss
>  from web_returns,
>   date_dim,
>   web_page
>  where wr_returned_date_sk = d_date_sk
>and cast(d_date as timestamp) between cast('2000-08-23' as timestamp)
>   and (cast('2000-08-23' as timestamp) + interval 30 days)
>and wr_web_page_sk = wp_web_page_sk
>  group by wp_web_page_sk)
>  ,
>  results as
>  (select channel
> , id
> , sum(sales) as sales
> , sum(return_amt) as return_amt
> , sum(profit) as profit
>  from
>  (select 'store channel' as channel
> , ss.s_store_sk as id
> , sales
> , coalesce(return_amt, 0) as return_amt
> , (profit - coalesce(profit_loss,0)) as profit
>  from   ss left join sr
> on  ss.s_store_sk = sr.s_store_sk
>  union all
>  select 'catalog channel' as channel
> , cs_call_center_sk as id
> , sales
> , return_amt
> , (profit - profit_loss) as profit
>  from  cs
>, cr
>  union all
>  select 'web channel' as channel
> , ws.wp_web_page_sk as id
> , sales
> , coalesce(return_amt, 0) return_amt
> , (profit - coalesce(profit_loss,0)) as profit
>  from   ws left join wr
> on  ws.wp_web_page_sk = wr.wp_web_page_sk
>  ) x
>  group by channel, id )
>   select  *
>  from (
>  select channel, id, sales, return_amt, profit from  results
>  union
>  select channel, NULL AS id, sum(sales) as sales, sum(return_amt) as 
> return_amt, sum(profit) as 

[jira] [Commented] (IMPALA-7049) Scan node reservation calculation seems off

2018-05-18 Thread Tim Armstrong (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481320#comment-16481320
 ] 

Tim Armstrong commented on IMPALA-7049:
---

Attached a profile from an affected query. It doesn't really make sense to me - 
the plan looks like it's from before the below commit, but that's the commit 
that added the error message:

{noformat}

commit fb5dc9eb484e54cf9f37d06168392c5bc2a0f4fe
Author: Tim Armstrong 
Date:   Sun Oct 29 12:38:47 2017 -0700

IMPALA-4835: switch I/O buffers to buffer pool
{noformat}

> Scan node reservation calculation seems off
> ---
>
> Key: IMPALA-7049
> URL: https://issues.apache.org/jira/browse/IMPALA-7049
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.13.0, Impala 3.1.0
>Reporter: Michael Ho
>Assignee: Tim Armstrong
>Priority: Critical
> Attachments: profile.txt
>
>
> Running the query TPC-DS Q77a with a memory limit, we ran into the error 
> *HDFS scan min reservation 0 must be >= min buffer size 8192*:
> {noformat}
> Query Type: QUERY
> Query State: EXCEPTION
> Query Status: HDFS scan min reservation 0 must be >= min buffer size 8192
> Sql Statement: /* Mem: 2375 MB. Coordinator: machine. */
> -- RESULT MISMATCH FROM ORIGINAL
> -- FIXED. TAKE ACTUAL RESULT AS EXPECTED
> with ss as
>  (select s_store_sk,
>  sum(ss_ext_sales_price) as sales,
>  sum(ss_net_profit) as profit
>  from store_sales,
>   date_dim,
>   store
>  where ss_sold_date_sk = d_date_sk
>and cast(d_date as timestamp) between cast('2000-08-23' as timestamp)
>   and (cast('2000-08-23' as timestamp) + interval 30 days)
>and ss_store_sk = s_store_sk
>  group by s_store_sk)
>  ,
>  sr as
>  (select s_store_sk,
>  sum(sr_return_amt) as return_amt,
>  sum(sr_net_loss) as profit_loss
>  from store_returns,
>   date_dim,
>   store
>  where sr_returned_date_sk = d_date_sk
>and cast(d_date as timestamp) between cast('2000-08-23' as timestamp)
>   and (cast('2000-08-23' as timestamp) + interval 30 days)
>and sr_store_sk = s_store_sk
>  group by s_store_sk),
>  cs as
>  (select cs_call_center_sk,
> sum(cs_ext_sales_price) as sales,
> sum(cs_net_profit) as profit
>  from catalog_sales,
>   date_dim
>  where cs_sold_date_sk = d_date_sk
>and cast(d_date as timestamp) between cast('2000-08-23' as timestamp)
>   and (cast('2000-08-23' as timestamp) + interval 30 days)
>  group by cs_call_center_sk
>  ),
>  cr as
>  (select cr_call_center_sk,
>  sum(cr_return_amount) as return_amt,
>  sum(cr_net_loss) as profit_loss
>  from catalog_returns,
>   date_dim
>  where cr_returned_date_sk = d_date_sk
>and cast(d_date as timestamp) between cast('2000-08-23' as timestamp)
>   and (cast('2000-08-23' as timestamp) + interval 30 days)
>  group by cr_call_center_sk
>  ),
>  ws as
>  ( select wp_web_page_sk,
> sum(ws_ext_sales_price) as sales,
> sum(ws_net_profit) as profit
>  from web_sales,
>   date_dim,
>   web_page
>  where ws_sold_date_sk = d_date_sk
>and cast(d_date as timestamp) between cast('2000-08-23' as timestamp)
>   and (cast('2000-08-23' as timestamp) + interval 30 days)
>and ws_web_page_sk = wp_web_page_sk
>  group by wp_web_page_sk),
>  wr as
>  (select wp_web_page_sk,
> sum(wr_return_amt) as return_amt,
> sum(wr_net_loss) as profit_loss
>  from web_returns,
>   date_dim,
>   web_page
>  where wr_returned_date_sk = d_date_sk
>and cast(d_date as timestamp) between cast('2000-08-23' as timestamp)
>   and (cast('2000-08-23' as timestamp) + interval 30 days)
>and wr_web_page_sk = wp_web_page_sk
>  group by wp_web_page_sk)
>  ,
>  results as
>  (select channel
> , id
> , sum(sales) as sales
> , sum(return_amt) as return_amt
> , sum(profit) as profit
>  from
>  (select 'store channel' as channel
> , ss.s_store_sk as id
> , sales
> , coalesce(return_amt, 0) as return_amt
> , (profit - coalesce(profit_loss,0)) as profit
>  from   ss left join sr
> on  ss.s_store_sk = sr.s_store_sk
>  union all
>  select 'catalog channel' as channel
> , cs_call_center_sk as id
> , sales
> , return_amt
> , (profit - profit_loss) as profit
>  from  cs
>, cr
>  union all
>  select 'web channel' as channel
> , ws.wp_web_page_sk as id
> , sales
> , coalesce(return_amt, 0) return_amt
> , (profit - coalesce(profit_loss,0)) as profit
>  from   ws left join wr
> on  

[jira] [Created] (IMPALA-7049) Scan node reservation calculation seems off

2018-05-18 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-7049:
--

 Summary: Scan node reservation calculation seems off
 Key: IMPALA-7049
 URL: https://issues.apache.org/jira/browse/IMPALA-7049
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 2.13.0, Impala 3.1.0
Reporter: Michael Ho
Assignee: Tim Armstrong


Running the query TPC-DS Q77a with a memory limit, we ran into the error *HDFS 
scan min reservation 0 must be >= min buffer size 8192*:

{noformat}
Query Type: QUERY
Query State: EXCEPTION
Query Status: HDFS scan min reservation 0 must be >= min buffer size 8192
Sql Statement: /* Mem: 2375 MB. Coordinator: machine. */
-- RESULT MISMATCH FROM ORIGINAL
-- FIXED. TAKE ACTUAL RESULT AS EXPECTED
with ss as
 (select s_store_sk,
 sum(ss_ext_sales_price) as sales,
 sum(ss_net_profit) as profit
 from store_sales,
  date_dim,
  store
 where ss_sold_date_sk = d_date_sk
   and cast(d_date as timestamp) between cast('2000-08-23' as timestamp)
  and (cast('2000-08-23' as timestamp) + interval 30 days)
   and ss_store_sk = s_store_sk
 group by s_store_sk)
 ,
 sr as
 (select s_store_sk,
 sum(sr_return_amt) as return_amt,
 sum(sr_net_loss) as profit_loss
 from store_returns,
  date_dim,
  store
 where sr_returned_date_sk = d_date_sk
   and cast(d_date as timestamp) between cast('2000-08-23' as timestamp)
  and (cast('2000-08-23' as timestamp) + interval 30 days)
   and sr_store_sk = s_store_sk
 group by s_store_sk),
 cs as
 (select cs_call_center_sk,
sum(cs_ext_sales_price) as sales,
sum(cs_net_profit) as profit
 from catalog_sales,
  date_dim
 where cs_sold_date_sk = d_date_sk
   and cast(d_date as timestamp) between cast('2000-08-23' as timestamp)
  and (cast('2000-08-23' as timestamp) + interval 30 days)
 group by cs_call_center_sk
 ),
 cr as
 (select cr_call_center_sk,
 sum(cr_return_amount) as return_amt,
 sum(cr_net_loss) as profit_loss
 from catalog_returns,
  date_dim
 where cr_returned_date_sk = d_date_sk
   and cast(d_date as timestamp) between cast('2000-08-23' as timestamp)
  and (cast('2000-08-23' as timestamp) + interval 30 days)
 group by cr_call_center_sk
 ),
 ws as
 ( select wp_web_page_sk,
sum(ws_ext_sales_price) as sales,
sum(ws_net_profit) as profit
 from web_sales,
  date_dim,
  web_page
 where ws_sold_date_sk = d_date_sk
   and cast(d_date as timestamp) between cast('2000-08-23' as timestamp)
  and (cast('2000-08-23' as timestamp) + interval 30 days)
   and ws_web_page_sk = wp_web_page_sk
 group by wp_web_page_sk),
 wr as
 (select wp_web_page_sk,
sum(wr_return_amt) as return_amt,
sum(wr_net_loss) as profit_loss
 from web_returns,
  date_dim,
  web_page
 where wr_returned_date_sk = d_date_sk
   and cast(d_date as timestamp) between cast('2000-08-23' as timestamp)
  and (cast('2000-08-23' as timestamp) + interval 30 days)
   and wr_web_page_sk = wp_web_page_sk
 group by wp_web_page_sk)
 ,
 results as
 (select channel
, id
, sum(sales) as sales
, sum(return_amt) as return_amt
, sum(profit) as profit
 from
 (select 'store channel' as channel
, ss.s_store_sk as id
, sales
, coalesce(return_amt, 0) as return_amt
, (profit - coalesce(profit_loss,0)) as profit
 from   ss left join sr
on  ss.s_store_sk = sr.s_store_sk
 union all
 select 'catalog channel' as channel
, cs_call_center_sk as id
, sales
, return_amt
, (profit - profit_loss) as profit
 from  cs
   , cr
 union all
 select 'web channel' as channel
, ws.wp_web_page_sk as id
, sales
, coalesce(return_amt, 0) return_amt
, (profit - coalesce(profit_loss,0)) as profit
 from   ws left join wr
on  ws.wp_web_page_sk = wr.wp_web_page_sk
 ) x
 group by channel, id )

  select  *
 from (
 select channel, id, sales, return_amt, profit from  results
 union
 select channel, NULL AS id, sum(sales) as sales, sum(return_amt) as 
return_amt, sum(profit) as profit from  results group by channel
 union
 select NULL AS channel, NULL AS id, sum(sales) as sales, sum(return_amt) as 
return_amt, sum(profit) as profit from  results
) foo
order by channel, id
 limit 100;

Coordinator: machine
Query Options (set by configuration): ABORT_ON_ERROR=1,MEM_LIMIT=2490368000
Query Options (set by configuration and planner): 
ABORT_ON_ERROR=1,MEM_LIMIT=2490368000,MT_DOP=0
Plan: 
 {noformat}

According to the code, the reservation for the scan node is supposed to be 
computed correctly in the FE, but this doesn't appear to be the case:
{noformat}
  // Check if reservation was enough to allocate at least one 

[jira] [Created] (IMPALA-7048) Failed test: query_test.test_parquet_page_index.TestHdfsParquetTableIndexWriter.test_write_index_many_columns_tables

2018-05-18 Thread Dimitris Tsirogiannis (JIRA)
Dimitris Tsirogiannis created IMPALA-7048:
-

 Summary: Failed test: 
query_test.test_parquet_page_index.TestHdfsParquetTableIndexWriter.test_write_index_many_columns_tables
 Key: IMPALA-7048
 URL: https://issues.apache.org/jira/browse/IMPALA-7048
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Dimitris Tsirogiannis
Assignee: Zoltán Borók-Nagy


The following test fails when the filesystem is LOCAL:
{code:java}
query_test.test_parquet_page_index.TestHdfsParquetTableIndexWriter.test_write_index_many_columns_tables[exec_option:
 {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
'exec_single_node_rows_threshold': 0} | table_format: parquet/none] (from 
pytest) {code}
Zoltan, assigning to you since this looks suspiciously related to the fix for 
IMPALA-5842. 






[jira] [Updated] (IMPALA-6909) Impala 2.13 Doc: SET ROW FORMAT in ALTER TABLE

2018-05-18 Thread Alex Rodoni (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-6909:

Affects Version/s: (was: Impala 2.13.0)
   Impala 2.12.0

> Impala 2.13 Doc: SET ROW FORMAT in ALTER TABLE
> --
>
> Key: IMPALA-6909
> URL: https://issues.apache.org/jira/browse/IMPALA-6909
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Affects Versions: Impala 2.12.0
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>
> With IMPALA-4323 the ALTER TABLE statement has a new option to set "ROW 
> FORMAT".  The format is the same as the ROW FORMAT parameters that currently 
> exist with the CREATE TABLE statement.  The documents need to be updated for 
> ALTER TABLE to reflect the new functionality.






[jira] [Updated] (IMPALA-6909) Impala 2.13 Doc: SET ROW FORMAT in ALTER TABLE

2018-05-18 Thread Alex Rodoni (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-6909:

Description: With IMPALA-4323 the ALTER TABLE statement has a new option to 
set "ROW FORMAT".  The format is the same as the ROW FORMAT parameters that 
currently exist with the CREATE TABLE statement.  The documents need to be 
updated for ALTER TABLE to reflect the new functionality.

> Impala 2.13 Doc: SET ROW FORMAT in ALTER TABLE
> --
>
> Key: IMPALA-6909
> URL: https://issues.apache.org/jira/browse/IMPALA-6909
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Affects Versions: Impala 2.13.0
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc
>
> With IMPALA-4323 the ALTER TABLE statement has a new option to set "ROW 
> FORMAT".  The format is the same as the ROW FORMAT parameters that currently 
> exist with the CREATE TABLE statement.  The documents need to be updated for 
> ALTER TABLE to reflect the new functionality.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6994) Avoid reloading a table's HMS data for file-only operations

2018-05-18 Thread Pranay Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481014#comment-16481014
 ] 

Pranay Singh commented on IMPALA-6994:
--

TUpdateCatalogResponse updateCatalog(TUpdateCatalogRequest update) currently 
calls loadTableMetadata() with the option set to false. loadTableMetadata() 
calls ((HdfsTable) tbl).load, which in turn calls updatePartitionsFromHms(), 
which calls loadMetadataAndDiskIds() to load file metadata via HDFS and also 
makes a call to HMS to reload the partition.

When a partition is created, recreated, or dropped, we need to call HMS, so I 
think this is the place where we can skip the HMS call, but only if an 
existing partition is being updated.

-Pranay
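
The rule proposed in this comment can be sketched as a small decision function. This is a hypothetical illustration in Python rather than Impala's Java code, and `PartitionChange` and `needs_hms_call` are invented names:

```python
from enum import Enum


class PartitionChange(Enum):
    """Kinds of partition change; an illustrative classification only."""
    CREATED = "created"
    RECREATED = "recreated"
    DROPPED = "dropped"
    UPDATED_EXISTING = "updated_existing"  # e.g. files added to a partition


def needs_hms_call(change):
    """Return True if HMS must be consulted for this kind of change.

    Creating, recreating, or dropping a partition still goes to HMS; only
    an update to an existing partition skips the HMS call.
    """
    return change is not PartitionChange.UPDATED_EXISTING
```

Under this sketch, only the update of an existing partition reloads file metadata without the HMS round trip.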



> Avoid reloading a table's HMS data for file-only operations
> ---
>
> Key: IMPALA-6994
> URL: https://issues.apache.org/jira/browse/IMPALA-6994
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Assignee: Pranay Singh
>Priority: Major
>
> Reloading file metadata for HDFS tables (e.g. as a final step in an 'insert') 
> is done via
> https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L628
> , which calls
> https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1243
> HdfsTable.load has no option to only load file metadata. HMS metadata will 
> also be reloaded every time, which is an unnecessary overhead (and potential 
> point of failure) when adding files to existing locations.






[jira] [Commented] (IMPALA-6994) Avoid reloading a table's HMS data for file-only operations

2018-05-18 Thread bharath v (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480978#comment-16480978
 ] 

bharath v commented on IMPALA-6994:
---

Which exact loadTableMetadata() call (line number/stack) are we trying to 
optimize here? Isn't the fix as simple as setting reloadTableSchema=false, or 
am I missing something here?

> Avoid reloading a table's HMS data for file-only operations
> ---
>
> Key: IMPALA-6994
> URL: https://issues.apache.org/jira/browse/IMPALA-6994
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Assignee: Pranay Singh
>Priority: Major
>
> Reloading file metadata for HDFS tables (e.g. as a final step in an 'insert') 
> is done via
> https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L628
> , which calls
> https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1243
> HdfsTable.load has no option to only load file metadata. HMS metadata will 
> also be reloaded every time, which is an unnecessary overhead (and potential 
> point of failure) when adding files to existing locations.


