[jira] [Resolved] (IMPALA-7166) ExecSummary should be a first class object

2018-10-29 Thread Yongjun Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang resolved IMPALA-7166.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

Many thanks to [~tarmstrong] for review and commit. 

> ExecSummary should be a first class object
> --
>
> Key: IMPALA-7166
> URL: https://issues.apache.org/jira/browse/IMPALA-7166
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: sandeep akinapelli
>Assignee: Yongjun Zhang
>Priority: Major
>  Labels: resource-management, usability
> Fix For: Impala 3.1.0
>
>
> Impala RuntimeProfile currently contains "ExecSummary" as a string. We should 
> make it a first-class Thrift object so that tools can extract these fields 
> (estimated rows, etc.).
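
Below is a minimal, hypothetical sketch of why a structured summary helps tooling; the row class and field names are invented for illustration and are not the actual Thrift definition.

{code:java}
// Hypothetical row type standing in for a structured ExecSummary entry; the real
// Thrift-generated classes will differ.
class SummaryRow {
  final String operator;
  final long estimatedRows;
  final long actualRows;

  SummaryRow(String operator, long estimatedRows, long actualRows) {
    this.operator = operator;
    this.estimatedRows = estimatedRows;
    this.actualRows = actualRows;
  }

  // With typed fields, a tool can compare estimates to actuals directly instead of
  // regex-parsing the pretty-printed ExecSummary table embedded in the profile string.
  boolean badlyUnderestimated() {
    return actualRows > 10 * estimatedRows;
  }
}
{code}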



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7190) Remove unsupported format write support

2018-10-29 Thread Alex Rodoni (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667947#comment-16667947
 ] 

Alex Rodoni commented on IMPALA-7190:
-

[~bikramjeet.vig] Most of the removed query options are also being removed from 
the documentation: https://issues.apache.org/jira/browse/IMPALA-6463

Will 3.1 be a compatibility-breaking release in which we can safely remove the 
options? If so, I will remove the above two options from the docs.

> Remove unsupported format write support
> ---
>
> Key: IMPALA-7190
> URL: https://issues.apache.org/jira/browse/IMPALA-7190
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Bikramjeet Vig
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> Let's remove the formats gated by ALLOW_UNSUPPORTED_FORMATS since progress 
> stalled a long time ago. It sounds like there's a consensus on the mailing 
> list to remove the code:
> [https://lists.apache.org/thread.html/749bef4914350ae0756bc88961db2dd39901a649a9cef6949eda5870@%3Cdev.impala.apache.org%3E]






[jira] [Work started] (IMPALA-7244) Impala 3.1 Doc: Remove unsupported format write support

2018-10-29 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7244 started by Alex Rodoni.
---
> Impala 3.1 Doc: Remove unsupported format write support
> ---
>
> Key: IMPALA-7244
> URL: https://issues.apache.org/jira/browse/IMPALA-7244
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Affects Versions: Impala 3.1.0
>Reporter: Bikramjeet Vig
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc
>
> The parent task removes write support for unsupported formats: *Sequence*, 
> *Avro*, and *compressed text*. Also, the related query options 
> *ALLOW_UNSUPPORTED_FORMATS* and *SEQ_COMPRESSION_MODE* have been migrated to 
> the 'REMOVED' query options type and are therefore no longer functional.






[jira] [Updated] (IMPALA-7687) Impala 3.1 Doc: Add support for multiple distinct operators in the same query block

2018-10-29 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-7687:

Description: https://gerrit.cloudera.org/#/c/11823/

> Impala 3.1 Doc: Add support for multiple distinct operators in the same query 
> block
> ---
>
> Key: IMPALA-7687
> URL: https://issues.apache.org/jira/browse/IMPALA-7687
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Affects Versions: Impala 3.1.0
>Reporter: Adam Holley
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc
>
> https://gerrit.cloudera.org/#/c/11823/






[jira] [Commented] (IMPALA-7244) Impala 3.1 Doc: Remove unsupported format write support

2018-10-29 Thread Bikramjeet Vig (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667925#comment-16667925
 ] 

Bikramjeet Vig commented on IMPALA-7244:


[~arodoni_cloudera] It looks like previously removed query options have been 
handled in several ways: some were removed from the docs outright, while others 
first got a note or a "no longer supported" message in the text and were removed 
later. Adding a note and mentioning it in the text seems good enough for now; we 
can remove them from the docs in subsequent releases. See 
[V_CPU_CORES|https://impala.apache.org/docs/build/html/topics/impala_v_cpu_cores.html]

> Impala 3.1 Doc: Remove unsupported format write support
> ---
>
> Key: IMPALA-7244
> URL: https://issues.apache.org/jira/browse/IMPALA-7244
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Affects Versions: Impala 3.1.0
>Reporter: Bikramjeet Vig
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc
>
> The parent task removes write support for unsupported formats: *Sequence*, 
> *Avro*, and *compressed text*. Also, the related query options 
> *ALLOW_UNSUPPORTED_FORMATS* and *SEQ_COMPRESSION_MODE* have been migrated to 
> the 'REMOVED' query options type and are therefore no longer functional.






[jira] [Work started] (IMPALA-7687) Impala 3.1 Doc: Add support for multiple distinct operators in the same query block

2018-10-29 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7687 started by Alex Rodoni.
---
> Impala 3.1 Doc: Add support for multiple distinct operators in the same query 
> block
> ---
>
> Key: IMPALA-7687
> URL: https://issues.apache.org/jira/browse/IMPALA-7687
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Affects Versions: Impala 3.1.0
>Reporter: Adam Holley
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc
>







[jira] [Updated] (IMPALA-7102) Add a query option to enable/disable running queries on erasure coded files

2018-10-29 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-7102:

Labels:   (was: future_release_doc)

> Add a query option to enable/disable running queries on erasure coded files
> 
>
> Key: IMPALA-7102
> URL: https://issues.apache.org/jira/browse/IMPALA-7102
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Taras Bobrovytsky
>Assignee: Tianyi Wang
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> If the erasure coding query option is disabled, Impala should return an error 
> when a query requires scanning an erasure coded file.
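
A minimal sketch of the intended behavior, assuming a boolean query option and a per-file erasure-coding flag; all names below are placeholders rather than Impala's actual APIs.

{code:java}
// Placeholder names only; shown to illustrate the intended check.
class ErasureCodingCheck {
  static void checkScanAllowed(boolean erasureCodedScansEnabled,
      boolean fileIsErasureCoded, String path) {
    if (!erasureCodedScansEnabled && fileIsErasureCoded) {
      // The query should fail with an error rather than silently scanning the file.
      throw new IllegalStateException(
          "Scanning erasure-coded file " + path + " is disabled by the query option.");
    }
  }
}
{code}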






[jira] [Assigned] (IMPALA-7244) Impala 3.1 Doc: Remove unsupported format write support

2018-10-29 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni reassigned IMPALA-7244:
---

Assignee: Alex Rodoni  (was: Bikramjeet Vig)

> Impala 3.1 Doc: Remove unsupported format write support
> ---
>
> Key: IMPALA-7244
> URL: https://issues.apache.org/jira/browse/IMPALA-7244
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Affects Versions: Impala 3.1.0
>Reporter: Bikramjeet Vig
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc
>
> The parent task removes write support for unsupported formats: *Sequence*, 
> *Avro*, and *compressed text*. Also, the related query options 
> *ALLOW_UNSUPPORTED_FORMATS* and *SEQ_COMPRESSION_MODE* have been migrated to 
> the 'REMOVED' query options type and are therefore no longer functional.






[jira] [Resolved] (IMPALA-7266) test_insert failure: Unable to drop partition on s3

2018-10-29 Thread Bikramjeet Vig (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikramjeet Vig resolved IMPALA-7266.

Resolution: Information Provided

The symptom looks similar to IMPALA-6094, which, as [~sailesh] described, is due 
to the eventually consistent nature of S3. As in IMPALA-6094, we write and drop 
partitions consecutively here, so this seems like the expected behavior.

> test_insert failure: Unable to drop partition on s3
> ---
>
> Key: IMPALA-7266
> URL: https://issues.apache.org/jira/browse/IMPALA-7266
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Bikramjeet Vig
>Assignee: Bikramjeet Vig
>Priority: Major
>  Labels: broken-build
>
> {noformat}
> 06:01:28 === FAILURES 
> ===
> 06:01:28  TestInsertQueries.test_insert[exec_option: {'sync_ddl': 0, 
> 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: text/none] 
> 06:01:28 query_test/test_insert.py:122: in test_insert
> 06:01:28 multiple_impalad=vector.get_value('exec_option')['sync_ddl'] == 
> 1)
> 06:01:28 .../Impala/tests/common/impala_test_suite.py:366: in run_test_case
> 06:01:28 self.execute_test_case_setup(test_section['SETUP'], 
> table_format_info)
> 06:01:28 .../Impala/tests/common/impala_test_suite.py:489: in 
> execute_test_case_setup
> 06:01:28 self.__drop_partitions(db_name, table_name)
> 06:01:28 .../Impala/tests/common/impala_test_suite.py:614: in 
> __drop_partitions
> 06:01:28 partition, True), 'Could not drop partition: %s' % partition
> 06:01:28 .../Impala/shell/gen-py/hive_metastore/ThriftHiveMetastore.py:2862: 
> in drop_partition_by_name
> 06:01:28 return self.recv_drop_partition_by_name()
> 06:01:28 .../Impala/shell/gen-py/hive_metastore/ThriftHiveMetastore.py:2891: 
> in recv_drop_partition_by_name
> 06:01:28 raise result.o2
> 06:01:28 E   MetaException: MetaException(_message='No such file or 
> directory: 
> s3a://impala-cdh5-s3-test/test-warehouse/functional.db/alltypesinsert/year=2009/month=4')
> {noformat}






[jira] [Assigned] (IMPALA-7727) failed compute stats child query status no longer propagates to parent query

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-7727:
-

Assignee: bharath v  (was: Tim Armstrong)

> failed compute stats child query status no longer propagates to parent query
> 
>
> Key: IMPALA-7727
> URL: https://issues.apache.org/jira/browse/IMPALA-7727
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Michael Brown
>Assignee: bharath v
>Priority: Blocker
>  Labels: regression, stress
> Attachments: 2.12-child-profile.txt, 2.12-compute-stats-profile.txt, 
> 3.1-child-profile.txt, 3.1-compute-stats-profile.txt
>
>
> [~bharathv] since you have been dealing with stats, please take a look. 
> Otherwise feel free to reassign. This bug prevents the stress test from 
> running with compute stats statements. It triggers in non-stressful 
> conditions, too.
> {noformat}
> $ impala-shell.sh -d tpch_parquet
> [localhost:21000] tpch_parquet> set mem_limit=24m;
> MEM_LIMIT set to 24m
> [localhost:21000] tpch_parquet> compute stats customer;
> Query: compute stats customer
> WARNINGS: Cancelled
> [localhost:21000] tpch_parquet>
> {noformat}
> The problem is that the child query didn't have enough memory to run, but 
> this error didn't propagate up.
> {noformat}
> Query (id=384d37fb2826a962:f4b10357):
>   DEBUG MODE WARNING: Query profile created while running a DEBUG build of 
> Impala. Use RELEASE builds to measure query performance.
>   Summary:
> Session ID: d343e1026d497bb0:7e87b342c73c108d
> Session Type: BEESWAX
> Start Time: 2018-10-18 15:16:34.036363000
> End Time: 2018-10-18 15:16:34.177711000
> Query Type: QUERY
> Query State: EXCEPTION
> Query Status: Rejected query from pool default-pool: minimum memory 
> reservation is greater than memory available to the query for buffer 
> reservations. Memory reservation needed given the current plan: 128.00 KB. 
> Adjust either the mem_limit or the pool config (max-query-mem-limit, 
> min-query-mem-limit) for the query to allow the query memory limit to be at 
> least 32.12 MB. Note that changing the mem_limit may also change the plan. 
> See the query profile for more information about the per-node memory 
> requirements.
> Impala Version: impalad version 3.1.0-SNAPSHOT DEBUG (build 
> 9f5c5e6df03824cba292fe5a619153462c11669c)
> User: mikeb
> Connected User: mikeb
> Delegated User: 
> Network Address: :::127.0.0.1:46458
> Default Db: tpch_parquet
> Sql Statement: SELECT COUNT(*) FROM customer
> Coordinator: mikeb-ub162:22000
> Query Options (set by configuration): MEM_LIMIT=25165824,MT_DOP=4
> Query Options (set by configuration and planner): 
> MEM_LIMIT=25165824,NUM_SCANNER_THREADS=1,MT_DOP=4
> Plan: 
> 
> Max Per-Host Resource Reservation: Memory=512.00KB Threads=5
> Per-Host Resource Estimates: Memory=146MB
> F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> |  Per-Host Resources: mem-estimate=10.00MB mem-reservation=0B 
> thread-reservation=1
> PLAN-ROOT SINK
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |
> 03:AGGREGATE [FINALIZE]
> |  output: count:merge(*)
> |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB 
> thread-reservation=0
> |  tuple-ids=1 row-size=8B cardinality=1
> |  in pipelines: 03(GETNEXT), 01(OPEN)
> |
> 02:EXCHANGE [UNPARTITIONED]
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |  tuple-ids=1 row-size=8B cardinality=1
> |  in pipelines: 01(GETNEXT)
> |
> F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=4
> Per-Host Resources: mem-estimate=136.00MB mem-reservation=512.00KB 
> thread-reservation=4
> 01:AGGREGATE
> |  output: sum_init_zero(tpch_parquet.customer.parquet-stats: num_rows)
> |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB 
> thread-reservation=0
> |  tuple-ids=1 row-size=8B cardinality=1
> |  in pipelines: 01(GETNEXT), 00(OPEN)
> |
> 00:SCAN HDFS [tpch_parquet.customer, RANDOM]
>partitions=1/1 files=1 size=12.34MB
>stored statistics:
>  table: rows=15 size=12.34MB
>  columns: all
>extrapolated-rows=disabled max-scan-range-rows=15
>mem-estimate=24.00MB mem-reservation=128.00KB thread-reservation=0
>tuple-ids=0 row-size=8B cardinality=15
>in pipelines: 00(GETNEXT)
> 
> Estimated Per-Host Mem: 153092096
> Per Host Min Memory Reservation: mikeb-ub162:22000(0) 
> mikeb-ub162:22001(128.00 KB)
> Request Pool: default-pool
> Admission result: Rejected
> Query Compilation: 126.903ms
>- Metadata of all 1 tables cached: 5.484ms (5.484ms)
>- Analysis finished: 16.104ms (10.619ms)
>

[jira] [Resolved] (IMPALA-7749) Merge aggregation node memory estimate is incorrectly influenced by limit

2018-10-29 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-7749.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Merge aggregation node memory estimate is incorrectly influenced by limit
> -
>
> Key: IMPALA-7749
> URL: https://issues.apache.org/jira/browse/IMPALA-7749
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 2.11.0, Impala 3.0, Impala 2.12.0, Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Critical
> Fix For: Impala 3.1.0
>
>
> In the below query the estimate for node ID 3 is too low. If you remove the 
> limit it is correct. 
> {noformat}
> [localhost:21000] default> set explain_level=2; explain select l_orderkey, 
> l_partkey, l_linenumber, count(*) from tpch.lineitem group by 1, 2, 3 limit 5;
> EXPLAIN_LEVEL set to 2
> Query: explain select l_orderkey, l_partkey, l_linenumber, count(*) from 
> tpch.lineitem group by 1, 2, 3 limit 5
> +---+
> | Explain String  
>   |
> +---+
> | Max Per-Host Resource Reservation: Memory=43.94MB Threads=4 
>   |
> | Per-Host Resource Estimates: Memory=450MB   
>   |
> | 
>   |
> | F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1   
>   |
> | |  Per-Host Resources: mem-estimate=0B mem-reservation=0B 
> thread-reservation=1|
> | PLAN-ROOT SINK  
>   |
> | |  mem-estimate=0B mem-reservation=0B thread-reservation=0  
>   |
> | |   
>   |
> | 04:EXCHANGE [UNPARTITIONED] 
>   |
> | |  limit: 5 
>   |
> | |  mem-estimate=0B mem-reservation=0B thread-reservation=0  
>   |
> | |  tuple-ids=1 row-size=28B cardinality=5   
>   |
> | |  in pipelines: 03(GETNEXT)
>   |
> | |   
>   |
> | F01:PLAN FRAGMENT [HASH(l_orderkey,l_partkey,l_linenumber)] hosts=3 
> instances=3   |
> | Per-Host Resources: mem-estimate=10.00MB mem-reservation=1.94MB 
> thread-reservation=1  |
> | 03:AGGREGATE [FINALIZE] 
>   |
> | |  output: count:merge(*)   
>   |
> | |  group by: l_orderkey, l_partkey, l_linenumber
>   |
> | |  limit: 5 
>   |
> | |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB 
> thread-reservation=0  |
> | |  tuple-ids=1 row-size=28B cardinality=5   
>   |
> | |  in pipelines: 03(GETNEXT), 00(OPEN)  
>   |
> | |   
>   |
> | 02:EXCHANGE [HASH(l_orderkey,l_partkey,l_linenumber)]   
>   |
> | |  mem-estimate=0B mem-reservation=0B thread-reservation=0  
>   |
> | |  tuple-ids=1 row-size=28B cardinality=6001215 
>   |
> | |  in pipelines: 00(GETNEXT)
>   |
> | |   
>   |
> | F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3  
>   |
> | Per-Host Resources: mem-estimate=440.27MB mem-reservation=42.00MB 
> thread-reservation=2|
> | 01:AGGREGATE [STREAMING]
>   |
> | |  output: count(*) 
>   |
> | |  group by: l_orderkey, l_partkey, l_linenumber
>   |
> | |  mem-estimate=176.27MB mem-reservation=34.00MB spill-buffer=2.00MB 
> 

[jira] [Commented] (IMPALA-7742) User names in Sentry are now case sensitive

2018-10-29 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667752#comment-16667752
 ] 

ASF subversion and git services commented on IMPALA-7742:
-

Commit bf7bb58d0bbce971cda02dc4e6cfe9d359e43922 in impala's branch 
refs/heads/master from [~fredyw]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=bf7bb58 ]

IMPALA-7742: Store the Sentry user names in a case sensitive way

SENTRY-2432 changes the way it stores user names by making them case
sensitive. This patch updates Impala to match the behavior in Sentry
to make the catalog store the user names in a case sensitive way.

Testing:
- Ran all FE tests
- Ran all authorization E2E tests
- Added a new E2E test

Change-Id: I04bec045e3f70fc4f41b16b9b5c55eeb60bd63b8
Reviewed-on: http://gerrit.cloudera.org:8080/11762
Reviewed-by: Impala Public Jenkins 
Reviewed-by: Fredy Wijaya 
Tested-by: Impala Public Jenkins 
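
A tiny illustration of the behavioral change described in the commit message, under the assumption that the catalog previously normalized user names to lower case; the helper below is hypothetical, not the actual patch.

{code:java}
// Hypothetical helper, not Impala's code.
class SentryUserNames {
  static String catalogKeyFor(String sentryUserName) {
    // Before: names were lower-cased, which no longer matches what Sentry stores
    // after SENTRY-2432.
    //   return sentryUserName.toLowerCase();
    // After: preserve the case exactly as Sentry reports it.
    return sentryUserName;
  }
}
{code}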


> User names in Sentry are now case sensitive
> ---
>
> Key: IMPALA-7742
> URL: https://issues.apache.org/jira/browse/IMPALA-7742
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.1.0
>Reporter: Fredy Wijaya
>Assignee: Fredy Wijaya
>Priority: Critical
> Fix For: Impala 3.1.0
>
>
> Sentry no longer stores user names in lower case 
> (https://issues.apache.org/jira/browse/SENTRY-2432).






[jira] [Commented] (IMPALA-7749) Merge aggregation node memory estimate is incorrectly influenced by limit

2018-10-29 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667753#comment-16667753
 ] 

ASF subversion and git services commented on IMPALA-7749:
-

Commit 44e69e8182954db90b22809b5440bba59ed8d0ae in impala's branch 
refs/heads/master from poojanilangekar
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=44e69e8 ]

IMPALA-7749: Compute AggregationNode's memory estimate using input cardinality

Prior to this change, the AggregationNode's perInstanceCardinality
was influenced by the node's selectivity and limit. This was
incorrect because the hash table is constructed over the entire
input stream before any row batches are produced. This change
ensures that the input cardinality is used to determine the
perInstanceCardinality.

Testing:
Added a planner test which ensures that an AggregationNode with a
limit estimates memory based on the input cardinality.
Ran front-end and end-to-end tests affected by this change.

Change-Id: Ifd95d2ad5b677fca459c9c32b98f6176842161fc
Reviewed-on: http://gerrit.cloudera.org:8080/11806
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
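
A schematic sketch of the estimation change the commit describes, with invented names; the real planner computes the estimate in far more detail.

{code:java}
// Schematic only; names are invented. The point from the commit message: a merge
// aggregation builds its hash table over the entire input before returning any
// rows, so the memory estimate must follow the input cardinality, not the limit.
class AggMemoryEstimateSketch {
  static long perInstanceBytes(long inputCardinality, long limit, long bytesPerRow) {
    // Incorrect (roughly the pre-fix behavior): a small LIMIT shrank the estimate.
    //   long perInstanceCardinality = Math.min(inputCardinality, limit);
    long perInstanceCardinality = inputCardinality;
    return perInstanceCardinality * bytesPerRow;
  }
}
{code}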


> Merge aggregation node memory estimate is incorrectly influenced by limit
> -
>
> Key: IMPALA-7749
> URL: https://issues.apache.org/jira/browse/IMPALA-7749
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 2.11.0, Impala 3.0, Impala 2.12.0, Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Critical
>
> In the below query the estimate for node ID 3 is too low. If you remove the 
> limit it is correct. 
> {noformat}
> [localhost:21000] default> set explain_level=2; explain select l_orderkey, 
> l_partkey, l_linenumber, count(*) from tpch.lineitem group by 1, 2, 3 limit 5;
> EXPLAIN_LEVEL set to 2
> Query: explain select l_orderkey, l_partkey, l_linenumber, count(*) from 
> tpch.lineitem group by 1, 2, 3 limit 5
> +---+
> | Explain String  
>   |
> +---+
> | Max Per-Host Resource Reservation: Memory=43.94MB Threads=4 
>   |
> | Per-Host Resource Estimates: Memory=450MB   
>   |
> | 
>   |
> | F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1   
>   |
> | |  Per-Host Resources: mem-estimate=0B mem-reservation=0B 
> thread-reservation=1|
> | PLAN-ROOT SINK  
>   |
> | |  mem-estimate=0B mem-reservation=0B thread-reservation=0  
>   |
> | |   
>   |
> | 04:EXCHANGE [UNPARTITIONED] 
>   |
> | |  limit: 5 
>   |
> | |  mem-estimate=0B mem-reservation=0B thread-reservation=0  
>   |
> | |  tuple-ids=1 row-size=28B cardinality=5   
>   |
> | |  in pipelines: 03(GETNEXT)
>   |
> | |   
>   |
> | F01:PLAN FRAGMENT [HASH(l_orderkey,l_partkey,l_linenumber)] hosts=3 
> instances=3   |
> | Per-Host Resources: mem-estimate=10.00MB mem-reservation=1.94MB 
> thread-reservation=1  |
> | 03:AGGREGATE [FINALIZE] 
>   |
> | |  output: count:merge(*)   
>   |
> | |  group by: l_orderkey, l_partkey, l_linenumber
>   |
> | |  limit: 5 
>   |
> | |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB 
> thread-reservation=0  |
> | |  tuple-ids=1 row-size=28B cardinality=5   
>   |
> | |  in pipelines: 03(GETNEXT), 00(OPEN)  
>   |
> | |   
>   |
> | 02:EXCHANGE [HASH(l_orderkey,l_partkey,l_linenumber)]   
>   |
> | |  mem-estimate=0B 

[jira] [Resolved] (IMPALA-7662) test_parquet reads bad_magic_number.parquet without an error

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7662.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> test_parquet reads bad_magic_number.parquet without an error
> 
>
> Key: IMPALA-7662
> URL: https://issues.apache.org/jira/browse/IMPALA-7662
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
> Environment: Impala ddef2cb9b14e7f8cf9a68a2a382e10a8e0f91c3d 
> exhaustive debug build
>Reporter: Tianyi Wang
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: correctness
> Fix For: Impala 3.1.0
>
>
> {noformat}
> 09:51:41 === FAILURES 
> ===
> 09:51:41  TestParquet.test_parquet[exec_option: {'batch_size': 0, 
> 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': 
> False, 'abort_on_error': 1, 'debug_action': 
> 'HDFS_SCANNER_THREAD_CHECK_SOFT_MEM_LIMIT:FAIL@0.5', 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> 09:51:41 [gw5] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/bin/../infra/python/env/bin/python
> 09:51:41 query_test/test_scanners.py:300: in test_parquet
> 09:51:41 self.run_test_case('QueryTest/parquet', vector)
> 09:51:41 common/impala_test_suite.py:423: in run_test_case
> 09:51:41 assert False, "Expected exception: %s" % expected_str
> 09:51:41 E   AssertionError: Expected exception: File 
> 'hdfs://localhost:20500/test-warehouse/bad_magic_number_parquet/bad_magic_number.parquet'
>  has an invalid version number: 
> {noformat}






[jira] [Assigned] (IMPALA-7752) Stmt.reset() doesn't (and can't)

2018-10-29 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned IMPALA-7752:
---

Assignee: (was: Paul Rogers)
 Summary: Stmt.reset() doesn't (and can't)  (was: Invalid test logic in 
ExprRewriterTest)

> Stmt.reset() doesn't (and can't)
> 
>
> Key: IMPALA-7752
> URL: https://issues.apache.org/jira/browse/IMPALA-7752
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Priority: Minor
>
> The test {{ExprRewriteTest}} has the following logic:
> {code:java}
>   public void RewritesOk(String stmt, int expectedNumChanges,
>   int expectedNumExprTrees) throws ImpalaException {
> // Analyze without rewrites since that's what we want to test here.
> StatementBase parsedStmt = (StatementBase) ParsesOk(stmt);
> ...
> parsedStmt.rewriteExprs(exprToTrue_);
> ...
> // Make sure the stmt can be successfully re-analyzed.
> parsedStmt.reset();
> AnalyzesOkNoRewrite(parsedStmt);
>   }
> {code}
> Basically, this replaces all expressions with a Boolean constant, then counts 
> the number of replacements. A fine test. Then, the {{reset()}} call is 
> supposed to put things back the way they were.
> The problem is, the rewrite rule replaces the one and only copy of the 
> {{SELECT}} list expressions. The second time around, we get a failure because 
> the {{ORDER BY}} clause (which was kept as an original copy) refers to the 
> now-gone {{SELECT}} clause.
> This error was not previously seen because a prior bug masked it.
> This is an odd bug as {{reset()}} is called only from this one place.
> The premise of the test itself is invalid: we want to know that, after we rewrite 
> the query from
> {code:sql}
> select a.int_col a, 10 b, 20.2 c, ...
> order by a.int_col, 4 limit 10
> {code}
> To
> {code:sql}
> select FALSE a, FALSE b, FALSE c, ...
> order by a.int_col, 4 limit 10
> {code}
> We assert that the query should again analyze correctly. This is an 
> unrealistic expectation. Once the above bug is fixed, we verify that the new 
> query is actually invalid, which, in fact, it is.
> Two fixes are possible:
> # Create copies of all lists that are rewritten ({{SELECT}}, {{HAVING}}, etc.)
> # Remove the {{reset()}} test and (since this is the only use) the 
> {{reset()}} code, since it cannot actually do what it is advertised to do.
> Since {{reset()}} is never used except in tests, and the premise is invalid, 
> this ticket proposes to remove the {{reset()}} logic and remove the part of 
> the test code that validates the reset.






[jira] [Commented] (IMPALA-7780) Rebase PlannerTest expected output for estimates, errors

2018-10-29 Thread Pooja Nilangekar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667720#comment-16667720
 ] 

Pooja Nilangekar commented on IMPALA-7780:
--

I have run into this issue of differing estimates before. After asking around and 
poking through the data load, I found that the file sizes, and hence the estimates, 
may vary between different instances of data load: loading the exact same data can 
take slightly more or less disk space on the same machine or on another machine 
with identical specs. So I am not sure a rebase would actually solve this issue.

> Rebase PlannerTest expected output for estimates, errors
> 
>
> Key: IMPALA-7780
> URL: https://issues.apache.org/jira/browse/IMPALA-7780
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Priority: Trivial
>
> The front-end includes the {{PlannerTest}} test which works by running a 
> query, writing the plan to a file, comparing selected parts of the file to 
> expected results, and flagging if the results differ.
> A plan includes some things we test (operators) and some we do not (text of 
> error messages, value of memory estimates). Over time the expected and actual 
> files have drifted apart. Example:
> {noformat}
> Expected:partitions=1/1 files=2 size=54.20MB
> Actual:  partitions=1/1 files=2 size=54.21MB
> {noformat}
> While the tests still pass (because we ignore the parts which have drifted), 
> it is a pain to track down issues because we must learn to manually ignore 
> "unimportant" differences.
> This ticket asks to "rebase" planner tests on the latest results, copying 
> into the expected results file the current "noise" values from the actual 
> results.






[jira] [Resolved] (IMPALA-2680) Align un-aligned memory copies

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2680.
---
Resolution: Won't Fix

I don't think this actually makes sense; we've generally seen that it's better 
to have unpadded, denser rows than to try to align things.

> Align un-aligned memory copies
> --
>
> Key: IMPALA-2680
> URL: https://issues.apache.org/jira/browse/IMPALA-2680
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.2
>Reporter: Mostafa Mokhtar
>Assignee: Youwei Wang
>Priority: Minor
>  Labels: performance
>
> Several operators do un-aligned memory copies which are significantly slower 
> than aligned copies. 
> These are the top call stacks 
> {code}
> 1 of 31: 35.2% (1.000s of 2.844s)
> libc.so.6!__memcpy_sse2_unaligned - memcpy-sse2-unaligned.S
> impalad!memcpy+0x1d - string3.h:51
> impalad!impala::BufferedTupleStream::DeepCopyInternal<(bool)0>+0x80 - 
> buffered-tuple-stream-ir.cc:83
> libc.so.6![Unknown stack frame(s)] - [Unknown]:[Unknown]
> impalad!impala::PartitionedHashJoinNode::ConstructBuildSide+0xe1 - 
> partitioned-hash-join-node.cc:580
> impalad!impala::BlockingJoinNode::BuildSideThread+0x7c - 
> blocking-join-node.cc:143
> {code}
> {code}
> Data Of Interest (CPU Metrics)
> 2 of 31: 14.3% (0.408s of 2.844s)
> libc.so.6!__memcpy_sse2_unaligned - memcpy-sse2-unaligned.S
> impalad!memcpy+0xa - string3.h:51
> impalad!impala::Tuple::DeepCopy+0x1d - tuple.cc:116
> impalad!impala::RowBatch::SerializeInternal+0x142 - row-batch.cc:285
> impalad!impala::RowBatch::Serialize+0x1c4 - row-batch.cc:195
> impalad!impala::RowBatch::Serialize+0x25 - row-batch.cc:168
> impalad!impala::DataStreamSender::SerializeBatch+0x12c - 
> data-stream-sender.cc:463
> impalad!impala::DataStreamSender::Send+0x266 - data-stream-sender.cc:411
> impalad!impala::PlanFragmentExecutor::OpenInternal+0x327 - 
> plan-fragment-executor.cc:355
> impalad!impala::PlanFragmentExecutor::Open+0x26b - 
> plan-fragment-executor.cc:320
> impalad!impala::FragmentMgr::FragmentExecState::Exec+0x19 - 
> fragment-exec-state.cc:50
> impalad!impala::FragmentMgr::FragmentExecThread+0x50 - fragment-mgr.cc:83
> impalad!boost::function0::operator()+0x18 - function_template.hpp:767
> {code}
> {code}
> Data Of Interest (CPU Metrics)
> 3 of 31: 7.7% (0.220s of 2.844s)
> libc.so.6!__memcpy_sse2_unaligned - memcpy-sse2-unaligned.S
> impalad!memcpy+0x12 - string3.h:51
> impalad!impala::RowBatch::CopyRow+0 - row-batch.h:182
> impalad!impala::ExchangeNode::GetNext+0x148 - exchange-node.cc:135
> impalad!impala::PartitionedHashJoinNode::ProcessBuildInput+0x3bb - 
> partitioned-hash-join-node.cc:628
> impalad!impala::PartitionedHashJoinNode::ConstructBuildSide+0xe1 - 
> partitioned-hash-join-node.cc:580
> impalad!impala::BlockingJoinNode::BuildSideThread+0x7c - 
> blocking-join-node.cc:143
> impalad!boost::function0::operator()+0x18 - function_template.hpp:767
> {code}
> {code}
> Data Of Interest (CPU Metrics)
> 4 of 31: 7.5% (0.212s of 2.844s)
> libc.so.6!__memcpy_sse2_unaligned - memcpy-sse2-unaligned.S
> impalad!memcpy+0x7 - string3.h:51
> impalad!apache::thrift::transport::TBufferBase::write+0x18 - 
> TBufferTransports.h:97
> impalad!apache::thrift::transport::TVirtualTransport  apache::thrift::transport::TBufferBase>::write_virt+0x1 - 
> TVirtualTransport.h:103
> impalad!apache::thrift::transport::TTransport::write+0xf - TTransport.h:158
> impalad!apache::thrift::protocol::TBinaryProtocolT::writeI32+0x6
>  - TBinaryProtocol.tcc:155
> impalad!apache::thrift::protocol::TVirtualProtocol,
>  apache::thrift::protocol::TProtocolDefaults>::writeI32_virt+0x8 - 
> TVirtualProtocol.h:405
> impalad!apache::thrift::protocol::TProtocol::writeI32+0x12 - TProtocol.h:448
> impalad!impala::TRowBatch::write+0x138 - Results_types.cpp:161
> impalad!impala::TTransmitDataParams::write+0x128 - 
> ImpalaInternalService_types.cpp:2914
> impalad!impala::ImpalaInternalService_TransmitData_pargs::write+0x4a - 
> ImpalaInternalService.cpp:571
> impalad!impala::ImpalaInternalServiceClient::send_TransmitData+0x6e - 
> ImpalaInternalService.cpp:862
> impalad!impala::ImpalaInternalServiceClient::TransmitData+0x13 - 
> ImpalaInternalService.cpp:851
> impalad!impala::ClientConnection::DoRpc  (impala::TTransmitDataResult&, impala::TTransmitDataParams const&) 
> impala::ImpalaInternalServiceClient::*, impala::TTransmitDataParams, 
> impala::TTransmitDataResult>+0x4b - client-cache.h:229
> impalad!impala::DataStreamSender::Channel::TransmitDataHelper+0x58f - 
> data-stream-sender.cc:205
> impalad!impala::DataStreamSender::Channel::TransmitData+0x10 - 
> data-stream-sender.cc:177
> impalad!boost::function2 const>::operator()+0x26d - function_template.hpp:767
> {code}
> {code}
> Data Of Interest (CPU Metrics)
> 

[jira] [Created] (IMPALA-7781) Clean up ad-hoc instance of, casts to use predicates in expr rewriter

2018-10-29 Thread Paul Rogers (JIRA)
Paul Rogers created IMPALA-7781:
---

 Summary: Clean up ad-hoc instance of, casts to use predicates in 
expr rewriter
 Key: IMPALA-7781
 URL: https://issues.apache.org/jira/browse/IMPALA-7781
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 3.0
Reporter: Paul Rogers


The expression rewriter rules have evolved over time. When originally written, 
the standard way to check whether an expression is a null literal was an 
{{instanceof}} check, as in {{SimplifyConditionalsRule}}:

{code:java}
  private Expr simplifyCaseExpr(CaseExpr expr, Analyzer analyzer)
  throws AnalysisException {
...
  if (child instanceof NullLiteral) continue;
{code}

Since this was written, we added {{Expr.isNullLiteral()}}, which not only checks 
whether the expression is a null literal but also tests for the {{CAST(NULL AS )}} 
form created by the constant folding rule.

The result is that rewrites miss optimization cases for expressions such as 
{{NULL + 1}}, which are rewritten to {{CAST(NULL AS INT)}} (IMPALA-7769).

Code also does manual casts to Boolean literals in the same function:

{code:java}
  if (whenExpr instanceof BoolLiteral) {
if (((BoolLiteral) whenExpr).getValue()) {
{code}

This can be replaced with the {{Expr.IS_TRUE_LITERAL}} predicate; the same is true 
for the {{FALSE}} check.

Finally, there are places in the rewriter that check {{isLiteral()}} when what 
they really want is the more generic "is nearly a literal". Consider the 
{{CAST(NULL...)}} issue above: the {{CAST}} is not a literal, but it acts like one 
in some cases (such as in the constant folding rule itself, IMPALA-7769).

In short, a number of minor, obscure errors could be avoided if we made 
consistent use of the higher-level predicates already available.
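
A short sketch of what the consolidated checks could look like using the helpers named above ({{Expr.isNullLiteral()}}, {{Expr.IS_TRUE_LITERAL}}); the surrounding methods are illustrative only, and the FALSE predicate is assumed to exist by analogy.

{code:java}
// Illustrative only; not the actual SimplifyConditionalsRule code.
boolean whenClauseAlwaysTaken(Expr whenExpr) {
  // Replaces: (whenExpr instanceof BoolLiteral) && ((BoolLiteral) whenExpr).getValue()
  return Expr.IS_TRUE_LITERAL.apply(whenExpr);
}

boolean whenClauseNeverTaken(Expr whenExpr) {
  // Replaces: whenExpr instanceof NullLiteral. isNullLiteral() also catches the
  // CAST(NULL AS <type>) form produced by constant folding.
  return whenExpr.isNullLiteral()
      // Assumes an analogous FALSE predicate exists, per "the same is true for
      // the FALSE check" above.
      || Expr.IS_FALSE_LITERAL.apply(whenExpr);
}
{code}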






[jira] [Updated] (IMPALA-7733) TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename

2018-10-29 Thread Vuk Ercegovac (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vuk Ercegovac updated IMPALA-7733:
--
Labels:   (was: flaky-test)

> TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename
> -
>
> Key: IMPALA-7733
> URL: https://issues.apache.org/jira/browse/IMPALA-7733
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Vuk Ercegovac
>Assignee: Tianyi Wang
>Priority: Blocker
>
> I see two examples in the past two months or so where this test fails due to 
> a rename error on S3. The test's stacktrace looks like this:
> {noformat}
> query_test/test_insert_parquet.py:112: in test_insert_parquet
> self.run_test_case('insert_parquet', vector, unique_database, 
> multiple_impalad=True)
> common/impala_test_suite.py:408: in run_test_case
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:625: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:176: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:350: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:371: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Error(s) moving partition files. First error (of 1) was: 
> Hdfs op (RENAME 
> s3a:///test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq
>  TO 
> s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq)
>  failed, error was: 
> s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq
> E   Error(5): Input/output error{noformat}
> Since we know this happens once in a while, some ideas to deflake it:
>  * retry
>  * check for this specific issue... if we think it's platform flakiness, then 
> we should skip it.






[jira] [Commented] (IMPALA-7733) TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename

2018-10-29 Thread Vuk Ercegovac (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667703#comment-16667703
 ] 

Vuk Ercegovac commented on IMPALA-7733:
---

Thanks for the background. So yes, agreed that this can't be labeled as flaky 
from the testing side.

> TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename
> -
>
> Key: IMPALA-7733
> URL: https://issues.apache.org/jira/browse/IMPALA-7733
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Vuk Ercegovac
>Assignee: Tianyi Wang
>Priority: Blocker
>
> I see two examples in the past two months or so where this test fails due to 
> a rename error on S3. The test's stacktrace looks like this:
> {noformat}
> query_test/test_insert_parquet.py:112: in test_insert_parquet
> self.run_test_case('insert_parquet', vector, unique_database, 
> multiple_impalad=True)
> common/impala_test_suite.py:408: in run_test_case
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:625: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:176: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:350: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:371: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Error(s) moving partition files. First error (of 1) was: 
> Hdfs op (RENAME 
> s3a:///test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq
>  TO 
> s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq)
>  failed, error was: 
> s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq
> E   Error(5): Input/output error{noformat}
> Since we know this happens once in a while, some ideas to deflake it:
>  * retry
>  * check for this specific issue... if we think it's platform flakiness, then 
> we should skip it.






[jira] [Created] (IMPALA-7780) Rebase PlannerTest expected output for estimates, errors

2018-10-29 Thread Paul Rogers (JIRA)
Paul Rogers created IMPALA-7780:
---

 Summary: Rebase PlannerTest expected output for estimates, errors
 Key: IMPALA-7780
 URL: https://issues.apache.org/jira/browse/IMPALA-7780
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 3.0
Reporter: Paul Rogers


The front-end includes the {{PlannerTest}} test which works by running a query, 
writing the plan to a file, comparing selected parts of the file to expected 
results, and flagging if the results differ.

A plan includes some things we test (operators) and some we do not (text of 
error messages, value of memory estimates). Over time the expected and actual 
files have drifted apart. Example:

{noformat}
Expected:partitions=1/1 files=2 size=54.20MB

Actual:  partitions=1/1 files=2 size=54.21MB
{noformat}

While the tests still pass (because we ignore the parts which have drifted), it 
is a pain to track down issues because we must learn to manually ignore 
"unimportant" differences.

This ticket asks to "rebase" planner tests on the latest results, copying into 
the expected results file the current "noise" values from the actual results.






[jira] [Updated] (IMPALA-7750) Prune trivial ELSE clause in CASE simplification

2018-10-29 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated IMPALA-7750:

   Priority: Trivial  (was: Major)
Description: 
A trivial optimization is to omit ELSE if it adds no value:

{code:sql}
CASE WHEN id = 10 THEN id ELSE NULL END
{code}

The {{ELSE}} case defaults to null if not provided, so the above can be 
rewritten to:

{code:sql}
CASE WHEN id = 10 THEN id END
{code}

Also, the simplification can omit the only {{WHEN}} clause if it returns 
{{NULL}}. For example, when rewriting {{nullif()}} we get

{code:sql}
CASE WHEN id IS DISTINCT FROM NULL THEN NULL ELSE NULL END
{code}

This should be simplified to just {{NULL}}.

  was:
The current FE {{CASE}} rewrite code in 
{{SimplifyConditionalsRule.simplifyCaseExpr()}} misses some opportunities for 
optimizations. If these rules are implemented, then the ad-hoc rules for 
several other functions can be removed.

h4. Constant Folding

Consider a typical un-optimized conditional function rewrite:

{code:sql}
CASE WHEN NULL IS NULL THEN 10 ELSE 20 END
{code}

Should be rewritten to just {{10}} since the expression is always true. 
(Currently the expression is not rewritten.)

The same issue occurs for the inverse:

{code:sql}
CASE WHEN 10 IS NULL THEN 10 ELSE 20 END
{code}

Fix these and we can remove the ad-hoc rules for {{NULLIF}} and aliases in 
{{rewriteNullIfFn()}}. Also {{nvl2()}} in {{rewriteNvl2Fn}} and {{ifnull()}} in 
{{rewriteIfNullFn()}}.

In general, any constant expression should be evaluated:

{code:sql}
CASE WHEN isTrue(TRUE) THEN 10 ELSE 20 END
{code}

The constant expression can be evaluated and optimized as for constants. Tests 
suggest that the {{ConstantFoldingRule}} does not handle these cases.

h4. Prune Trivial ELSE Clause

A trivial optimization is to omit ELSE if it adds no value:

{code:sql}
CASE WHEN id = 10 THEN id ELSE NULL END
{code}

The {{ELSE}} case defaults to null if not provided, so the above can be 
rewritten to:

{code:sql}
CASE WHEN id = 10 THEN id END
{code}

Summary: Prune trivial ELSE clause in CASE simplification  (was: 
Additional FE optimizations for CASE expressions)

> Prune trivial ELSE clause in CASE simplification
> 
>
> Key: IMPALA-7750
> URL: https://issues.apache.org/jira/browse/IMPALA-7750
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Priority: Trivial
>
> A trivial optimization is to omit ELSE if it adds no value:
> {code:sql}
> CASE WHEN id = 10 THEN id ELSE NULL END
> {code}
> The {{ELSE}} case defaults to null if not provided, so the above can be 
> rewritten to:
> {code:sql}
> CASE WHEN id = 10 THEN id END
> {code}
> Also, the simplification can omit the only {{WHEN}} clause if it returns 
> {{NULL}}. For example, when rewriting {{nullif()}} we get
> {code:sql}
> CASE WHEN id IS DISTINCT FROM NULL THEN NULL ELSE NULL END
> {code}
> This should be simplified to just {{NULL}}.
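
A compact sketch of the two prunings described above, written against a deliberately simplified model of a CASE expression; the class below is invented for illustration and is not Impala's CaseExpr API.

{code:java}
import java.util.List;

// Invented, simplified model of a CASE expression, for illustration only.
class CaseSketch {
  List<String> whenExprs;  // textual WHEN conditions
  List<String> thenExprs;  // textual THEN results; "NULL" marks a null literal
  String elseExpr;         // null when no ELSE clause was written

  // Returns the simplified expression text, or null if no whole-expression
  // simplification applies.
  String simplify() {
    // 1. Drop a trailing "ELSE NULL": the ELSE defaults to NULL anyway.
    if ("NULL".equals(elseExpr)) elseExpr = null;
    // 2. If there is no ELSE and every THEN result is NULL, the CASE is just NULL.
    boolean allThenNull = thenExprs.stream().allMatch("NULL"::equals);
    return (elseExpr == null && allThenNull) ? "NULL" : null;
  }
}
{code}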






[jira] [Commented] (IMPALA-7777) Fail queries where the sum of offset and limit exceed the max value of int64

2018-10-29 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667679#comment-16667679
 ] 

Sahil Takiar commented on IMPALA-:
--

An example query is:
{code:java}
select int_col from functional.alltypes order by 1 limit 9223372036854775800 
offset 9223372036854775800;{code}
The exception thrown by Impala is:
{code:java}
row-batch.h:334] Check failed: dest <= src (0 vs. -8)
{code}

> Fail queries where the sum of offset and limit exceed the max value of int64
> 
>
> Key: IMPALA-
> URL: https://issues.apache.org/jira/browse/IMPALA-
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Blocker
>
> A follow-up to IMPALA-5004. We should prevent users from running queries 
> where the sum of the offset and limit exceeds some threshold (e.g. 
> {{Long.MAX_VALUE}}). If a user tries to run such a query, the impalad will 
> crash, so we should reject queries that exceed the threshold.
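
A minimal sketch of the kind of guard the description calls for; the method and exception type are illustrative, not the actual fix.

{code:java}
// Illustrative guard only; not the actual Impala analyzer code.
// Assumes offset and limit are already validated to be non-negative.
static void checkOffsetPlusLimit(long offset, long limit) {
  // Equivalent to (offset + limit > Long.MAX_VALUE), written so the check itself
  // cannot overflow.
  if (offset > Long.MAX_VALUE - limit) {
    throw new IllegalArgumentException(
        "Sum of OFFSET and LIMIT exceeds the maximum value of a 64-bit integer.");
  }
}
{code}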






[jira] [Assigned] (IMPALA-7753) Rewrite engine ignores top-level expressions in ORDER BY clause

2018-10-29 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned IMPALA-7753:
---

   Assignee: (was: Paul Rogers)
Description: 
The select statement represents the ORDER BY clause in two distinct ways. 
First, there is a list of the "original" ordering expressions, 
{{orderByElements_}}. Second, there is an analyzed list in {{sortInfo_}}. The 
explanation is:

{code:java}
  // create copies, we don't want to modify the original parse node, in case
  // we need to print it
{code}

Later, we apply rewrite rules to the {{ORDER BY}} expression, but we do so 
using the original version, not the copy:

{code:java}
  for (OrderByElement orderByElem: orderByElements_) {
orderByElem.setExpr(rewriteCheckOrdinalResult(rewriter, 
orderByElem.getExpr()));
  }
{code}

The result is that we apply rewrite rules to expressions which have not been 
analyzed, triggering the assertion mentioned above. This assertion is telling 
us something: we skipped a step. Here, it turns out we are rewriting the wrong 
set of expressions. Modifying the code to rewrite those in {{sortInfo_}} solves 
the problem. The current behavior is a bug as the rewrites currently do 
nothing, and the expressions we thought we were rewriting are never touched.

The correct code would rewrite the expressions which are actually used when 
analyzing the query:

{code}if (orderByElements_ != null) {
  List<Expr> sortExprs = sortInfo_.getSortExprs();
  for (int i = 0; i < sortExprs.size(); i++) {
    sortExprs.set(i, rewriteCheckOrdinalResult(rewriter, sortExprs.get(i)));
  }
}
{code}

We can, in addition, ask a more basic question: do we even need to do rewrites 
for {{ORDER BY}} expressions? The only valid expressions are column references, 
aren't they? Or, does Impala allow expressions in the {{ORDER BY}} clause?

Here is the result of a {{PlannerTest}} run that shows how the bug affects the 
conditional function rewrite:

{noformat}
07:AGGREGATE [FINALIZE]
|  output: avg(sum(t1.id)), sum(avg(g)), count(id)
|  group by: if(TupleIsNull(), NULL, CASE WHEN int_col IS NOT NULL THEN int_col 
ELSE 20 END)
|
06:ANALYTIC
|  functions: avg(if(TupleIsNull(), NULL, CASE WHEN id + bigint_col IS NOT NULL 
THEN id + bigint_col ELSE 40 END))
|  order by: if(TupleIsNull(), NULL, CASE WHEN bigint_col IS NOT NULL THEN 
bigint_col ELSE 30 END) ASC
|  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
|
05:SORT
|  order by: if(TupleIsNull(), NULL, CASE WHEN bigint_col IS NOT NULL THEN 
bigint_col ELSE 30 END) ASC
{noformat}

Note that the top-level if() calls are not rewritten, but the nested coalesce() 
calls are rewritten to CASE.

Because this code rewrites the wrong list, the rewrite rules contain 
if-statements to check for un-analyzed nodes. In actuality, no node should be 
un-analyzed when passing through the rewrite engine.

  was:
The select statement represents the ORDER BY clause in two distinct ways. 
First, there is a list of the "original" ordering expressions, 
{{orderByElements_}}. Second, there is an analyzed list in {{sortInfo_}}. The 
explanation is:

{code:java}
  // create copies, we don't want to modify the original parse node, in case
  // we need to print it
{code}

Later, we apply rewrite rules to the {{ORDER BY}} expression, but we do so 
using the original version, not the copy:

{code:java}
  for (OrderByElement orderByElem: orderByElements_) {
orderByElem.setExpr(rewriteCheckOrdinalResult(rewriter, 
orderByElem.getExpr()));
  }
{code}

The result is that we apply rewrite rules to expressions which have not been 
analyzed, triggering the assertion mentioned above. This assertion is telling 
us something: we skipped a step. Here, it turns out we are rewriting the wrong 
set of expressions. Modifying the code to rewrite those in {{sortInfo_}} solves 
the problem. The current behavior is a bug as the rewrites currently do 
nothing, and the expressions we thought we were rewriting are never touched.

The correct code would rewrite the expressions which are actually used when 
analyzing the query:

{code}if (orderByElements_ != null) {
  List<Expr> sortExprs = sortInfo_.getSortExprs();
  for (int i = 0; i < sortExprs.size(); i++) {
    sortExprs.set(i, rewriteCheckOrdinalResult(rewriter, sortExprs.get(i)));
  }
}
{code}

We can, in addition, ask a more basic question: do we even need to do rewrites 
for {{ORDER BY}} expressions? The only valid expressions are column references, 
aren't they? Or, does Impala allow expressions in the {{ORDER BY}} clause?

Summary: Rewrite engine ignores top-level expressions in ORDER BY 
clause  (was: Invalid logic when rewriting ORDER BY clause expressions)

> Rewrite engine ignores top-level expressions in ORDER BY clause
> ---
>
> Key: 

[jira] [Assigned] (IMPALA-7755) IS [NOT] DISTINCT FROM rewrite rules don't handle aggregates

2018-10-29 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned IMPALA-7755:
---

   Assignee: (was: Paul Rogers)
Description: 
Testing revealed one issue and one limitation of the {{IS [NOT] DISTINCT FROM}} 
optimizations in {{SimplifyDistinctFromRule}}.

The rule can simplify {{sum(id) <=> sum(id)}} to {{false}}. But, according to 
IMPALA-5125, the rewrite cannot be applied if it would remove the only 
aggregate. This is a bug.

A simplification is missing: {{id IS DISTINCT FROM NULL}} should be rewritten to 
{{id IS NOT NULL}}.

Here is a specific test case from {{FullRewriteTest}}, created as part of 
IMPALA-7655:

{code:sql}
  @Test
  public void TestSimplifyDistinctFromRule() throws ImpalaException {
verifySelectRewrite("if(sum(int_col) <=> sum(int_col), 1, 2)",
"CASE WHEN sum(int_col) IS NOT DISTINCT FROM sum(int_col) THEN 1 ELSE 2 
END");
  }
{code}

The above fails: the rewriter produces "2" because it ignores the "must 
preserve the aggregate" rule.

  was:
Testing revealed one issue and one limitation of the {{IS [NOT] DISTINCT FROM}} 
optimizations in {{SimplifyDistinctFromRule}}.

The rule can simplify {{sum(id) <=> sum(id)}} to {{false}}. But, according to 
IMPALA-5125, the rewrite cannot be applied if it would remove the only 
aggregate. This is a bug.

A simplification is missing: {{id IS DISTINCT FROM NULL}} -> {{id IS NOT 
NULL}}.


> IS [NOT] DISTINCT FROM rewrite rules don't handle aggregates
> 
>
> Key: IMPALA-7755
> URL: https://issues.apache.org/jira/browse/IMPALA-7755
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Testing revealed one issue and one limitation of the {{IS [NOT] DISTINCT 
> FROM}} optimizations in {{SimplifyDistinctFromRule}}.
> The rule can simplify {{sum(id) <=> sum(id)}} to {{false}}. But, according to 
> IMPALA-5125, the rewrite cannot be applied if it would remove the only 
> aggregate. This is a bug.
> A simplification is missing: {{id IS DISTINCT FROM NULL}} -> {{id IS NOT 
> NULL}}.
> Here is a specific test case from {{FullRewriteTest}}, created as part of 
> IMPALA-7655:
> {code:sql}
>   @Test
>   public void TestSimplifyDistinctFromRule() throws ImpalaException {
> verifySelectRewrite("if(sum(int_col) <=> sum(int_col), 1, 2)",
> "CASE WHEN sum(int_col) IS NOT DISTINCT FROM sum(int_col) THEN 1 ELSE 
> 2 END");
>   }
> {code}
> The above fails: the rewriter produces "2" because it ignores the "must 
> preserve aggregate" rule.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7741) Functions nvl2(), decode(), nullif() not listed in _impala_builtins

2018-10-29 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667658#comment-16667658
 ] 

Paul Rogers commented on IMPALA-7741:
-

We can solve the n^2 arguments problem by providing a custom function 
description class. For example:

{code:java}
  /**
   * Special case for
   * nvl2(type1 expr, type2 ifNotNull, type2 ifNull).
   * Type check on the second and third arguments, since that drives the
   * return type. Accept any type for the first. This works because
   * nvl2() is rewritten to use CASE.
   */
  public static class Nvl2Function extends ScalarFunction {

public Nvl2Function(Type retType) {
  super(new FunctionName(BuiltinsDb.NAME, "nvl2"),
  Lists.newArrayList(Type.NULL, retType, retType), retType, true);
}

@Override
protected boolean isIndistinguishable(Function o) {
  return
  o.getArgs()[1].matchesType(this.getArgs()[1]) &&
  o.getArgs()[2].matchesType(this.getArgs()[2]);
}

@Override
protected boolean isSuperTypeOf(Function other, boolean strict) {
  return
  Type.isImplicitlyCastable(
  other.getArgs()[1], this.getArgs()[1], strict, strict) &&
  Type.isImplicitlyCastable(
  other.getArgs()[2], this.getArgs()[2], strict, strict);
}
  }
{code}

Then, add custom code to register this function:

{code:java}
  /**
   * Create entries for the odd-duck NVL2 function:
   * type1 nvl2(type2 expr, type1 ifNotNull, type1 ifNull).
   * The types form an n^2 matrix that can't easily be represented.
   * Instead, we define a special function that matches on the
   * second and third arguments, since they determine the return
   * type. We then ignore the first argument since we only care if
   * it is null, and CASE will take care of the details.
   */
  public static void initBuiltins(Db db) {
for (Type t: Type.getSupportedTypes()) {
  if (t.isNull()) continue;
  if (t.isScalarType(PrimitiveType.CHAR)) continue;
  db.addBuiltin(new ScalarFunction.Nvl2Function(t));
}
  }
{code}

With that, {{nvl2()}} acts like any other function and appears in the builtin 
functions table. It no longer needs custom rewrite rules in the parser; it can 
be rewritten in {{RewriteConditionalFnsRule()}} along with other functions. 
That is, remove the following from {{FunctionCallExpr}}:

{code:java}
if (functionNameEqualsBuiltin(fnName, "nvl2")) {
  List<Expr> plist = Lists.newArrayList(params.exprs());
  if (!plist.isEmpty()) {
    plist.set(0, new IsNullPredicate(plist.get(0), true));
  }
  return new FunctionCallExpr("if", plist);
}
{code}
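
For comparison, the corresponding rewrite inside {{RewriteConditionalFnsRule}} 
could look roughly like this (hypothetical sketch; the helper name is 
illustrative, not existing Impala code):

{code:java}
// Hypothetical: rewrite nvl2(expr, ifNotNull, ifNull) into
// if(expr IS NOT NULL, ifNotNull, ifNull), i.e. the same transform as the
// removed parser-level special case, but applied in the rewrite phase
// alongside the other conditional functions.
private static Expr rewriteNvl2(FunctionCallExpr fn) {
  List<Expr> args = fn.getParams().exprs();
  Expr cond = new IsNullPredicate(args.get(0), /*isNotNull=*/ true);
  return new FunctionCallExpr("if",
      Lists.newArrayList(cond, args.get(1), args.get(2)));
}
{code}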

> Functions nvl2(), decode(), nullif() not listed in _impala_builtins
> ---
>
> Key: IMPALA-7741
> URL: https://issues.apache.org/jira/browse/IMPALA-7741
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Priority: Major
>
> The 
> [docs|https://impala.apache.org/docs/build3x/html/topics/impala_show.html] 
> for {{SHOW FUNCTIONS}} says that we can use the following to list all 
> built-in functions:
> {code:sql}
> show functions in _impala_builtins like '*week*';
> {code}
> However several Impala functions are removed early in the FE planning process 
> and thus do not appear in the FE's function table in {{ScalarBuiltins}}: 
> {{nvl2()}}, {{decode()}}, and {{nullif()}}. For example:
> {noformat}
> show functions in _impala_builtins like '*decode**'
> +-+--+-+---+
> | return type | signature| binary type | is persistent |
> +-+--+-+---+
> | STRING  | base64decode(STRING) | BUILTIN | true  |
> | STRING  | madlib_decode_vector(STRING) | BUILTIN | true  |
> +-+--+-+---+
> {noformat}
> However, since these three are perfectly valid functions, would have expected 
> them to appear in the table. How they are processed internally is an 
> implementation detail unimportant to the end user.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7733) TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename

2018-10-29 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667650#comment-16667650
 ] 

Steve Loughran commented on IMPALA-7733:


This looks like you are hitting S3 inconsistency; rename is usually the place 
where it surfaces.

* You shouldn't be using any commit algorithm which relies on rename; see 
HADOOP-13786.
* Unless you can implement resilience to inconsistency (e.g. spinning on the 
rename, as sketched below), you are going to have to embrace S3Guard with a 
consistent metadata store.

You can't just view this as a flaky test: it is probably a symptom of a problem 
that will surface in production. *This test has successfully found it.*
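
A minimal sketch of the "spinning" approach, using only the stock Hadoop 
{{FileSystem}} API (the retry count and backoff are illustrative, not what 
Impala actually does):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RetryingRename {
  // Retry a rename a few times to ride out eventually-consistent listings on
  // S3A. Purely illustrative; real code would cap total wait time and log retries.
  static boolean renameWithRetries(FileSystem fs, Path src, Path dst)
      throws IOException, InterruptedException {
    for (int attempt = 0; attempt < 5; ++attempt) {
      if (fs.exists(src) && fs.rename(src, dst)) return true;
      Thread.sleep(1000L << attempt);  // exponential backoff before re-checking
    }
    return false;
  }
}
{code}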

> TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename
> -
>
> Key: IMPALA-7733
> URL: https://issues.apache.org/jira/browse/IMPALA-7733
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Vuk Ercegovac
>Assignee: Tianyi Wang
>Priority: Blocker
>  Labels: flaky-test
>
> I see two examples in the past two months or so where this test fails due to 
> a rename error on S3. The test's stacktrace looks like this:
> {noformat}
> query_test/test_insert_parquet.py:112: in test_insert_parquet
> self.run_test_case('insert_parquet', vector, unique_database, 
> multiple_impalad=True)
> common/impala_test_suite.py:408: in run_test_case
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:625: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:176: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:350: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:371: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Error(s) moving partition files. First error (of 1) was: 
> Hdfs op (RENAME 
> s3a:///test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq
>  TO 
> s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq)
>  failed, error was: 
> s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq
> E   Error(5): Input/output error{noformat}
> Since we know this happens once in a while, some ideas to deflake it:
>  * retry
>  * check for this specific issue... if we think its platform flakiness, then 
> we should skip it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7742) User names in Sentry are now case sensitive

2018-10-29 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya resolved IMPALA-7742.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> User names in Sentry are now case sensitive
> ---
>
> Key: IMPALA-7742
> URL: https://issues.apache.org/jira/browse/IMPALA-7742
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.1.0
>Reporter: Fredy Wijaya
>Assignee: Fredy Wijaya
>Priority: Critical
> Fix For: Impala 3.1.0
>
>
> Sentry no longer stores user names in lower case 
> (https://issues.apache.org/jira/browse/SENTRY-2432).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7070) Failed test: query_test.test_nested_types.TestParquetArrayEncodings.test_thrift_array_of_arrays on S3

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7070:
--
Target Version: Impala 3.1.0

> Failed test: 
> query_test.test_nested_types.TestParquetArrayEncodings.test_thrift_array_of_arrays
>  on S3
> -
>
> Key: IMPALA-7070
> URL: https://issues.apache.org/jira/browse/IMPALA-7070
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: Dimitris Tsirogiannis
>Assignee: Lars Volker
>Priority: Critical
>  Labels: broken-build, flaky, s3, test-failure
>
>  
> {code:java}
> Error Message
> query_test/test_nested_types.py:406: in test_thrift_array_of_arrays "col1 
> array>") query_test/test_nested_types.py:579: in 
> _create_test_table check_call(["hadoop", "fs", "-put", local_path, 
> location], shell=False) /usr/lib64/python2.6/subprocess.py:505: in check_call 
> raise CalledProcessError(retcode, cmd) E   CalledProcessError: Command 
> '['hadoop', 'fs', '-put', 
> '/data/jenkins/workspace/impala-asf-2.x-core-s3/repos/Impala/testdata/parquet_nested_types_encodings/bad-thrift.parquet',
>  
> 's3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays']'
>  returned non-zero exit status 1
> Stacktrace
> query_test/test_nested_types.py:406: in test_thrift_array_of_arrays
> "col1 array>")
> query_test/test_nested_types.py:579: in _create_test_table
> check_call(["hadoop", "fs", "-put", local_path, location], shell=False)
> /usr/lib64/python2.6/subprocess.py:505: in check_call
> raise CalledProcessError(retcode, cmd)
> E   CalledProcessError: Command '['hadoop', 'fs', '-put', 
> '/data/jenkins/workspace/impala-asf-2.x-core-s3/repos/Impala/testdata/parquet_nested_types_encodings/bad-thrift.parquet',
>  
> 's3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays']'
>  returned non-zero exit status 1
> Standard Error
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_thrift_array_of_arrays_11da5fde` CASCADE;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_thrift_array_of_arrays_11da5fde`;
> MainThread: Created database "test_thrift_array_of_arrays_11da5fde" for test 
> ID 
> "query_test/test_nested_types.py::TestParquetArrayEncodings::()::test_thrift_array_of_arrays[exec_option:
>  {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]"
> -- executing against localhost:21000
> create table test_thrift_array_of_arrays_11da5fde.ThriftArrayOfArrays (col1 
> array>) stored as parquet location 
> 's3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays';
> 18/05/20 18:31:03 WARN impl.MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
> 18/05/20 18:31:03 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 
> 10 second(s).
> 18/05/20 18:31:03 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> started
> 18/05/20 18:31:06 INFO Configuration.deprecation: 
> fs.s3a.server-side-encryption-key is deprecated. Instead, use 
> fs.s3a.server-side-encryption.key
> put: rename 
> `s3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays/bad-thrift.parquet._COPYING_'
>  to 
> `s3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays/bad-thrift.parquet':
>  Input/output error
> 18/05/20 18:31:08 INFO impl.MetricsSystemImpl: Stopping s3a-file-system 
> metrics system...
> 18/05/20 18:31:08 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> stopped.
> 18/05/20 18:31:08 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> shutdown complete.{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7070) Failed test: query_test.test_nested_types.TestParquetArrayEncodings.test_thrift_array_of_arrays on S3

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7070:
--
Fix Version/s: (was: Impala 3.1.0)

> Failed test: 
> query_test.test_nested_types.TestParquetArrayEncodings.test_thrift_array_of_arrays
>  on S3
> -
>
> Key: IMPALA-7070
> URL: https://issues.apache.org/jira/browse/IMPALA-7070
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: Dimitris Tsirogiannis
>Assignee: Lars Volker
>Priority: Critical
>  Labels: broken-build, flaky, s3, test-failure
>
>  
> {code:java}
> Error Message
> query_test/test_nested_types.py:406: in test_thrift_array_of_arrays "col1 
> array>") query_test/test_nested_types.py:579: in 
> _create_test_table check_call(["hadoop", "fs", "-put", local_path, 
> location], shell=False) /usr/lib64/python2.6/subprocess.py:505: in check_call 
> raise CalledProcessError(retcode, cmd) E   CalledProcessError: Command 
> '['hadoop', 'fs', '-put', 
> '/data/jenkins/workspace/impala-asf-2.x-core-s3/repos/Impala/testdata/parquet_nested_types_encodings/bad-thrift.parquet',
>  
> 's3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays']'
>  returned non-zero exit status 1
> Stacktrace
> query_test/test_nested_types.py:406: in test_thrift_array_of_arrays
> "col1 array>")
> query_test/test_nested_types.py:579: in _create_test_table
> check_call(["hadoop", "fs", "-put", local_path, location], shell=False)
> /usr/lib64/python2.6/subprocess.py:505: in check_call
> raise CalledProcessError(retcode, cmd)
> E   CalledProcessError: Command '['hadoop', 'fs', '-put', 
> '/data/jenkins/workspace/impala-asf-2.x-core-s3/repos/Impala/testdata/parquet_nested_types_encodings/bad-thrift.parquet',
>  
> 's3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays']'
>  returned non-zero exit status 1
> Standard Error
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_thrift_array_of_arrays_11da5fde` CASCADE;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_thrift_array_of_arrays_11da5fde`;
> MainThread: Created database "test_thrift_array_of_arrays_11da5fde" for test 
> ID 
> "query_test/test_nested_types.py::TestParquetArrayEncodings::()::test_thrift_array_of_arrays[exec_option:
>  {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]"
> -- executing against localhost:21000
> create table test_thrift_array_of_arrays_11da5fde.ThriftArrayOfArrays (col1 
> array>) stored as parquet location 
> 's3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays';
> 18/05/20 18:31:03 WARN impl.MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
> 18/05/20 18:31:03 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 
> 10 second(s).
> 18/05/20 18:31:03 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> started
> 18/05/20 18:31:06 INFO Configuration.deprecation: 
> fs.s3a.server-side-encryption-key is deprecated. Instead, use 
> fs.s3a.server-side-encryption.key
> put: rename 
> `s3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays/bad-thrift.parquet._COPYING_'
>  to 
> `s3a://impala-cdh5-s3-test/test-warehouse/test_thrift_array_of_arrays_11da5fde.db/ThriftArrayOfArrays/bad-thrift.parquet':
>  Input/output error
> 18/05/20 18:31:08 INFO impl.MetricsSystemImpl: Stopping s3a-file-system 
> metrics system...
> 18/05/20 18:31:08 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> stopped.
> 18/05/20 18:31:08 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> shutdown complete.{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6890) split-hbase.sh: Can't get master address from ZooKeeper; znode data == null

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6890:
--
Priority: Critical  (was: Blocker)

> split-hbase.sh: Can't get master address from ZooKeeper; znode data == null
> ---
>
> Key: IMPALA-6890
> URL: https://issues.apache.org/jira/browse/IMPALA-6890
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.12.0
>Reporter: Vuk Ercegovac
>Assignee: Joe McDonnell
>Priority: Critical
>
> {noformat}
> 20:57:13 FAILED (Took: 7 min 58 sec)
> 20:57:13 
> '/data/jenkins/workspace/impala-cdh5-2.12.0_5.15.0-exhaustive-thrift/repos/Impala/testdata/bin/split-hbase.sh'
>  failed. Tail of log:
> 20:57:13 Wed Apr 18 20:49:43 PDT 2018, 
> RpcRetryingCaller{globalStartTime=1524109783051, pause=100, retries=31}, 
> org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> 20:57:13 Wed Apr 18 20:49:43 PDT 2018, 
> RpcRetryingCaller{globalStartTime=1524109783051, pause=100, retries=31}, 
> org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> 20:57:13 Wed Apr 18 20:49:44 PDT 2018, 
> RpcRetryingCaller{globalStartTime=1524109783051, pause=100, retries=31}, 
> org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> ...
> 20:57:13 Wed Apr 18 20:57:13 PDT 2018, 
> RpcRetryingCaller{globalStartTime=1524109783051, pause=100, retries=31}, 
> org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> 20:57:13 
> 20:57:13  at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:157)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4329)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4321)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.HBaseAdmin.getClusterStatus(HBaseAdmin.java:2952)
> 20:57:13  at 
> org.apache.impala.datagenerator.HBaseTestDataRegionAssigment.(HBaseTestDataRegionAssigment.java:74)
> 20:57:13  at 
> org.apache.impala.datagenerator.HBaseTestDataRegionAssigment.main(HBaseTestDataRegionAssigment.java:310)
> 20:57:13 Caused by: org.apache.hadoop.hbase.MasterNotRunningException: 
> java.io.IOException: Can't get master address from ZooKeeper; znode data == 
> null
> 20:57:13  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1698)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1718)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1875)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
> 20:57:13  ... 5 more
> 20:57:13 Caused by: java.io.IOException: Can't get master address from 
> ZooKeeper; znode data == null
> 20:57:13  at 
> org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:154)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1648)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1689)
> 20:57:13  ... 9 more
> 20:57:13 Error in 
> /data/jenkins/workspace/impala-cdh5-2.12.0_5.15.0-exhaustive-thrift/repos/Impala/testdata/bin/split-hbase.sh
>  at line 41: "$JAVA" ${JAVA_KERBEROS_MAGIC} \
> 20:57:13 Error in 
> /data/jenkins/workspace/impala-cdh5-2.12.0_5.15.0-exhaustive-thrift/repos/Impala/bin/run-all-tests.sh
>  at line 48: # Run End-to-end Tests{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6890) split-hbase.sh: Can't get master address from ZooKeeper; znode data == null

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6890:
--
Target Version: Impala 3.1.0, Impala 2.13.0  (was: Impala 2.13.0)

> split-hbase.sh: Can't get master address from ZooKeeper; znode data == null
> ---
>
> Key: IMPALA-6890
> URL: https://issues.apache.org/jira/browse/IMPALA-6890
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.12.0
>Reporter: Vuk Ercegovac
>Assignee: Joe McDonnell
>Priority: Blocker
>
> {noformat}
> 20:57:13 FAILED (Took: 7 min 58 sec)
> 20:57:13 
> '/data/jenkins/workspace/impala-cdh5-2.12.0_5.15.0-exhaustive-thrift/repos/Impala/testdata/bin/split-hbase.sh'
>  failed. Tail of log:
> 20:57:13 Wed Apr 18 20:49:43 PDT 2018, 
> RpcRetryingCaller{globalStartTime=1524109783051, pause=100, retries=31}, 
> org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> 20:57:13 Wed Apr 18 20:49:43 PDT 2018, 
> RpcRetryingCaller{globalStartTime=1524109783051, pause=100, retries=31}, 
> org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> 20:57:13 Wed Apr 18 20:49:44 PDT 2018, 
> RpcRetryingCaller{globalStartTime=1524109783051, pause=100, retries=31}, 
> org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> ...
> 20:57:13 Wed Apr 18 20:57:13 PDT 2018, 
> RpcRetryingCaller{globalStartTime=1524109783051, pause=100, retries=31}, 
> org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: Can't 
> get master address from ZooKeeper; znode data == null
> 20:57:13 
> 20:57:13  at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:157)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4329)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4321)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.HBaseAdmin.getClusterStatus(HBaseAdmin.java:2952)
> 20:57:13  at 
> org.apache.impala.datagenerator.HBaseTestDataRegionAssigment.(HBaseTestDataRegionAssigment.java:74)
> 20:57:13  at 
> org.apache.impala.datagenerator.HBaseTestDataRegionAssigment.main(HBaseTestDataRegionAssigment.java:310)
> 20:57:13 Caused by: org.apache.hadoop.hbase.MasterNotRunningException: 
> java.io.IOException: Can't get master address from ZooKeeper; znode data == 
> null
> 20:57:13  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1698)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1718)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1875)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
> 20:57:13  ... 5 more
> 20:57:13 Caused by: java.io.IOException: Can't get master address from 
> ZooKeeper; znode data == null
> 20:57:13  at 
> org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:154)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1648)
> 20:57:13  at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1689)
> 20:57:13  ... 9 more
> 20:57:13 Error in 
> /data/jenkins/workspace/impala-cdh5-2.12.0_5.15.0-exhaustive-thrift/repos/Impala/testdata/bin/split-hbase.sh
>  at line 41: "$JAVA" ${JAVA_KERBEROS_MAGIC} \
> 20:57:13 Error in 
> /data/jenkins/workspace/impala-cdh5-2.12.0_5.15.0-exhaustive-thrift/repos/Impala/bin/run-all-tests.sh
>  at line 48: # Run End-to-end Tests{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7003) Support erasure-coding in impala

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7003:
--
Target Version: Product Backlog  (was: Impala 3.2.0)

I'll just put this on the backlog until there is someone to pick it up.

> Support erasure-coding in impala
> 
>
> Key: IMPALA-7003
> URL: https://issues.apache.org/jira/browse/IMPALA-7003
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend, Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Tianyi Wang
>Priority: Critical
>
> This is the parent Jira for the erasure coding feature



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-7565) Extends TAcceptQueueServer connection_setup_pool to be multi-threaded

2018-10-29 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667597#comment-16667597
 ] 

Michael Ho edited comment on IMPALA-7565 at 10/29/18 6:56 PM:
--

A minor first step is to convert {{CONNECTION_SETUP_POOL_SIZE}} to a tunable 
knob with a default value of 1. In the meantime, I have yet to look more closely 
at the code in {{TAcceptQueueServer::SetupConnection()}} to see whether it is 
thread safe. At first glance, {{TSaslServerTransport::getTransport()}} and 
friends already seem to assume multi-threading, so there may not be much work. 
Of course, we need to study the code and run some tests to confirm.

There is anecdotal evidence that a larger 
{{CONNECTION_SETUP_POOL_SIZE}} worked fine during testing. Please see the comment 
[here|https://issues.apache.org/jira/browse/IMPALA-7638?focusedCommentId=16632255=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16632255]


was (Author: kwho):
A minor first step is to convert {{CONNECTION_SETUP_POOL_SIZE}} to a tunable 
knob with default value of 1. In the meantime, I have yet to look more into 
code in {{TAcceptQueueServer::SetupConnection()}} to see if they are thread 
safe. At a first glance, the {{TSaslServerTransport::getTransport()}} and 
friends already seem to assume multi-threading so there may not be much work. 
Of course, we need to study the code and run some tests to confirm.

> Extends TAcceptQueueServer connection_setup_pool to be multi-threaded
> -
>
> Key: IMPALA-7565
> URL: https://issues.apache.org/jira/browse/IMPALA-7565
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients, Distributed Exec
>Affects Versions: Impala 2.11.0, Impala 3.0, Impala 2.12.0
>Reporter: Michael Ho
>Priority: Blocker
>
> In {{TAcceptQueueServer.cpp}}, we currently have one thread in the 
> {{connection_setup_pool}} for handling connection establishment.
> {noformat}
>   // Only using one thread here is sufficient for performance, and it avoids 
> potential
>   // thread safety issues with the thrift code called in SetupConnection.
>   constexpr int CONNECTION_SETUP_POOL_SIZE = 1;
>   // New - this is the thread pool used to process the internal accept queue.
>   ThreadPool> connection_setup_pool("setup-server", 
> "setup-worker",
>   CONNECTION_SETUP_POOL_SIZE, FLAGS_accepted_cnxn_queue_depth,
>   [this](int tid, const shared_ptr& item) {
> this->SetupConnection(item);
>   });
> {noformat}
> While that makes the code easier to reason about, it also makes Impala less 
> robust in case a client intentionally or unintentionally freezes during 
> connection establishment. For instance, one can telnet to the beeswax or HS2 
> port of Impala (e.g. telnet localhost:21000) and then leave the telnet 
> session open. In a secure cluster with TLS enabled, Impalad will be stuck in 
> the SSL handshake. Other clients trying to connect to that port (e.g. Impala 
> shell) will hang forever.
> {noformat}
> Thread 551 (Thread 0x7fddde563700 (LWP 166354)):
> #0  0x003ce2a0e82d in read () from /lib64/libpthread.so.0
> #1  0x003ce56dea71 in ?? () from /usr/lib64/libcrypto.so.10
> #2  0x003ce56dcdc9 in BIO_read () from /usr/lib64/libcrypto.so.10
> #3  0x003ce9a31873 in ssl23_read_bytes () from /usr/lib64/libssl.so.10
> #4  0x003ce9a2fe63 in ssl23_get_client_hello () from 
> /usr/lib64/libssl.so.10
> #5  0x003ce9a302f3 in ssl23_accept () from /usr/lib64/libssl.so.10
> #6  0x0208ebd5 in 
> apache::thrift::transport::TSSLSocket::checkHandshake() ()
> #7  0x0208edbc in 
> apache::thrift::transport::TSSLSocket::read(unsigned char*, unsigned int) ()
> #8  0x0208b6f3 in unsigned int 
> apache::thrift::transport::readAll(apache::thrift::transport::TSocket&,
>  unsigned char*, unsigned int) ()
> #9  0x00cb2aa9 in 
> apache::thrift::transport::TSaslTransport::receiveSaslMessage(apache::thrift::transport::NegotiationStatus*,
>  unsigned int*) ()
> #10 0x00cb03e4 in 
> apache::thrift::transport::TSaslServerTransport::handleSaslStartMessage() ()
> #11 0x00cb2c23 in 
> apache::thrift::transport::TSaslTransport::doSaslNegotiation() ()
> #12 0x00cb10b8 in 
> apache::thrift::transport::TSaslServerTransport::Factory::getTransport(boost::shared_ptr)
>  ()
> #13 0x00b13e47 in 
> apache::thrift::server::TAcceptQueueServer::SetupConnection(boost::shared_ptr)
>  ()
> #14 0x00b14932 in 
> boost::detail::function::void_function_obj_invoker2  boost::shared_ptr const&)#1}, void, 
> int, boost::shared_ptr 
> const&>::invoke(boost::detail::function::function_buffer&, int, 
> boost::shared_ptr const&) ()
> #15 0x00b177f9 in 
> impala::ThreadPool 
> >::WorkerThread(int) ()
> 

[jira] [Updated] (IMPALA-6194) Ensure all fragment instances notice cancellation

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6194:
--
Target Version: Impala 3.2.0

> Ensure all fragment instances notice cancellation
> -
>
> Key: IMPALA-6194
> URL: https://issues.apache.org/jira/browse/IMPALA-6194
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Lars Volker
>Priority: Critical
>  Labels: observability, supportability
>
> Currently queries can get stuck in an uncancellable state, e.g. when blocking 
> on function calls or condition variables without periodically checking for 
> cancellation. We should eliminate all those calls and make sure we don't 
> re-introduce such issues. One option would be a watchdog to check that each 
> fragment instance regularly calls RETURN_IF_CANCEL.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7523) Planner Test failing with "Failed to assign regions to servers after 60000 millis."

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7523:
--
Target Version: Impala 3.2.0

> Planner Test failing with "Failed to assign regions to servers after 6 
> millis."
> ---
>
> Key: IMPALA-7523
> URL: https://issues.apache.org/jira/browse/IMPALA-7523
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Reporter: Philip Zeyliger
>Priority: Critical
>  Labels: broken-build, flaky
>
> I've seen 
> {{org.apache.impala.planner.PlannerTest.org.apache.impala.planner.PlannerTest}}
>  fail with the following trace:
> {code}
> java.lang.IllegalStateException: Failed to assign regions to servers after 
> 6 millis.
>   at 
> org.apache.impala.datagenerator.HBaseTestDataRegionAssignment.performAssignment(HBaseTestDataRegionAssignment.java:153)
>   at 
> org.apache.impala.planner.PlannerTestBase.setUp(PlannerTestBase.java:120)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> {code}
> I think we've seen it before as indicated in IMPALA-7061.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6808) Impala python code should be installed into infra/python/env like other packages

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6808:
--
Target Version: Product Backlog
  Priority: Major  (was: Critical)

I'm just going to change the status and target since the number of critical 
JIRAs is a bit overwhelming and a lot of them are arguably not critical. This 
seems like it would be a great improvement regardless.

> Impala python code should be installed into infra/python/env like other 
> packages
> 
>
> Key: IMPALA-6808
> URL: https://issues.apache.org/jira/browse/IMPALA-6808
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients, Infrastructure
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: David Knupp
>Assignee: David Knupp
>Priority: Major
>
> Impala/infra/python/env is the environment where necessary upstream python 
> libraries and packages get installed -- e.g., the packages listed in 
> https://github.com/apache/impala/blob/master/infra/python/deps/requirements.txt
>  and other similar files.
> Impala's own internal python code (like the impala-shell, or the common test 
> libraries that we rely upon) should be made available the same way -- as 
> actual packages installed into the environment -- rather than by resorting to 
> PYTHONPATH/sys.path sleight-of-hand, performed by such as 
> bin/set-pythonpath.sh and bin/impala-python-common.sh.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6783) Rethink the end-to-end queuing at KrpcDataStreamReceiver

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6783:
--
Target Version: Impala 3.2.0

> Rethink the end-to-end queuing at KrpcDataStreamReceiver
> 
>
> Key: IMPALA-6783
> URL: https://issues.apache.org/jira/browse/IMPALA-6783
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 2.12.0
>Reporter: Michael Ho
>Priority: Critical
>
> Follow up from IMPALA-6116. We currently bound the memory usage of service 
> queue and force a RPC to retry if the memory usage exceeds the configured 
> limit. The deserialization of row batches happen in the context of service 
> threads. The deserialized row batches are stored in a queue in the receiver 
> and its memory consumption is bound by FLAGS_exchg_node_buffer_size_bytes. 
> Exceeding that limit, we will put incoming row batches into a deferred RPC 
> queue, which will be drained by deserialization threads. This makes it hard 
> to size the service queues as its capacity may need to grow as the number of 
> nodes in the cluster grows.
> We may need to reconsider the role of service queue: it could just be a 
> transition queue before KrpcDataStreamMgr routes the incoming row batches to 
> the appropriate receivers. The actual queuing may happen in the receiver. The 
> deserialization should always happen in the context of deserialization 
> threads so the service threads will just be responsible for routing the RPC 
> requests. This allows us to keep a rather small service queue. Incoming 
> serialized row batches will always sit in a queue to be drained by 
> deserialization threads. We may still need to keep a certain number of 
> deserialized row batches around ready to be consumed. In this way, we can 
> account for the memory consumption and size the queue based on number of 
> senders and memory budget of a query.
> One hurdle is that we need to overcome the undesirable cross-thread 
> allocation pattern as rpc_context is allocated from service threads but freed 
> by the deserialization thread.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7565) Extends TAcceptQueueServer connection_setup_pool to be multi-threaded

2018-10-29 Thread Michael Ho (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667597#comment-16667597
 ] 

Michael Ho commented on IMPALA-7565:


A minor first step is to convert {{CONNECTION_SETUP_POOL_SIZE}} to a tunable 
knob with a default value of 1. In the meantime, I have yet to look more closely 
at the code in {{TAcceptQueueServer::SetupConnection()}} to see whether it is 
thread safe. At first glance, {{TSaslServerTransport::getTransport()}} and 
friends already seem to assume multi-threading, so there may not be much work. 
Of course, we need to study the code and run some tests to confirm.

> Extends TAcceptQueueServer connection_setup_pool to be multi-threaded
> -
>
> Key: IMPALA-7565
> URL: https://issues.apache.org/jira/browse/IMPALA-7565
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients, Distributed Exec
>Affects Versions: Impala 2.11.0, Impala 3.0, Impala 2.12.0
>Reporter: Michael Ho
>Priority: Blocker
>
> In {{TAcceptQueueServer.cpp}}, we currently have one thread in the 
> {{connection_setup_pool}} for handling connection establishment.
> {noformat}
>   // Only using one thread here is sufficient for performance, and it avoids 
> potential
>   // thread safety issues with the thrift code called in SetupConnection.
>   constexpr int CONNECTION_SETUP_POOL_SIZE = 1;
>   // New - this is the thread pool used to process the internal accept queue.
>   ThreadPool> connection_setup_pool("setup-server", 
> "setup-worker",
>   CONNECTION_SETUP_POOL_SIZE, FLAGS_accepted_cnxn_queue_depth,
>   [this](int tid, const shared_ptr& item) {
> this->SetupConnection(item);
>   });
> {noformat}
> While that makes the code easier to reason about, it also makes Impala less 
> robust in case a client intentionally or unintentionally freezes during 
> connection establishment. For instance, one can telnet to the beeswax or HS2 
> port of Impala (e.g. telnet localhost:21000) and then leave the telnet 
> session open. In a secure cluster with TLS enabled, Impalad will be stuck in 
> the SSL handshake. Other clients trying to connect to that port (e.g. Impala 
> shell) will hang forever.
> {noformat}
> Thread 551 (Thread 0x7fddde563700 (LWP 166354)):
> #0  0x003ce2a0e82d in read () from /lib64/libpthread.so.0
> #1  0x003ce56dea71 in ?? () from /usr/lib64/libcrypto.so.10
> #2  0x003ce56dcdc9 in BIO_read () from /usr/lib64/libcrypto.so.10
> #3  0x003ce9a31873 in ssl23_read_bytes () from /usr/lib64/libssl.so.10
> #4  0x003ce9a2fe63 in ssl23_get_client_hello () from 
> /usr/lib64/libssl.so.10
> #5  0x003ce9a302f3 in ssl23_accept () from /usr/lib64/libssl.so.10
> #6  0x0208ebd5 in 
> apache::thrift::transport::TSSLSocket::checkHandshake() ()
> #7  0x0208edbc in 
> apache::thrift::transport::TSSLSocket::read(unsigned char*, unsigned int) ()
> #8  0x0208b6f3 in unsigned int 
> apache::thrift::transport::readAll(apache::thrift::transport::TSocket&,
>  unsigned char*, unsigned int) ()
> #9  0x00cb2aa9 in 
> apache::thrift::transport::TSaslTransport::receiveSaslMessage(apache::thrift::transport::NegotiationStatus*,
>  unsigned int*) ()
> #10 0x00cb03e4 in 
> apache::thrift::transport::TSaslServerTransport::handleSaslStartMessage() ()
> #11 0x00cb2c23 in 
> apache::thrift::transport::TSaslTransport::doSaslNegotiation() ()
> #12 0x00cb10b8 in 
> apache::thrift::transport::TSaslServerTransport::Factory::getTransport(boost::shared_ptr)
>  ()
> #13 0x00b13e47 in 
> apache::thrift::server::TAcceptQueueServer::SetupConnection(boost::shared_ptr)
>  ()
> #14 0x00b14932 in 
> boost::detail::function::void_function_obj_invoker2  boost::shared_ptr const&)#1}, void, 
> int, boost::shared_ptr 
> const&>::invoke(boost::detail::function::function_buffer&, int, 
> boost::shared_ptr const&) ()
> #15 0x00b177f9 in 
> impala::ThreadPool 
> >::WorkerThread(int) ()
> #16 0x00d602af in 
> impala::Thread::SuperviseThread(std::basic_string std::char_traits, std::allocator > const&, 
> std::basic_string, std::allocator > 
> const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) ()
> #17 0x00d60aaa in boost::detail::thread_data void (*)(std::basic_string, std::allocator 
> > const&, std::basic_string, 
> std::allocator > const&, boost::function, 
> impala::ThreadDebugInfo const*, impala::Promise*), 
> boost::_bi::list5 std::char_traits, std::allocator > >, 
> boost::_bi::value, 
> std::allocator > >, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> > > >::run() ()
> #18 0x012d756a in thread_proxy ()
> #19 0x003ce2a07aa1 in start_thread () from /lib64/libpthread.so.0
> #20 0x003ce26e893d in clone () from 

[jira] [Updated] (IMPALA-6048) Queries make very slow progress and report WaitForRPC() stuck for too long

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6048:
--
Target Version: Impala 3.2.0

> Queries make very slow progress and report  WaitForRPC() stuck for too long
> ---
>
> Key: IMPALA-6048
> URL: https://issues.apache.org/jira/browse/IMPALA-6048
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Distributed Exec
>Affects Versions: Impala 2.11.0
>Reporter: Mostafa Mokhtar
>Assignee: Michael Ho
>Priority: Critical
> Attachments: Archive 2.zip
>
>
> When running 32 concurrent queries from TPCDS a couple of instances from 
> TPC-DS Q78 9 hours to finish and it appeared to be hung.
> On an idle cluster the query finished in under 5 minutes, profiles attached. 
> When the query ran for long fragments reported +16 hours of network 
> send/receive time
> The logs show there is a lot of messages like the one below, there are 
> incidents for this log message where a node waited too long from an RPC from 
> itself
> {code}
> W1012 00:47:57.633549 117475 krpc-data-stream-sender.cc:360] XXX: 
> WaitForRPC() stuck for too long address=10.17.234.37:29000 
> fragment_instace_id_=1e48ef897e797131:2f05789b05eb dest_node_id_=24 
> sender_id_=81
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6692) When partition exchange is followed by sort each sort node becomes a synchronization point across the cluster

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6692:
--
Target Version: Impala 3.2.0

> When partition exchange is followed by sort each sort node becomes a 
> synchronization point across the cluster
> -
>
> Key: IMPALA-6692
> URL: https://issues.apache.org/jira/browse/IMPALA-6692
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Distributed Exec
>Affects Versions: Impala 2.10.0
>Reporter: Mostafa Mokhtar
>Priority: Critical
>  Labels: perf, resource-management
> Attachments: Kudu table insert without KRPC no sort.txt, Kudu table 
> insert without KRPC.txt, kudu_partial_sort_insert_vd1129.foo.com_2.txt
>
>
> Issue described in this JIRA applies to 
> * Analytical functions
> * Writes to Partitioned Parquet tables
> * Writes to Kudu tables
> When inserting into a Kudu table from Impala the plan is something like HDFS 
> SCAN -> Partition Exchange -> Partial Sort -> Kudu Insert.
> The query initially makes good progress then significantly slows down and 
> very few nodes make progress.
> While the insert is running the query goes through different phases 
> * Phase 1
> ** Scan is reading data fast, sending data through to exchange 
> ** Partial Sort keeps accumulating batches
> ** Network and CPU is busy, life appears to be OK
> * Phase 2
> ** One of the Sort operators reaches its memory limit and stops calling 
> ExchangeNode::GetNext for a while
> ** This creates back pressure against the DataStreamSenders
> ** The Partial Sort doesn't call GetNext until it has finished sorting GBs of 
> data (Partial sort memory is unbounded as of 03/16/2018)
> ** All exchange operators in the cluster eventually get blocked on that Sort 
> operator and can no longer make progress
> ** After a while the Sort is able to accept more batches which temporarily 
> unblocks execution across the cluster
> ** Another sort operator reaches its memory limit and this loop repeats itself
> Below are stacks from one of the blocked hosts
> _Sort node waiting on data from exchange node as it didn't start sorting 
> since the memory limit for the sort wasn't reached_
> {code}
> Thread 90 (Thread 0x7f8d7d233700 (LWP 21625)):
> #0  0x003a6f00b68c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x7fab1422174c in 
> std::condition_variable::wait(std::unique_lock&) () from 
> /opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.205/lib/impala/lib/libstdc++.so.6
> #2  0x00b4d5aa in void 
> std::_V2::condition_variable_any::wait 
> >(boost::unique_lock&) ()
> #3  0x00b4ab6a in 
> impala::KrpcDataStreamRecvr::SenderQueue::GetBatch(impala::RowBatch**) ()
> #4  0x00b4b0c8 in 
> impala::KrpcDataStreamRecvr::GetBatch(impala::RowBatch**) ()
> #5  0x00dca7c5 in 
> impala::ExchangeNode::FillInputRowBatch(impala::RuntimeState*) ()
> #6  0x00dcacae in 
> impala::ExchangeNode::GetNext(impala::RuntimeState*, impala::RowBatch*, 
> bool*) ()
> #7  0x01032ac3 in 
> impala::PartialSortNode::GetNext(impala::RuntimeState*, impala::RowBatch*, 
> bool*) ()
> #8  0x00ba9c92 in impala::FragmentInstanceState::ExecInternal() ()
> #9  0x00bac7df in impala::FragmentInstanceState::Exec() ()
> #10 0x00b9ab1a in 
> impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) ()
> #11 0x00d5da9f in 
> impala::Thread::SuperviseThread(std::basic_string std::char_traits, std::allocator > const&, 
> std::basic_string, std::allocator > 
> const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) ()
> #12 0x00d5e29a in boost::detail::thread_data void (*)(std::basic_string, std::allocator 
> > const&, std::basic_string, 
> std::allocator > const&, boost::function, 
> impala::ThreadDebugInfo const*, impala::Promise*), 
> boost::_bi::list5 std::char_traits, std::allocator > >, 
> boost::_bi::value, 
> std::allocator > >, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> > > >::run() ()
> #13 0x012d70ba in thread_proxy ()
> #14 0x003a6f007aa1 in start_thread () from /lib64/libpthread.so.0
> #15 0x003a6ece893d in clone () from /lib64/libc.so.6
> {code}
> _DataStreamSender blocked due to back pressure from the DataStreamRecvr on 
> the node which has a Sort that is spilling_
> {code}
> Thread 89 (Thread 0x7fa8f6a15700 (LWP 21626)):
> #0  0x003a6f00ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x01237e77 in 
> impala::KrpcDataStreamSender::Channel::WaitForRpc(std::unique_lock*)
>  ()
> #2  0x01238b8d in 
> impala::KrpcDataStreamSender::Channel::TransmitData(impala::OutboundRowBatch 
> 

[jira] [Updated] (IMPALA-6294) Concurrent hung with lots of spilling make slow progress due to blocking in DataStreamRecvr and DataStreamSender

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6294:
--
Target Version: Impala 3.2.0

> Concurrent hung with lots of spilling make slow progress due to blocking in 
> DataStreamRecvr and DataStreamSender
> 
>
> Key: IMPALA-6294
> URL: https://issues.apache.org/jira/browse/IMPALA-6294
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.11.0
>Reporter: Mostafa Mokhtar
>Assignee: Michael Ho
>Priority: Critical
> Attachments: IMPALA-6285 TPCDS Q3 slow broadcast, 
> slow_broadcast_q3_reciever.txt, slow_broadcast_q3_sender.txt
>
>
> While running a highly concurrent spilling workload on a large cluster 
> queries start running slower, even light weight queries that are not running 
> are affected by this slow down. 
> {code}
>   EXCHANGE_NODE (id=9):(Total: 3m1s, non-child: 3m1s, % non-child: 
> 100.00%)
>  - ConvertRowBatchTime: 999.990us
>  - PeakMemoryUsage: 0
>  - RowsReturned: 108.00K (108001)
>  - RowsReturnedRate: 593.00 /sec
> DataStreamReceiver:
>   BytesReceived(4s000ms): 254.47 KB, 338.82 KB, 338.82 KB, 852.43 
> KB, 1.32 MB, 1.33 MB, 1.50 MB, 2.53 MB, 2.99 MB, 3.00 MB, 3.00 MB, 3.00 MB, 
> 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.16 MB, 3.49 MB, 3.80 
> MB, 4.15 MB, 4.55 MB, 4.84 MB, 4.99 MB, 5.07 MB, 5.41 MB, 5.75 MB, 5.92 MB, 
> 6.00 MB, 6.00 MB, 6.00 MB, 6.07 MB, 6.28 MB, 6.33 MB, 6.43 MB, 6.67 MB, 6.91 
> MB, 7.29 MB, 8.03 MB, 9.12 MB, 9.68 MB, 9.90 MB, 9.97 MB, 10.44 MB, 11.25 MB
>- BytesReceived: 11.73 MB (12301692)
>- DeserializeRowBatchTimer: 957.990ms
>- FirstBatchArrivalWaitTime: 0.000ns
>- PeakMemoryUsage: 644.44 KB (659904)
>- SendersBlockedTimer: 0.000ns
>- SendersBlockedTotalTimer(*): 0.000ns
> {code}
> {code}
> DataStreamSender (dst_id=9):(Total: 1s819ms, non-child: 1s819ms, % 
> non-child: 100.00%)
>- BytesSent: 234.64 MB (246033840)
>- NetworkThroughput(*): 139.58 MB/sec
>- OverallThroughput: 128.92 MB/sec
>- PeakMemoryUsage: 33.12 KB (33920)
>- RowsReturned: 108.00K (108001)
>- SerializeBatchTime: 133.998ms
>- TransmitDataRPCTime: 1s680ms
>- UncompressedRowBatchSize: 446.42 MB (468102200)
> {code}
> Timeouts seen in IMPALA-6285 are caused by this issue
> {code}
> I1206 12:44:14.925405 25274 status.cc:58] RPC recv timed out: Client 
> foo-17.domain.com:22000 timed-out during recv call.
> @   0x957a6a  impala::Status::Status()
> @  0x11dd5fe  
> impala::DataStreamSender::Channel::DoTransmitDataRpc()
> @  0x11ddcd4  
> impala::DataStreamSender::Channel::TransmitDataHelper()
> @  0x11de080  impala::DataStreamSender::Channel::TransmitData()
> @  0x11e1004  impala::ThreadPool<>::WorkerThread()
> @   0xd10063  impala::Thread::SuperviseThread()
> @   0xd107a4  boost::detail::thread_data<>::run()
> @  0x128997a  (unknown)
> @ 0x7f68c5bc7e25  start_thread
> @ 0x7f68c58f534d  __clone
> {code}
> A similar behavior was also observed with KRPC enabled IMPALA-6048



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7183) We should print the sender name when logging a report for an unknown status report on the coordinatior

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7183:
--
Target Version: Impala 3.2.0

> We should print the sender name when logging a report for an unknown status 
> report on the coordinatior
> --
>
> Key: IMPALA-7183
> URL: https://issues.apache.org/jira/browse/IMPALA-7183
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Distributed Exec
>Affects Versions: Impala 2.13.0, Impala 3.1.0
>Reporter: Lars Volker
>Assignee: Michal Ostrowski
>Priority: Critical
>  Labels: ramp-up
>
> We should print the sender name when logging a report for an unknown status 
> report on the coordinator in 
> [impala-server.cc:1229|https://github.com/apache/impala/blob/e7d5a25a4516337ef651983b1d945abf06c3a831/be/src/service/impala-server.cc#L1229].
> That will help identify backends with stuck fragment instances that fail to 
> get cancelled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7670) Drop table with a concurrent refresh throws ConcurrentModificationException

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7670:
--
Target Version: Impala 3.2.0

> Drop table with a concurrent refresh throws ConcurrentModificationException
> ---
>
> Key: IMPALA-7670
> URL: https://issues.apache.org/jira/browse/IMPALA-7670
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.1.0
>Reporter: bharath v
>Assignee: Tianyi Wang
>Priority: Critical
>
> * This bug was found on a V2 Catalog and probably also applies to V1.
> Saw this in the Catalog server.
> {noformat}
> I1004 16:38:55.236702 85380 jni-util.cc:308] 
> java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextNode(HashMap.java:1442)
> at java.util.HashMap$ValueIterator.next(HashMap.java:1471)
> at 
> org.apache.impala.catalog.FeFsTable$Utils.getPartitionFromThriftPartitionSpec(FeFsTable.java:407)
> at 
> org.apache.impala.catalog.HdfsTable.getPartitionFromThriftPartitionSpec(HdfsTable.java:694)
> at 
> org.apache.impala.catalog.Catalog.getHdfsPartition(Catalog.java:407)
> at 
> org.apache.impala.catalog.Catalog.getHdfsPartition(Catalog.java:386)
> at 
> org.apache.impala.service.CatalogOpExecutor.bulkAlterPartitions(CatalogOpExecutor.java:3193)
> at 
> org.apache.impala.service.CatalogOpExecutor.dropTableStats(CatalogOpExecutor.java:1255)
> at 
> org.apache.impala.service.CatalogOpExecutor.dropStats(CatalogOpExecutor.java:1148)
> at 
> org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:301)
> at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:157)
> {noformat}
> Still need to dig into it, but seems like something is off with locking 
> somewhere.
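> As an illustrative sketch of the failure mode (not the actual catalog code; 
> the class and field names below are invented), the exception comes from 
> iterating a plain HashMap while a concurrent writer mutates it, and the 
> usual fix is to iterate a snapshot taken under the same lock the writers 
> hold:
> {code}
> import java.util.ArrayList;
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
> 
> // Minimal illustration of the race and one conventional fix.
> class PartitionMapSketch {
>   private final Map<Long, String> partitions_ = new HashMap<>();
>   private final Object lock_ = new Object();
> 
>   void addPartition(long id, String name) {
>     synchronized (lock_) { partitions_.put(id, name); }
>   }
> 
>   // Unsafe: a concurrent addPartition()/drop invalidates the iterator and
>   // throws ConcurrentModificationException.
>   String findUnsafe(String name) {
>     for (String p : partitions_.values()) {
>       if (p.equals(name)) return p;
>     }
>     return null;
>   }
> 
>   // Safe: copy the values under the writers' lock, then iterate the copy.
>   String findSafe(String name) {
>     List<String> snapshot;
>     synchronized (lock_) { snapshot = new ArrayList<>(partitions_.values()); }
>     for (String p : snapshot) {
>       if (p.equals(name)) return p;
>     }
>     return null;
>   }
> }
> {code}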



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7282) Sentry privilege disappears after a catalog refresh

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7282:
--
Target Version: Impala 3.2.0

> Sentry privilege disappears after a catalog refresh
> ---
>
> Key: IMPALA-7282
> URL: https://issues.apache.org/jira/browse/IMPALA-7282
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Security
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Fredy Wijaya
>Priority: Critical
>  Labels: security
>
> {noformat}
> [localhost:21000] default> grant select on database functional to role 
> foo_role;
> Query: grant select on database functional to role foo_role
> +---------------------------------+
> | summary                         |
> +---------------------------------+
> | Privilege(s) have been granted. |
> +---------------------------------+
> Fetched 1 row(s) in 0.05s
> [localhost:21000] default> grant all on database functional to role foo_role;
> Query: grant all on database functional to role foo_role
> +---------------------------------+
> | summary                         |
> +---------------------------------+
> | Privilege(s) have been granted. |
> +---------------------------------+
> Fetched 1 row(s) in 0.03s
> [localhost:21000] default> show grant role foo_role;
> Query: show grant role foo_role
> +----------+------------+-------+--------+-----+-----------+--------------+-------------------------------+
> | scope    | database   | table | column | uri | privilege | grant_option | create_time                   |
> +----------+------------+-------+--------+-----+-----------+--------------+-------------------------------+
> | database | functional |       |        |     | select    | false        | NULL                          |
> | database | functional |       |        |     | all       | false        | NULL                          |
> +----------+------------+-------+--------+-----+-----------+--------------+-------------------------------+
> Fetched 2 row(s) in 0.02s
> [localhost:21000] default> show grant role foo_role;
> Query: show grant role foo_role
> +----------+------------+-------+--------+-----+-----------+--------------+-------------------------------+
> | scope    | database   | table | column | uri | privilege | grant_option | create_time                   |
> +----------+------------+-------+--------+-----+-----------+--------------+-------------------------------+
> | database | functional |       |        |     | all       | false        | Wed, Jul 11 2018 15:38:41.113 |
> +----------+------------+-------+--------+-----+-----------+--------------+-------------------------------+
> Fetched 1 row(s) in 0.01s
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-4018) Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... FORMAT <template>)

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-4018:
--
Target Version: Product Backlog

> Add support for SQL:2016 datetime templates/patterns/masks to CAST(... AS ... 
> FORMAT <template>)
> 
>
> Key: IMPALA-4018
> URL: https://issues.apache.org/jira/browse/IMPALA-4018
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Affects Versions: Impala 2.2.4
>Reporter: Greg Rahn
>Priority: Critical
>  Labels: ansi-sql, compatibility, sql-language
>
> *Summary*
> The format masks/templates currently are implemented using the [Java 
> SimpleDateFormat 
> patterns|http://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html],
>  and although this is what Hive has implemented, it is not what most standard 
> SQL systems implement.  For example see 
> [Vertica|https://my.vertica.com/docs/7.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Formatting/TemplatePatternsForDateTimeFormatting.htm],
>  
> [Netezza|http://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_ntz_sql_extns_templ_patterns_date_time_conv.html],
>   
> [Oracle|https://docs.oracle.com/database/121/SQLRF/sql_elements004.htm#SQLRF00212],
>  and 
> [PostgreSQL|https://www.postgresql.org/docs/9.5/static/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIME-TABLE].
>  
> *Examples of incompatibilities*
> {noformat}
> -- PostgreSQL/Netezza/Vertica/Oracle
> select to_timestamp('May 15, 2015 12:00:00', 'mon dd, yyyy hh:mi:ss');
> -- Impala
> select to_timestamp('May 15, 2015 12:00:00', 'MMM dd, yyyy HH:mm:ss');
> -- PostgreSQL/Netezza/Vertica/Oracle
> select to_timestamp('2015-02-14 20:19:07','yyyy-mm-dd hh24:mi:ss');
> -- Impala
> select to_timestamp('2015-02-14 20:19:07','yyyy-MM-dd HH:mm:ss');
> -- Vertica/Oracle
> select to_timestamp('2015-02-14 20:19:07.123456','yyyy-mm-dd hh24:mi:ss.ff');
> -- Impala
> select to_timestamp('2015-02-14 20:19:07.123456','yyyy-MM-dd HH:mm:ss.SSSSSS');
> {noformat}
> *Considerations*
> Because this is a change in default behavior for to_timestamp(), if possible, 
> having a feature flag to revert to the legacy Java SimpleDateFormat patterns 
> should be strongly considered.  This would allow users to choose the behavior 
> they desire and scope it to a session if need be.
> SQL:2016 defines the following datetime templates
> {noformat}
> <datetime template> ::=
>   { <datetime template field> }...
> <datetime template field> ::=
>     <datetime template datetime field>
>   | <datetime template delimiter>
> <datetime template datetime field> ::=
>     <datetime template year>
>   | <datetime template rounded year>
>   | <datetime template month>
>   | <datetime template day of month>
>   | <datetime template day of year>
>   | <datetime template 12-hour>
>   | <datetime template 24-hour>
>   | <datetime template minute>
>   | <datetime template second of minute>
>   | <datetime template second of day>
>   | <datetime template fraction>
>   | <datetime template am/pm>
>   | <datetime template time zone hour>
>   | <datetime template time zone minute>
> <datetime template delimiter> ::=
>     <minus sign>
>   | <period>
>   | <solidus>
>   | <comma>
>   | <apostrophe>
>   | <semicolon>
>   | <colon>
>   | <space>
> <datetime template year> ::=
>   YYYY | YYY | YY | Y
> <datetime template rounded year> ::=
>   RRRR | RR
> <datetime template month> ::=
>   MM
> <datetime template day of month> ::=
>   DD
> <datetime template day of year> ::=
>   DDD
> <datetime template 12-hour> ::=
>   HH | HH12
> <datetime template 24-hour> ::=
>   HH24
> <datetime template minute> ::=
>   MI
> <datetime template second of minute> ::=
>   SS
> <datetime template second of day> ::=
>   SSSSS
> <datetime template fraction> ::=
>   FF1 | FF2 | FF3 | FF4 | FF5 | FF6 | FF7 | FF8 | FF9
> <datetime template am/pm> ::=
>   A.M. | P.M.
> <datetime template time zone hour> ::=
>   TZH
> <datetime template time zone minute> ::=
>   TZM
> {noformat}
> SQL:2016 also introduced the FORMAT clause for CAST which is the standard way 
> to do string <> datetime conversions
> {noformat}
> <cast specification> ::=
>   CAST <left paren> <cast operand>
>    AS <cast target>
>   [ FORMAT <cast template> ]
>   <right paren>
> <cast operand> ::=
>     <value expression>
>   | <implicitly typed value specification>
> <cast target> ::=
>     <domain name>
>   | <data type>
> <cast template> ::=
>   <datetime template>
> {noformat}
> For example:
> {noformat}
> CAST(<datetime> AS <char string type> [FORMAT <template>])
> CAST(<char string> AS <datetime type> [FORMAT <template>])
> cast(dt as string format 'DD-MM-YYYY')
> cast('01-05-2017' as date format 'DD-MM-YYYY')
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-5746) Remote fragments continue to hold onto memory after stopping the coordinator daemon

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-5746:
--
Target Version: Product Backlog

Per Michael's comment, it doesn't sound like we're going to tackle this right 
now.

> Remote fragments continue to hold onto memory after stopping the coordinator 
> daemon
> ---
>
> Key: IMPALA-5746
> URL: https://issues.apache.org/jira/browse/IMPALA-5746
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.10.0
>Reporter: Mostafa Mokhtar
>Assignee: Michael Ho
>Priority: Critical
> Attachments: remote_fragments_holding_memory.txt
>
>
> Repro 
> # Start running queries 
> # Kill the coordinator node 
> # On the running Impalad check the memz tab, remote fragments continue to run 
> and hold on to resources
> Remote fragments held on to memory +30 minutes after stopping the coordinator 
> service. 
> Attached thread dump from an Impalad running remote fragments .
> Snapshot of memz tab 30 minutes after killing the coordinator
> {code}
> Process: Limit=201.73 GB Total=5.32 GB Peak=179.36 GB
>   Free Disk IO Buffers: Total=1.87 GB Peak=1.87 GB
>   RequestPool=root.default: Total=1.35 GB Peak=178.51 GB
> Query(f64169d4bb3c901c:3a21d8ae): Total=2.64 MB Peak=104.73 MB
>   Fragment f64169d4bb3c901c:3a21d8ae0051: Total=2.64 MB Peak=2.67 MB
> AGGREGATION_NODE (id=15): Total=2.54 MB Peak=2.57 MB
>   Exprs: Total=30.12 KB Peak=30.12 KB
> EXCHANGE_NODE (id=14): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=12.29 KB
> DataStreamSender (dst_id=17): Total=85.31 KB Peak=85.31 KB
> CodeGen: Total=1.53 KB Peak=374.50 KB
>   Block Manager: Limit=161.39 GB Total=512.00 KB Peak=1.54 MB
> Query(2a4f12b3b4b1dc8c:db7e8cf2): Total=258.29 MB Peak=412.98 MB
>   Fragment 2a4f12b3b4b1dc8c:db7e8cf2008c: Total=2.29 MB Peak=2.29 MB
> SORT_NODE (id=11): Total=4.00 KB Peak=4.00 KB
> AGGREGATION_NODE (id=20): Total=2.27 MB Peak=2.27 MB
>   Exprs: Total=25.12 KB Peak=25.12 KB
> EXCHANGE_NODE (id=19): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=21): Total=3.88 KB Peak=3.88 KB
> CodeGen: Total=4.17 KB Peak=1.05 MB
>   Block Manager: Limit=161.39 GB Total=256.25 MB Peak=321.66 MB
> Query(68421d2a5dea0775:83f5d972): Total=282.77 MB Peak=443.53 MB
>   Fragment 68421d2a5dea0775:83f5d972004a: Total=26.77 MB Peak=26.92 MB
> SORT_NODE (id=8): Total=8.00 KB Peak=8.00 KB
>   Exprs: Total=4.00 KB Peak=4.00 KB
> ANALYTIC_EVAL_NODE (id=7): Total=4.00 KB Peak=4.00 KB
>   Exprs: Total=4.00 KB Peak=4.00 KB
> SORT_NODE (id=6): Total=24.00 MB Peak=24.00 MB
> AGGREGATION_NODE (id=12): Total=2.72 MB Peak=2.83 MB
>   Exprs: Total=85.12 KB Peak=85.12 KB
> EXCHANGE_NODE (id=11): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=84.80 KB
> DataStreamSender (dst_id=13): Total=1.27 KB Peak=1.27 KB
> CodeGen: Total=24.80 KB Peak=4.13 MB
>   Block Manager: Limit=161.39 GB Total=280.50 MB Peak=286.52 MB
> Query(e94c89fa89a74d27:82812bf9): Total=258.29 MB Peak=436.85 MB
>   Fragment e94c89fa89a74d27:82812bf9008e: Total=2.29 MB Peak=2.29 MB
> SORT_NODE (id=11): Total=4.00 KB Peak=4.00 KB
> AGGREGATION_NODE (id=20): Total=2.27 MB Peak=2.27 MB
>   Exprs: Total=25.12 KB Peak=25.12 KB
> EXCHANGE_NODE (id=19): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=21): Total=3.88 KB Peak=3.88 KB
> CodeGen: Total=4.17 KB Peak=1.05 MB
>   Block Manager: Limit=161.39 GB Total=256.25 MB Peak=321.62 MB
> Query(4e43dad3bdc935d8:938b8b7e): Total=2.65 MB Peak=105.60 MB
>   Fragment 4e43dad3bdc935d8:938b8b7e0052: Total=2.65 MB Peak=2.68 MB
> AGGREGATION_NODE (id=15): Total=2.55 MB Peak=2.57 MB
>   Exprs: Total=30.12 KB Peak=30.12 KB
> EXCHANGE_NODE (id=14): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=13.68 KB
> DataStreamSender (dst_id=17): Total=91.41 KB Peak=91.41 KB
> CodeGen: Total=1.53 KB Peak=374.50 KB
>   Block Manager: Limit=161.39 GB Total=512.00 KB Peak=1.30 MB
> Query(b34bdd65f1ed017e:5a0291bd): Total=2.37 MB Peak=106.56 MB
>   Fragment b34bdd65f1ed017e:5a0291bd004b: Total=2.37 MB Peak=2.37 MB
> SORT_NODE (id=6): Total=4.00 KB Peak=4.00 KB
> AGGREGATION_NODE (id=10): Total=2.35 MB Peak=2.35 MB
>   Exprs: Total=34.12 KB Peak=34.12 KB
> EXCHANGE_NODE (id=9): Total=0 Peak=0
> DataStreamRecvr: 

[jira] [Updated] (IMPALA-5746) Remote fragments continue to hold onto memory after stopping the coordinator daemon

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-5746:
--
Target Version: Impala 3.2.0  (was: Product Backlog)

> Remote fragments continue to hold onto memory after stopping the coordinator 
> daemon
> ---
>
> Key: IMPALA-5746
> URL: https://issues.apache.org/jira/browse/IMPALA-5746
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.10.0
>Reporter: Mostafa Mokhtar
>Assignee: Michael Ho
>Priority: Critical
> Attachments: remote_fragments_holding_memory.txt
>
>
> Repro 
> # Start running queries 
> # Kill the coordinator node 
> # On the running Impalad check the memz tab, remote fragments continue to run 
> and hold on to resources
> Remote fragments held on to memory +30 minutes after stopping the coordinator 
> service. 
> Attached thread dump from an Impalad running remote fragments .
> Snapshot of memz tab 30 minutes after killing the coordinator
> {code}
> Process: Limit=201.73 GB Total=5.32 GB Peak=179.36 GB
>   Free Disk IO Buffers: Total=1.87 GB Peak=1.87 GB
>   RequestPool=root.default: Total=1.35 GB Peak=178.51 GB
> Query(f64169d4bb3c901c:3a21d8ae): Total=2.64 MB Peak=104.73 MB
>   Fragment f64169d4bb3c901c:3a21d8ae0051: Total=2.64 MB Peak=2.67 MB
> AGGREGATION_NODE (id=15): Total=2.54 MB Peak=2.57 MB
>   Exprs: Total=30.12 KB Peak=30.12 KB
> EXCHANGE_NODE (id=14): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=12.29 KB
> DataStreamSender (dst_id=17): Total=85.31 KB Peak=85.31 KB
> CodeGen: Total=1.53 KB Peak=374.50 KB
>   Block Manager: Limit=161.39 GB Total=512.00 KB Peak=1.54 MB
> Query(2a4f12b3b4b1dc8c:db7e8cf2): Total=258.29 MB Peak=412.98 MB
>   Fragment 2a4f12b3b4b1dc8c:db7e8cf2008c: Total=2.29 MB Peak=2.29 MB
> SORT_NODE (id=11): Total=4.00 KB Peak=4.00 KB
> AGGREGATION_NODE (id=20): Total=2.27 MB Peak=2.27 MB
>   Exprs: Total=25.12 KB Peak=25.12 KB
> EXCHANGE_NODE (id=19): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=21): Total=3.88 KB Peak=3.88 KB
> CodeGen: Total=4.17 KB Peak=1.05 MB
>   Block Manager: Limit=161.39 GB Total=256.25 MB Peak=321.66 MB
> Query(68421d2a5dea0775:83f5d972): Total=282.77 MB Peak=443.53 MB
>   Fragment 68421d2a5dea0775:83f5d972004a: Total=26.77 MB Peak=26.92 MB
> SORT_NODE (id=8): Total=8.00 KB Peak=8.00 KB
>   Exprs: Total=4.00 KB Peak=4.00 KB
> ANALYTIC_EVAL_NODE (id=7): Total=4.00 KB Peak=4.00 KB
>   Exprs: Total=4.00 KB Peak=4.00 KB
> SORT_NODE (id=6): Total=24.00 MB Peak=24.00 MB
> AGGREGATION_NODE (id=12): Total=2.72 MB Peak=2.83 MB
>   Exprs: Total=85.12 KB Peak=85.12 KB
> EXCHANGE_NODE (id=11): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=84.80 KB
> DataStreamSender (dst_id=13): Total=1.27 KB Peak=1.27 KB
> CodeGen: Total=24.80 KB Peak=4.13 MB
>   Block Manager: Limit=161.39 GB Total=280.50 MB Peak=286.52 MB
> Query(e94c89fa89a74d27:82812bf9): Total=258.29 MB Peak=436.85 MB
>   Fragment e94c89fa89a74d27:82812bf9008e: Total=2.29 MB Peak=2.29 MB
> SORT_NODE (id=11): Total=4.00 KB Peak=4.00 KB
> AGGREGATION_NODE (id=20): Total=2.27 MB Peak=2.27 MB
>   Exprs: Total=25.12 KB Peak=25.12 KB
> EXCHANGE_NODE (id=19): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=21): Total=3.88 KB Peak=3.88 KB
> CodeGen: Total=4.17 KB Peak=1.05 MB
>   Block Manager: Limit=161.39 GB Total=256.25 MB Peak=321.62 MB
> Query(4e43dad3bdc935d8:938b8b7e): Total=2.65 MB Peak=105.60 MB
>   Fragment 4e43dad3bdc935d8:938b8b7e0052: Total=2.65 MB Peak=2.68 MB
> AGGREGATION_NODE (id=15): Total=2.55 MB Peak=2.57 MB
>   Exprs: Total=30.12 KB Peak=30.12 KB
> EXCHANGE_NODE (id=14): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=13.68 KB
> DataStreamSender (dst_id=17): Total=91.41 KB Peak=91.41 KB
> CodeGen: Total=1.53 KB Peak=374.50 KB
>   Block Manager: Limit=161.39 GB Total=512.00 KB Peak=1.30 MB
> Query(b34bdd65f1ed017e:5a0291bd): Total=2.37 MB Peak=106.56 MB
>   Fragment b34bdd65f1ed017e:5a0291bd004b: Total=2.37 MB Peak=2.37 MB
> SORT_NODE (id=6): Total=4.00 KB Peak=4.00 KB
> AGGREGATION_NODE (id=10): Total=2.35 MB Peak=2.35 MB
>   Exprs: Total=34.12 KB Peak=34.12 KB
> EXCHANGE_NODE (id=9): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=4.23 KB
> DataStreamSender (dst_id=11): 

[jira] [Resolved] (IMPALA-5122) Impala and special character

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-5122.
---
Resolution: Invalid

Hi [~Tina],
  I'm going to close this purely because we don't track issues with non-Apache 
drivers here. 

Putting on my Cloudera hat for a second, I'd recommend you ask here or via a 
support channel: 
http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/bd-p/Impala . I'm 
not an expert on this driver, but I did take a quick look at the docs and it 
seems like there's an option UseSQLUnicodeTypes that will interpret values as 
UTF-8: 
https://www.cloudera.com/documentation/other/connectors/impala-odbc/latest/Cloudera-ODBC-Driver-for-Impala-Install-Guide.pdf
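
If that option applies here, the change would presumably be a one-line 
addition to the existing DSN entry in odbc.ini (a sketch only; verify the key 
name and accepted values against the linked install guide for your driver 
version):
{noformat}
[Impala]
...existing keys unchanged...
UseSQLUnicodeTypes=1
{noformat}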

> Impala and special character
> 
>
> Key: IMPALA-5122
> URL: https://issues.apache.org/jira/browse/IMPALA-5122
> Project: IMPALA
>  Issue Type: Question
>  Components: Clients
> Environment: linux
>Reporter: Tina Avef
>Priority: Critical
>
> Hi,
> We have an Impala ODBC connection in an odbc.ini file and have a problem 
> seeing some special characters (Scandinavian characters) correctly in tables 
> when we are connected to a database. Are there any options that can be 
> defined to change the encoding (in the following example)? Here is an 
> example of the ODBC connection we have:
> [Impala]
> Description=Cloudera ODBC Driver for Impala (32-bit) DSN
> Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
> HOST=dd300en09.bbd.net
> PORT=21053
> Database=default
> AuthMech=1
> KrbFQDN=dd300bn09.bbd.net
> KrbRealm=ABC.DEF.COM
> KrbServiceName=impala
> UID=
> PWD=
> CAIssuedCertNamesMismatch=1
> TSaslTransportBufSize=1000
> RowsFetchedPerBlock=1
> SocketTimeout=0
> StringColumnLength=32767
> UseNativeQuery=0 
> Or is there any other way to resolve this issue?
> Regards,
> Tina



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-5765) Flaky tpc-ds data loading

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-5765:
--
Target Version: Impala 2.12.0, Product Backlog  (was: Impala 2.12.0)
  Priority: Major  (was: Critical)

> Flaky tpc-ds data loading
> -
>
> Key: IMPALA-5765
> URL: https://issues.apache.org/jira/browse/IMPALA-5765
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.10.0
>Reporter: Matthew Jacobs
>Assignee: Philip Zeyliger
>Priority: Major
>  Labels: flaky
>
> Saw this on a number of gerrit-verify-dryrun jobs:
> {code}
> 23:49:37 Loading TPC-DS data (logging to 
> /home/ubuntu/Impala/logs/data_loading/load-tpcds.log)... 
> 23:55:39 FAILED (Took: 6 min 2 sec)
> 23:55:39 'load-data tpcds core' failed. Tail of log:
> 23:55:39 ss_net_profit,
> 23:55:39 ss_sold_date_sk
> 23:55:39 from store_sales_unpartitioned
> 23:55:39 WHERE ss_sold_date_sk < 2451272
> 23:55:39 distribute by ss_sold_date_sk
> 23:55:39 INFO  : Query ID = 
> ubuntu_2017073123_26963c6a-a58b-4cad-b0c7-c3790f9b22dc
> 23:55:39 INFO  : Total jobs = 1
> 23:55:39 INFO  : Launching Job 1 out of 1
> 23:55:39 INFO  : Starting task [Stage-1:MAPRED] in serial mode
> 23:55:39 INFO  : Number of reduce tasks not specified. Estimated from input 
> data size: 2
> 23:55:39 INFO  : In order to change the average load for a reducer (in bytes):
> 23:55:39 INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
> 23:55:39 INFO  : In order to limit the maximum number of reducers:
> 23:55:39 INFO  :   set hive.exec.reducers.max=<number>
> 23:55:39 INFO  : In order to set a constant number of reducers:
> 23:55:39 INFO  :   set mapreduce.job.reduces=<number>
> 23:55:39 INFO  : number of splits:2
> 23:55:39 INFO  : Submitting tokens for job: job_local1252085428_0826
> 23:55:39 INFO  : The url to track the job: http://localhost:8080/
> 23:55:39 INFO  : Job running in-process (local Hadoop)
> 23:55:39 INFO  : 2017-07-31 23:55:06,606 Stage-1 map = 0%,  reduce = 0%
> 23:55:39 INFO  : 2017-07-31 23:55:13,609 Stage-1 map = 100%,  reduce = 0%
> 23:55:39 INFO  : 2017-07-31 23:55:28,621 Stage-1 map = 100%,  reduce = 33%
> 23:55:39 ERROR : Ended Job = job_local1252085428_0826 with errors
> 23:55:39 ERROR : FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> 23:55:39 INFO  : MapReduce Jobs Launched: 
> 23:55:39 INFO  : Stage-Stage-1:  HDFS Read: 26483258512 HDFS Write: 
> 19378762131 FAIL
> 23:55:39 INFO  : Total MapReduce CPU Time Spent: 0 msec
> 23:55:39 INFO  : Completed executing 
> command(queryId=ubuntu_2017073123_26963c6a-a58b-4cad-b0c7-c3790f9b22dc); 
> Time taken: 33.276 seconds
> 23:55:39 Error: Error while processing statement: FAILED: Execution Error, 
> return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask 
> (state=08S01,code=2)
> 23:55:39 java.sql.SQLException: Error while processing statement: FAILED: 
> Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> 23:55:39  at 
> org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:292)
> 23:55:39  at 
> org.apache.hive.beeline.Commands.executeInternal(Commands.java:989)
> 23:55:39  at org.apache.hive.beeline.Commands.execute(Commands.java:1203)
> 23:55:39  at org.apache.hive.beeline.Commands.sql(Commands.java:1117)
> 23:55:39  at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1176)
> 23:55:39  at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1010)
> 23:55:39  at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:987)
> 23:55:39  at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:914)
> 23:55:39  at 
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:518)
> 23:55:39  at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501)
> 23:55:39  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 23:55:39  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 23:55:39  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 23:55:39  at java.lang.reflect.Method.invoke(Method.java:606)
> 23:55:39  at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> 23:55:39  at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> 23:55:39 
> 23:55:39 Closing: 0: jdbc:hive2://localhost:11050/default;auth=none
> 23:55:39 Error executing file from Hive: load-tpcds-core-hive-generated.sql
> 23:55:39 Error in /home/ubuntu/Impala/testdata/bin/create-load-data.sh at 
> line 48: LOAD_DATA_ARGS=""
> {code}
> https://jenkins.impala.io/job/ubuntu-14.04-from-scratch/1827/
> It's been reported a few times in the last week. Here's another failed job 
> reported on dev@:
> 

[jira] [Updated] (IMPALA-6955) Timeout when starting test_query_expiration custom cluster

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6955:
--
Target Version: Impala 3.2.0
Labels: broken-build flaky  (was: )
  Priority: Critical  (was: Blocker)

> Timeout when starting test_query_expiration custom cluster
> --
>
> Key: IMPALA-6955
> URL: https://issues.apache.org/jira/browse/IMPALA-6955
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Vuk Ercegovac
>Priority: Critical
>  Labels: broken-build, flaky
>
> Ran into the following crash on a RHEL test recently:
> {noformat}
> Error starting cluster: num_known_live_backends did not reach expected value 
> in time{noformat}
> Backtrace:
> {noformat}
> #0 0x7f92365185c9 in raise () from /lib64/libc.so.6
> #1 0x7f9236519cd8 in abort () from /lib64/libc.so.6
> #2 0x7f92393841a5 in os::abort(bool) () from 
> /opt/toolchain/sun-jdk-64bit-1.8.0.05/jre/lib/amd64/server/libjvm.so
> #3 0x7f9239514843 in VMError::report_and_die() () from 
> /opt/toolchain/sun-jdk-64bit-1.8.0.05/jre/lib/amd64/server/libjvm.so
> #4 0x7f9239389562 in JVM_handle_linux_signal () from 
> /opt/toolchain/sun-jdk-64bit-1.8.0.05/jre/lib/amd64/server/libjvm.so
> #5 0x7f92393804f3 in signalHandler(int, siginfo*, void*) () from 
> /opt/toolchain/sun-jdk-64bit-1.8.0.05/jre/lib/amd64/server/libjvm.so
> #6 
> #7 0x016fded0 in base::subtle::NoBarrier_CompareAndSwap (ptr=0x238, 
> old_value=0, new_value=1) at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/repos/Impala/be/src/gutil/atomicops-internals-x86.h:85
> #8 0x016fdf50 in base::subtle::Acquire_CompareAndSwap (ptr=0x238, 
> old_value=0, new_value=1) at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/repos/Impala/be/src/gutil/atomicops-internals-x86.h:138
> #9 0x016fe26c in base::SpinLock::Lock (this=0x238) at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/repos/Impala/be/src/gutil/spinlock.h:74
> #10 0x016fe2f6 in impala::SpinLock::lock (this=0x238) at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/repos/Impala/be/src/util/spinlock.h:34
> #11 0x01aa8c96 in 
> impala::ScopedShardedMapRef 
> >::ScopedShardedMapRef (this=0x7f91aa81eb90, query_id=..., sharded_map=0x1c0) 
> at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/repos/Impala/be/src/util/sharded-query-map-util.h:99
> #12 0x01a999e2 in impala::ImpalaServer::GetClientRequestState 
> (this=0xa569000, query_id=...) at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/repos/Impala/be/src/service/impala-server.cc:2123
> #13 0x01b3ace6 in impala::ImpalaHttpHandler::QuerySummaryHandler 
> (this=0x6f057a0, include_json_plan=true, include_summary=true, args=..., 
> document=0x7f91aa81f230) at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/repos/Impala/be/src/service/impala-http-handler.cc:755
> #14 0x01b3cc11 in impala::ImpalaHttpHandler:: auto:6*)>::operator(), 
> std::basic_string >, rapidjson::GenericDocument > 
> >(const std::map, 
> std::allocator >, std::basic_string, 
> std::allocator >, std::less std::char_traits, std::allocator > >, 
> std::allocator, 
> std::allocator > const, std::basic_string, 
> std::allocator > > > > &, 
> rapidjson::GenericDocument, 
> rapidjson::MemoryPoolAllocator > *) const 
> (__closure=0xd9884b8, args=..., doc=0x7f91aa81f230) at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/repos/Impala/be/src/service/impala-http-handler.cc:132
> #15 0x01b3cc46 in 
> boost::detail::function::void_function_obj_invoker2  auto:5&, auto:6*)>, void, const std::map std::char_traits, std::allocator >, std::basic_string std::char_traits, std::allocator >, 
> std::less, 
> std::allocator > >, std::allocator std::basic_string, std::allocator >, 
> std::basic_string, std::allocator > > > 
> >&, rapidjson::GenericDocument, 
> rapidjson::MemoryPoolAllocator 
> >*>::invoke(boost::detail::function::function_buffer &, const 
> std::map, std::allocator 
> >, std::basic_string, std::allocator >, 
> std::le\
> ss, std::allocator > >, 
> std::allocator, 
> std::allocator > const, std::basic_string, 
> std::allocator > > > > &, 
> rapidjson::GenericDocument, 
> rapidjson::MemoryPoolAllocator > *) 
> (function_obj_ptr=..., a0=..., a1=0x7f91aa81f230) at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:153
> #16 0x01c4f528 in boost::function2 std::string, std::less, std::allocator const, std::string> > > const&, 
> rapidjson::GenericDocument, 
> rapidjson::MemoryPoolAllocator >*>::operator() 
> (this=0xd9884b0, a0=..., a1=0x7f91aa81f230) at 
> 

[jira] [Updated] (IMPALA-7777) Fail queries where the sum of offset and limit exceed the max value of int64

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-:
--
Target Version: Impala 3.1.0

> Fail queries where the sum of offset and limit exceed the max value of int64
> 
>
> Key: IMPALA-
> URL: https://issues.apache.org/jira/browse/IMPALA-
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Blocker
>
> A follow-up to IMPALA-5004. We should prevent users from running queries 
> where the sum of the offset and limit exceeds some threshold (e.g. 
> {{Long.MAX_VALUE}}). If a user tries to run such a query, the impalad will 
> crash, so we should reject queries that exceed the threshold.
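> A minimal sketch of that guard (illustrative only; the method name and the 
> use of IllegalArgumentException are assumptions, not Impala's actual 
> frontend code):
> {code}
> // Illustrative guard: fail fast when offset + limit would overflow int64
> // instead of letting the sum wrap around at runtime.
> public final class LimitOffsetCheck {
>   static void checkOffsetPlusLimit(long offset, long limit) {
>     if (offset < 0 || limit < 0) {
>       throw new IllegalArgumentException("OFFSET and LIMIT must be non-negative");
>     }
>     if (offset > Long.MAX_VALUE - limit) {
>       throw new IllegalArgumentException(String.format(
>           "OFFSET (%d) + LIMIT (%d) exceeds %d", offset, limit, Long.MAX_VALUE));
>     }
>   }
> 
>   public static void main(String[] args) {
>     checkOffsetPlusLimit(10, 100);            // fine
>     checkOffsetPlusLimit(1, Long.MAX_VALUE);  // throws IllegalArgumentException
>   }
> }
> {code}
> Math.addExact(offset, limit) would catch the same overflow by throwing 
> ArithmeticException.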



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6955) Timeout when starting test_query_expiration custom cluster

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6955:
--
Fix Version/s: (was: Not Applicable)

> Timeout when starting test_query_expiration custom cluster
> --
>
> Key: IMPALA-6955
> URL: https://issues.apache.org/jira/browse/IMPALA-6955
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Vuk Ercegovac
>Priority: Blocker
>
> Ran into the following crash on a RHEL test recently:
> {noformat}
> Error starting cluster: num_known_live_backends did not reach expected value 
> in time{noformat}
> Backtrace:
> {noformat}
> #0 0x7f92365185c9 in raise () from /lib64/libc.so.6
> #1 0x7f9236519cd8 in abort () from /lib64/libc.so.6
> #2 0x7f92393841a5 in os::abort(bool) () from 
> /opt/toolchain/sun-jdk-64bit-1.8.0.05/jre/lib/amd64/server/libjvm.so
> #3 0x7f9239514843 in VMError::report_and_die() () from 
> /opt/toolchain/sun-jdk-64bit-1.8.0.05/jre/lib/amd64/server/libjvm.so
> #4 0x7f9239389562 in JVM_handle_linux_signal () from 
> /opt/toolchain/sun-jdk-64bit-1.8.0.05/jre/lib/amd64/server/libjvm.so
> #5 0x7f92393804f3 in signalHandler(int, siginfo*, void*) () from 
> /opt/toolchain/sun-jdk-64bit-1.8.0.05/jre/lib/amd64/server/libjvm.so
> #6 
> #7 0x016fded0 in base::subtle::NoBarrier_CompareAndSwap (ptr=0x238, 
> old_value=0, new_value=1) at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/repos/Impala/be/src/gutil/atomicops-internals-x86.h:85
> #8 0x016fdf50 in base::subtle::Acquire_CompareAndSwap (ptr=0x238, 
> old_value=0, new_value=1) at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/repos/Impala/be/src/gutil/atomicops-internals-x86.h:138
> #9 0x016fe26c in base::SpinLock::Lock (this=0x238) at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/repos/Impala/be/src/gutil/spinlock.h:74
> #10 0x016fe2f6 in impala::SpinLock::lock (this=0x238) at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/repos/Impala/be/src/util/spinlock.h:34
> #11 0x01aa8c96 in 
> impala::ScopedShardedMapRef 
> >::ScopedShardedMapRef (this=0x7f91aa81eb90, query_id=..., sharded_map=0x1c0) 
> at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/repos/Impala/be/src/util/sharded-query-map-util.h:99
> #12 0x01a999e2 in impala::ImpalaServer::GetClientRequestState 
> (this=0xa569000, query_id=...) at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/repos/Impala/be/src/service/impala-server.cc:2123
> #13 0x01b3ace6 in impala::ImpalaHttpHandler::QuerySummaryHandler 
> (this=0x6f057a0, include_json_plan=true, include_summary=true, args=..., 
> document=0x7f91aa81f230) at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/repos/Impala/be/src/service/impala-http-handler.cc:755
> #14 0x01b3cc11 in impala::ImpalaHttpHandler:: auto:6*)>::operator(), 
> std::basic_string >, rapidjson::GenericDocument > 
> >(const std::map, 
> std::allocator >, std::basic_string, 
> std::allocator >, std::less std::char_traits, std::allocator > >, 
> std::allocator, 
> std::allocator > const, std::basic_string, 
> std::allocator > > > > &, 
> rapidjson::GenericDocument, 
> rapidjson::MemoryPoolAllocator > *) const 
> (__closure=0xd9884b8, args=..., doc=0x7f91aa81f230) at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/repos/Impala/be/src/service/impala-http-handler.cc:132
> #15 0x01b3cc46 in 
> boost::detail::function::void_function_obj_invoker2  auto:5&, auto:6*)>, void, const std::map std::char_traits, std::allocator >, std::basic_string std::char_traits, std::allocator >, 
> std::less, 
> std::allocator > >, std::allocator std::basic_string, std::allocator >, 
> std::basic_string, std::allocator > > > 
> >&, rapidjson::GenericDocument, 
> rapidjson::MemoryPoolAllocator 
> >*>::invoke(boost::detail::function::function_buffer &, const 
> std::map, std::allocator 
> >, std::basic_string, std::allocator >, 
> std::le\
> ss, std::allocator > >, 
> std::allocator, 
> std::allocator > const, std::basic_string, 
> std::allocator > > > > &, 
> rapidjson::GenericDocument, 
> rapidjson::MemoryPoolAllocator > *) 
> (function_obj_ptr=..., a0=..., a1=0x7f91aa81f230) at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:153
> #16 0x01c4f528 in boost::function2 std::string, std::less, std::allocator const, std::string> > > const&, 
> rapidjson::GenericDocument, 
> rapidjson::MemoryPoolAllocator >*>::operator() 
> (this=0xd9884b0, a0=..., a1=0x7f91aa81f230) at 
> /data/jenkins/workspace/impala-cdh6.x-exhaustive-rhel7/Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:767

[jira] [Updated] (IMPALA-6671) Metadata operations that modify a table blocks topic updates for other unrelated operations

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6671:
--
Target Version: Product Backlog

> Metadata operations that modify a table blocks topic updates for other 
> unrelated operations
> ---
>
> Key: IMPALA-6671
> URL: https://issues.apache.org/jira/browse/IMPALA-6671
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 2.12.0
>Reporter: Mostafa Mokhtar
>Priority: Critical
>  Labels: catalog-server, perfomance
>
> Metadata operations that mutate the state of a table, like "compute stats foo" 
> or "alter recover partitions", block topic updates for read-only operations 
> against unrelated tables, such as "describe bar".
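> One general pattern that avoids this kind of stall is for the topic-update 
> thread to take each table lock with a timeout and skip tables that are 
> currently being mutated. The sketch below is purely hypothetical (invented 
> class names, not the catalog implementation or the fix chosen for this 
> issue):
> {code}
> import java.util.List;
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.locks.ReentrantLock;
> 
> // Hypothetical pattern: try each table's lock with a timeout and defer busy
> // tables to the next update cycle instead of blocking the whole delta.
> class TopicUpdateSketch {
>   static final class Table {
>     final String name;
>     final ReentrantLock lock = new ReentrantLock();
>     Table(String name) { this.name = name; }
>   }
> 
>   static void addTablesToDelta(List<Table> tables) throws InterruptedException {
>     for (Table t : tables) {
>       if (!t.lock.tryLock(100, TimeUnit.MILLISECONDS)) {
>         continue;  // e.g. a long-running DDL holds the lock; skip for now
>       }
>       try {
>         // serialize t's metadata into the topic update here
>       } finally {
>         t.lock.unlock();
>       }
>     }
>   }
> }
> {code}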
> Thread for blocked operation
> {code}
> "Thread-7" prio=10 tid=0x11613000 nid=0x21b3b waiting on condition 
> [0x7f5f2ef52000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7f6f57ff0240> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
> at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
> at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.addTableToCatalogDeltaHelper(CatalogServiceCatalog.java:639)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.addTableToCatalogDelta(CatalogServiceCatalog.java:611)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.addDatabaseToCatalogDelta(CatalogServiceCatalog.java:567)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.getCatalogDelta(CatalogServiceCatalog.java:449)
> at 
> org.apache.impala.service.JniCatalog.getCatalogDelta(JniCatalog.java:126)
> {code}
> Thread for blocking operation 
> {code}
> "Thread-130" prio=10 tid=0x113d5800 nid=0x2499d runnable 
> [0x7f5ef80d]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:152)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> - locked <0x7f5fffcd9f18> (a java.io.BufferedInputStream)
> at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at 
> org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:346)
> at 
> org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:423)
> at 
> org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:405)
> at 
> org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at 
> org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_add_partitions_req(ThriftHiveMetastore.java:1639)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.add_partitions_req(ThriftHiveMetastore.java:1626)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(HiveMetaStoreClient.java:609)
> at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 

[jira] [Commented] (IMPALA-2566) Result of casttochar() not handled properly in SQL operations

2018-10-29 Thread Bikramjeet Vig (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667577#comment-16667577
 ] 

Bikramjeet Vig commented on IMPALA-2566:


[~tarmstrong] Sure I'll take a look

> Result of casttochar() not handled properly in SQL operations
> -
>
> Key: IMPALA-2566
> URL: https://issues.apache.org/jira/browse/IMPALA-2566
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.3.0
>Reporter: John Russell
>Assignee: Bikramjeet Vig
>Priority: Critical
>  Labels: crash
>
> If I use casttochar() during a CTAS to set the type of a column, Impala 
> considers the result to be STRING. However, somehow the length information 
> for the CHAR results must be getting passed back and messing things up in the 
> output. Trying to query the resulting table causes the query to hang:
> {code}
> [blah:21000] > create table char_types as select casttochar('hello world') as 
> c1, casttochar('xyz') as c2, casttochar('x') as c3;
> Query: create table char_types as select casttochar('hello world') as c1, 
> casttochar('xyz') as c2, casttochar('x') as c3
> +---+
> | summary   |
> +---+
> | Inserted 1 row(s) |
> +---+
> Fetched 1 row(s) in 6.89s
> [blah:21000] > desc char_types;
> Query: describe char_types
> +--++-+
> | name | type   | comment |
> +--++-+
> | c1   | string | |
> | c2   | string | |
> | c3   | string | |
> +--++-+
> [blah:21000] > show functions in _impala_builtins like 'casttochar';
> Query: show functions in _impala_builtins like 'casttochar'
> +-+--+
> | return type | signature|
> +-+--+
> | CHAR(*) | casttochar(BIGINT)   |
> | CHAR(*) | casttochar(BOOLEAN)  |
> | CHAR(*) | casttochar(CHAR(*))  |
> | CHAR(*) | casttochar(DECIMAL(*,*)) |
> | CHAR(*) | casttochar(DOUBLE)   |
> | CHAR(*) | casttochar(FLOAT)|
> | CHAR(*) | casttochar(INT)  |
> | CHAR(*) | casttochar(SMALLINT) |
> | CHAR(*) | casttochar(STRING)   |
> | CHAR(*) | casttochar(TIMESTAMP)|
> | CHAR(*) | casttochar(TINYINT)  |
> | CHAR(*) | casttochar(VARCHAR(*))   |
> +-+--+
> Fetched 12 row(s) in 0.10s
> [blah:21000] > select * from char_types;
> Query: select * from char_types
> ^C Cancelling Query
> {code}
> The HDFS data file has the original text info plus extra control characters. 
> Doing hdfs dfs -cat on the data file causes the OS X terminal to go haywire 
> and lock up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6853) COMPUTE STATS does an unnecessary REFRESH after writing to the Metastore

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6853:
--
Target Version: Impala 3.2.0, Impala 2.13.0  (was: Impala 2.13.0, Impala 
3.1.0)

> COMPUTE STATS does an unnecessary REFRESH after writing to the Metastore
> 
>
> Key: IMPALA-6853
> URL: https://issues.apache.org/jira/browse/IMPALA-6853
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 2.12.0
>Reporter: Alexander Behm
>Priority: Critical
>  Labels: compute-stats, perfomance
>
> COMPUTE STATS and possibly other DDL operations unnecessarily do the 
> equivalent of a REFRESH after writing to the Hive Metastore. This unnecessary 
> operation can be very expensive, so it should be avoided.
> The behavior can be confirmed from the catalogd logs:
> {code}
> compute stats functional_parquet.alltypes;
> +---+
> | summary   |
> +---+
> | Updated 24 partition(s) and 11 column(s). |
> +---+
> Relevant catalogd.INFO snippet
> I0413 14:40:24.210749 27295 HdfsTable.java:1263] Incrementally loading table 
> metadata for: functional_parquet.alltypes
> I0413 14:40:24.242122 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=1: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.244634 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=10: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.247174 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=11: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.249713 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=12: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.252288 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=2: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.254629 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=3: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.256991 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=4: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.259464 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=5: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.262197 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=6: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.264463 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=7: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.266736 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=8: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.269210 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=9: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.271800 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=1: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.274348 

[jira] [Assigned] (IMPALA-2566) Result of casttochar() not handled properly in SQL operations

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-2566:
-

Assignee: Bikramjeet Vig

> Result of casttochar() not handled properly in SQL operations
> -
>
> Key: IMPALA-2566
> URL: https://issues.apache.org/jira/browse/IMPALA-2566
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.3.0
>Reporter: John Russell
>Assignee: Bikramjeet Vig
>Priority: Critical
>  Labels: crash
>
> If I use casttochar() during a CTAS to set the type of a column, Impala 
> considers the result to be STRING. However, somehow the length information 
> for the CHAR results must be getting passed back and messing things up in the 
> output. Trying to query the resulting table causes the query to hang:
> {code}
> [blah:21000] > create table char_types as select casttochar('hello world') as 
> c1, casttochar('xyz') as c2, casttochar('x') as c3;
> Query: create table char_types as select casttochar('hello world') as c1, 
> casttochar('xyz') as c2, casttochar('x') as c3
> +---+
> | summary   |
> +---+
> | Inserted 1 row(s) |
> +---+
> Fetched 1 row(s) in 6.89s
> [blah:21000] > desc char_types;
> Query: describe char_types
> +--++-+
> | name | type   | comment |
> +--++-+
> | c1   | string | |
> | c2   | string | |
> | c3   | string | |
> +--++-+
> [blah:21000] > show functions in _impala_builtins like 'casttochar';
> Query: show functions in _impala_builtins like 'casttochar'
> +-+--+
> | return type | signature|
> +-+--+
> | CHAR(*) | casttochar(BIGINT)   |
> | CHAR(*) | casttochar(BOOLEAN)  |
> | CHAR(*) | casttochar(CHAR(*))  |
> | CHAR(*) | casttochar(DECIMAL(*,*)) |
> | CHAR(*) | casttochar(DOUBLE)   |
> | CHAR(*) | casttochar(FLOAT)|
> | CHAR(*) | casttochar(INT)  |
> | CHAR(*) | casttochar(SMALLINT) |
> | CHAR(*) | casttochar(STRING)   |
> | CHAR(*) | casttochar(TIMESTAMP)|
> | CHAR(*) | casttochar(TINYINT)  |
> | CHAR(*) | casttochar(VARCHAR(*))   |
> +-+--+
> Fetched 12 row(s) in 0.10s
> [blah:21000] > select * from char_types;
> Query: select * from char_types
> ^C Cancelling Query
> {code}
> The HDFS data file has the original text info plus extra control characters. 
> Doing hdfs dfs -cat on the data file causes the OS X terminal to go haywire 
> and lock up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7662) test_parquet reads bad_magic_number.parquet without an error

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7662:
--
Priority: Blocker  (was: Critical)

> test_parquet reads bad_magic_number.parquet without an error
> 
>
> Key: IMPALA-7662
> URL: https://issues.apache.org/jira/browse/IMPALA-7662
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
> Environment: Impala ddef2cb9b14e7f8cf9a68a2a382e10a8e0f91c3d 
> exhaustive debug build
>Reporter: Tianyi Wang
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: correctness
>
> {noformat}
> 09:51:41 === FAILURES 
> ===
> 09:51:41  TestParquet.test_parquet[exec_option: {'batch_size': 0, 
> 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': 
> False, 'abort_on_error': 1, 'debug_action': 
> 'HDFS_SCANNER_THREAD_CHECK_SOFT_MEM_LIMIT:FAIL@0.5', 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> 09:51:41 [gw5] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-asf-master-exhaustive/repos/Impala/bin/../infra/python/env/bin/python
> 09:51:41 query_test/test_scanners.py:300: in test_parquet
> 09:51:41 self.run_test_case('QueryTest/parquet', vector)
> 09:51:41 common/impala_test_suite.py:423: in run_test_case
> 09:51:41 assert False, "Expected exception: %s" % expected_str
> 09:51:41 E   AssertionError: Expected exception: File 
> 'hdfs://localhost:20500/test-warehouse/bad_magic_number_parquet/bad_magic_number.parquet'
>  has an invalid version number: 
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-7586) Incorrect results when querying primary = "\"" in Kudu and HBase

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7586 started by Tim Armstrong.
-
> Incorrect results when querying primary = "\"" in Kudu and HBase
> 
>
> Key: IMPALA-7586
> URL: https://issues.apache.org/jira/browse/IMPALA-7586
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Will Berkeley
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: correctness, kudu
> Attachments: impalakudu_pred_bug.profile
>
>
> Version string from catalogd web ui:
> {noformat}
> catalogd version 3.1.0-cdh6.x-SNAPSHOT RELEASE (build 
> 8baac7f5849b6bacb02fedeb9b3fe2b2ee9450ee)
> {noformat}
> A reproduction script for the impala-shell:
> {noformat}
> create table test(name string, primary key(name) ) stored as kudu;
> insert into test values ("\"");
> -- Modified 1 row(s), 0 row error(s) in 4.01s
> -- row found in full table scan
> select * from test;
> -- Fetched 1 row(s) in 0.15s
> -- row not found on = predicate (pushed to kudu)
> select * from test where name="\"";
> -- Fetched 0 row(s) in 0.13s
> -- row found when predicate cannot be pushed to kudu
> select * from test where name like "\"";
> -- Fetched 1 row(s) in 0.13s
> {noformat}
> This was originally reported as KUDU-2575. I tried to reproduce directly 
> against Kudu using the python client but got the expected result.
> From the plan and profile, Impala is pushing down the predicate, but Kudu is 
> not being scanned, possibly because the Kudu client short-circuits the scan 
> as having no results based on the predicate Impala pushes down.
> {noformat}
> 00:SCAN KUDU [default.test]
>kudu predicates: name = '"'
>mem-estimate=0B mem-reservation=0B thread-reservation=1
>tuple-ids=0 row-size=15B cardinality=unavailable
>in pipelines: 00(GETNEXT)
> {noformat}
> {noformat}
> KUDU_SCAN_NODE (id=0)
>   - AverageScannerThreadConcurrency: 0.00 (0.0)
>   - InactiveTotalTime: 0ns (0)
>   - KuduRemoteScanTokens: 0 (0)
>   - MaterializeTupleTime(*): 0ns (0)
>   - NumScannerThreadMemUnavailable: 0 (0)
>   - NumScannerThreadsStarted: 1 (1)
>   - PeakMemoryUsage: 24.0 KiB (24576)
>   - PeakScannerThreadConcurrency: 1 (1)
>   - RowBatchBytesEnqueued: 16.0 KiB (16384)
>   - RowBatchQueueGetWaitTime: 0ns (0)
>   - RowBatchQueuePeakMemoryUsage: 0 B (0)
>   - RowBatchQueuePutWaitTime: 0ns (0)
>   - RowBatchesEnqueued: 1 (1)
>   - RowsRead: 0 (0)
> ===>  - RowsReturned: 0 (0)
>   - RowsReturnedRate: 0 per second (0)
>   - ScanRangesComplete: 1 (1)
>   - ScannerThreadsInvoluntaryContextSwitches: 0 (0)
>   - ScannerThreadsTotalWallClockTime: 0ns (0)
> - ScannerThreadsSysTime: 158.00us (158000)
> - ScannerThreadsUserTime: 0ns (0)
>   - ScannerThreadsVoluntaryContextSwitches: 2 (2)
> ===>  - TotalKuduScanRoundTrips: 0 (0)
>   - TotalTime: 1ms (172)
> {noformat}
> I also confirmed Kudu sees no scan from Impala for this query using the 
> /scans page of the tablet servers.
> Full profile attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-7727) failed compute stats child query status no longer propagates to parent query

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7727 started by Tim Armstrong.
-
> failed compute stats child query status no longer propagates to parent query
> 
>
> Key: IMPALA-7727
> URL: https://issues.apache.org/jira/browse/IMPALA-7727
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Michael Brown
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: regression, stress
> Attachments: 2.12-child-profile.txt, 2.12-compute-stats-profile.txt, 
> 3.1-child-profile.txt, 3.1-compute-stats-profile.txt
>
>
> [~bharathv] since you have been dealing with stats, please take a look. 
> Otherwise feel free to reassign. This bug prevents the stress test from 
> running with compute stats statements. It triggers in non-stressful 
> conditions, too.
> {noformat}
> $ impala-shell.sh -d tpch_parquet
> [localhost:21000] tpch_parquet> set mem_limit=24m;
> MEM_LIMIT set to 24m
> [localhost:21000] tpch_parquet> compute stats customer;
> Query: compute stats customer
> WARNINGS: Cancelled
> [localhost:21000] tpch_parquet>
> {noformat}
> The problem is that the child query didn't have enough memory to run, but 
> this error didn't propagate up.
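> The expected behaviour is that the parent statement surfaces the first 
> failed child query's error instead of a bare "Cancelled". A generic sketch 
> of that propagation (invented classes, not Impala's actual code):
> {code}
> import java.util.List;
> 
> final class ChildQueryResult {
>   final String query;
>   final boolean ok;
>   final String errorMsg;
>   ChildQueryResult(String query, boolean ok, String errorMsg) {
>     this.query = query; this.ok = ok; this.errorMsg = errorMsg;
>   }
> }
> 
> final class ComputeStatsSketch {
>   // Return the parent's status: OK only if every child query succeeded,
>   // otherwise the first child's error message rather than "Cancelled".
>   static String parentStatus(List<ChildQueryResult> children) {
>     for (ChildQueryResult c : children) {
>       if (!c.ok) return "Child query failed (" + c.query + "): " + c.errorMsg;
>     }
>     return "OK";
>   }
> }
> {code}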
> {noformat}
> Query (id=384d37fb2826a962:f4b10357):
>   DEBUG MODE WARNING: Query profile created while running a DEBUG build of 
> Impala. Use RELEASE builds to measure query performance.
>   Summary:
> Session ID: d343e1026d497bb0:7e87b342c73c108d
> Session Type: BEESWAX
> Start Time: 2018-10-18 15:16:34.036363000
> End Time: 2018-10-18 15:16:34.177711000
> Query Type: QUERY
> Query State: EXCEPTION
> Query Status: Rejected query from pool default-pool: minimum memory 
> reservation is greater than memory available to the query for buffer 
> reservations. Memory reservation needed given the current plan: 128.00 KB. 
> Adjust either the mem_limit or the pool config (max-query-mem-limit, 
> min-query-mem-limit) for the query to allow the query memory limit to be at 
> least 32.12 MB. Note that changing the mem_limit may also change the plan. 
> See the query profile for more information about the per-node memory 
> requirements.
> Impala Version: impalad version 3.1.0-SNAPSHOT DEBUG (build 
> 9f5c5e6df03824cba292fe5a619153462c11669c)
> User: mikeb
> Connected User: mikeb
> Delegated User: 
> Network Address: :::127.0.0.1:46458
> Default Db: tpch_parquet
> Sql Statement: SELECT COUNT(*) FROM customer
> Coordinator: mikeb-ub162:22000
> Query Options (set by configuration): MEM_LIMIT=25165824,MT_DOP=4
> Query Options (set by configuration and planner): 
> MEM_LIMIT=25165824,NUM_SCANNER_THREADS=1,MT_DOP=4
> Plan: 
> 
> Max Per-Host Resource Reservation: Memory=512.00KB Threads=5
> Per-Host Resource Estimates: Memory=146MB
> F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> |  Per-Host Resources: mem-estimate=10.00MB mem-reservation=0B 
> thread-reservation=1
> PLAN-ROOT SINK
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |
> 03:AGGREGATE [FINALIZE]
> |  output: count:merge(*)
> |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB 
> thread-reservation=0
> |  tuple-ids=1 row-size=8B cardinality=1
> |  in pipelines: 03(GETNEXT), 01(OPEN)
> |
> 02:EXCHANGE [UNPARTITIONED]
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |  tuple-ids=1 row-size=8B cardinality=1
> |  in pipelines: 01(GETNEXT)
> |
> F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=4
> Per-Host Resources: mem-estimate=136.00MB mem-reservation=512.00KB 
> thread-reservation=4
> 01:AGGREGATE
> |  output: sum_init_zero(tpch_parquet.customer.parquet-stats: num_rows)
> |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB 
> thread-reservation=0
> |  tuple-ids=1 row-size=8B cardinality=1
> |  in pipelines: 01(GETNEXT), 00(OPEN)
> |
> 00:SCAN HDFS [tpch_parquet.customer, RANDOM]
>partitions=1/1 files=1 size=12.34MB
>stored statistics:
>  table: rows=15 size=12.34MB
>  columns: all
>extrapolated-rows=disabled max-scan-range-rows=15
>mem-estimate=24.00MB mem-reservation=128.00KB thread-reservation=0
>tuple-ids=0 row-size=8B cardinality=15
>in pipelines: 00(GETNEXT)
> 
> Estimated Per-Host Mem: 153092096
> Per Host Min Memory Reservation: mikeb-ub162:22000(0) 
> mikeb-ub162:22001(128.00 KB)
> Request Pool: default-pool
> Admission result: Rejected
> Query Compilation: 126.903ms
>- Metadata of all 1 tables cached: 5.484ms (5.484ms)
>- Analysis finished: 16.104ms (10.619ms)
>- Value transfer graph 

[jira] [Updated] (IMPALA-7777) Fail queries where the sum of offset and limit exceed the max value of int64

2018-10-29 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-7777:
-
Affects Version/s: Impala 2.10.0
   Impala 2.11.0
   Impala 3.0
   Impala 2.12.0

> Fail queries where the sum of offset and limit exceed the max value of int64
> 
>
> Key: IMPALA-7777
> URL: https://issues.apache.org/jira/browse/IMPALA-7777
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Blocker
>
> A follow-up to IMPALA-5004. We should prevent users from running queries 
> where the sum of the offset and limit exceeds some threshold (e.g. 
> {{Long.MAX_VALUE}}). If a user tries to run such a query, the impalad will 
> crash, so we should reject queries that exceed the threshold.
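
As a rough illustration of the guard this asks for, here is a self-contained 
Java sketch (hypothetical names, not Impala's analyzer code) that rejects a 
query whose LIMIT plus OFFSET would overflow a signed 64-bit value:

{code:java}
// Hypothetical, self-contained sketch; not Impala's analyzer code.
public class LimitOffsetGuardSketch {
  // Reject a query whose limit + offset overflows a signed 64-bit value.
  static void checkLimitAndOffset(long limit, long offset) {
    try {
      Math.addExact(limit, offset);  // throws ArithmeticException on int64 overflow
    } catch (ArithmeticException e) {
      throw new IllegalArgumentException("sum of LIMIT " + limit + " and OFFSET "
          + offset + " exceeds the maximum value of int64; rejecting query");
    }
  }

  public static void main(String[] args) {
    checkLimitAndOffset(100, 42);                   // accepted
    checkLimitAndOffset(100, Long.MAX_VALUE - 10);  // throws: would overflow
  }
}
{code}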



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7565) Extends TAcceptQueueServer connection_setup_pool to be multi-threaded

2018-10-29 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667572#comment-16667572
 ] 

Tim Armstrong commented on IMPALA-7565:
---

[~kwho] [~bikram.sngh91] [~lv] do we think anyone is going to be able to pick 
this up for the 3.1 release? It's the only unassigned blocker.

> Extends TAcceptQueueServer connection_setup_pool to be multi-threaded
> -
>
> Key: IMPALA-7565
> URL: https://issues.apache.org/jira/browse/IMPALA-7565
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients, Distributed Exec
>Affects Versions: Impala 2.11.0, Impala 3.0, Impala 2.12.0
>Reporter: Michael Ho
>Priority: Blocker
>
> In {{TAcceptQueueServer.cpp}}, we currently have one thread in the 
> {{connection_setup_pool}} for handling connection establishment.
> {noformat}
>   // Only using one thread here is sufficient for performance, and it avoids 
> potential
>   // thread safety issues with the thrift code called in SetupConnection.
>   constexpr int CONNECTION_SETUP_POOL_SIZE = 1;
>   // New - this is the thread pool used to process the internal accept queue.
>   ThreadPool<shared_ptr<TAcceptQueueEntry>> connection_setup_pool("setup-server", 
> "setup-worker",
>   CONNECTION_SETUP_POOL_SIZE, FLAGS_accepted_cnxn_queue_depth,
>   [this](int tid, const shared_ptr<TAcceptQueueEntry>& item) {
> this->SetupConnection(item);
>   });
> {noformat}
> While that makes the code easier to reason about, it also makes Impala less 
> robust in case a client intentionally or unintentionally freezes during 
> connection establishment. For instance, one can telnet to the beeswax or HS2 
> port of Impala (e.g. telnet localhost:21000) and then leave the telnet 
> session open. In a secure cluster with TLS enabled, Impalad will be stuck in 
> the SSL handshake. Other clients trying to connect to that port (e.g. Impala 
> shell) will hang forever.
> {noformat}
> Thread 551 (Thread 0x7fddde563700 (LWP 166354)):
> #0  0x003ce2a0e82d in read () from /lib64/libpthread.so.0
> #1  0x003ce56dea71 in ?? () from /usr/lib64/libcrypto.so.10
> #2  0x003ce56dcdc9 in BIO_read () from /usr/lib64/libcrypto.so.10
> #3  0x003ce9a31873 in ssl23_read_bytes () from /usr/lib64/libssl.so.10
> #4  0x003ce9a2fe63 in ssl23_get_client_hello () from 
> /usr/lib64/libssl.so.10
> #5  0x003ce9a302f3 in ssl23_accept () from /usr/lib64/libssl.so.10
> #6  0x0208ebd5 in 
> apache::thrift::transport::TSSLSocket::checkHandshake() ()
> #7  0x0208edbc in 
> apache::thrift::transport::TSSLSocket::read(unsigned char*, unsigned int) ()
> #8  0x0208b6f3 in unsigned int 
> apache::thrift::transport::readAll(apache::thrift::transport::TSocket&,
>  unsigned char*, unsigned int) ()
> #9  0x00cb2aa9 in 
> apache::thrift::transport::TSaslTransport::receiveSaslMessage(apache::thrift::transport::NegotiationStatus*,
>  unsigned int*) ()
> #10 0x00cb03e4 in 
> apache::thrift::transport::TSaslServerTransport::handleSaslStartMessage() ()
> #11 0x00cb2c23 in 
> apache::thrift::transport::TSaslTransport::doSaslNegotiation() ()
> #12 0x00cb10b8 in 
> apache::thrift::transport::TSaslServerTransport::Factory::getTransport(boost::shared_ptr)
>  ()
> #13 0x00b13e47 in 
> apache::thrift::server::TAcceptQueueServer::SetupConnection(boost::shared_ptr)
>  ()
> #14 0x00b14932 in 
> boost::detail::function::void_function_obj_invoker2  boost::shared_ptr const&)#1}, void, 
> int, boost::shared_ptr 
> const&>::invoke(boost::detail::function::function_buffer&, int, 
> boost::shared_ptr const&) ()
> #15 0x00b177f9 in 
> impala::ThreadPool 
> >::WorkerThread(int) ()
> #16 0x00d602af in 
> impala::Thread::SuperviseThread(std::basic_string std::char_traits, std::allocator > const&, 
> std::basic_string, std::allocator > 
> const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) ()
> #17 0x00d60aaa in boost::detail::thread_data void (*)(std::basic_string, std::allocator 
> > const&, std::basic_string, 
> std::allocator > const&, boost::function, 
> impala::ThreadDebugInfo const*, impala::Promise*), 
> boost::_bi::list5 std::char_traits, std::allocator > >, 
> boost::_bi::value, 
> std::allocator > >, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> > > >::run() ()
> #18 0x012d756a in thread_proxy ()
> #19 0x003ce2a07aa1 in start_thread () from /lib64/libpthread.so.0
> #20 0x003ce26e893d in clone () from /lib64/libc.so.6
> {noformat}
> While it's not a complete fix, increasing the number of threads in 
> {{connection_setup_pool}} makes it more robust against this kind of problem. 
> Ideally, the number of threads in the thread pool should be configurable.
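
For illustration only, the sketch below shows the suggested direction in plain 
Java: a connection-setup pool whose worker count is configurable, so one client 
stuck in a handshake ties up a single worker rather than the whole accept path. 
The property name is invented; the real code is the C++ ThreadPool in 
TAcceptQueueServer.cpp quoted above.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConnectionSetupPoolSketch {
  public static void main(String[] args) {
    // Invented knob for the sketch; defaults to 4 setup workers.
    int poolSize = Integer.getInteger("connection_setup_pool_size", 4);
    ExecutorService setupPool = Executors.newFixedThreadPool(poolSize);

    // Each accepted connection becomes its own setup task; a hung task blocks
    // only one of the poolSize workers instead of all new connections.
    for (int i = 0; i < 10; i++) {
      final int connId = i;
      setupPool.submit(() -> System.out.println("setting up connection " + connId));
    }
    setupPool.shutdown();
  }
}
{code}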



--
This message was sent by Atlassian JIRA

[jira] [Updated] (IMPALA-7777) Fail queries where the sum of offset and limit exceed the max value of int64

2018-10-29 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-7777:
-
Issue Type: Bug  (was: Improvement)

> Fail queries where the sum of offset and limit exceed the max value of int64
> 
>
> Key: IMPALA-7777
> URL: https://issues.apache.org/jira/browse/IMPALA-7777
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Blocker
>
> A follow-up to IMPALA-5004. We should prevent users from running queries 
> where the sum of the offset and limit exceeds some threshold (e.g. 
> {{Long.MAX_VALUE}}). If a user tries to run such a query, the impalad will 
> crash, so we should reject queries that exceed the threshold.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-7266) test_insert failure: Unable to drop partition on s3

2018-10-29 Thread Bikramjeet Vig (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikramjeet Vig reassigned IMPALA-7266:
--

Assignee: Bikramjeet Vig  (was: Sailesh Mukil)

> test_insert failure: Unable to drop partition on s3
> ---
>
> Key: IMPALA-7266
> URL: https://issues.apache.org/jira/browse/IMPALA-7266
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Bikramjeet Vig
>Assignee: Bikramjeet Vig
>Priority: Major
>  Labels: broken-build
>
> {noformat}
> 06:01:28 === FAILURES 
> ===
> 06:01:28  TestInsertQueries.test_insert[exec_option: {'sync_ddl': 0, 
> 'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: text/none] 
> 06:01:28 query_test/test_insert.py:122: in test_insert
> 06:01:28 multiple_impalad=vector.get_value('exec_option')['sync_ddl'] == 
> 1)
> 06:01:28 .../Impala/tests/common/impala_test_suite.py:366: in run_test_case
> 06:01:28 self.execute_test_case_setup(test_section['SETUP'], 
> table_format_info)
> 06:01:28 .../Impala/tests/common/impala_test_suite.py:489: in 
> execute_test_case_setup
> 06:01:28 self.__drop_partitions(db_name, table_name)
> 06:01:28 .../Impala/tests/common/impala_test_suite.py:614: in 
> __drop_partitions
> 06:01:28 partition, True), 'Could not drop partition: %s' % partition
> 06:01:28 .../Impala/shell/gen-py/hive_metastore/ThriftHiveMetastore.py:2862: 
> in drop_partition_by_name
> 06:01:28 return self.recv_drop_partition_by_name()
> 06:01:28 .../Impala/shell/gen-py/hive_metastore/ThriftHiveMetastore.py:2891: 
> in recv_drop_partition_by_name
> 06:01:28 raise result.o2
> 06:01:28 E   MetaException: MetaException(_message='No such file or 
> directory: 
> s3a://impala-cdh5-s3-test/test-warehouse/functional.db/alltypesinsert/year=2009/month=4')
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7777) Fail queries where the sum of offset and limit exceed the max value of int64

2018-10-29 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-7777:
-
Priority: Blocker  (was: Major)

> Fail queries where the sum of offset and limit exceed the max value of int64
> 
>
> Key: IMPALA-7777
> URL: https://issues.apache.org/jira/browse/IMPALA-7777
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Blocker
>
> A follow-up to IMPALA-5004. We should prevent users from running queries 
> where the sum of the offset and limit exceeds some threshold (e.g. 
> {{Long.MAX_VALUE}}). If a user tries to run such a query, the impalad will 
> crash, so we should reject queries that exceed the threshold.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-7747) Clean up the Expression Rewriter

2018-10-29 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned IMPALA-7747:
---

   Assignee: (was: Paul Rogers)
   Priority: Minor  (was: Major)
Description: 
This is a roll-up of a number of minor clean-up tasks for the expression 
rewriter. None of this stuff is urgent; we know this bit of code has many 
opportunities for improvement, but we might as well capture what we know.

IMPALA-7655 asks to revisit the rewrite rules for several conditional 
functions. [~philip] suggested that the rewrite rules should apply to [all of 
them|https://impala.apache.org/docs/build3x/html/topics/impala_conditional_functions.html].
 To keep IMPALA-7655 focused, the larger review is presented here, along with 
suggested  opportunities to modernize the front-end rewrite rules.

This is the top-level task for the review; each change is identified by a 
sub-task or linked task in order to keep each code review small.

h4. Overview

The full set of conditional functions include:

{noformat}
if(boolean condition, type ifTrue, type ifFalseOrNull)
ifnull(type a, type ifNull)
isfalse(boolean)
isnotfalse(boolean)
isnottrue(boolean)
isnull(type a, type ifNull)
istrue(boolean)
nonnullvalue(expression)
nullif(expr1,expr2)
nullifzero(numeric_expr)
nullvalue(expression)
nvl(type a, type ifNull)
nvl2(type a, type ifNull, type ifNotNull)
zeroifnull(numeric_expr)
{noformat}

It turns out that conditionals are complex: substantial prior work has gone into 
optimizing them. The FE has a number of transforms that affect specific 
conditional statements. The BE has additional transforms. To proceed, each 
operation must be tracked through the system one by one.

The discussion below summarizes the state of each of the Impala conditional 
functions to identify the path needed to implement the requested changes, and 
to ensure that the changes don't impact other functionality. We also point out 
a few out-of-scope nice-to-haves as we go along.

In general, all the action here is in just a few places:

* {{sql-parser.cup}} in which syntax is reduced to parse nodes such as 
functions or operators. The parser unifies certain constructs such as {{<=>}} 
and {{IS NOT DISTINCT FROM}}.
* {{FunctionCallExpr.createExpr()}} is given a function-like definition and 
converts some of them to other forms ({{decode()}}, {{nvl2()}}, {{nullif()}}). A 
nice-to-have would be to move this logic to 
{{SimplifyConditionalsRule.apply()}} so we have a uniform way of doing 
transforms.
* {{SimplifyConditionalsRule}} does a great many transforms of various 
conditional rules. (We will add more for this task.)
* {{impala_functions.py}} in the BE provides a mapping from remaining functions 
(those not optimized away above) to implementations. All functions listed here 
are cross-compiled into LLVM along with a generated wrapper function that binds 
the function to its set of arguments.
* {{conditional-functions.[h|cc]}} handles special-case functions that require 
short-circuit argument evaluation ({{isnull()}}, {{if()}}, {{coalesce()}}). 
These three functions are never code generated. The goal of this task is to 
convert them into code-generated form by rewriting them to {{CASE}} (a small 
sketch of this kind of rewrite follows this list).
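
As a rough sketch of the kind of rewrite this aims for, the toy Java example 
below (an invented expression model, not Impala's Expr classes) turns every 
if(cond, t, f) node into the equivalent CASE WHEN cond THEN t ELSE f END node, 
bottom-up:

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class IfToCaseSketch {
  // Toy expression node: "if", "case", or a leaf such as a column or literal.
  static class Node {
    final String op;
    final List<Node> children;
    Node(String op, Node... children) {
      this.op = op;
      this.children = Arrays.asList(children);
    }
    @Override public String toString() {
      if (op.equals("case")) {  // built with exactly three children below
        return "CASE WHEN " + children.get(0) + " THEN " + children.get(1)
            + " ELSE " + children.get(2) + " END";
      }
      if (children.isEmpty()) return op;
      return op + "(" + children.stream().map(Node::toString)
          .collect(Collectors.joining(", ")) + ")";
    }
  }

  // Bottom-up rewrite: replace every if(cond, t, f) with an equivalent CASE node.
  static Node rewrite(Node n) {
    Node[] kids = n.children.stream().map(IfToCaseSketch::rewrite).toArray(Node[]::new);
    if (n.op.equals("if") && kids.length == 3) return new Node("case", kids);
    return new Node(n.op, kids);
  }

  public static void main(String[] args) {
    Node expr = new Node("if", new Node("l_orderkey IS NULL"), new Node("1"), new Node("NULL"));
    System.out.println(rewrite(expr));
    // Prints: CASE WHEN l_orderkey IS NULL THEN 1 ELSE NULL END
  }
}
{code}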

For all expressions, the planner does a check for all-constant expressions 
(such as {{NULL IS NOT NULL}} or {{(10 = 9) IS TRUE}}) and replaces them with 
the result of the expression by using the BE to interpret the partial 
constant-only expression tree. As a result, the rewrite steps focus on the 
non-trivial cases that require knowledge of the semantics of a given function.

In the suggestions that follow, we rewrite certain functions into {{CASE}}. 
But, in so doing, we end up evaluating certain terms twice. IMPALA-7737 asks to 
resolve that issue.

Below is a summary of each conditional function that identifies current state 
and any changes that might be possible.

h4. {{CASE ...}}

BE: Interpreted when in the {{SELECT}} clause (IMPALA-4356). Code generated 
when in the {{WHERE}} clause or in a join.

h4. {{x IS [NOT] (TRUE | FALSE)}}

FE, {{sql-parser.cup}}: captured as a {{FunctionCallExpr}} for the equivalent 
{{ISTRUE\(x)}}, etc. function.

h4. {{x IS [NOT] NULL}}

FE, {{sql-parser.cup}}: captured as an {{IsNullPredicate}}. (Note that this is 
the opposite of {{IS TRUE}}, etc.)

BE: Cross compiled as a UDF: {{IsNullPredicate::Is[Not]Null}}, with wrapper.

h4. {{IS[NOT](TRUE|FALSE)\(x)}}
 
BE: Implemented in {{ConditionalFunctions::IsTrue}}, etc.

h4. {{NULLIF(expr1, expr2)}}

FE, {{FunctionCallExpr}}: {{nullif(expr1, expr2)}} is rewritten to {{if(expr1 IS 
DISTINCT FROM expr2, expr1, NULL)}}

{{NULLIF()}} and {{NVL2()}} vanish from the plan after this step. There is no 
entry for {{nullif()}} in {{impala_functions.py}}.

Note that the implementation here is different from the 
[docs|https://impala.apache.org/docs/build3x/html/topics/impala_conditional_functions.html]
 

[jira] [Commented] (IMPALA-7747) Review and modernize conditional function rewrites

2018-10-29 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667528#comment-16667528
 ] 

Paul Rogers commented on IMPALA-7747:
-

Other issues found in IMPALA-7655 and deferred:

h4. Special Aggregate Handling

Another issue is how IMPALA-5125 was implemented: if rewriting an expression 
drops the last aggregate, the fix simply throws away the rewrite. This is fine 
for simplifications, but not for the conditional rewrites. Since there is no BE 
implementation for the conditional functions, we can't revert to the original 
form. This means the conditional rewrites must be aware of aggregates.

h4. Always Perform Conditional Function Rewrites

It turns out that the user can disable optimization. At present, the conditional 
function rewrite is in the new {{RewriteConditionalFnsRule}} and is 
optional. By making this one rule required, we can remove the BE implementation 
as required above.

h4. Handle {{NVL2()}} in the Function Registry

We could create function registry entries for {{nvl2()}} so that it can be 
rewritten in the same place as the other conditional functions (directly to 
{{CASE}} without first going through {{if()}}). Since the signature of this 
function is n^2 in types, the function entry would be huge (which is probably 
why it was rewritten very early in the parse process originally). Special 
handling of {{NVL2()}} (and, later, {{DECODE()}}, see IMPALA-7747) is needed. 
The special handling can exploit the fact that {{NVL2()}} is a useful fiction: 
shorthand for a {{CASE}}, rather than a real function that must 
enforce stricter rules.

h4. Issues with IS [NOT] DISTINCT FROM Optimizations

See IMPALA-7755.

h4. Incomplete Analysis After Rules Fire

See IMPALA-7754.

h4. Rewrite of the Wrong Order By Expressions

See IMPALA-7753.

h4. Invalid Logic to Reset a SELECT Statement

See IMPALA-7752.

> Review and modernize conditional function rewrites
> --
>
> Key: IMPALA-7747
> URL: https://issues.apache.org/jira/browse/IMPALA-7747
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> IMPALA-7655 asks to revisit the rewrite rules for several conditional 
> functions. [~philip] suggested that the rewrite rules should apply to [all of 
> them|https://impala.apache.org/docs/build3x/html/topics/impala_conditional_functions.html].
>  To keep IMPALA-7655 focused, the larger review is presented here, along with 
> suggested  opportunities to modernize the front-end rewrite rules.
> This is the top-level task for the review; each change is identified by 
> a sub-task or linked task in order to keep each code review small.
> h4. Overview
> The full set of conditional functions include:
> {noformat}
> if(boolean condition, type ifTrue, type ifFalseOrNull)
> ifnull(type a, type ifNull)
> isfalse(boolean)
> isnotfalse(boolean)
> isnottrue(boolean)
> isnull(type a, type ifNull)
> istrue(boolean)
> nonnullvalue(expression)
> nullif(expr1,expr2)
> nullifzero(numeric_expr)
> nullvalue(expression)
> nvl(type a, type ifNull)
> nvl2(type a, type ifNull, type ifNotNull)
> zeroifnull(numeric_expr)
> {noformat}
> It turns out that conditionals are complex: substantial prior work has gone into 
> optimizing them. The FE has a number of transforms that affect specific 
> conditional statements. The BE has additional transforms. To proceed, each 
> operation must be tracked through the system one by one.
> The discussion below summarizes the state of each of the Impala conditional 
> functions to identify the path needed to implement the requested changes, and 
> to ensure that the changes don't impact other functionality. We also point 
> out a few out-of-scope nice-to-haves as we go along.
> In general, all the action here is in just a few places:
> * {{sql-parser.cup}} in which syntax is reduced to parse nodes such as 
> functions or operators. The parser unifies certain constructs such as {{<=>}} 
> and {{IS NOT DISTINCT FROM}}.
> * {{FunctionCallExpr.createExpr()}} is given a function-like definition and 
> converts some of them to other forms ({{decode()}}, {{nvl2()}}, {{nullif()}}). 
> A nice-to-have would be to move this logic to 
> {{SimplifyConditionalsRule.apply()}} so we have a uniform way of doing 
> transforms.
> * {{SimplifyConditionalsRule}} does a great many transforms of various 
> conditional rules. (We will add more for this task.)
> * {{impala_functions.py}} in the BE provides a mapping from remaining 
> functions (those not optimized away above) to implementations. All functions 
> listed here are cross-compiled into LLVM along with a generated wrapper 
> function that binds the function to its set of arguments.
> * 

[jira] [Comment Edited] (IMPALA-7655) Codegen output for conditional functions (if,isnull, coalesce) is very suboptimal

2018-10-29 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659482#comment-16659482
 ] 

Paul Rogers edited comment on IMPALA-7655 at 10/29/18 5:35 PM:
---

To summarize the proposed changes:
 * Rewrite {{coalesce()}} to use {{CASE}}.
 * Rewrite {{if()}} to use {{CASE}}.
 * Split conditional rewrite tests out of {{ExprRewriteRulesTest}} into a new 
test class: {{SimplifyConditionalRulesTest}}.
 * Modify the above to test the new rewrite rules.
 * Add "full rewrite" tests to ensure that the rewritten CASE is, itself, 
further simplified.


was (Author: paul.rogers):
To summarize the proposed changes:

* Rewrite {{isnull()}} to use {{CASE}}. (Done)
* Rewrite {{coalesce()}} to use {{CASE}}. (Done)
* Rewrite {{if()}} to use {{CASE}}. (Done)
* Split conditional rewrite tests out of {{ExprRewriteRulesTest}} into a new 
test class: {{SimplifyConditionalRulesTest}}. (Done)
* Modify the above to test the new rewrite rules. (Done)
* Remove special handling for the three functions from 
{{conditional-functions.[h|cc]}} (Done)
* Add tests for the additional simplifications described above. (Done)
* Run BE tests to ensure that query functionality is unchanged.
* Run ad-hoc performance tests, of the kind the ticket description, to check 
resulting performance.

It turns out that the {{nvl2()}} and {{nullif()}} functions are rewritten to 
use {{if()}}, which we will rewrite above. To better handle this new reality, 
add rewrite rules that map these two functions directly to {{CASE}} statements. 
(Done)

> Codegen output for conditional functions (if,isnull, coalesce) is very 
> suboptimal
> -
>
> Key: IMPALA-7655
> URL: https://issues.apache.org/jira/browse/IMPALA-7655
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Paul Rogers
>Priority: Major
>  Labels: codegen, perf, performance
>
> https://gerrit.cloudera.org/#/c/11565/ provided a clue that an aggregation 
> involving an if() function was very slow, 10x slower than the equivalent 
> version using a case:
> {noformat}
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(case 
> when l_orderkey is NULL then 1 else NULL end) from 
> tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(case when l_orderkey is NULL then 1 else NULL end) from 
> tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:17:31 (Coordinator: 
> http://tarmstrong-box:25000)
> Query progress can be monitored at: 
> http://tarmstrong-box:25000/query_plan?query_id=274b2a6f35cefe31:95a19642
> +--+
> | count(case when l_orderkey is null then 1 else null end) |
> +--+
> | 0|
> +--+
> Fetched 1 row(s) in 0.51s
> +--++--+--+++--+---+-+
> | Operator | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak 
> Mem | Est. Peak Mem | Detail  |
> +--++--+--+++--+---+-+
> | 01:AGGREGATE | 1  | 44.03ms  | 44.03ms  | 1  | 1  | 25.00 
> KB | 10.00 MB  | FINALIZE|
> | 00:SCAN HDFS | 1  | 411.57ms | 411.57ms | 59.99M | -1 | 16.61 
> MB | 88.00 MB  | tpch10_parquet.lineitem |
> +--++--+--+++--+---+-+
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select 
> count(if(l_orderkey is NULL, 1, NULL)) from tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(if(l_orderkey is NULL, 1, NULL)) from 
> tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:23:07 (Coordinator: 
> http://tarmstrong-box:25000)
> Query progress can be monitored at: 
> http://tarmstrong-box:25000/query_plan?query_id=8e46ab1b84c4dbff:2786ca26
> ++
> | count(if(l_orderkey is null, 1, null)) |
> ++
> | 0  |
> ++
> Fetched 1 row(s) in 1.01s
> +--++--+--+++--+---+-+
> | Operator | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak 
> Mem | Est. Peak Mem | Detail  |
> 

[jira] [Comment Edited] (IMPALA-7655) Codegen output for conditional functions (if,isnull, coalesce) is very suboptimal

2018-10-29 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657532#comment-16657532
 ] 

Paul Rogers edited comment on IMPALA-7655 at 10/29/18 5:34 PM:
---

Work on this ticket ran into a number of known and new bugs in the expression 
rewriter. All of those issues are out of scope for this fix. As a result, this 
fix will simply rewrite the three functions in question to a {{CASE}} 
statement. {{if()}} and {{isnull()}} are rewritten directly to a {{CASE}} 
statement which other rewrite rules simplify (or not, depending on the other 
bugs.)

{{coalesce()}} retains its existing simplification code, which is extended to 
emit a {{CASE}} statement rather than a simplified {{coalesce()}} call. In doing 
so, the simplification gains two additional optimizations (sketched below):

# Remove not only leading null values, but all null values.
# Special case not just the last non-null literal, but rather when encountering 
the first such value, drop all remaining terms.
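
For illustration, here is a self-contained Java sketch of these two list-level 
simplifications; it works on argument placeholders rather than Impala's Expr 
tree, and the helper names are invented:

{code:java}
import java.util.ArrayList;
import java.util.List;

public class CoalesceSimplifySketch {
  // Invented classifiers: "NULL" is a null literal; numbers and quoted strings
  // are non-null literals; anything else is treated as a non-constant expression.
  static boolean isNullLiteral(String arg) { return arg.equalsIgnoreCase("NULL"); }
  static boolean isNonNullLiteral(String arg) {
    return !isNullLiteral(arg) && (arg.matches("-?\\d+(\\.\\d+)?") || arg.startsWith("'"));
  }

  static List<String> simplifyCoalesceArgs(List<String> args) {
    List<String> kept = new ArrayList<>();
    for (String arg : args) {
      if (isNullLiteral(arg)) continue;  // 1. drop every NULL literal, not just leading ones
      kept.add(arg);
      if (isNonNullLiteral(arg)) break;  // 2. a non-null literal always matches; drop the rest
    }
    return kept;
  }

  public static void main(String[] args) {
    System.out.println(simplifyCoalesceArgs(List.of("NULL", "id", "NULL", "7", "other_col")));
    // Prints [id, 7], which would then be emitted as
    // CASE WHEN id IS NOT NULL THEN id ELSE 7 END.
  }
}
{code}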

The existing BE implementation is retained because bugs and limitations mean 
that there will still be paths in which the interpreted versions are called:

# If the user disables rewrites.
# If the functions occur in the {{ORDER BY}} clause.


was (Author: paul.rogers):
Work on this ticket required a broad review of conditional functions as 
summarized in IMPALA-7747. The notes below focus on the functions covered in 
this ticket.

h4. {{ISNULL(a, b)}}

BE: Alias for this method exist in {{impala_functions.py}}, special 
implementation in {{conditional-functions.[h|cc]}}.

*Suggestion:* Rewrite as:

{code:sql}
CASE WHEN a IS NULL THEN b ELSE a END
{code}

Since {{isnull()}} would vanish from the plan after this transform, remove the 
BE implementation. Ensure that the entry in {{impala_functions.py}} remains so 
that the function appears in the list of  built-in functions.

h4. {{NVL(a, b)}} \\ {{IFNULL(a, b)}}

FE, {{SimplifyConditional}}: Treated same as {{ISNULL(a, b)}}, but is not 
rewritten to this form.

BE: Alias for this method exist in {{impala_functions.py}}.

*Suggestion:* Rewrite to {{ISNULL(a, b)}}, to make things a bit more tidy.

h4. {{IF(cond, trueExpr, falseExpr)}}

FE: {{SimplifyConditional}} performs basic simplifications.

BE: Implemented in  {{conditional-functions.[h|cc]}} as an interpreted-only 
function to allow short-circuit argument evaluation.

*Suggestion:* Rewrite in the FE to

{code:sql}
CASE WHEN cond THEN trueExpr ELSE falseExpr END
{code}

{{IF()}} will then vanish from the plan so remove the BE implementation, 
leaving the entry in {{impala_functions.py}}.

h4. {{COALESCE(e1, e2, … en)}}

FE: {{SimplifyConditional}} performs basic simplifications.

BE: Implemented in {{conditional-functions.[h|cc]}} as an interpreted-only 
function to allow short-circuit argument evaluation.

*Suggestion:* Rewrite in the FE to

{noformat}
CASE WHEN [ei IS NOT NULL THEN ei]* ELSE en END
{noformat}

When doing so, extend two existing optimizations.

1. Remove not only leading null values, but all null values.
2. Special case not just the last non-null literal, but rather when 
encountering the first such value, drop all remaining terms.

{{COALESCE()}} will then vanish from the plan, so remove the BE implementation.

h4. Remove {{conditional-functions.[h|cc]}}

Since the above will remove the three special conditional functions, remove 
{{conditional-functions.[h|cc]}} as well.

> Codegen output for conditional functions (if,isnull, coalesce) is very 
> suboptimal
> -
>
> Key: IMPALA-7655
> URL: https://issues.apache.org/jira/browse/IMPALA-7655
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Paul Rogers
>Priority: Major
>  Labels: codegen, perf, performance
>
> https://gerrit.cloudera.org/#/c/11565/ provided a clue that an aggregation 
> involving an if() function was very slow, 10x slower than the equivalent 
> version using a case:
> {noformat}
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(case 
> when l_orderkey is NULL then 1 else NULL end) from 
> tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(case when l_orderkey is NULL then 1 else NULL end) from 
> tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:17:31 (Coordinator: 
> http://tarmstrong-box:25000)
> Query progress can be monitored at: 
> http://tarmstrong-box:25000/query_plan?query_id=274b2a6f35cefe31:95a19642
> +--+
> | count(case when l_orderkey is null then 1 else null end) |
> +--+
> | 0  

[jira] [Assigned] (IMPALA-7754) Expressions sometimes not re-analyzed after rewrite

2018-10-29 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned IMPALA-7754:
---

Assignee: (was: Paul Rogers)

> Expressions sometimes not re-analyzed after rewrite
> ---
>
> Key: IMPALA-7754
> URL: https://issues.apache.org/jira/browse/IMPALA-7754
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Priority: Major
>
> The analyzer has a chain of rules which fire in order without (as noted 
> above) repeats. The result of rule A (rewriting conditional functions) is fed 
> into rule B (simplify CASE). Each rule requires that analysis be done so that 
> attributes of expressions can be picked out.
> As it turns out, in the current code, this is rather ad-hoc. The 
> {{SimplifyConditionalsRule}} re-analyzes its result as part of the fix for 
> IMPALA-5125, but others do not, leading to optimizations not working. In 
> particular, in a chain of rewrites for {{IS DISTINCT FROM}}, certain rules 
> didn't fire because previous rules left new expressions in an un-analyzed 
> state. This is a bug.
> The fix is to analyze the result any time a rule fires, before passing the 
> result to the next rule.
> {code:java}
>   private Expr applyRuleBottomUp(Expr expr, ExprRewriteRule rule, Analyzer 
> analyzer)
>   throws AnalysisException {
> ...
> Expr rewrittenExpr = rule.apply(expr, analyzer);
> if (rewrittenExpr != expr) {
>   ++numChanges_;
>   rewrittenExpr.analyze(analyzer); // Add me!
> }
> return rewrittenExpr;
>   }
> {code}
> There are several places that the above logic appears: make the change in all 
> of them.
> Then, in rules that simply refuse to run if an expression is not yet analyzed:
> {code:java}
> public class SimplifyDistinctFromRule implements ExprRewriteRule {
>   public Expr apply(Expr expr, Analyzer analyzer) {
> if (!expr.isAnalyzed()) return expr;
> {code}
> Replace this with an assertion that analysis must have been done:
> {code:java}
> public class SimplifyDistinctFromRule implements ExprRewriteRule {
>   public Expr apply(Expr expr, Analyzer analyzer) {
> assert expr.isAnalyzed();
> {code}
> To be safe, the assertion fires only in "debug" mode, not in production.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7754) Expressions sometimes not re-analyzed after rewrite

2018-10-29 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated IMPALA-7754:

Description: 
The analyzer has a chain of rules which fire in order without (as noted above) 
repeats. The result of rule A (rewriting conditional functions) is fed into 
rule B (simplify CASE). Each rule requires that analysis be done so that 
attributes of expressions can be picked out.

As it turns out, in the current code, this is rather ad-hoc. The 
{{SimplifyConditionalsRule}} re-analyzes its result as part of the fix for 
IMPALA-5125, but others do not, leading to optimizations not working. In 
particular, in a chain of rewrites for {{IS DISTINCT FROM}}, certain rules 
didn't fire because previous rules left new expressions in an un-analyzed 
state. This is a bug.

The fix is to analyze the result any time a rule fires, before passing the 
result to the next rule.

{code:java}
  private Expr applyRuleBottomUp(Expr expr, ExprRewriteRule rule, Analyzer 
analyzer)
  throws AnalysisException {
...
Expr rewrittenExpr = rule.apply(expr, analyzer);
if (rewrittenExpr != expr) {
  ++numChanges_;
  rewrittenExpr.analyze(analyzer); // Add me!
}
return rewrittenExpr;
  }
{code}

There are several places that the above logic appears: make the change in all 
of them.

Then, in rules that simply refuse to run if an expression is not yet analyzed:

{code:java}
public class SimplifyDistinctFromRule implements ExprRewriteRule {
  public Expr apply(Expr expr, Analyzer analyzer) {
if (!expr.isAnalyzed()) return expr;
{code}

Replace this with an assertion that analysis must have been done:

{code:java}
public class SimplifyDistinctFromRule implements ExprRewriteRule {
  public Expr apply(Expr expr, Analyzer analyzer) {
assert expr.isAnalyzed();
{code}

To be safe, the assertion fires only in "debug" mode, not in production.

  was:
The analyzer has a chain of rules which fire in order without (as noted above) 
repeats. The result of rule A (rewriting conditional functions) is fed into 
rule B (simplify CASE). Each rule requires that analysis be done so that 
attributes of expressions can be picked out.

As it turns out, in the current code, this is rather ad-hoc. The 
{{SimplifyConditionalsRule}} re-analyzes its result as part of the fix for 
IMPALA-5125, but others do not, leading to optimizations not working. In 
particular, in a chain of rewrites for {{IS DISTINCT FROM}}, certain rules 
didn't fire because previous rules left new expressions in an un-analyzed 
state. This is a bug.

The fix is to analyze the result any time a rule fires, before passing the 
result to the next rule. Then, in rules that simply refuse to run if an 
expression is not yet analyzed:

{code:java}
public class SimplifyDistinctFromRule implements ExprRewriteRule {
  public Expr apply(Expr expr, Analyzer analyzer) {
if (!expr.isAnalyzed()) return expr;
{code}

Replace this with an assertion that analysis must have been done:

{code:java}
public class SimplifyDistinctFromRule implements ExprRewriteRule {
  public Expr apply(Expr expr, Analyzer analyzer) {
assert expr.isAnalyzed();
{code}

To be safe, the assertion fires only in "debug" mode, not in production.


> Expressions sometimes not re-analyzed after rewrite
> ---
>
> Key: IMPALA-7754
> URL: https://issues.apache.org/jira/browse/IMPALA-7754
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> The analyzer has a chain of rules which fire in order without (as noted 
> above) repeats. The result of rule A (rewriting conditional functions) is fed 
> into rule B (simplify CASE). Each rule requires that analysis be done so that 
> attributes of expressions can be picked out.
> As it turns out, in the current code, this is rather ad-hoc. The 
> {{SimplifyConditionalsRule}} re-analyzes its result as part of the fix for 
> IMPALA-5125, but others do not, leading to optimizations not working. In 
> particular, in a chain of rewrites for {{IS DISTINCT FROM}}, certain rules 
> didn't fire because previous rules left new expressions in an un-analyzed 
> state. This is a bug.
> The fix is to analyze the result any time a rule fires, before passing the 
> result to the next rule.
> {code:java}
>   private Expr applyRuleBottomUp(Expr expr, ExprRewriteRule rule, Analyzer 
> analyzer)
>   throws AnalysisException {
> ...
> Expr rewrittenExpr = rule.apply(expr, analyzer);
> if (rewrittenExpr != expr) {
>   ++numChanges_;
>   rewrittenExpr.analyze(analyzer); // Add me!
> }
> return rewrittenExpr;
>   }
> {code}
> There are several places that the above logic appears: make the change in all 

[jira] [Updated] (IMPALA-7769) Handle CAST(NULL AS type) in rewrites

2018-10-29 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated IMPALA-7769:

Description: 
Consider the following query:

{code:sql}
SELECT IFNULL(NULL + 1, id) FROM alltypessmall;
{code}

Visualize the rewritten query after analysis (using the new 
{{FullRewriteTest}}.) We get:

{code:sql}
SELECT CASE WHEN CAST(NULL AS INT) IS NULL THEN id ELSE NULL END FROM 
alltypessmall
{code}

(This is what the expression actually contains. The {{toSql()}} method lies and 
says that the statement is:

{code:sql}
SELECT CASE WHEN NULL IS NULL THEN id ELSE NULL END FROM alltypessmall
{code}

which causes confusion when debugging.)

Expected:

{code:sql}
SELECT id FROM alltypessmall
{code}

Another case:

{code:sql}
CASE WHEN NULL + 1 THEN 10 ELSE 20 END
{code}

Ensure the {{NULL + 1}} is constant folded with a cast. Then, in {{CASE}}:

{code:java}
for (int i = loopStart; i < numChildren - 1; i += 2) {
  if (expr.getChild(i).isLiteral()) {
canSimplify = true;
break;
  }
}
{code}

The {{isLiteral()}} method won’t match the cast and the simplification won’t 
fire.

The reason is that there are multiple rewrites, one of which has a flaw.

# Rewrite {{IFNULL to CASE}}
# Rewrite {{NULL + 1}} to {{CAST(NULL AS SMALLINT)}}
# Try to rewrite {{CAST(NULL AS SMALLINT) IS NULL}}, but fail because CAST is 
not a literal.

In addition to the {{CASE}} issue above, the {{FoldConstantsRule}} itself is 
tripped up. It is supposed to simplify the {{... IS NULL}} expression above, 
but does not.

The code in question in {{FoldConstantsRule.apply()}} is:

{code:java}
for (Expr child: expr.getChildren()) if (!child.isLiteral()) return expr;
{code}

In fact, this check is too restrictive. Need a new {{isLiteralLike()}} which 
should work like {{IsNullLiteral()}}:

* True if the node is a literal.
* True if the node is a cast of a literal.

(Can't change {{isLiteral()}} since there are places that assume that this 
existing predicate indicates that the node is exactly a {{LiteralExpr}}.)

Impala already has a predicate that does what is needed: {{isConstant()}}. 
However, the code in {{FoldConstantsRule.apply()}} specifically excludes 
calling it:

{code:java}
// Avoid calling Expr.isConstant() because that would lead to repeated 
traversals
// of the Expr tree. Assumes the bottom-up application of this rule. 
Constant
// children should have been folded at this point.
{code}

The new method solves the repeated traversal problem. With it, the test query 
now simplifies to the expected result.
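
For illustration, a self-contained Java sketch of the proposed predicate, using 
toy stand-in classes rather than Impala's actual Expr, CastExpr, and LiteralExpr 
types:

{code:java}
public class LiteralLikeSketch {
  // Toy stand-ins for the sketch only.
  static class Expr { Expr[] children = new Expr[0]; }
  static class LiteralExpr extends Expr {}
  static class NullLiteral extends LiteralExpr {}
  static class CastExpr extends Expr {
    CastExpr(Expr child) { children = new Expr[] { child }; }
  }

  // True for a literal, or for a cast whose child is itself literal-like.
  static boolean isLiteralLike(Expr e) {
    if (e instanceof LiteralExpr) return true;
    if (e instanceof CastExpr) return isLiteralLike(e.children[0]);
    return false;
  }

  public static void main(String[] args) {
    Expr castNull = new CastExpr(new NullLiteral());  // models CAST(NULL AS SMALLINT)
    System.out.println(isLiteralLike(castNull));      // true, so folding rules can fire
    System.out.println(isLiteralLike(new Expr()));    // false for an arbitrary expression
  }
}
{code}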

  was:
Consider the following query:

{code:sql}
SELECT IFNULL(NULL + 1, id) FROM alltypessmall;
{code}

Visualize the rewritten query after analysis (using the new 
{{FullRewriteTest}}.) We get:

{code:sql}
SELECT CASE WHEN CAST(NULL AS INT) IS NULL THEN id ELSE NULL END FROM 
alltypessmall
{code}

(This is what the expression actually contains. The {{toSql()}} method lies and 
says that the statement is:

{code:sql}
SELECT CASE WHEN NULL IS NULL THEN id ELSE NULL END FROM alltypessmall
{code}

which causes confusion when debugging.)

Expected:

{code:sql}
SELECT id FROM alltypessmall
{code}

The reason is that there are multiple rewrites, one of which has a flaw.

# Rewrite IFNULL to CASE
# Rewrite NULL + 1 to CAST(NULL AS SMALLINT)
# Try to rewrite CAST(NULL AS SMALLINT) IS NULL, but fail because CAST is not a 
literal.

The code in question in {{FoldConstantsRule.apply()}} is:

{code:java}
for (Expr child: expr.getChildren()) if (!child.isLiteral()) return expr;
{code}

In fact, this check is too restrictive. Need a new {{isLiteralLike()}} which 
should work like {{IsNullLiteral()}}:

* True if the node is a literal.
* True if the node is a cast of a literal.

(Can't change {{isLiteral()}} since there are places that assume that this 
existing predicate indicates that the node is exactly a {{LiteralExpr}}.)

Impala already has a predicate that does what is needed: {{isConstant()}}. 
However, the code in {{FoldConstantsRule.apply()}} specifically excludes 
calling it:

{code:java}
// Avoid calling Expr.isConstant() because that would lead to repeated 
traversals
// of the Expr tree. Assumes the bottom-up application of this rule. 
Constant
// children should have been folded at this point.
{code}

The new method solves the repeated traversal problem. With it, the test query 
now simplifies to the expected result.


> Handle CAST(NULL AS type) in rewrites
> -
>
> Key: IMPALA-7769
> URL: https://issues.apache.org/jira/browse/IMPALA-7769
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Consider the following query:
> {code:sql}
> SELECT IFNULL(NULL + 1, id) 

[jira] [Assigned] (IMPALA-7769) Handle CAST(NULL AS type) in rewrites

2018-10-29 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned IMPALA-7769:
---

Assignee: (was: Paul Rogers)

> Handle CAST(NULL AS type) in rewrites
> -
>
> Key: IMPALA-7769
> URL: https://issues.apache.org/jira/browse/IMPALA-7769
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Consider the following query:
> {code:sql}
> SELECT IFNULL(NULL + 1, id) FROM alltypessmall;
> {code}
> Visualize the rewritten query after analysis (using the new 
> {{FullRewriteTest}}.) We get:
> {code:sql}
> SELECT CASE WHEN CAST(NULL AS INT) IS NULL THEN id ELSE NULL END FROM 
> alltypessmall
> {code}
> (This is what the expression actually contains. The {{toSql()}} method lies 
> and says that the statement is:
> {code:sql}
> SELECT CASE WHEN NULL IS NULL THEN id ELSE NULL END FROM alltypessmall
> {code}
> which causes confusion when debugging.)
> Expected:
> {code:sql}
> SELECT id FROM alltypessmall
> {code}
> The reason is that there are multiple rewrites, one of which has a flaw.
> # Rewrite IFNULL to CASE
> # Rewrite NULL + 1 to CAST(NULL AS SMALLINT)
> # Try to rewrite CAST(NULL AS SMALLINT) IS NULL, but fail because CAST is not 
> a literal.
> The code in question in {{FoldConstantsRule.apply()}} is:
> {code:java}
> for (Expr child: expr.getChildren()) if (!child.isLiteral()) return expr;
> {code}
> In fact, this check is too restrictive. Need a new {{isLiteralLike()}} which 
> should work like {{IsNullLiteral()}}:
> * True if the node is a literal.
> * True if the node is a cast of a literal.
> (Can't change {{isLiteral()}} since there are places that assume that this 
> existing predicate indicates that the node is exactly a {{LiteralExpr}}.)
> Impala already has a predicate that does what is needed: {{isConstant()}}. 
> However, the code in {{FoldConstantsRule.apply()}} specifically excludes 
> calling it:
> {code:java}
> // Avoid calling Expr.isConstant() because that would lead to repeated 
> traversals
> // of the Expr tree. Assumes the bottom-up application of this rule. 
> Constant
> // children should have been folded at this point.
> {code}
> The new method solves the repeated traversal problem. With it, the test query 
> now simplifies to the expected result.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7769) Handle CAST(NULL AS type) in rewrites

2018-10-29 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated IMPALA-7769:

Summary: Handle CAST(NULL AS type) in rewrites  (was: Constant folding 
sometimes introduces an unnecessary cast)

> Handle CAST(NULL AS type) in rewrites
> -
>
> Key: IMPALA-7769
> URL: https://issues.apache.org/jira/browse/IMPALA-7769
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Consider the following query:
> {code:sql}
> SELECT IFNULL(NULL + 1, id) FROM alltypessmall;
> {code}
> Visualize the rewritten query after analysis (using the new 
> {{FullRewriteTest}}.) We get:
> {code:sql}
> SELECT CASE WHEN CAST(NULL AS INT) IS NULL THEN id ELSE NULL END FROM 
> alltypessmall
> {code}
> (This is what the expression actually contains. The {{toSql()}} method lies 
> and says that the statement is:
> {code:sql}
> SELECT CASE WHEN NULL IS NULL THEN id ELSE NULL END FROM alltypessmall
> {code}
> which causes confusion when debugging.)
> Expected:
> {code:sql}
> SELECT id FROM alltypessmall
> {code}
> The reason is that there are multiple rewrites, one of which has a flaw.
> # Rewrite IFNULL to CASE
> # Rewrite NULL + 1 to CAST(NULL AS SMALLINT)
> # Try to rewrite CAST(NULL AS SMALLINT) IS NULL, but fail because CAST is not 
> a literal.
> The code in question in {{FoldConstantsRule.apply()}} is:
> {code:java}
> for (Expr child: expr.getChildren()) if (!child.isLiteral()) return expr;
> {code}
> In fact, this check is too restrictive. Need a new {{isLiteralLike()}} which 
> should work like {{IsNullLiteral()}}:
> * True if the node is a literal.
> * True if the node is a cast of a literal.
> (Can't change {{isLiteral()}} since there are places that assume that this 
> existing predicate indicates that the node is exactly a {{LiteralExpr}}.)
> Impala already has a predicate that does what is needed: {{isConstant()}}. 
> However, the code in {{FoldConstantsRule.apply()}} specifically excludes 
> calling it:
> {code:java}
> // Avoid calling Expr.isConstant() because that would lead to repeated 
> traversals
> // of the Expr tree. Assumes the bottom-up application of this rule. 
> Constant
> // children should have been folded at this point.
> {code}
> The new method solves the repeated traversal problem. With it, the test query 
> now simplifies to the expected result.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7779) Parquet Scanner can write binary data into profile

2018-10-29 Thread Lars Volker (JIRA)
Lars Volker created IMPALA-7779:
---

 Summary: Parquet Scanner can write binary data into profile
 Key: IMPALA-7779
 URL: https://issues.apache.org/jira/browse/IMPALA-7779
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.1.0
Reporter: Lars Volker


In 
[hdfs-parquet-scanner.cc:1224|https://github.com/apache/impala/blob/master/be/src/exec/hdfs-parquet-scanner.cc#L1224]
 we log an invalid file version string. Whatever 4 bytes that pointer 
points to will end up in the profile. These can be non-ASCII characters, thus 
potentially breaking tools that parse the profiles and expect their content to 
be plain text. We should either remove the bytes from the message or escape 
them as hex.
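
For illustration, a small Java sketch of the "escape them as hex" option; the 
real scanner is C++, so this only shows the escaping idea, not Impala's code path:

{code:java}
public class HexEscapeSketch {
  // Render arbitrary bytes (e.g. a bogus 4-byte version tag) as printable text.
  static String escapeForProfile(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      int v = b & 0xff;
      if (v >= 0x20 && v < 0x7f) {
        sb.append((char) v);                     // keep printable ASCII as-is
      } else {
        sb.append(String.format("\\x%02x", v));  // escape everything else as hex
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] badVersion = { 'i', 'm', 0x01, (byte) 0xff };
    System.out.println(escapeForProfile(badVersion));  // prints im\x01\xff
  }
}
{code}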



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7586) Incorrect results when querying primary = "\"" in Kudu and HBase

2018-10-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7586:
--
Summary: Incorrect results when querying primary = "\"" in Kudu and HBase  
(was: Incorrect results when querying primary = "\"" in Kudu)

HBase has essentially the same bug so I'll track that here too.

> Incorrect results when querying primary = "\"" in Kudu and HBase
> 
>
> Key: IMPALA-7586
> URL: https://issues.apache.org/jira/browse/IMPALA-7586
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Will Berkeley
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: correctness, kudu
> Attachments: impalakudu_pred_bug.profile
>
>
> Version string from catalogd web ui:
> {noformat}
> catalogd version 3.1.0-cdh6.x-SNAPSHOT RELEASE (build 
> 8baac7f5849b6bacb02fedeb9b3fe2b2ee9450ee)
> {noformat}
> A reproduction script for the impala-shell:
> {noformat}
> create table test(name string, primary key(name) ) stored as kudu;
> insert into test values ("\"");
> -- Modified 1 row(s), 0 row error(s) in 4.01s
> -- row found in full table scan
> select * from test;
> -- Fetched 1 row(s) in 0.15s
> -- row not found on = predicate (pushed to kudu)
> select * from test where name="\"";
> -- Fetched 0 row(s) in 0.13s
> -- row found when predicate cannot be pushed to kudu
> select * from test where name like "\"";
> -- Fetched 1 row(s) in 0.13s
> {noformat}
> This was originally reported as KUDU-2575. I tried to reproduce directly 
> against Kudu using the python client but got the expected result.
> From the plan and profile, Impala is pushing down the predicate, but Kudu is 
> not being scanned, possibly because the Kudu client short-circuits the scan 
> as having no results based on the predicate Impala pushes down.
> {noformat}
> 00:SCAN KUDU [default.test]
>kudu predicates: name = '"'
>mem-estimate=0B mem-reservation=0B thread-reservation=1
>tuple-ids=0 row-size=15B cardinality=unavailable
>in pipelines: 00(GETNEXT)
> {noformat}
> {noformat}
> KUDU_SCAN_NODE (id=0)
>   - AverageScannerThreadConcurrency: 0.00 (0.0)
>   - InactiveTotalTime: 0ns (0)
>   - KuduRemoteScanTokens: 0 (0)
>   - MaterializeTupleTime(*): 0ns (0)
>   - NumScannerThreadMemUnavailable: 0 (0)
>   - NumScannerThreadsStarted: 1 (1)
>   - PeakMemoryUsage: 24.0 KiB (24576)
>   - PeakScannerThreadConcurrency: 1 (1)
>   - RowBatchBytesEnqueued: 16.0 KiB (16384)
>   - RowBatchQueueGetWaitTime: 0ns (0)
>   - RowBatchQueuePeakMemoryUsage: 0 B (0)
>   - RowBatchQueuePutWaitTime: 0ns (0)
>   - RowBatchesEnqueued: 1 (1)
>   - RowsRead: 0 (0)
> ===>  - RowsReturned: 0 (0)
>   - RowsReturnedRate: 0 per second (0)
>   - ScanRangesComplete: 1 (1)
>   - ScannerThreadsInvoluntaryContextSwitches: 0 (0)
>   - ScannerThreadsTotalWallClockTime: 0ns (0)
> - ScannerThreadsSysTime: 158.00us (158000)
> - ScannerThreadsUserTime: 0ns (0)
>   - ScannerThreadsVoluntaryContextSwitches: 2 (2)
> ===>  - TotalKuduScanRoundTrips: 0 (0)
>   - TotalTime: 1ms (172)
> {noformat}
> I also confirmed Kudu sees no scan from Impala for this query using the 
> /scans page of the tablet servers.
> Full profile attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6042) Allow Impala shell to also use a global impalarc configuration

2018-10-29 Thread Ruslan Dautkhanov (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667434#comment-16667434
 ] 

Ruslan Dautkhanov commented on IMPALA-6042:
---

Another use case is updating the PATH variable, for example to point to a 
Python 2 home.

impala-shell is not Python 3-compatible (IMPALA-3343), and we are migrating all 
our applications to Python 3.

It would be great to have a way to globally set certain config variables and 
environment variables like PATH as a workaround for this and some other issues.

Thank you.

> Allow Impala shell to also use a global impalarc configuration
> --
>
> Key: IMPALA-6042
> URL: https://issues.apache.org/jira/browse/IMPALA-6042
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Reporter: Balazs Jeszenszky
>Priority: Minor
>  Labels: newbie, shell, usability
>
> Currently, impalarc files can be specified on a per-user basis (stored in 
> ~/.impalarc), and they aren't created by default. 
> The Impala shell should pick up /etc/impalarc as well, in addition to the 
> user-specific configurations.
> The intent here is to allow a "global" configuration of the shell by a system 
> administrator with common options like:
> {code}
> --ssl
> -l
> -k
> -u 
> -i 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7778) RCFile parser ignores escape characters

2018-10-29 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667427#comment-16667427
 ] 

Tim Armstrong commented on IMPALA-7778:
---

I'm not assigning this a high priority because RCFile has low and declining 
usage.

> RCFile parser ignores escape characters
> ---
>
> Key: IMPALA-7778
> URL: https://issues.apache.org/jira/browse/IMPALA-7778
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.0, Impala 2.1, Impala 2.2, Impala 2.3.0, Impala 
> 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, 
> Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 3.1.0
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: correctness
>
> If an RCFile table has an escape character specified then it is ignored by 
> Impala.
> {code}
> -- HIVE
> CREATE TABLE rc_escape ( s string)
> ROW FORMAT delimited fields terminated by ','  escaped by '\\'
> STORED AS RCFILE;
> insert into rc_escape select '\\"';
> select length(s), s from rc_escape;
> -- +-+-+
> -- | c0  |  s  |
> -- +-+-+
> -- | 2   | \"  |
> -- +-+-+
> -- IMPALA
> invalidate metadata rc_escape;
> select length(s), s from rc_escape;
> -- +---+-+
> -- | length(s) | s   |
> -- +---+-+
> -- | 3 | \\" |
> -- +---+-+
> {code}
> I reproduced on my dev env with "beeline -n $USER -u 
> jdbc:hive2://localhost:11050/default" for Hive and "impala-shell" for Impala.
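
A minimal sketch, in Python rather than Impala's C++ text scanner, of the 
unescaping step the report says is being skipped; the stored field and the escape 
character come from the reproduction above:

{code}
def unescape_field(raw, escape_char='\\'):
    # Drop each escape character and keep the byte it protects, which is
    # what a delimited-text reader with ESCAPED BY '\\' is expected to do.
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == escape_char and i + 1 < len(raw):
            out.append(raw[i + 1])
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return ''.join(out)

raw_field = '\\\\"'                       # backslash, backslash, double quote
print(len(raw_field), raw_field)          # 3 \\"  -- what Impala currently returns
unescaped = unescape_field(raw_field)
print(len(unescaped), unescaped)          # 2 \"   -- what Hive returns
{code}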



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7764) Add test coverage for SentryProxy

2018-10-29 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya updated IMPALA-7764:
-
Description: There are currently no unit tests in SentryProxy, which can 
lead to a number of bugs when changing code in SentryProxy. There aren't many 
end-to-end tests related to SentryProxy so that needs to be improved.  (was: 
There are currently no unit tests in SentryProxy, which can lead to a number of 
bugs when changing code in SentryProxy. There aren't many end-to-end tests 
related to SentryProxy that need to be improved.)

> Add test coverage for SentryProxy
> -
>
> Key: IMPALA-7764
> URL: https://issues.apache.org/jira/browse/IMPALA-7764
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Fredy Wijaya
>Assignee: Fredy Wijaya
>Priority: Major
>
> There are currently no unit tests in SentryProxy, which can lead to a number 
> of bugs when changing code in SentryProxy. There aren't many end-to-end tests 
> related to SentryProxy so that needs to be improved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7778) RCFile parser ignores escape characters

2018-10-29 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667423#comment-16667423
 ] 

Tim Armstrong commented on IMPALA-7778:
---

Found this while adding tests for IMPALA-7586

> RCFile parser ignores escape characters
> ---
>
> Key: IMPALA-7778
> URL: https://issues.apache.org/jira/browse/IMPALA-7778
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.0, Impala 2.1, Impala 2.2, Impala 2.3.0, Impala 
> 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, 
> Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 3.1.0
>Reporter: Tim Armstrong
>Priority: Major
>
> If an RCFile table has an escape character specified then it is ignored by 
> Impala.
> {code}
> -- HIVE
> CREATE TABLE rc_escape ( s string)
> ROW FORMAT delimited fields terminated by ','  escaped by '\\'
> STORED AS RCFILE;
> insert into rc_escape select '\\"';
> select length(s), s from rc_escape;
> -- +-+-+
> -- | c0  |  s  |
> -- +-+-+
> -- | 2   | \"  |
> -- +-+-+
> -- IMPALA
> invalidate metadata rc_escape;
> select length(s), s from rc_escape;
> -- +---+-+
> -- | length(s) | s   |
> -- +---+-+
> -- | 3 | \\" |
> -- +---+-+
> {code}
> I reproduced on my dev env with "beeline -n $USER -u 
> jdbc:hive2://localhost:11050/default" for Hive and "impala-shell" for Impala.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7778) RCFile parser ignores escape characters

2018-10-29 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-7778:
-

 Summary: RCFile parser ignores escape characters
 Key: IMPALA-7778
 URL: https://issues.apache.org/jira/browse/IMPALA-7778
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.0, Impala 2.11.0, Impala 2.10.0, Impala 2.9.0, 
Impala 2.8.0, Impala 2.7.0, Impala 2.6.0, Impala 2.4.0, Impala 2.5.0, Impala 
2.3.0, Impala 2.2, Impala 2.1, Impala 2.0, Impala 3.1.0
Reporter: Tim Armstrong


If an RCFile table has an escape character specified then it is ignored by 
Impala.

{code}
-- HIVE
CREATE TABLE rc_escape ( s string)
ROW FORMAT delimited fields terminated by ','  escaped by '\\'
STORED AS RCFILE;
insert into rc_escape select '\\"';
select length(s), s from rc_escape;
-- +-+-+
-- | c0  |  s  |
-- +-+-+
-- | 2   | \"  |
-- +-+-+

-- IMPALA
invalidate metadata rc_escape;
select length(s), s from rc_escape;
-- +---+-+
-- | length(s) | s   |
-- +---+-+
-- | 3 | \\" |
-- +---+-+
{code}

I reproduced on my dev env with "beeline -n $USER -u 
jdbc:hive2://localhost:11050/default" for Hive and "impala-shell" for Impala.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7776) Fail queries where the sum of offset and limit exceed the max value of int64

2018-10-29 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-7776.
--
Resolution: Duplicate

> Fail queries where the sum of offset and limit exceed the max value of int64
> 
>
> Key: IMPALA-7776
> URL: https://issues.apache.org/jira/browse/IMPALA-7776
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> A follow up to IMPALA-5004. We should prevent users from running queries 
> where the sum of the offset and limit exceeds some threshold (e.g. 
> {{Long.MAX_VALUE}}). If a user tries to run this query the impalad will 
> crash, so we should reject queries that exceed the threshold.
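
A minimal sketch of the proposed guard, written in Python for illustration only; 
the real check would live in Impala's Java frontend analysis, and the function name 
and error messages here are assumptions:

{code}
INT64_MAX = 2**63 - 1  # Long.MAX_VALUE

def check_offset_and_limit(offset, limit):
    # Reject the query during analysis if offset + limit cannot fit in a
    # signed 64-bit value, instead of letting the backend overflow and crash.
    if offset < 0 or limit < 0:
        raise ValueError("OFFSET and LIMIT must be non-negative")
    # Rearranged so a fixed-width implementation of this comparison cannot
    # itself overflow: equivalent to offset + limit > INT64_MAX.
    if offset > INT64_MAX - limit:
        raise ValueError("Sum of OFFSET (%d) and LIMIT (%d) exceeds %d"
                         % (offset, limit, INT64_MAX))

check_offset_and_limit(100, 10)         # passes
# check_offset_and_limit(INT64_MAX, 1)  # would raise ValueError
{code}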



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7776) Fail queries where the sum of offset and limit exceed the max value of int64

2018-10-29 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667413#comment-16667413
 ] 

Sahil Takiar commented on IMPALA-7776:
--

Whoops, not sure how that happened. Closing this as a duplicate.

> Fail queries where the sum of offset and limit exceed the max value of int64
> 
>
> Key: IMPALA-7776
> URL: https://issues.apache.org/jira/browse/IMPALA-7776
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> A follow up to IMPALA-5004. We should prevent users from running queries 
> where the sum of the offset and limit exceeds some threshold (e.g. 
> {{Long.MAX_VALUE}}). If a user tries to run this query the impalad will 
> crash, so we should reject queries that exceed the threshold.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7764) Add test coverage for SentryProxy

2018-10-29 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya updated IMPALA-7764:
-
Summary: Add test coverage for SentryProxy  (was: SentryProxy needs unit 
tests)

> Add test coverage for SentryProxy
> -
>
> Key: IMPALA-7764
> URL: https://issues.apache.org/jira/browse/IMPALA-7764
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Fredy Wijaya
>Assignee: Fredy Wijaya
>Priority: Major
>
> There are currently no unit tests in SentryProxy, which can lead to a number 
> of bugs when changing code in SentryProxy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7776) Fail queries where the sum of offset and limit exceed the max value of int64

2018-10-29 Thread Lars Volker (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667404#comment-16667404
 ] 

Lars Volker commented on IMPALA-7776:
-

Duplicate of IMPALA-7777?

> Fail queries where the sum of offset and limit exceed the max value of int64
> 
>
> Key: IMPALA-7776
> URL: https://issues.apache.org/jira/browse/IMPALA-7776
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> A follow up to IMPALA-5004. We should prevent users from running queries 
> where the sum of the offset and limit exceeds some threshold (e.g. 
> {{Long.MAX_VALUE}}). If a user tries to run this query the impalad will 
> crash, so we should reject queries that exceed the threshold.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7758) chars_formats dependent tables are created using the wrong LOCATION

2018-10-29 Thread David Knupp (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Knupp resolved IMPALA-7758.
-
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> chars_formats dependent tables are created using the wrong LOCATION
> ---
>
> Key: IMPALA-7758
> URL: https://issues.apache.org/jira/browse/IMPALA-7758
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: David Knupp
>Assignee: David Knupp
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> In testdata/bin/load-dependent-tables.sql, the LOCATION clause when creating 
> the various chars_formats tables (e.g. text) uses:
> {noformat}
> LOCATION '${hiveconf:hive.metastore.warehouse.dir}/chars_formats_text'
> {noformat}
> ...which resolves to {{/user/hive/warehouse/chars_formats_text}}
> However, the actual test warehouse root dir is {{/test-warehouse}}, not 
> {{/user/hive/warehouse}}.
> {noformat}
> $ hdfs dfs -cat /test-warehouse/chars_formats_text/chars-formats.txt
> abcde,88db79c70974e02deb3f01cfdcc5daae2078f21517d1021994f12685c0144addae3ce0dbd6a540b55b88af68486251fa6f0c8f9f94b3b1b4bc64c69714e281f388db79c70974,variable
>  length
> abc 
> ,8d3fffddf79e9a232ffd19f9ccaa4d6b37a6a243dbe0f23137b108a043d9da13121a9b505c804956b22e93c7f93969f4a7ba8ddea45bf4aab0bebc8f814e09918d3fffddf79e,abc
> abcdef,68f8c4575da360c32abb46689e58193a0eeaa905ae6f4a5e6c702a6ae1db35a6f86f8222b7a5489d96eb0466c755b677a64160d074617096a8c6279038bc720468f8c4575da3,b2fe9d4638503a57f93396098f24103a20588631727d0f0b5016715a3f6f2616628f09b1f63b23e484396edf949d9a1c307dbe11f23b971afd75b0f639d8a3f1
> {noformat}
> versus...
> {noformat}
> $ hdfs dfs -cat /user/hive/warehouse/chars_formats_text/chars-formats.txt
> cat: `/user/hive/warehouse/chars_formats_text/chars-formats.txt': No such 
> file or directory
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7777) Fail queries where the sum of offset and limit exceed the max value of int64

2018-10-29 Thread Sahil Takiar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-7777:
-
Component/s: Frontend

> Fail queries where the sum of offset and limit exceed the max value of int64
> 
>
> Key: IMPALA-7777
> URL: https://issues.apache.org/jira/browse/IMPALA-7777
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> A follow up to IMPALA-5004. We should prevent users from running queries 
> where the sum of the offset and limit exceeds some threshold (e.g. 
> {{Long.MAX_VALUE}}). If a user tries to run this query the impalad will 
> crash, so we should reject queries that exceed the threshold.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7777) Fail queries where the sum of offset and limit exceed the max value of int64

2018-10-29 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-7777:


 Summary: Fail queries where the sum of offset and limit exceed the 
max value of int64
 Key: IMPALA-7777
 URL: https://issues.apache.org/jira/browse/IMPALA-7777
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar
Assignee: Sahil Takiar


A follow up to IMPALA-5004. We should prevent users from running queries where 
the sum of the offset and limit exceeds some threshold (e.g. 
{{Long.MAX_VALUE}}). If a user tries to run this query the impalad will crash, 
so we should reject queries that exceed the threshold.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7776) Fail queries where the sum of offset and limit exceed the max value of int64

2018-10-29 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-7776:


 Summary: Fail queries where the sum of offset and limit exceed the 
max value of int64
 Key: IMPALA-7776
 URL: https://issues.apache.org/jira/browse/IMPALA-7776
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar
Assignee: Sahil Takiar


A follow up to IMPALA-5004. We should prevent users from running queries where 
the sum of the offset and limit exceeds some threshold (e.g. 
{{Long.MAX_VALUE}}). If a user tries to run this query the impalad will crash, 
so we should reject queries that exceed the threshold.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7720) Allow longer timeout on expr-test

2018-10-29 Thread Jim Apple (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Apple updated IMPALA-7720:
--
Issue Type: Task  (was: Sub-task)
Parent: (was: IMPALA-5031)

> Allow longer timeout on expr-test
> -
>
> Key: IMPALA-7720
> URL: https://issues.apache.org/jira/browse/IMPALA-7720
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
> Environment: m4.4xl, ubuntu 16.04
>Reporter: Jim Apple
>Priority: Minor
>
> IMPALA-7581 set up a timeout for backend tests, but with UBSAN on codegenned 
> code, expr-test takes longer than this.
> It would be useful if the value were configurable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7720) Allow longer timeout on expr-test

2018-10-29 Thread Jim Apple (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667054#comment-16667054
 ] 

Jim Apple commented on IMPALA-7720:
---

This is now configurable in {{UBSAN_FULL}} at compile time. Since it no longer 
needs to be a subtask of IMPALA-5031, I'll move it to be an independent JIRA.

> Allow longer timeout on expr-test
> -
>
> Key: IMPALA-7720
> URL: https://issues.apache.org/jira/browse/IMPALA-7720
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
> Environment: m4.4xl, ubuntu 16.04
>Reporter: Jim Apple
>Priority: Minor
>
> IMPALA-7581 set up a timeout for backend tests, but with UBSAN on codegenned 
> code, expr-test takes longer than this.
> It would be useful if the value were configurable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org