[jira] [Resolved] (IMPALA-10225) Bump Impyla version

2020-10-09 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-10225.

Fix Version/s: Impala 4.0
   Resolution: Fixed

> Bump Impyla version
> ---
>
> Key: IMPALA-10225
> URL: https://issues.apache.org/jira/browse/IMPALA-10225
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 4.0
>
>
> There are a couple of new Impyla releases that we can test out in Impala's 
> end-to-end test environment - https://pypi.org/project/impyla/0.17a1/#history



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10225) Bump Impyla version

2020-10-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211523#comment-17211523
 ] 

ASF subversion and git services commented on IMPALA-10225:
--

Commit b8a2b754669eb7f8d164e8112e594ac413e436ef in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b8a2b75 ]

IMPALA-10225: bump impyla version to 0.17a1

Update a couple of tests with the new improved error messages.

Change-Id: I70a0e883275f3c29e2b01fd5bab7725857c8a1ed
Reviewed-on: http://gerrit.cloudera.org:8080/16562
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Bump Impyla version
> ---
>
> Key: IMPALA-10225
> URL: https://issues.apache.org/jira/browse/IMPALA-10225
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 4.0
>
>
> There are a couple of new Impyla releases that we can test out in Impala's 
> end-to-end test environment - https://pypi.org/project/impyla/0.17a1/#history






[jira] [Commented] (IMPALA-9792) Split Kudu scan ranges into smaller chunks for greater parallelism

2020-10-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211507#comment-17211507
 ] 

ASF subversion and git services commented on IMPALA-9792:
-

Commit 2fd6f5bc5aa6b50e36547e52657c1117637384b6 in impala's branch 
refs/heads/master from Bikramjeet Vig
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2fd6f5b ]

IMPALA-9792: Add ability to split kudu scan ranges

This patch adds the ability to split Kudu scan tokens via the provided
Kudu Java API. A query option "TARGETED_KUDU_SCAN_RANGE_LENGTH" has
been added to set the scan range length used in this implementation.

Potential benefit:
This helps increase parallelism during scanning which can
result in more efficient use of CPU with higher mt_dop.

Limitation:
- The scan range length sent to kudu is just a hint and does not
  guarantee that the token will be split at that limit.
- Comes at an added cost of an RPC to tablet server per token in
  order to split it. A slow tablet server which can already slow
  down scanning during execution can now also potentially slow
  down planning.
- Also adds the cost of an RPC per token to open a new scanner for
  it on the kudu side. Therefore, scanning many smaller split
  tokens can slow down scanning and we can also lose benefits
  of scanning a single large token sequentially with a single scanner.

Testing:
- Added an e2e test

Change-Id: Ia02fd94cc1d13c61bc6cb0765dd2cbe90e9a5ce8
Reviewed-on: http://gerrit.cloudera.org:8080/16385
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
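
The hint semantics described in the limitations above (the target length only
approximates where tokens are split) can be illustrated with a small standalone
sketch. This is illustrative Python, not Impala's or Kudu's actual code:

```python
def split_scan_range(start, end, target_len):
    """Split the byte range [start, end) into chunks of roughly
    target_len bytes.

    The target length is only a hint: the final chunk may be shorter,
    and a range no larger than target_len is returned unsplit,
    mirroring how a scan token need not be split exactly at the
    requested limit.
    """
    if target_len <= 0 or end - start <= target_len:
        return [(start, end)]
    chunks = []
    pos = start
    while pos < end:
        # Each chunk covers at most target_len bytes; the last one is
        # clamped to the end of the range.
        chunks.append((pos, min(pos + target_len, end)))
        pos += target_len
    return chunks
```

As in the patch, a smaller target length buys more scan parallelism at the cost
of more per-chunk overhead (here, list entries; in Impala, an extra RPC per
split token).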


> Split Kudu scan ranges into smaller chunks for greater parallelism
> 
>
> Key: IMPALA-9792
> URL: https://issues.apache.org/jira/browse/IMPALA-9792
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Bikramjeet Vig
>Priority: Major
>  Labels: kudu, multithreading
>
> We currently use one thread to scan each tablet, which may underparallelise 
> queries in many cases. Kudu added an API in KUDU-2437 and KUDU-2670 to split 
> tokens at a finer granularity.
> See 
> https://github.com/apache/kudu/commit/22a6faa44364dec3a171ec79c15b814ad9277d8f#diff-a4afa9dba99c7612b2cb9176134ff2b0
> The major downside is that the planner has to do an extra RPC to a tserver 
> for each tablet being scanned in order to figure out key range splits. Maybe 
> we can tie this to mt_dop >= 2, or use some heuristics to avoid these RPCs 
> for smaller tables.






[jira] [Updated] (IMPALA-10230) column stats num_nulls less than -1

2020-10-09 Thread logan zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

logan zheng updated IMPALA-10230:
-
Description: 
When upgrading from Impala 3.2.0 (CDH 6.3.2) to ASF 3.4.0, running "increment 
stats default.test partition(xx=)" afterwards fails with:
{noformat}
ERROR: TableLoadingException: Failed to load metadata for table: default.test
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0, 
avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
The table default.test already existed in Impala 3.2.0, had been running for 
a long time, and already had stats computed.

 

 
  

  was:
when update impala 3.2.0(CDH6.3.2 ) to asf3.4.0 ,after when "increment stats 
default.test partition(xx=)":
{noformat}
ERROR: TableLoadingException: Failed to load metadata for table: default.test
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0, 
avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
The table default.test already exists in impala 3.2.0, and has been running for 
a long time, and has also been added stats. 

 

 
 


> column stats num_nulls less than -1
> ---
>
> Key: IMPALA-10230
> URL: https://issues.apache.org/jira/browse/IMPALA-10230
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.4.0
>Reporter: logan zheng
>Priority: Critical
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> When upgrading from Impala 3.2.0 (CDH 6.3.2) to ASF 3.4.0, running "increment 
> stats default.test partition(xx=)" afterwards fails with:
> {noformat}
> ERROR: TableLoadingException: Failed to load metadata for table: default.test
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0, 
> avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
> The table default.test already existed in Impala 3.2.0, had been running 
> for a long time, and already had stats computed.
>  
>  
>   






[jira] [Updated] (IMPALA-9812) Remove --unlock_mt_dop and --mt_dop_auto_fallback

2020-10-09 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-9812:
--
Parent: IMPALA-8965
Issue Type: Sub-task  (was: Task)

> Remove --unlock_mt_dop and --mt_dop_auto_fallback 
> -
>
> Key: IMPALA-9812
> URL: https://issues.apache.org/jira/browse/IMPALA-9812
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Tim Armstrong
>Priority: Minor
>
> These flags will become ineffective when DML is supported. We should clean up 
> all references and move them to the flag graveyard.






[jira] [Commented] (IMPALA-10230) column stats num_nulls less than -1

2020-10-09 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211486#comment-17211486
 ] 

Tim Armstrong commented on IMPALA-10230:


[~logan zheng] also if you can give us standalone steps to reproduce the issue, 
that would probably help. I tried reproducing on my system but wasn't able to - 
I assume the data or partition layouts are somehow different.

> column stats num_nulls less than -1
> ---
>
> Key: IMPALA-10230
> URL: https://issues.apache.org/jira/browse/IMPALA-10230
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.4.0
>Reporter: logan zheng
>Priority: Critical
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> When upgrading from Impala 3.2.0 (CDH 6.3.2) to ASF 3.4.0, running "increment 
> stats default.test partition(xx=)" afterwards fails with:
> {noformat}
> ERROR: TableLoadingException: Failed to load metadata for table: default.test
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0, 
> avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
> The table default.test already existed in Impala 3.2.0, had been running 
> for a long time, and already had stats computed.
>  
>  
>  






[jira] [Updated] (IMPALA-10230) column stats num_nulls less than -1

2020-10-09 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-10230:
---
Target Version: Impala 4.0

> column stats num_nulls less than -1
> ---
>
> Key: IMPALA-10230
> URL: https://issues.apache.org/jira/browse/IMPALA-10230
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.4.0
>Reporter: logan zheng
>Priority: Critical
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> When upgrading from Impala 3.2.0 (CDH 6.3.2) to ASF 3.4.0, running "increment 
> stats default.test partition(xx=)" afterwards fails with:
> {noformat}
> ERROR: TableLoadingException: Failed to load metadata for table: default.test
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0, 
> avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
> The table default.test already existed in Impala 3.2.0, had been running 
> for a long time, and already had stats computed.
>  
>  
>  






[jira] [Commented] (IMPALA-10230) column stats num_nulls less than -1

2020-10-09 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211481#comment-17211481
 ] 

Tim Armstrong commented on IMPALA-10230:


[~logan zheng] do you have the full IllegalStateException stacktrace from the 
catalogd logs?
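
For context, the IllegalStateException reported in this issue comes from a
consistency check on loaded column stats: -1 conventionally means "unknown",
so any value below -1 is invalid. A minimal Python sketch of that invariant
(the function name and message are assumptions for illustration, not Impala's
actual code):

```python
def check_num_nulls(num_nulls):
    """Reject num_nulls values below -1.

    -1 means 'unknown', non-negative values are real null counts, and
    anything else (such as the numNulls_=-2 in this report) indicates
    corrupted or inconsistently migrated stats.
    """
    if num_nulls < -1:
        raise ValueError("num_nulls must be >= -1, got %d" % num_nulls)
    return num_nulls
```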

> column stats num_nulls less than -1
> ---
>
> Key: IMPALA-10230
> URL: https://issues.apache.org/jira/browse/IMPALA-10230
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.4.0
>Reporter: logan zheng
>Priority: Critical
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> When upgrading from Impala 3.2.0 (CDH 6.3.2) to ASF 3.4.0, running "increment 
> stats default.test partition(xx=)" afterwards fails with:
> {noformat}
> ERROR: TableLoadingException: Failed to load metadata for table: default.test
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0, 
> avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
> The table default.test already existed in Impala 3.2.0, had been running 
> for a long time, and already had stats computed.
>  
>  
>  






[jira] [Commented] (IMPALA-8751) Kudu tables cannot be found after created

2020-10-09 Thread Grant Henke (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211428#comment-17211428
 ] 

Grant Henke commented on IMPALA-8751:
-

A Kudu side fix was merged and could be pulled in to fix these test failures:
https://github.com/apache/kudu/commit/6b20440f4c51a6b69c1382db51139bf8d3467b05

> Kudu tables cannot be found after created
> -
>
> Key: IMPALA-8751
> URL: https://issues.apache.org/jira/browse/IMPALA-8751
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.3.0
>Reporter: Yongzhi Chen
>Priority: Major
>
> For example, the Kudu integration test
> TestKuduHMSIntegration.test_drop_db_cascade in custom_cluster/test_kudu.py
> failed with:
> custom_cluster/test_kudu.py:239: in test_drop_db_cascade
> assert not kudu_client.table_exists(kudu_table.name)
> /usr/lib/python2.7/contextlib.py:35: in __exit__
> self.gen.throw(type, value, traceback)
> common/kudu_test_suite.py:165: in temp_kudu_table
> kudu.delete_table(name)
> kudu/client.pyx:392: in kudu.client.Client.delete_table (kudu/client.cpp:7106)
> ???
> kudu/errors.pyx:56: in kudu.errors.check_status (kudu/errors.cpp:904)
> ???
> E   KuduNotFound: failed to drop Hive Metastore table: TException - service 
> has thrown: NoSuchObjectException(message=s7mo1z)
> And when trying to add default capabilities to Kudu tables, it is sometimes 
> effective, sometimes not. For example, after enabling the default Kudu 
> OBJCAPABILITIES, ./run-tests.py metadata/test_ddl.py -k "create_kudu" will 
> succeed while the same test in ./run-tests.py custom_cluster/test_kudu.py fails:
> {noformat}
>  TestKuduHMSIntegration.test_create_managed_kudu_tables[protocol: beeswax | 
> exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> text/none] 
> custom_cluster/test_kudu.py:147: in test_create_managed_kudu_tables
> self.run_test_case('QueryTest/kudu_create', vector, 
> use_db=unique_database)
> common/impala_test_suite.py:563: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:500: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:798: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:184: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:362: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:356: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:519: in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: AnalysisException: Write not supported. Table 
> test_create_managed_kudu_tables_a8d11828.add  access type is: READONLY
> {noformat}






[jira] [Comment Edited] (IMPALA-9728) Data load failed with EOFException writing functional_orc_def.complextypestbl_medium

2020-10-09 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211388#comment-17211388
 ] 

Tim Armstrong edited comment on IMPALA-9728 at 10/9/20, 8:50 PM:
-

Hit again here 
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/12331/artifact/Impala/logs_static/logs/data_loading/sql/functional/load-functional-query-exhaustive-hive-generated-orc-def-block.sql.log/*view*/


was (Author: tarmstrong):
Hit again here

> Data load failed with EOFException writing 
> functional_orc_def.complextypestbl_medium
> 
>
> Key: IMPALA-9728
> URL: https://issues.apache.org/jira/browse/IMPALA-9728
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: flaky
> Attachments: 
> load-functional-query-exhaustive-hive-generated-orc-def-block.sql.log, 
> load-functional-query-exhaustive-hive-generated-orc-def-block.sql.log
>
>
> {noformat}
> INFO  : Compiling 
> command(queryId=ubuntu_20200506012349_3c5cedc8-49d6-4e72-b4a5-e06cb82d1707): 
> INSERT OVERWRITE TABLE functional_orc_def.complextypestbl_medium SELECT c.* 
> FROM functional_parquet.complextypestbl c join functional.alltypes sort by id
> INFO  : Warning: Map Join MAPJOIN[9][bigTable=alltypes] in task 'Map 2' is a 
> cross product
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:c.id, 
> type:bigint, comment:null), FieldSchema(name:c.int_array, type:array, 
> comment:null), FieldSchema(name:c.int_array_array, type:array>, 
> comment:null), FieldSchema(name:c.int_map, type:map, 
> comment:null), FieldSchema(name:c.int_map_array, type:array>, 
> comment:null), FieldSchema(name:c.nested_struct, 
> type:struct,c:struct>>>,g:map,
>  comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=ubuntu_20200506012349_3c5cedc8-49d6-4e72-b4a5-e06cb82d1707); 
> Time taken: 0.063 seconds
> INFO  : Executing 
> command(queryId=ubuntu_20200506012349_3c5cedc8-49d6-4e72-b4a5-e06cb82d1707): 
> INSERT OVERWRITE TABLE functional_orc_def.complextypestbl_medium SELECT c.* 
> FROM functional_parquet.complextypestbl c join functional.alltypes sort by id
> INFO  : Query ID = ubuntu_20200506012349_3c5cedc8-49d6-4e72-b4a5-e06cb82d1707
> INFO  : Total jobs = 1
> INFO  : Launching Job 1 out of 1
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
> INFO  : Subscribed to counters: [] for queryId: 
> ubuntu_20200506012349_3c5cedc8-49d6-4e72-b4a5-e06cb82d1707
> INFO  : Session is already open
> INFO  : Dag name: INSERT OVERWRITE TABLE functional_orc_d...id (Stage-1)
> INFO  : Setting tez.task.scale.memory.reserve-fraction to 0.3001192092896
> INFO  : Status: Running (Executing on YARN cluster with App id 
> application_1588725973781_0033)
> ...
> Getting log thread is interrupted, since query is done!
> ERROR : Job Commit failed with exception 
> 'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)'
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1470)
>   at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
>   at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:620)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> 

[jira] [Commented] (IMPALA-9728) Data load failed with EOFException writing functional_orc_def.complextypestbl_medium

2020-10-09 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211388#comment-17211388
 ] 

Tim Armstrong commented on IMPALA-9728:
---

Hit again here

> Data load failed with EOFException writing 
> functional_orc_def.complextypestbl_medium
> 
>
> Key: IMPALA-9728
> URL: https://issues.apache.org/jira/browse/IMPALA-9728
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: flaky
> Attachments: 
> load-functional-query-exhaustive-hive-generated-orc-def-block.sql.log, 
> load-functional-query-exhaustive-hive-generated-orc-def-block.sql.log
>
>
> {noformat}
> INFO  : Compiling 
> command(queryId=ubuntu_20200506012349_3c5cedc8-49d6-4e72-b4a5-e06cb82d1707): 
> INSERT OVERWRITE TABLE functional_orc_def.complextypestbl_medium SELECT c.* 
> FROM functional_parquet.complextypestbl c join functional.alltypes sort by id
> INFO  : Warning: Map Join MAPJOIN[9][bigTable=alltypes] in task 'Map 2' is a 
> cross product
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:c.id, 
> type:bigint, comment:null), FieldSchema(name:c.int_array, type:array, 
> comment:null), FieldSchema(name:c.int_array_array, type:array>, 
> comment:null), FieldSchema(name:c.int_map, type:map, 
> comment:null), FieldSchema(name:c.int_map_array, type:array>, 
> comment:null), FieldSchema(name:c.nested_struct, 
> type:struct,c:struct>>>,g:map,
>  comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=ubuntu_20200506012349_3c5cedc8-49d6-4e72-b4a5-e06cb82d1707); 
> Time taken: 0.063 seconds
> INFO  : Executing 
> command(queryId=ubuntu_20200506012349_3c5cedc8-49d6-4e72-b4a5-e06cb82d1707): 
> INSERT OVERWRITE TABLE functional_orc_def.complextypestbl_medium SELECT c.* 
> FROM functional_parquet.complextypestbl c join functional.alltypes sort by id
> INFO  : Query ID = ubuntu_20200506012349_3c5cedc8-49d6-4e72-b4a5-e06cb82d1707
> INFO  : Total jobs = 1
> INFO  : Launching Job 1 out of 1
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
> INFO  : Subscribed to counters: [] for queryId: 
> ubuntu_20200506012349_3c5cedc8-49d6-4e72-b4a5-e06cb82d1707
> INFO  : Session is already open
> INFO  : Dag name: INSERT OVERWRITE TABLE functional_orc_d...id (Stage-1)
> INFO  : Setting tez.task.scale.memory.reserve-fraction to 0.3001192092896
> INFO  : Status: Running (Executing on YARN cluster with App id 
> application_1588725973781_0033)
> ...
> Getting log thread is interrupted, since query is done!
> ERROR : Job Commit failed with exception 
> 'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)'
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1470)
>   at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
>   at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:620)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at 

[jira] [Updated] (IMPALA-9728) Data load failed with EOFException writing functional_orc_def.complextypestbl_medium

2020-10-09 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-9728:
--
Attachment: 
load-functional-query-exhaustive-hive-generated-orc-def-block.sql.log

> Data load failed with EOFException writing 
> functional_orc_def.complextypestbl_medium
> 
>
> Key: IMPALA-9728
> URL: https://issues.apache.org/jira/browse/IMPALA-9728
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: flaky
> Attachments: 
> load-functional-query-exhaustive-hive-generated-orc-def-block.sql.log, 
> load-functional-query-exhaustive-hive-generated-orc-def-block.sql.log
>
>
> {noformat}
> INFO  : Compiling 
> command(queryId=ubuntu_20200506012349_3c5cedc8-49d6-4e72-b4a5-e06cb82d1707): 
> INSERT OVERWRITE TABLE functional_orc_def.complextypestbl_medium SELECT c.* 
> FROM functional_parquet.complextypestbl c join functional.alltypes sort by id
> INFO  : Warning: Map Join MAPJOIN[9][bigTable=alltypes] in task 'Map 2' is a 
> cross product
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:c.id, 
> type:bigint, comment:null), FieldSchema(name:c.int_array, type:array, 
> comment:null), FieldSchema(name:c.int_array_array, type:array>, 
> comment:null), FieldSchema(name:c.int_map, type:map, 
> comment:null), FieldSchema(name:c.int_map_array, type:array>, 
> comment:null), FieldSchema(name:c.nested_struct, 
> type:struct,c:struct>>>,g:map,
>  comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=ubuntu_20200506012349_3c5cedc8-49d6-4e72-b4a5-e06cb82d1707); 
> Time taken: 0.063 seconds
> INFO  : Executing 
> command(queryId=ubuntu_20200506012349_3c5cedc8-49d6-4e72-b4a5-e06cb82d1707): 
> INSERT OVERWRITE TABLE functional_orc_def.complextypestbl_medium SELECT c.* 
> FROM functional_parquet.complextypestbl c join functional.alltypes sort by id
> INFO  : Query ID = ubuntu_20200506012349_3c5cedc8-49d6-4e72-b4a5-e06cb82d1707
> INFO  : Total jobs = 1
> INFO  : Launching Job 1 out of 1
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
> INFO  : Subscribed to counters: [] for queryId: 
> ubuntu_20200506012349_3c5cedc8-49d6-4e72-b4a5-e06cb82d1707
> INFO  : Session is already open
> INFO  : Dag name: INSERT OVERWRITE TABLE functional_orc_d...id (Stage-1)
> INFO  : Setting tez.task.scale.memory.reserve-fraction to 0.3001192092896
> INFO  : Status: Running (Executing on YARN cluster with App id 
> application_1588725973781_0033)
> ...
> Getting log thread is interrupted, since query is done!
> ERROR : Job Commit failed with exception 
> 'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)'
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1470)
>   at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
>   at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:620)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:335)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at 

[jira] [Assigned] (IMPALA-9485) Enable file handle cache for EC files

2020-10-09 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned IMPALA-9485:


Assignee: Sahil Takiar

> Enable file handle cache for EC files
> -
>
> Key: IMPALA-9485
> URL: https://issues.apache.org/jira/browse/IMPALA-9485
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> Now that HDFS-14308 has been fixed, we can re-enable the file handle cache 
> for EC files.






[jira] [Resolved] (IMPALA-9485) Enable file handle cache for EC files

2020-10-09 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved IMPALA-9485.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Enable file handle cache for EC files
> -
>
> Key: IMPALA-9485
> URL: https://issues.apache.org/jira/browse/IMPALA-9485
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: Impala 4.0
>
>
> Now that HDFS-14308 has been fixed, we can re-enable the file handle cache 
> for EC files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10230) column stats num_nulls less than -1

2020-10-09 Thread logan zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

logan zheng updated IMPALA-10230:
-
Description: 
After upgrading Impala 3.2.0 (CDH 6.3.2) to ASF 3.4.0, running "incremental stats 
default.test partition(xx=)" fails with:
{noformat}
ERROR: TableLoadingException: Failed to load metadata for table: default.test
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0, 
avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
The table default.test already existed in Impala 3.2.0, had been running for a 
long time, and already had stats computed.

 

 
 

  was:
when update impala 3.2.0(CDH6.3.2 ) to asf3.4.0 ,after executing "increment 
stats default.test partition(xx=)":

ERROR: TableLoadingException: Failed to load metadata for table: default.test

CAUSED BY: IllegalStateException: ColumnStats\{avgSize_=13.0, 
avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}


> column stats num_nulls less than -1
> ---
>
> Key: IMPALA-10230
> URL: https://issues.apache.org/jira/browse/IMPALA-10230
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.4.0
>Reporter: logan zheng
>Priority: Critical
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> After upgrading Impala 3.2.0 (CDH 6.3.2) to ASF 3.4.0, running "incremental 
> stats default.test partition(xx=)" fails with:
> {noformat}
> ERROR: TableLoadingException: Failed to load metadata for table: default.test
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=13.0, 
> avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}{noformat}
> The table default.test already existed in Impala 3.2.0, had been running for 
> a long time, and already had stats computed.
>  
>  
>  
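The invalid numNulls_=-2 above is consistent with an aggregation bug around the
-1 "unknown" sentinel used in column stats. A minimal sketch (hypothetical, not
Impala's actual code; function names are invented), assuming -1 encodes
"statistic unknown" per partition:

```python
# Illustrative model: -1 conventionally encodes "num_nulls unknown" for a
# partition. Naively summing per-partition values treats the -1 sentinels as
# real counts, so two unknown partitions yield -2 -- the invalid state the
# error message reports.
UNKNOWN = -1

def aggregate_num_nulls_naive(per_partition):
    # Buggy: adds -1 sentinels as if they were real null counts.
    return sum(per_partition)

def aggregate_num_nulls_safe(per_partition):
    # If any partition's count is unknown, the aggregate is unknown.
    if any(n == UNKNOWN for n in per_partition):
        return UNKNOWN
    return sum(per_partition)

parts = [UNKNOWN, UNKNOWN]
print(aggregate_num_nulls_naive(parts))  # -2, the invalid value from the error
print(aggregate_num_nulls_safe(parts))   # -1, still "unknown"
```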






[jira] [Created] (IMPALA-10230) column stats num_nulls less than -1

2020-10-09 Thread logan zheng (Jira)
logan zheng created IMPALA-10230:


 Summary: column stats num_nulls less than -1
 Key: IMPALA-10230
 URL: https://issues.apache.org/jira/browse/IMPALA-10230
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Affects Versions: Impala 3.4.0
Reporter: logan zheng


After upgrading Impala 3.2.0 (CDH 6.3.2) to ASF 3.4.0, executing "incremental 
stats default.test partition(xx=)" fails with:

ERROR: TableLoadingException: Failed to load metadata for table: default.test

CAUSED BY: IllegalStateException: ColumnStats\{avgSize_=13.0, 
avgSerializedSize_=25.0, maxSize_=19, numDistinct_=12, numNulls_=-2}






[jira] [Resolved] (IMPALA-8304) Generate JUnitXML symptom for compilation/CMake failures

2020-10-09 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-8304.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Generate JUnitXML symptom for compilation/CMake failures
> 
>
> Key: IMPALA-8304
> URL: https://issues.apache.org/jira/browse/IMPALA-8304
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 3.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 4.0
>
>
> When compilation or another CMake command fails, it should generate JUnitXML 
> containing the output of the command that failed to allow faster triage. All 
> of the information is currently available in the Jenkins log, but due to the 
> parallel nature of the build, the failure can be buried in logging. Some 
> builds are extremely verbose (e.g. clang tidy) and can hide errors in 
> megabytes of logs.
> This should apply to both frontend and backend compilation.






[jira] [Commented] (IMPALA-8178) Tests failing with “Could not allocate memory while trying to increase reservation” on EC filesystem

2020-10-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211130#comment-17211130
 ] 

ASF subversion and git services commented on IMPALA-8178:
-

Commit 3382759664fe99317f27200b3e52a1e967f0a042 in impala's branch 
refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3382759 ]

IMPALA-9485: Enable file handle cache for EC files

This is essentially a revert of IMPALA-8178. HDFS-14308 added
CanUnbuffer support to the EC input stream APIs in the HDFS client lib.
This patch enables file handle caching for EC files.

Testing:
* Ran core tests against an EC build (ERASURE_CODING=true)

Change-Id: Ieb455eeed02a229a4559d3972dfdac7df32cdb99
Reviewed-on: http://gerrit.cloudera.org:8080/16567
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
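The CanUnbuffer dependency can be illustrated with a small model (Python,
purely illustrative, not Impala's implementation): a file handle is only worth
caching if its underlying stream can drop buffers and sockets while sitting
idle in the cache, which is what HDFS-14308 added for EC input streams.

```python
# Toy model of a file handle cache that only retains handles whose streams
# support unbuffer; before HDFS-14308, EC streams did not, so caching them
# would have pinned buffers and sockets for idle handles.
class FileHandle:
    def __init__(self, path, supports_unbuffer):
        self.path = path
        self.supports_unbuffer = supports_unbuffer
        self.buffered = True

    def unbuffer(self):
        if not self.supports_unbuffer:
            raise NotImplementedError("stream does not implement CanUnbuffer")
        self.buffered = False  # release buffers/sockets while idle

class FileHandleCache:
    def __init__(self):
        self._cache = {}

    def release(self, handle):
        # Cache the handle only if it can be unbuffered while idle;
        # otherwise the caller should simply close it.
        if handle.supports_unbuffer:
            handle.unbuffer()
            self._cache[handle.path] = handle
            return True
        return False

cache = FileHandleCache()
assert cache.release(FileHandle("/data/replicated.parq", True))      # cached
assert not cache.release(FileHandle("/data/ec.parq", False))         # closed
```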


> Tests failing with “Could not allocate memory while trying to increase 
> reservation” on EC filesystem
> 
>
> Key: IMPALA-8178
> URL: https://issues.apache.org/jira/browse/IMPALA-8178
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Andrew Sherman
>Assignee: Joe McDonnell
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.2.0
>
>
> In tests run against an Erasure Coding filesystem, multiple tests failed with 
> memory allocation errors.
> In total 10 tests failed:
>  * query_test.test_scanners.TestParquet.test_decimal_encodings
>  * query_test.test_scanners.TestTpchScanRangeLengths.test_tpch_scan_ranges
>  * query_test.test_exprs.TestExprs.test_exprs [enable_expr_rewrites: 0]
>  * query_test.test_exprs.TestExprs.test_exprs [enable_expr_rewrites: 1]
>  * query_test.test_hbase_queries.TestHBaseQueries.test_hbase_scan_node
>  * query_test.test_scanners.TestParquet.test_def_levels
>  * 
> query_test.test_scanners.TestTextSplitDelimiters.test_text_split_across_buffers_delimiterquery_test.test_hbase_queries.TestHBaseQueries.test_hbase_filters
>  * query_test.test_hbase_queries.TestHBaseQueries.test_hbase_inline_views
>  * query_test.test_hbase_queries.TestHBaseQueries.test_hbase_top_n
> The first failure looked like this on the client side:
> {quote}
> F 
> query_test/test_scanners.py::TestParquet::()::test_decimal_encodings[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'debug_action': 
> '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@0.5', 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]
>  query_test/test_scanners.py:717: in test_decimal_encodings
>  self.run_test_case('QueryTest/parquet-decimal-formats', vector, 
> unique_database)
>  common/impala_test_suite.py:472: in run_test_case
>  result = self.__execute_query(target_impalad_client, query, user=user)
>  common/impala_test_suite.py:699: in __execute_query
>  return impalad_client.execute(query, user=user)
>  common/impala_connection.py:174: in execute
>  return self.__beeswax_client.execute(sql_stmt, user=user)
>  beeswax/impala_beeswax.py:183: in execute
>  handle = self.__execute_query(query_string.strip(), user=user)
>  beeswax/impala_beeswax.py:360: in __execute_query
>  self.wait_for_finished(handle)
>  beeswax/impala_beeswax.py:381: in wait_for_finished
>  raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
>  E   ImpalaBeeswaxException: ImpalaBeeswaxException:
>  EQuery aborted:ExecQueryFInstances rpc 
> query_id=6e44c3c949a31be2:f973c7ff failed: Failed to get minimum 
> memory reservation of 8.00 KB on daemon xxx.com:22001 for query 
> 6e44c3c949a31be2:f973c7ff due to following error: Memory limit 
> exceeded: Could not allocate memory while trying to increase reservation.
>  E   Query(6e44c3c949a31be2:f973c7ff) could not allocate 8.00 KB 
> without exceeding limit.
>  E   Error occurred on backend xxx.com:22001
>  E   Memory left in process limit: 1.19 GB
>  E   Query(6e44c3c949a31be2:f973c7ff): Reservation=0 
> ReservationLimit=9.60 GB OtherMemory=0 Total=0 Peak=0
>  E   Memory is likely oversubscribed. Reducing query concurrency or 
> configuring admission control may help avoid this error.
> {quote}
> On the server side log:
> {quote}
> I0207 18:25:19.329311  5562 impala-server.cc:1063] 
> 6e44c3c949a31be2:f973c7ff] Registered query 
> query_id=6e44c3c949a31be2:f973c7ff 
> session_id=93497065f69e9d01:8a3bd06faff3da5
> I0207 18:25:19.329434  5562 Frontend.java:1242] 
> 6e44c3c949a31be2:f973c7ff] Analyzing query: select score from 
> decimal_stored_as_int32
> I0207 18:25:19.329583  5562 FeSupport.java:285] 
> 

[jira] [Commented] (IMPALA-9485) Enable file handle cache for EC files

2020-10-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211129#comment-17211129
 ] 

ASF subversion and git services commented on IMPALA-9485:
-

Commit 3382759664fe99317f27200b3e52a1e967f0a042 in impala's branch 
refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3382759 ]

IMPALA-9485: Enable file handle cache for EC files

This is essentially a revert of IMPALA-8178. HDFS-14308 added
CanUnbuffer support to the EC input stream APIs in the HDFS client lib.
This patch enables file handle caching for EC files.

Testing:
* Ran core tests against an EC build (ERASURE_CODING=true)

Change-Id: Ieb455eeed02a229a4559d3972dfdac7df32cdb99
Reviewed-on: http://gerrit.cloudera.org:8080/16567
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Enable file handle cache for EC files
> -
>
> Key: IMPALA-9485
> URL: https://issues.apache.org/jira/browse/IMPALA-9485
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Priority: Major
>
> Now that HDFS-14308 has been fixed, we can re-enable the file handle cache 
> for EC files.






[jira] [Commented] (IMPALA-8304) Generate JUnitXML symptom for compilation/CMake failures

2020-10-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211131#comment-17211131
 ] 

ASF subversion and git services commented on IMPALA-8304:
-

Commit 1f3160b4c07c8a5a146067222e6591d44bfa3c7d in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1f3160b ]

IMPALA-8304: Generate JUnitXML if a command run by CMake fails

This wraps each command executed by CMake with a wrapper that
generates a JUnitXML file if the command fails. If the command
succeeds, the wrapper does nothing. The wrapper applies to C++
compilation, linking, and custom shell commands (such as
building the frontend via maven). It does not apply to failures
coming from CMake itself. It can be disabled by setting
DISABLE_CMAKE_JUNITXML.

The command output can include Unicode (e.g. smart quotes for
g++), so this also updates generate_junitxml.py to handle
Unicode.

The wrapper interacts poorly with add_custom_command/add_custom_target
CMake commands that use 'cd directory && do_something', so this
switches those locations (in /docker) to use CMake's WORKING_DIRECTORY.

Testing:
 - Verified it does not impact a successful build (including with
   ccache and/or distcc).
 - Verified it generates JUnitXML for C++ and Java compilation
   failures.
 - Verified it doesn't use the wrapper when DISABLE_CMAKE_JUNITXML
   is set.

Change-Id: If71f2faf3ab5052b56b38f1b291fee53c390ce23
Reviewed-on: http://gerrit.cloudera.org:8080/12668
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 
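The wrapper idea can be sketched in a few lines (hypothetical sketch; the
function name `run_with_junitxml`, the report layout, and the output naming are
assumptions, not Impala's actual scripts):

```python
# Wrap a build command so a failure leaves behind a minimal JUnitXML report
# containing the command's captured output; a success adds nothing.
import os
import subprocess
from xml.sax.saxutils import escape

def run_with_junitxml(cmd, out_dir="/tmp"):
    """Run cmd (a list); on failure write junitxml_<name>.xml, return exit code."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        name = os.path.basename(cmd[0])
        report = (
            '<testsuite name="build" tests="1" failures="1">\n'
            '  <testcase name="{0}"><failure>{1}</failure></testcase>\n'
            "</testsuite>\n"
        ).format(name, escape(proc.stdout + proc.stderr))
        with open(os.path.join(out_dir, "junitxml_%s.xml" % name), "w") as f:
            f.write(report)
    return proc.returncode
```

A build system would prefix each compile/link/custom command with such a
wrapper; the triage job then only has to collect `*.xml` files instead of
digging through megabytes of interleaved parallel-build logs.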


> Generate JUnitXML symptom for compilation/CMake failures
> 
>
> Key: IMPALA-8304
> URL: https://issues.apache.org/jira/browse/IMPALA-8304
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 3.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
>
> When compilation or another CMake command fails, it should generate JUnitXML 
> containing the output of the command that failed to allow faster triage. All 
> of the information is currently available in the Jenkins log, but due to the 
> parallel nature of the build, the failure can be buried in logging. Some 
> builds are extremely verbose (e.g. clang tidy) and can hide errors in 
> megabytes of logs.
> This should apply to both frontend and backend compilation.






[jira] [Resolved] (IMPALA-10171) Create query options for convert_legacy_hive_parquet_utc_timestamps and use_local_tz_for_unix_timestamp_conversions

2020-10-09 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer resolved IMPALA-10171.
--
Resolution: Implemented

> Create query options for convert_legacy_hive_parquet_utc_timestamps and 
> use_local_tz_for_unix_timestamp_conversions
> ---
>
> Key: IMPALA-10171
> URL: https://issues.apache.org/jira/browse/IMPALA-10171
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Csaba Ringhofer
>Priority: Major
>
> convert_legacy_hive_parquet_utc_timestamps and 
> use_local_tz_for_unix_timestamp_conversions are flags that can be set on all 
> coordinators and executors. Possible inconsistencies could be avoided by 
> always using the flag's value on the coordinator, or by adding query options 
> for these settings.






[jira] [Work started] (IMPALA-10224) Add startup flag not to expose debug web url via PingImpalaService/PingImpalaHS2Service RPC

2020-10-09 Thread Attila Jeges (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10224 started by Attila Jeges.
-
> Add startup flag not to expose debug web url via 
> PingImpalaService/PingImpalaHS2Service RPC
> ---
>
> Key: IMPALA-10224
> URL: https://issues.apache.org/jira/browse/IMPALA-10224
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: Attila Jeges
>Assignee: Attila Jeges
>Priority: Major
>
> PingImpalaService/PingImpalaHS2Service RPC calls expose the coordinator's 
> debug web URL to clients such as impala-shell. Since end-users will not 
> necessarily have access to the debug web UI, we should have a server option 
> to send an empty string instead of the real URL, signalling to the client 
> that the debug web UI is not available.






[jira] [Resolved] (IMPALA-9952) Invalid offset index in Parquet file

2020-10-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-9952.
---
Fix Version/s: Impala 4.0
   Resolution: Fixed

>  Invalid offset index in Parquet file
> -
>
> Key: IMPALA-9952
> URL: https://issues.apache.org/jira/browse/IMPALA-9952
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: guojingfeng
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: Parquet
> Fix For: Impala 4.0
>
>
> When reading a Parquet file in Impala 3.4, the following error was encountered:
> {code:java}
> I0714 16:11:48.307806 1075820 runtime-state.cc:207] 
> 8c43203adb2d4fc8:0478df9b018b] Error from query 
> 8c43203adb2d4fc8:0478df9b: Invalid offset index in Parquet file 
> hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq.
> I0714 16:11:48.834901 1075838 status.cc:126] 
> 8c43203adb2d4fc8:0478df9b02c0] Invalid offset index in Parquet file 
> hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq.
> @   0xbf4ef9
> @  0x1748c41
> @  0x174e170
> @  0x1750e58
> @  0x17519f0
> @  0x1748559
> @  0x1510b41
> @  0x1512c8f
> @  0x137488a
> @  0x1375759
> @  0x1b48a19
> @ 0x7f34509f5e24
> @ 0x7f344d5ed35c
> I0714 16:11:48.835763 1075838 runtime-state.cc:207] 
> 8c43203adb2d4fc8:0478df9b02c0] Error from query 
> 8c43203adb2d4fc8:0478df9b: Invalid offset index in Parquet file 
> hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq.
> I0714 16:11:48.893784 1075820 status.cc:126] 
> 8c43203adb2d4fc8:0478df9b018b] Top level rows aren't in sync during page 
> filtering in file 
> hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq.
> @   0xbf4ef9
> @  0x1749104
> @  0x17494cc
> @  0x1751aee
> @  0x1748559
> @  0x1510b41
> @  0x1512c8f
> @  0x137488a
> @  0x1375759
> @  0x1b48a19
> @ 0x7f34509f5e24
> @ 0x7f344d5ed35c
> {code}
>  Corresponding source code:
> {code:java}
> Status HdfsParquetScanner::CheckPageFiltering() {
>   if (candidate_ranges_.empty() || scalar_readers_.empty()) return Status::OK();
>   int64_t current_row = scalar_readers_[0]->LastProcessedRow();
>   for (int i = 1; i < scalar_readers_.size(); ++i) {
> if (current_row != scalar_readers_[i]->LastProcessedRow()) {
>   DCHECK(false);
>   return Status(Substitute(
>   "Top level rows aren't in sync during page filtering in file $0.", 
> filename()));
> }
>   }
>   return Status::OK();
> }
> {code}






[jira] [Commented] (IMPALA-9952) Invalid offset index in Parquet file

2020-10-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210740#comment-17210740
 ] 

Zoltán Borók-Nagy commented on IMPALA-9952:
---

Thanks for the verification, [~guojingfeng]! I think I'm closing this Jira as 
the patch resolves the crash mentioned in the description.

We can use IMPALA-10186 to track the write side problem.

>  Invalid offset index in Parquet file
> -
>
> Key: IMPALA-9952
> URL: https://issues.apache.org/jira/browse/IMPALA-9952
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.4.0
>Reporter: guojingfeng
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: Parquet
>
> When reading a Parquet file in Impala 3.4, the following error was encountered:
> {code:java}
> I0714 16:11:48.307806 1075820 runtime-state.cc:207] 
> 8c43203adb2d4fc8:0478df9b018b] Error from query 
> 8c43203adb2d4fc8:0478df9b: Invalid offset index in Parquet file 
> hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq.
> I0714 16:11:48.834901 1075838 status.cc:126] 
> 8c43203adb2d4fc8:0478df9b02c0] Invalid offset index in Parquet file 
> hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq.
> @   0xbf4ef9
> @  0x1748c41
> @  0x174e170
> @  0x1750e58
> @  0x17519f0
> @  0x1748559
> @  0x1510b41
> @  0x1512c8f
> @  0x137488a
> @  0x1375759
> @  0x1b48a19
> @ 0x7f34509f5e24
> @ 0x7f344d5ed35c
> I0714 16:11:48.835763 1075838 runtime-state.cc:207] 
> 8c43203adb2d4fc8:0478df9b02c0] Error from query 
> 8c43203adb2d4fc8:0478df9b: Invalid offset index in Parquet file 
> hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq.
> I0714 16:11:48.893784 1075820 status.cc:126] 
> 8c43203adb2d4fc8:0478df9b018b] Top level rows aren't in sync during page 
> filtering in file 
> hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq.
> @   0xbf4ef9
> @  0x1749104
> @  0x17494cc
> @  0x1751aee
> @  0x1748559
> @  0x1510b41
> @  0x1512c8f
> @  0x137488a
> @  0x1375759
> @  0x1b48a19
> @ 0x7f34509f5e24
> @ 0x7f344d5ed35c
> {code}
>  Corresponding source code:
> {code:java}
> Status HdfsParquetScanner::CheckPageFiltering() {
>   if (candidate_ranges_.empty() || scalar_readers_.empty()) return Status::OK();
>   int64_t current_row = scalar_readers_[0]->LastProcessedRow();
>   for (int i = 1; i < scalar_readers_.size(); ++i) {
> if (current_row != scalar_readers_[i]->LastProcessedRow()) {
>   DCHECK(false);
>   return Status(Substitute(
>   "Top level rows aren't in sync during page filtering in file $0.", 
> filename()));
> }
>   }
>   return Status::OK();
> }
> {code}






[jira] [Resolved] (IMPALA-10164) Support HadoopCatalog for Iceberg table

2020-10-09 Thread WangSheng (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng resolved IMPALA-10164.

Resolution: Fixed

> Support HadoopCatalog for Iceberg table
> ---
>
> Key: IMPALA-10164
> URL: https://issues.apache.org/jira/browse/IMPALA-10164
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> Impala currently only supports the HadoopTables API for creating Iceberg 
> tables, which is not enough, so we are preparing to support HadoopCatalog. 
> The main design is to add a new table property named 'iceberg.catalog' whose 
> default value is 'hadoop.tables'; the value 'hadoop.catalog' selects the 
> HadoopCatalog API. We may also support 'hive.catalog' in the future.
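As a sketch of the property described above (hypothetical syntax; the final
DDL, including the 'iceberg.catalog_location' property used here, is an
assumption and may differ from what was ultimately merged):

```sql
-- Create an Iceberg table managed through the HadoopCatalog API rather than
-- the default HadoopTables ('hadoop.tables') API.
CREATE TABLE ice_tbl (id INT, name STRING)
STORED AS ICEBERG
TBLPROPERTIES (
  'iceberg.catalog' = 'hadoop.catalog',
  'iceberg.catalog_location' = '/warehouse/iceberg_catalog'  -- assumed property
);
```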






[jira] [Work started] (IMPALA-10159) Support ORC file format for Iceberg table

2020-10-09 Thread WangSheng (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10159 started by WangSheng.
--
> Support ORC file format for Iceberg table
> -
>
> Key: IMPALA-10159
> URL: https://issues.apache.org/jira/browse/IMPALA-10159
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> Impala can now query Iceberg tables stored in the PARQUET file format. Since 
> some groundwork was already done in IMPALA-9741, we can continue the ORC 
> file format support work in this Jira.






[jira] [Resolved] (IMPALA-9741) Support query iceberg table by impala

2020-10-09 Thread WangSheng (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng resolved IMPALA-9741.
---
Resolution: Fixed

> Support query iceberg table by impala
> -
>
> Key: IMPALA-9741
> URL: https://issues.apache.org/jira/browse/IMPALA-9741
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>  Labels: impala-iceberg
> Attachments: select-iceberg.jpg
>
>
> Since we submitted a patch in IMPALA-9688 to support creating Iceberg tables 
> with Impala, we are now preparing to implement querying Iceberg tables with 
> Impala. We first need to read the Impala and Iceberg code in depth to 
> determine how to do this.






[jira] [Assigned] (IMPALA-9967) Scan orc failed when table contains timestamp column

2020-10-09 Thread WangSheng (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng reassigned IMPALA-9967:
-

Assignee: (was: WangSheng)

> Scan orc failed when table contains timestamp column
> 
>
> Key: IMPALA-9967
> URL: https://issues.apache.org/jira/browse/IMPALA-9967
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
> Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc, 
> 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc
>
>
> Recently, while testing Impala queries against an ORC table, I found that 
> scanning fails when the table contains a timestamp column. Here is the 
> exception:
> {code:java}
> I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] 
> Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> @  0x1c9f753  impala::Status::Status()
> @  0x27aa049  impala::HdfsOrcScanner::ProcessFileTail()
> @  0x27a7fb3  impala::HdfsOrcScanner::Open()
> @  0x27365fe  
> impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
> @  0x28cb379  impala::HdfsScanNode::ProcessSplit()
> @  0x28caa7d  impala::HdfsScanNode::ScannerThread()
> @  0x28c9de5  
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @  0x28cc19e  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x205  boost::function0<>::operator()()
> @  0x2675d93  impala::Thread::SuperviseThread()
> @  0x267dd30  boost::_bi::list5<>::operator()<>()
> @  0x267dc54  boost::_bi::bind_t<>::operator()()
> @  0x267dc15  boost::detail::thread_data<>::run()
> @  0x3e3c3c1  thread_proxy
> @ 0x7f32360336b9  start_thread
> @ 0x7f3232bfe41c  clone
> I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 
> 68436a6e0883be84:53877f720002] Error preparing scanner for scan range 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582).
>  Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> {code}
> When I remove the timestamp column from the table and regenerate the test 
> data, the query succeeds. Note that my test data was generated by Spark.






[jira] [Work stopped] (IMPALA-9967) Scan orc failed when table contains timestamp column

2020-10-09 Thread WangSheng (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9967 stopped by WangSheng.
-
> Scan orc failed when table contains timestamp column
> 
>
> Key: IMPALA-9967
> URL: https://issues.apache.org/jira/browse/IMPALA-9967
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
> Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc, 
> 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc
>
>
> Recently, while testing Impala queries against an ORC table, I found that 
> scanning fails when the table contains a timestamp column. Here is the 
> exception:
> {code:java}
> I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] 
> Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> @  0x1c9f753  impala::Status::Status()
> @  0x27aa049  impala::HdfsOrcScanner::ProcessFileTail()
> @  0x27a7fb3  impala::HdfsOrcScanner::Open()
> @  0x27365fe  
> impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
> @  0x28cb379  impala::HdfsScanNode::ProcessSplit()
> @  0x28caa7d  impala::HdfsScanNode::ScannerThread()
> @  0x28c9de5  
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @  0x28cc19e  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x205  boost::function0<>::operator()()
> @  0x2675d93  impala::Thread::SuperviseThread()
> @  0x267dd30  boost::_bi::list5<>::operator()<>()
> @  0x267dc54  boost::_bi::bind_t<>::operator()()
> @  0x267dc15  boost::detail::thread_data<>::run()
> @  0x3e3c3c1  thread_proxy
> @ 0x7f32360336b9  start_thread
> @ 0x7f3232bfe41c  clone
> I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 
> 68436a6e0883be84:53877f720002] Error preparing scanner for scan range 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582).
>  Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> {code}
> When I remove the timestamp column from the table and regenerate the test 
> data, the query succeeds. Note that my test data was generated by Spark.






[jira] [Work started] (IMPALA-9967) Scan orc failed when table contains timestamp column

2020-10-09 Thread WangSheng (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9967 started by WangSheng.
-
> Scan orc failed when table contains timestamp column
> 
>
> Key: IMPALA-9967
> URL: https://issues.apache.org/jira/browse/IMPALA-9967
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.0
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
> Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-0.orc, 
> 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-0.orc
>
>
> Recently, while testing Impala queries against an ORC table, I found that 
> scanning fails when the table contains a timestamp column. Here is the 
> exception:
> {code:java}
> I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f720002] 
> Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> @  0x1c9f753  impala::Status::Status()
> @  0x27aa049  impala::HdfsOrcScanner::ProcessFileTail()
> @  0x27a7fb3  impala::HdfsOrcScanner::Open()
> @  0x27365fe  
> impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
> @  0x28cb379  impala::HdfsScanNode::ProcessSplit()
> @  0x28caa7d  impala::HdfsScanNode::ScannerThread()
> @  0x28c9de5  
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @  0x28cc19e  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x205  boost::function0<>::operator()()
> @  0x2675d93  impala::Thread::SuperviseThread()
> @  0x267dd30  boost::_bi::list5<>::operator()<>()
> @  0x267dc54  boost::_bi::bind_t<>::operator()()
> @  0x267dc15  boost::detail::thread_data<>::run()
> @  0x3e3c3c1  thread_proxy
> @ 0x7f32360336b9  start_thread
> @ 0x7f3232bfe41c  clone
> I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 
> 68436a6e0883be84:53877f720002] Error preparing scanner for scan range 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc(0:582).
>  Encountered parse error in tail of ORC file 
> hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-0.orc:
>  Unknown type kind
> {code}
> When I remove the timestamp column from the table and regenerate the test data, 
> the query succeeds. For what it's worth, my test data was generated by Spark.






[jira] [Resolved] (IMPALA-10221) Use 'iceberg.file_format' to replace 'iceberg_file_format'

2020-10-09 Thread WangSheng (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng resolved IMPALA-10221.

Resolution: Fixed

> Use 'iceberg.file_format' to replace 'iceberg_file_format'
> --
>
> Key: IMPALA-10221
> URL: https://issues.apache.org/jira/browse/IMPALA-10221
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> IMPALA-10164 introduced several new table properties, such as 
> 'iceberg.catalog'. To keep these property names consistent, we rename 
> 'iceberg_file_format' to 'iceberg.file_format'.
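
A minimal sketch of the renamed property in use (hedged: the table name and the 'parquet' value are illustrative assumptions, and the exact set of supported values depends on the Impala build):

{code:sql}
-- Hypothetical example; 'ice_fmt_tbl' and the property value are illustrative.
CREATE TABLE ice_fmt_tbl (id INT)
STORED AS ICEBERG
TBLPROPERTIES ('iceberg.file_format' = 'parquet');
{code}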






[jira] [Resolved] (IMPALA-9688) Support create iceberg table by impala

2020-10-09 Thread WangSheng (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangSheng resolved IMPALA-9688.
---
Resolution: Fixed

> Support create iceberg table by impala
> --
>
> Key: IMPALA-9688
> URL: https://issues.apache.org/jira/browse/IMPALA-9688
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>  Labels: impala-iceberg
>
> This sub-task implements creating Iceberg tables through Impala.
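
A minimal sketch of what such a CREATE TABLE statement might look like (hedged: the table name, columns, and catalog property value are assumptions for illustration, not taken from the ticket):

{code:sql}
-- Hypothetical example; names and the 'iceberg.catalog' value are illustrative.
CREATE TABLE ice_tbl (
  id INT,
  event_ts TIMESTAMP
)
STORED AS ICEBERG
TBLPROPERTIES ('iceberg.catalog' = 'hadoop.tables');
{code}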


