[jira] [Assigned] (IMPALA-10898) Runtime IN-list filters for ORC tables

2021-09-14 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-10898:
---

Assignee: Quanlong Huang

> Runtime IN-list filters for ORC tables
> --
>
> Key: IMPALA-10898
> URL: https://issues.apache.org/jira/browse/IMPALA-10898
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> Currently Impala has two kinds of runtime filters: bloom filters and min-max 
> filters. Unfortunately, neither can leverage the bloom filters in ORC files; 
> only EQUALS and IN-list predicates can use them to skip unrelated ORC 
> RowGroups.
> This JIRA aims to add runtime IN-list filters for a small build side (e.g. 
> #rows <= 1024) of a hash join.
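
A minimal sketch of the idea in the description, not Impala's implementation: collect the distinct join keys of the build side and give up once there are more than a cap. The names `build_in_list_filter`, `row_group_may_match` and `MAX_IN_LIST_ENTRIES` are hypothetical, and the exact membership test only stands in for the ORC bloom-filter lookup.

```python
from typing import Hashable, Iterable, Optional, Set

# Hypothetical cap, mirroring the "#rows <= 1024" example in the description.
MAX_IN_LIST_ENTRIES = 1024


def build_in_list_filter(build_keys: Iterable[Hashable],
                         max_entries: int = MAX_IN_LIST_ENTRIES) -> Optional[Set[Hashable]]:
    """Collect distinct join keys; return None if there are too many for an IN-list."""
    values: Set[Hashable] = set()
    for key in build_keys:
        if key is None:
            continue  # a NULL key never matches in an equi-join
        values.add(key)
        if len(values) > max_entries:
            return None  # caller would fall back to the other filter kinds
    return values


def row_group_may_match(in_list: Set[Hashable],
                        row_group_values: Iterable[Hashable]) -> bool:
    """Stand-in for a bloom-filter probe: exact membership, so no false positives."""
    return any(value in in_list for value in row_group_values)
```

With such a filter, a scanner could skip any row group for which `row_group_may_match` returns False.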



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10882) Push down Min-Max predicates of CHAR/VARCHAR to ORC reader

2021-09-14 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10882:

Priority: Critical  (was: Major)

> Push down Min-Max predicates of CHAR/VARCHAR to ORC reader
> --
>
> Key: IMPALA-10882
> URL: https://issues.apache.org/jira/browse/IMPALA-10882
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Quanlong Huang
>Priority: Critical
>
> This is a follow-up to IMPALA-6505. Due to padding/truncation issues of 
> CHAR/VARCHAR types when comparing to string literals, we might get wrong 
> results if we simply push down CHAR/VARCHAR predicates into the ORC reader, 
> especially when the file schema differs from the table schema.
> See more discussion in the Gerrit review: 
> https://gerrit.cloudera.org/c/15403/1/be/src/exec/hdfs-orc-scanner.cc#889
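
A hypothetical illustration of how truncation can lead to wrong results when the file schema differs from the table schema. It assumes a file written with a wider string column and read through a VARCHAR(3) table column; the helper names are made up and this models neither Impala nor the ORC library.

```python
def truncate_varchar(value: str, length: int) -> str:
    """Model reading a value through a VARCHAR(length) table column."""
    return value[:length]


def naive_pushdown_keeps_row_group(stat_min: str, stat_max: str, literal: str) -> bool:
    """Naive check of `col = literal` against file-level min/max statistics."""
    return stat_min <= literal <= stat_max


# Values as written to the file under the wider file schema (assumed).
file_values = ["abcde", "abcdz"]
stat_min, stat_max = min(file_values), max(file_values)

# Values as the query sees them through the VARCHAR(3) table schema.
table_values = [truncate_varchar(v, 3) for v in file_values]

literal = "abc"
matches_after_read = literal in table_values
kept_by_pushdown = naive_pushdown_keeps_row_group(stat_min, stat_max, literal)
```

Here the predicate matches rows after truncation, yet the naive statistics check would skip the row group because "abc" sorts before the file-level minimum "abcde".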






[jira] [Updated] (IMPALA-10878) Pushdown runtime min-max filters to the ORC reader

2021-09-14 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10878:

Priority: Critical  (was: Major)

> Pushdown runtime min-max filters to the ORC reader
> --
>
> Key: IMPALA-10878
> URL: https://issues.apache.org/jira/browse/IMPALA-10878
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> The ORC reader has supported predicate pushdown since 1.7.0. We have already 
> extended runtime min-max filters to Parquet tables. We can also generate them 
> for ORC tables and push them down into the ORC reader.
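
As a rough sketch of what pushing a min-max filter down buys, assuming each row group exposes the min and max of the filtered column: a row group can be skipped when its range does not overlap the filter range. `RowGroupStats` and `select_row_groups` are hypothetical names, not ORC or Impala APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group statistics for one column."""
    min_value: int
    max_value: int


def overlaps(stats: RowGroupStats, filter_min: int, filter_max: int) -> bool:
    """True if [min_value, max_value] intersects [filter_min, filter_max]."""
    return stats.min_value <= filter_max and stats.max_value >= filter_min


def select_row_groups(all_stats: List[RowGroupStats],
                      filter_min: int, filter_max: int) -> List[int]:
    """Return the indexes of row groups that cannot be skipped."""
    return [i for i, s in enumerate(all_stats) if overlaps(s, filter_min, filter_max)]
```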






[jira] [Updated] (IMPALA-10872) Add a snapshot version of ORC-1.7 to native-toolchain

2021-09-14 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10872:

Priority: Critical  (was: Major)

> Add a snapshot version of ORC-1.7 to native-toolchain
> -
>
> Key: IMPALA-10872
> URL: https://issues.apache.org/jira/browse/IMPALA-10872
> Project: IMPALA
>  Issue Type: Task
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> ORC 1.7 has not been released yet. We need some features of it like predicate 
> push down (ORC-751) to improve our orc-scanner.
> The current top commit of the 1.7 branch is 
> 36349d535089412b58f99c72af9bf7dcf7444aee. It contains all the patches we 
> applied on orc-1.6.2: 
> [https://github.com/cloudera/native-toolchain/tree/master/source/orc/orc-1.6.2-patches]
> New Features/Improvements like ORC-751, ORC-614 are also in it. Let's add ORC 
> 36349d535089412b58f99c72af9bf7dcf7444aee into our native-toolchain to unblock 
> WIP patches, e.g. [https://gerrit.cloudera.org/c/15403/]






[jira] [Updated] (IMPALA-10898) Runtime IN-list filters for ORC tables

2021-09-14 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10898:

Priority: Critical  (was: Major)

> Runtime IN-list filters for ORC tables
> --
>
> Key: IMPALA-10898
> URL: https://issues.apache.org/jira/browse/IMPALA-10898
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Quanlong Huang
>Priority: Critical
>
> Currently Impala has two kinds of runtime filters: bloom filters and min-max 
> filters. Unfortunately, neither can leverage the bloom filters in ORC files; 
> only EQUALS and IN-list predicates can use them to skip unrelated ORC 
> RowGroups.
> This JIRA aims to add runtime IN-list filters for a small build side (e.g. 
> #rows <= 1024) of a hash join.






[jira] [Updated] (IMPALA-10873) Push down EQUALS, IS NULL and IN-list predicate to ORC reader

2021-09-14 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10873:

Priority: Critical  (was: Major)

> Push down EQUALS, IS NULL and IN-list predicate to ORC reader
> -
>
> Key: IMPALA-10873
> URL: https://issues.apache.org/jira/browse/IMPALA-10873
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> IMPALA-6505 pushes down the min-max predicates into the ORC reader. Since 
> ORC's SearchArguments also support IN-list predicates, we can consider 
> pushing down IN-list and not IN-list predicates into it.






[jira] [Updated] (IMPALA-6636) Use async IO in ORC scanner

2021-09-14 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-6636:
---
Priority: Critical  (was: Major)

> Use async IO in ORC scanner
> ---
>
> Key: IMPALA-6636
> URL: https://issues.apache.org/jira/browse/IMPALA-6636
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Csaba Ringhofer
>Priority: Critical
>
> Although ORC-262 has made no progress, we can still prefetch data and let the 
> ORC lib read from an in-memory InputStream.
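
A minimal sketch of the prefetch idea, assuming the byte range to read is known in advance: fetch it once into memory, then serve positional reads from the buffer. `PrefetchedStream` and `read_at` are hypothetical names and do not correspond to the ORC library's InputStream interface.

```python
import io


class PrefetchedStream:
    """Serves positional reads from a byte range fetched ahead of time."""

    def __init__(self, source: io.BufferedIOBase, offset: int, length: int):
        # One up-front read; a real scanner would issue this asynchronously.
        source.seek(offset)
        self._base = offset
        self._buf = source.read(length)

    def read_at(self, offset: int, length: int) -> bytes:
        """Return `length` bytes starting at file offset `offset`."""
        start = offset - self._base
        if start < 0 or start + length > len(self._buf):
            raise ValueError("requested range was not prefetched")
        return self._buf[start:start + length]
```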






[jira] [Updated] (IMPALA-10915) Pushdown Timestamp predicates to ORC reader

2021-09-14 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10915:

Priority: Critical  (was: Major)

> Pushdown Timestamp predicates to ORC reader
> ---
>
> Key: IMPALA-10915
> URL: https://issues.apache.org/jira/browse/IMPALA-10915
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> This is a follow-up task of predicate pushdown to the ORC reader 
> (IMPALA-6505). In IMPALA-6505 we skip pushing down predicates on Timestamp 
> columns. As [~csringhofer] pointed out in the review: 
> https://gerrit.cloudera.org/c/17815, we need to deal with corner cases like 
> timezone conversion and timestamps before 1970-01-01. Also add more tests for 
> coverage.
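
One kind of corner case with timestamps before 1970-01-01 can be sketched as follows; this is only an illustration with hypothetical helper names, not the Impala or ORC conversion code. Splitting a negative epoch-millisecond value with truncating division yields a negative sub-second part, while flooring division keeps it non-negative. Both describe the same instant, but code that assumes one convention will misread the other.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_truncating(millis: int) -> Tuple[int, int]:
    """Split into (seconds, millis) rounding toward zero, as C-style division does."""
    seconds = -((-millis) // 1000) if millis < 0 else millis // 1000
    return seconds, millis - seconds * 1000


def split_flooring(millis: int) -> Tuple[int, int]:
    """Split into (seconds, millis) rounding toward negative infinity."""
    return divmod(millis, 1000)


def rebuild(seconds: int, sub_millis: int) -> datetime:
    """Recombine the two parts into a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds, milliseconds=sub_millis)
```

For 1.5 seconds before the epoch, `split_truncating(-1500)` gives `(-1, -500)` and `split_flooring(-1500)` gives `(-2, 500)`; for non-negative inputs the two agree.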






[jira] [Updated] (IMPALA-6505) Min-Max predicate push down in ORC scanner

2021-09-14 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-6505:
---
Priority: Critical  (was: Major)

> Min-Max predicate push down in ORC scanner
> --
>
> Key: IMPALA-6505
> URL: https://issues.apache.org/jira/browse/IMPALA-6505
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Norbert Luksa
>Priority: Critical
>
> For Parquet tables, we push down predicates that can be used with file-level 
> statistics to filter out row groups. This is controlled by the 
> PARQUET_READ_STATISTICS query option.
> We can do the same for ORC tables after ORC-751 (support predicate pushdown 
> in C++ reader) is resolved.






[jira] [Created] (IMPALA-10915) Pushdown Timestamp predicates to ORC reader

2021-09-14 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-10915:
---

 Summary: Pushdown Timestamp predicates to ORC reader
 Key: IMPALA-10915
 URL: https://issues.apache.org/jira/browse/IMPALA-10915
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Quanlong Huang
Assignee: Quanlong Huang


This is a follow-up task of predicate pushdown to the ORC reader (IMPALA-6505). 
In IMPALA-6505 we skip pushing down predicates on Timestamp columns. As 
[~csringhofer] pointed out in the review: https://gerrit.cloudera.org/c/17815, 
we need to deal with corner cases like timezone conversion and timestamps 
before 1970-01-01. Also add more tests for coverage.






[jira] [Commented] (IMPALA-10316) load_nested.py failed due to out of memory during Jenkins GVO

2021-09-14 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415229#comment-17415229
 ] 

Quanlong Huang commented on IMPALA-10316:
-

Also saw this in: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14799

> load_nested.py failed due to out of memory during Jenkins GVO
> -
>
> Key: IMPALA-10316
> URL: https://issues.apache.org/jira/browse/IMPALA-10316
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: broken-build, flaky
>
> The following job failed due to out of memory:
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/12588] (please click 
> on "Don't keep this build forever" once this issue is resolved)
> Relevant log lines:
> {noformat}
> 02:33:42 Loading nested orc data (logging to 
> /home/ubuntu/Impala/logs/data_loading/load-nested.log)... 
> 02:35:39 FAILED (Took: 1 min 57 sec)
> 02:35:39 '/home/ubuntu/Impala/testdata/bin/load_nested.py -t 
> tpch_nested_orc_def -f orc/def' failed. Tail of log:
> 02:35:39 2020-11-11 02:35:06,225 INFO:load_nested[348]:Executing: 
> 02:35:39 
> 02:35:39   CREATE EXTERNAL TABLE supplier
> 02:35:39   STORED AS orc
> 02:35:39   TBLPROPERTIES('orc.compress' = 
> 'ZLIB','external.table.purge'='TRUE')
> 02:35:39   AS SELECT * FROM tmp_supplier
> 02:35:39 Traceback (most recent call last):
> 02:35:39   File "/home/ubuntu/Impala/testdata/bin/load_nested.py", line 415, 
> in 
> 02:35:39 load()
> 02:35:39   File "/home/ubuntu/Impala/testdata/bin/load_nested.py", line 349, 
> in load
> 02:35:39 hive.execute(stmt)
> 02:35:39   File "/home/ubuntu/Impala/tests/comparison/db_connection.py", line 
> 206, in execute
> 02:35:39 return self._cursor.execute(sql, *args, **kwargs)
> 02:35:39   File 
> "/home/ubuntu/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py",
>  line 331, in execute
> 02:35:39 self._wait_to_finish()  # make execute synchronous
> 02:35:39   File 
> "/home/ubuntu/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py",
>  line 413, in _wait_to_finish
> 02:35:39 raise OperationalError(resp.errorMessage)
> 02:35:39 impala.error.OperationalError: Error while compiling statement: 
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1605060173780_0039_2_00, diagnostics=[Task failed, 
> taskId=task_1605060173780_0039_2_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Container container_1605060173780_0039_01_02 finished with 
> diagnostics set to [Container failed, exitCode=-104. [2020-11-11 
> 02:35:11.768]Container 
> [pid=16810,containerID=container_1605060173780_0039_01_02] is running 
> 7729152B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB 
> physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing 
> container.{noformat}
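
To put the overage in the log above into perspective, here is a small calculation. It assumes that "1 GB" in the YARN diagnostics means 1024 MiB; the variable names are illustrative only.

```python
# Overage copied from the YARN diagnostics in the quoted log.
OVERAGE_BYTES = 7729152
# Assumption: the "1 GB" physical limit is 1024 MiB (binary units).
LIMIT_BYTES = 1024 * 1024 * 1024


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


overage_mib = to_mib(OVERAGE_BYTES)
overage_pct = 100.0 * OVERAGE_BYTES / LIMIT_BYTES
print(f"over by {overage_mib:.2f} MiB ({overage_pct:.2f}% of the limit)")
```

Under that assumption the container was killed for exceeding its limit by well under 1%, which suggests the task was running very close to the limit rather than leaking heavily.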






[jira] [Issue Comment Deleted] (IMPALA-10669) Loading nested ORC data is flaky during Docker-based tests

2021-09-14 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-10669:

Comment: was deleted

(was: Also saw this in a build: 
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14799
{code}
Loading nested orc data (logging to 
/home/ubuntu/Impala/logs/data_loading/load-nested.log)... 
FAILED (Took: 2 min 16 sec)
'/home/ubuntu/Impala/testdata/bin/load_nested.py -t tpch_nested_orc_def -f 
orc/def' failed. Tail of log:
2021-09-14 18:53:02,523 INFO:load_nested[348]:Executing: 

  CREATE EXTERNAL TABLE supplier
  STORED AS orc
  TBLPROPERTIES('orc.compress' = 'ZLIB','external.table.purge'='TRUE')
  AS SELECT * FROM tmp_supplier
Traceback (most recent call last):
  File "/home/ubuntu/Impala/testdata/bin/load_nested.py", line 415, in 
load()
  File "/home/ubuntu/Impala/testdata/bin/load_nested.py", line 349, in load
hive.execute(stmt)
  File "/home/ubuntu/Impala/tests/comparison/db_connection.py", line 206, in 
execute
return self._cursor.execute(sql, *args, **kwargs)
  File 
"/home/ubuntu/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py",
 line 343, in execute
self._wait_to_finish()  # make execute synchronous
  File 
"/home/ubuntu/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py",
 line 427, in _wait_to_finish
raise OperationalError(resp.errorMessage)
impala.error.OperationalError: Error while compiling statement: FAILED: 
Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. 
Vertex failed, vertexName=Map 1, vertexId=vertex_1631643014075_0039_2_00, 
diagnostics=[Task failed, taskId=task_1631643014075_0039_2_00_00, 
diagnostics=[TaskAttempt 0 failed, info=[Container 
container_1631643014075_0039_01_02 finished with diagnostics set to 
[Container failed, exitCode=-104. [2021-09-14 18:53:07.711]Container 
[pid=106725,containerID=container_1631643014075_0039_01_02] is running 
14094336B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB 
physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1631643014075_0039_01_02 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 106725 106723 106725 106725 (bash) 0 0 11546624 742 /bin/bash -c 
/usr/lib/jvm/java-8-openjdk-amd64/bin/java  -Xmx819m -server 
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  
-Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator 
-Dlog4j.configuration=tez-container-log4j.properties 
-Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_02
 -Dtez.root.logger=INFO,CLA  
-Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_02/tmp
 org.apache.tez.runtime.task.TezChild localhost 33614 
container_1631643014075_0039_01_02 application_1631643014075_0039 1 
1>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_02/stdout
 
2>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_02/stderr
  
|- 106735 106725 106725 106725 (java) 1780 265 2673709056 264843 
/usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx819m -server 
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN 
-Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator 
-Dlog4j.configuration=tez-container-log4j.properties 
-Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_02
 -Dtez.root.logger=INFO,CLA 
-Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_02/tmp
 org.apache.tez.runtime.task.TezChild localhost 33614 
container_1631643014075_0039_01_02 application_1631643014075_0039 1 

[2021-09-14 18:53:07.719]Container killed on request. Exit code is 143
[2021-09-14 18:53:07.719]Container exited with a non-zero exit code 143. 
]], TaskAttempt 1 failed, info=[Container 
container_1631643014075_0039_01_03 finished with diagnostics set to 
[Container failed, exitCode=-104. [2021-09-14 18:53:16.803]Container 
[pid=106885,containerID=container_1631643014075_0039_01_03] is running 
21422080B beyond the 'PHYSICAL' memory limit. Current 

[jira] [Commented] (IMPALA-10669) Loading nested ORC data is flaky during Docker-based tests

2021-09-14 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415223#comment-17415223
 ] 

Quanlong Huang commented on IMPALA-10669:
-

Also saw this in a build: 
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14799
{code}
Loading nested orc data (logging to 
/home/ubuntu/Impala/logs/data_loading/load-nested.log)... 
FAILED (Took: 2 min 16 sec)
'/home/ubuntu/Impala/testdata/bin/load_nested.py -t tpch_nested_orc_def -f 
orc/def' failed. Tail of log:
2021-09-14 18:53:02,523 INFO:load_nested[348]:Executing: 

  CREATE EXTERNAL TABLE supplier
  STORED AS orc
  TBLPROPERTIES('orc.compress' = 'ZLIB','external.table.purge'='TRUE')
  AS SELECT * FROM tmp_supplier
Traceback (most recent call last):
  File "/home/ubuntu/Impala/testdata/bin/load_nested.py", line 415, in 
load()
  File "/home/ubuntu/Impala/testdata/bin/load_nested.py", line 349, in load
hive.execute(stmt)
  File "/home/ubuntu/Impala/tests/comparison/db_connection.py", line 206, in 
execute
return self._cursor.execute(sql, *args, **kwargs)
  File 
"/home/ubuntu/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py",
 line 343, in execute
self._wait_to_finish()  # make execute synchronous
  File 
"/home/ubuntu/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py",
 line 427, in _wait_to_finish
raise OperationalError(resp.errorMessage)
impala.error.OperationalError: Error while compiling statement: FAILED: 
Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. 
Vertex failed, vertexName=Map 1, vertexId=vertex_1631643014075_0039_2_00, 
diagnostics=[Task failed, taskId=task_1631643014075_0039_2_00_00, 
diagnostics=[TaskAttempt 0 failed, info=[Container 
container_1631643014075_0039_01_02 finished with diagnostics set to 
[Container failed, exitCode=-104. [2021-09-14 18:53:07.711]Container 
[pid=106725,containerID=container_1631643014075_0039_01_02] is running 
14094336B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB 
physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1631643014075_0039_01_02 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 106725 106723 106725 106725 (bash) 0 0 11546624 742 /bin/bash -c 
/usr/lib/jvm/java-8-openjdk-amd64/bin/java  -Xmx819m -server 
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  
-Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator 
-Dlog4j.configuration=tez-container-log4j.properties 
-Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_02
 -Dtez.root.logger=INFO,CLA  
-Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_02/tmp
 org.apache.tez.runtime.task.TezChild localhost 33614 
container_1631643014075_0039_01_02 application_1631643014075_0039 1 
1>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_02/stdout
 
2>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_02/stderr
  
|- 106735 106725 106725 106725 (java) 1780 265 2673709056 264843 
/usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx819m -server 
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN 
-Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator 
-Dlog4j.configuration=tez-container-log4j.properties 
-Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_02
 -Dtez.root.logger=INFO,CLA 
-Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_02/tmp
 org.apache.tez.runtime.task.TezChild localhost 33614 
container_1631643014075_0039_01_02 application_1631643014075_0039 1 

[2021-09-14 18:53:07.719]Container killed on request. Exit code is 143
[2021-09-14 18:53:07.719]Container exited with a non-zero exit code 143. 
]], TaskAttempt 1 failed, info=[Container 
container_1631643014075_0039_01_03 finished with diagnostics set to 
[Container failed, exitCode=-104. [2021-09-14 18:53:16.803]Container 
[pid=106885,containerID=container_1631643014075_0039_01_03] is running 
21422080B beyond the 'PHYSICAL' memory limit. 

[jira] [Resolved] (IMPALA-10888) getPartitionsByNames should return partitions sorted by name

2021-09-14 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-10888.
--
Fix Version/s: Impala 4.1.0
   Resolution: Fixed

> getPartitionsByNames should return partitions sorted by name
> 
>
> Key: IMPALA-10888
> URL: https://issues.apache.org/jira/browse/IMPALA-10888
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> The CatalogMetastoreServer's implementation of {{getPartitionsByNames}} does 
> not return partitions ordered by partition name, whereas HMS orders them by 
> partition name. While this is not documented behavior and clients should not 
> rely on it, it can cause test flakiness where we expect the order of the 
> partitions to be consistent. We should change the implementation so that the 
> partitions returned by this API are sorted by partition name.
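
The proposed behavior can be sketched as below. This is a hypothetical Python model of the lookup, not the CatalogMetastoreServer code; the dictionary-based catalog and the "name" field are assumptions made for illustration.

```python
from typing import Dict, List


def get_partitions_by_names(catalog: Dict[str, dict], names: List[str]) -> List[dict]:
    """Look up the requested partitions and return them sorted by partition name."""
    found = [catalog[name] for name in names if name in catalog]
    # Plain string ordering, so "p=10" sorts before "p=2".
    return sorted(found, key=lambda partition: partition["name"])
```

Sorting at the end makes the result independent of the order in which the caller lists the names.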





[jira] [Commented] (IMPALA-10714) Intermittent DCHECK(read_iter->read_page_->attached_to_output_batch) in tests

2021-09-14 Thread Kurt Deschler (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414955#comment-17414955
 ] 

Kurt Deschler commented on IMPALA-10714:


There seems to be some history with this assert and attempts to fix it. 
Apparently there is still a timing hole.

> Intermittent DCHECK(read_iter->read_page_->attached_to_output_batch) in tests
> -
>
> Key: IMPALA-10714
> URL: https://issues.apache.org/jira/browse/IMPALA-10714
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: broken-build, flaky
>
> TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK in 
> exhaustive build.
> The Impala git hash was: e11237e29 IMPALA-10197: Add KUDU_REPLICA_SELECTION 
> query option
> In impalad.FATAL:
> {noformat}
> F0525 12:45:49.307780 13122 buffered-tuple-stream.cc:531] 
> 564af337ca503984:f1209fc4] Check failed: 
> read_iter->read_page_->attached_to_output_batch
> {noformat}
> Query 564af337ca503984:f1209fc4 was:
> {noformat}
> I0525 12:45:48.474383 17878 impala-server.cc:1324] 
> 564af337ca503984:f1209fc4] Registered query 
> query_id=564af337ca503984:f1209fc4 
> session_id=9e4875c17adf5e7a:eb72d33dc39b5288
> I0525 12:45:48.474486 17878 Frontend.java:1618] 
> 564af337ca503984:f1209fc4] Analyzing query: select 
> group_concat(string_col), length(bigstr) from bigstrs2
> group by bigstr db: test_spilling_large_rows_119f6bb1
> {noformat}
> I couldn't reproduce the issue locally.
>  
>  
> {code:java}
> be/src/runtime/buffered-tuple-stream.cc:531
> 530 DCHECK_NE(&*read_iter->read_page_, write_page_);
>  531 DCHECK(read_iter->read_page_->attached_to_output_batch);
>  532 pages_.pop_front();
> {code}
>  






[jira] [Updated] (IMPALA-10714) Intermittent DCHECK(read_iter->read_page_->attached_to_output_batch) in tests

2021-09-14 Thread Kurt Deschler (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Deschler updated IMPALA-10714:
---
Description: 
TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK in 
exhaustive build.

The Impala git hash was: e11237e29 IMPALA-10197: Add KUDU_REPLICA_SELECTION 
query option

In impalad.FATAL:
{noformat}
F0525 12:45:49.307780 13122 buffered-tuple-stream.cc:531] 
564af337ca503984:f1209fc4] Check failed: 
read_iter->read_page_->attached_to_output_batch
{noformat}
Query 564af337ca503984:f1209fc4 was:
{noformat}
I0525 12:45:48.474383 17878 impala-server.cc:1324] 
564af337ca503984:f1209fc4] Registered query 
query_id=564af337ca503984:f1209fc4 
session_id=9e4875c17adf5e7a:eb72d33dc39b5288
I0525 12:45:48.474486 17878 Frontend.java:1618] 
564af337ca503984:f1209fc4] Analyzing query: select 
group_concat(string_col), length(bigstr) from bigstrs2
group by bigstr db: test_spilling_large_rows_119f6bb1
{noformat}
I couldn't reproduce the issue locally.

 

 
{code:java}
be/src/runtime/buffered-tuple-stream.cc:531
530 DCHECK_NE(&*read_iter->read_page_, write_page_);
 531 DCHECK(read_iter->read_page_->attached_to_output_batch);
 532 pages_.pop_front();
{code}
 

  was:
TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK in 
exhaustive build.

The Impala git hash was: e11237e29 IMPALA-10197: Add KUDU_REPLICA_SELECTION 
query option

In impalad.FATAL:

{noformat}
F0525 12:45:49.307780 13122 buffered-tuple-stream.cc:531] 
564af337ca503984:f1209fc4] Check failed: 
read_iter->read_page_->attached_to_output_batch
{noformat}

Query 564af337ca503984:f1209fc4 was:

{noformat}
I0525 12:45:48.474383 17878 impala-server.cc:1324] 
564af337ca503984:f1209fc4] Registered query 
query_id=564af337ca503984:f1209fc4 
session_id=9e4875c17adf5e7a:eb72d33dc39b5288
I0525 12:45:48.474486 17878 Frontend.java:1618] 
564af337ca503984:f1209fc4] Analyzing query: select 
group_concat(string_col), length(bigstr) from bigstrs2
group by bigstr db: test_spilling_large_rows_119f6bb1
{noformat}

I couldn't reproduce the issue locally.


> Intermittent DCHECK(read_iter->read_page_->attached_to_output_batch) in tests
> -
>
> Key: IMPALA-10714
> URL: https://issues.apache.org/jira/browse/IMPALA-10714
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>  Labels: broken-build, flaky
>






[jira] [Updated] (IMPALA-10714) Intermittent DCHECK(read_iter->read_page_->attached_to_output_batch) in tests

2021-09-14 Thread Kurt Deschler (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Deschler updated IMPALA-10714:
---
Summary: Intermittent DCHECK(read_iter->read_page_->attached_to_output_batch) in tests
  (was: TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK)







[jira] [Comment Edited] (IMPALA-10714) TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK

2021-09-14 Thread Kurt Deschler (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414953#comment-17414953
 ] 

Kurt Deschler edited comment on IMPALA-10714 at 9/14/21, 1:57 PM:
--

Observed the same assert in this exhaustive test:
{code}
test_inline_view[protocol: hs2 | exec_option: {'disable_codegen_rows_threshold': 0,
'disable_codegen': False, 'abort_on_error': 1,
'debug_action': 'BEFORE_CODEGEN_IN_ASYNC_CODEGEN_THREAD:JITTER@1000|AFTER_STARTING_ASYNC_CODEGEN_IN_FRAGMENT_THREAD:JITTER@1000',
'ASYNC_CODEGEN': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0,
'num_nodes': 0} | table_format: parquet/none]
{code}


was (Author: kdeschle):
Observed the same assert in this test:
{code}
test_inline_view[protocol: hs2 | exec_option: {'disable_codegen_rows_threshold': 0,
'disable_codegen': False, 'abort_on_error': 1,
'debug_action': 'BEFORE_CODEGEN_IN_ASYNC_CODEGEN_THREAD:JITTER@1000|AFTER_STARTING_ASYNC_CODEGEN_IN_FRAGMENT_THREAD:JITTER@1000',
'ASYNC_CODEGEN': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0,
'num_nodes': 0} | table_format: parquet/none]
{code}







[jira] [Commented] (IMPALA-10714) TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK

2021-09-14 Thread Kurt Deschler (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414953#comment-17414953
 ] 

Kurt Deschler commented on IMPALA-10714:


Observed the same assert in this test:
{code}
test_inline_view[protocol: hs2 | exec_option: {'disable_codegen_rows_threshold': 0,
'disable_codegen': False, 'abort_on_error': 1,
'debug_action': 'BEFORE_CODEGEN_IN_ASYNC_CODEGEN_THREAD:JITTER@1000|AFTER_STARTING_ASYNC_CODEGEN_IN_FRAGMENT_THREAD:JITTER@1000',
'ASYNC_CODEGEN': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0,
'num_nodes': 0} | table_format: parquet/none]
{code}



