[jira] [Assigned] (IMPALA-10898) Runtime IN-list filters for ORC tables
[ https://issues.apache.org/jira/browse/IMPALA-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang reassigned IMPALA-10898:
Assignee: Quanlong Huang

> Runtime IN-list filters for ORC tables
>
> Key: IMPALA-10898
> URL: https://issues.apache.org/jira/browse/IMPALA-10898
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> Currently Impala has two kinds of runtime filters: bloom filters and min-max filters. Unfortunately, neither can leverage the bloom filters in ORC files: only EQUALS and IN-list predicates can use those to skip irrelevant ORC RowGroups.
> This JIRA aims to add runtime IN-list filters for hash joins with a small build side (e.g. #rows <= 1024).

--
This message was sent by Atlassian Jira (v8.3.4#803005)
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
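The IN-list filter proposed in IMPALA-10898 can be pictured with a small builder. This is a minimal Python sketch under stated assumptions, not Impala's implementation: `InListFilterBuilder` is a hypothetical name, the 1024 default is taken from the example threshold in the description, and NULL keys are skipped on the assumption of a plain (not null-safe) equi-join. The builder collects distinct build-side keys and disables itself once the threshold is exceeded, so only small build sides yield an IN-list that could then be handed to the ORC reader, where IN-list predicates can be checked against ORC bloom filters.

```python
from typing import Hashable, Iterable, Optional, Set


class InListFilterBuilder:
    """Hypothetical builder for a runtime IN-list filter on one hash-join key."""

    def __init__(self, max_entries: int = 1024) -> None:
        self.max_entries = max_entries
        # None means the filter was disabled because the build side was too large.
        self.values: Optional[Set[Hashable]] = set()

    def insert(self, key: Optional[Hashable]) -> None:
        if self.values is None or key is None:
            # Already disabled, or a NULL key, which never matches in a plain equi-join.
            return
        self.values.add(key)
        if len(self.values) > self.max_entries:
            self.values = None  # too many distinct keys: fall back to "always true"

    def insert_all(self, keys: Iterable[Optional[Hashable]]) -> None:
        for key in keys:
            self.insert(key)

    def is_always_true(self) -> bool:
        return self.values is None

    def in_list(self) -> Optional[Set[Hashable]]:
        """Values for a pushed-down `col IN (...)` predicate, or None if disabled."""
        return None if self.values is None else set(self.values)

    def passes(self, probe_key: Optional[Hashable]) -> bool:
        """Row-level check for probe-side rows."""
        if self.values is None:
            return True
        return probe_key is not None and probe_key in self.values


builder = InListFilterBuilder()
builder.insert_all([3, 1, 3, None, 2])
print(sorted(builder.in_list()))  # [1, 2, 3]
```

Disabling the filter instead of growing it keeps the pushed-down predicate small; a large build side would presumably still be covered by the existing bloom and min-max filters.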
[jira] [Updated] (IMPALA-10882) Push down Min-Max predicates of CHAR/VARCHAR to ORC reader
[ https://issues.apache.org/jira/browse/IMPALA-10882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-10882:
Priority: Critical (was: Major)

> Push down Min-Max predicates of CHAR/VARCHAR to ORC reader
>
> Key: IMPALA-10882
> URL: https://issues.apache.org/jira/browse/IMPALA-10882
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Quanlong Huang
> Priority: Critical
>
> This is follow-up work to IMPALA-6505. Because CHAR/VARCHAR values are padded or truncated when compared with string literals, simply pushing CHAR/VARCHAR predicates down into the ORC reader could return wrong results, especially when the file schema differs from the table schema.
> See the discussion in the Gerrit review:
> https://gerrit.cloudera.org/c/15403/1/be/src/exec/hdfs-orc-scanner.cc#889
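A tiny model shows why naive pushdown goes wrong with truncation. In this hypothetical Python sketch (all function names are illustrative assumptions, not ORC or Impala APIs), the file column holds longer strings than the table's VARCHAR(3) declares, so the table sees `'abc'` while a reader-side filter comparing raw file values sees `'abcde'` and would wrongly skip the row group. CHAR padding can cause analogous mismatches.

```python
from typing import Callable, Iterable


def as_varchar(raw: str, length: int) -> str:
    """Value as seen through a VARCHAR(length) table column: longer values are truncated."""
    return raw[:length]


def naive_pushdown_may_match(file_values: Iterable[str], literal: str) -> bool:
    """Naive reader-side check comparing raw file values with the literal (the bug)."""
    return any(v == literal for v in file_values)


def table_level_matches(file_values: Iterable[str], literal: str,
                        to_table_value: Callable[[str], str]) -> bool:
    """Correct semantics: convert each value to the table type first, then compare."""
    return any(to_table_value(v) == literal for v in file_values)


# File column written as STRING; table column declared as VARCHAR(3).
row_group_values = ["abcde", "xyz"]
literal = "abc"  # WHERE col = 'abc'

print(table_level_matches(row_group_values, literal, lambda v: as_varchar(v, 3)))  # True
print(naive_pushdown_may_match(row_group_values, literal))  # False: row group wrongly skipped
```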
[jira] [Updated] (IMPALA-10878) Pushdown runtime min-max filters to the ORC reader
[ https://issues.apache.org/jira/browse/IMPALA-10878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-10878:
Priority: Critical (was: Major)

> Pushdown runtime min-max filters to the ORC reader
>
> Key: IMPALA-10878
> URL: https://issues.apache.org/jira/browse/IMPALA-10878
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> The ORC reader has supported predicate pushdown since 1.7.0. We have already extended runtime min-max filters to Parquet tables; we can do the same for ORC tables and push the filters down into the ORC reader.
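The min-max idea can be sketched in a few lines. This is a hypothetical Python model, not Impala's or ORC's API: `RowGroupStats`, `build_min_max_filter`, and `row_groups_to_read` are illustrative names. It builds the [min, max] range of the build-side join keys and keeps only the row groups whose own statistics overlap that range; the NULL handling and the empty-build-side case assume an inner join.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group min/max statistics for the probe-side join column."""
    min_value: int
    max_value: int


def build_min_max_filter(build_keys: Iterable[Optional[int]]) -> Optional[Tuple[int, int]]:
    """Runtime min-max filter: the [min, max] range of the non-NULL build-side keys."""
    keys = [k for k in build_keys if k is not None]
    return (min(keys), max(keys)) if keys else None


def row_groups_to_read(stats: List[RowGroupStats],
                       flt: Optional[Tuple[int, int]]) -> List[int]:
    """Indexes of the row groups whose [min, max] range overlaps the filter range."""
    if flt is None:
        return []  # assumes an inner join: no build-side keys means no matches
    lo, hi = flt
    return [i for i, s in enumerate(stats) if s.max_value >= lo and s.min_value <= hi]


stats = [RowGroupStats(0, 9), RowGroupStats(10, 45), RowGroupStats(50, 99), RowGroupStats(100, 200)]
flt = build_min_max_filter([40, 55, None, 47])
print(flt)                             # (40, 55)
print(row_groups_to_read(stats, flt))  # [1, 2]
```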
[jira] [Updated] (IMPALA-10872) Add a snapshot version of ORC-1.7 to native-toolchain
[ https://issues.apache.org/jira/browse/IMPALA-10872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-10872:
Priority: Critical (was: Major)

> Add a snapshot version of ORC-1.7 to native-toolchain
>
> Key: IMPALA-10872
> URL: https://issues.apache.org/jira/browse/IMPALA-10872
> Project: IMPALA
> Issue Type: Task
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> ORC 1.7 has not been released yet, but we need some of its features, such as predicate pushdown (ORC-751), to improve our ORC scanner.
> The current top commit of the 1.7 branch is 36349d535089412b58f99c72af9bf7dcf7444aee. It contains all the patches we applied on top of orc-1.6.2:
> [https://github.com/cloudera/native-toolchain/tree/master/source/orc/orc-1.6.2-patches]
> It also includes new features and improvements such as ORC-751 and ORC-614. Let's add ORC at commit 36349d535089412b58f99c72af9bf7dcf7444aee to our native-toolchain to unblock WIP patches, e.g. [https://gerrit.cloudera.org/c/15403/]
[jira] [Updated] (IMPALA-10898) Runtime IN-list filters for ORC tables
[ https://issues.apache.org/jira/browse/IMPALA-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-10898:
Priority: Critical (was: Major)

> Runtime IN-list filters for ORC tables
>
> Key: IMPALA-10898
> URL: https://issues.apache.org/jira/browse/IMPALA-10898
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Quanlong Huang
> Priority: Critical
>
> Currently Impala has two kinds of runtime filters: bloom filters and min-max filters. Unfortunately, neither can leverage the bloom filters in ORC files: only EQUALS and IN-list predicates can use those to skip irrelevant ORC RowGroups.
> This JIRA aims to add runtime IN-list filters for hash joins with a small build side (e.g. #rows <= 1024).
[jira] [Updated] (IMPALA-10873) Push down EQUALS, IS NULL and IN-list predicate to ORC reader
[ https://issues.apache.org/jira/browse/IMPALA-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-10873:
Priority: Critical (was: Major)

> Push down EQUALS, IS NULL and IN-list predicate to ORC reader
>
> Key: IMPALA-10873
> URL: https://issues.apache.org/jira/browse/IMPALA-10873
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> IMPALA-6505 pushes down min-max predicates into the ORC reader. Since ORC's SearchArguments also support IN-list predicates, we can consider pushing down IN-list and NOT IN-list predicates as well.
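How IN-list, EQUALS, NOT IN-list and IS NULL predicates can prune a row group from min/max/has-null statistics is sketched here. This is an illustrative Python model with hypothetical names; it does not use ORC's SearchArgument API, and it assumes each row group has at least one non-NULL value so that min/max exist. A real reader can additionally consult bloom filters for EQUALS and IN-list predicates. Note how rarely NOT IN can prune from min/max alone.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ColumnStats:
    """Hypothetical row-group stats; min/max cover the non-NULL values, assumed to exist."""
    min_value: int
    max_value: int
    has_null: bool


def may_match_in(stats: ColumnStats, values: Sequence[int]) -> bool:
    """`col IN (values)` (EQUALS is the one-value case): prune only if no value fits [min, max]."""
    return any(stats.min_value <= v <= stats.max_value for v in values)


def may_match_not_in(stats: ColumnStats, values: Sequence[int]) -> bool:
    """`col NOT IN (values)`: NULL rows never qualify, so prune only if every non-NULL
    value is in the list, which min/max alone can prove just when min == max."""
    return not (stats.min_value == stats.max_value and stats.min_value in values)


def may_match_is_null(stats: ColumnStats) -> bool:
    """`col IS NULL`: prune when the row group has no NULLs."""
    return stats.has_null


rg = ColumnStats(min_value=10, max_value=20, has_null=False)
print(may_match_in(rg, [1, 15]), may_match_in(rg, [1, 25]))  # True False
print(may_match_not_in(ColumnStats(7, 7, True), [7, 8]))      # False
print(may_match_is_null(rg))                                  # False
```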
[jira] [Updated] (IMPALA-6636) Use async IO in ORC scanner
[ https://issues.apache.org/jira/browse/IMPALA-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-6636:
Priority: Critical (was: Major)

> Use async IO in ORC scanner
>
> Key: IMPALA-6636
> URL: https://issues.apache.org/jira/browse/IMPALA-6636
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Csaba Ringhofer
> Priority: Critical
>
> Although ORC-262 has made no progress, we can still prefetch data and let the ORC library read from an in-memory InputStream.
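The prefetching approach can be modeled as follows. In this hypothetical Python sketch, `PrefetchedStream` and the `fetch` callable are illustrative assumptions: `fetch` stands in for any blocking range read (for example of an ORC stripe from HDFS). The read is submitted to an I/O thread pool as soon as the stream is created, and the decoder later receives an in-memory `io.BytesIO`, mirroring the idea of letting the ORC library read from an in-memory InputStream.

```python
import io
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class PrefetchedStream:
    """Hypothetical wrapper: starts a blocking fetch on an I/O thread immediately and
    later hands the decoder an in-memory stream, so the decoder only waits if the
    prefetch is still in flight."""

    def __init__(self, executor: ThreadPoolExecutor, fetch: Callable[[], bytes]) -> None:
        self._future: Future = executor.submit(fetch)

    def open(self) -> io.BytesIO:
        # result() blocks only until the background fetch has completed.
        return io.BytesIO(self._future.result())


with ThreadPoolExecutor(max_workers=2) as io_pool:
    stream = PrefetchedStream(io_pool, lambda: b"ORC stripe bytes")  # stand-in for a range read
    # ... other work (e.g. decoding the previous stripe) overlaps with the fetch here ...
    print(stream.open().read())  # b'ORC stripe bytes'
```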
[jira] [Updated] (IMPALA-10915) Pushdown Timestamp predicates to ORC reader
[ https://issues.apache.org/jira/browse/IMPALA-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-10915:
Priority: Critical (was: Major)

> Pushdown Timestamp predicates to ORC reader
>
> Key: IMPALA-10915
> URL: https://issues.apache.org/jira/browse/IMPALA-10915
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> This is a follow-up task to predicate pushdown in the ORC reader (IMPALA-6505). In IMPALA-6505 we skip pushing down predicates on Timestamp columns. As [~csringhofer] pointed out in the review (https://gerrit.cloudera.org/c/17815), we need to handle corner cases such as timezone conversion and timestamps before 1970-01-01. We should also add more tests for coverage.
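To illustrate one kind of pre-1970 corner case (not necessarily the exact one raised in the review), consider splitting an epoch-nanosecond value into whole seconds plus a non-negative nanosecond part. C and C++ integer `/` and `%` round toward zero, which yields a negative nanosecond part and the wrong second for instants before 1970-01-01; flooring division gives the expected pair. The Python helpers are hypothetical and only model the arithmetic, not ORC's timestamp encoding.

```python
from typing import Tuple

NANOS_PER_SEC = 1_000_000_000


def split_truncating(epoch_nanos: int) -> Tuple[int, int]:
    """Mimics C/C++ integer '/' and '%', which round toward zero."""
    q = abs(epoch_nanos) // NANOS_PER_SEC
    if epoch_nanos < 0:
        q = -q
    return q, epoch_nanos - q * NANOS_PER_SEC


def split_flooring(epoch_nanos: int) -> Tuple[int, int]:
    """Seconds rounded toward negative infinity, so nanos is always in [0, 1e9)."""
    return divmod(epoch_nanos, NANOS_PER_SEC)


# 1969-12-31 23:59:59.5 UTC is half a second before the epoch.
t = -500_000_000
print(split_truncating(t))  # (0, -500000000): negative nanos, wrong second
print(split_flooring(t))    # (-1, 500000000)
```

For instants after the epoch both helpers agree, which is why this kind of bug can go unnoticed unless tests cover pre-1970 values.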
[jira] [Updated] (IMPALA-6505) Min-Max predicate push down in ORC scanner
[ https://issues.apache.org/jira/browse/IMPALA-6505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-6505:
Priority: Critical (was: Major)

> Min-Max predicate push down in ORC scanner
>
> Key: IMPALA-6505
> URL: https://issues.apache.org/jira/browse/IMPALA-6505
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Norbert Luksa
> Priority: Critical
>
> For Parquet tables, we push down predicates that can be evaluated against file-level statistics to filter out row groups. This is controlled by the PARQUET_READ_STATISTICS query option.
> We can do the same for ORC tables once ORC-751 (support predicate pushdown in the C++ reader) is resolved.
[jira] [Created] (IMPALA-10915) Pushdown Timestamp predicates to ORC reader
Quanlong Huang created IMPALA-10915:

Summary: Pushdown Timestamp predicates to ORC reader
Key: IMPALA-10915
URL: https://issues.apache.org/jira/browse/IMPALA-10915
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Quanlong Huang
Assignee: Quanlong Huang

This is a follow-up task to predicate pushdown in the ORC reader (IMPALA-6505). In IMPALA-6505 we skip pushing down predicates on Timestamp columns. As [~csringhofer] pointed out in the review (https://gerrit.cloudera.org/c/17815), we need to handle corner cases such as timezone conversion and timestamps before 1970-01-01. We should also add more tests for coverage.
[jira] [Commented] (IMPALA-10316) load_nested.py failed due to out of memory during Jenkins GVO
[ https://issues.apache.org/jira/browse/IMPALA-10316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415229#comment-17415229 ]

Quanlong Huang commented on IMPALA-10316:
Also saw this in: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14799

> load_nested.py failed due to out of memory during Jenkins GVO
>
> Key: IMPALA-10316
> URL: https://issues.apache.org/jira/browse/IMPALA-10316
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Reporter: Zoltán Borók-Nagy
> Priority: Major
> Labels: broken-build, flaky
>
> The following job failed due to out of memory:
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/12588] (please click on "Don't keep this build forever" once this issue is resolved)
> Relevant log lines:
> {noformat}
> 02:33:42 Loading nested orc data (logging to /home/ubuntu/Impala/logs/data_loading/load-nested.log)...
> 02:35:39 FAILED (Took: 1 min 57 sec)
> 02:35:39 '/home/ubuntu/Impala/testdata/bin/load_nested.py -t tpch_nested_orc_def -f orc/def' failed. Tail of log:
> 02:35:39 2020-11-11 02:35:06,225 INFO:load_nested[348]:Executing:
> 02:35:39
> 02:35:39 CREATE EXTERNAL TABLE supplier
> 02:35:39 STORED AS orc
> 02:35:39 TBLPROPERTIES('orc.compress' = 'ZLIB','external.table.purge'='TRUE')
> 02:35:39 AS SELECT * FROM tmp_supplier
> 02:35:39 Traceback (most recent call last):
> 02:35:39 File "/home/ubuntu/Impala/testdata/bin/load_nested.py", line 415, in <module>
> 02:35:39 load()
> 02:35:39 File "/home/ubuntu/Impala/testdata/bin/load_nested.py", line 349, in load
> 02:35:39 hive.execute(stmt)
> 02:35:39 File "/home/ubuntu/Impala/tests/comparison/db_connection.py", line 206, in execute
> 02:35:39 return self._cursor.execute(sql, *args, **kwargs)
> 02:35:39 File "/home/ubuntu/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py", line 331, in execute
> 02:35:39 self._wait_to_finish() # make execute synchronous
> 02:35:39 File "/home/ubuntu/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py", line 413, in _wait_to_finish
> 02:35:39 raise OperationalError(resp.errorMessage)
> 02:35:39 impala.error.OperationalError: Error while compiling statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1605060173780_0039_2_00, diagnostics=[Task failed, taskId=task_1605060173780_0039_2_00_00, diagnostics=[TaskAttempt 0 failed, info=[Container container_1605060173780_0039_01_02 finished with diagnostics set to [Container failed, exitCode=-104. [2020-11-11 02:35:11.768]Container [pid=16810,containerID=container_1605060173780_0039_01_02] is running 7729152B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container.
> {noformat}
[jira] [Issue Comment Deleted] (IMPALA-10669) Loading nested ORC data is flaky during Docker-based tests
[ https://issues.apache.org/jira/browse/IMPALA-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-10669: Comment: was deleted (was: Also saw this in a build: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14799 ...)
[jira] [Commented] (IMPALA-10669) Loading nested ORC data is flaky during Docker-based tests
[ https://issues.apache.org/jira/browse/IMPALA-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415223#comment-17415223 ] Quanlong Huang commented on IMPALA-10669: - Also saw this in a build: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14799 {code} Loading nested orc data (logging to /home/ubuntu/Impala/logs/data_loading/load-nested.log)... FAILED (Took: 2 min 16 sec) '/home/ubuntu/Impala/testdata/bin/load_nested.py -t tpch_nested_orc_def -f orc/def' failed. Tail of log: 2021-09-14 18:53:02,523 INFO:load_nested[348]:Executing: CREATE EXTERNAL TABLE supplier STORED AS orc TBLPROPERTIES('orc.compress' = 'ZLIB','external.table.purge'='TRUE') AS SELECT * FROM tmp_supplier Traceback (most recent call last): File "/home/ubuntu/Impala/testdata/bin/load_nested.py", line 415, in load() File "/home/ubuntu/Impala/testdata/bin/load_nested.py", line 349, in load hive.execute(stmt) File "/home/ubuntu/Impala/tests/comparison/db_connection.py", line 206, in execute return self._cursor.execute(sql, *args, **kwargs) File "/home/ubuntu/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py", line 343, in execute self._wait_to_finish() # make execute synchronous File "/home/ubuntu/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py", line 427, in _wait_to_finish raise OperationalError(resp.errorMessage) impala.error.OperationalError: Error while compiling statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1631643014075_0039_2_00, diagnostics=[Task failed, taskId=task_1631643014075_0039_2_00_00, diagnostics=[TaskAttempt 0 failed, info=[Container container_1631643014075_0039_01_02 finished with diagnostics set to [Container failed, exitCode=-104. 
[2021-09-14 18:53:07.711]Container [pid=106725,containerID=container_1631643014075_0039_01_02] is running 14094336B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_1631643014075_0039_01_02 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 106725 106723 106725 106725 (bash) 0 0 11546624 742 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_02 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_02/tmp org.apache.tez.runtime.task.TezChild localhost 33614 container_1631643014075_0039_01_02 application_1631643014075_0039 1 1>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_02/stdout 2>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_02/stderr |- 106735 106725 106725 106725 (java) 1780 265 2673709056 264843 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties 
-Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_02 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_02/tmp org.apache.tez.runtime.task.TezChild localhost 33614 container_1631643014075_0039_01_02 application_1631643014075_0039 1 [2021-09-14 18:53:07.719]Container killed on request. Exit code is 143 [2021-09-14 18:53:07.719]Container exited with a non-zero exit code 143. ]], TaskAttempt 1 failed, info=[Container container_1631643014075_0039_01_03 finished with diagnostics set to [Container failed, exitCode=-104. [2021-09-14 18:53:16.803]Container [pid=106885,containerID=container_1631643014075_0039_01_03] is running 21422080B beyond the 'PHYSICAL' memory limit.
[jira] [Resolved] (IMPALA-10888) getPartitionsByNames should return partitions sorted by name
[ https://issues.apache.org/jira/browse/IMPALA-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vihang Karajgaonkar resolved IMPALA-10888.
Fix Version/s: Impala 4.1.0
Resolution: Fixed

> getPartitionsByNames should return partitions sorted by name
>
> Key: IMPALA-10888
> URL: https://issues.apache.org/jira/browse/IMPALA-10888
> Project: IMPALA
> Issue Type: Sub-task
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
> Priority: Major
> Fix For: Impala 4.1.0
>
> The CatalogMetastoreServer's implementation of {{getPartitionsByNames}} does not return partitions ordered by partition name, whereas HMS does order them by partition name. Although this is not documented behavior and clients should not rely on it, the inconsistency can cause test flakiness where we expect a consistent partition order. We should change the implementation so that the partitions returned by this API are sorted by partition name.
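The fix amounts to sorting before returning. The following is a minimal Python sketch of that behavior with a hypothetical `Partition` record and `get_partitions_by_names` helper; it is not the CatalogMetastoreServer code. It assumes "sorted by name" means plain lexicographic order of the partition-name strings, which is worth keeping in mind because `month=10` then sorts before `month=9`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Partition:
    """Hypothetical, simplified partition record."""
    name: str  # e.g. "year=2021/month=9"
    params: Dict[str, str] = field(default_factory=dict)


def get_partitions_by_names(cache: Dict[str, Partition], names: List[str]) -> List[Partition]:
    """Returns the requested partitions that exist, sorted by partition name.

    In this sketch, without the sort the order would simply follow the caller's
    list, so callers comparing results with HMS could see a different order.
    """
    found = [cache[n] for n in names if n in cache]
    return sorted(found, key=lambda p: p.name)


cache = {n: Partition(n) for n in ["year=2021/month=9", "year=2021/month=10", "year=2020/month=1"]}
result = get_partitions_by_names(cache, ["year=2021/month=9", "year=2020/month=1", "year=2021/month=10"])
print([p.name for p in result])
# ['year=2020/month=1', 'year=2021/month=10', 'year=2021/month=9']
```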
[jira] [Commented] (IMPALA-10714) Intermittent DCHECK(read_iter->read_page_->attached_to_output_batch) in tests
[ https://issues.apache.org/jira/browse/IMPALA-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414955#comment-17414955 ]

Kurt Deschler commented on IMPALA-10714:
There seems to be some history with this assert and with attempts to fix it. Apparently there is still a timing hole.

> Intermittent DCHECK(read_iter->read_page_->attached_to_output_batch) in tests
>
> Key: IMPALA-10714
> URL: https://issues.apache.org/jira/browse/IMPALA-10714
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Zoltán Borók-Nagy
> Priority: Major
> Labels: broken-build, flaky
>
> TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK in exhaustive build.
> The Impala git hash was: e11237e29 IMPALA-10197: Add KUDU_REPLICA_SELECTION query option
> In impalad.FATAL:
> {noformat}
> F0525 12:45:49.307780 13122 buffered-tuple-stream.cc:531] 564af337ca503984:f1209fc4] Check failed: read_iter->read_page_->attached_to_output_batch
> {noformat}
> Query 564af337ca503984:f1209fc4 was:
> {noformat}
> I0525 12:45:48.474383 17878 impala-server.cc:1324] 564af337ca503984:f1209fc4] Registered query query_id=564af337ca503984:f1209fc4 session_id=9e4875c17adf5e7a:eb72d33dc39b5288
> I0525 12:45:48.474486 17878 Frontend.java:1618] 564af337ca503984:f1209fc4] Analyzing query: select group_concat(string_col), length(bigstr) from bigstrs2 group by bigstr db: test_spilling_large_rows_119f6bb1
> {noformat}
> I couldn't reproduce the issue locally.
>
> {code:java}
> be/src/runtime/buffered-tuple-stream.cc:531
> 530 DCHECK_NE(&*read_iter->read_page_, write_page_);
> 531 DCHECK(read_iter->read_page_->attached_to_output_batch);
> 532 pages_.pop_front();
> {code}
[jira] [Updated] (IMPALA-10714) Intermittent DCHECK(read_iter->read_page_->attached_to_output_batch) in tests
[ https://issues.apache.org/jira/browse/IMPALA-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kurt Deschler updated IMPALA-10714: --- Description: TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK in exhaustive build. The Impala git hash was: e11237e29 IMPALA-10197: Add KUDU_REPLICA_SELECTION query option In impalad.FATAL: {noformat} F0525 12:45:49.307780 13122 buffered-tuple-stream.cc:531] 564af337ca503984:f1209fc4] Check failed: read_iter->read_page_->attached_to_output_batch {noformat} Query 564af337ca503984:f1209fc4 was: {noformat} I0525 12:45:48.474383 17878 impala-server.cc:1324] 564af337ca503984:f1209fc4] Registered query query_id=564af337ca503984:f1209fc4 session_id=9e4875c17adf5e7a:eb72d33dc39b5288 I0525 12:45:48.474486 17878 Frontend.java:1618] 564af337ca503984:f1209fc4] Analyzing query: select group_concat(string_col), length(bigstr) from bigstrs2 group by bigstr db: test_spilling_large_rows_119f6bb1 {noformat} I couldn't reproduce the issue locally. {code:java} be/src/runtime/buffered-tuple-stream.cc:531 530 DCHECK_NE(&*read_iter->read_page_, write_page_); 531 DCHECK(read_iter->read_page_->attached_to_output_batch); 532 pages_.pop_front(); {code} was: TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK in exhaustive build. 
The Impala git hash was: e11237e29 IMPALA-10197: Add KUDU_REPLICA_SELECTION query option In impalad.FATAL: {noformat} F0525 12:45:49.307780 13122 buffered-tuple-stream.cc:531] 564af337ca503984:f1209fc4] Check failed: read_iter->read_page_->attached_to_output_batch {noformat} Query 564af337ca503984:f1209fc4 was: {noformat} I0525 12:45:48.474383 17878 impala-server.cc:1324] 564af337ca503984:f1209fc4] Registered query query_id=564af337ca503984:f1209fc4 session_id=9e4875c17adf5e7a:eb72d33dc39b5288 I0525 12:45:48.474486 17878 Frontend.java:1618] 564af337ca503984:f1209fc4] Analyzing query: select group_concat(string_col), length(bigstr) from bigstrs2 group by bigstr db: test_spilling_large_rows_119f6bb1 {noformat} I couldn't reproduce the issue locally. > Intermittent DCHECK(read_iter->read_page_->attached_to_output_batch) in tests > - > > Key: IMPALA-10714 > URL: https://issues.apache.org/jira/browse/IMPALA-10714 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Zoltán Borók-Nagy >Priority: Major > Labels: broken-build, flaky > > TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK in > exhaustive build. > The Impala git hash was: e11237e29 IMPALA-10197: Add KUDU_REPLICA_SELECTION > query option > In impalad.FATAL: > {noformat} > F0525 12:45:49.307780 13122 buffered-tuple-stream.cc:531] > 564af337ca503984:f1209fc4] Check failed: > read_iter->read_page_->attached_to_output_batch > {noformat} > Query 564af337ca503984:f1209fc4 was: > {noformat} > I0525 12:45:48.474383 17878 impala-server.cc:1324] > 564af337ca503984:f1209fc4] Registered query > query_id=564af337ca503984:f1209fc4 > session_id=9e4875c17adf5e7a:eb72d33dc39b5288 > I0525 12:45:48.474486 17878 Frontend.java:1618] > 564af337ca503984:f1209fc4] Analyzing query: select > group_concat(string_col), length(bigstr) from bigstrs2 > group by bigstr db: test_spilling_large_rows_119f6bb1 > {noformat} > I couldn't reproduce the issue locally. 
> > > {code:java} > be/src/runtime/buffered-tuple-stream.cc:531 > 530 DCHECK_NE(&*read_iter->read_page_, write_page_); > 531 DCHECK(read_iter->read_page_->attached_to_output_batch); > 532 pages_.pop_front(); > {code}
[jira] [Updated] (IMPALA-10714) Intermittent DCHECK(read_iter->read_page_->attached_to_output_batch) in tests
[ https://issues.apache.org/jira/browse/IMPALA-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kurt Deschler updated IMPALA-10714: --- Summary: Intermittent DCHECK(read_iter->read_page_->attached_to_output_batch) in tests (was: TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK)
[jira] [Comment Edited] (IMPALA-10714) TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK
[ https://issues.apache.org/jira/browse/IMPALA-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414953#comment-17414953 ] Kurt Deschler edited comment on IMPALA-10714 at 9/14/21, 1:57 PM: -- Observed same assert in this exhaustive test: {code:java} test_inline_view[protocol: hs2 | exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': 'BEFORE_CODEGEN_IN_ASYNC_CODEGEN_THREAD:JITTER@1000|AFTER_STARTING_ASYNC_CODEGEN_IN_FRAGMENT_THREAD:JITTER@1000', 'ASYNC_CODEGEN': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none]{code} was (Author: kdeschle): Observed same assert in this test: {code:java} test_inline_view[protocol: hs2 | exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': 'BEFORE_CODEGEN_IN_ASYNC_CODEGEN_THREAD:JITTER@1000|AFTER_STARTING_ASYNC_CODEGEN_IN_FRAGMENT_THREAD:JITTER@1000', 'ASYNC_CODEGEN': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none]{code}
[jira] [Commented] (IMPALA-10714) TestSpillingDebugActionDimensions::test_spilling_large_rows hit DCHECK
[ https://issues.apache.org/jira/browse/IMPALA-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414953#comment-17414953 ] Kurt Deschler commented on IMPALA-10714: Observed same assert in this test: {code:java} test_inline_view[protocol: hs2 | exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': 'BEFORE_CODEGEN_IN_ASYNC_CODEGEN_THREAD:JITTER@1000|AFTER_STARTING_ASYNC_CODEGEN_IN_FRAGMENT_THREAD:JITTER@1000', 'ASYNC_CODEGEN': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none]{code}