[jira] [Resolved] (IMPALA-7616) Refactor PrincipalPrivilege.buildPrivilegeName
[ https://issues.apache.org/jira/browse/IMPALA-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fredy Wijaya resolved IMPALA-7616.
----------------------------------
    Resolution: Fixed
    Fix Version/s: Impala 3.1.0

> Refactor PrincipalPrivilege.buildPrivilegeName
> ----------------------------------------------
>
>                 Key: IMPALA-7616
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7616
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Adam Holley
>            Assignee: Fredy Wijaya
>            Priority: Minor
>             Fix For: Impala 3.1.0
>
> The buildPrivilegeName pattern across the frontend code is odd: the
> privilege name is set by an explicit call rather than derived from the
> privilege's constituent parts at read time. For example, if you create a
> privilege without the grant option and set the grant option afterward,
> getPrivilegeName() returns a name that omits the grant option. This should
> be refactored so that getPrivilegeName() builds the name from the current
> values in the Privilege object.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
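The fix described above -- deriving the name at read time instead of caching it at set time -- can be sketched as follows. This is a toy Python model for illustration only; the real code is Java in Impala's frontend, and the field and separator choices here are assumptions, not the actual privilege-name format:

```python
class Privilege:
    """Toy model of a privilege whose name is derived, never cached.

    Building the name inside the getter means later mutations (e.g.
    enabling the grant option) are always reflected in the name,
    which is exactly the bug the explicit-set pattern allowed.
    """

    def __init__(self, scope, db, table, privilege, has_grant_opt=False):
        self.scope = scope
        self.db = db
        self.table = table
        self.privilege = privilege
        self.has_grant_opt = has_grant_opt

    def get_privilege_name(self):
        # Derived from the current field values on every call; no stale cache.
        parts = [self.scope, self.db, self.table, self.privilege,
                 str(self.has_grant_opt).lower()]
        return ".".join(p for p in parts if p)


p = Privilege("table", "functional", "alltypes", "select")
before = p.get_privilege_name()
p.has_grant_opt = True           # mutate after construction
after = p.get_privilege_name()   # the name now reflects the grant option
```

With the old set-then-get pattern, `before` and `after` would have been identical; deriving the name in the getter makes the second call pick up the changed grant option.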
[jira] [Updated] (IMPALA-7593) test_automatic_invalidation failing in S3
[ https://issues.apache.org/jira/browse/IMPALA-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianyi Wang updated IMPALA-7593:
--------------------------------
    Fix Version/s: Impala 3.1.0

> test_automatic_invalidation failing in S3
> -----------------------------------------
>
>                 Key: IMPALA-7593
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7593
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 3.1.0
>            Reporter: Thomas Tauber-Marshall
>            Assignee: Tianyi Wang
>            Priority: Blocker
>              Labels: broken-build
>             Fix For: Impala 3.1.0
>
> Note that the build has the fix for IMPALA-7580
> {noformat}
> 04:59:01 ___ TestAutomaticCatalogInvalidation.test_v1_catalog ___
> 04:59:01 custom_cluster/test_automatic_invalidation.py:63: in test_v1_catalog
> 04:59:01     self._run_test(cursor)
> 04:59:01 custom_cluster/test_automatic_invalidation.py:58: in _run_test
> 04:59:01     assert time.time() < timeout
> 04:59:01 E   assert 1537355634.805718 < 1537355634.394429
> 04:59:01 E   + where 1537355634.805718 = ()
> 04:59:01 E   +   where = time.time
> 04:59:01 Captured stderr setup -
> 04:59:01 -- 2018-09-19 04:13:22,796 INFO MainThread: Starting cluster with command:
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/bin/start-impala-cluster.py
> --cluster_size=3 --num_coordinators=3
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests
> --log_level=1 '--impalad_args="--invalidate_tables_timeout_s=20" '
> '--state_store_args="--statestore_update_frequency_ms=50
> --statestore_priority_update_frequency_ms=50
> --statestore_heartbeat_frequency_ms=50" '
> '--catalogd_args="--invalidate_tables_timeout_s=20" '
> 04:59:01 04:13:23 MainThread: Starting State Store logging to
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 04:59:01 04:13:23 MainThread: Starting Catalog Service logging to
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 04:59:01 04:13:24 MainThread: Starting Impala Daemon logging to
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 04:59:01 04:13:25 MainThread: Starting Impala Daemon logging to
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 04:59:01 04:13:26 MainThread: Starting Impala Daemon logging to
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 04:59:01 04:13:29 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 04:59:01 04:13:29 MainThread: Getting num_known_live_backends from impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:29 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 04:59:01 04:13:30 MainThread: Getting num_known_live_backends from impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:30 MainThread: Waiting for num_known_live_backends=3. Current value: 1
> 04:59:01 04:13:31 MainThread: Getting num_known_live_backends from impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:31 MainThread: Waiting for num_known_live_backends=3. Current value: 2
> 04:59:01 04:13:32 MainThread: Getting num_known_live_backends from impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:32 MainThread: num_known_live_backends has reached value: 3
> 04:59:01 04:13:32 MainThread: Getting num_known_live_backends from impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25001
> 04:59:01 04:13:32 MainThread: num_known_live_backends has reached value: 3
> 04:59:01 04:13:32 MainThread: Getting num_known_live_backends from impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25002
> 04:59:01 04:13:32 MainThread: num_known_live_backends has reached value: 3
> 04:59:01 04:13:32 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 executors).
> 04:59:01 -- 2018-09-19 04:13:33,034 INFO MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 04:59:01 -- 2018-09-19 04:13:33,034 INFO MainThread: Getting metric: statestore.live-backends from impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25010
> 04:59:01 -- 2018-09-19 04:13:33,036 INFO MainThread: Metric 'statestore.live-backends' has reached desired value: 4
> 04:59:01 -- 2018-09-19 04:13:33,036 INFO MainThread: Getting num_known_live_backends from
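The failing assertion above (`assert time.time() < timeout`) is the timeout branch of a poll-until-deadline loop: the test waited for table invalidation but the deadline passed first. A minimal sketch of that pattern, with hypothetical helper names rather than the actual test code:

```python
import time


def wait_until(predicate, timeout_s, poll_interval_s=0.05):
    """Poll `predicate` until it returns True or the deadline passes.

    Returns True on success, False on timeout. The failure in the log
    is exactly the timeout branch: time.time() exceeded the deadline
    before the awaited condition (catalog invalidation) was observed.
    """
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(poll_interval_s)
    return False


# A condition that flips to True after a short delay succeeds...
start = time.time()
ok = wait_until(lambda: time.time() - start > 0.2, timeout_s=1.0)
# ...while a condition that never holds hits the timeout branch.
timed_out = not wait_until(lambda: False, timeout_s=0.2)
```

On slow storage such as S3 the awaited condition can legitimately take longer than the configured deadline, which is why such assertions show up as flaky rather than as real product bugs.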
[jira] [Updated] (IMPALA-7593) test_automatic_invalidation failing in S3
[ https://issues.apache.org/jira/browse/IMPALA-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianyi Wang updated IMPALA-7593:
--------------------------------
    Component/s: Infrastructure

> test_automatic_invalidation failing in S3
> -----------------------------------------
>
>                 Key: IMPALA-7593
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7593
[jira] [Resolved] (IMPALA-7593) test_automatic_invalidation failing in S3
[ https://issues.apache.org/jira/browse/IMPALA-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianyi Wang resolved IMPALA-7593.
---------------------------------
    Resolution: Fixed

> test_automatic_invalidation failing in S3
> -----------------------------------------
>
>                 Key: IMPALA-7593
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7593
[jira] [Updated] (IMPALA-7485) test_spilling_naaj hung on jenkins
[ https://issues.apache.org/jira/browse/IMPALA-7485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-7485:
----------------------------------
    Priority: Critical  (was: Major)

> test_spilling_naaj hung on jenkins
> ----------------------------------
>
>                 Key: IMPALA-7485
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7485
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Csaba Ringhofer
>            Priority: Critical
>              Labels: broken-build, flaky, flaky-test
>         Attachments: resolved_stacks.zip
>
> {code}
> query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_naaj[exec_option: {'debug_action': None, 'default_spillable_buffer_size': '256k'} | table_format: parquet/none]
> {code}
> seemed hung (was running for more than 4 hours), see
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/3055/console
> Core dumps and stack traces of impalad were created and the impalad was
> killed. The tests continued without failures after impalad was restarted.
[jira] [Updated] (IMPALA-7485) test_spilling_naaj hung on jenkins
[ https://issues.apache.org/jira/browse/IMPALA-7485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-7485:
----------------------------------
    Labels: broken-build flaky flaky-test  (was: flaky-test)

> test_spilling_naaj hung on jenkins
> ----------------------------------
>
>                 Key: IMPALA-7485
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7485
[jira] [Updated] (IMPALA-7009) test_drop_table_with_purge fails on Isilon
[ https://issues.apache.org/jira/browse/IMPALA-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-7009:
----------------------------------
    Target Version: Product Backlog
    Priority: Major  (was: Critical)

> test_drop_table_with_purge fails on Isilon
> ------------------------------------------
>
>                 Key: IMPALA-7009
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7009
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 3.0, Impala 2.13.0
>            Reporter: Sailesh Mukil
>            Priority: Major
>              Labels: broken-build, flaky
>
> We've seen multiple failures of test_drop_table_with_purge
> {code:java}
> metadata.test_ddl.TestDdlStatements.test_drop_table_with_purge (from pytest)
> Failing for the past 1 build (Since Failed#22 )
> Took 18 sec.
>
> Stacktrace
> metadata/test_ddl.py:72: in test_drop_table_with_purge
>     assert not self.filesystem_client.exists(\
> E   assert not True
> E   + where True = 0x5fe1210>>('user/jenkins/.Trash/Current/test-warehouse/test_drop_table_with_purge_58c75c18.db/t2')
> E   +   where = .exists
> E   +     where 0x5fe1210> = 0x5fe1110>.filesystem_client
> E   +   and 'user/jenkins/.Trash/Current/test-warehouse/test_drop_table_with_purge_58c75c18.db/t2' = ('jenkins', 'test_drop_table_with_purge_58c75c18')
> E   +     where = 'user/{0}/.Trash/Current/test-warehouse/{1}.db/t2'.format
> E   +   and 'jenkins' = ()
> E   +     where = getpass.getuser
>
> Standard Error
> -- connecting to: localhost:21000
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_drop_table_with_purge_58c75c18` CASCADE;
>
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_drop_table_with_purge_58c75c18`;
>
> MainThread: Created database "test_drop_table_with_purge_58c75c18" for test ID "metadata/test_ddl.py::TestDdlStatements::()::test_drop_table_with_purge"
> -- executing against localhost:21000
> create table test_drop_table_with_purge_58c75c18.t1(i int);
> -- executing against localhost:21000
> create table test_drop_table_with_purge_58c75c18.t2(i int);
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> -- executing against localhost:21000
> drop table test_drop_table_with_purge_58c75c18.t1;
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> -- executing against localhost:21000
> drop table test_drop_table_with_purge_58c75c18.t2 purge;
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> {code}
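The logic the failing test exercises -- a plain drop moves table data to the user's `.Trash` directory, while `DROP TABLE ... PURGE` deletes it outright -- can be sketched with a toy in-memory filesystem. This is a hypothetical simplification, not Impala's actual test client:

```python
class FakeFilesystem:
    """Minimal stand-in for a trash-aware filesystem client."""

    def __init__(self):
        self.paths = set()
        self.trash = set()

    def create(self, path):
        self.paths.add(path)

    def drop(self, path, purge=False):
        self.paths.discard(path)
        if not purge:
            # A plain drop moves the data into the user's .Trash directory.
            self.trash.add(".Trash/Current/" + path)
        # With purge=True the data is removed without a trash copy.

    def exists_in_trash(self, path):
        return ".Trash/Current/" + path in self.trash


fs = FakeFilesystem()
fs.create("test-warehouse/db/t1")
fs.create("test-warehouse/db/t2")
fs.drop("test-warehouse/db/t1")               # lands in trash
fs.drop("test-warehouse/db/t2", purge=True)   # bypasses trash
```

The test asserts the purged table's directory is absent from trash; on Isilon the trash semantics evidently differ from the HDFS behavior the test assumes, so the assertion fires.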
[jira] [Resolved] (IMPALA-7226) test_compute_stats failed with "Unable to open Kudu table"
[ https://issues.apache.org/jira/browse/IMPALA-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-7226.
-----------------------------------
    Resolution: Cannot Reproduce

I don't think this is actionable right now. Reopen if it reoccurs.

> test_compute_stats failed with "Unable to open Kudu table"
> ----------------------------------------------------------
>
>                 Key: IMPALA-7226
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7226
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.1.0
>            Reporter: Thomas Tauber-Marshall
>            Priority: Major
>              Labels: broken-build, flaky, kudu
>
> https://jenkins.impala.io/job/gerrit-verify-dryrun/2757
> {noformat}
> 20:09:25 ] === FAILURES ===
> 20:09:25 ] TestMtDop.test_compute_stats[mt_dop: 2 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: kudu/none]
> 20:09:25 ] [gw3] linux2 -- Python 2.7.12 /home/ubuntu/Impala/bin/../infra/python/env/bin/python
> 20:09:25 ] query_test/test_mt_dop.py:76: in test_compute_stats
> 20:09:25 ]     vector.get_value('exec_option'))
> 20:09:25 ] common/impala_test_suite.py:528: in wrapper
> 20:09:25 ]     return function(*args, **kwargs)
> 20:09:25 ] common/impala_test_suite.py:553: in execute_query
> 20:09:25 ]     return self.__execute_query(self.client, query, query_options)
> 20:09:25 ] common/impala_test_suite.py:620: in __execute_query
> 20:09:25 ]     return impalad_client.execute(query, user=user)
> 20:09:25 ] common/impala_connection.py:160: in execute
> 20:09:25 ]     return self.__beeswax_client.execute(sql_stmt, user=user)
> 20:09:25 ] beeswax/impala_beeswax.py:173: in execute
> 20:09:25 ]     handle = self.__execute_query(query_string.strip(), user=user)
> 20:09:25 ] beeswax/impala_beeswax.py:345: in __execute_query
> 20:09:25 ]     self.wait_for_completion(handle)
> 20:09:25 ] beeswax/impala_beeswax.py:365: in wait_for_completion
> 20:09:25 ]     raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> 20:09:25 ] E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> 20:09:25 ] E    Query aborted:Unable to open Kudu table: Network error: Recv() got EOF from remote (error 108)
> 20:09:25 ] Captured stderr setup -
> 20:09:25 ] SET sync_ddl=False;
> 20:09:25 ] -- executing against localhost:21000
> 20:09:25 ] DROP DATABASE IF EXISTS `test_compute_stats_fcf53685` CASCADE;
> 20:09:25 ]
> 20:09:25 ] SET sync_ddl=False;
> 20:09:25 ] -- executing against localhost:21000
> 20:09:25 ] CREATE DATABASE `test_compute_stats_fcf53685`;
> 20:09:25 ]
> 20:09:25 ] MainThread: Created database "test_compute_stats_fcf53685" for test ID "query_test/test_mt_dop.py::TestMtDop::()::test_compute_stats[mt_dop: 2 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: kudu/none]"
> 20:09:25 ] - Captured stderr call -
> 20:09:25 ] -- executing against localhost:21000
> 20:09:25 ] create external table test_compute_stats_fcf53685.mt_dop stored as kudu tblproperties('kudu.table_name'='impala::functional_kudu.alltypes');
> 20:09:25 ]
> 20:09:25 ] SET mt_dop=2;
> 20:09:25 ] SET batch_size=0;
> 20:09:25 ] SET num_nodes=0;
> 20:09:25 ] SET disable_codegen_rows_threshold=0;
> 20:09:25 ] SET disable_codegen=False;
> 20:09:25 ] SET abort_on_error=1;
> 20:09:25 ] SET exec_single_node_rows_threshold=0;
> 20:09:25 ] -- executing against localhost:21000
> 20:09:25 ] compute stats test_compute_stats_fcf53685.mt_dop;
> 20:09:25 ]
> 20:09:25 ] = 1 failed, 1954 passed, 63 skipped, 44 xfailed, 1 xpassed in 2294.44 seconds ==
> {noformat}
[jira] [Resolved] (IMPALA-6910) Multiple tests failing on S3 build: error reading from HDFS file
[ https://issues.apache.org/jira/browse/IMPALA-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-6910.
-----------------------------------
    Resolution: Fixed
    Fix Version/s: Impala 3.1.0

We haven't seen this for quite a while, which suggests the Hadoop fix did the trick.

> Multiple tests failing on S3 build: error reading from HDFS file
> ----------------------------------------------------------------
>
>                 Key: IMPALA-6910
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6910
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.0
>            Reporter: David Knupp
>            Assignee: Sailesh Mukil
>            Priority: Critical
>              Labels: broken-build, flaky, s3
>             Fix For: Impala 3.1.0
>
> Stacktrace
> {noformat}
> query_test/test_compressed_formats.py:149: in test_seq_writer
>     self.run_test_case('QueryTest/seq-writer', vector, unique_database)
> common/impala_test_suite.py:397: in run_test_case
>     result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:612: in __execute_query
>     return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
>     return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:173: in execute
>     handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:341: in __execute_query
>     self.wait_for_completion(handle)
> beeswax/impala_beeswax.py:361: in wait_for_completion
>     raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    Query aborted:Disk I/O error: Error reading from HDFS file: s3a://impala-cdh5-s3-test/test-warehouse/tpcds.store_sales_parquet/ss_sold_date_sk=2452585/a5482dcb946b6c98-7543e0dd0004_95929617_data.0.parq
> E   Error(255): Unknown error 255
> E   Root cause: SdkClientException: Data read has a different length than the expected: dataLength=8576; expectedLength=17785; includeSkipped=true; in.getClass()=class com.amazonaws.services.s3.AmazonS3Client$2; markedSupported=false; marked=0; resetSinceLastMarked=false; markCount=0; resetCount=0
> {noformat}
[jira] [Commented] (IMPALA-6910) Multiple tests failing on S3 build: error reading from HDFS file
[ https://issues.apache.org/jira/browse/IMPALA-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629624#comment-16629624 ]

Tim Armstrong commented on IMPALA-6910:
---------------------------------------

We believe this was a Hadoop AWS connector bug.

> Multiple tests failing on S3 build: error reading from HDFS file
> ----------------------------------------------------------------
>
>                 Key: IMPALA-6910
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6910
[jira] [Commented] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache
[ https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629621#comment-16629621 ] Paul Rogers commented on IMPALA-7501: - So the above was probably looking in the wrong haystack. Todd's comment is the key: {{LocalCatalog}}. The local catalog caches the HMS Thrift objects, including {{Partition}}. The chain is: * {{LocalDb}} contains a map of {LocalTable}}. * {{LocalTable}} has a subclass {{LocalFsTable}} which contains a map of {{LocalPartitionSpec}} objects. * {{LocalPartitionSpec}} has a relation (need to research) to {{LocalFsPartition}}. * {{LocalFsPartition}} holds onto the Hive {{Partition}}, which holds onto the {{FieldSchema}} objects. Short term, just need to track down how we cache the {{Partition}} and nuke the {{FieldSchema}}, then retest. Longer term, the note earlier does apply. While the query-specific metadata goes to pains to avoid caching HMS objects, LocalCatalog (and presumably the similar version in the {{catalogd}} do cache HMS objects which, as noted earlier, are rather bloated for our needs. > Slim down metastore Partition objects in LocalCatalog cache > --- > > Key: IMPALA-7501 > URL: https://issues.apache.org/jira/browse/IMPALA-7501 > Project: IMPALA > Issue Type: Sub-task >Reporter: Todd Lipcon >Priority: Minor > > I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit > after running a production workload simulation for a couple hours. It had > 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, > in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects > are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M > objects are retained by FieldSchema, which, as far as I remember, are ignored > on the partition level by the Impala planner. So, with a bit of slimming down > of these objects, we could make a huge dent in effective cache capacity given > a fixed budget. 
Reducing object count should also have the effect of improved > GC performance (old gen GC is more closely tied to object count than size) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
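The short-term option discussed in the comment above — drop the per-partition {{FieldSchema}} lists before caching — can be sketched as follows. This is only an illustration: {{FieldSchema}}, {{StorageDescriptor}}, and {{Partition}} here are simplified stand-ins for the real org.apache.hadoop.hive.metastore.api Thrift classes, and {{slim()}} is a hypothetical helper, not Impala code.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the Hive metastore Thrift classes; the real ones
// live in org.apache.hadoop.hive.metastore.api and carry many more fields.
class FieldSchema {
  String name, type, comment;
  FieldSchema(String name, String type) { this.name = name; this.type = type; }
}

class StorageDescriptor {
  List<FieldSchema> cols = new ArrayList<>();  // per-partition column schemas
  String location;
}

class Partition {
  StorageDescriptor sd = new StorageDescriptor();
}

public class SlimPartitions {
  // The planner reads column schemas from the table, not from each partition,
  // so the copy held by every cached Partition can be dropped before caching.
  static Partition slim(Partition p) {
    if (p.sd != null) p.sd.cols = null;
    return p;
  }

  public static void main(String[] args) {
    Partition p = new Partition();
    p.sd.cols.add(new FieldSchema("id", "int"));
    slim(p);
    System.out.println(p.sd.cols);  // prints null: the FieldSchema list is gone
  }
}
```

With ~33.6M of 38.5M heap objects retained via FieldSchema (per the issue description), nulling that one list per cached partition is the "crude-but-effective" lever; a compact Impala-specific representation would go further.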
[jira] [Resolved] (IMPALA-6591) TestClientSsl hung for a long time
[ https://issues.apache.org/jira/browse/IMPALA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-6591. --- Resolution: Cannot Reproduce > TestClientSsl hung for a long time > -- > > Key: IMPALA-6591 > URL: https://issues.apache.org/jira/browse/IMPALA-6591 > Project: IMPALA > Issue Type: Bug > Components: Distributed Exec >Affects Versions: Impala 2.12.0 >Reporter: Tim Armstrong >Assignee: Sailesh Mukil >Priority: Critical > Labels: broken-build, hang > > {noformat} > 18:49:13 > custom_cluster/test_catalog_wait.py::TestCatalogWait::test_delayed_catalog > PASSED > 18:49:53 > custom_cluster/test_client_ssl.py::TestClientSsl::test_ssl[exec_option: > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, > 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, > 'exec_single_node_rows_threshold': 0} | table_format: text/none] Build timed > out (after 1,440 minutes). Marking the build as failed. > 12:20:15 Build was aborted > 12:20:15 Archiving artifacts > {noformat} > I unfortunately wasn't able to get any logs... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-7404) query_test.test_delimited_text.TestDelimitedText.test_delimited_text_newlines fails to return any rows
[ https://issues.apache.org/jira/browse/IMPALA-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-7404: -- Target Version: Impala 3.1.0 > query_test.test_delimited_text.TestDelimitedText.test_delimited_text_newlines > fails to return any rows > -- > > Key: IMPALA-7404 > URL: https://issues.apache.org/jira/browse/IMPALA-7404 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.1.0 >Reporter: Vuk Ercegovac >Assignee: Csaba Ringhofer >Priority: Blocker > Labels: broken-build, flaky-test, s3 > > {noformat} > query_test/test_delimited_text.py:65: in test_delimited_text_newlines > assert len(result.data) == 2 > E assert 0 == 2 > E+ where 0 = len([]) > E+where [] = at 0x63977d0>.data{noformat} > Expected results from this query after first inserting: > {noformat} > insert into test_delimited_text_newlines_ff243aaa.nl_queries values > ("the\n","\nquick\nbrown","fox\n"), ("\njumped","over the lazy\n","\ndog"); > select * from test_delimited_text_newlines_ff243aaa.nl_queries; > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-7512) test_resolution_by_name failed: did not encounter expected error
[ https://issues.apache.org/jira/browse/IMPALA-7512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-7512. --- Resolution: Duplicate Likely has the same cause as IMPALA-7335 > test_resolution_by_name failed: did not encounter expected error > > > Key: IMPALA-7512 > URL: https://issues.apache.org/jira/browse/IMPALA-7512 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.1.0 >Reporter: Bikramjeet Vig >Priority: Critical > Labels: broken-build > > Seems like the error reached the coordinator and it transitioned to the Error > state but it was not delivered to the client > {noformat} > I0829 23:38:31.626911 16215 impala-server.cc:1040] Registered query > query_id=54b6c955298cd00:623fd4b5 > session_id=f14fb6bdc8ea6686:ad8d4097469f7c8f > I0829 23:38:31.627315 16215 Frontend.java:1029] Analyzing query: select key, > value from switched_map_fields_resolution_test.int_map > I0829 23:38:31.627898 16215 Frontend.java:1041] Analysis finished. > I0829 23:38:31.631665 18839 admission-controller.cc:552] Schedule for > id=54b6c955298cd00:623fd4b5 in pool_name=default-pool > cluster_mem_needed=64.00 MB PoolConfig: max_requests=-1 max_queued=200 > max_mem=-1.00 B > I0829 23:38:31.631866 18839 admission-controller.cc:557] Stats: > agg_num_running=5, agg_num_queued=0, agg_mem_reserved=1.51 GB, > local_host(local_mem_admitted=2.17 GB, num_admitted_running=5, num_queued=0, > backend_mem_reserved=514.22 MB) > I0829 23:38:31.632130 18839 admission-controller.cc:589] Admitted query > id=54b6c955298cd00:623fd4b5 > I0829 23:38:31.632308 18839 coordinator.cc:91] Exec() > query_id=54b6c955298cd00:623fd4b5 stmt=select key, value from > switched_map_fields_resolution_test.int_map > I0829 23:38:31.633636 18744 query-state.cc:491] Instance completed. 
> instance_id=5049c19ed24bed19:85ffb5e0001 #in-flight=7 status=OK > I0829 23:38:31.634953 18839 coordinator.cc:330] starting execution on 2 > backends for query_id=54b6c955298cd00:623fd4b5 > I0829 23:38:31.635996 29061 impala-internal-service.cc:49] > ExecQueryFInstances(): query_id=54b6c955298cd00:623fd4b5 > coord=impala-ec2-centos74-m5-4xlarge-ondemand-0bff.vpc.cloudera.com:22000 > #instances=1 > I0829 23:38:31.638362 18843 query-state.cc:483] Executing instance. > instance_id=54b6c955298cd00:623fd4b5 fragment_idx=0 > per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=8 > I0829 23:38:31.638617 18839 coordinator.cc:344] started execution on 2 > backends for query_id=54b6c955298cd00:623fd4b5 > I0829 23:38:31.656479 4384 coordinator.cc:685] Backend completed: > host=impala-ec2-centos74-m5-4xlarge-ondemand-0bff.vpc.cloudera.com:22001 > remaining=2 query_id=54b6c955298cd00:623fd4b5 > I0829 23:38:31.656626 4384 coordinator-backend-state.cc:254] > query_id=54b6c955298cd00:623fd4b5: first in-progress backend: > impala-ec2-centos74-m5-4xlarge-ondemand-0bff.vpc.cloudera.com:22000 > I0829 23:38:31.656808 4384 coordinator.cc:498] ExecState: query > id=54b6c955298cd00:623fd4b5 > finstance=54b6c955298cd00:623fd4b50001 on > host=impala-ec2-centos74-m5-4xlarge-ondemand-0bff.vpc.cloudera.com:22001 > (EXECUTING -> ERROR) status=File > 'hdfs://localhost:20500/test-warehouse/test_resolution_by_name_63ec1576.db/switched_map_fields_resolution_test/switched_map.parq' > has an incompatible Parquet schema for column > 'test_resolution_by_name_63ec1576.switched_map_fields_resolution_test.int_map.key'. 
> Column type: STRING, Parquet schema: > required int32 value [i:0 d:1 r:1] > I0829 23:38:31.657034 4384 coordinator-backend-state.cc:377] sending > CancelQueryFInstances rpc for query_id=54b6c955298cd00:623fd4b5 > backend=impala-ec2-centos74-m5-4xlarge-ondemand-0bff.vpc.cloudera.com:22000 > I0829 23:38:31.657304 5860 impala-internal-service.cc:71] > CancelQueryFInstances(): query_id=54b6c955298cd00:623fd4b5 > I0829 23:38:31.657408 5860 query-exec-mgr.cc:95] QueryState: > query_id=54b6c955298cd00:623fd4b5 refcnt=4 > I0829 23:38:31.657490 5860 query-state.cc:504] Cancel: > query_id=54b6c955298cd00:623fd4b5 > I0829 23:38:31.657575 5860 krpc-data-stream-mgr.cc:325] cancelling all > streams for fragment_instance_id=54b6c955298cd00:623fd4b5 > I0829 23:38:31.657836 4384 coordinator.cc:658] CancelBackends() > query_id=54b6c955298cd00:623fd4b5, tried to cancel 1 backends > I0829 23:38:31.657940 4384 coordinator.cc:792] Release admission control > resources for query_id=54b6c955298cd00:623fd4b5 > I0829 23:38:31.662129 18843 query-state.cc:334] Cancelling fragment instances > as directed by the coordinator. Returned status:
[jira] [Updated] (IMPALA-7523) Planner Test failing with "Failed to assign regions to servers after 60000 millis."
[ https://issues.apache.org/jira/browse/IMPALA-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-7523: -- Labels: broken-build flaky (was: broken-build flake) > Planner Test failing with "Failed to assign regions to servers after 60000 > millis." > --- > > Key: IMPALA-7523 > URL: https://issues.apache.org/jira/browse/IMPALA-7523 > Project: IMPALA > Issue Type: Task > Components: Frontend >Reporter: Philip Zeyliger >Priority: Critical > Labels: broken-build, flaky > > I've seen > {{org.apache.impala.planner.PlannerTest.org.apache.impala.planner.PlannerTest}} > fail with the following trace: > {code} > java.lang.IllegalStateException: Failed to assign regions to servers after > 60000 millis. > at > org.apache.impala.datagenerator.HBaseTestDataRegionAssignment.performAssignment(HBaseTestDataRegionAssignment.java:153) > at > org.apache.impala.planner.PlannerTestBase.setUp(PlannerTestBase.java:120) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173) > at > 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) > {code} > I think we've seen it before as indicated in IMPALA-7061. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-7523) Planner Test failing with "Failed to assign regions to servers after 60000 millis."
[ https://issues.apache.org/jira/browse/IMPALA-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-7523: -- Labels: broken-build flake (was: broken-build) > Planner Test failing with "Failed to assign regions to servers after 60000 > millis." > --- > > Key: IMPALA-7523 > URL: https://issues.apache.org/jira/browse/IMPALA-7523 > Project: IMPALA > Issue Type: Task > Components: Frontend >Reporter: Philip Zeyliger >Priority: Critical > Labels: broken-build, flaky > > I've seen > {{org.apache.impala.planner.PlannerTest.org.apache.impala.planner.PlannerTest}} > fail with the following trace: > {code} > java.lang.IllegalStateException: Failed to assign regions to servers after > 60000 millis. > at > org.apache.impala.datagenerator.HBaseTestDataRegionAssignment.performAssignment(HBaseTestDataRegionAssignment.java:153) > at > org.apache.impala.planner.PlannerTestBase.setUp(PlannerTestBase.java:120) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173) > at > 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) > {code} > I think we've seen it before as indicated in IMPALA-7061. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-7494) Hang in TestTpcdsDecimalV2Query::test_tpcds_q69
[ https://issues.apache.org/jira/browse/IMPALA-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-7494. --- Resolution: Fixed Fix Version/s: Impala 3.1.0 Suspect this is IMPALA-7488 > Hang in TestTpcdsDecimalV2Query::test_tpcds_q69 > --- > > Key: IMPALA-7494 > URL: https://issues.apache.org/jira/browse/IMPALA-7494 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 3.1.0 >Reporter: Bikramjeet Vig >Priority: Critical > Labels: broken-build > Fix For: Impala 3.1.0 > > > A hang in this test caused the build to time out after 1440 minutes > {noformat} > 10:47:51 [gw3] PASSED > query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q65[exec_option: > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, > 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, > 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] > 10:48:31 > query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q67a[exec_option: > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, > 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, > 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] > 10:48:31 [gw3] PASSED > query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q67a[exec_option: > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, > 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, > 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] > 10:48:32 > query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q68[exec_option: > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, > 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, > 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] > 10:48:32 [gw3] PASSED > 
query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q68[exec_option: > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, > 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, > 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] > 10:48:34 > query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q69[exec_option: > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, > 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, > 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] > 07:11:24 [gw3] PASSED > query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q69[exec_option: > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, > 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, > 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] Build > timed out (after 1,440 minutes). Marking the build as aborted. > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-7493) hang in test_spilling_query_options
[ https://issues.apache.org/jira/browse/IMPALA-7493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-7493. --- Resolution: Cannot Reproduce Suspect this is IMPALA-7488 > hang in test_spilling_query_options > --- > > Key: IMPALA-7493 > URL: https://issues.apache.org/jira/browse/IMPALA-7493 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 3.1.0 >Reporter: Bikramjeet Vig >Priority: Critical > Labels: broken-build > > A hang in this test caused the build to time out after 1440 minutes > {noformat} > 14:50:23 [gw6] PASSED > query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_naaj[exec_option: > {'debug_action': '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', > 'default_spillable_buffer_size': '256k'} | table_format: parquet/none] > 14:50:23 > query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive[exec_option: > {'debug_action': None, 'default_spillable_buffer_size': '256k'} | > table_format: parquet/none] > 14:50:23 [gw6] SKIPPED > query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive[exec_option: > {'debug_action': None, 'default_spillable_buffer_size': '256k'} | > table_format: parquet/none] > 14:50:23 > query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive[exec_option: > {'debug_action': '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', > 'default_spillable_buffer_size': '256k'} | table_format: parquet/none] > 14:50:23 [gw6] SKIPPED > query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive[exec_option: > {'debug_action': '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', > 'default_spillable_buffer_size': '256k'} | table_format: parquet/none] > 14:51:41 > query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::test_spilling_naaj_no_deny_reservation[exec_option: > {'default_spillable_buffer_size': 
'256k'} | table_format: parquet/none] > 14:51:41 [gw6] PASSED > query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::test_spilling_naaj_no_deny_reservation[exec_option: > {'default_spillable_buffer_size': '256k'} | table_format: parquet/none] > 14:51:48 > query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::test_spilling_query_options[exec_option: > {'default_spillable_buffer_size': '256k'} | table_format: parquet/none] > 12:34:40 [gw6] PASSED > query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::test_spilling_query_options[exec_option: > {'default_spillable_buffer_size': '256k'} | table_format: parquet/none] > Build timed out (after 1,440 minutes). Marking the build as aborted. > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-7593) test_automatic_invalidation failing in S3
[ https://issues.apache.org/jira/browse/IMPALA-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629614#comment-16629614 ] Tim Armstrong commented on IMPALA-7593: --- Can this be closed? > test_automatic_invalidation failing in S3 > - > > Key: IMPALA-7593 > URL: https://issues.apache.org/jira/browse/IMPALA-7593 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 3.1.0 >Reporter: Thomas Tauber-Marshall >Assignee: Tianyi Wang >Priority: Blocker > Labels: broken-build > > Note that the build has the fix for IMPALA-7580 > {noformat} > 4:59:01 ___ TestAutomaticCatalogInvalidation.test_v1_catalog > ___ > 04:59:01 custom_cluster/test_automatic_invalidation.py:63: in test_v1_catalog > 04:59:01 self._run_test(cursor) > 04:59:01 custom_cluster/test_automatic_invalidation.py:58: in _run_test > 04:59:01 assert time.time() < timeout > 04:59:01 E assert 1537355634.805718 < 1537355634.394429 > 04:59:01 E+ where 1537355634.805718 = () > 04:59:01 E+where = time.time > 04:59:01 Captured stderr setup > - > 04:59:01 -- 2018-09-19 04:13:22,796 INFO MainThread: Starting cluster > with command: > /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/bin/start-impala-cluster.py > --cluster_size=3 --num_coordinators=3 > --log_dir=/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests > --log_level=1 '--impalad_args="--invalidate_tables_timeout_s=20" ' > '--state_store_args="--statestore_update_frequency_ms=50 > --statestore_priority_update_frequency_ms=50 > --statestore_heartbeat_frequency_ms=50" ' > '--catalogd_args="--invalidate_tables_timeout_s=20" ' > 04:59:01 04:13:23 MainThread: Starting State Store logging to > /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/statestored.INFO > 04:59:01 04:13:23 MainThread: Starting Catalog Service logging to > /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO > 04:59:01 
04:13:24 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad.INFO > 04:59:01 04:13:25 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO > 04:59:01 04:13:26 MainThread: Starting Impala Daemon logging to > /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO > 04:59:01 04:13:29 MainThread: Found 3 impalad/1 statestored/1 catalogd > process(es) > 04:59:01 04:13:29 MainThread: Getting num_known_live_backends from > impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000 > 04:59:01 04:13:29 MainThread: Waiting for num_known_live_backends=3. Current > value: 0 > 04:59:01 04:13:30 MainThread: Getting num_known_live_backends from > impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000 > 04:59:01 04:13:30 MainThread: Waiting for num_known_live_backends=3. Current > value: 1 > 04:59:01 04:13:31 MainThread: Getting num_known_live_backends from > impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000 > 04:59:01 04:13:31 MainThread: Waiting for num_known_live_backends=3. 
Current > value: 2 > 04:59:01 04:13:32 MainThread: Getting num_known_live_backends from > impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000 > 04:59:01 04:13:32 MainThread: num_known_live_backends has reached value: 3 > 04:59:01 04:13:32 MainThread: Getting num_known_live_backends from > impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25001 > 04:59:01 04:13:32 MainThread: num_known_live_backends has reached value: 3 > 04:59:01 04:13:32 MainThread: Getting num_known_live_backends from > impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25002 > 04:59:01 04:13:32 MainThread: num_known_live_backends has reached value: 3 > 04:59:01 04:13:32 MainThread: Impala Cluster Running with 3 nodes (3 > coordinators, 3 executors). > 04:59:01 -- 2018-09-19 04:13:33,034 INFO MainThread: Found 3 impalad/1 > statestored/1 catalogd process(es) > 04:59:01 -- 2018-09-19 04:13:33,034 INFO MainThread: Getting metric: > statestore.live-backends from > impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25010 > 04:59:01 -- 2018-09-19 04:13:33,036 INFO MainThread: Metric > 'statestore.live-backends' has reached desired value: 4 > 04:59:01 -- 2018-09-19 04:13:33,036 INFO MainThread: Getting > num_known_live_backends from > impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000 > 04:59:01 -- 2018-09-19
[jira] [Created] (IMPALA-7637) Include more hash table stats in profile
Tim Armstrong created IMPALA-7637: - Summary: Include more hash table stats in profile Key: IMPALA-7637 URL: https://issues.apache.org/jira/browse/IMPALA-7637 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Tim Armstrong Our hash table collects some useful stats about collisions and travel length, but then we don't do anything to expose them: https://github.com/apache/impala/blob/540611e863fe99b3d3ae35f8b94a745a68b9eba2/be/src/exec/hash-table.h#L989 We should add some of them to the profile, maybe: * the number of probes * the average travel length per probe * the number of hash collisions * (optional) the number of hash table resizes. We already have the hash table size and the resize time, which I think is sufficient to debug most problems with resizes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
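To illustrate the kind of counters IMPALA-7637 proposes, here is a toy linear-probing hash table that tracks probes, travel length, and collisions. It is a self-contained sketch of the concept only, not Impala's actual be/src/exec/hash-table.h implementation (which is C++ and uses quadratic probing over buckets); class and field names are made up for the example.

```java
// Toy linear-probing hash table tracking the stats proposed for the profile:
// number of probes, total travel length, and hash collisions.
// Assumes the table is never filled to capacity (no resize logic here).
public class StatsHashTable {
  private final long[] keys;
  private final boolean[] used;
  long numProbes = 0;      // inserts/lookups issued
  long travelLength = 0;   // total linear-probing steps taken
  long numCollisions = 0;  // occupied slots hit that hold a different key

  public StatsHashTable(int capacity) {
    keys = new long[capacity];
    used = new boolean[capacity];
  }

  public void insert(long key) {
    numProbes++;
    int idx = Math.floorMod(Long.hashCode(key), keys.length);
    while (used[idx]) {
      if (keys[idx] == key) return;  // key already present
      numCollisions++;               // different key in this slot
      travelLength++;                // one more probing step
      idx = (idx + 1) % keys.length;
    }
    used[idx] = true;
    keys[idx] = key;
  }

  // The derived counter proposed above: average travel length per probe.
  public double avgTravelPerProbe() {
    return numProbes == 0 ? 0.0 : (double) travelLength / numProbes;
  }

  public static void main(String[] args) {
    StatsHashTable t = new StatsHashTable(8);
    for (long k = 0; k < 6; k++) t.insert(k);
    System.out.println(t.numProbes + " probes, avg travel " + t.avgTravelPerProbe());
  }
}
```

A high average travel length or collision rate in the profile would point at clustering or a poor hash, which is exactly the debugging signal the JIRA asks to expose.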
[jira] [Comment Edited] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache
[ https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629554#comment-16629554 ] Paul Rogers edited comment on IMPALA-7501 at 9/27/18 12:09 AM: --- Analysis: * Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects. * Impala's {{LocalDb}}, which extends {{FeDb}} contains a map of {{LocalTable}} objects. * Impala's {{LocalTable}} contains a Hive {{Table}} object. * The {{Table}} object is defined in [Hive's Thrift schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift] API. It does not contain a list of partitions. Things are a bit confusing because: * Hive defines a different [{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java] class, which contains a {{TableSpec}}. * Hive's [{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java] contains a list of {{Partition}} objects. * Hive's [{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java] is generated from Thrift. Contains a {{StorageDescriptor}}. * Hive's [{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java] contains the list of {{FieldSchema}} objects which Todd saw in the heap dump. The above says that, yes, Hive {{Partition}} objects do hold a list of {{FieldSchema}}, but not via the simplest path, via the Hive API {{Table}} object. 
Perhaps we cache {{Partition}} objects in the table schema: Impala loads tables in the background by calling {{HdfsTable.load()}}: * The {{LocalTable}} wraps a number of subclasses, of which the one of interest is {{HdfsTable}}. * {{load()}} calls {{loadAllPartitions()}} to do the partition work. * {{loadAllPartitions}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the partitions as a list of Hive {{Partition}} objects. * {{loadAllPartitions}} wraps each in a {{HdfsPartition}}, and calls {{addPartition}} to put the partition into a couple of maps. * But, {{HdfsPartition}} goes to extremes to copy data out of Hive’s {{Partition}} object without holding onto Hive’s object. So, we did take steps to avoid holding onto Hive’s {{Partition}} objects. Still, there are references, so the question is: where? That is, the original description was concerned with the {{FieldSchema}} references in {{Partition}}. But, the above analysis of the code suggests that even the {{Partition}} objects themselves should not exist: we should have copied their info into {{HdfsPartition}} objects and discarded them. Maybe this test run found issues for storage engines other than HDFS? was (Author: paul.rogers): Analysis: * Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects. * Impala's {{LocalDb}}, which extends {{FeDb}}, contains a map of {{LocalTable}} objects. * Impala's {{LocalTable}} contains a Hive {{Table}} object. * The {{Table}} object is defined in [Hive's Thrift schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift] API. It does not contain a list of partitions. Things are a bit confusing because: * Hive defines a different [{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java] class, which contains a {{TableSpec}}.
* Hive's [{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java] contains a list of {{Partition}} objects. * Hive's [{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java] is generated from Thrift. Contains a {{StorageDescriptor}}. * Hive's [{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java] contains the list of {{FieldSchema}} objects which Todd saw in the heap dump. The above says that, yes, Hive {{Partition}} objects do hold a list of {{FieldSchema}}, but not via the simplest path, via the Hive API {{Table}} object. Perhaps we cache {{Partition}} objects in the table schema: Impala loads tables in the background by calling {{HdfsTable.load()}}: * The {{LocalTable}} wraps a number of subclasses, of which the one of interest is {{HdfsTable}}. * {{load()}} calls {{loadAllPartitions()}} to do the partition work. * {{loadAllPartitions}} calls
[jira] [Commented] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache
[ https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629554#comment-16629554 ] Paul Rogers commented on IMPALA-7501: - Analysis: * Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects. * Impala's {{LocalDb}}, which extends {{FeDb}}, contains a map of {{LocalTable}} objects. * Impala's {{LocalTable}} contains a Hive {{Table}} object. * The {{Table}} object is defined in [Hive's Thrift schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift] API. It does not contain a list of partitions. Things are a bit confusing because: * Hive defines a different [{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java] class, which contains a {{TableSpec}}. * Hive's [{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java] contains a list of {{Partition}} objects. * Hive's [{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java] is generated from Thrift. Contains a {{StorageDescriptor}}. * Hive's [{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java] contains the list of {{FieldSchema}} objects which Todd saw in the heap dump. The above says that, yes, Hive {{Partition}} objects do hold a list of {{FieldSchema}}, but not via the simplest path, via the Hive API {{Table}} object. Perhaps we cache {{Partition}} objects in the table schema: Impala loads tables in the background by calling {{HdfsTable.load()}}: * The {{LocalTable}} wraps a number of subclasses, 
* {{load()}} calls {{loadAllPartitions()}} to do the partition work. * {{loadAllPartitions}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the partitions as a list of Hive {{Partition}} objects. * {{loadAllPartitions}} wraps each in an {{HdfsPartition}}, and calls {{addPartition}} to put the partition into a couple of maps. * But, {{HdfsPartition}} goes to extremes to copy data out of Hive’s {{Partition}} object without holding onto Hive’s object. So, we did take steps to avoid holding onto Hive’s {{Partition}} objects. Still, there are references, so the question is: where? > Slim down metastore Partition objects in LocalCatalog cache > --- > > Key: IMPALA-7501 > URL: https://issues.apache.org/jira/browse/IMPALA-7501 > Project: IMPALA > Issue Type: Sub-task >Reporter: Todd Lipcon >Priority: Minor > > I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit > after running a production workload simulation for a couple hours. It had > 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, > in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects > are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M > objects are retained by FieldSchema, which, as far as I remember, are ignored > on the partition level by the Impala planner. So, with a bit of slimming down > of these objects, we could make a huge dent in effective cache capacity given > a fixed budget. Reducing object count should also have the effect of improved > GC performance (old gen GC is more closely tied to object count than size) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Comment Edited] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache
[ https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629410#comment-16629410 ] Paul Rogers edited comment on IMPALA-7501 at 9/26/18 11:35 PM: --- A quick scan of the Hive code suggests that Hive's Thrift objects carry more info than is required in the Impala cache. Creating Impala-specific, high-performance versions would likely save space. (No need for parent pointers, no need for the two-level Hive API structure, etc.) So, this gives us two options: * Reach inside Hive's Thrift objects to null out fields which we don't need, or * Design an Impala-specific, compact representation for the data that omits all but essential objects and fields. The second choice provides a huge opportunity for memory optimization. The first is a crude-but-effective short-term solution. But, as a later note shows, the story is not as simple as a first analysis suggests. was (Author: paul.rogers): Analysis: * Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects. * Impala's {{LocalDb}}, which extends {{FeDb}} contains a map of {{LocalTable}} objects. * Impala's {{LocalTable}} contains a Hive {{Table}} object. * The {{Table}} object is defined in [Hive's Thrift schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift] API. It does not contain a list of partitions. * The {{LocalTable}} wraps a number of subclass, of which the one of interest is {{HdfsTable}}. Impala loads tables in the background by calling {{HdfsTable.load()}}: * {{load()}} calls {{loadAllPartitions()}} to do the partition work. * {{loadAllPartitions}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the partitions as a list of Hive {{Partition} objects. * {{loadAllParitions}} wraps each in a {{HdfsPartition}}, and calls {{addPartition}} to put the partition into a couple of maps. 
* Hive's [{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java] is generated from Thrift. Contains a {{StorageDescriptor}}. * Hive's [{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java] contains the list of {{FieldSchema}} objects which Todd saw in the heap dump. Things are a bit confusing because: * Hive defines a different [{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java] class, which contains a {{TableSpec}}. * Hive's [{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java] contains a list of {{Partition}} objects. A quick scan of the Hive code suggests that Hive's Thrift objects carry more info that is required in the Impala cache. Creating Impala-specific, high-performance versions would likely save space. (No need for parent pointers, no need for the two-level Hive API structure, etc.) So, this gives us two options: * Reach inside Hive's Thrift objects to null out fields which we don't need, or * Design an Impala-specific, compact representation for the data that omits all but essential objects and fields. The second choice provides a huge opportunity for memory optimization. The first is a crude-but-effective short-term solution. > Slim down metastore Partition objects in LocalCatalog cache > --- > > Key: IMPALA-7501 > URL: https://issues.apache.org/jira/browse/IMPALA-7501 > Project: IMPALA > Issue Type: Sub-task >Reporter: Todd Lipcon >Priority: Minor > > I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit > after running a production workload simulation for a couple hours. 
It had > 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, > in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects > are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M > objects are retained by FieldSchema, which, as far as I remember, are ignored > on the partition level by the Impala planner. So, with a bit of slimming down > of these objects, we could make a huge dent in effective cache capacity given > a fixed budget. Reducing object count should also have the effect of improved > GC performance (old gen GC is more closely tied to object count than size) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail:
[jira] [Comment Edited] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache
[ https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629410#comment-16629410 ] Paul Rogers edited comment on IMPALA-7501 at 9/26/18 11:21 PM: --- Analysis: * Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects. * Impala's {{LocalDb}}, which extends {{FeDb}} contains a map of {{LocalTable}} objects. * Impala's {{LocalTable}} contains a Hive {{Table}} object. * The {{Table}} object is defined in [Hive's Thrift schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift] API. It does not contain a list of partitions. * The {{LocalTable}} wraps a number of subclass, of which the one of interest is {{HdfsTable}}. Impala loads tables in the background by calling {{HdfsTable.load()}}: * {{load()}} calls {{loadAllPartitions()}} to do the partition work. * {{loadAllPartitions}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the partitions as a list of Hive {{Partition} objects. * {{loadAllParitions}} wraps each in a {{HdfsPartition}}, and calls {{addPartition}} to put the partition into a couple of maps. * Hive's [{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java] is generated from Thrift. Contains a {{StorageDescriptor}}. * Hive's [{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java] contains the list of {{FieldSchema}} objects which Todd saw in the heap dump. Things are a bit confusing because: * Hive defines a different [{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java] class, which contains a {{TableSpec}}. 
* Hive's [{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java] contains a list of {{Partition}} objects. A quick scan of the Hive code suggests that Hive's Thrift objects carry more info that is required in the Impala cache. Creating Impala-specific, high-performance versions would likely save space. (No need for parent pointers, no need for the two-level Hive API structure, etc.) So, this gives us two options: * Reach inside Hive's Thrift objects to null out fields which we don't need, or * Design an Impala-specific, compact representation for the data that omits all but essential objects and fields. The second choice provides a huge opportunity for memory optimization. The first is a crude-but-effective short-term solution. was (Author: paul.rogers): Analysis: * Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects. * Impala's {{LocalDb}}, which extends {{FeDb}} contains a map of {{LocalTable}} objects. * Impala's {{LocalTable}} contains a Hive {{Table}} object. * The {{Table}} object is defined in [Hive's Thrift schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift] API. It does not contain a list of partitions. * The {{LocalTable}} wraps a number of subclass, of which the one of interest is {{HdfsTable}}. Impala loads tables in the background by calling {{HdfsTable.load()}}: * {{load()}} calls {{loadAllPartitions()}} to do the partition work. * {{loadAllPartitions}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the partitions as a list of Hive {{Partition} objects. * {{loadAllParitions}} wraps each in a {{HdfsPartition}}, and calls {{addPartition}} to put the partition into a couple of maps. 
* Hive's [{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java] is generated from Thrift. Contains a {{StorageDescriptor}}. * Hive's [{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java] contains the list of {{FieldSchema}} objects which Todd saw in the heap dump. Things are a bit confusing because: * Hive defines a different [{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java] class, which contains a {{TableSpec}}. * Hive's [{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java] contains a list of {{Partition}} objects. * A quick scan of the Hive code suggests that Hive's Thrift objects carry more info that is required in the Impala cache. Creating Impala-specific, high-performance versions would likely save space.
[jira] [Comment Edited] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache
[ https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629410#comment-16629410 ] Paul Rogers edited comment on IMPALA-7501 at 9/26/18 11:21 PM: --- Analysis: * Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects. * Impala's {{LocalDb}}, which extends {{FeDb}} contains a map of {{LocalTable}} objects. * Impala's {{LocalTable}} contains a Hive {{Table}} object. * The {{Table}} object is defined in [Hive's Thrift schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift] API. It does not contain a list of partitions. * The {{LocalTable}} wraps a number of subclass, of which the one of interest is {{HdfsTable}}. Impala loads tables in the background by calling {{HdfsTable.load()}}: * {{load()}} calls {{loadAllPartitions()}} to do the partition work. * {{loadAllPartitions}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the partitions as a list of Hive {{Partition} objects. * {{loadAllParitions}} wraps each in a {{HdfsPartition}}, and calls {{addPartition}} to put the partition into a couple of maps. * Hive's [{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java] is generated from Thrift. Contains a {{StorageDescriptor}}. * Hive's [{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java] contains the list of {{FieldSchema}} objects which Todd saw in the heap dump. Things are a bit confusing because: * Hive defines a different [{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java] class, which contains a {{TableSpec}}. 
* Hive's [{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java] contains a list of {{Partition}} objects. * A quick scan of the Hive code suggests that Hive's Thrift objects carry more info that is required in the Impala cache. Creating Impala-specific, high-performance versions would likely save space. (No need for parent pointers, no need for the two-level Hive API structure, etc.) So, this gives us two options: * Reach inside Hive's Thrift objects to null out fields which we don't need, or * Design an Impala-specific, compact representation for the data that omits all but essential objects and fields. The second choice provides a huge opportunity for memory optimization. The first is a crude-but-effective short-term solution. was (Author: paul.rogers): Analysis: * Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects. * Impala's {{LocalDb}}, which extends {{FeDb}} contains a map of {{LocalTable}} objects. * Impala's {{LocalTable}} contains a Hive {{Table}} object. * The {{Table}} object is defined in [Hive's Thrift schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift] API. It does not contain a list of partitions. Things here get complex because Cloudera does not provide source jars for its build of Hive, so can't step into or set breakpoints in Hive code. Impala loads tables in the background by calling {{HdfsTable.load()}}: * {{load()}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the partitions. * The list of Hive {{Partition} objects is passed to {{HdfsTable.loadAllPartitions()}}. * Hive's [{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java] is generated from Thrift. Contains a {{StorageDescriptor}}. 
* Hive's [{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java] contains the list of {{FieldSchema}} objects which Todd saw in the heap dump. Things are a bit confusing because: * Hive defines a different [{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java] class, which contains a {{TableSpec}}. * Hive's [{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java] contains a list of {{Partition}} objects. A quick scan of the Hive code suggests that Hive's Thrift objects carry more info that is required in the Impala cache. Creating Impala-specific, high-performance versions would likely save space. (No need for parent pointers, no need for the two-level Hive API structure, etc.) So, this gives us two
[jira] [Comment Edited] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache
[ https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629410#comment-16629410 ] Paul Rogers edited comment on IMPALA-7501 at 9/26/18 11:07 PM: --- Analysis: * Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects. * Impala's {{LocalDb}}, which extends {{FeDb}} contains a map of {{LocalTable}} objects. * Impala's {{LocalTable}} contains a Hive {{Table}} object. * The {{Table}} object is defined in [Hive's Thrift schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift] API. It does not contain a list of partitions. Things here get complex because Cloudera does not provide source jars for its build of Hive, so can't step into or set breakpoints in Hive code. Impala loads tables in the background by calling {{HdfsTable.load()}}: * {{load()}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the partitions. * The list of Hive {{Partition} objects is passed to {{HdfsTable.loadAllPartitions()}}. * Hive's [{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java] is generated from Thrift. Contains a {{StorageDescriptor}}. * Hive's [{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java] contains the list of {{FieldSchema}} objects which Todd saw in the heap dump. Things are a bit confusing because: * Hive defines a different [{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java] class, which contains a {{TableSpec}}. 
* Hive's [{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java] contains a list of {{Partition}} objects. was (Author: paul.rogers): Analysis: * Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects. * Impala's {{LocalDb}}, which extends {{FeDb}} contains a map of {{LocalTable}} objects. * Impala's {{LocalTable}} contains a Hive {{Table}} object. * Hive's [{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java] is a Hive-defined class, which contains a {{TableSpec}}. * Hive's [{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java] contains a list of {{Partition}} objects. * Hive's [{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java] is generated from Thrift. Contains a {{StorageDescriptor}}. * Hive's [{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java] contains the list of {{FieldSchema}} objects which Todd saw in the heap dump. The [Hive Thrift schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift] is an easier way to visualize Hive part of the above analysis. A quick scan of the Hive code suggests that Hive's Thrift objects carry more info that is required in the Impala cache. Creating Impala-specific, high-performance versions would likely save space. (No need for parent pointers, no need for the two-level Hive API structure, etc.) 
So, this gives us two options: * Reach inside Hive's Thrift objects to null out fields which we don't need, or * Design an Impala-specific, compact representation for the data that omits all but essential objects and fields. The second choice provides a huge opportunity for memory optimization. The first is a crude-but-effective short-term solution. > Slim down metastore Partition objects in LocalCatalog cache > --- > > Key: IMPALA-7501 > URL: https://issues.apache.org/jira/browse/IMPALA-7501 > Project: IMPALA > Issue Type: Sub-task >Reporter: Todd Lipcon >Priority: Minor > > I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit > after running a production workload simulation for a couple hours. It had > 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, > in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects > are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M > objects are retained by FieldSchema, which, as far as I remember, are ignored > on the partition level
[jira] [Comment Edited] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache
[ https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629410#comment-16629410 ] Paul Rogers edited comment on IMPALA-7501 at 9/26/18 11:07 PM: --- Analysis: * Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects. * Impala's {{LocalDb}}, which extends {{FeDb}} contains a map of {{LocalTable}} objects. * Impala's {{LocalTable}} contains a Hive {{Table}} object. * The {{Table}} object is defined in [Hive's Thrift schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift] API. It does not contain a list of partitions. Things here get complex because Cloudera does not provide source jars for its build of Hive, so can't step into or set breakpoints in Hive code. Impala loads tables in the background by calling {{HdfsTable.load()}}: * {{load()}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the partitions. * The list of Hive {{Partition} objects is passed to {{HdfsTable.loadAllPartitions()}}. * Hive's [{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java] is generated from Thrift. Contains a {{StorageDescriptor}}. * Hive's [{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java] contains the list of {{FieldSchema}} objects which Todd saw in the heap dump. Things are a bit confusing because: * Hive defines a different [{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java] class, which contains a {{TableSpec}}. 
* Hive's [{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java] contains a list of {{Partition}} objects. A quick scan of the Hive code suggests that Hive's Thrift objects carry more info that is required in the Impala cache. Creating Impala-specific, high-performance versions would likely save space. (No need for parent pointers, no need for the two-level Hive API structure, etc.) So, this gives us two options: * Reach inside Hive's Thrift objects to null out fields which we don't need, or * Design an Impala-specific, compact representation for the data that omits all but essential objects and fields. The second choice provides a huge opportunity for memory optimization. The first is a crude-but-effective short-term solution. was (Author: paul.rogers): Analysis: * Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects. * Impala's {{LocalDb}}, which extends {{FeDb}} contains a map of {{LocalTable}} objects. * Impala's {{LocalTable}} contains a Hive {{Table}} object. * The {{Table}} object is defined in [Hive's Thrift schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift] API. It does not contain a list of partitions. Things here get complex because Cloudera does not provide source jars for its build of Hive, so can't step into or set breakpoints in Hive code. Impala loads tables in the background by calling {{HdfsTable.load()}}: * {{load()}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the partitions. * The list of Hive {{Partition} objects is passed to {{HdfsTable.loadAllPartitions()}}. * Hive's [{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java] is generated from Thrift. Contains a {{StorageDescriptor}}. 
* Hive's [{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java] contains the list of {{FieldSchema}} objects which Todd saw in the heap dump. Things are a bit confusing because: * Hive defines a different [{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java] class, which contains a {{TableSpec}}. * Hive's [{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java] contains a list of {{Partition}} objects. > Slim down metastore Partition objects in LocalCatalog cache > --- > > Key: IMPALA-7501 > URL: https://issues.apache.org/jira/browse/IMPALA-7501 > Project: IMPALA > Issue Type: Sub-task >Reporter: Todd Lipcon >Priority: Minor > > I took a heap dump of an impalad running in LocalCatalog
[jira] [Commented] (IMPALA-7597) "show partitions" does not retry on InconsistentMetadataFetchException
[ https://issues.apache.org/jira/browse/IMPALA-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629491#comment-16629491 ] Vuk Ercegovac commented on IMPALA-7597: --- The issue reported here is one example of InconsistentMetadataFetchException that can be thrown by code that is not under the retry loop of createExecRequest. Working backwards, all of these are thrown from sendRequest in CatalogMetaProvider when fetching from catalogd; at catalogd, they arise from 1) not finding an expected object (e.g., a database might have been deleted and now we're fetching its list of table names, which is no longer valid) or 2) finding that versions mismatch due to an interleaved write. Such inconsistencies are possible at every step of the schema hierarchy, e.g., list dbs, get db info, list table names, load table, load table col stats, list partitions, load partition(s), list functions, load function. With the push architecture ("v1"), many of these operations would succeed but with potentially stale data. For example, if the table is present locally, its partitions are also present, so "show partitions" would complete. With the pull architecture ("v2"), if, for example, a new partition is added or the table is dropped after the table is cached but before the partitions are fetched, the change will be reported as an exception. While the exception reflects a more current state, such exceptions offer a different behavior than with "v1". With "v1", a stale result can be returned. A follow-up operation, for example listing the tables in a database that was listed (via show databases) but has since been dropped, would just result in an error stating that the database does not exist. For queries, we chose to explicitly retry. An option here is to retry for all such operations. We can do so with a retrying wrapper with the same interface (similar to the hms retrying client). However, that may be too heavyweight an approach. 
For example, getCatalogMetrics (and its callers) should be able to proceed when such an exception arises -- it's for internal bookkeeping and can be skipped. An alternative is to provide a wrapper that retries and can easily be obtained -- first thought is to add something alongside getCatalog in Frontend, e.g., getRetryableCatalog -- and to use it where needed. Further alternatives include making the exception checked, which was pointed out in a todo (along with it being viral). Another approach is to make v2's cache more coarse grained. For example, a database can include all its table names and functions (avoids the double check). In addition, a way to test this is needed. Initial thought is to inject time delays and check that at least one such inconsistency is encountered and retried per operation. > "show partitions" does not retry on InconsistentMetadataFetchException > -- > > Key: IMPALA-7597 > URL: https://issues.apache.org/jira/browse/IMPALA-7597 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 3.1.0 >Reporter: bharath v >Assignee: Vuk Ercegovac >Priority: Critical > > IMPALA-7530 added retries in case LocalCatalog throws > InconsistentMetadataFetchException. These retries apply to all code paths > taking {{Frontend#createExecRequest()}}. > "show partitions" additionally takes {{Frontend#getTableStats()}} and aborts > the first time it sees InconsistentMetadataFetchException. > We need to make sure all the queries (especially DDLs) retry if they hit this > exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
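The retrying-wrapper idea above can be sketched as follows. This is a minimal illustration, not Impala's actual API: {{RetryingCatalog}} and {{callWithRetry}} are hypothetical names, and the exception class here is a stand-in for the real InconsistentMetadataFetchException in Impala's frontend.

```java
import java.util.concurrent.Callable;

// Stand-in for Impala's real InconsistentMetadataFetchException, defined
// here only to keep the sketch self-contained.
class InconsistentMetadataFetchException extends RuntimeException {
  InconsistentMetadataFetchException(String msg) { super(msg); }
}

public class RetryingCatalog {
  // Hypothetical helper mirroring the retry loop createExecRequest already
  // has: re-run the catalog operation a bounded number of times when the
  // fetched metadata turns out to be inconsistent mid-operation.
  // maxAttempts must be >= 1.
  public static <T> T callWithRetry(Callable<T> op, int maxAttempts) throws Exception {
    InconsistentMetadataFetchException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (InconsistentMetadataFetchException e) {
        last = e;  // metadata moved under us; retry against the newer state
      }
    }
    throw last;  // exhausted retries: surface the last inconsistency
  }
}
```

A wrapper like this keeps each call site one line, at the cost of wrapping every catalog accessor; the coarser-grained-cache alternative avoids that by removing the double fetch entirely.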
[jira] [Commented] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache
[ https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629450#comment-16629450 ] Philip Zeyliger commented on IMPALA-7501: - I think Todd's immediate suggestion here is to null out the Thrift stuff. Note that I think we first retrieve it in {{catalogd}}, but it eventually makes its way into {{impalad}} and is presumably Thrift-serialized on the way. It may be useful to null it out in {{catalogd}} since memory there is also valuable, but you'll have to work out the details. > Slim down metastore Partition objects in LocalCatalog cache > --- > > Key: IMPALA-7501 > URL: https://issues.apache.org/jira/browse/IMPALA-7501 > Project: IMPALA > Issue Type: Sub-task >Reporter: Todd Lipcon >Priority: Minor > > I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit > after running a production workload simulation for a couple hours. It had > 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, > in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects > are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M > objects are retained by FieldSchema, which, as far as I remember, are ignored > on the partition level by the Impala planner. So, with a bit of slimming down > of these objects, we could make a huge dent in effective cache capacity given > a fixed budget. Reducing object count should also have the effect of improved > GC performance (old gen GC is more closely tied to object count than size) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
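A minimal sketch of the null-out approach discussed above, assuming the FieldSchema list hangs off each partition's StorageDescriptor as in the Thrift schema. The {{FieldSchema}}/{{StorageDescriptor}}/{{Partition}} classes below are simplified stand-ins for the Thrift-generated Hive metastore classes, and {{slim}} is a hypothetical helper, not existing Impala code.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins mirroring the shape of the Thrift-generated Hive
// metastore classes; NOT the actual org.apache.hadoop.hive.metastore.api types.
class FieldSchema {
  final String name; final String type;
  FieldSchema(String name, String type) { this.name = name; this.type = type; }
}

class StorageDescriptor {
  private List<FieldSchema> cols = new ArrayList<>();
  List<FieldSchema> getCols() { return cols; }
  void setCols(List<FieldSchema> c) { cols = c; }
}

class Partition {
  private final StorageDescriptor sd = new StorageDescriptor();
  StorageDescriptor getSd() { return sd; }
}

public class SlimPartitions {
  // Option 1 from the comment thread: reach into each fetched Partition and
  // drop the per-partition column list, which the planner reads from the
  // table level instead. Hypothetical helper for illustration.
  public static void slim(List<Partition> parts) {
    for (Partition p : parts) {
      if (p.getSd() != null) p.getSd().setCols(null);
    }
  }
}
```

Doing this in {{catalogd}} before serialization, as suggested above, would save memory on both sides and shrink the Thrift payload.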
[jira] [Created] (IMPALA-7636) Avoid storing hash in hash table bucket for hash tables in join
Tim Armstrong created IMPALA-7636: - Summary: Avoid storing hash in hash table bucket for hash tables in join Key: IMPALA-7636 URL: https://issues.apache.org/jira/browse/IMPALA-7636 Project: IMPALA Issue Type: Improvement Components: Backend Affects Versions: Impala 3.1.0 Reporter: Tim Armstrong Somewhat related to IMPALA-7635, I think storing the precomputed hash in the hash table buckets is of questionable benefit for joins. It's useful for aggregations since we frequently resize the hash tables, but in joins it's only used to short-circuit calling Equal(), which often isn't that expensive. It's unclear how many calls to Equal() are actually avoided. We should do some benchmarks to determine this. As a sanity check for the idea, we could remove the (hash == bucket->hash) check in Probe() and see if performance is affected. The difficult part here is figuring out how to share the HashTable code between the agg and join while having different bucket representations - templates? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
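The short-circuit being questioned above can be modeled with a toy probe loop. This is an illustrative sketch only, not Impala's HashTable code: with a cached hash per bucket, Equal() runs only when hashes match; without it, Equal() runs on every filled candidate bucket.

```java
// Toy linear-probe lookup over parallel arrays (keys + cached hashes),
// counting Equal() calls to show what the cached hash actually saves.
public class ProbeSketch {
  static int equalCalls = 0;

  // Stand-in for the potentially expensive row comparison.
  static boolean equal(String a, String b) { equalCalls++; return a.equals(b); }

  // Returns the bucket index of 'key', or -1 if absent.
  static int probe(String[] keys, int[] hashes, String key, boolean useCachedHash) {
    int h = key.hashCode();
    int n = keys.length;
    for (int i = Math.floorMod(h, n), steps = 0; steps < n; i = (i + 1) % n, steps++) {
      if (keys[i] == null) return -1;                  // empty bucket: not present
      if (useCachedHash && hashes[i] != h) continue;   // short-circuit Equal()
      if (equal(keys[i], key)) return i;
    }
    return -1;
  }
}
```

Counting {{equalCalls}} with and without the cached hash on a representative workload is essentially the benchmark the issue proposes, minus the cache effects of the 4-byte-smaller bucket that only a real run would show.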
[jira] [Created] (IMPALA-7635) Reduce size of hash tables in-memory by packing buckets more densely
Tim Armstrong created IMPALA-7635: - Summary: Reduce size of hash tables in-memory by packing buckets more densely Key: IMPALA-7635 URL: https://issues.apache.org/jira/browse/IMPALA-7635 Project: IMPALA Issue Type: Improvement Components: Backend Affects Versions: Impala 3.1.0 Reporter: Tim Armstrong Currently the hash tables used for hash join and aggregation use 16 bytes per bucket and 24 bytes per additional duplicate for a key: {code} /// Linked list of entries used for duplicates. struct DuplicateNode { /// Used for full outer and right {outer, anti, semi} joins. Indicates whether the /// row in the DuplicateNode has been matched. /// From an abstraction point of view, this is an awkward place to store this /// information. /// TODO: Fold this flag in the next pointer below. bool matched; /// Chain to next duplicate node, NULL when end of list. DuplicateNode* next; HtData htdata; }; struct Bucket { /// Whether this bucket contains a valid entry, or it is empty. bool filled; /// Used for full outer and right {outer, anti, semi} joins. Indicates whether the /// row in the bucket has been matched. /// From an abstraction point of view, this is an awkward place to store this /// information but it is efficient. This space is otherwise unused. bool matched; /// Used in case of duplicates. If true, then the bucketData union should be used as /// 'duplicates'. bool hasDuplicates; /// Cache of the hash for data. /// TODO: Do we even have to cache the hash value? uint32_t hash; /// Either the data for this bucket or the linked list of duplicates. union { HtData htdata; DuplicateNode* duplicates; } bucketData; }; {code} There are some comments in the code that suggest folding the boolean values into the upper bits of the pointers (since on amd64 the address space is only 48 bits, but moving to 57 bits apparently - see https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf). That would reduce the bucket to 12 bytes of actual data. 
This would give us the opportunity to reduce memory requirements of joins and the pressure on caches significantly, provided we can work out the implementation issues and the cost of the bit manipulation doesn't exceed the benefit (my intuition is that cache effects are way more important but I could be wrong). Here's a rough idea of what we could do: # Implement folding of booleans into the pointer and mark struct Bucket as packed so that it doesn't just undo the work with additional padding. # Modify Hashtable to work with the new bucket structure. This needs a little thought since the bucket allocations must be a power-of-two size in bytes, but we also need the number of hash table buckets to be a power of two in order for masking the hash to get the bucket number to work. I think either we could just leave wasted space in the buffer or switch to a non-power-of-two number of buckets and use an alternative method of getting the bucket from the hash: https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/ # Run benchmarks to see if it's beneficial. The effect probably depends on the data set size. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
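Both the flag-folding in step 1 and the linked alternative to modulo reduction can be sketched in a few lines. This is an illustration, not Impala code: a plain long stands in for the tagged pointer, and the bit layout (48-bit address, flags in bits 48-49) is an assumption based on the ticket's description of the amd64 address space.

```java
// Sketch of two ideas from the ticket above.
// 1) Fold boolean flags into the unused upper bits of a 48-bit pointer
//    (modelled here with a plain long).
// 2) Lemire's multiply-shift "fast alternative to the modulo reduction",
//    which maps a 32-bit hash onto a non-power-of-two bucket count.
public class BucketPacking {
  private static final long ADDR_MASK = (1L << 48) - 1;  // low 48 bits: the pointer
  private static final long MATCHED_BIT = 1L << 48;
  private static final long FILLED_BIT = 1L << 49;

  static long pack(long addr, boolean matched, boolean filled) {
    long v = addr & ADDR_MASK;
    if (matched) v |= MATCHED_BIT;
    if (filled) v |= FILLED_BIT;
    return v;
  }

  static long addr(long packed) { return packed & ADDR_MASK; }
  static boolean matched(long packed) { return (packed & MATCHED_BIT) != 0; }
  static boolean filled(long packed) { return (packed & FILLED_BIT) != 0; }

  // Maps 'hash' onto [0, numBuckets) without '%' and without requiring
  // numBuckets to be a power of two: (hash * numBuckets) >> 32.
  static int reduce(int hash, int numBuckets) {
    return (int) ((Integer.toUnsignedLong(hash) * numBuckets) >>> 32);
  }

  public static void main(String[] args) {
    long p = pack(0x7f00dead1234L, true, false);
    System.out.println(addr(p) == 0x7f00dead1234L && matched(p) && !filled(p));
    System.out.println(reduce(0xdeadbeef, 1000));
  }
}
```

In C++ the same packing would need the tag bits masked off before every dereference, which is the "cost of the bit manipulation" the ticket wants to benchmark against the cache-footprint win.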
[jira] [Updated] (IMPALA-7634) Impala 3.1 Doc: Doc the command to gracefully shutdown Impala
[ https://issues.apache.org/jira/browse/IMPALA-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni updated IMPALA-7634: Affects Version/s: Impala 3.1.0 > Impala 3.1 Doc: Doc the command to gracefully shutdown Impala > - > > Key: IMPALA-7634 > URL: https://issues.apache.org/jira/browse/IMPALA-7634 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Affects Versions: Impala 3.1.0 >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-7634) Impala 3.1 Doc: Doc the command to gracefully shutdown Impala
[ https://issues.apache.org/jira/browse/IMPALA-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni updated IMPALA-7634: Target Version: Impala 3.1.0 > Impala 3.1 Doc: Doc the command to gracefully shutdown Impala > - > > Key: IMPALA-7634 > URL: https://issues.apache.org/jira/browse/IMPALA-7634 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-7634) Impala 3.1 Doc: Doc the command to gracefully shutdown Impala
Alex Rodoni created IMPALA-7634: --- Summary: Impala 3.1 Doc: Doc the command to gracefully shutdown Impala Key: IMPALA-7634 URL: https://issues.apache.org/jira/browse/IMPALA-7634 Project: IMPALA Issue Type: Sub-task Components: Docs Reporter: Alex Rodoni Assignee: Alex Rodoni -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache
[ https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629410#comment-16629410 ] Paul Rogers commented on IMPALA-7501: - Analysis: * Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects. * Impala's {{LocalDb}}, which extends {{FeDb}}, contains a map of {{LocalTable}} objects. * Impala's {{LocalTable}} contains a Hive {{Table}} object. * Hive's [{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java] is a Hive-defined class, which contains a {{TableSpec}}. * Hive's [{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java] contains a list of {{Partition}} objects. * Hive's [{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java] is generated from Thrift. Contains a {{StorageDescriptor}}. * Hive's [{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java] contains the list of {{FieldSchema}} objects which Todd saw in the heap dump. The [Hive Thrift schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift] is an easier way to visualize the Hive part of the above analysis. A quick scan of the Hive code suggests that Hive's Thrift objects carry more info than is required in the Impala cache. Creating Impala-specific, high-performance versions would likely save space. (No need for parent pointers, no need for the two-level Hive API structure, etc.) 
So, this gives us two options: * Reach inside Hive's Thrift objects to null out fields which we don't need, or * Design an Impala-specific, compact representation for the data that omits all but essential objects and fields. The second choice provides a huge opportunity for memory optimization. The first is a crude-but-effective short-term solution. > Slim down metastore Partition objects in LocalCatalog cache > --- > > Key: IMPALA-7501 > URL: https://issues.apache.org/jira/browse/IMPALA-7501 > Project: IMPALA > Issue Type: Sub-task >Reporter: Todd Lipcon >Priority: Minor > > I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit > after running a production workload simulation for a couple hours. It had > 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, > in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects > are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M > objects are retained by FieldSchema, which, as far as I remember, are ignored > on the partition level by the Impala planner. So, with a bit of slimming down > of these objects, we could make a huge dent in effective cache capacity given > a fixed budget. Reducing object count should also have the effect of improved > GC performance (old gen GC is more closely tied to object count than size) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
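The first option can be sketched as follows. The Partition and StorageDescriptor classes here are simplified stand-ins for the Thrift-generated Hive classes of the same names, not the real API; the point is only to show nulling out the per-partition FieldSchema list before the object enters the LocalCatalog cache.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the crude-but-effective option: drop fields the planner never
// reads at the partition level before caching. The nested classes are
// stand-ins for org.apache.hadoop.hive.metastore.api.{Partition,
// StorageDescriptor}, simplified for illustration.
public class PartitionSlimmer {
  static class StorageDescriptor {
    List<String> cols;  // per-partition FieldSchemas: ignored by the planner
    String location;    // needed: where the partition's files live

    StorageDescriptor(List<String> cols, String location) {
      this.cols = cols;
      this.location = location;
    }
  }

  static class Partition {
    StorageDescriptor sd;
    Partition(StorageDescriptor sd) { this.sd = sd; }
  }

  // Null out the unused FieldSchema list; keep everything the planner needs.
  static Partition slim(Partition p) {
    p.sd.cols = null;
    return p;
  }

  public static void main(String[] args) {
    Partition p = new Partition(
        new StorageDescriptor(Arrays.asList("c1", "c2"), "/warehouse/t/p=1"));
    slim(p);
    System.out.println(p.sd.cols == null && p.sd.location != null);
  }
}
```

Doing this in catalogd (before Thrift serialization) would shrink both the catalogd heap and the bytes shipped to each impalad, per the comment above.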
[jira] [Created] (IMPALA-7633) Impala 3.1 Doc: Add support for multiple distinct operators in the same query block
Alex Rodoni created IMPALA-7633: --- Summary: Impala 3.1 Doc: Add support for multiple distinct operators in the same query block Key: IMPALA-7633 URL: https://issues.apache.org/jira/browse/IMPALA-7633 Project: IMPALA Issue Type: Sub-task Components: Docs Affects Versions: Impala 3.1.0 Reporter: Alex Rodoni Assignee: Alex Rodoni -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-7581) Hang in buffer-pool-test
[ https://issues.apache.org/jira/browse/IMPALA-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629383#comment-16629383 ] Tim Armstrong commented on IMPALA-7581: --- Yeah, both reasonable suggestions. I also thought about adding a background thread to unit tests that called abort() after a timeout. > Hang in buffer-pool-test > > > Key: IMPALA-7581 > URL: https://issues.apache.org/jira/browse/IMPALA-7581 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.1.0 >Reporter: Thomas Tauber-Marshall >Assignee: Tim Armstrong >Priority: Critical > Labels: broken-build, flaky > Attachments: gdb.txt > > > We have observed a hang in buffer-pool-test in an ASAN build. Unfortunately, no > logs were generated with any info about what might have happened. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
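The background-thread idea could look roughly like this. It is a sketch in Java rather than the C++ a gtest binary would use, and the timeout action is injectable (in real use it would be abort()) so the behavior can be demonstrated without killing the process.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch of a test watchdog: a background thread fires an action (abort()
// in real use) if the test has not finished within the timeout.
public class TestWatchdog {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "test-watchdog");
        t.setDaemon(true);  // never keep the process alive on its own
        return t;
      });
  private final ScheduledFuture<?> timeoutTask;

  TestWatchdog(long timeoutMillis, Runnable onTimeout) {
    timeoutTask = scheduler.schedule(onTimeout, timeoutMillis, TimeUnit.MILLISECONDS);
  }

  // Call when the test finishes normally, before the timeout.
  void cancel() {
    timeoutTask.cancel(false);
    scheduler.shutdown();
  }

  public static void main(String[] args) throws InterruptedException {
    // In production the action would be something like:
    //   () -> Runtime.getRuntime().halt(1)
    final boolean[] fired = {false};
    TestWatchdog w = new TestWatchdog(50, () -> fired[0] = true);
    Thread.sleep(200);  // simulate a hung test
    System.out.println(fired[0]);
    w.cancel();
  }
}
```

Compared with wrapping the runner in /usr/bin/timeout, an in-process watchdog can dump state (stacks, logs) before dying, which addresses the "no logs were generated" complaint in the report.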
[jira] [Commented] (IMPALA-7581) Hang in buffer-pool-test
[ https://issues.apache.org/jira/browse/IMPALA-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629351#comment-16629351 ] Philip Zeyliger commented on IMPALA-7581: - We could also wrap the test runners in {{/usr/bin/timeout}} to make debugging this slightly more pleasant. I think it's reasonable to skip them under ASAN. We already skip death tests with release builds: {code} // Gtest's ASSERT_DEBUG_DEATH macro has peculiar semantics where in debug builds it // executes the code in a forked process, so it has no visible side-effects, but in // release builds it executes the code as normal. This makes it difficult to write // death tests that work in both debug and release builds. To avoid this problem, update // our wrapper macro to simply omit the death test expression in release builds, where we // can't actually test DCHECKs anyway. #ifdef NDEBUG #define IMPALA_ASSERT_DEBUG_DEATH(fn, msg) #else #define IMPALA_ASSERT_DEBUG_DEATH(fn, msg) ASSERT_DEBUG_DEATH(fn, msg) #endif {code} > Hang in buffer-pool-test > > > Key: IMPALA-7581 > URL: https://issues.apache.org/jira/browse/IMPALA-7581 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.1.0 >Reporter: Thomas Tauber-Marshall >Assignee: Tim Armstrong >Priority: Critical > Labels: broken-build, flaky > Attachments: gdb.txt > > > We have observed a hang in buffer-pool-test in an ASAN build. Unfortunately, no > logs were generated with any info about what might have happened. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-7628) test_tls_ecdh failing on CentOS 6/Python 2.6
[ https://issues.apache.org/jira/browse/IMPALA-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629342#comment-16629342 ] ASF subversion and git services commented on IMPALA-7628: - Commit 09150f04cac84965e3b390404c57a51261aecf56 in impala's branch refs/heads/master from [~tarmstr...@cloudera.com] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=09150f0 ] IMPALA-7628: skip test_tls_ecdh on Python 2.6 This is a temporary workaround. On the CentOS 6 build that failed test_tls_v12, test_wildcard_san_ssl and test_wildcard_ssl were all skipped so I figured this will unblock the tests without losing coverage on most platforms that have recent Python. Change-Id: I94ae9d254d5fd337774a24106eb9b08585ac0b01 Reviewed-on: http://gerrit.cloudera.org:8080/11519 Reviewed-by: Thomas Marshall Tested-by: Impala Public Jenkins > test_tls_ecdh failing on CentOS 6/Python 2.6 > > > Key: IMPALA-7628 > URL: https://issues.apache.org/jira/browse/IMPALA-7628 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.1.0 > Environment: CentOS 6.4, Python 2.6 >Reporter: Tim Armstrong >Assignee: Thomas Tauber-Marshall >Priority: Blocker > > {noformat} > custom_cluster/test_client_ssl.py:125: in test_tls_ecdh > self._validate_positive_cases("%s/server-cert.pem" % self.CERT_DIR) > custom_cluster/test_client_ssl.py:198: in _validate_positive_cases > result = run_impala_shell_cmd(shell_options) > shell/util.py:97: in run_impala_shell_cmd > result.stderr) > E AssertionError: Cmd --ssl -q 'select 1 + 2' was expected to succeed: > Starting Impala Shell without Kerberos authentication > E SSL is enabled. Impala server certificates will NOT be verified (set > --ca_cert to change) > E > /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80: > DeprecationWarning: 3th positional argument is deprecated. 
Use keyward > argument insteand. > E DeprecationWarning) > E > /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80: > DeprecationWarning: 4th positional argument is deprecated. Use keyward > argument insteand. > E DeprecationWarning) > E > /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80: > DeprecationWarning: 5th positional argument is deprecated. Use keyward > argument insteand. > E DeprecationWarning) > E > /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:216: > DeprecationWarning: validate is deprecated. Use cert_reqs=ssl.CERT_NONE > instead > E DeprecationWarning) > E No handlers could be found for logger "thrift.transport.TSSLSocket" > E Error connecting: TTransportException, Could not connect to > localhost:21000: [Errno 1] _ssl.c:490: error:14094410:SSL > routines:SSL3_READ_BYTES:sslv3 alert handshake failure > E Not connected to Impala, could not execute queries. > {noformat} > Git hash is e38715e25297cc3643482be04e3b1b273e339b54 > I'm going to push out a temporary fix to unblock tests (since there are other > related tests skipped on this platform) but I'll let Thomas validate the > correctness of it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Comment Edited] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622856#comment-16622856 ] Paul Rogers edited comment on IMPALA-7310 at 9/26/18 7:45 PM: -- The planner uses NDVs to make binary decisions: do I do x or y? (Do I put t1 on the build side of a join, or do I put it on the probe side?) In most cases, the values being compared are order-of-magnitude different, and so fine nuances of value are not important. We simply need some reasonable non-zero number so that the calcs can play out. (Note: the following is left for the record, but the final fix is much more narrowly tailored.) The simplest fix is to handle the non-stats case for a [{{ColumnStats}}|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java] instance. The current code in {{initColStats()}} initializes NDV to -1 (undefined). Suggested alternatives: * If type is Boolean, NDV = 2 * If type is TINYINT, NDV = 256. * If type is anything else, assume NDV = some constant, say 1000. Note that a variation of the above logic actually already exists in {{createHiveColStatsData()}} where it is used to bound the NDV value. So, we just reuse it. That code also suggests we can cap NDV at row count. But, since our guesses are small, the row count might not add much value. Or, our NDV guess might be some fraction of row count, if we want to be fancy. {{ColumnStats}} already has a {{hasStats()}} method which checks if NDV is other than -1. Since NDV will now always be some value, change this method to check only {{numNulls_}}, which will continue to be -1 without stats. Finally, in {{createHiveColStatsData}}, set a floor on NDV at 1 to account for the fact that an all-null column has NDV=0. Or, to be conservative, if NDV <= 10, add one to NDV to account for nulls. (Do this always, since a column that claims to be non-null can eventually become null as the result of an outer join.) 
Next, modify {{update()}} to use the defaults (to be set in {{initColStats()}} for the "incompatible" case. As a result, when the plan nodes ask for NDV, they won't get a 0 value if we have no data, nor will they get 0 if a column is all nulls. Add or modify unit tests to verify the above logic, especially the defaults case and how the defaults propagate up the plan tree. The risk is that some plans will change. We hope they change to favor getting the correct plan more often. But, there will be some use case for which the old, wrong, values produced a more accurate plan than the new estimates. This is always a risk. was (Author: paul.rogers): The planner uses NDVs to make binary decisions: do I do x or y? (Do I put t1 on the build side of a join, or to I put it on the probe site?) In most cases, the values being compared are order-of-magnitude different, and so fine nuances of value are not important. We simply need some reasonable non-zero number so that the calcs can play out. The simplest fix is to handle the non-stats case for a [{{ColumnStats}}|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java] instance. The current code in {{initColStats()}} initializes NDV to -1 (undefined). Suggested alternatives: * If type is Boolean, NDV = 2 * If type is TINYINT, NDV = 256. * If type is anything else, assume NDV = some constant, say 1000. Note that a variation of the above logic actually already exists in {{createHiveColStatsData()}} where it is used to bound the NDV value. So, we just reuse it. That code also suggests we can NDV at row count. But, since our guesses are small, the row count might not add much value. Or, our NDV guess might be some fraction of row count, if we want to be fancy. {{ColumnStats}} already has a {{hasStats()}} method which checks if NDV is other than -1. Since NDV will always not be some value, change this method to check only {{numNulls_}}, which will continue to be -1 without stats. 
Finally, in {{createHiveColStatsData}}, set a floor on NDV at 1 to account for the fact that an all-null column has HDV=0. Or, to be conservative, if NDV <= 10, add one to NDV to account for nulls. (Do this always, since a column that claims to be non-null can eventually become null as the result of an outer join.) Next, modify {{update()}} to use the defaults (to be set in {{initColStats()}} for the "incompatible" case. As a result, when the plan nodes ask for NDV, they won't get a 0 value if we have no data, nor will they get 0 if a column is all nulls. Add or modify unit tests to verify the above logic, especially the defaults case and how the defaults propagate up the plan tree. The risk is that some plans will change. We hope they change to favor getting the correct plan more often. But, there will be some use case for which the old, wrong, values produced a more accurate plan than the new estimates. This is always a risk.
[jira] [Comment Edited] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates
[ https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626483#comment-16626483 ] Paul Rogers edited comment on IMPALA-7310 at 9/26/18 7:44 PM: -- Final solution is even simpler, since we don't want to change anything except handling of a table with stats, and a column of all nulls. So we apply the adjustment only if: * The column comes from a base table (not an internal column such as {{COUNT\(*)}} * The table has stats * The column in question is nullable * The column either has no null count, or the null count is non-zero * The NDV without nulls is 0 or 1 In this very limited case, we bump the NDV by 1. As it turns out, the TPC-H test cases have several queries in which a table column is nullable, has only one or two values, and those columns are clearly meant to be non-null. The above fix works around these cases so that such cases don't cause large changes to the results for {{PlannerTest}}. was (Author: paul.rogers): Final solution is even simpler, since we don't want to change anything except handling of a table with stats, and a column of all nulls. So we apply the adjustment only if: * The column comes from a base table (not an internal column such as {{COUNT(*)}} * The table has stats * The column in question is nullable * The column either has no null count, or the null count is non-zero * The NDV without nulls is 0 or 1  In this very limited case, we bump the NDV by 1. As it turns out, the TPC-H test cases have several queries in which a table column is nullable, has only one or two values, and those columns are clearly meant to be non-null. The above fix works around these cases so that such cases don't cause large changes to the results for {{PlannerTest}}. 
> Compute Stats not computing NULLs as a distinct value causing wrong estimates > - > > Key: IMPALA-7310 > URL: https://issues.apache.org/jira/browse/IMPALA-7310 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, > Impala 2.11.0, Impala 3.0, Impala 2.12.0 >Reporter: Zsombor Fedor >Assignee: Paul Rogers >Priority: Major > > As seen in other DBMSs > {code:java} > NDV(col){code} > not counting NULL as a distinct value. The same also applies to > {code:java} > COUNT(DISTINCT col){code} > This is working as intended, but when computing column statistics it can > cause some anomalies (i.g. bad join order) as compute stats uses NDV() to > determine columns NDVs. >  > For example when aggregating more columns, the estimated cardinality is > [counted as the product of the columns' number of distinct > values.|https://github.com/cloudera/Impala/blob/64cd0bb0c3529efa0ab5452c4e9e2a04fd815b4f/fe/src/main/java/org/apache/impala/analysis/Expr.java#L669] >  If there is a column full of NULLs the whole product will be 0. >  > There are two possible fix for this. > Either we should count NULLs as a distinct value when Computing Stats in the > query: > {code:java} > SELECT NDV(a) + COUNT(DISTINCT CASE WHEN a IS NULL THEN 1 END) AS a, CAST(-1 > as BIGINT), 4, CAST(4 as DOUBLE) FROM test;{code} > instead of > {code:java} > SELECT NDV(a) AS a, CAST(-1 as BIGINT), 4, CAST(4 as DOUBLE) FROM test;{code} >  >  > Or we should change the planner > [function|https://github.com/cloudera/Impala/blob/2d2579cb31edda24457d33ff5176d79b7c0432c5/fe/src/main/java/org/apache/impala/planner/AggregationNode.java#L169] > to take care of this bug. >  -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
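The narrow fix described in the comments above can be sketched as a single predicate. The method name and parameters are illustrative, not Impala's actual ColumnStats API; they just encode the five conditions listed in the edited comment.

```java
// Sketch of the narrowly tailored NDV adjustment for IMPALA-7310.
// Names and signature are illustrative, not Impala's real code.
public class NdvNullAdjustment {
  static final long UNKNOWN = -1;  // Impala uses -1 for "no stats"

  /**
   * Bumps NDV by one to count NULL as a distinct value, but only in the
   * limited case the fix targets.
   *
   * @param ndv           NDV from stats, which excludes NULLs
   * @param fromBaseTable false for internal columns such as COUNT(*)
   * @param hasStats      whether the table has stats at all
   * @param nullable      whether the column is declared nullable
   * @param numNulls      null count from stats, or UNKNOWN
   */
  static long adjustNdv(long ndv, boolean fromBaseTable, boolean hasStats,
      boolean nullable, long numNulls) {
    boolean mayHaveNulls = numNulls == UNKNOWN || numNulls > 0;
    if (fromBaseTable && hasStats && nullable && mayHaveNulls && ndv <= 1) {
      return ndv + 1;  // count NULL as one more distinct value
    }
    return ndv;
  }

  public static void main(String[] args) {
    System.out.println(adjustNdv(0, true, true, true, UNKNOWN));  // all-NULL column
    System.out.println(adjustNdv(2, true, true, true, UNKNOWN));  // left alone
  }
}
```

The ndv <= 1 guard is what keeps the TPC-H plans stable: columns with many distinct values are untouched, so only degenerate (all-NULL or single-value nullable) columns see a changed estimate.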
[jira] [Commented] (IMPALA-7627) Parallel the fetching permission process
[ https://issues.apache.org/jira/browse/IMPALA-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629301#comment-16629301 ] Vuk Ercegovac commented on IMPALA-7627: --- Please include the version (or githash) of Impala from which these measurements were obtained. Also, it would be useful to have units on those measurements as well as the number of partitions/files. > Parallel the fetching permission process > > > Key: IMPALA-7627 > URL: https://issues.apache.org/jira/browse/IMPALA-7627 > Project: IMPALA > Issue Type: Improvement >Reporter: Peikai Zheng >Assignee: Peikai Zheng >Priority: Major > > There are three phases when the Catalogd loads the metadata of a table. > Firstly, the Catalogd fetches the metadata from Hive metastore; > Then, the Catalogd fetches the permission of each partition from HDFS > NameNode; > Finally, the Catalogd loads the file descriptors from HDFS NameNode. > According to my test result: > ||Average Time(GetFileInfoThread=10) || phase 1 || phase 2 || phase 3|| > > |idm.sauron_message|9.9917115|459.2106944|95.0179163| > |default.revenue_enriched|12.3377474|111.2969046|40.827472| > |default.upp_raw_prod|1.5143162|50.0251426|12.6805323| > |default.hit_to_beacon_playback_prod|1.4294509|49.7670539|18.3557858| > |default.sitetracking_enriched|13.0003804|112.8746656|42.1824032| > |default.player_custom_event|9.2618705|493.4865302|116.4986184| > |default.revenue_day_est|57.9116561|106.5028664|24.005822| > The majority of the time is occupied by the second phase. > So, I suggest parallelizing the second phase. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
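The suggested parallelization of phase 2 might be sketched like this. fetchPermission is a stand-in for the per-partition NameNode RPC, and the pool size of 10 mirrors the GetFileInfoThread=10 setting in the measurements above; none of this is Impala's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: issue the per-partition permission lookups (phase 2) from a
// thread pool instead of one at a time.
public class ParallelPermissionFetch {
  // Stand-in for the per-partition HDFS NameNode permission RPC.
  static String fetchPermission(String partitionPath) {
    return "rwxr-xr-x";  // pretend the NameNode answered
  }

  static Map<String, String> fetchAll(List<String> partitions, int poolSize)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    Map<String, String> perms = new ConcurrentHashMap<>();
    List<Callable<Void>> tasks = new ArrayList<>();
    for (String p : partitions) {
      tasks.add(() -> { perms.put(p, fetchPermission(p)); return null; });
    }
    pool.invokeAll(tasks);  // blocks until every lookup has finished
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
    return perms;
  }

  public static void main(String[] args) throws InterruptedException {
    List<String> parts = new ArrayList<>();
    for (int i = 0; i < 100; i++) parts.add("/warehouse/t/p=" + i);
    System.out.println(fetchAll(parts, 10).size());
  }
}
```

Since each lookup is an independent RPC, the speedup should be close to the pool size until the NameNode itself becomes the bottleneck.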
[jira] [Created] (IMPALA-7632) Erasure coding builds still failing because of default query options
Tim Armstrong created IMPALA-7632: - Summary: Erasure coding builds still failing because of default query options Key: IMPALA-7632 URL: https://issues.apache.org/jira/browse/IMPALA-7632 Project: IMPALA Issue Type: Bug Components: Infrastructure Affects Versions: Impala 3.1.0 Reporter: Tim Armstrong Assignee: Tim Armstrong Two tests fail because the default query options they set were clobbered by the custom cluster test infra: *TestSetAndUnset.test_set_and_unset *TestAdmissionController.test_set_request_pool {noformat} hs2/hs2_test_suite.py:48: in add_session fn(self) custom_cluster/test_set_and_unset.py:44: in test_set_and_unset assert "DEBUG_ACTION\tcustom\tDEVELOPMENT" in result.data, "baseline" E AssertionError: baseline E assert 'DEBUG_ACTION\tcustom\tDEVELOPMENT' in ['ABORT_ON_ERROR\t0\tREGULAR', 'ALLOW_ERASURE_CODED_FILES\t1\tDEVELOPMENT', 'APPX_COUNT_DISTINCT\t0\tADVANCED', 'BATCH_SIZE\t0\tDEVELOPMENT', 'BUFFER_POOL_LIMIT\t\tADVANCED', 'COMPRESSION_CODEC\t\tREGULAR', ...] E+ where ['ABORT_ON_ERROR\t0\tREGULAR', 'ALLOW_ERASURE_CODED_FILES\t1\tDEVELOPMENT', 'APPX_COUNT_DISTINCT\t0\tADVANCED', 'BATCH_SIZE\t0\tDEVELOPMENT', 'BUFFER_POOL_LIMIT\t\tADVANCED', 'COMPRESSION_CODEC\t\tREGULAR', ...] = .data hs2/hs2_test_suite.py:48: in add_session fn(self) custom_cluster/test_admission_controller.py:317: in test_set_request_pool ['MEM_LIMIT=2', 'REQUEST_POOL=root.queueB']) custom_cluster/test_admission_controller.py:224: in __check_query_options assert False, "Expected query options %s, got %s." % (expected, actual) E AssertionError: Expected query options MEM_LIMIT=2,REQUEST_POOL=root.queueB, got allow_erasure_coded_files=1,request_pool=root.queueb. E assert False{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-7631) Add Sentry configuration to allow specific privileges to be granted explicitly
Fredy Wijaya created IMPALA-7631: Summary: Add Sentry configuration to allow specific privileges to be granted explicitly Key: IMPALA-7631 URL: https://issues.apache.org/jira/browse/IMPALA-7631 Project: IMPALA Issue Type: Sub-task Components: Infrastructure Affects Versions: Impala 3.0 Reporter: Fredy Wijaya Assignee: Fredy Wijaya Sentry requires a new configuration (sentry.db.explicit.grants.permitted) to specify which privileges are permitted to be granted explicitly: https://issues.apache.org/jira/browse/SENTRY-2413. We need to update sentry-site*template files with a new configuration. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-7631) Add Sentry configuration to allow specific privileges to be granted explicitly
[ https://issues.apache.org/jira/browse/IMPALA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-7631 started by Fredy Wijaya. > Add Sentry configuration to allow specific privileges to be granted explicitly > -- > > Key: IMPALA-7631 > URL: https://issues.apache.org/jira/browse/IMPALA-7631 > Project: IMPALA > Issue Type: Sub-task > Components: Infrastructure >Affects Versions: Impala 3.0 >Reporter: Fredy Wijaya >Assignee: Fredy Wijaya >Priority: Major > > Sentry requires a new configuration (sentry.db.explicit.grants.permitted) to > specify which privileges are permitted to be granted explicitly: > https://issues.apache.org/jira/browse/SENTRY-2413. We need to update > sentry-site*template files with a new configuration. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-7566) TAcceptQueueServer connection setup should have timeout
[ https://issues.apache.org/jira/browse/IMPALA-7566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Volker resolved IMPALA-7566. - Resolution: Duplicate Assignee: (was: bharath v) Fix Version/s: Impala 2.11.0 > TAcceptQueueServer connection setup should have timeout > --- > > Key: IMPALA-7566 > URL: https://issues.apache.org/jira/browse/IMPALA-7566 > Project: IMPALA > Issue Type: Improvement > Components: Clients, Distributed Exec >Affects Versions: Impala 2.11.0, Impala 3.0, Impala 2.12.0 >Reporter: Michael Ho >Priority: Blocker > Fix For: Impala 2.11.0 > > > Currently, there is no timeout when establishing a connection with an Impala > client. For instance, if a client freezes (whether intentionally or > unintentionally) in the middle of connection establishment (e.g. the > SASL handshake), the *single thread* in {{connection_setup_pool}} of > {{TAcceptQueueServer}} will be stuck waiting for the client and thus all > other clients trying to connect to Beeswax or HS2 port of Impalad will be > stuck forever. Impala should consider adding a timeout on the socket during > the connection establishment phase with a client so as to limit the amount of time > the thread can be stuck. > One can try using the "{{openssl s_client}}" command to connect to Impalad with > TLS and Kerberos enabled and leave it open. The thread doing the connection > setup will be stuck in the stack below: > {noformat} > Thread 551 (Thread 0x7fddde563700 (LWP 166354)): > #0 0x003ce2a0e82d in read () from /lib64/libpthread.so.0 > #1 0x003ce56dea71 in ?? () from /usr/lib64/libcrypto.so.10 > #2 0x003ce56dcdc9 in BIO_read () from /usr/lib64/libcrypto.so.10 > #3 0x003ce9a2c1df in ssl3_read_n () from /usr/lib64/libssl.so.10 > #4 0x003ce9a2c8dd in ssl3_read_bytes () from /usr/lib64/libssl.so.10 > #5 0x003ce9a281a0 in ?? 
() from /usr/lib64/libssl.so.10 > #6 0x0208ede2 in > apache::thrift::transport::TSSLSocket::read(unsigned char*, unsigned int) () > #7 0x0208b6f3 in unsigned int > apache::thrift::transport::readAll(apache::thrift::transport::TSocket&, > unsigned char*, unsigned int) () > #8 0x00cb2aa9 in > apache::thrift::transport::TSaslTransport::receiveSaslMessage(apache::thrift::transport::NegotiationStatus*, > unsigned int*) () > #9 0x00cb03e4 in > apache::thrift::transport::TSaslServerTransport::handleSaslStartMessage() () > #10 0x00cb2c23 in > apache::thrift::transport::TSaslTransport::doSaslNegotiation() () > #11 0x00cb10b8 in > apache::thrift::transport::TSaslServerTransport::Factory::getTransport(boost::shared_ptr) > () > #12 0x00b13e47 in > apache::thrift::server::TAcceptQueueServer::SetupConnection(boost::shared_ptr) > () > #13 0x00b14932 in > boost::detail::function::void_function_obj_invoker2 boost::shared_ptr const&)#1}, void, > int, boost::shared_ptr > const&>::invoke(boost::detail::function::function_buffer&, int, > boost::shared_ptr const&) () > #14 0x00b177f9 in > impala::ThreadPool > >::WorkerThread(int) () > #15 0x00d602af in > impala::Thread::SuperviseThread(std::basic_string std::char_traits, std::allocator > const&, > std::basic_string, std::allocator > > const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*) () > #16 0x00d60aaa in boost::detail::thread_data void (*)(std::basic_string, std::allocator > > const&, std::basic_string, > std::allocator > const&, boost::function, > impala::ThreadDebugInfo const*, impala::Promise*), > boost::_bi::list5 std::char_traits, std::allocator > >, > boost::_bi::value, > std::allocator > >, boost::_bi::value >, > boost::_bi::value, > boost::_bi::value*> > > >::run() () > #17 0x012d756a in thread_proxy () > #18 0x003ce2a07aa1 in start_thread () from /lib64/libpthread.so.0 > #19 0x003ce26e893d in clone () from /lib64/libc.so.6 > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To 
unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
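The proposed fix amounts to bounding blocking reads during connection setup. The sketch below illustrates the idea with plain Python sockets; it is not Impala's Thrift code, and `accept_with_handshake_timeout` is a hypothetical helper name.

```python
import socket

def accept_with_handshake_timeout(server_sock, handshake_timeout_s=10.0):
    """Accept a connection, but bound how long setup can block.

    A receive timeout covers the handshake phase so a frozen client
    cannot pin the setup thread forever; it is cleared once setup is done.
    """
    conn, addr = server_sock.accept()
    conn.settimeout(handshake_timeout_s)  # bound blocking reads during setup
    try:
        greeting = conn.recv(64)          # stands in for the SASL/TLS handshake
    except socket.timeout:
        conn.close()                      # frozen client: give up, free the thread
        return None
    conn.settimeout(None)                 # setup complete; restore blocking mode
    return conn

# A client that connects but never sends anything -- a "frozen" client.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
frozen = socket.socket()
frozen.connect(("127.0.0.1", port))
result = accept_with_handshake_timeout(server, handshake_timeout_s=0.2)
print(result)  # None: the setup thread was released after 0.2s
```

Without the `settimeout` call, `recv` would block indefinitely, which is exactly the stuck-in-`read()` stack shown above.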
[jira] [Commented] (IMPALA-7581) Hang in buffer-pool-test
[ https://issues.apache.org/jira/browse/IMPALA-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629228#comment-16629228 ] Tim Armstrong commented on IMPALA-7581: --- I think the best theory is that the hang happens when a background thread allocates or frees memory at the same time as the death test forks. I took a look at the background threads and it wasn't really obvious which might be a likely culprit since the death test happens at a point where things should mostly be idle. Any background thread that logs something is a potential candidate, e.g. the pause monitor threads. Or it could be something in the JVM that uses malloc, which is more opaque. I tried running with a breakpoint set on malloc() and there's a lot of activity from the JVM, particularly around class loading. One option to mitigate is to skip death tests under ASAN, under the theory that we're far more likely to grab global malloc locks there compared with TCMalloc. > Hang in buffer-pool-test > > > Key: IMPALA-7581 > URL: https://issues.apache.org/jira/browse/IMPALA-7581 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.1.0 >Reporter: Thomas Tauber-Marshall >Assignee: Tim Armstrong >Priority: Critical > Labels: broken-build, flaky > Attachments: gdb.txt > > > We have observed a hang in buffer-pool-test in an ASAN build. Unfortunately, no > logs were generated with any info about what might have happened. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
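The fork-while-a-lock-is-held failure mode described in the comment can be reproduced in miniature. This is a generic sketch (not buffer-pool-test itself): a background thread stands in for whatever holds a global malloc or logging lock at the moment the death test forks.

```python
import os
import threading

# A background thread grabs a lock (standing in for a global malloc/logging
# lock) and never releases it; then the main thread forks, as a gtest
# death test does.
lock = threading.Lock()
started = threading.Event()

def hold_forever():
    with lock:
        started.set()
        threading.Event().wait()  # hold the lock indefinitely

threading.Thread(target=hold_forever, daemon=True).start()
started.wait()

pid = os.fork()
if pid == 0:
    # Only the forking thread survives in the child, but the lock state was
    # copied still-held: a blocking acquire here would hang exactly like the
    # death test. Probe non-blockingly instead of deadlocking.
    os._exit(0 if not lock.acquire(blocking=False) else 1)
_, status = os.waitpid(pid, 0)
print(os.WEXITSTATUS(status))  # 0 => the lock is unreleasable in the child
```

If the child's first allocation needs that lock, it hangs forever, which matches the observed symptom: a silent hang with no logs.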
[jira] [Commented] (IMPALA-110) Add support for multiple distinct operators in the same query block
[ https://issues.apache.org/jira/browse/IMPALA-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629207#comment-16629207 ] Thomas Tauber-Marshall commented on IMPALA-110: --- Barring any unforeseen problems, this will be part of the 3.1 Apache release. I am not currently aware of any plans for a 2.13 release. Cloudera does not in general make commitments about when features will land in CDH. For more info about that, you can contact Cloudera directly. > Add support for multiple distinct operators in the same query block > --- > > Key: IMPALA-110 > URL: https://issues.apache.org/jira/browse/IMPALA-110 > Project: IMPALA > Issue Type: New Feature > Components: Backend, Frontend >Affects Versions: Impala 0.5, Impala 1.4, Impala 2.0, Impala 2.2, Impala > 2.3.0 >Reporter: Greg Rahn >Assignee: Thomas Tauber-Marshall >Priority: Major > Labels: sql-language > Fix For: Impala 3.1.0 > > > Impala only allows a single (DISTINCT columns) expression in each query. > {color:red}Note: > If you do not need precise accuracy, you can produce an estimate of the > distinct values for a column by specifying NDV(column); a query can contain > multiple instances of NDV(column). To make Impala automatically rewrite > COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query > option. > {color} > {code} > [impala:21000] > select count(distinct i_class_id) from item; > Query: select count(distinct i_class_id) from item > Query finished, fetching results ... 
> 16 > Returned 1 row(s) in 1.51s > {code} > {code} > [impala:21000] > select count(distinct i_class_id), count(distinct > i_brand_id) from item; > Query: select count(distinct i_class_id), count(distinct i_brand_id) from item > ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in > select count(distinct i_class_id), count(distinct i_brand_id) from item) > at > com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133) > at > com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:221) > at > com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:89) > Caused by: com.cloudera.impala.common.AnalysisException: all DISTINCT > aggregate functions need to have the same set of parameters as COUNT(DISTINCT > i_class_id); deviating function: COUNT(DISTINCT i_brand_id) > at > com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo(AggregateInfo.java:196) > at > com.cloudera.impala.analysis.AggregateInfo.create(AggregateInfo.java:143) > at > com.cloudera.impala.analysis.SelectStmt.createAggInfo(SelectStmt.java:466) > at > com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(SelectStmt.java:347) > at com.cloudera.impala.analysis.SelectStmt.analyze(SelectStmt.java:155) > at > com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:130) > ... 
2 more > {code} > Hive supports this: > {code} > $ hive -e "select count(distinct i_class_id), count(distinct i_brand_id) from > item;" > Logging initialized using configuration in > file:/etc/hive/conf.dist/hive-log4j.properties > Hive history file=/tmp/grahn/hive_job_log_grahn_201303052234_1625576708.txt > Total MapReduce jobs = 1 > Launching Job 1 out of 1 > Number of reduce tasks determined at compile time: 1 > In order to change the average load for a reducer (in bytes): > set hive.exec.reducers.bytes.per.reducer= > In order to limit the maximum number of reducers: > set hive.exec.reducers.max= > In order to set a constant number of reducers: > set mapred.reduce.tasks= > Starting Job = job_201302081514_0073, Tracking URL = > http://impala:50030/jobdetails.jsp?jobid=job_201302081514_0073 > Kill Command = /usr/lib/hadoop/bin/hadoop job > -Dmapred.job.tracker=m0525.mtv.cloudera.com:8021 -kill job_201302081514_0073 > Hadoop job information for Stage-1: number of mappers: 1; number of reducers: > 1 > 2013-03-05 22:34:43,255 Stage-1 map = 0%, reduce = 0% > 2013-03-05 22:34:49,323 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:50,337 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:51,351 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:52,360 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:53,370 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:54,379 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:55,389 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 8.58 sec > 2013-03-05 22:34:56,402 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 8.58 sec > 2013-03-05 22:34:57,413
[jira] [Updated] (IMPALA-110) Add support for multiple distinct operators in the same query block
[ https://issues.apache.org/jira/browse/IMPALA-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Tauber-Marshall updated IMPALA-110: -- Fix Version/s: Impala 3.1.0 > Add support for multiple distinct operators in the same query block > --- > > Key: IMPALA-110 > URL: https://issues.apache.org/jira/browse/IMPALA-110 > Project: IMPALA > Issue Type: New Feature > Components: Backend, Frontend >Affects Versions: Impala 0.5, Impala 1.4, Impala 2.0, Impala 2.2, Impala > 2.3.0 >Reporter: Greg Rahn >Assignee: Thomas Tauber-Marshall >Priority: Major > Labels: sql-language > Fix For: Impala 3.1.0 > > > Impala only allows a single (DISTINCT columns) expression in each query. > {color:red}Note: > If you do not need precise accuracy, you can produce an estimate of the > distinct values for a column by specifying NDV(column); a query can contain > multiple instances of NDV(column). To make Impala automatically rewrite > COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query > option. > {color} > {code} > [impala:21000] > select count(distinct i_class_id) from item; > Query: select count(distinct i_class_id) from item > Query finished, fetching results ... 
> 16 > Returned 1 row(s) in 1.51s > {code} > {code} > [impala:21000] > select count(distinct i_class_id), count(distinct > i_brand_id) from item; > Query: select count(distinct i_class_id), count(distinct i_brand_id) from item > ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in > select count(distinct i_class_id), count(distinct i_brand_id) from item) > at > com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133) > at > com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:221) > at > com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:89) > Caused by: com.cloudera.impala.common.AnalysisException: all DISTINCT > aggregate functions need to have the same set of parameters as COUNT(DISTINCT > i_class_id); deviating function: COUNT(DISTINCT i_brand_id) > at > com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo(AggregateInfo.java:196) > at > com.cloudera.impala.analysis.AggregateInfo.create(AggregateInfo.java:143) > at > com.cloudera.impala.analysis.SelectStmt.createAggInfo(SelectStmt.java:466) > at > com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(SelectStmt.java:347) > at com.cloudera.impala.analysis.SelectStmt.analyze(SelectStmt.java:155) > at > com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:130) > ... 
2 more > {code} > Hive supports this: > {code} > $ hive -e "select count(distinct i_class_id), count(distinct i_brand_id) from > item;" > Logging initialized using configuration in > file:/etc/hive/conf.dist/hive-log4j.properties > Hive history file=/tmp/grahn/hive_job_log_grahn_201303052234_1625576708.txt > Total MapReduce jobs = 1 > Launching Job 1 out of 1 > Number of reduce tasks determined at compile time: 1 > In order to change the average load for a reducer (in bytes): > set hive.exec.reducers.bytes.per.reducer= > In order to limit the maximum number of reducers: > set hive.exec.reducers.max= > In order to set a constant number of reducers: > set mapred.reduce.tasks= > Starting Job = job_201302081514_0073, Tracking URL = > http://impala:50030/jobdetails.jsp?jobid=job_201302081514_0073 > Kill Command = /usr/lib/hadoop/bin/hadoop job > -Dmapred.job.tracker=m0525.mtv.cloudera.com:8021 -kill job_201302081514_0073 > Hadoop job information for Stage-1: number of mappers: 1; number of reducers: > 1 > 2013-03-05 22:34:43,255 Stage-1 map = 0%, reduce = 0% > 2013-03-05 22:34:49,323 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:50,337 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:51,351 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:52,360 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:53,370 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:54,379 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:55,389 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 8.58 sec > 2013-03-05 22:34:56,402 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 8.58 sec > 2013-03-05 22:34:57,413 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 8.58 sec > 2013-03-05 22:34:58,424 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 8.58 sec > MapReduce Total cumulative CPU time: 8 seconds 580 msec > Ended Job = 
job_201302081514_0073 > MapReduce Jobs Launched: > Job 0: Map: 1 Reduce:
[jira] [Commented] (IMPALA-7627) Parallel the fetching permission process
[ https://issues.apache.org/jira/browse/IMPALA-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629189#comment-16629189 ] bharath v commented on IMPALA-7627: --- There were some improvements in IMPALA-7320. Did you take a look at it already? https://gerrit.cloudera.org/#/c/11027/ Could you summarize your approach here? > Parallel the fetching permission process > > > Key: IMPALA-7627 > URL: https://issues.apache.org/jira/browse/IMPALA-7627 > Project: IMPALA > Issue Type: Improvement >Reporter: Peikai Zheng >Assignee: Peikai Zheng >Priority: Major > > There are three phases when the Catalogd loads the metadata of a table. > Firstly, the Catalogd fetches the metadata from the Hive Metastore; > Then, the Catalogd fetches the permissions of each partition from the HDFS > NameNode; > Finally, the Catalogd loads the file descriptors from the HDFS NameNode. > According to my test results: > ||Average Time(GetFileInfoThread=10) || phase 1 || phase 2 || phase 3|| > > |idm.sauron_message|9.9917115|459.2106944|95.0179163| > |default.revenue_enriched|12.3377474|111.2969046|40.827472| > |default.upp_raw_prod|1.5143162|50.0251426|12.6805323| > |default.hit_to_beacon_playback_prod|1.4294509|49.7670539|18.3557858| > |default.sitetracking_enriched|13.0003804|112.8746656|42.1824032| > |default.player_custom_event|9.2618705|493.4865302|116.4986184| > |default.revenue_day_est|57.9116561|106.5028664|24.005822| > The majority of the time is occupied by the second phase. > So, I suggest parallelizing the second phase. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
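The suggestion amounts to fanning out the per-partition permission lookups instead of issuing them serially. A minimal sketch under stated assumptions: `fetch_permission` is a hypothetical stand-in for one NameNode RPC, and the thread count mirrors the `GetFileInfoThread=10` setting from the table above. This is not the actual Catalogd code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_permission(partition):
    """Stand-in for one per-partition permission RPC to the NameNode."""
    time.sleep(0.05)          # simulated RPC latency
    return partition, 0o755   # illustrative permission bits

def load_permissions(partitions, num_threads=10):
    # Phase 2, parallelized: issue the per-partition permission lookups
    # concurrently instead of one after another.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return dict(pool.map(fetch_permission, partitions))

partitions = ["year=2018/month=%02d" % m for m in range(1, 13)]
start = time.time()
perms = load_permissions(partitions)
elapsed = time.time() - start
# With 10 workers, 12 lookups take ~2 RPC rounds instead of 12.
print(len(perms), elapsed < 0.05 * len(partitions))
```

The speedup is bounded by the thread count and by how many concurrent `getFileInfo` calls the NameNode can absorb, which is presumably why IMPALA-7320 is relevant prior art.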
[jira] [Updated] (IMPALA-1760) Add Impala SQL command to gracefully shut down an Impala daemon
[ https://issues.apache.org/jira/browse/IMPALA-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-1760: -- Summary: Add Impala SQL command to gracefully shut down an Impala daemon (was: Add Impala command to gracefully shut down an Impala daemon) > Add Impala SQL command to gracefully shut down an Impala daemon > --- > > Key: IMPALA-1760 > URL: https://issues.apache.org/jira/browse/IMPALA-1760 > Project: IMPALA > Issue Type: New Feature > Components: Distributed Exec >Affects Versions: Impala 2.1.1 >Reporter: Henry Robinson >Assignee: Tim Armstrong >Priority: Critical > Labels: resource-management, scalability, scheduler, usability > Fix For: Impala 3.1.0 > > > In larger clusters, node maintenance is a frequent occurrence. There's no way > currently to stop an Impala node without failing running queries, without > draining queries across the whole cluster first. We should fix that. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-1760) Add Impala command to gracefully shut down an Impala daemon
[ https://issues.apache.org/jira/browse/IMPALA-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-1760. --- Resolution: Fixed Fix Version/s: Impala 3.1.0 This adds the primitive required for Impala to do a graceful shutdown. There is further work to integrate this with any management tools - currently this would require an admin user to run the command directly. > Add Impala command to gracefully shut down an Impala daemon > --- > > Key: IMPALA-1760 > URL: https://issues.apache.org/jira/browse/IMPALA-1760 > Project: IMPALA > Issue Type: New Feature > Components: Distributed Exec >Affects Versions: Impala 2.1.1 >Reporter: Henry Robinson >Assignee: Tim Armstrong >Priority: Critical > Labels: resource-management, scalability, scheduler, usability > Fix For: Impala 3.1.0 > > > In larger clusters, node maintenance is a frequent occurrence. There's no way > currently to stop an Impala node without failing running queries, without > draining queries across the whole cluster first. We should fix that. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
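The primitive described, quiesce the node by refusing new queries and draining in-flight ones before exiting, can be sketched generically. This is an illustrative model with invented names (`Daemon`, `graceful_shutdown`), not Impala's implementation.

```python
import threading

class Daemon:
    """Toy quiesce/drain primitive: stop admitting work, wait for drain."""
    def __init__(self):
        self._cond = threading.Condition()
        self._shutting_down = False
        self._inflight = 0

    def start_query(self):
        with self._cond:
            if self._shutting_down:
                raise RuntimeError("node is quiescing; submit elsewhere")
            self._inflight += 1

    def finish_query(self):
        with self._cond:
            self._inflight -= 1
            self._cond.notify_all()

    def graceful_shutdown(self, deadline_s=None):
        # Stop admitting new queries, then wait for in-flight ones to drain
        # (up to an optional deadline). Returns True if fully drained.
        with self._cond:
            self._shutting_down = True
            self._cond.wait_for(lambda: self._inflight == 0,
                                timeout=deadline_s)
            return self._inflight == 0

d = Daemon()
d.start_query()                                  # one query in flight
threading.Timer(0.1, d.finish_query).start()     # it finishes shortly
drained = d.graceful_shutdown(deadline_s=5)
print(drained)  # True: the running query completed before the deadline
```

New queries submitted after the shutdown call are rejected immediately, while queries already running are allowed to complete, which is the behavior the issue asks for.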
[jira] [Resolved] (IMPALA-7600) Mem limit exceeded in test_kudu_scan_mem_usage
[ https://issues.apache.org/jira/browse/IMPALA-7600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-7600. --- Resolution: Fixed Fix Version/s: Impala 3.1.0 > Mem limit exceeded in test_kudu_scan_mem_usage > -- > > Key: IMPALA-7600 > URL: https://issues.apache.org/jira/browse/IMPALA-7600 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.1.0 >Reporter: Thomas Tauber-Marshall >Assignee: Tim Armstrong >Priority: Blocker > Labels: broken-build, flaky > Fix For: Impala 3.1.0 > > > Seen in an exhaustive release build: > {noformat} > 00:05:35 TestScanMemLimit.test_kudu_scan_mem_usage[exec_option: > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, > 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, > 'exec_single_node_rows_threshold': 0} | table_format: avro/snap/block] > 00:05:35 [gw6] linux2 -- Python 2.7.5 > /data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/bin/../infra/python/env/bin/python > 00:05:35 query_test/test_mem_usage_scaling.py:358: in test_kudu_scan_mem_usage > 00:05:35 self.run_test_case('QueryTest/kudu-scan-mem-usage', vector) > 00:05:35 common/impala_test_suite.py:408: in run_test_case > 00:05:35 result = self.__execute_query(target_impalad_client, query, > user=user) > 00:05:35 common/impala_test_suite.py:623: in __execute_query > 00:05:35 return impalad_client.execute(query, user=user) > 00:05:35 common/impala_connection.py:160: in execute > 00:05:35 return self.__beeswax_client.execute(sql_stmt, user=user) > 00:05:35 beeswax/impala_beeswax.py:176: in execute > 00:05:35 handle = self.__execute_query(query_string.strip(), user=user) > 00:05:35 beeswax/impala_beeswax.py:350: in __execute_query > 00:05:35 self.wait_for_finished(handle) > 00:05:35 beeswax/impala_beeswax.py:371: in wait_for_finished > 00:05:35 raise ImpalaBeeswaxException("Query aborted:" + error_log, None) > 00:05:35 E 
ImpalaBeeswaxException: ImpalaBeeswaxException: > 00:05:35 EQuery aborted:Memory limit exceeded: Error occurred on backend > impala-ec2-centos74-m5-4xlarge-ondemand-0e2c.vpc.cloudera.com:22000 by > fragment b34270820f59a0c9:a507139e0001 > 00:05:35 E Memory left in process limit: 10.12 GB > 00:05:35 E Memory left in query limit: -16.92 KB > 00:05:35 E Query(b34270820f59a0c9:a507139e): memory limit exceeded. > Limit=4.00 MB Reservation=0 ReservationLimit=0 OtherMemory=4.02 MB Total=4.02 > MB Peak=4.02 MB > 00:05:35 E Fragment b34270820f59a0c9:a507139e: Reservation=0 > OtherMemory=40.10 KB Total=40.10 KB Peak=340.00 KB > 00:05:35 E EXCHANGE_NODE (id=2): Reservation=32.00 KB OtherMemory=0 > Total=32.00 KB Peak=32.00 KB > 00:05:35 E KrpcDeferredRpcs: Total=0 Peak=0 > 00:05:35 E PLAN_ROOT_SINK: Total=0 Peak=0 > 00:05:35 E CodeGen: Total=103.00 B Peak=332.00 KB > 00:05:35 E Fragment b34270820f59a0c9:a507139e0001: Reservation=0 > OtherMemory=3.98 MB Total=3.98 MB Peak=3.98 MB > 00:05:35 E SORT_NODE (id=1): Total=342.00 KB Peak=342.00 KB > 00:05:35 E KUDU_SCAN_NODE (id=0): Total=3.63 MB Peak=3.63 MB > 00:05:35 E Queued Batches: Total=3.30 MB Peak=3.63 MB > 00:05:35 E KrpcDataStreamSender (dst_id=2): Total=1.16 KB Peak=1.16 KB > 00:05:35 E CodeGen: Total=3.66 KB Peak=1.14 MB > 00:05:35 E > 00:05:35 E Memory limit exceeded: Error occurred on backend > impala-ec2-centos74-m5-4xlarge-ondemand-0e2c.vpc.cloudera.com:22000 by > fragment b34270820f59a0c9:a507139e0001 > 00:05:35 E Memory left in process limit: 10.12 GB > 00:05:35 E Memory left in query limit: -16.92 KB > 00:05:35 E Query(b34270820f59a0c9:a507139e): memory limit exceeded. 
> Limit=4.00 MB Reservation=0 ReservationLimit=0 OtherMemory=4.02 MB Total=4.02 > MB Peak=4.02 MB > 00:05:35 E Fragment b34270820f59a0c9:a507139e: Reservation=0 > OtherMemory=40.10 KB Total=40.10 KB Peak=340.00 KB > 00:05:35 E EXCHANGE_NODE (id=2): Reservation=32.00 KB OtherMemory=0 > Total=32.00 KB Peak=32.00 KB > 00:05:35 E KrpcDeferredRpcs: Total=0 Peak=0 > 00:05:35 E PLAN_ROOT_SINK: Total=0 Peak=0 > 00:05:35 E CodeGen: Total=103.00 B Peak=332.00 KB > 00:05:35 E Fragment b34270820f59a0c9:a507139e0001: Reservation=0 > OtherMemory=3.98 MB Total=3.98 MB Peak=3.98 MB > 00:05:35 E SORT_NODE (id=1): Total=342.00 KB Peak=342.00 KB > 00:05:35 E KUDU_SCAN_NODE (id=0): Total=3.63 MB Peak=3.63 MB > 00:05:35 E Queued Batches: Total=3.30 MB Peak=3.63 MB > 00:05:35 E KrpcDataStreamSender (dst_id=2): Total=1.16 KB Peak=1.16 KB > 00:05:35 E CodeGen: Total=3.66 KB Peak=1.14 MB (1 of 2 similar) >
[jira] [Updated] (IMPALA-1760) Add Impala command to gracefully shut down an Impala daemon
[ https://issues.apache.org/jira/browse/IMPALA-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-1760: -- Summary: Add Impala command to gracefully shut down an Impala daemon (was: Add decommissioning support / graceful shutdown / quiesce) > Add Impala command to gracefully shut down an Impala daemon > --- > > Key: IMPALA-1760 > URL: https://issues.apache.org/jira/browse/IMPALA-1760 > Project: IMPALA > Issue Type: New Feature > Components: Distributed Exec >Affects Versions: Impala 2.1.1 >Reporter: Henry Robinson >Assignee: Tim Armstrong >Priority: Critical > Labels: resource-management, scalability, scheduler, usability > > In larger clusters, node maintenance is a frequent occurrence. There's no way > currently to stop an Impala node without failing running queries, without > draining queries across the whole cluster first. We should fix that. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-7595) Check failed: IsValidTime(time_) at timestamp-value.h:322
[ https://issues.apache.org/jira/browse/IMPALA-7595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629107#comment-16629107 ] Csaba Ringhofer commented on IMPALA-7595: - https://gerrit.cloudera.org/#/c/11521/ > Check failed: IsValidTime(time_) at timestamp-value.h:322 > -- > > Key: IMPALA-7595 > URL: https://issues.apache.org/jira/browse/IMPALA-7595 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.1.0 >Reporter: Tim Armstrong >Assignee: Csaba Ringhofer >Priority: Blocker > Labels: broken-build, crash > > See https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/3197/. hash is > 23c7d7e57b7868eedbf5a9a4bc4aafd6066a04fb > Some of the fuzz tests stand out amongst the tests that were running at the > same time as the crash, particularly: > 19:12:17 [gw4] PASSED > query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_alltypes[exec_option: > {'debug_action': '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', > 'abort_on_error': False, 'mem_limit': '512m', 'num_nodes': 0} | table_format: > parquet/none] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-7595) Check failed: IsValidTime(time_) at timestamp-value.h:322
[ https://issues.apache.org/jira/browse/IMPALA-7595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-7595 started by Csaba Ringhofer. --- > Check failed: IsValidTime(time_) at timestamp-value.h:322 > -- > > Key: IMPALA-7595 > URL: https://issues.apache.org/jira/browse/IMPALA-7595 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.1.0 >Reporter: Tim Armstrong >Assignee: Csaba Ringhofer >Priority: Blocker > Labels: broken-build, crash > > See https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/3197/. hash is > 23c7d7e57b7868eedbf5a9a4bc4aafd6066a04fb > Some of the fuzz tests stand out amongst the tests that were running at the > same time as the crash, particularly: > 19:12:17 [gw4] PASSED > query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_alltypes[exec_option: > {'debug_action': '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', > 'abort_on_error': False, 'mem_limit': '512m', 'num_nodes': 0} | table_format: > parquet/none] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-7630) Support GENERATED/VIRTUAL columns in Impala tables
[ https://issues.apache.org/jira/browse/IMPALA-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bharath v updated IMPALA-7630: -- Component/s: Frontend Backend > Support GENERATED/VIRTUAL columns in Impala tables > -- > > Key: IMPALA-7630 > URL: https://issues.apache.org/jira/browse/IMPALA-7630 > Project: IMPALA > Issue Type: New Feature > Components: Backend, Frontend >Affects Versions: Impala 3.1.0 >Reporter: bharath v >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-7630) Support GENERATED/VIRTUAL columns in Impala tables
[ https://issues.apache.org/jira/browse/IMPALA-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bharath v updated IMPALA-7630: -- Summary: Support GENERATED/VIRTUAL columns in Impala tables (was: Support ) > Support GENERATED/VIRTUAL columns in Impala tables > -- > > Key: IMPALA-7630 > URL: https://issues.apache.org/jira/browse/IMPALA-7630 > Project: IMPALA > Issue Type: New Feature > Components: Backend, Frontend >Affects Versions: Impala 3.1.0 >Reporter: bharath v >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-7630) Support GENERATED/VIRTUAL columns in Impala tables
[ https://issues.apache.org/jira/browse/IMPALA-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bharath v updated IMPALA-7630: -- Affects Version/s: Impala 3.1.0 > Support GENERATED/VIRTUAL columns in Impala tables > -- > > Key: IMPALA-7630 > URL: https://issues.apache.org/jira/browse/IMPALA-7630 > Project: IMPALA > Issue Type: New Feature > Components: Backend, Frontend >Affects Versions: Impala 3.1.0 >Reporter: bharath v >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-7630) Support
bharath v created IMPALA-7630: - Summary: Support Key: IMPALA-7630 URL: https://issues.apache.org/jira/browse/IMPALA-7630 Project: IMPALA Issue Type: New Feature Reporter: bharath v -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-5142) EventSequence::MarkEvent() may record concurrent events out of serialized order
[ https://issues.apache.org/jira/browse/IMPALA-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-5142: -- Fix Version/s: Impala 2.11.0 > EventSequence::MarkEvent() may record concurrent events out of serialized > order > --- > > Key: IMPALA-5142 > URL: https://issues.apache.org/jira/browse/IMPALA-5142 > Project: IMPALA > Issue Type: Bug > Components: Distributed Exec >Affects Versions: Impala 2.7.0 >Reporter: Mostafa Mokhtar >Assignee: Pranay Singh >Priority: Trivial > Labels: ramp-up, trivial > Fix For: Impala 2.11.0 > > > When the event for the first dynamic filter is received ahead of the "all fragment > instances started" event, the query timeline prints a negative time value for "fragment > instances started" > {code} >- Planning finished: 790.555ms (32.872ms) > Query Timeline: 2m40s >- Query submitted: 7.295ms (7.295ms) >- Planning finished: 1s397ms (1s390ms) >- Submit for admission: 3s059ms (1s661ms) >- Completed admission: 3s087ms (27.810ms) >- Ready to start 90 fragment instances: 3s527ms (440.540ms) >- First dynamic filter received: 7s851ms (4s323ms) >- All 90 fragment instances started: 7s851ms (-88037.000ns) >- Rows available: 2m28s (2m20s) >- First row fetched: 2m28s (51.725ms) >- Unregister query: 2m30s (1s459ms) > - ComputeScanRangeAssignmentTimer: 770.794ms > {code} > Query timeline when the filter arrives after all fragments have started: 
> {code} > Query Timeline: 17s011ms >- Query submitted: 174.449us (174.449us) >- Planning finished: 209.847ms (209.672ms) >- Submit for admission: 255.819ms (45.971ms) >- Completed admission: 256.212ms (393.074us) >- Ready to start 90 fragment instances: 283.582ms (27.370ms) >- All 90 fragment instances started: 627.013ms (343.430ms) >- First dynamic filter received: 954.223ms (327.209ms) >- Rows available: 16s393ms (15s439ms) >- First row fetched: 16s705ms (311.586ms) >- Unregister query: 16s871ms (165.908ms) > - ComputeScanRangeAssignmentTimer: 13.125ms > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
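A negative interval like the -88037ns above can only appear if an event's timestamp is taken outside the critical section that appends it, so two concurrent MarkEvent() calls can be serialized in the opposite order from their timestamps. A generic fix, illustrated here in Python rather than Impala's C++, is to read the clock while holding the lock:

```python
import threading
import time

class EventSequence:
    """Events are timestamped and appended under one lock, so the
    serialized order always agrees with timestamp order (illustrative)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._events = []

    def mark_event(self, label):
        # Taking the timestamp inside the critical section guarantees the
        # append order and the timestamps can never disagree.
        with self._lock:
            self._events.append((label, time.monotonic()))

    def deltas(self):
        with self._lock:
            ts = [t for _, t in self._events]
        return [b - a for a, b in zip(ts, ts[1:])]

seq = EventSequence()
threads = [threading.Thread(target=seq.mark_event, args=("event %d" % i,))
           for i in range(50)]
for t in threads: t.start()
for t in threads: t.join()
print(all(d >= 0 for d in seq.deltas()))  # True: no negative intervals
```

With the timestamp taken under the lock, every delta between consecutive recorded events is non-negative, so a timeline rendered from the sequence cannot show negative durations.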
[jira] [Issue Comment Deleted] (IMPALA-110) Add support for multiple distinct operators in the same query block
[ https://issues.apache.org/jira/browse/IMPALA-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-110: - Comment: was deleted (was: Thank you for your email. I will be taking time off starting Wednesday 9/26/2018 and returning Friday, 9/28/2018 and will have limited access to my email. If you require immediate assistance, please contact Narmada Gomatam. ) > Add support for multiple distinct operators in the same query block > --- > > Key: IMPALA-110 > URL: https://issues.apache.org/jira/browse/IMPALA-110 > Project: IMPALA > Issue Type: New Feature > Components: Backend, Frontend >Affects Versions: Impala 0.5, Impala 1.4, Impala 2.0, Impala 2.2, Impala > 2.3.0 >Reporter: Greg Rahn >Assignee: Thomas Tauber-Marshall >Priority: Major > Labels: sql-language > > Impala only allows a single (DISTINCT columns) expression in each query. > {color:red}Note: > If you do not need precise accuracy, you can produce an estimate of the > distinct values for a column by specifying NDV(column); a query can contain > multiple instances of NDV(column). To make Impala automatically rewrite > COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query > option. > {color} > {code} > [impala:21000] > select count(distinct i_class_id) from item; > Query: select count(distinct i_class_id) from item > Query finished, fetching results ... 
> 16 > Returned 1 row(s) in 1.51s > {code} > {code} > [impala:21000] > select count(distinct i_class_id), count(distinct > i_brand_id) from item; > Query: select count(distinct i_class_id), count(distinct i_brand_id) from item > ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in > select count(distinct i_class_id), count(distinct i_brand_id) from item) > at > com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133) > at > com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:221) > at > com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:89) > Caused by: com.cloudera.impala.common.AnalysisException: all DISTINCT > aggregate functions need to have the same set of parameters as COUNT(DISTINCT > i_class_id); deviating function: COUNT(DISTINCT i_brand_id) > at > com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo(AggregateInfo.java:196) > at > com.cloudera.impala.analysis.AggregateInfo.create(AggregateInfo.java:143) > at > com.cloudera.impala.analysis.SelectStmt.createAggInfo(SelectStmt.java:466) > at > com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(SelectStmt.java:347) > at com.cloudera.impala.analysis.SelectStmt.analyze(SelectStmt.java:155) > at > com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:130) > ... 
2 more > {code} > Hive supports this: > {code} > $ hive -e "select count(distinct i_class_id), count(distinct i_brand_id) from > item;" > Logging initialized using configuration in > file:/etc/hive/conf.dist/hive-log4j.properties > Hive history file=/tmp/grahn/hive_job_log_grahn_201303052234_1625576708.txt > Total MapReduce jobs = 1 > Launching Job 1 out of 1 > Number of reduce tasks determined at compile time: 1 > In order to change the average load for a reducer (in bytes): > set hive.exec.reducers.bytes.per.reducer= > In order to limit the maximum number of reducers: > set hive.exec.reducers.max= > In order to set a constant number of reducers: > set mapred.reduce.tasks= > Starting Job = job_201302081514_0073, Tracking URL = > http://impala:50030/jobdetails.jsp?jobid=job_201302081514_0073 > Kill Command = /usr/lib/hadoop/bin/hadoop job > -Dmapred.job.tracker=m0525.mtv.cloudera.com:8021 -kill job_201302081514_0073 > Hadoop job information for Stage-1: number of mappers: 1; number of reducers: > 1 > 2013-03-05 22:34:43,255 Stage-1 map = 0%, reduce = 0% > 2013-03-05 22:34:49,323 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:50,337 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:51,351 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:52,360 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:53,370 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:54,379 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:55,389 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 8.58 sec > 2013-03-05 22:34:56,402 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 8.58 sec > 2013-03-05 22:34:57,413 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 8.58 sec > 2013-03-05 22:34:58,424 Stage-1 map = 100%, reduce = 100%,
[jira] [Commented] (IMPALA-7628) test_tls_ecdh failing on CentOS 6/Python 2.6
[ https://issues.apache.org/jira/browse/IMPALA-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628984#comment-16628984 ] Tim Armstrong commented on IMPALA-7628: --- Temporary fix here: https://gerrit.cloudera.org/#/c/11519/ > test_tls_ecdh failing on CentOS 6/Python 2.6 > > > Key: IMPALA-7628 > URL: https://issues.apache.org/jira/browse/IMPALA-7628 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.1.0 > Environment: CentOS 6.4, Python 2.6 >Reporter: Tim Armstrong >Assignee: Thomas Tauber-Marshall >Priority: Blocker > > {noformat} > custom_cluster/test_client_ssl.py:125: in test_tls_ecdh > self._validate_positive_cases("%s/server-cert.pem" % self.CERT_DIR) > custom_cluster/test_client_ssl.py:198: in _validate_positive_cases > result = run_impala_shell_cmd(shell_options) > shell/util.py:97: in run_impala_shell_cmd > result.stderr) > E AssertionError: Cmd --ssl -q 'select 1 + 2' was expected to succeed: > Starting Impala Shell without Kerberos authentication > E SSL is enabled. Impala server certificates will NOT be verified (set > --ca_cert to change) > E > /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80: > DeprecationWarning: 3th positional argument is deprecated. Use keyward > argument insteand. > E DeprecationWarning) > E > /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80: > DeprecationWarning: 4th positional argument is deprecated. Use keyward > argument insteand. > E DeprecationWarning) > E > /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80: > DeprecationWarning: 5th positional argument is deprecated. Use keyward > argument insteand. 
> E DeprecationWarning) > E > /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:216: > DeprecationWarning: validate is deprecated. Use cert_reqs=ssl.CERT_NONE > instead > E DeprecationWarning) > E No handlers could be found for logger "thrift.transport.TSSLSocket" > E Error connecting: TTransportException, Could not connect to > localhost:21000: [Errno 1] _ssl.c:490: error:14094410:SSL > routines:SSL3_READ_BYTES:sslv3 alert handshake failure > E Not connected to Impala, could not execute queries. > {noformat} > Git hash is e38715e25297cc3643482be04e3b1b273e339b54 > I'm going to push out a temporary fix to unblock tests (since there are other > related tests skipped on this platform) but I'll let Thomas validate the > correctness of it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-7629) TestClientSsl tests seem to be disabled on non-legacy platforms
Tim Armstrong created IMPALA-7629: - Summary: TestClientSsl tests seem to be disabled on non-legacy platforms Key: IMPALA-7629 URL: https://issues.apache.org/jira/browse/IMPALA-7629 Project: IMPALA Issue Type: Bug Components: Infrastructure Affects Versions: Impala 3.1.0 Environment: Ubuntu 16.04, Python 2.7.14 Reporter: Tim Armstrong Assignee: Philip Zeyliger I noticed that when I ran some of these tests on Ubuntu 16.04 they are skipped: {noformat} $ impala-py.test tests/custom_cluster/test_client_ssl.py -k ecdh ... tests/custom_cluster/test_client_ssl.py::TestClientSsl::test_tls_ecdh[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: text/none] SKIPPED {noformat} I don't think this is intended. The logic in IMPALA-6990 looks backwards in that HAS_LEGACY_OPENSSL is a non-None integer (i.e. truthy) when the version field exists. Assigning to Phil since he reviewed the patch and probably has some context.
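Editor's note: the truthiness pitfall described in this report can be reproduced in a few lines. This sketch only mimics the pattern; it is not the actual Impala test-suite code, and the version threshold used is illustrative.

```python
# Minimal reproduction of the "backwards" skip guard: a flag that holds
# getattr(...) of an integer attribute is truthy exactly when the
# attribute exists, which on modern stacks is the opposite of "legacy".
import ssl

# Returns the integer OpenSSL version number when the attribute exists,
# None otherwise -- the integer is truthy, not an "is legacy" answer.
HAS_LEGACY_OPENSSL = getattr(ssl, "OPENSSL_VERSION_NUMBER", None)

# Backwards guard: True exactly when the version field exists, i.e. on
# modern Python/OpenSSL builds, so the tests get skipped there.
skip_tests = bool(HAS_LEGACY_OPENSSL)

# A guard that actually detects a legacy stack compares the version
# number (0x10001000 approximates OpenSSL 1.0.1; threshold illustrative).
is_legacy = HAS_LEGACY_OPENSSL is None or HAS_LEGACY_OPENSSL < 0x10001000
```

On any recent Python the buggy guard is True while the explicit version comparison is False, which matches the skip behavior observed above.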
[jira] [Updated] (IMPALA-7628) test_tls_ecdh failing on CentOS 6/Python 2.6
[ https://issues.apache.org/jira/browse/IMPALA-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-7628: -- Summary: test_tls_ecdh failing on CentOS 6/Python 2.6 (was: test_tls_edch failing on CentOS 6/Python 2.6) > test_tls_ecdh failing on CentOS 6/Python 2.6 > > > Key: IMPALA-7628 > URL: https://issues.apache.org/jira/browse/IMPALA-7628 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.1.0 > Environment: CentOS 6.4, Python 2.6 >Reporter: Tim Armstrong >Assignee: Thomas Tauber-Marshall >Priority: Blocker > > {noformat} > custom_cluster/test_client_ssl.py:125: in test_tls_ecdh > self._validate_positive_cases("%s/server-cert.pem" % self.CERT_DIR) > custom_cluster/test_client_ssl.py:198: in _validate_positive_cases > result = run_impala_shell_cmd(shell_options) > shell/util.py:97: in run_impala_shell_cmd > result.stderr) > E AssertionError: Cmd --ssl -q 'select 1 + 2' was expected to succeed: > Starting Impala Shell without Kerberos authentication > E SSL is enabled. Impala server certificates will NOT be verified (set > --ca_cert to change) > E > /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80: > DeprecationWarning: 3th positional argument is deprecated. Use keyward > argument insteand. > E DeprecationWarning) > E > /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80: > DeprecationWarning: 4th positional argument is deprecated. Use keyward > argument insteand. > E DeprecationWarning) > E > /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80: > DeprecationWarning: 5th positional argument is deprecated. Use keyward > argument insteand. 
> E DeprecationWarning) > E > /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:216: > DeprecationWarning: validate is deprecated. Use cert_reqs=ssl.CERT_NONE > instead > E DeprecationWarning) > E No handlers could be found for logger "thrift.transport.TSSLSocket" > E Error connecting: TTransportException, Could not connect to > localhost:21000: [Errno 1] _ssl.c:490: error:14094410:SSL > routines:SSL3_READ_BYTES:sslv3 alert handshake failure > E Not connected to Impala, could not execute queries. > {noformat} > Git hash is e38715e25297cc3643482be04e3b1b273e339b54 > I'm going to push out a temporary fix to unblock tests (since there are other > related tests skipped on this platform) but I'll let Thomas validate the > correctness of it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-7628) test_tls_edch failing on CentOS 6/Python 2.6
Tim Armstrong created IMPALA-7628: - Summary: test_tls_edch failing on CentOS 6/Python 2.6 Key: IMPALA-7628 URL: https://issues.apache.org/jira/browse/IMPALA-7628 Project: IMPALA Issue Type: Bug Components: Infrastructure Affects Versions: Impala 3.1.0 Environment: CentOS 6.4, Python 2.6 Reporter: Tim Armstrong Assignee: Thomas Tauber-Marshall {noformat} custom_cluster/test_client_ssl.py:125: in test_tls_ecdh self._validate_positive_cases("%s/server-cert.pem" % self.CERT_DIR) custom_cluster/test_client_ssl.py:198: in _validate_positive_cases result = run_impala_shell_cmd(shell_options) shell/util.py:97: in run_impala_shell_cmd result.stderr) E AssertionError: Cmd --ssl -q 'select 1 + 2' was expected to succeed: Starting Impala Shell without Kerberos authentication E SSL is enabled. Impala server certificates will NOT be verified (set --ca_cert to change) E /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80: DeprecationWarning: 3th positional argument is deprecated. Use keyward argument insteand. E DeprecationWarning) E /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80: DeprecationWarning: 4th positional argument is deprecated. Use keyward argument insteand. E DeprecationWarning) E /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80: DeprecationWarning: 5th positional argument is deprecated. Use keyward argument insteand. E DeprecationWarning) E /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:216: DeprecationWarning: validate is deprecated. 
Use cert_reqs=ssl.CERT_NONE instead E DeprecationWarning) E No handlers could be found for logger "thrift.transport.TSSLSocket" E Error connecting: TTransportException, Could not connect to localhost:21000: [Errno 1] _ssl.c:490: error:14094410:SSL routines:SSL3_READ_BYTES:sslv3 alert handshake failure E Not connected to Impala, could not execute queries. {noformat} Git hash is e38715e25297cc3643482be04e3b1b273e339b54 I'm going to push out a temporary fix to unblock tests (since there are other related tests skipped on this platform) but I'll let Thomas validate the correctness of it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-110) Add support for multiple distinct operators in the same query block
[ https://issues.apache.org/jira/browse/IMPALA-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628966#comment-16628966 ] Eric Campbell commented on IMPALA-110: -- Thank you for your email. I will be taking time off starting Wednesday 9/26/2018 and returning Friday, 9/28/2018 and will have limited access to my email. If you require immediate assistance, please contact Narmada Gomatam. > Add support for multiple distinct operators in the same query block > --- > > Key: IMPALA-110 > URL: https://issues.apache.org/jira/browse/IMPALA-110 > Project: IMPALA > Issue Type: New Feature > Components: Backend, Frontend >Affects Versions: Impala 0.5, Impala 1.4, Impala 2.0, Impala 2.2, Impala > 2.3.0 >Reporter: Greg Rahn >Assignee: Thomas Tauber-Marshall >Priority: Major > Labels: sql-language > > Impala only allows a single (DISTINCT columns) expression in each query. > {color:red}Note: > If you do not need precise accuracy, you can produce an estimate of the > distinct values for a column by specifying NDV(column); a query can contain > multiple instances of NDV(column). To make Impala automatically rewrite > COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query > option. > {color} > {code} > [impala:21000] > select count(distinct i_class_id) from item; > Query: select count(distinct i_class_id) from item > Query finished, fetching results ... 
> 16 > Returned 1 row(s) in 1.51s > {code} > {code} > [impala:21000] > select count(distinct i_class_id), count(distinct > i_brand_id) from item; > Query: select count(distinct i_class_id), count(distinct i_brand_id) from item > ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in > select count(distinct i_class_id), count(distinct i_brand_id) from item) > at > com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133) > at > com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:221) > at > com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:89) > Caused by: com.cloudera.impala.common.AnalysisException: all DISTINCT > aggregate functions need to have the same set of parameters as COUNT(DISTINCT > i_class_id); deviating function: COUNT(DISTINCT i_brand_id) > at > com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo(AggregateInfo.java:196) > at > com.cloudera.impala.analysis.AggregateInfo.create(AggregateInfo.java:143) > at > com.cloudera.impala.analysis.SelectStmt.createAggInfo(SelectStmt.java:466) > at > com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(SelectStmt.java:347) > at com.cloudera.impala.analysis.SelectStmt.analyze(SelectStmt.java:155) > at > com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:130) > ... 
2 more > {code} > Hive supports this: > {code} > $ hive -e "select count(distinct i_class_id), count(distinct i_brand_id) from > item;" > Logging initialized using configuration in > file:/etc/hive/conf.dist/hive-log4j.properties > Hive history file=/tmp/grahn/hive_job_log_grahn_201303052234_1625576708.txt > Total MapReduce jobs = 1 > Launching Job 1 out of 1 > Number of reduce tasks determined at compile time: 1 > In order to change the average load for a reducer (in bytes): > set hive.exec.reducers.bytes.per.reducer= > In order to limit the maximum number of reducers: > set hive.exec.reducers.max= > In order to set a constant number of reducers: > set mapred.reduce.tasks= > Starting Job = job_201302081514_0073, Tracking URL = > http://impala:50030/jobdetails.jsp?jobid=job_201302081514_0073 > Kill Command = /usr/lib/hadoop/bin/hadoop job > -Dmapred.job.tracker=m0525.mtv.cloudera.com:8021 -kill job_201302081514_0073 > Hadoop job information for Stage-1: number of mappers: 1; number of reducers: > 1 > 2013-03-05 22:34:43,255 Stage-1 map = 0%, reduce = 0% > 2013-03-05 22:34:49,323 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:50,337 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:51,351 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:52,360 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:53,370 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:54,379 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:55,389 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 8.58 sec > 2013-03-05 22:34:56,402 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 8.58 sec > 2013-03-05 22:34:57,413 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 8.58 sec > 2013-03-05 22:34:58,424 Stage-1 map = 100%, reduce =
[jira] [Commented] (IMPALA-110) Add support for multiple distinct operators in the same query block
[ https://issues.apache.org/jira/browse/IMPALA-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628963#comment-16628963 ] Ruslan Dautkhanov commented on IMPALA-110: -- Hooray!! Congrats - great work everyone involved. Will this be part of Impala 2.13 release / CDH 5.16? Thank you! > Add support for multiple distinct operators in the same query block > --- > > Key: IMPALA-110 > URL: https://issues.apache.org/jira/browse/IMPALA-110 > Project: IMPALA > Issue Type: New Feature > Components: Backend, Frontend >Affects Versions: Impala 0.5, Impala 1.4, Impala 2.0, Impala 2.2, Impala > 2.3.0 >Reporter: Greg Rahn >Assignee: Thomas Tauber-Marshall >Priority: Major > Labels: sql-language > > Impala only allows a single (DISTINCT columns) expression in each query. > {color:red}Note: > If you do not need precise accuracy, you can produce an estimate of the > distinct values for a column by specifying NDV(column); a query can contain > multiple instances of NDV(column). To make Impala automatically rewrite > COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query > option. > {color} > {code} > [impala:21000] > select count(distinct i_class_id) from item; > Query: select count(distinct i_class_id) from item > Query finished, fetching results ... 
> 16 > Returned 1 row(s) in 1.51s > {code} > {code} > [impala:21000] > select count(distinct i_class_id), count(distinct > i_brand_id) from item; > Query: select count(distinct i_class_id), count(distinct i_brand_id) from item > ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in > select count(distinct i_class_id), count(distinct i_brand_id) from item) > at > com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133) > at > com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:221) > at > com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:89) > Caused by: com.cloudera.impala.common.AnalysisException: all DISTINCT > aggregate functions need to have the same set of parameters as COUNT(DISTINCT > i_class_id); deviating function: COUNT(DISTINCT i_brand_id) > at > com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo(AggregateInfo.java:196) > at > com.cloudera.impala.analysis.AggregateInfo.create(AggregateInfo.java:143) > at > com.cloudera.impala.analysis.SelectStmt.createAggInfo(SelectStmt.java:466) > at > com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(SelectStmt.java:347) > at com.cloudera.impala.analysis.SelectStmt.analyze(SelectStmt.java:155) > at > com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:130) > ... 
2 more > {code} > Hive supports this: > {code} > $ hive -e "select count(distinct i_class_id), count(distinct i_brand_id) from > item;" > Logging initialized using configuration in > file:/etc/hive/conf.dist/hive-log4j.properties > Hive history file=/tmp/grahn/hive_job_log_grahn_201303052234_1625576708.txt > Total MapReduce jobs = 1 > Launching Job 1 out of 1 > Number of reduce tasks determined at compile time: 1 > In order to change the average load for a reducer (in bytes): > set hive.exec.reducers.bytes.per.reducer= > In order to limit the maximum number of reducers: > set hive.exec.reducers.max= > In order to set a constant number of reducers: > set mapred.reduce.tasks= > Starting Job = job_201302081514_0073, Tracking URL = > http://impala:50030/jobdetails.jsp?jobid=job_201302081514_0073 > Kill Command = /usr/lib/hadoop/bin/hadoop job > -Dmapred.job.tracker=m0525.mtv.cloudera.com:8021 -kill job_201302081514_0073 > Hadoop job information for Stage-1: number of mappers: 1; number of reducers: > 1 > 2013-03-05 22:34:43,255 Stage-1 map = 0%, reduce = 0% > 2013-03-05 22:34:49,323 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:50,337 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:51,351 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:52,360 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:53,370 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:54,379 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 > sec > 2013-03-05 22:34:55,389 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 8.58 sec > 2013-03-05 22:34:56,402 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 8.58 sec > 2013-03-05 22:34:57,413 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 8.58 sec > 2013-03-05 22:34:58,424 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 8.58 sec > MapReduce Total cumulative CPU time: 8 seconds 580 msec > Ended Job
[jira] [Commented] (IMPALA-7600) Mem limit exceeded in test_kudu_scan_mem_usage
[ https://issues.apache.org/jira/browse/IMPALA-7600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628945#comment-16628945 ] ASF subversion and git services commented on IMPALA-7600: - Commit ce145ffee6ee68a60c4ef663cb9f47f22d9eb19f in impala's branch refs/heads/master from [~tarmstr...@cloudera.com] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=ce145ff ] IMPALA-7600: bump mem_limit for test_kudu_scan_mem_usage The estimate for memory consumption for this scan is 9 columns * 384kb per column = 3.375mb. So if we set the mem_limit to 6.5mb, we should still not get more than one scanner thread, but we can avoid hitting out-of-memory. The issue in the JIRA was queued row batches. With this change, and num_scanner_threads=2, there should be max 12 row batches (10 in the queue, 2 in the scanner threads about to be enqueued) and based on the column stats I'd estimate that each row batch is around 200kb, so this change should provide significantly more headroom. 
Change-Id: I6d992cc076bc8678089f765bdffe92e877e9d229 Reviewed-on: http://gerrit.cloudera.org:8080/11513 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Mem limit exceeded in test_kudu_scan_mem_usage > -- > > Key: IMPALA-7600 > URL: https://issues.apache.org/jira/browse/IMPALA-7600 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.1.0 >Reporter: Thomas Tauber-Marshall >Assignee: Tim Armstrong >Priority: Blocker > Labels: broken-build, flaky > > Seen in an exhaustive release build: > {noformat} > 00:05:35 TestScanMemLimit.test_kudu_scan_mem_usage[exec_option: > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, > 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, > 'exec_single_node_rows_threshold': 0} | table_format: avro/snap/block] > 00:05:35 [gw6] linux2 -- Python 2.7.5 > /data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/bin/../infra/python/env/bin/python > 00:05:35 query_test/test_mem_usage_scaling.py:358: in test_kudu_scan_mem_usage > 00:05:35 self.run_test_case('QueryTest/kudu-scan-mem-usage', vector) > 00:05:35 common/impala_test_suite.py:408: in run_test_case > 00:05:35 result = self.__execute_query(target_impalad_client, query, > user=user) > 00:05:35 common/impala_test_suite.py:623: in __execute_query > 00:05:35 return impalad_client.execute(query, user=user) > 00:05:35 common/impala_connection.py:160: in execute > 00:05:35 return self.__beeswax_client.execute(sql_stmt, user=user) > 00:05:35 beeswax/impala_beeswax.py:176: in execute > 00:05:35 handle = self.__execute_query(query_string.strip(), user=user) > 00:05:35 beeswax/impala_beeswax.py:350: in __execute_query > 00:05:35 self.wait_for_finished(handle) > 00:05:35 beeswax/impala_beeswax.py:371: in wait_for_finished > 00:05:35 raise ImpalaBeeswaxException("Query aborted:" + error_log, None) > 00:05:35 E ImpalaBeeswaxException: ImpalaBeeswaxException: > 00:05:35 EQuery aborted:Memory 
limit exceeded: Error occurred on backend > impala-ec2-centos74-m5-4xlarge-ondemand-0e2c.vpc.cloudera.com:22000 by > fragment b34270820f59a0c9:a507139e0001 > 00:05:35 E Memory left in process limit: 10.12 GB > 00:05:35 E Memory left in query limit: -16.92 KB > 00:05:35 E Query(b34270820f59a0c9:a507139e): memory limit exceeded. > Limit=4.00 MB Reservation=0 ReservationLimit=0 OtherMemory=4.02 MB Total=4.02 > MB Peak=4.02 MB > 00:05:35 E Fragment b34270820f59a0c9:a507139e: Reservation=0 > OtherMemory=40.10 KB Total=40.10 KB Peak=340.00 KB > 00:05:35 E EXCHANGE_NODE (id=2): Reservation=32.00 KB OtherMemory=0 > Total=32.00 KB Peak=32.00 KB > 00:05:35 E KrpcDeferredRpcs: Total=0 Peak=0 > 00:05:35 E PLAN_ROOT_SINK: Total=0 Peak=0 > 00:05:35 E CodeGen: Total=103.00 B Peak=332.00 KB > 00:05:35 E Fragment b34270820f59a0c9:a507139e0001: Reservation=0 > OtherMemory=3.98 MB Total=3.98 MB Peak=3.98 MB > 00:05:35 E SORT_NODE (id=1): Total=342.00 KB Peak=342.00 KB > 00:05:35 E KUDU_SCAN_NODE (id=0): Total=3.63 MB Peak=3.63 MB > 00:05:35 E Queued Batches: Total=3.30 MB Peak=3.63 MB > 00:05:35 E KrpcDataStreamSender (dst_id=2): Total=1.16 KB Peak=1.16 KB > 00:05:35 E CodeGen: Total=3.66 KB Peak=1.14 MB > 00:05:35 E > 00:05:35 E Memory limit exceeded: Error occurred on backend > impala-ec2-centos74-m5-4xlarge-ondemand-0e2c.vpc.cloudera.com:22000 by > fragment b34270820f59a0c9:a507139e0001 > 00:05:35 E Memory left in process limit: 10.12 GB > 00:05:35 E Memory left in query limit: -16.92 KB > 00:05:35 E Query(b34270820f59a0c9:a507139e): memory limit exceeded. > Limit=4.00 MB
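Editor's note: the arithmetic in the commit message for this fix checks out; here is a quick sanity check. The numbers come from the message itself, and KB/MB are assumed to be binary units.

```python
# Sanity-check the mem_limit reasoning: one estimated Kudu scan fits
# under the new 6.5 MB limit, two do not, and the worst-case queued row
# batches still leave headroom.
KB = 1024
MB = 1024 * KB

scan_estimate = 9 * 384 * KB           # 9 columns * 384 KB per column
assert scan_estimate == 3.375 * MB     # matches the stated 3.375 MB

mem_limit = 6.5 * MB
# The limit admits one estimated scan but not a second scanner thread.
assert scan_estimate < mem_limit < 2 * scan_estimate

# Worst case of queued batches: 12 batches at roughly 200 KB each.
queued = 12 * 200 * KB                 # about 2.34 MB
assert scan_estimate + queued < mem_limit
```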
[jira] [Assigned] (IMPALA-7627) Parallel the fetching permission process
[ https://issues.apache.org/jira/browse/IMPALA-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peikai Zheng reassigned IMPALA-7627: Assignee: Peikai Zheng > Parallel the fetching permission process > > > Key: IMPALA-7627 > URL: https://issues.apache.org/jira/browse/IMPALA-7627 > Project: IMPALA > Issue Type: Improvement >Reporter: Peikai Zheng >Assignee: Peikai Zheng >Priority: Major > > There are three phases when the Catalogd loads the metadata of a table. > First, the Catalogd fetches the metadata from the Hive metastore; > then it fetches the permissions of each partition from the HDFS NameNode; > finally, it loads the file descriptors from the HDFS NameNode. > According to my test results: > ||Average Time (GetFileInfoThread=10)||phase 1||phase 2||phase 3||
> |idm.sauron_message|9.9917115|459.2106944|95.0179163|
> |default.revenue_enriched|12.3377474|111.2969046|40.827472|
> |default.upp_raw_prod|1.5143162|50.0251426|12.6805323|
> |default.hit_to_beacon_playback_prod|1.4294509|49.7670539|18.3557858|
> |default.sitetracking_enriched|13.0003804|112.8746656|42.1824032|
> |default.player_custom_event|9.2618705|493.4865302|116.4986184|
> |default.revenue_day_est|57.9116561|106.5028664|24.005822|
> The majority of the time is spent in the second phase, > so I suggest parallelizing the second phase.
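Editor's note: the proposal (parallelize phase 2) can be sketched in Python, though the real change would live in catalogd's Java code. `fetch_permission` here is a hypothetical stand-in for a NameNode RPC such as a getFileStatus-style call.

```python
# Sketch: fetch per-partition permissions concurrently instead of one
# at a time, with a bounded pool to keep NameNode load predictable
# (cf. the GetFileInfoThread=10 setting in the measurements above).
from concurrent.futures import ThreadPoolExecutor

def fetch_permission(partition_path):
    # Stand-in for asking the NameNode; returns (path, permission bits).
    return (partition_path, 0o755)

partitions = ["/warehouse/tbl/p=%d" % i for i in range(100)]

# Phase 2, parallelized: pool.map preserves input order and bounds the
# number of in-flight requests at max_workers.
with ThreadPoolExecutor(max_workers=10) as pool:
    perms = dict(pool.map(fetch_permission, partitions))
```

With RPC latency dominating each call, a pool of 10 workers would cut phase-2 wall time by close to 10x, which is roughly what the table above suggests is worth chasing.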
[jira] [Commented] (IMPALA-7624) test-with-docker sometimes hangs creating docker containers
[ https://issues.apache.org/jira/browse/IMPALA-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628286#comment-16628286 ] ASF subversion and git services commented on IMPALA-7624: - Commit 91673fee607b552f142c6ab2aad0e96efa9e0f80 in impala's branch refs/heads/master from [~philip] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=91673fe ] IMPALA-7624: Workaround docker/kernel bug causing test-with-docker to sometimes hang. I've observed that builds of test-with-docker that have "suite parallelism" sometimes hang when the Docker containers are being created. (The implementation had multiple threads calling "docker create" simultaneously.) Trolling the mailing lists, it's maybe a bug in Docker or the kernel. I've never caught it live enough to strace it. A hopeful workaround is to serialize the docker create calls, which is easy and harmless, given that "docker create" is usually pretty quick (subsecond) and the overall run time here is hours+. With this change, I was able to run test-with-docker with --suite-concurrency=6 on a c5.9xlarge in AWS, with a total runtime of 1h35m. The hangs are intermittent and cause, in the typical case, inconsistency in runtimes because less parallelism happens when one of the "docker create" calls hang. (I've seen them resume after one of the other containers finishes.) We'll find out with time whether this stabilizes it or has no effect. Change-Id: I3e44db7a6ce08a42d6fe574d7348332578cd9e51 Reviewed-on: http://gerrit.cloudera.org:8080/11481 Reviewed-by: Philip Zeyliger Tested-by: Impala Public Jenkins > test-with-docker sometimes hangs creating docker containers > --- > > Key: IMPALA-7624 > URL: https://issues.apache.org/jira/browse/IMPALA-7624 > Project: IMPALA > Issue Type: Task >Reporter: Philip Zeyliger >Priority: Major > > I've seen the test-with-docker executions hang, or sort of hang, in threads > doing {{docker create}}. 
I think this is ultimately a Docker or kernel bug, > but we can work around it by serializing our "docker create" invocations.
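Editor's note: the serialization workaround described in this commit can be sketched as one lock around container creation while the per-suite work still runs in parallel. `run_suite` and the suite names are illustrative, not the real test-with-docker script.

```python
# Serialize the hang-prone "docker create" step behind a single lock;
# the long-running suite execution still proceeds concurrently.
import threading

docker_create_lock = threading.Lock()
created = []

def run_suite(name):
    # Only one thread at a time performs container creation; in the
    # real script this would wrap the subprocess call to "docker create".
    with docker_create_lock:
        created.append(name)
    # ...long-running suite execution continues in parallel here...

threads = [threading.Thread(target=run_suite, args=("suite-%d" % i,))
           for i in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Since `docker create` is subsecond and each suite runs for hours, holding the lock for the creation step costs essentially nothing, which is why the commit calls the workaround "easy and harmless".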
[jira] [Commented] (IMPALA-7456) Deprecated file-based authorization
[ https://issues.apache.org/jira/browse/IMPALA-7456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628283#comment-16628283 ] ASF subversion and git services commented on IMPALA-7456: - Commit 48640b5dfa131ca0c7ae9e541e376d11ac6e6d33 in impala's branch refs/heads/master from [~aholley] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=48640b5 ] IMPALA-7456: Deprecate file-based authorization This patch simply adds a warning message to the log when the authorization_policy_file run-time flag is used. Sentry has deprecated the use of policy files and they do not support user level privileges which are required for object ownership. Here is the Jira where it will be removed. SENTRY-1922 Test: - Added custom cluster test to validate logs - Ran all custom cluster tests Change-Id: Ibbb13f3ef1c3a00812c180ecef022ea638c2ebc7 Reviewed-on: http://gerrit.cloudera.org:8080/11502 Reviewed-by: Fredy Wijaya Tested-by: Impala Public Jenkins > Deprecated file-based authorization > --- > > Key: IMPALA-7456 > URL: https://issues.apache.org/jira/browse/IMPALA-7456 > Project: IMPALA > Issue Type: Dependency upgrade > Components: Frontend >Affects Versions: Impala 3.0 >Reporter: Adam Holley >Assignee: Adam Holley >Priority: Major > Labels: security > Fix For: Impala 3.1.0 > > > Sentry has deprecated their support of file-based authorizations. Some newer > security features such as object ownership require user level authorizations > which the file-based security does not support.
[jira] [Commented] (IMPALA-2990) Coordinator should timeout a connection for an unresponsive backend
[ https://issues.apache.org/jira/browse/IMPALA-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628285#comment-16628285 ] ASF subversion and git services commented on IMPALA-2990: - Commit f46de21140f3bb483884fc49f5ded7afc466faac in impala's branch refs/heads/master from [~tarmstr...@cloudera.com] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=f46de21 ]

IMPALA-1760: Implement shutdown command

This is the same patch except with fixes for the test failures on EC and S3 noted in the JIRA. This allows graceful shutdown of executors and partially graceful shutdown of coordinators (new operations fail, old operations can continue).

Details:
* In order to allow future admin commands, this is implemented with function-like syntax and does not add any reserved words.
* ALL privilege is required on the server.
* The coordinator impalad that the client is connected to can be shut down directly with ":shutdown()".
* Remote shutdown of another impalad is supported, e.g. with ":shutdown('hostname')", so that non-coordinators can be shut down and for the convenience of the client, which does not have to connect to the specific impalad. There is no assumption that the other impalad is registered in the statestore; just that the coordinator can connect to the other daemon's thrift endpoint. This simplifies things and allows shutdown in various important cases, e.g. statestore down.
* The shutdown time limit can be overridden to force a quicker or slower shutdown by specifying a deadline in seconds after the statement is executed.
* If shutting down, a banner is shown on the root debug page.

Workflow:
1. (if a coordinator) clients are prevented from submitting queries to this coordinator via some out-of-band mechanism, e.g. load balancer
2. the shutdown process is started via ":shutdown()"
3. a bit is set in the statestore and propagated to coordinators, which stop scheduling fragment instances on this daemon (if an executor)
4. the query startup grace period (which is ideally set to the AC queueing delay plus some additional leeway) expires
5. once the daemon is quiesced (i.e. no fragments, no registered queries), it shuts itself down
6. if the daemon does not successfully quiesce (e.g. rogue clients, long-running queries), after a longer timeout (counted from the start of the shutdown process) it will shut down anyway

What this does:
* Executors can be shut down without causing a service-wide outage.
* Shutting down an executor will not disrupt any short-running queries and will wait for long-running queries up to a threshold.
* Coordinators can be shut down without query failures only if there is an out-of-band mechanism to prevent submission of more queries to the shut-down coordinator. If queries are submitted to a coordinator after shutdown has started, they will fail.
* Long-running queries or other issues (e.g. stuck fragments) will slow down but not prevent eventual shutdown.

Limitations:
* The startup grace period needs to be configured to be greater than the latency of statestore updates + scheduling + admission + coordinator startup. Otherwise a coordinator may send a fragment instance to the shutting-down impalad. (We could automate this configuration as a follow-on.)
* The startup grace period means a minimum latency for shutdown, even if the cluster is idle.
* We depend on the statestore detecting the process going down if queries are still running on that backend when the timeout expires. This may still be subject to existing problems, e.g. IMPALA-2990.

Tests:
* Added parser, analysis and authorization tests.
* End-to-end test of shutting down impalads.
* End-to-end test of shutting down then restarting an executor while queries are running.
* End-to-end test of shutting down a coordinator:
  - New queries cannot be started on coord, existing queries continue to run
  - Exercises various Beeswax and HS2 operations
Change-Id: I8f3679ef442745a60a0ab97c4e9eac437aef9463 Reviewed-on: http://gerrit.cloudera.org:8080/11484 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins

> Coordinator should timeout a connection for an unresponsive backend
> ---
>
> Key: IMPALA-2990
> URL: https://issues.apache.org/jira/browse/IMPALA-2990
> Project: IMPALA
> Issue Type: Bug
> Components: Distributed Exec
> Affects Versions: Impala 2.3.0
> Reporter: Sailesh Mukil
> Assignee: Michael Ho
> Priority: Critical
> Labels: hang, observability, supportability
>
> The coordinator currently waits indefinitely if it does not hear back from a
> backend. This could cause a query to hang indefinitely in case of a network
> error, etc.
> We should add logic for determining
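The syntax described in the commit message can be sketched as follows; the host name and deadline value are illustrative, and the deadline form is an assumption based on the commit's description of an overridable time limit:

```sql
-- Shut down the coordinator impalad this client is connected to:
:shutdown();

-- Remotely shut down another impalad (example host):
:shutdown('host1.example.com');

-- Override the shutdown deadline, in seconds (assumed form):
:shutdown(3600);
```

Per the workflow above, the daemon first quiesces (no new fragment instances are scheduled on it) and only shuts down once running work drains or the deadline expires.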
[jira] [Commented] (IMPALA-7546) Impala 3.1 Doc: Doc the new query option TIMEZONE
[ https://issues.apache.org/jira/browse/IMPALA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628281#comment-16628281 ] ASF subversion and git services commented on IMPALA-7546: - Commit 17bc980d9540b29a1667841b7bffc2084204ac35 in impala's branch refs/heads/master from [~arodoni_cloudera] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=17bc980 ]

IMPALA-7546: [DOCS] A new TIMEZONE query option

Documented the new TIMEZONE query option, which sets the time zone to be used in timestamp conversions.

Change-Id: I734b8b37ae2360422fce269ed87507a04e8c05ac Reviewed-on: http://gerrit.cloudera.org:8080/11505 Tested-by: Impala Public Jenkins Reviewed-by: Csaba Ringhofer

> Impala 3.1 Doc: Doc the new query option TIMEZONE
> -
>
> Key: IMPALA-7546
> URL: https://issues.apache.org/jira/browse/IMPALA-7546
> Project: IMPALA
> Issue Type: Sub-task
> Components: Docs
> Reporter: Alex Rodoni
> Assignee: Alex Rodoni
> Priority: Major
> Labels: future_release_doc
> Fix For: Impala 3.1.0
>
> https://gerrit.cloudera.org/#/c/11505/
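A minimal usage sketch of the documented option (the zone name is an example):

```sql
-- Use the given time zone for timestamp conversions in this session:
SET TIMEZONE='America/Los_Angeles';
```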
[jira] [Commented] (IMPALA-1760) Add decommissioning support / graceful shutdown / quiesce
[ https://issues.apache.org/jira/browse/IMPALA-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628284#comment-16628284 ] ASF subversion and git services commented on IMPALA-1760: - Commit f46de21140f3bb483884fc49f5ded7afc466faac in impala's branch refs/heads/master from [~tarmstr...@cloudera.com] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=f46de21 ]

IMPALA-1760: Implement shutdown command

This is the same patch except with fixes for the test failures on EC and S3 noted in the JIRA. This allows graceful shutdown of executors and partially graceful shutdown of coordinators (new operations fail, old operations can continue).

Details:
* In order to allow future admin commands, this is implemented with function-like syntax and does not add any reserved words.
* ALL privilege is required on the server.
* The coordinator impalad that the client is connected to can be shut down directly with ":shutdown()".
* Remote shutdown of another impalad is supported, e.g. with ":shutdown('hostname')", so that non-coordinators can be shut down and for the convenience of the client, which does not have to connect to the specific impalad. There is no assumption that the other impalad is registered in the statestore; just that the coordinator can connect to the other daemon's thrift endpoint. This simplifies things and allows shutdown in various important cases, e.g. statestore down.
* The shutdown time limit can be overridden to force a quicker or slower shutdown by specifying a deadline in seconds after the statement is executed.
* If shutting down, a banner is shown on the root debug page.

Workflow:
1. (if a coordinator) clients are prevented from submitting queries to this coordinator via some out-of-band mechanism, e.g. load balancer
2. the shutdown process is started via ":shutdown()"
3. a bit is set in the statestore and propagated to coordinators, which stop scheduling fragment instances on this daemon (if an executor)
4. the query startup grace period (which is ideally set to the AC queueing delay plus some additional leeway) expires
5. once the daemon is quiesced (i.e. no fragments, no registered queries), it shuts itself down
6. if the daemon does not successfully quiesce (e.g. rogue clients, long-running queries), after a longer timeout (counted from the start of the shutdown process) it will shut down anyway

What this does:
* Executors can be shut down without causing a service-wide outage.
* Shutting down an executor will not disrupt any short-running queries and will wait for long-running queries up to a threshold.
* Coordinators can be shut down without query failures only if there is an out-of-band mechanism to prevent submission of more queries to the shut-down coordinator. If queries are submitted to a coordinator after shutdown has started, they will fail.
* Long-running queries or other issues (e.g. stuck fragments) will slow down but not prevent eventual shutdown.

Limitations:
* The startup grace period needs to be configured to be greater than the latency of statestore updates + scheduling + admission + coordinator startup. Otherwise a coordinator may send a fragment instance to the shutting-down impalad. (We could automate this configuration as a follow-on.)
* The startup grace period means a minimum latency for shutdown, even if the cluster is idle.
* We depend on the statestore detecting the process going down if queries are still running on that backend when the timeout expires. This may still be subject to existing problems, e.g. IMPALA-2990.

Tests:
* Added parser, analysis and authorization tests.
* End-to-end test of shutting down impalads.
* End-to-end test of shutting down then restarting an executor while queries are running.
* End-to-end test of shutting down a coordinator:
  - New queries cannot be started on coord, existing queries continue to run
  - Exercises various Beeswax and HS2 operations
Change-Id: I8f3679ef442745a60a0ab97c4e9eac437aef9463 Reviewed-on: http://gerrit.cloudera.org:8080/11484 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins

> Add decommissioning support / graceful shutdown / quiesce
> ---
>
> Key: IMPALA-1760
> URL: https://issues.apache.org/jira/browse/IMPALA-1760
> Project: IMPALA
> Issue Type: New Feature
> Components: Distributed Exec
> Affects Versions: Impala 2.1.1
> Reporter: Henry Robinson
> Assignee: Tim Armstrong
> Priority: Critical
> Labels: resource-management, scalability, scheduler, usability
>
> In larger clusters, node maintenance is a frequent occurrence. There's no way
> currently to stop an Impala node without failing running queries, without
> draining queries across the whole
[jira] [Commented] (IMPALA-110) Add support for multiple distinct operators in the same query block
[ https://issues.apache.org/jira/browse/IMPALA-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628287#comment-16628287 ] ASF subversion and git services commented on IMPALA-110: Commit df53ec2385190bba2b3cefb43b094cde6d33642f in impala's branch refs/heads/master from [~twmarshall] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=df53ec2 ]

IMPALA-110: Support for multiple DISTINCT

This patch adds support for having multiple aggregate functions in a single SELECT block that use DISTINCT over different sets of columns.

Planner design:
- The existing tree-based plan shape with a two-phased aggregation is maintained.
- Existing plans are not changed.
- Aggregates are grouped into 'aggregation classes' based on their expressions in the distinct portion, which may be empty for non-distinct aggregates.
- The aggregation framework is generalized to simultaneously process multiple aggregation classes within the tree-based plan. This process splits the results of different aggregation classes into separate rows, so a final aggregation is needed to transpose the results into the desired form.
- Main challenge: each aggregation class consumes and produces different tuples, so conceptually a union type of tuples flows through the runtime. The tuple union is represented by a TupleRow with one tuple per aggregation class. Only one tuple in such a TupleRow is non-NULL.
- Backend exec nodes in the aggregation plan will be aware of this tuple union either explicitly in their implementation or by relying on expressions that distinguish the aggregation classes.
- To distinguish the aggregation classes, e.g. in hash exchanges, CASE expressions are crafted to hash/group on the appropriate slots.

Deferred FE work:
- Beautify/condense the long CASE exprs
- Push applicable conjuncts into individual aggregators before the transposition step
- Added a few testing TODOs to reduce the size of this patch
- Decide whether we want to change existing plans to the new model

Execution design:
- Previous patches separated out aggregation logic from the exec node into Aggregators. This is extended to support multiple Aggregators per node, with different grouping and aggregating functions.
- There is a fast path for aggregations with only one aggregator, which leaves the execution essentially unchanged from before.
- When there are multiple aggregators, the first aggregation node in the plan replicates its input to each aggregator. The output of this step is rows where only a single tuple is non-null, corresponding to the aggregator that produced the row.
- A new expr is introduced, ValidTupleId, which takes one of these rows and returns which tuple is non-null.
- For additional aggregation nodes, the input is split apart into 'mini-batches' according to which aggregator the row corresponds to.

Testing:
- Added analyzer and planner tests
- Added end-to-end query tests
- Ran hdfs/core tests
- Added support in the query generator and ran in a loop

Change-Id: I055402eaef6d81e5f70e850d9f8a621e766830a4 Reviewed-on: http://gerrit.cloudera.org:8080/10771 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins

> Add support for multiple distinct operators in the same query block
> ---
>
> Key: IMPALA-110
> URL: https://issues.apache.org/jira/browse/IMPALA-110
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend, Frontend
> Affects Versions: Impala 0.5, Impala 1.4, Impala 2.0, Impala 2.2, Impala 2.3.0
> Reporter: Greg Rahn
> Assignee: Thomas Tauber-Marshall
> Priority: Major
> Labels: sql-language
>
> Impala only allows a single (DISTINCT columns) expression in each query.
> {color:red}Note:
> If you do not need precise accuracy, you can produce an estimate of the
> distinct values for a column by specifying NDV(column); a query can contain
> multiple instances of NDV(column). To make Impala automatically rewrite
> COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query
> option.
> {color}
> {code}
> [impala:21000] > select count(distinct i_class_id) from item;
> Query: select count(distinct i_class_id) from item
> Query finished, fetching results ...
> 16
> Returned 1 row(s) in 1.51s
> {code}
> {code}
> [impala:21000] > select count(distinct i_class_id), count(distinct i_brand_id) from item;
> Query: select count(distinct i_class_id), count(distinct i_brand_id) from item
> ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in
> select count(distinct i_class_id), count(distinct i_brand_id) from item)
> at
> com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133)
> at
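With this patch, the statement that fails in the transcript above becomes valid; a sketch using the same example schema:

```sql
-- Multiple DISTINCT aggregates over different columns in one SELECT block:
SELECT COUNT(DISTINCT i_class_id),
       COUNT(DISTINCT i_brand_id)
FROM item;
```

Internally each DISTINCT expression set forms its own aggregation class, and a final aggregation transposes the per-class rows into the single result row.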
[jira] [Commented] (IMPALA-7537) REVOKE GRANT OPTION regression
[ https://issues.apache.org/jira/browse/IMPALA-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628282#comment-16628282 ] ASF subversion and git services commented on IMPALA-7537: - Commit c5dc6ded68c62f9f2138ab3376531c6292d1df78 in impala's branch refs/heads/master from [~aholley] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=c5dc6de ]

IMPALA-7537: REVOKE GRANT OPTION regression

This patch fixes several issues around granting and revoking of privileges, including:
- REVOKE ALL ON SERVER, where the privilege has the grant option, was removing the privilege from the cache but not from Sentry.
- With the addition of the grant option to the name in the catalog object, refactoring was required to make grants and revokes work correctly.

Assertions with regard to granting and revoking:
- If there is a privilege that has the grant option, that privilege can be revoked simply with "REVOKE privilege..." or the grant option can be removed with "REVOKE GRANT OPTION ON...".
- We should not limit the privilege being revoked simply because it has the grant option.
- If a privilege already exists without the grant option, granting the privilege with the grant option should add the grant option to it.
- If a privilege already exists with the grant option, granting the privilege without the grant option will not change anything, as the expectation is that if you want to remove the grant option, you should explicitly use "REVOKE GRANT OPTION ON...".

Testing:
- Added new grant/revoke tests that validate cache and Sentry refresh
- Ran all FE, E2E, and custom-cluster tests
Change-Id: I3be5c8f15e9bc53e9661347578832bf446abaedc Reviewed-on: http://gerrit.cloudera.org:8080/11483 Reviewed-by: Fredy Wijaya Tested-by: Impala Public Jenkins

> REVOKE GRANT OPTION regression
> --
>
> Key: IMPALA-7537
> URL: https://issues.apache.org/jira/browse/IMPALA-7537
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.1.0
> Reporter: Adam Holley
> Assignee: Adam Holley
> Priority: Major
> Fix For: Impala 3.1.0
>
> Recent commit ec88aa2 added 'grantoption' to the privilege name. This name
> is used by the catalog cache, which broke "revoke grant option" since the
> privilege names no longer match.
> [localhost:21000] default> create role foo_role;
> [localhost:21000] default> grant all on server to foo_role with grant option;
> [localhost:21000] default> revoke grant option for all on server from foo_role;
> ERROR: IllegalStateException: null
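The grant/revoke semantics asserted in the commit message can be sketched with statements like these (role name taken from the reproduction in the description):

```sql
CREATE ROLE foo_role;
GRANT ALL ON SERVER TO foo_role WITH GRANT OPTION;

-- Remove only the grant option; the underlying privilege remains:
REVOKE GRANT OPTION FOR ALL ON SERVER FROM foo_role;

-- Or revoke the privilege outright, grant option and all:
REVOKE ALL ON SERVER FROM foo_role;
```

Before this fix, the REVOKE GRANT OPTION statement failed with the IllegalStateException shown above because the cached privilege name (with 'grantoption' embedded) no longer matched.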