[jira] [Resolved] (IMPALA-7616) Refactor PrincipalPrivilege.buildPrivilegeName

2018-09-26 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya resolved IMPALA-7616.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Refactor PrincipalPrivilege.buildPrivilegeName
> --
>
> Key: IMPALA-7616
> URL: https://issues.apache.org/jira/browse/IMPALA-7616
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Adam Holley
>Assignee: Fredy Wijaya
>Priority: Minor
> Fix For: Impala 3.1.0
>
>
> The buildPrivilegeName pattern across the frontend code is odd in that the 
> name is set by an explicit function call rather than built from the 
> constituent parts when it is read. For example, if you create a privilege 
> without the grant option set and then set the grant option afterward, 
> getPrivilegeName() will return a name that does not reflect the grant option. 
> This should be refactored so that getPrivilegeName() builds the name from the 
> current values in the Privilege object.
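> A minimal sketch of the proposed direction (the field names and the name 
> format here are hypothetical, not the actual PrincipalPrivilege members):
> {code:java}
> // Hypothetical illustration: derive the name on every call instead of
> // caching a value that can go stale when a field changes after construction.
> public class PrincipalPrivilege {
>   private String scope_;       // e.g. "server", "database", "table"
>   private String dbName_;
>   private String tableName_;
>   private boolean hasGrantOpt_;
>
>   public void setGrantOption(boolean hasGrantOpt) { hasGrantOpt_ = hasGrantOpt; }
>
>   // Builds the privilege name from the current field values, so a grant
>   // option set after construction is always reflected in the name.
>   public String getPrivilegeName() {
>     StringBuilder sb = new StringBuilder(scope_);
>     if (dbName_ != null) sb.append("->db=").append(dbName_);
>     if (tableName_ != null) sb.append("->table=").append(tableName_);
>     sb.append("->grantoption=").append(hasGrantOpt_);
>     return sb.toString().toLowerCase();
>   }
> }
> {code}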






[jira] [Updated] (IMPALA-7593) test_automatic_invalidation failing in S3

2018-09-26 Thread Tianyi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianyi Wang updated IMPALA-7593:

Fix Version/s: Impala 3.1.0

> test_automatic_invalidation failing in S3
> -
>
> Key: IMPALA-7593
> URL: https://issues.apache.org/jira/browse/IMPALA-7593
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Tianyi Wang
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.1.0
>
>
> Note that the build has the fix for IMPALA-7580
> {noformat}
> 04:59:01 ___ TestAutomaticCatalogInvalidation.test_v1_catalog 
> ___
> 04:59:01 custom_cluster/test_automatic_invalidation.py:63: in test_v1_catalog
> 04:59:01 self._run_test(cursor)
> 04:59:01 custom_cluster/test_automatic_invalidation.py:58: in _run_test
> 04:59:01 assert time.time() < timeout
> 04:59:01 E   assert 1537355634.805718 < 1537355634.394429
> 04:59:01 E    +  where 1537355634.805718 = <built-in function time>()
> 04:59:01 E    +    where <built-in function time> = time.time
> 04:59:01  Captured stderr setup 
> -
> 04:59:01 -- 2018-09-19 04:13:22,796 INFO MainThread: Starting cluster 
> with command: 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/bin/start-impala-cluster.py
>  --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 '--impalad_args="--invalidate_tables_timeout_s=20" ' 
> '--state_store_args="--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50" ' 
> '--catalogd_args="--invalidate_tables_timeout_s=20" '
> 04:59:01 04:13:23 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 04:59:01 04:13:23 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 04:59:01 04:13:24 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 04:59:01 04:13:25 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 04:59:01 04:13:26 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 04:59:01 04:13:29 MainThread: Found 3 impalad/1 statestored/1 catalogd 
> process(es)
> 04:59:01 04:13:29 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:29 MainThread: Waiting for num_known_live_backends=3. Current 
> value: 0
> 04:59:01 04:13:30 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:30 MainThread: Waiting for num_known_live_backends=3. Current 
> value: 1
> 04:59:01 04:13:31 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:31 MainThread: Waiting for num_known_live_backends=3. Current 
> value: 2
> 04:59:01 04:13:32 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:32 MainThread: num_known_live_backends has reached value: 3
> 04:59:01 04:13:32 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25001
> 04:59:01 04:13:32 MainThread: num_known_live_backends has reached value: 3
> 04:59:01 04:13:32 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25002
> 04:59:01 04:13:32 MainThread: num_known_live_backends has reached value: 3
> 04:59:01 04:13:32 MainThread: Impala Cluster Running with 3 nodes (3 
> coordinators, 3 executors).
> 04:59:01 -- 2018-09-19 04:13:33,034 INFO MainThread: Found 3 impalad/1 
> statestored/1 catalogd process(es)
> 04:59:01 -- 2018-09-19 04:13:33,034 INFO MainThread: Getting metric: 
> statestore.live-backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25010
> 04:59:01 -- 2018-09-19 04:13:33,036 INFO MainThread: Metric 
> 'statestore.live-backends' has reached desired value: 4
> 04:59:01 -- 2018-09-19 04:13:33,036 INFO MainThread: Getting 
> num_known_live_backends from 
> 

[jira] [Updated] (IMPALA-7593) test_automatic_invalidation failing in S3

2018-09-26 Thread Tianyi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianyi Wang updated IMPALA-7593:

Component/s: Infrastructure

> test_automatic_invalidation failing in S3
> -
>
> Key: IMPALA-7593
> URL: https://issues.apache.org/jira/browse/IMPALA-7593
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Tianyi Wang
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.1.0
>
>
> Note that the build has the fix for IMPALA-7580
> {noformat}
> 04:59:01 ___ TestAutomaticCatalogInvalidation.test_v1_catalog 
> ___
> 04:59:01 custom_cluster/test_automatic_invalidation.py:63: in test_v1_catalog
> 04:59:01 self._run_test(cursor)
> 04:59:01 custom_cluster/test_automatic_invalidation.py:58: in _run_test
> 04:59:01 assert time.time() < timeout
> 04:59:01 E   assert 1537355634.805718 < 1537355634.394429
> 04:59:01 E    +  where 1537355634.805718 = <built-in function time>()
> 04:59:01 E    +    where <built-in function time> = time.time
> 04:59:01  Captured stderr setup 
> -
> 04:59:01 -- 2018-09-19 04:13:22,796 INFO MainThread: Starting cluster 
> with command: 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/bin/start-impala-cluster.py
>  --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 '--impalad_args="--invalidate_tables_timeout_s=20" ' 
> '--state_store_args="--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50" ' 
> '--catalogd_args="--invalidate_tables_timeout_s=20" '
> 04:59:01 04:13:23 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 04:59:01 04:13:23 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 04:59:01 04:13:24 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 04:59:01 04:13:25 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 04:59:01 04:13:26 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 04:59:01 04:13:29 MainThread: Found 3 impalad/1 statestored/1 catalogd 
> process(es)
> 04:59:01 04:13:29 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:29 MainThread: Waiting for num_known_live_backends=3. Current 
> value: 0
> 04:59:01 04:13:30 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:30 MainThread: Waiting for num_known_live_backends=3. Current 
> value: 1
> 04:59:01 04:13:31 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:31 MainThread: Waiting for num_known_live_backends=3. Current 
> value: 2
> 04:59:01 04:13:32 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:32 MainThread: num_known_live_backends has reached value: 3
> 04:59:01 04:13:32 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25001
> 04:59:01 04:13:32 MainThread: num_known_live_backends has reached value: 3
> 04:59:01 04:13:32 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25002
> 04:59:01 04:13:32 MainThread: num_known_live_backends has reached value: 3
> 04:59:01 04:13:32 MainThread: Impala Cluster Running with 3 nodes (3 
> coordinators, 3 executors).
> 04:59:01 -- 2018-09-19 04:13:33,034 INFO MainThread: Found 3 impalad/1 
> statestored/1 catalogd process(es)
> 04:59:01 -- 2018-09-19 04:13:33,034 INFO MainThread: Getting metric: 
> statestore.live-backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25010
> 04:59:01 -- 2018-09-19 04:13:33,036 INFO MainThread: Metric 
> 'statestore.live-backends' has reached desired value: 4
> 04:59:01 -- 2018-09-19 04:13:33,036 INFO MainThread: Getting 
> num_known_live_backends from 
> 

[jira] [Resolved] (IMPALA-7593) test_automatic_invalidation failing in S3

2018-09-26 Thread Tianyi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianyi Wang resolved IMPALA-7593.
-
Resolution: Fixed

> test_automatic_invalidation failing in S3
> -
>
> Key: IMPALA-7593
> URL: https://issues.apache.org/jira/browse/IMPALA-7593
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Tianyi Wang
>Priority: Blocker
>  Labels: broken-build
>
> Note that the build has the fix for IMPALA-7580
> {noformat}
> 04:59:01 ___ TestAutomaticCatalogInvalidation.test_v1_catalog 
> ___
> 04:59:01 custom_cluster/test_automatic_invalidation.py:63: in test_v1_catalog
> 04:59:01 self._run_test(cursor)
> 04:59:01 custom_cluster/test_automatic_invalidation.py:58: in _run_test
> 04:59:01 assert time.time() < timeout
> 04:59:01 E   assert 1537355634.805718 < 1537355634.394429
> 04:59:01 E    +  where 1537355634.805718 = <built-in function time>()
> 04:59:01 E    +    where <built-in function time> = time.time
> 04:59:01  Captured stderr setup 
> -
> 04:59:01 -- 2018-09-19 04:13:22,796 INFO MainThread: Starting cluster 
> with command: 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/bin/start-impala-cluster.py
>  --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 '--impalad_args="--invalidate_tables_timeout_s=20" ' 
> '--state_store_args="--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50" ' 
> '--catalogd_args="--invalidate_tables_timeout_s=20" '
> 04:59:01 04:13:23 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 04:59:01 04:13:23 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 04:59:01 04:13:24 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 04:59:01 04:13:25 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 04:59:01 04:13:26 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 04:59:01 04:13:29 MainThread: Found 3 impalad/1 statestored/1 catalogd 
> process(es)
> 04:59:01 04:13:29 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:29 MainThread: Waiting for num_known_live_backends=3. Current 
> value: 0
> 04:59:01 04:13:30 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:30 MainThread: Waiting for num_known_live_backends=3. Current 
> value: 1
> 04:59:01 04:13:31 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:31 MainThread: Waiting for num_known_live_backends=3. Current 
> value: 2
> 04:59:01 04:13:32 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:32 MainThread: num_known_live_backends has reached value: 3
> 04:59:01 04:13:32 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25001
> 04:59:01 04:13:32 MainThread: num_known_live_backends has reached value: 3
> 04:59:01 04:13:32 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25002
> 04:59:01 04:13:32 MainThread: num_known_live_backends has reached value: 3
> 04:59:01 04:13:32 MainThread: Impala Cluster Running with 3 nodes (3 
> coordinators, 3 executors).
> 04:59:01 -- 2018-09-19 04:13:33,034 INFO MainThread: Found 3 impalad/1 
> statestored/1 catalogd process(es)
> 04:59:01 -- 2018-09-19 04:13:33,034 INFO MainThread: Getting metric: 
> statestore.live-backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25010
> 04:59:01 -- 2018-09-19 04:13:33,036 INFO MainThread: Metric 
> 'statestore.live-backends' has reached desired value: 4
> 04:59:01 -- 2018-09-19 04:13:33,036 INFO MainThread: Getting 
> num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 -- 2018-09-19 

[jira] [Updated] (IMPALA-7485) test_spilling_naaj hung on jenkins

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7485:
--
Priority: Critical  (was: Major)

> test_spilling_naaj hung on jenkins
> --
>
> Key: IMPALA-7485
> URL: https://issues.apache.org/jira/browse/IMPALA-7485
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Csaba Ringhofer
>Priority: Critical
>  Labels: broken-build, flaky, flaky-test
> Attachments: resolved_stacks.zip
>
>
> {code}
> query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_naaj[exec_option:
>  {'debug_action': None, 'default_spillable_buffer_size': '256k'} | 
> table_format: parquet/none]  
> {code}
> seemed to be hung (it had been running for more than 4 hours); see 
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/3055/console
> Core dumps and stack traces of impalad were created and the impalad was 
> killed. The tests continued without failures after impalad was restarted.






[jira] [Updated] (IMPALA-7485) test_spilling_naaj hung on jenkins

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7485:
--
Labels: broken-build flaky flaky-test  (was: flaky-test)

> test_spilling_naaj hung on jenkins
> --
>
> Key: IMPALA-7485
> URL: https://issues.apache.org/jira/browse/IMPALA-7485
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Csaba Ringhofer
>Priority: Major
>  Labels: broken-build, flaky, flaky-test
> Attachments: resolved_stacks.zip
>
>
> {code}
> query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_naaj[exec_option:
>  {'debug_action': None, 'default_spillable_buffer_size': '256k'} | 
> table_format: parquet/none]  
> {code}
> seemed to be hung (it had been running for more than 4 hours); see 
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/3055/console
> Core dumps and stack traces of impalad were created and the impalad was 
> killed. The tests continued without failures after impalad was restarted.






[jira] [Updated] (IMPALA-7009) test_drop_table_with_purge fails on Isilon

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7009:
--
Target Version: Product Backlog
  Priority: Major  (was: Critical)

> test_drop_table_with_purge fails on Isilon
> --
>
> Key: IMPALA-7009
> URL: https://issues.apache.org/jira/browse/IMPALA-7009
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.0, Impala 2.13.0
>Reporter: Sailesh Mukil
>Priority: Major
>  Labels: broken-build, flaky
>
> We've seen multiple failures of test_drop_table_with_purge
> {code:java}
> metadata.test_ddl.TestDdlStatements.test_drop_table_with_purge (from pytest)
> Failing for the past 1 build (Since Failed#22 )
> Took 18 sec.
> Error Message
> metadata/test_ddl.py:72: in test_drop_table_with_purge: assert not 
> self.filesystem_client.exists('user/jenkins/.Trash/Current/test-warehouse/test_drop_table_with_purge_58c75c18.db/t2')
> Stacktrace
> metadata/test_ddl.py:72: in test_drop_table_with_purge
> assert not self.filesystem_client.exists(\
> E   assert not True
> E    +  where True = <bound method ...exists of <... object at 0x5fe1210>>('user/jenkins/.Trash/Current/test-warehouse/test_drop_table_with_purge_58c75c18.db/t2')
> E    +    where <bound method ...exists of <... object at 0x5fe1210>> = <... object at 0x5fe1210>.exists
> E    +      where <... object at 0x5fe1210> = <... object at 0x5fe1110>.filesystem_client
> E    +    and   'user/jenkins/.Trash/Current/test-warehouse/test_drop_table_with_purge_58c75c18.db/t2' = <built-in method format of str object at 0x3eba3f8>('jenkins', 'test_drop_table_with_purge_58c75c18')
> E    +      where <built-in method format of str object at 0x3eba3f8> = 'user/{0}/.Trash/Current/test-warehouse/{1}.db/t2'.format
> E    +      and   'jenkins' = <function getuser at 0x1c08c80>()
> E    +        where <function getuser at 0x1c08c80> = getpass.getuser
> Standard Error
> -- connecting to: localhost:21000
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_drop_table_with_purge_58c75c18` CASCADE;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_drop_table_with_purge_58c75c18`;
> MainThread: Created database "test_drop_table_with_purge_58c75c18" for test 
> ID "metadata/test_ddl.py::TestDdlStatements::()::test_drop_table_with_purge"
> -- executing against localhost:21000
> create table test_drop_table_with_purge_58c75c18.t1(i int);
> -- executing against localhost:21000
> create table test_drop_table_with_purge_58c75c18.t2(i int);
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> -- executing against localhost:21000
> drop table test_drop_table_with_purge_58c75c18.t1;
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> -- executing against localhost:21000
> drop table test_drop_table_with_purge_58c75c18.t2 purge;
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> MainThread: Starting new HTTP connection (1): 10.17.95.12
> {code}






[jira] [Resolved] (IMPALA-7226) test_compute_stats failed with "Unable to open Kudu table"

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7226.
---
Resolution: Cannot Reproduce

I don't think this is actionable right now. Reopen if it reoccurs.

> test_compute_stats failed with "Unable to open Kudu table"
> --
>
> Key: IMPALA-7226
> URL: https://issues.apache.org/jira/browse/IMPALA-7226
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Thomas Tauber-Marshall
>Priority: Major
>  Labels: broken-build, flaky, kudu
>
> https://jenkins.impala.io/job/gerrit-verify-dryrun/2757
> {noformat}
> 20:09:25 ] === FAILURES 
> ===
> 20:09:25 ]  TestMtDop.test_compute_stats[mt_dop: 2 | exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: kudu/none] 
> 20:09:25 ] [gw3] linux2 -- Python 2.7.12 
> /home/ubuntu/Impala/bin/../infra/python/env/bin/python
> 20:09:25 ] query_test/test_mt_dop.py:76: in test_compute_stats
> 20:09:25 ] vector.get_value('exec_option'))
> 20:09:25 ] common/impala_test_suite.py:528: in wrapper
> 20:09:25 ] return function(*args, **kwargs)
> 20:09:25 ] common/impala_test_suite.py:553: in execute_query
> 20:09:25 ] return self.__execute_query(self.client, query, query_options)
> 20:09:25 ] common/impala_test_suite.py:620: in __execute_query
> 20:09:25 ] return impalad_client.execute(query, user=user)
> 20:09:25 ] common/impala_connection.py:160: in execute
> 20:09:25 ] return self.__beeswax_client.execute(sql_stmt, user=user)
> 20:09:25 ] beeswax/impala_beeswax.py:173: in execute
> 20:09:25 ] handle = self.__execute_query(query_string.strip(), user=user)
> 20:09:25 ] beeswax/impala_beeswax.py:345: in __execute_query
> 20:09:25 ] self.wait_for_completion(handle)
> 20:09:25 ] beeswax/impala_beeswax.py:365: in wait_for_completion
> 20:09:25 ] raise ImpalaBeeswaxException("Query aborted:" + error_log, 
> None)
> 20:09:25 ] E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> 20:09:25 ] EQuery aborted:Unable to open Kudu table: Network error: 
> Recv() got EOF from remote (error 108)
> 20:09:25 ]  Captured stderr setup 
> -
> 20:09:25 ] SET sync_ddl=False;
> 20:09:25 ] -- executing against localhost:21000
> 20:09:25 ] DROP DATABASE IF EXISTS `test_compute_stats_fcf53685` CASCADE;
> 20:09:25 ] 
> 20:09:25 ] SET sync_ddl=False;
> 20:09:25 ] -- executing against localhost:21000
> 20:09:25 ] CREATE DATABASE `test_compute_stats_fcf53685`;
> 20:09:25 ] 
> 20:09:25 ] MainThread: Created database "test_compute_stats_fcf53685" for 
> test ID "query_test/test_mt_dop.py::TestMtDop::()::test_compute_stats[mt_dop: 
> 2 | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: kudu/none]"
> 20:09:25 ] - Captured stderr call 
> -
> 20:09:25 ] -- executing against localhost:21000
> 20:09:25 ] create external table test_compute_stats_fcf53685.mt_dop stored as 
> kudu tblproperties('kudu.table_name'='impala::functional_kudu.alltypes');
> 20:09:25 ] 
> 20:09:25 ] SET mt_dop=2;
> 20:09:25 ] SET batch_size=0;
> 20:09:25 ] SET num_nodes=0;
> 20:09:25 ] SET disable_codegen_rows_threshold=0;
> 20:09:25 ] SET disable_codegen=False;
> 20:09:25 ] SET abort_on_error=1;
> 20:09:25 ] SET exec_single_node_rows_threshold=0;
> 20:09:25 ] -- executing against localhost:21000
> 20:09:25 ] compute stats test_compute_stats_fcf53685.mt_dop;
> 20:09:25 ] 
> 20:09:25 ] = 1 failed, 1954 passed, 63 skipped, 44 xfailed, 1 xpassed in 
> 2294.44 seconds ==
> {noformat}






[jira] [Resolved] (IMPALA-6910) Multiple tests failing on S3 build: error reading from HDFS file

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6910.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

We haven't seen this for quite a while, which suggests the Hadoop fix did the 
trick.

> Multiple tests failing on S3 build: error reading from HDFS file
> 
>
> Key: IMPALA-6910
> URL: https://issues.apache.org/jira/browse/IMPALA-6910
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: David Knupp
>Assignee: Sailesh Mukil
>Priority: Critical
>  Labels: broken-build, flaky, s3
> Fix For: Impala 3.1.0
>
>
> Stacktrace
> {noformat}
> query_test/test_compressed_formats.py:149: in test_seq_writer
> self.run_test_case('QueryTest/seq-writer', vector, unique_database)
> common/impala_test_suite.py:397: in run_test_case
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:612: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:173: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:341: in __execute_query
> self.wait_for_completion(handle)
> beeswax/impala_beeswax.py:361: in wait_for_completion
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Disk I/O error: Error reading from HDFS file: 
> s3a://impala-cdh5-s3-test/test-warehouse/tpcds.store_sales_parquet/ss_sold_date_sk=2452585/a5482dcb946b6c98-7543e0dd0004_95929617_data.0.parq
> E   Error(255): Unknown error 255
> E   Root cause: SdkClientException: Data read has a different length than the 
> expected: dataLength=8576; expectedLength=17785; includeSkipped=true; 
> in.getClass()=class com.amazonaws.services.s3.AmazonS3Client$2; 
> markedSupported=false; marked=0; resetSinceLastMarked=false; markCount=0; 
> resetCount=0
> {noformat}






[jira] [Commented] (IMPALA-6910) Multiple tests failing on S3 build: error reading from HDFS file

2018-09-26 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629624#comment-16629624
 ] 

Tim Armstrong commented on IMPALA-6910:
---

We believe this was a Hadoop AWS connector bug.

> Multiple tests failing on S3 build: error reading from HDFS file
> 
>
> Key: IMPALA-6910
> URL: https://issues.apache.org/jira/browse/IMPALA-6910
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: David Knupp
>Assignee: Sailesh Mukil
>Priority: Critical
>  Labels: broken-build, flaky, s3
>
> Stacktrace
> {noformat}
> query_test/test_compressed_formats.py:149: in test_seq_writer
> self.run_test_case('QueryTest/seq-writer', vector, unique_database)
> common/impala_test_suite.py:397: in run_test_case
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:612: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:173: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:341: in __execute_query
> self.wait_for_completion(handle)
> beeswax/impala_beeswax.py:361: in wait_for_completion
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Disk I/O error: Error reading from HDFS file: 
> s3a://impala-cdh5-s3-test/test-warehouse/tpcds.store_sales_parquet/ss_sold_date_sk=2452585/a5482dcb946b6c98-7543e0dd0004_95929617_data.0.parq
> E   Error(255): Unknown error 255
> E   Root cause: SdkClientException: Data read has a different length than the 
> expected: dataLength=8576; expectedLength=17785; includeSkipped=true; 
> in.getClass()=class com.amazonaws.services.s3.AmazonS3Client$2; 
> markedSupported=false; marked=0; resetSinceLastMarked=false; markCount=0; 
> resetCount=0
> {noformat}






[jira] [Commented] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache

2018-09-26 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629621#comment-16629621
 ] 

Paul Rogers commented on IMPALA-7501:
-

So the above was probably looking in the wrong haystack. Todd's comment is the 
key: {{LocalCatalog}}. The local catalog caches the HMS Thrift objects, 
including {{Partition}}.

The chain is:

* {{LocalDb}} contains a map of {{LocalTable}} objects.
* {{LocalTable}} has a subclass {{LocalFsTable}} which contains a map of 
{{LocalPartitionSpec}} objects.
* {{LocalPartitionSpec}} has a relation (need to research) to 
{{LocalFsPartition}}.
* {{LocalFsPartition}} holds onto the Hive {{Partition}}, which holds onto the 
{{FieldSchema}} objects.

Short term, just need to track down how we cache the {{Partition}} and nuke the 
{{FieldSchema}}, then retest.

Longer term, the note earlier does apply. While the query-specific metadata 
goes to pains to avoid caching HMS objects, LocalCatalog (and presumably the 
similar version in the {{catalogd}}) does cache HMS objects which, as noted 
earlier, are rather bloated for our needs.
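
A minimal sketch of that short-term fix, assuming a hypothetical helper around 
the Thrift-generated metastore classes (the class name and call site are 
invented for illustration):

{code:java}
import java.util.Collections;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.Partition;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;

public class PartitionSlimmer {
  // Returns a copy of the HMS Partition with the per-column FieldSchema list
  // dropped; the planner ignores partition-level schemas, and these objects
  // dominated the heap dump.
  public static Partition slim(Partition part) {
    Partition copy = part.deepCopy();  // Thrift-generated deep copy
    StorageDescriptor sd = copy.getSd();
    if (sd != null) sd.setCols(Collections.<FieldSchema>emptyList());
    return copy;
  }
}
{code}

The cache would then hold the slimmed copy rather than the original 
{{Partition}}, which should recover much of the heap the description 
attributes to {{FieldSchema}}.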

> Slim down metastore Partition objects in LocalCatalog cache
> ---
>
> Key: IMPALA-7501
> URL: https://issues.apache.org/jira/browse/IMPALA-7501
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Todd Lipcon
>Priority: Minor
>
> I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit 
> after running a production workload simulation for a couple hours. It had 
> 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, 
> in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects 
> are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M 
> objects are retained by FieldSchema, which, as far as I remember, are ignored 
> on the partition level by the Impala planner. So, with a bit of slimming down 
> of these objects, we could make a huge dent in effective cache capacity given 
> a fixed budget. Reducing object count should also have the effect of improved 
> GC performance (old gen GC is more closely tied to object count than size).






[jira] [Resolved] (IMPALA-6591) TestClientSsl hung for a long time

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6591.
---
Resolution: Cannot Reproduce

> TestClientSsl hung for a long time
> --
>
> Key: IMPALA-6591
> URL: https://issues.apache.org/jira/browse/IMPALA-6591
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Sailesh Mukil
>Priority: Critical
>  Labels: broken-build, hang
>
> {noformat}
> 18:49:13 
> custom_cluster/test_catalog_wait.py::TestCatalogWait::test_delayed_catalog 
> PASSED
> 18:49:53 
> custom_cluster/test_client_ssl.py::TestClientSsl::test_ssl[exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: text/none] Build timed 
> out (after 1,440 minutes). Marking the build as failed.
> 12:20:15 Build was aborted
> 12:20:15 Archiving artifacts
> {noformat}
> I unfortunately wasn't able to get any logs...






[jira] [Updated] (IMPALA-7404) query_test.test_delimited_text.TestDelimitedText.test_delimited_text_newlines fails to return any rows

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7404:
--
Target Version: Impala 3.1.0

> query_test.test_delimited_text.TestDelimitedText.test_delimited_text_newlines 
> fails to return any rows
> --
>
> Key: IMPALA-7404
> URL: https://issues.apache.org/jira/browse/IMPALA-7404
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Vuk Ercegovac
>Assignee: Csaba Ringhofer
>Priority: Blocker
>  Labels: broken-build, flaky-test, s3
>
> {noformat}
> query_test/test_delimited_text.py:65: in test_delimited_text_newlines
> assert len(result.data) == 2
> E   assert 0 == 2
> E    +  where 0 = len([])
> E    +    where [] = <... object at 0x63977d0>.data{noformat}
> Expected results from this query after first inserting:
> {noformat}
> insert into test_delimited_text_newlines_ff243aaa.nl_queries values 
> ("the\n","\nquick\nbrown","fox\n"), ("\njumped","over the lazy\n","\ndog");
> select * from test_delimited_text_newlines_ff243aaa.nl_queries;
> {noformat}






[jira] [Resolved] (IMPALA-7512) test_resolution_by_name failed: did not encounter expected error

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7512.
---
Resolution: Duplicate

Likely has the same cause as IMPALA-7335

> test_resolution_by_name failed: did not encounter expected error
> 
>
> Key: IMPALA-7512
> URL: https://issues.apache.org/jira/browse/IMPALA-7512
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Bikramjeet Vig
>Priority: Critical
>  Labels: broken-build
>
> Seems like the error reached the coordinator and the query transitioned to the 
> ERROR state, but the error was not delivered to the client:
> {noformat}
> I0829 23:38:31.626911 16215 impala-server.cc:1040] Registered query 
> query_id=54b6c955298cd00:623fd4b5 
> session_id=f14fb6bdc8ea6686:ad8d4097469f7c8f
> I0829 23:38:31.627315 16215 Frontend.java:1029] Analyzing query: select key, 
> value from switched_map_fields_resolution_test.int_map
> I0829 23:38:31.627898 16215 Frontend.java:1041] Analysis finished.
> I0829 23:38:31.631665 18839 admission-controller.cc:552] Schedule for 
> id=54b6c955298cd00:623fd4b5 in pool_name=default-pool 
> cluster_mem_needed=64.00 MB PoolConfig: max_requests=-1 max_queued=200 
> max_mem=-1.00 B
> I0829 23:38:31.631866 18839 admission-controller.cc:557] Stats: 
> agg_num_running=5, agg_num_queued=0, agg_mem_reserved=1.51 GB,  
> local_host(local_mem_admitted=2.17 GB, num_admitted_running=5, num_queued=0, 
> backend_mem_reserved=514.22 MB)
> I0829 23:38:31.632130 18839 admission-controller.cc:589] Admitted query 
> id=54b6c955298cd00:623fd4b5
> I0829 23:38:31.632308 18839 coordinator.cc:91] Exec() 
> query_id=54b6c955298cd00:623fd4b5 stmt=select key, value from 
> switched_map_fields_resolution_test.int_map
> I0829 23:38:31.633636 18744 query-state.cc:491] Instance completed. 
> instance_id=5049c19ed24bed19:85ffb5e0001 #in-flight=7 status=OK
> I0829 23:38:31.634953 18839 coordinator.cc:330] starting execution on 2 
> backends for query_id=54b6c955298cd00:623fd4b5
> I0829 23:38:31.635996 29061 impala-internal-service.cc:49] 
> ExecQueryFInstances(): query_id=54b6c955298cd00:623fd4b5 
> coord=impala-ec2-centos74-m5-4xlarge-ondemand-0bff.vpc.cloudera.com:22000 
> #instances=1
> I0829 23:38:31.638362 18843 query-state.cc:483] Executing instance. 
> instance_id=54b6c955298cd00:623fd4b5 fragment_idx=0 
> per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=8
> I0829 23:38:31.638617 18839 coordinator.cc:344] started execution on 2 
> backends for query_id=54b6c955298cd00:623fd4b5
> I0829 23:38:31.656479  4384 coordinator.cc:685] Backend completed: 
> host=impala-ec2-centos74-m5-4xlarge-ondemand-0bff.vpc.cloudera.com:22001 
> remaining=2 query_id=54b6c955298cd00:623fd4b5
> I0829 23:38:31.656626  4384 coordinator-backend-state.cc:254] 
> query_id=54b6c955298cd00:623fd4b5: first in-progress backend: 
> impala-ec2-centos74-m5-4xlarge-ondemand-0bff.vpc.cloudera.com:22000
> I0829 23:38:31.656808  4384 coordinator.cc:498] ExecState: query 
> id=54b6c955298cd00:623fd4b5 
> finstance=54b6c955298cd00:623fd4b50001 on 
> host=impala-ec2-centos74-m5-4xlarge-ondemand-0bff.vpc.cloudera.com:22001 
> (EXECUTING -> ERROR) status=File 
> 'hdfs://localhost:20500/test-warehouse/test_resolution_by_name_63ec1576.db/switched_map_fields_resolution_test/switched_map.parq'
>  has an incompatible Parquet schema for column 
> 'test_resolution_by_name_63ec1576.switched_map_fields_resolution_test.int_map.key'.
>  Column type: STRING, Parquet schema:
> required int32 value [i:0 d:1 r:1]
> I0829 23:38:31.657034  4384 coordinator-backend-state.cc:377] sending 
> CancelQueryFInstances rpc for query_id=54b6c955298cd00:623fd4b5 
> backend=impala-ec2-centos74-m5-4xlarge-ondemand-0bff.vpc.cloudera.com:22000
> I0829 23:38:31.657304  5860 impala-internal-service.cc:71] 
> CancelQueryFInstances(): query_id=54b6c955298cd00:623fd4b5
> I0829 23:38:31.657408  5860 query-exec-mgr.cc:95] QueryState: 
> query_id=54b6c955298cd00:623fd4b5 refcnt=4
> I0829 23:38:31.657490  5860 query-state.cc:504] Cancel: 
> query_id=54b6c955298cd00:623fd4b5
> I0829 23:38:31.657575  5860 krpc-data-stream-mgr.cc:325] cancelling all 
> streams for fragment_instance_id=54b6c955298cd00:623fd4b5
> I0829 23:38:31.657836  4384 coordinator.cc:658] CancelBackends() 
> query_id=54b6c955298cd00:623fd4b5, tried to cancel 1 backends
> I0829 23:38:31.657940  4384 coordinator.cc:792] Release admission control 
> resources for query_id=54b6c955298cd00:623fd4b5
> I0829 23:38:31.662129 18843 query-state.cc:334] Cancelling fragment instances 
> as directed by the coordinator. Returned status: 

[jira] [Updated] (IMPALA-7523) Planner Test failing with "Failed to assign regions to servers after 60000 millis."

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7523:
--
Labels: broken-build flaky  (was: broken-build flake)

> Planner Test failing with "Failed to assign regions to servers after 60000 
> millis."
> ---
>
> Key: IMPALA-7523
> URL: https://issues.apache.org/jira/browse/IMPALA-7523
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Reporter: Philip Zeyliger
>Priority: Critical
>  Labels: broken-build, flaky
>
> I've seen 
> {{org.apache.impala.planner.PlannerTest.org.apache.impala.planner.PlannerTest}}
>  fail with the following trace:
> {code}
> java.lang.IllegalStateException: Failed to assign regions to servers after 
> 60000 millis.
>   at 
> org.apache.impala.datagenerator.HBaseTestDataRegionAssignment.performAssignment(HBaseTestDataRegionAssignment.java:153)
>   at 
> org.apache.impala.planner.PlannerTestBase.setUp(PlannerTestBase.java:120)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> {code}
> I think we've seen it before as indicated in IMPALA-7061.






[jira] [Updated] (IMPALA-7523) Planner Test failing with "Failed to assign regions to servers after 60000 millis."

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7523:
--
Labels: broken-build flake  (was: broken-build)

> Planner Test failing with "Failed to assign regions to servers after 60000 
> millis."
> ---
>
> Key: IMPALA-7523
> URL: https://issues.apache.org/jira/browse/IMPALA-7523
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Reporter: Philip Zeyliger
>Priority: Critical
>  Labels: broken-build, flaky
>
> I've seen 
> {{org.apache.impala.planner.PlannerTest.org.apache.impala.planner.PlannerTest}}
>  fail with the following trace:
> {code}
> java.lang.IllegalStateException: Failed to assign regions to servers after 
> 60000 millis.
>   at 
> org.apache.impala.datagenerator.HBaseTestDataRegionAssignment.performAssignment(HBaseTestDataRegionAssignment.java:153)
>   at 
> org.apache.impala.planner.PlannerTestBase.setUp(PlannerTestBase.java:120)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> {code}
> I think we've seen it before as indicated in IMPALA-7061.






[jira] [Resolved] (IMPALA-7494) Hang in TestTpcdsDecimalV2Query::test_tpcds_q69

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7494.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

Suspect this is IMPALA-7488

> Hang in TestTpcdsDecimalV2Query::test_tpcds_q69
> ---
>
> Key: IMPALA-7494
> URL: https://issues.apache.org/jira/browse/IMPALA-7494
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.1.0
>Reporter: Bikramjeet Vig
>Priority: Critical
>  Labels: broken-build
> Fix For: Impala 3.1.0
>
>
> A hang in this test caused the build to time out after 1440 minutes
> {noformat}
> 10:47:51 [gw3] PASSED 
> query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q65[exec_option:
>  {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> 10:48:31 
> query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q67a[exec_option:
>  {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> 10:48:31 [gw3] PASSED 
> query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q67a[exec_option:
>  {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> 10:48:32 
> query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q68[exec_option:
>  {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> 10:48:32 [gw3] PASSED 
> query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q68[exec_option:
>  {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> 10:48:34 
> query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q69[exec_option:
>  {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> 07:11:24 [gw3] PASSED 
> query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q69[exec_option:
>  {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] Build 
> timed out (after 1,440 minutes). Marking the build as aborted.
> {noformat}






[jira] [Resolved] (IMPALA-7493) hang in test_spilling_query_options

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7493.
---
Resolution: Cannot Reproduce

Suspect this is IMPALA-7488

> hang in test_spilling_query_options
> ---
>
> Key: IMPALA-7493
> URL: https://issues.apache.org/jira/browse/IMPALA-7493
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.1.0
>Reporter: Bikramjeet Vig
>Priority: Critical
>  Labels: broken-build
>
> A hang in this test caused the build to time out after 1440 minutes
> {noformat}
> 14:50:23 [gw6] PASSED 
> query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_naaj[exec_option:
>  {'debug_action': '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', 
> 'default_spillable_buffer_size': '256k'} | table_format: parquet/none] 
> 14:50:23 
> query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive[exec_option:
>  {'debug_action': None, 'default_spillable_buffer_size': '256k'} | 
> table_format: parquet/none] 
> 14:50:23 [gw6] SKIPPED 
> query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive[exec_option:
>  {'debug_action': None, 'default_spillable_buffer_size': '256k'} | 
> table_format: parquet/none] 
> 14:50:23 
> query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive[exec_option:
>  {'debug_action': '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', 
> 'default_spillable_buffer_size': '256k'} | table_format: parquet/none] 
> 14:50:23 [gw6] SKIPPED 
> query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive[exec_option:
>  {'debug_action': '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', 
> 'default_spillable_buffer_size': '256k'} | table_format: parquet/none] 
> 14:51:41 
> query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::test_spilling_naaj_no_deny_reservation[exec_option:
>  {'default_spillable_buffer_size': '256k'} | table_format: parquet/none] 
> 14:51:41 [gw6] PASSED 
> query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::test_spilling_naaj_no_deny_reservation[exec_option:
>  {'default_spillable_buffer_size': '256k'} | table_format: parquet/none] 
> 14:51:48 
> query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::test_spilling_query_options[exec_option:
>  {'default_spillable_buffer_size': '256k'} | table_format: parquet/none] 
> 12:34:40 [gw6] PASSED 
> query_test/test_spilling.py::TestSpillingNoDebugActionDimensions::test_spilling_query_options[exec_option:
>  {'default_spillable_buffer_size': '256k'} | table_format: parquet/none] 
> Build timed out (after 1,440 minutes). Marking the build as aborted.
> {noformat}






[jira] [Commented] (IMPALA-7593) test_automatic_invalidation failing in S3

2018-09-26 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629614#comment-16629614
 ] 

Tim Armstrong commented on IMPALA-7593:
---

Can this be closed?

> test_automatic_invalidation failing in S3
> -
>
> Key: IMPALA-7593
> URL: https://issues.apache.org/jira/browse/IMPALA-7593
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.1.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Tianyi Wang
>Priority: Blocker
>  Labels: broken-build
>
> Note that the build has the fix for IMPALA-7580
> {noformat}
> 04:59:01 ___ TestAutomaticCatalogInvalidation.test_v1_catalog 
> ___
> 04:59:01 custom_cluster/test_automatic_invalidation.py:63: in test_v1_catalog
> 04:59:01 self._run_test(cursor)
> 04:59:01 custom_cluster/test_automatic_invalidation.py:58: in _run_test
> 04:59:01 assert time.time() < timeout
> 04:59:01 E   assert 1537355634.805718 < 1537355634.394429
> 04:59:01 E    +  where 1537355634.805718 = <built-in function time>()
> 04:59:01 E    +    where <built-in function time> = time.time
> 04:59:01  Captured stderr setup 
> -
> 04:59:01 -- 2018-09-19 04:13:22,796 INFO MainThread: Starting cluster 
> with command: 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/bin/start-impala-cluster.py
>  --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 '--impalad_args="--invalidate_tables_timeout_s=20" ' 
> '--state_store_args="--statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50" ' 
> '--catalogd_args="--invalidate_tables_timeout_s=20" '
> 04:59:01 04:13:23 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 04:59:01 04:13:23 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 04:59:01 04:13:24 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 04:59:01 04:13:25 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 04:59:01 04:13:26 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 04:59:01 04:13:29 MainThread: Found 3 impalad/1 statestored/1 catalogd 
> process(es)
> 04:59:01 04:13:29 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:29 MainThread: Waiting for num_known_live_backends=3. Current 
> value: 0
> 04:59:01 04:13:30 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:30 MainThread: Waiting for num_known_live_backends=3. Current 
> value: 1
> 04:59:01 04:13:31 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:31 MainThread: Waiting for num_known_live_backends=3. Current 
> value: 2
> 04:59:01 04:13:32 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 04:13:32 MainThread: num_known_live_backends has reached value: 3
> 04:59:01 04:13:32 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25001
> 04:59:01 04:13:32 MainThread: num_known_live_backends has reached value: 3
> 04:59:01 04:13:32 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25002
> 04:59:01 04:13:32 MainThread: num_known_live_backends has reached value: 3
> 04:59:01 04:13:32 MainThread: Impala Cluster Running with 3 nodes (3 
> coordinators, 3 executors).
> 04:59:01 -- 2018-09-19 04:13:33,034 INFO MainThread: Found 3 impalad/1 
> statestored/1 catalogd process(es)
> 04:59:01 -- 2018-09-19 04:13:33,034 INFO MainThread: Getting metric: 
> statestore.live-backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25010
> 04:59:01 -- 2018-09-19 04:13:33,036 INFO MainThread: Metric 
> 'statestore.live-backends' has reached desired value: 4
> 04:59:01 -- 2018-09-19 04:13:33,036 INFO MainThread: Getting 
> num_known_live_backends from 
> impala-ec2-centos74-r4-4xlarge-ondemand-1860.vpc.cloudera.com:25000
> 04:59:01 -- 2018-09-19 

[jira] [Created] (IMPALA-7637) Include more hash table stats in profile

2018-09-26 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-7637:
-

 Summary: Include more hash table stats in profile
 Key: IMPALA-7637
 URL: https://issues.apache.org/jira/browse/IMPALA-7637
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Tim Armstrong


Our hash table collects some useful stats about collisions and travel length, 
but then we don't do anything to expose them: 
https://github.com/apache/impala/blob/540611e863fe99b3d3ae35f8b94a745a68b9eba2/be/src/exec/hash-table.h#L989

We should add some of them to the profile, maybe (see the sketch after this list):
* the number of probes
* the average travel length per probe
* the number of hash collisions
* (optional) the number of hash table resizes. We already have the hash table 
size and the resize time, which I think is sufficient to debug most problems 
with resizes.
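
As a self-contained illustration of the derived metrics (all names are hypothetical; the real counters would be wired into the runtime profile rather than printed):

{code}
#include <cstdint>
#include <iostream>

// Hypothetical stand-in for the stats the hash table already tracks.
struct HashTableStats {
  int64_t num_probes = 0;          // total Probe() calls
  int64_t travel_length = 0;       // buckets traversed across all probes
  int64_t num_hash_collisions = 0; // probes that hit a non-matching filled bucket
  int64_t num_resizes = 0;         // optional, per the list above
};

// In the real code these would be emitted as profile counters; printing
// them here just shows the derived metric (average travel per probe).
void ReportStats(const HashTableStats& s) {
  std::cout << "Probes: " << s.num_probes << "\n"
            << "HashCollisions: " << s.num_hash_collisions << "\n"
            << "Resizes: " << s.num_resizes << "\n";
  if (s.num_probes > 0) {
    std::cout << "AvgTravelLength: "
              << static_cast<double>(s.travel_length) / s.num_probes << "\n";
  }
}

int main() {
  HashTableStats s{1000000, 1250000, 3456, 12};
  ReportStats(s);
  return 0;
}
{code}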




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache

2018-09-26 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629554#comment-16629554
 ] 

Paul Rogers edited comment on IMPALA-7501 at 9/27/18 12:09 AM:
---

Analysis:

* Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects.
* Impala's {{LocalDb}}, which extends {{FeDb}}, contains a map of {{LocalTable}} 
objects.
* Impala's {{LocalTable}} contains a Hive {{Table}} object.
* The {{Table}} object is defined in [Hive's Thrift 
schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift]
 API. It does not contain a list of partitions.

Things are a bit confusing because:

* Hive defines a different 
[{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java]
 class, which contains a {{TableSpec}}.
* Hive's 
[{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java]
 contains a list of {{Partition}} objects.
* Hive's 
[{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java]
 is generated from Thrift. Contains a {{StorageDescriptor}}.
* Hive's 
[{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java]
 contains the list of {{FieldSchema}} objects which Todd saw in the heap dump.

The above says that, yes, Hive {{Partition}} objects do hold a list of 
{{FieldSchema}}, but not via the simplest path (the Hive API {{Table}} 
object). Perhaps we cache {{Partition}} objects in the table schema:

Impala loads tables in the background by calling {{HdfsTable.load()}}:

* The {{LocalTable}} wraps a number of subclasses, of which the one of interest 
is {{HdfsTable}}.
* {{load()}} calls {{loadAllPartitions()}} to do the partition work.
* {{loadAllPartitions}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the 
partitions as a list of Hive {{Partition}} objects.
* {{loadAllPartitions}} wraps each in an {{HdfsPartition}}, and calls 
{{addPartition}} to put the partition into a couple of maps.
* But, {{HdfsPartition}} goes to extremes to copy data out of Hive’s 
{{Partition}} object without holding onto Hive’s object.

So, we did take steps to avoid holding onto Hive’s {{Partition}} objects. 
Still, there are references, so the question is: where? That is, the original 
description was concerned with the {{FieldSchema}} references in {{Partition}}. 
But, the above analysis of the code suggests that even the {{Partition}} 
objects themselves should not exist: we should have copied their info into 
{{HdfsPartition}} objects and discarded them.

Maybe this test run found issues for storage engines other than HDFS?


was (Author: paul.rogers):
Analysis:

* Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects.
* Impala's {{LocalDb}}, which extends {{FeDb}}, contains a map of {{LocalTable}} 
objects.
* Impala's {{LocalTable}} contains a Hive {{Table}} object.
* The {{Table}} object is defined in [Hive's Thrift 
schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift]
 API. It does not contain a list of partitions.

Things are a bit confusing because:

* Hive defines a different 
[{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java]
 class, which contains a {{TableSpec}}.
* Hive's 
[{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java]
 contains a list of {{Partition}} objects.
* Hive's 
[{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java]
 is generated from Thrift. Contains a {{StorageDescriptor}}.
* Hive's 
[{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java]
 contains the list of {{FieldSchema}} objects which Todd saw in the heap dump.

The above says that, yes, Hive {{Partition}} objects do hold a list of 
{{FieldSchema}}, but not via the simplest path (the Hive API {{Table}} 
object). Perhaps we cache {{Partition}} objects in the table schema:

Impala loads tables in the background by calling {{HdfsTable.load()}}:

* The {{LocalTable}} wraps a number of subclasses, of which the one of interest 
is {{HdfsTable}}.
* {{load()}} calls {{loadAllPartitions()}} to do the partition work.
* {{loadAllPartitions}} calls 

[jira] [Commented] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache

2018-09-26 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629554#comment-16629554
 ] 

Paul Rogers commented on IMPALA-7501:
-

Analysis:

* Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects.
* Impala's {{LocalDb}}, which extends {{FeDb}}, contains a map of {{LocalTable}} 
objects.
* Impala's {{LocalTable}} contains a Hive {{Table}} object.
* The {{Table}} object is defined in [Hive's Thrift 
schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift]
 API. It does not contain a list of partitions.

Things are a bit confusing because:

* Hive defines a different 
[{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java]
 class, which contains a {{TableSpec}}.
* Hive's 
[{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java]
 contains a list of {{Partition}} objects.
* Hive's 
[{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java]
 is generated from Thrift. Contains a {{StorageDescriptor}}.
* Hive's 
[{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java]
 contains the list of {{FieldSchema}} objects which Todd saw in the heap dump.

The above says that, yes, Hive {{Partition}} objects do hold a list of 
{{FieldSchema}}, but not via the simplest path (the Hive API {{Table}} 
object). Perhaps we cache {{Partition}} objects in the table schema:

Impala loads tables in the background by calling {{HdfsTable.load()}}:

* The {{LocalTable}} wraps a number of subclasses, of which the one of interest 
is {{HdfsTable}}.
* {{load()}} calls {{loadAllPartitions()}} to do the partition work.
* {{loadAllPartitions}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the 
partitions as a list of Hive {{Partition}} objects.
* {{loadAllPartitions}} wraps each in an {{HdfsPartition}}, and calls 
{{addPartition}} to put the partition into a couple of maps.
* But, {{HdfsPartition}} goes to extremes to copy data out of Hive’s 
{{Partition}} object without holding onto Hive’s object.

So, we did take steps to avoid holding onto Hive’s {{Partition}} objects. 
Still, there are references, so the question is: where?

> Slim down metastore Partition objects in LocalCatalog cache
> ---
>
> Key: IMPALA-7501
> URL: https://issues.apache.org/jira/browse/IMPALA-7501
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Todd Lipcon
>Priority: Minor
>
> I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit 
> after running a production workload simulation for a couple hours. It had 
> 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, 
> in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects 
> are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M 
> objects are retained by FieldSchema, which, as far as I remember, are ignored 
> on the partition level by the Impala planner. So, with a bit of slimming down 
> of these objects, we could make a huge dent in effective cache capacity given 
> a fixed budget. Reducing object count should also have the effect of improved 
> GC performance (old gen GC is more closely tied to object count than size)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache

2018-09-26 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629410#comment-16629410
 ] 

Paul Rogers edited comment on IMPALA-7501 at 9/26/18 11:35 PM:
---

A quick scan of the Hive code suggests that Hive's Thrift objects carry more 
info than is required in the Impala cache. Creating Impala-specific, 
high-performance versions would likely save space. (No need for parent 
pointers, no need for the two-level Hive API structure, etc.)

So, this gives us two options:

* Reach inside Hive's Thrift objects to null out fields which we don't need, or
* Design an Impala-specific, compact representation for the data that omits all 
but essential objects and fields.

The second choice provides a huge opportunity for memory optimization. The 
first is a crude-but-effective short-term solution.

But, see a later note, the story is not as simple as a first analysis suggests.
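
For illustration, a rough standalone sketch of the first (null-out) option: the structs below are hypothetical mirrors of the Thrift shapes discussed above, not the generated classes, and {{SlimPartition}} is an invented name.

{code}
#include <string>
#include <vector>

// Hypothetical mirrors of the Thrift shapes involved.
struct FieldSchema { std::string name, type, comment; };

struct StorageDescriptor {
  std::vector<FieldSchema> cols;  // the per-partition copies seen in the heap dump
  std::string location;
  // ... many other fields elided ...
};

struct Partition {
  StorageDescriptor sd;
  // ... other fields elided ...
};

// Option 1: once the needed bits are copied into HdfsPartition, drop the
// fields the planner ignores so the cache no longer retains them.
void SlimPartition(Partition* p) {
  p->sd.cols.clear();
  p->sd.cols.shrink_to_fit();  // actually release the backing memory
}

int main() {
  Partition p;
  p.sd.cols.resize(400);  // e.g., a wide schema copied into every partition
  SlimPartition(&p);
  return 0;
}
{code}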


was (Author: paul.rogers):
Analysis:

* Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects.
* Impala's {{LocalDb}}, which extends {{FeDb}}, contains a map of {{LocalTable}} 
objects.
* Impala's {{LocalTable}} contains a Hive {{Table}} object.
* The {{Table}} object is defined in [Hive's Thrift 
schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift]
 API. It does not contain a list of partitions.
* The {{LocalTable}} wraps a number of subclasses, of which the one of interest 
is {{HdfsTable}}.

Impala loads tables in the background by calling {{HdfsTable.load()}}:

* {{load()}} calls {{loadAllPartitions()}} to do the partition work.
* {{loadAllPartitions}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the 
partitions as a list of Hive {{Partition}} objects.
* {{loadAllPartitions}} wraps each in an {{HdfsPartition}}, and calls 
{{addPartition}} to put the partition into a couple of maps.
* Hive's 
[{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java]
 is generated from Thrift. Contains a {{StorageDescriptor}}.
* Hive's 
[{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java]
 contains the list of {{FieldSchema}} objects which Todd saw in the heap dump.

Things are a bit confusing because:

* Hive defines a different 
[{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java]
 class, which contains a {{TableSpec}}.
* Hive's 
[{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java]
 contains a list of {{Partition}} objects.

A quick scan of the Hive code suggests that Hive's Thrift objects carry more 
info than is required in the Impala cache. Creating Impala-specific, 
high-performance versions would likely save space. (No need for parent 
pointers, no need for the two-level Hive API structure, etc.)

So, this gives us two options:

* Reach inside Hive's Thrift objects to null out fields which we don't need, or
* Design an Impala-specific, compact representation for the data that omits all 
but essential objects and fields.

The second choice provides a huge opportunity for memory optimization. The 
first is a crude-but-effective short-term solution.

> Slim down metastore Partition objects in LocalCatalog cache
> ---
>
> Key: IMPALA-7501
> URL: https://issues.apache.org/jira/browse/IMPALA-7501
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Todd Lipcon
>Priority: Minor
>
> I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit 
> after running a production workload simulation for a couple hours. It had 
> 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, 
> in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects 
> are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M 
> objects are retained by FieldSchema, which, as far as I remember, are ignored 
> on the partition level by the Impala planner. So, with a bit of slimming down 
> of these objects, we could make a huge dent in effective cache capacity given 
> a fixed budget. Reducing object count should also have the effect of improved 
> GC performance (old gen GC is more closely tied to object count than size)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: 

[jira] [Comment Edited] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache

2018-09-26 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629410#comment-16629410
 ] 

Paul Rogers edited comment on IMPALA-7501 at 9/26/18 11:21 PM:
---

Analysis:

* Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects.
* Impala's {{LocalDb}}, which extends {{FeDb}}, contains a map of {{LocalTable}} 
objects.
* Impala's {{LocalTable}} contains a Hive {{Table}} object.
* The {{Table}} object is defined in [Hive's Thrift 
schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift]
 API. It does not contain a list of partitions.
* The {{LocalTable}} wraps a number of subclasses, of which the one of interest 
is {{HdfsTable}}.

Impala loads tables in the background by calling {{HdfsTable.load()}}:

* {{load()}} calls {{loadAllPartitions()}} to do the partition work.
* {{loadAllPartitions}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the 
partitions as a list of Hive {{Partition}} objects.
* {{loadAllPartitions}} wraps each in an {{HdfsPartition}}, and calls 
{{addPartition}} to put the partition into a couple of maps.
* Hive's 
[{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java]
 is generated from Thrift. Contains a {{StorageDescriptor}}.
* Hive's 
[{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java]
 contains the list of {{FieldSchema}} objects which Todd saw in the heap dump.

Things are a bit confusing because:

* Hive defines a different 
[{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java]
 class, which contains a {{TableSpec}}.
* Hive's 
[{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java]
 contains a list of {{Partition}} objects.

A quick scan of the Hive code suggests that Hive's Thrift objects carry more 
info than is required in the Impala cache. Creating Impala-specific, 
high-performance versions would likely save space. (No need for parent 
pointers, no need for the two-level Hive API structure, etc.)

So, this gives us two options:

* Reach inside Hive's Thrift objects to null out fields which we don't need, or
* Design an Impala-specific, compact representation for the data that omits all 
but essential objects and fields.

The second choice provides a huge opportunity for memory optimization. The 
first is a crude-but-effective short-term solution.


was (Author: paul.rogers):
Analysis:

* Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects.
* Impala's {{LocalDb}}, which extends {{FeDb}}, contains a map of {{LocalTable}} 
objects.
* Impala's {{LocalTable}} contains a Hive {{Table}} object.
* The {{Table}} object is defined in [Hive's Thrift 
schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift]
 API. It does not contain a list of partitions.
* The {{LocalTable}} wraps a number of subclasses, of which the one of interest 
is {{HdfsTable}}.

Impala loads tables in the background by calling {{HdfsTable.load()}}:

* {{load()}} calls {{loadAllPartitions()}} to do the partition work.
* {{loadAllPartitions}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the 
partitions as a list of Hive {{Partition}} objects.
* {{loadAllPartitions}} wraps each in an {{HdfsPartition}}, and calls 
{{addPartition}} to put the partition into a couple of maps.
* Hive's 
[{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java]
 is generated from Thrift. Contains a {{StorageDescriptor}}.
* Hive's 
[{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java]
 contains the list of {{FieldSchema}} objects which Todd saw in the heap dump.

Things are a bit confusing because:

* Hive defines a different 
[{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java]
 class, which contains a {{TableSpec}}.
* Hive's 
[{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java]
 contains a list of {{Partition}} objects.

A quick scan of the Hive code suggests that Hive's Thrift objects carry more 
info than is required in the Impala cache. Creating Impala-specific, 
high-performance versions would likely save space. 

[jira] [Comment Edited] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache

2018-09-26 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629410#comment-16629410
 ] 

Paul Rogers edited comment on IMPALA-7501 at 9/26/18 11:21 PM:
---

Analysis:

* Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects.
* Impala's {{LocalDb}}, which extends {{FeDb}} contains a map of {{LocalTable}} 
objects.
* Impala's {{LocalTable}} contains a Hive {{Table}} object.
* The {{Table}} object is defined in [Hive's Thrift 
schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift]
 API. It does not contain a list of partitions.
* The {{LocalTable}} wraps a number of subclasses, of which the one of interest 
is {{HdfsTable}}.

Impala loads tables in the background by calling {{HdfsTable.load()}}:

* {{load()}} calls {{loadAllPartitions()}} to do the partition work.
* {{loadAllPartitions}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the 
partitions as a list of Hive {{Partition}} objects.
* {{loadAllPartitions}} wraps each in an {{HdfsPartition}}, and calls 
{{addPartition}} to put the partition into a couple of maps.
* Hive's 
[{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java]
 is generated from Thrift. Contains a {{StorageDescriptor}}.
* Hive's 
[{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java]
 contains the list of {{FieldSchema}} objects which Todd saw in the heap dump.

Things are a bit confusing because:

* Hive defines a different 
[{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java]
 class, which contains a {{TableSpec}}.
* Hive's 
[{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java]
 contains a list of {{Partition}} objects.

A quick scan of the Hive code suggests that Hive's Thrift objects carry more 
info than is required in the Impala cache. Creating Impala-specific, 
high-performance versions would likely save space. (No need for parent 
pointers, no need for the two-level Hive API structure, etc.)

So, this gives us two options:

* Reach inside Hive's Thrift objects to null out fields which we don't need, or
* Design an Impala-specific, compact representation for the data that omits all 
but essential objects and fields.

The second choice provides a huge opportunity for memory optimization. The 
first is a crude-but-effective short-term solution.


was (Author: paul.rogers):
Analysis:

* Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects.
* Impala's {{LocalDb}}, which extends {{FeDb}}, contains a map of {{LocalTable}} 
objects.
* Impala's {{LocalTable}} contains a Hive {{Table}} object.
* The {{Table}} object is defined in [Hive's Thrift 
schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift]
 API. It does not contain a list of partitions.

Things here get complex because Cloudera does not provide source jars for its 
build of Hive, so we can't step into or set breakpoints in Hive code.

Impala loads tables in the background by calling {{HdfsTable.load()}}:

* {{load()}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the partitions.
* The list of Hive {{Partition}} objects is passed to 
{{HdfsTable.loadAllPartitions()}}.
* Hive's 
[{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java]
 is generated from Thrift. Contains a {{StorageDescriptor}}.
* Hive's 
[{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java]
 contains the list of {{FieldSchema}} objects which Todd saw in the heap dump.

Things are a bit confusing because:

* Hive defines a different 
[{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java]
 class, which contains a {{TableSpec}}.
* Hive's 
[{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java]
 contains a list of {{Partition}} objects.

A quick scan of the Hive code suggests that Hive's Thrift objects carry more 
info than is required in the Impala cache. Creating Impala-specific, 
high-performance versions would likely save space. (No need for parent 
pointers, no need for the two-level Hive API structure, etc.)

So, this gives us two 

[jira] [Comment Edited] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache

2018-09-26 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629410#comment-16629410
 ] 

Paul Rogers edited comment on IMPALA-7501 at 9/26/18 11:07 PM:
---

Analysis:

* Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects.
* Impala's {{LocalDb}}, which extends {{FeDb}}, contains a map of {{LocalTable}} 
objects.
* Impala's {{LocalTable}} contains a Hive {{Table}} object.
* The {{Table}} object is defined in [Hive's Thrift 
schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift]
 API. It does not contain a list of partitions.

Things here get complex because Cloudera does not provide source jars for its 
build of Hive, so we can't step into or set breakpoints in Hive code.

Impala loads tables in the background by calling {{HdfsTable.load()}}:

* {{load()}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the partitions.
* The list of Hive {{Partition}} objects is passed to 
{{HdfsTable.loadAllPartitions()}}.
* Hive's 
[{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java]
 is generated from Thrift. Contains a {{StorageDescriptor}}.
* Hive's 
[{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java]
 contains the list of {{FieldSchema}} objects which Todd saw in the heap dump.

Things are a bit confusing because:

* Hive defines a different 
[{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java]
 class, which contains a {{TableSpec}}.
* Hive's 
[{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java]
 contains a list of {{Partition}} objects.


was (Author: paul.rogers):
Analysis:

* Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects.
* Impala's {{LocalDb}}, which extends {{FeDb}}, contains a map of {{LocalTable}} 
objects.
* Impala's {{LocalTable}} contains a Hive {{Table}} object. 
* Hive's 
[{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java]
 is a Hive-defined class, which contains a {{TableSpec}}.
* Hive's 
[{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java]
 contains a list of {{Partition}} objects.
* Hive's 
[{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java]
 is generated from Thrift. Contains a {{StorageDescriptor}}.
* Hive's 
[{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java]
 contains the list of {{FieldSchema}} objects which Todd saw in the heap dump.

The [Hive Thrift 
schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift]
 is an easier way to visualize the Hive part of the above analysis.

A quick scan of the Hive code suggests that Hive's Thrift objects carry more 
info than is required in the Impala cache. Creating Impala-specific, 
high-performance versions would likely save space. (No need for parent 
pointers, no need for the two-level Hive API structure, etc.)

So, this gives us two options:

* Reach inside Hive's Thrift objects to null out fields which we don't need, or
* Design an Impala-specific, compact representation for the data that omits all 
but essential objects and fields.

The second choice provides a huge opportunity for memory optimization. The 
first is a crude-but-effective short-term solution.

> Slim down metastore Partition objects in LocalCatalog cache
> ---
>
> Key: IMPALA-7501
> URL: https://issues.apache.org/jira/browse/IMPALA-7501
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Todd Lipcon
>Priority: Minor
>
> I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit 
> after running a production workload simulation for a couple hours. It had 
> 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, 
> in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects 
> are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M 
> objects are retained by FieldSchema, which, as far as I remember, are ignored 
> on the partition level 

[jira] [Comment Edited] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache

2018-09-26 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629410#comment-16629410
 ] 

Paul Rogers edited comment on IMPALA-7501 at 9/26/18 11:07 PM:
---

Analysis:

* Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects.
* Impala's {{LocalDb}}, which extends {{FeDb}}, contains a map of {{LocalTable}} 
objects.
* Impala's {{LocalTable}} contains a Hive {{Table}} object.
* The {{Table}} object is defined in [Hive's Thrift 
schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift]
 API. It does not contain a list of partitions.

Things here get complex because Cloudera does not provide source jars for its 
build of Hive, so we can't step into or set breakpoints in Hive code.

Impala loads tables in the background by calling {{HdfsTable.load()}}:

* {{load()}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the partitions.
* The list of Hive {{Partition}} objects is passed to 
{{HdfsTable.loadAllPartitions()}}.
* Hive's 
[{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java]
 is generated from Thrift. Contains a {{StorageDescriptor}}.
* Hive's 
[{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java]
 contains the list of {{FieldSchema}} objects which Todd saw in the heap dump.

Things are a bit confusing because:

* Hive defines a different 
[{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java]
 class, which contains a {{TableSpec}}.
* Hive's 
[{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java]
 contains a list of {{Partition}} objects.

A quick scan of the Hive code suggests that Hive's Thrift objects carry more 
info than is required in the Impala cache. Creating Impala-specific, 
high-performance versions would likely save space. (No need for parent 
pointers, no need for the two-level Hive API structure, etc.)

So, this gives us two options:

* Reach inside Hive's Thrift objects to null out fields which we don't need, or
* Design an Impala-specific, compact representation for the data that omits all 
but essential objects and fields.

The second choice provides a huge opportunity for memory optimization. The 
first is a crude-but-effective short-term solution.


was (Author: paul.rogers):
Analysis:

* Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects.
* Impala's {{LocalDb}}, which extends {{FeDb}}, contains a map of {{LocalTable}} 
objects.
* Impala's {{LocalTable}} contains a Hive {{Table}} object.
* The {{Table}} object is defined in [Hive's Thrift 
schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift]
 API. It does not contain a list of partitions.

Things here get complex because Cloudera does not provide source jars for its 
build of Hive, so we can't step into or set breakpoints in Hive code.

Impala loads tables in the background by calling {{HdfsTable.load()}}:

* {{load()}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the partitions.
* The list of Hive {{Partition}} objects is passed to 
{{HdfsTable.loadAllPartitions()}}.
* Hive's 
[{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java]
 is generated from Thrift. Contains a {{StorageDescriptor}}.
* Hive's 
[{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java]
 contains the list of {{FieldSchema}} objects which Todd saw in the heap dump.

Things are a bit confusing because:

* Hive defines a different 
[{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java]
 class, which contains a {{TableSpec}}.
* Hive's 
[{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java]
 contains a list of {{Partition}} objects.

> Slim down metastore Partition objects in LocalCatalog cache
> ---
>
> Key: IMPALA-7501
> URL: https://issues.apache.org/jira/browse/IMPALA-7501
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Todd Lipcon
>Priority: Minor
>
> I took a heap dump of an impalad running in LocalCatalog 

[jira] [Commented] (IMPALA-7597) "show partitions" does not retry on InconsistentMetadataFetchException

2018-09-26 Thread Vuk Ercegovac (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629491#comment-16629491
 ] 

Vuk Ercegovac commented on IMPALA-7597:
---

The issue reported here is one example of an InconsistentMetadataFetchException 
thrown by code that is not under the retry loop of 
createExecRequest.

Working backwards, all of these are thrown from sendRequest in 
CatalogMetaProvider when fetching from catalogd, where catalogd either 1) does 
not find an expected object (e.g., the database might have been deleted and 
now we're fetching its list of table names, which is no longer valid) or 2) 
finds that versions mismatch due to an interleaved write.

Such inconsistencies are possible at every step of the schema hierarchy, e.g., 
list dbs, get db info, list table names, load table, load table col stats, list 
partitions, load partition(s), list functions, load function.

With the push architecture ("v1"), many of these operations would succeed, but 
with potentially stale data. For example, if the table is present locally, its 
partitions are also present, so "show partitions" would complete. With the pull 
architecture ("v2"), if, for example, a new partition is added or the table is 
dropped after the table is cached but before the partitions are fetched, the 
change will be reported as an exception. While the exception reflects a more 
current state, such exceptions offer different behavior than "v1". With "v1", a 
stale result can be returned, and a follow-up operation, for example listing 
the tables in a database that was listed (via show databases) but since 
dropped, would just result in an error stating that the database does not 
exist.

For queries, we chose to explicitly retry. An option here is to retry for all 
such operations. We can do so with a retrying wrapper with the same interface 
(similar to the hms retrying client). However, that may be too heavyweight an 
approach. For example, getCatalogMetrics (and its callers) should be able to 
proceed when such an exception arises-- it's for internal book-keeping and can 
be skipped. An alternative is to provide a wrapper that retries and can easily 
be obtained-- a first thought is to add something alongside getCatalog in 
Frontend, e.g., getRetryableCatalog-- and to use it where needed. Further 
alternatives include making the exception checked, which was pointed out in a 
todo (along with it being viral). Another approach is to make v2's cache more 
coarse grained. For example, a database can include all its table names and 
functions (avoiding the double check).
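
To illustrate the wrapper idea, a generic sketch (in C++ for illustration only; the real code would live in the Java frontend, and all names here are hypothetical):

{code}
#include <stdexcept>

// Stand-in for the exception discussed above.
struct InconsistentMetadataFetchException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Retry an operation that may observe inconsistent metadata, mirroring the
// retry loop already used for createExecRequest.
template <typename Fn>
auto RetryOnInconsistency(Fn op, int max_attempts = 10) -> decltype(op()) {
  for (int attempt = 1;; ++attempt) {
    try {
      return op();  // e.g., fetch the partition list for "show partitions"
    } catch (const InconsistentMetadataFetchException&) {
      if (attempt == max_attempts) throw;  // give up and surface the error
      // Otherwise loop: the next fetch observes the catalog's current state.
    }
  }
}
{code}

Callers like getCatalogMetrics that can tolerate a failed fetch would simply not use the wrapper.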

In addition, a way to test this is needed. Initial thought is to inject time 
delays and check that at least one such inconsistency is encountered and 
retried per operation.

> "show partitions" does not retry on InconsistentMetadataFetchException
> --
>
> Key: IMPALA-7597
> URL: https://issues.apache.org/jira/browse/IMPALA-7597
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: bharath v
>Assignee: Vuk Ercegovac
>Priority: Critical
>
> IMPALA-7530 added retries in case LocalCatalog throws 
> InconsistentMetadataFetchException. These retries apply to all code paths 
> taking {{Frontend#createExecRequest()}}. 
> "show partitions" additionally takes {{Frontend#getTableStats()} and aborts 
> the first time it sees InconsistentMetadataFetchException. 
> We need to make sure all the queries (especially DDLs) retry if they hit this 
> exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache

2018-09-26 Thread Philip Zeyliger (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629450#comment-16629450
 ] 

Philip Zeyliger commented on IMPALA-7501:
-

I think Todd's immediate suggestion here is to null out the Thrift stuff. Note 
that I think we first retrieve it in {{catalogd}} but it eventually makes its 
way into {{impalad}} and is presumably Thrift-serialized on the way. It may be 
useful to null it out in {{catalogd}} since memory there is also valuable, but 
you'll have to work out the details.

> Slim down metastore Partition objects in LocalCatalog cache
> ---
>
> Key: IMPALA-7501
> URL: https://issues.apache.org/jira/browse/IMPALA-7501
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Todd Lipcon
>Priority: Minor
>
> I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit 
> after running a production workload simulation for a couple hours. It had 
> 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, 
> in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects 
> are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M 
> objects are retained by FieldSchema, which, as far as I remember, are ignored 
> on the partition level by the Impala planner. So, with a bit of slimming down 
> of these objects, we could make a huge dent in effective cache capacity given 
> a fixed budget. Reducing object count should also have the effect of improved 
> GC performance (old gen GC is more closely tied to object count than size)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7636) Avoid storing hash in hash table bucket for hash tables in join

2018-09-26 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-7636:
-

 Summary: Avoid storing hash in hash table bucket for hash tables 
in join
 Key: IMPALA-7636
 URL: https://issues.apache.org/jira/browse/IMPALA-7636
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 3.1.0
Reporter: Tim Armstrong


Somewhat related to IMPALA-7635, I think storing the precomputed hash in the 
hash table buckets is of questionable benefit for joins. It's useful for 
aggregations since we frequently resize the hash tables, but in joins it's only 
used to short-circuit calling Equal(), which often isn't that expensive. It's 
unclear how many calls to Equal() are actually avoided. We should do some 
benchmarks to determine this. As a sanity check for the idea, we could remove the 
(hash == bucket->hash) check in Probe() and see if performance is affected.

The difficult part here is figuring out how to share the HashTable code between 
the agg and join while having different bucket representations - templates?
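
For the template question, one possible shape (hypothetical types, not the actual HashTable code):

{code}
#include <cstdint>
#include <vector>

// Aggregation keeps the cached hash: resizes can recompute bucket positions
// from it without re-evaluating the grouping exprs.
struct AggBucket {
  bool filled;
  uint32_t hash;
  void* data;
};

// Join drops the cached hash entirely, per the idea above.
struct JoinBucket {
  bool filled;
  bool matched;
  void* data;
};

// Shared implementation, written once against the Bucket type; only the code
// that touches the cached hash would need to be specialized per layout.
template <typename Bucket>
class HashTableImpl {
 public:
  explicit HashTableImpl(int64_t num_buckets) : buckets_(num_buckets) {}
  // Probe(), Insert(), etc. would live here.
 private:
  std::vector<Bucket> buckets_;
};
{code}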



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7635) Reduce size of hash tables in-memory by packing buckets more densely

2018-09-26 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-7635:
-

 Summary: Reduce size of hash tables in-memory by packing buckets 
more densely
 Key: IMPALA-7635
 URL: https://issues.apache.org/jira/browse/IMPALA-7635
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 3.1.0
Reporter: Tim Armstrong


Currently the hash tables used for hash join and aggregation use 16 bytes per 
bucket and 24 bytes per additional duplicate for a key:

{code}
  /// Linked list of entries used for duplicates.
  struct DuplicateNode {
    /// Used for full outer and right {outer, anti, semi} joins. Indicates whether the
    /// row in the DuplicateNode has been matched.
    /// From an abstraction point of view, this is an awkward place to store this
    /// information.
    /// TODO: Fold this flag into the next pointer below.
    bool matched;

    /// Chain to next duplicate node, NULL when end of list.
    DuplicateNode* next;
    HtData htdata;
  };

  struct Bucket {
    /// Whether this bucket contains a valid entry, or it is empty.
    bool filled;

    /// Used for full outer and right {outer, anti, semi} joins. Indicates whether the
    /// row in the bucket has been matched.
    /// From an abstraction point of view, this is an awkward place to store this
    /// information but it is efficient. This space is otherwise unused.
    bool matched;

    /// Used in case of duplicates. If true, then the bucketData union should be used as
    /// 'duplicates'.
    bool hasDuplicates;

    /// Cache of the hash for data.
    /// TODO: Do we even have to cache the hash value?
    uint32_t hash;

    /// Either the data for this bucket or the linked list of duplicates.
    union {
      HtData htdata;
      DuplicateNode* duplicates;
    } bucketData;
  };

{code}

There are some comments in the code that suggest folding the boolean values 
into the upper bits of the pointers (since on amd64 the address space is only 
48 bits, but moving to 57 bits apparently - see 
https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf).
 That would reduce the bucket to 12 bytes of actual data.

This would give us the opportunity to reduce memory requirements of joins and 
the pressure on caches significantly, provided we can work out the 
implementation issues and the cost of the bit manipulation doesn't exceed the 
benefit (my intuition is that cache effects are way more important but I could 
be wrong).

Here's a rough idea of what we could do:
# Implement folding of booleans into the pointer (see the sketch after this 
list) and mark struct Bucket as packed so that it doesn't just undo the work 
with additional padding.
# Modifying Hashtable to work with the new bucket structure. This needs a 
little thought since the bucket allocations must be a power-of-two size in 
bytes, but we also need the hash table entries to be a power-of-two in order 
for masking the hash to get the bucket number to work. I think either we could 
just leave wasted space in the buffer or switch to a non-power-of-two number of 
buckets and using an alternative method of getting the bucket from the hash: 
https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
# Run benchmarks to see if it's beneficial. The effect probably depends on the 
data set size.
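
A standalone sketch of steps 1 and 2 (illustrative only; it assumes 48-bit virtual addresses per the note above and uses Lemire's multiply-then-shift reduction from the linked post; all names are hypothetical):

{code}
#include <cassert>
#include <cstdint>

// Step 1: fold the three booleans into the top bits of the 64-bit word
// holding the pointer/data, leaving 12 bytes of real data per bucket.
constexpr uint64_t kFilledBit = 1ULL << 63;
constexpr uint64_t kMatchedBit = 1ULL << 62;
constexpr uint64_t kHasDupsBit = 1ULL << 61;
constexpr uint64_t kPtrMask = (1ULL << 48) - 1;

struct PackedBucket {
  uint64_t word = 0;  // flags in the top bits, pointer in the low 48
  uint32_t hash = 0;

  void Set(void* ptr, bool filled, bool matched, bool has_dups) {
    uint64_t p = reinterpret_cast<uint64_t>(ptr);
    assert((p & ~kPtrMask) == 0);  // pointer must fit in 48 bits
    word = p | (filled ? kFilledBit : 0) | (matched ? kMatchedBit : 0) |
           (has_dups ? kHasDupsBit : 0);
  }
  void* ptr() const { return reinterpret_cast<void*>(word & kPtrMask); }
  bool filled() const { return (word & kFilledBit) != 0; }
};

// Step 2: map a hash to a bucket without a power-of-two bucket count.
inline uint32_t BucketForHash(uint32_t hash, uint32_t num_buckets) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(hash) * num_buckets) >> 32);
}
{code}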



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7634) Impala 3.1 Doc: Doc the command to gracefully shutdown Impala

2018-09-26 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-7634:

Affects Version/s: Impala 3.1.0

> Impala 3.1 Doc: Doc the command to gracefully shutdown Impala
> -
>
> Key: IMPALA-7634
> URL: https://issues.apache.org/jira/browse/IMPALA-7634
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Affects Versions: Impala 3.1.0
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7634) Impala 3.1 Doc: Doc the command to gracefully shutdown Impala

2018-09-26 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-7634:

Target Version: Impala 3.1.0

> Impala 3.1 Doc: Doc the command to gracefully shutdown Impala
> -
>
> Key: IMPALA-7634
> URL: https://issues.apache.org/jira/browse/IMPALA-7634
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7634) Impala 3.1 Doc: Doc the command to gracefully shutdown Impala

2018-09-26 Thread Alex Rodoni (JIRA)
Alex Rodoni created IMPALA-7634:
---

 Summary: Impala 3.1 Doc: Doc the command to gracefully shutdown 
Impala
 Key: IMPALA-7634
 URL: https://issues.apache.org/jira/browse/IMPALA-7634
 Project: IMPALA
  Issue Type: Sub-task
  Components: Docs
Reporter: Alex Rodoni
Assignee: Alex Rodoni






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache

2018-09-26 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629410#comment-16629410
 ] 

Paul Rogers commented on IMPALA-7501:
-

Analysis:

* Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects.
* Impala's {{LocalDb}}, which extends {{FeDb}}, contains a map of {{LocalTable}} 
objects.
* Impala's {{LocalTable}} contains a Hive {{Table}} object. 
* Hive's 
[{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java]
 is a Hive-defined class, which contains a {{TableSpec}}.
* Hive's 
[{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java]
 contains a list of {{Partition}} objects.
* Hive's 
[{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java]
 is generated from Thrift. Contains a {{StorageDescriptor}}.
* Hive's 
[{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java]
 contains the list of {{FieldSchema}} objects which Todd saw in the heap dump.

The [Hive Thrift 
schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift]
 is an easier way to visualize the Hive part of the above analysis.

A quick scan of the Hive code suggests that Hive's Thrift objects carry more 
info than is required in the Impala cache. Creating Impala-specific, 
high-performance versions would likely save space. (No need for parent 
pointers, no need for the two-level Hive API structure, etc.)

So, this gives us two options:

* Reach inside Hive's Thrift objects to null out fields which we don't need, or
* Design an Impala-specific, compact representation for the data that omits all 
but essential objects and fields.

The second choice provides a huge opportunity for memory optimization. The 
first is a crude-but-effective short-term solution.

> Slim down metastore Partition objects in LocalCatalog cache
> ---
>
> Key: IMPALA-7501
> URL: https://issues.apache.org/jira/browse/IMPALA-7501
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Todd Lipcon
>Priority: Minor
>
> I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit 
> after running a production workload simulation for a couple hours. It had 
> 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, 
> in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects 
> are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M 
> objects are retained by FieldSchema, which, as far as I remember, are ignored 
> on the partition level by the Impala planner. So, with a bit of slimming down 
> of these objects, we could make a huge dent in effective cache capacity given 
> a fixed budget. Reducing object count should also have the effect of improved 
> GC performance (old gen GC is more closely tied to object count than size)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7633) Impala 3.1 Doc: Add support for multiple distinct operators in the same query block

2018-09-26 Thread Alex Rodoni (JIRA)
Alex Rodoni created IMPALA-7633:
---

 Summary: Impala 3.1 Doc: Add support for multiple distinct 
operators in the same query block
 Key: IMPALA-7633
 URL: https://issues.apache.org/jira/browse/IMPALA-7633
 Project: IMPALA
  Issue Type: Sub-task
  Components: Docs
Affects Versions: Impala 3.1.0
Reporter: Alex Rodoni
Assignee: Alex Rodoni






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7581) Hang in buffer-pool-test

2018-09-26 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629383#comment-16629383
 ] 

Tim Armstrong commented on IMPALA-7581:
---

Yeah, both reasonable suggestions. I also thought about adding a background 
thread to unit tests that called abort() after a timeout.
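
A minimal sketch of that watchdog idea (hypothetical; not existing test code):

{code}
#include <chrono>
#include <cstdlib>
#include <thread>

// Detached thread that aborts the whole test process after a deadline,
// producing a core dump that can be inspected instead of a silent hang.
void StartTestWatchdog(std::chrono::seconds timeout) {
  std::thread([timeout] {
    std::this_thread::sleep_for(timeout);
    std::abort();
  }).detach();
}
{code}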

> Hang in buffer-pool-test
> 
>
> Key: IMPALA-7581
> URL: https://issues.apache.org/jira/browse/IMPALA-7581
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: broken-build, flaky
> Attachments: gdb.txt
>
>
> We have observed a hang in buffer-pool-test an ASAN build. Unfortunately, no 
> logs were generated with any info about what might have happened.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7581) Hang in buffer-pool-test

2018-09-26 Thread Philip Zeyliger (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629351#comment-16629351
 ] 

Philip Zeyliger commented on IMPALA-7581:
-

We could also wrap the test runners in {{/usr/bin/timeout}} to make debugging 
this slightly more pleasant. 

I think it's reasonable to skip them under ASAN. We already skip death tests 
with release builds:
{code}
// Gtest's ASSERT_DEBUG_DEATH macro has peculiar semantics where in debug builds it
// executes the code in a forked process, so it has no visible side-effects, but in
// release builds it executes the code as normal. This makes it difficult to write
// death tests that work in both debug and release builds. To avoid this problem, update
// our wrapper macro to simply omit the death test expression in release builds, where we
// can't actually test DCHECKs anyway.
#define IMPALA_ASSERT_DEBUG_DEATH(fn, msg)
#endif
{code}

> Hang in buffer-pool-test
> 
>
> Key: IMPALA-7581
> URL: https://issues.apache.org/jira/browse/IMPALA-7581
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: broken-build, flaky
> Attachments: gdb.txt
>
>
> We have observed a hang in buffer-pool-test an ASAN build. Unfortunately, no 
> logs were generated with any info about what might have happened.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7628) test_tls_ecdh failing on CentOS 6/Python 2.6

2018-09-26 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629342#comment-16629342
 ] 

ASF subversion and git services commented on IMPALA-7628:
-

Commit 09150f04cac84965e3b390404c57a51261aecf56 in impala's branch 
refs/heads/master from [~tarmstr...@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=09150f0 ]

IMPALA-7628: skip test_tls_ecdh on Python 2.6

This is a temporary workaround. On the CentOS 6 build that failed
test_tls_v12, test_wildcard_san_ssl and test_wildcard_ssl were
all skipped so I figured this will unblock the tests without
losing coverage on most platforms that have recent Python.

Change-Id: I94ae9d254d5fd337774a24106eb9b08585ac0b01
Reviewed-on: http://gerrit.cloudera.org:8080/11519
Reviewed-by: Thomas Marshall 
Tested-by: Impala Public Jenkins 


> test_tls_ecdh failing on CentOS 6/Python 2.6
> 
>
> Key: IMPALA-7628
> URL: https://issues.apache.org/jira/browse/IMPALA-7628
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
> Environment: CentOS 6.4, Python 2.6
>Reporter: Tim Armstrong
>Assignee: Thomas Tauber-Marshall
>Priority: Blocker
>
> {noformat}
> custom_cluster/test_client_ssl.py:125: in test_tls_ecdh
> self._validate_positive_cases("%s/server-cert.pem" % self.CERT_DIR)
> custom_cluster/test_client_ssl.py:198: in _validate_positive_cases
> result = run_impala_shell_cmd(shell_options)
> shell/util.py:97: in run_impala_shell_cmd
> result.stderr)
> E   AssertionError: Cmd --ssl -q 'select 1 + 2' was expected to succeed: 
> Starting Impala Shell without Kerberos authentication
> E   SSL is enabled. Impala server certificates will NOT be verified (set 
> --ca_cert to change)
> E   
> /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80:
>  DeprecationWarning: 3th positional argument is deprecated. Use keyward 
> argument insteand.
> E DeprecationWarning)
> E   
> /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80:
>  DeprecationWarning: 4th positional argument is deprecated. Use keyward 
> argument insteand.
> E DeprecationWarning)
> E   
> /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80:
>  DeprecationWarning: 5th positional argument is deprecated. Use keyward 
> argument insteand.
> E DeprecationWarning)
> E   
> /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:216:
>  DeprecationWarning: validate is deprecated. Use cert_reqs=ssl.CERT_NONE 
> instead
> E DeprecationWarning)
> E   No handlers could be found for logger "thrift.transport.TSSLSocket"
> E   Error connecting: TTransportException, Could not connect to 
> localhost:21000: [Errno 1] _ssl.c:490: error:14094410:SSL 
> routines:SSL3_READ_BYTES:sslv3 alert handshake failure
> E   Not connected to Impala, could not execute queries.
> {noformat}
> Git hash is e38715e25297cc3643482be04e3b1b273e339b54
> I'm going to push out a temporary fix to unblock tests (since there are other 
> related tests skipped on this platform) but I'll let Thomas validate the 
> correctness of it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-26 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622856#comment-16622856
 ] 

Paul Rogers edited comment on IMPALA-7310 at 9/26/18 7:45 PM:
--

The planner uses NDVs to make binary decisions: do I do x or y? (Do I put t1 on 
the build side of a join, or do I put it on the probe side?) In most cases, the 
values being compared are orders of magnitude apart, and so fine nuances of 
value are not important. We simply need some reasonable non-zero number so that 
the calcs can play out.

(Note: the following is left for the record, but the final fix is much more 
narrowly tailored.)

The simplest fix is to handle the non-stats case for a 
[{{ColumnStats}}|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java]
 instance.

The current code in {{initColStats()}} initializes NDV to -1 (undefined). 
Suggested alternatives:
 * If type is Boolean, NDV = 2.
 * If type is TINYINT, NDV = 256.
 * If type is anything else, assume NDV = some constant, say 1000.
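
As a self-contained sketch of those defaults (the enum and names below are 
stand-ins, not Impala's real {{Type}} class or an existing method):
{code:java}
// Hypothetical, self-contained sketch of the suggested NDV defaults.
// ColType stands in for Impala's real Type hierarchy.
enum ColType { BOOLEAN, TINYINT, OTHER }

final class NdvDefaults {
  static final long DEFAULT_NDV = 1000; // assumed fallback constant

  static long defaultNdv(ColType type) {
    switch (type) {
      case BOOLEAN: return 2;    // only true/false
      case TINYINT: return 256;  // full 8-bit value range
      default:      return DEFAULT_NDV;
    }
  }
}
{code}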

Note that a variation of the above logic actually already exists in 
{{createHiveColStatsData()}} where it is used to bound the NDV value. So, we 
just reuse it. That code also suggests we can cap NDV at row count. But, since 
our guesses are small, the row count might not add much value. Or, our NDV 
guess might be some fraction of row count, if we want to be fancy.

{{ColumnStats}} already has a {{hasStats()}} method which checks if NDV is 
other than -1. Since NDV will now always be set to some value, change this 
method to check only {{numNulls_}}, which will continue to be -1 without stats.

Finally, in {{createHiveColStatsData}}, set a floor on NDV at 1 to account for 
the fact that an all-null column has NDV=0. Or, to be conservative, if NDV <= 
10, add one to NDV to account for nulls. (Do this always, since a column that 
claims to be non-null can eventually become null as the result of an outer 
join.)

Next, modify {{update()}} to use the defaults (to be set in {{initColStats()}}) 
for the "incompatible" case.

As a result, when the plan nodes ask for NDV, they won't get a 0 value if we 
have no data, nor will they get 0 if a column is all nulls.

Add or modify unit tests to verify the above logic, especially the defaults 
case and how the defaults propagate up the plan tree.

The risk is that some plans will change. We hope they change to favor getting 
the correct plan more often. But, there will be some use case for which the 
old, wrong, values produced a more accurate plan than the new estimates. This 
is always a risk.


was (Author: paul.rogers):
The planner uses NDVs to make binary decisions: do I do x or y? (Do I put t1 on 
the build side of a join, or do I put it on the probe side?) In most cases, the 
values being compared are orders of magnitude apart, and so fine nuances of 
value are not important. We simply need some reasonable non-zero number so that 
the calcs can play out.

The simplest fix is to handle the non-stats case for a 
[{{ColumnStats}}|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java]
 instance.

The current code in {{initColStats()}} initializes NDV to -1 (undefined). 
Suggested alternatives:
 * If type is Boolean, NDV = 2.
 * If type is TINYINT, NDV = 256.
 * If type is anything else, assume NDV = some constant, say 1000.

Note that a variation of the above logic actually already exists in 
{{createHiveColStatsData()}} where it is used to bound the NDV value. So, we 
just reuse it. That code also suggests we can cap NDV at row count. But, since 
our guesses are small, the row count might not add much value. Or, our NDV 
guess might be some fraction of row count, if we want to be fancy.

{{ColumnStats}} already has a {{hasStats()}} method which checks if NDV is 
other than -1. Since NDV will now always be set to some value, change this 
method to check only {{numNulls_}}, which will continue to be -1 without stats.

Finally, in {{createHiveColStatsData}}, set a floor on NDV at 1 to account for 
the fact that an all-null column has NDV=0. Or, to be conservative, if NDV <= 
10, add one to NDV to account for nulls. (Do this always, since a column that 
claims to be non-null can eventually become null as the result of an outer 
join.)

Next, modify {{update()}} to use the defaults (to be set in {{initColStats()}}) 
for the "incompatible" case.

As a result, when the plan nodes ask for NDV, they won't get a 0 value if we 
have no data, nor will they get 0 if a column is all nulls.

Add or modify unit tests to verify the above logic, especially the defaults 
case and how the defaults propagate up the plan tree.

The risk is that some plans will change. We hope they change to favor getting 
the correct plan more often. But, there will be some use case for which the 
old, wrong, values produced a more accurate plan than the new estimates. This 
is always a risk.

[jira] [Comment Edited] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-26 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626483#comment-16626483
 ] 

Paul Rogers edited comment on IMPALA-7310 at 9/26/18 7:44 PM:
--

Final solution is even simpler, since we don't want to change anything except 
handling of a table with stats, and a column of all nulls. So we apply the 
adjustment only if:

 * The column comes from a base table (not an internal column such as 
{{COUNT\(*)}})
 * The table has stats
 * The column in question is nullable
 * The column either has no null count, or the null count is non-zero
 * The NDV without nulls is 0 or 1

In this very limited case, we bump the NDV by 1.
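
A minimal sketch of that guard (hypothetical method and parameter names; the 
real {{ColumnStats}} fields differ):
{code:java}
// Hypothetical sketch of the narrow NDV adjustment described above.
final class NdvAdjustment {
  // nullCount == -1 means the null count is unknown.
  static long adjustNdvForNulls(boolean fromBaseTable, boolean tableHasStats,
      boolean nullable, long nullCount, long ndv) {
    boolean mayContainNulls = nullable && (nullCount == -1 || nullCount > 0);
    if (fromBaseTable && tableHasStats && mayContainNulls && ndv <= 1) {
      return ndv + 1; // count NULL as one extra distinct value
    }
    return ndv;
  }
}
{code}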

As it turns out, the TPC-H test cases have several queries in which a table 
column is nullable, has only one or two values, and those columns are clearly 
meant to be non-null. The above fix works around these cases so that they 
don't cause large changes to the results for {{PlannerTest}}.


was (Author: paul.rogers):
Final solution is even simpler, since we don't want to change anything except 
handling of a table with stats, and a column of all nulls. So we apply the 
adjustment only if:
 * The column comes from a base table (not an internal column such as 
{{COUNT(*)}})
 * The table has stats
 * The column in question is nullable
 * The column either has no null count, or the null count is non-zero
 * The NDV without nulls is 0 or 1

 

In this very limited case, we bump the NDV by 1.

As it turns out, the TPC-H test cases have several queries in which a table 
column is nullable, has only one or two values, and those columns are clearly 
meant to be non-null. The above fix works around these cases so that they 
don't cause large changes to the results for {{PlannerTest}}.

> Compute Stats not computing NULLs as a distinct value causing wrong estimates
> -
>
> Key: IMPALA-7310
> URL: https://issues.apache.org/jira/browse/IMPALA-7310
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, 
> Impala 2.11.0, Impala 3.0, Impala 2.12.0
>Reporter: Zsombor Fedor
>Assignee: Paul Rogers
>Priority: Major
>
> As seen in other DBMSs
> {code:java}
> NDV(col){code}
> not counting NULL as a distinct value. The same also applies to
> {code:java}
> COUNT(DISTINCT col){code}
> This is working as intended, but when computing column statistics it can 
> cause some anomalies (e.g. bad join order) as compute stats uses NDV() to 
> determine column NDVs.
>  
> For example when aggregating more columns, the estimated cardinality is 
> [counted as the product of the columns' number of distinct 
> values.|https://github.com/cloudera/Impala/blob/64cd0bb0c3529efa0ab5452c4e9e2a04fd815b4f/fe/src/main/java/org/apache/impala/analysis/Expr.java#L669]
>  If there is a column full of NULLs the whole product will be 0.
>  
> There are two possible fixes for this.
> Either we should count NULLs as a distinct value when computing stats in the 
> query:
> {code:java}
> SELECT NDV(a) + COUNT(DISTINCT CASE WHEN a IS NULL THEN 1 END) AS a, CAST(-1 
> as BIGINT), 4, CAST(4 as DOUBLE) FROM test;{code}
> instead of
> {code:java}
> SELECT NDV(a) AS a, CAST(-1 as BIGINT), 4, CAST(4 as DOUBLE) FROM test;{code}
>  
>  
> Or we should change the planner 
> [function|https://github.com/cloudera/Impala/blob/2d2579cb31edda24457d33ff5176d79b7c0432c5/fe/src/main/java/org/apache/impala/planner/AggregationNode.java#L169]
>  to take care of this bug.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7627) Parallel the fetching permission process

2018-09-26 Thread Vuk Ercegovac (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629301#comment-16629301
 ] 

Vuk Ercegovac commented on IMPALA-7627:
---

Pls include the version (or git hash) of Impala from which these measurements 
were obtained.

Also, it would be useful to have units on those measurements, as well as the 
number of partitions/files.

> Parallel the fetching permission process
> 
>
> Key: IMPALA-7627
> URL: https://issues.apache.org/jira/browse/IMPALA-7627
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Peikai Zheng
>Assignee: Peikai Zheng
>Priority: Major
>
> There are three phases when the Catalogd loads the metadata of a table.
> Firstly, the Catalogd fetches the metadata from Hive metastore;
> Then, the Catalogd fetches the permission of each partition from HDFS 
> NameNode;
> Finally, the Catalogd loads the file descriptor from HDFS NameNode.
> According to my test result:
> ||Average Time(GetFileInfoThread=10) || phase 1 || phase 2 || phase 3||   
> 
> |idm.sauron_message|9.9917115|459.2106944|95.0179163|
> |default.revenue_enriched|12.3377474|111.2969046|40.827472|
> |default.upp_raw_prod|1.5143162|50.0251426|12.6805323|
> |default.hit_to_beacon_playback_prod|1.4294509|49.7670539|18.3557858|
> |default.sitetracking_enriched|13.0003804|112.8746656|42.1824032|
> |default.player_custom_event|9.2618705|493.4865302|116.4986184|
> |default.revenue_day_est|57.9116561|106.5028664|24.005822|
> The majority of the time is occupied by the second phase. 
> So, I suggest parallelizing the second phase.
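
(Editorial sketch: one way phase 2 could be parallelized, mirroring the 
GetFileInfoThread=10 setting above. All names here are hypothetical stand-ins, 
not the real catalogd APIs.)
{code:java}
// Hypothetical sketch: check per-partition permissions concurrently instead
// of serially, one task per partition path.
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelPermissionLoader {
  // Pool size mirrors the GetFileInfoThread=10 setting from the measurements.
  private final ExecutorService pool = Executors.newFixedThreadPool(10);

  Map<String, Boolean> loadAccess(List<String> partitionPaths) throws Exception {
    // Submit one permission check per partition path.
    Map<String, Future<Boolean>> futures = new LinkedHashMap<>();
    for (String path : partitionPaths) {
      futures.put(path, pool.submit(() -> hasAccess(path)));
    }
    // Collect results; get() propagates any failure from a worker.
    Map<String, Boolean> result = new LinkedHashMap<>();
    for (Map.Entry<String, Future<Boolean>> e : futures.entrySet()) {
      result.put(e.getKey(), e.getValue().get());
    }
    return result; // callers would shut the pool down on catalog teardown
  }

  private boolean hasAccess(String path) {
    return true; // placeholder for the actual HDFS permission RPC
  }
}
{code}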



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7632) Erasure coding builds still failing because of default query options

2018-09-26 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-7632:
-

 Summary: Erasure coding builds still failing because of default 
query options
 Key: IMPALA-7632
 URL: https://issues.apache.org/jira/browse/IMPALA-7632
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.1.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong


Two tests fail because the default query options they set were clobbered by the 
custom cluster test infra:

* TestSetAndUnset.test_set_and_unset

* TestAdmissionController.test_set_request_pool
{noformat}
hs2/hs2_test_suite.py:48: in add_session
fn(self)
custom_cluster/test_set_and_unset.py:44: in test_set_and_unset
assert "DEBUG_ACTION\tcustom\tDEVELOPMENT" in result.data, "baseline"
E   AssertionError: baseline
E   assert 'DEBUG_ACTION\tcustom\tDEVELOPMENT' in 
['ABORT_ON_ERROR\t0\tREGULAR', 'ALLOW_ERASURE_CODED_FILES\t1\tDEVELOPMENT', 
'APPX_COUNT_DISTINCT\t0\tADVANCED', 'BATCH_SIZE\t0\tDEVELOPMENT', 
'BUFFER_POOL_LIMIT\t\tADVANCED', 'COMPRESSION_CODEC\t\tREGULAR', ...]
E+  where ['ABORT_ON_ERROR\t0\tREGULAR', 
'ALLOW_ERASURE_CODED_FILES\t1\tDEVELOPMENT', 
'APPX_COUNT_DISTINCT\t0\tADVANCED', 'BATCH_SIZE\t0\tDEVELOPMENT', 
'BUFFER_POOL_LIMIT\t\tADVANCED', 'COMPRESSION_CODEC\t\tREGULAR', ...] = 
.data

hs2/hs2_test_suite.py:48: in add_session
fn(self)
custom_cluster/test_admission_controller.py:317: in test_set_request_pool
['MEM_LIMIT=2', 'REQUEST_POOL=root.queueB'])
custom_cluster/test_admission_controller.py:224: in __check_query_options
assert False, "Expected query options %s, got %s." % (expected, actual)
E   AssertionError: Expected query options 
MEM_LIMIT=2,REQUEST_POOL=root.queueB, got 
allow_erasure_coded_files=1,request_pool=root.queueb.
E   assert False{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7631) Add Sentry configuration to allow specific privileges to be granted explicitly

2018-09-26 Thread Fredy Wijaya (JIRA)
Fredy Wijaya created IMPALA-7631:


 Summary: Add Sentry configuration to allow specific privileges to 
be granted explicitly
 Key: IMPALA-7631
 URL: https://issues.apache.org/jira/browse/IMPALA-7631
 Project: IMPALA
  Issue Type: Sub-task
  Components: Infrastructure
Affects Versions: Impala 3.0
Reporter: Fredy Wijaya
Assignee: Fredy Wijaya


Sentry requires a new configuration (sentry.db.explicit.grants.permitted) to 
specify which privileges are permitted to be granted explicitly: 
https://issues.apache.org/jira/browse/SENTRY-2413. We need to update 
sentry-site*template files with a new configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-7631) Add Sentry configuration to allow specific privileges to be granted explicitly

2018-09-26 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7631 started by Fredy Wijaya.

> Add Sentry configuration to allow specific privileges to be granted explicitly
> --
>
> Key: IMPALA-7631
> URL: https://issues.apache.org/jira/browse/IMPALA-7631
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Affects Versions: Impala 3.0
>Reporter: Fredy Wijaya
>Assignee: Fredy Wijaya
>Priority: Major
>
> Sentry requires a new configuration (sentry.db.explicit.grants.permitted) to 
> specify which privileges are permitted to be granted explicitly: 
> https://issues.apache.org/jira/browse/SENTRY-2413. We need to update 
> sentry-site*template files with a new configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7566) TAcceptQueueServer connection setup should have timeout

2018-09-26 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker resolved IMPALA-7566.
-
   Resolution: Duplicate
 Assignee: (was: bharath v)
Fix Version/s: Impala 2.11.0

> TAcceptQueueServer connection setup should have timeout
> ---
>
> Key: IMPALA-7566
> URL: https://issues.apache.org/jira/browse/IMPALA-7566
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients, Distributed Exec
>Affects Versions: Impala 2.11.0, Impala 3.0, Impala 2.12.0
>Reporter: Michael Ho
>Priority: Blocker
> Fix For: Impala 2.11.0
>
>
> Currently, there is no timeout when establishing a connection with an Impala 
> client. For instance, if a client freezes (whether intentionally or 
> unintentionally) in the middle of connection establishment (e.g. the SASL 
> handshake), the *single thread* in {{connection_setup_pool}} of 
> {{TAcceptQueueServer}} will be stuck waiting for the client, and thus all 
> other clients trying to connect to the Beeswax or HS2 port of Impalad will be 
> stuck forever. Impala should consider adding a timeout on the socket during 
> the connection establishment phase with a client so as to limit the amount of 
> time the thread can be stuck.
> One can try using "{{openssl s_client"}} command to connect to Impalad with 
> TLS and Kerberos enabled and leave it opened. The thread doing the connection 
> setup will be stuck in the stack below:
> {noformat}
> Thread 551 (Thread 0x7fddde563700 (LWP 166354)):
> #0  0x003ce2a0e82d in read () from /lib64/libpthread.so.0
> #1  0x003ce56dea71 in ?? () from /usr/lib64/libcrypto.so.10
> #2  0x003ce56dcdc9 in BIO_read () from /usr/lib64/libcrypto.so.10
> #3  0x003ce9a2c1df in ssl3_read_n () from /usr/lib64/libssl.so.10
> #4  0x003ce9a2c8dd in ssl3_read_bytes () from /usr/lib64/libssl.so.10
> #5  0x003ce9a281a0 in ?? () from /usr/lib64/libssl.so.10
> #6  0x0208ede2 in 
> apache::thrift::transport::TSSLSocket::read(unsigned char*, unsigned int) ()
> #7  0x0208b6f3 in unsigned int 
> apache::thrift::transport::readAll(apache::thrift::transport::TSocket&,
>  unsigned char*, unsigned int) ()
> #8  0x00cb2aa9 in 
> apache::thrift::transport::TSaslTransport::receiveSaslMessage(apache::thrift::transport::NegotiationStatus*,
>  unsigned int*) ()
> #9  0x00cb03e4 in 
> apache::thrift::transport::TSaslServerTransport::handleSaslStartMessage() ()
> #10 0x00cb2c23 in 
> apache::thrift::transport::TSaslTransport::doSaslNegotiation() ()
> #11 0x00cb10b8 in 
> apache::thrift::transport::TSaslServerTransport::Factory::getTransport(boost::shared_ptr)
>  ()
> #12 0x00b13e47 in 
> apache::thrift::server::TAcceptQueueServer::SetupConnection(boost::shared_ptr)
>  ()
> #13 0x00b14932 in 
> boost::detail::function::void_function_obj_invoker2  boost::shared_ptr const&)#1}, void, 
> int, boost::shared_ptr 
> const&>::invoke(boost::detail::function::function_buffer&, int, 
> boost::shared_ptr const&) ()
> #14 0x00b177f9 in 
> impala::ThreadPool 
> >::WorkerThread(int) ()
> #15 0x00d602af in 
> impala::Thread::SuperviseThread(std::basic_string std::char_traits, std::allocator > const&, 
> std::basic_string, std::allocator > 
> const&, boost::function, impala::ThreadDebugInfo const*, 
> impala::Promise*) ()
> #16 0x00d60aaa in boost::detail::thread_data void (*)(std::basic_string, std::allocator 
> > const&, std::basic_string, 
> std::allocator > const&, boost::function, 
> impala::ThreadDebugInfo const*, impala::Promise*), 
> boost::_bi::list5 std::char_traits, std::allocator > >, 
> boost::_bi::value, 
> std::allocator > >, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value*> > > >::run() ()
> #17 0x012d756a in thread_proxy ()
> #18 0x003ce2a07aa1 in start_thread () from /lib64/libpthread.so.0
> #19 0x003ce26e893d in clone () from /lib64/libc.so.6
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7581) Hang in buffer-pool-test

2018-09-26 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629228#comment-16629228
 ] 

Tim Armstrong commented on IMPALA-7581:
---

I think the best theory is that the hang happens when a background thread 
allocates or frees memory at the same time as the death test forks. I took a 
look at the background threads and it wasn't really obvious which might be a 
likely culprit since the death test happens at a point where things should 
mostly be idle. Any background thread that logs something is a potential 
candidate, e.g. the pause monitor threads. Or it could be something in the JVM 
that uses malloc, which is more opaque. I tried running with a breakpoint set 
on malloc() and there's a lot of activity from the JVM, particularly around 
class loading.

One option to mitigate is to skip death tests under ASAN, under the theory that 
we're far more likely to grab global malloc locks there compared with TCMalloc.

> Hang in buffer-pool-test
> 
>
> Key: IMPALA-7581
> URL: https://issues.apache.org/jira/browse/IMPALA-7581
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: broken-build, flaky
> Attachments: gdb.txt
>
>
> We have observed a hang in buffer-pool-test in an ASAN build. Unfortunately, 
> no logs were generated with any info about what might have happened.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-110) Add support for multiple distinct operators in the same query block

2018-09-26 Thread Thomas Tauber-Marshall (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629207#comment-16629207
 ] 

Thomas Tauber-Marshall commented on IMPALA-110:
---

Barring any unforeseen problems, this will be part of the 3.1 Apache release. I 
am not currently aware of any plans for a 2.13 release.

Cloudera does not in general make commitments about when features will land in 
CDH. For more info about that, you can contact Cloudera directly.

> Add support for multiple distinct operators in the same query block
> ---
>
> Key: IMPALA-110
> URL: https://issues.apache.org/jira/browse/IMPALA-110
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Affects Versions: Impala 0.5, Impala 1.4, Impala 2.0, Impala 2.2, Impala 
> 2.3.0
>Reporter: Greg Rahn
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>  Labels: sql-language
> Fix For: Impala 3.1.0
>
>
> Impala only allows a single (DISTINCT columns) expression in each query.
> {color:red}Note:
> If you do not need precise accuracy, you can produce an estimate of the 
> distinct values for a column by specifying NDV(column); a query can contain 
> multiple instances of NDV(column). To make Impala automatically rewrite 
> COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query 
> option.
> {color}
> {code}
> [impala:21000] > select count(distinct i_class_id) from item;
> Query: select count(distinct i_class_id) from item
> Query finished, fetching results ...
> 16
> Returned 1 row(s) in 1.51s
> {code}
> {code}
> [impala:21000] > select count(distinct i_class_id), count(distinct 
> i_brand_id) from item;
> Query: select count(distinct i_class_id), count(distinct i_brand_id) from item
> ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in 
> select count(distinct i_class_id), count(distinct i_brand_id) from item)
>   at 
> com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133)
>   at 
> com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:221)
>   at 
> com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:89)
> Caused by: com.cloudera.impala.common.AnalysisException: all DISTINCT 
> aggregate functions need to have the same set of parameters as COUNT(DISTINCT 
> i_class_id); deviating function: COUNT(DISTINCT i_brand_id)
>   at 
> com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo(AggregateInfo.java:196)
>   at 
> com.cloudera.impala.analysis.AggregateInfo.create(AggregateInfo.java:143)
>   at 
> com.cloudera.impala.analysis.SelectStmt.createAggInfo(SelectStmt.java:466)
>   at 
> com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(SelectStmt.java:347)
>   at com.cloudera.impala.analysis.SelectStmt.analyze(SelectStmt.java:155)
>   at 
> com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:130)
>   ... 2 more
> {code}
> Hive supports this:
> {code}
> $ hive -e "select count(distinct i_class_id), count(distinct i_brand_id) from 
> item;"
> Logging initialized using configuration in 
> file:/etc/hive/conf.dist/hive-log4j.properties
> Hive history file=/tmp/grahn/hive_job_log_grahn_201303052234_1625576708.txt
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=
> Starting Job = job_201302081514_0073, Tracking URL = 
> http://impala:50030/jobdetails.jsp?jobid=job_201302081514_0073
> Kill Command = /usr/lib/hadoop/bin/hadoop job  
> -Dmapred.job.tracker=m0525.mtv.cloudera.com:8021 -kill job_201302081514_0073
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 
> 1
> 2013-03-05 22:34:43,255 Stage-1 map = 0%,  reduce = 0%
> 2013-03-05 22:34:49,323 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:50,337 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:51,351 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:52,360 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:53,370 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:54,379 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:55,389 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> 2013-03-05 22:34:56,402 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> 2013-03-05 22:34:57,413 

[jira] [Updated] (IMPALA-110) Add support for multiple distinct operators in the same query block

2018-09-26 Thread Thomas Tauber-Marshall (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Tauber-Marshall updated IMPALA-110:
--
Fix Version/s: Impala 3.1.0

> Add support for multiple distinct operators in the same query block
> ---
>
> Key: IMPALA-110
> URL: https://issues.apache.org/jira/browse/IMPALA-110
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Affects Versions: Impala 0.5, Impala 1.4, Impala 2.0, Impala 2.2, Impala 
> 2.3.0
>Reporter: Greg Rahn
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>  Labels: sql-language
> Fix For: Impala 3.1.0
>
>
> Impala only allows a single (DISTINCT columns) expression in each query.
> {color:red}Note:
> If you do not need precise accuracy, you can produce an estimate of the 
> distinct values for a column by specifying NDV(column); a query can contain 
> multiple instances of NDV(column). To make Impala automatically rewrite 
> COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query 
> option.
> {color}
> {code}
> [impala:21000] > select count(distinct i_class_id) from item;
> Query: select count(distinct i_class_id) from item
> Query finished, fetching results ...
> 16
> Returned 1 row(s) in 1.51s
> {code}
> {code}
> [impala:21000] > select count(distinct i_class_id), count(distinct 
> i_brand_id) from item;
> Query: select count(distinct i_class_id), count(distinct i_brand_id) from item
> ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in 
> select count(distinct i_class_id), count(distinct i_brand_id) from item)
>   at 
> com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133)
>   at 
> com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:221)
>   at 
> com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:89)
> Caused by: com.cloudera.impala.common.AnalysisException: all DISTINCT 
> aggregate functions need to have the same set of parameters as COUNT(DISTINCT 
> i_class_id); deviating function: COUNT(DISTINCT i_brand_id)
>   at 
> com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo(AggregateInfo.java:196)
>   at 
> com.cloudera.impala.analysis.AggregateInfo.create(AggregateInfo.java:143)
>   at 
> com.cloudera.impala.analysis.SelectStmt.createAggInfo(SelectStmt.java:466)
>   at 
> com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(SelectStmt.java:347)
>   at com.cloudera.impala.analysis.SelectStmt.analyze(SelectStmt.java:155)
>   at 
> com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:130)
>   ... 2 more
> {code}
> Hive supports this:
> {code}
> $ hive -e "select count(distinct i_class_id), count(distinct i_brand_id) from 
> item;"
> Logging initialized using configuration in 
> file:/etc/hive/conf.dist/hive-log4j.properties
> Hive history file=/tmp/grahn/hive_job_log_grahn_201303052234_1625576708.txt
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=
> Starting Job = job_201302081514_0073, Tracking URL = 
> http://impala:50030/jobdetails.jsp?jobid=job_201302081514_0073
> Kill Command = /usr/lib/hadoop/bin/hadoop job  
> -Dmapred.job.tracker=m0525.mtv.cloudera.com:8021 -kill job_201302081514_0073
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 
> 1
> 2013-03-05 22:34:43,255 Stage-1 map = 0%,  reduce = 0%
> 2013-03-05 22:34:49,323 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:50,337 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:51,351 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:52,360 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:53,370 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:54,379 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:55,389 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> 2013-03-05 22:34:56,402 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> 2013-03-05 22:34:57,413 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> 2013-03-05 22:34:58,424 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> MapReduce Total cumulative CPU time: 8 seconds 580 msec
> Ended Job = job_201302081514_0073
> MapReduce Jobs Launched: 
> Job 0: Map: 1  Reduce: 

[jira] [Commented] (IMPALA-7627) Parallel the fetching permission process

2018-09-26 Thread bharath v (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629189#comment-16629189
 ] 

bharath v commented on IMPALA-7627:
---

There were some improvements in IMPALA-7320. Did you take a look at it already? 
https://gerrit.cloudera.org/#/c/11027/
Could you summarize your approach here?

> Parallel the fetching permission process
> 
>
> Key: IMPALA-7627
> URL: https://issues.apache.org/jira/browse/IMPALA-7627
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Peikai Zheng
>Assignee: Peikai Zheng
>Priority: Major
>
> There are three phases when the Catalogd loads the metadata of a table.
> Firstly, the Catalogd fetches the metadata from Hive metastore;
> Then, the Catalogd fetches the permission of each partition from HDFS 
> NameNode;
> Finally, the Catalogd loads the file descriptor from HDFS NameNode.
> According to my test result:
> ||Average Time(GetFileInfoThread=10) || phase 1 || phase 2 || phase 3||   
> 
> |idm.sauron_message|9.9917115|459.2106944|95.0179163|
> |default.revenue_enriched|12.3377474|111.2969046|40.827472|
> |default.upp_raw_prod|1.5143162|50.0251426|12.6805323|
> |default.hit_to_beacon_playback_prod|1.4294509|49.7670539|18.3557858|
> |default.sitetracking_enriched|13.0003804|112.8746656|42.1824032|
> |default.player_custom_event|9.2618705|493.4865302|116.4986184|
> |default.revenue_day_est|57.9116561|106.5028664|24.005822|
> The majority of the time is occupied by the second phase. 
> So, I suggest parallelizing the second phase.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-1760) Add Impala SQL command to gracefully shut down an Impala daemon

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-1760:
--
Summary: Add Impala SQL command to gracefully shut down an Impala daemon  
(was: Add Impala command to gracefully shut down an Impala daemon)

> Add Impala SQL command to gracefully shut down an Impala daemon
> ---
>
> Key: IMPALA-1760
> URL: https://issues.apache.org/jira/browse/IMPALA-1760
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Distributed Exec
>Affects Versions: Impala 2.1.1
>Reporter: Henry Robinson
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: resource-management, scalability, scheduler, usability
> Fix For: Impala 3.1.0
>
>
> In larger clusters, node maintenance is a frequent occurrence. There's no way 
> currently to stop an Impala node without failing running queries, without 
> draining queries across the whole cluster first. We should fix that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-1760) Add Impala command to gracefully shut down an Impala daemon

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1760.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

This adds the primitive required for Impala to do a graceful shutdown. There is 
further work to integrate this with any management tools - currently this would 
require an admin user to run the command directly.

> Add Impala command to gracefully shut down an Impala daemon
> ---
>
> Key: IMPALA-1760
> URL: https://issues.apache.org/jira/browse/IMPALA-1760
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Distributed Exec
>Affects Versions: Impala 2.1.1
>Reporter: Henry Robinson
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: resource-management, scalability, scheduler, usability
> Fix For: Impala 3.1.0
>
>
> In larger clusters, node maintenance is a frequent occurrence. There's no way 
> currently to stop an Impala node without failing running queries, without 
> draining queries across the whole cluster first. We should fix that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7600) Mem limit exceeded in test_kudu_scan_mem_usage

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7600.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Mem limit exceeded in test_kudu_scan_mem_usage
> --
>
> Key: IMPALA-7600
> URL: https://issues.apache.org/jira/browse/IMPALA-7600
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: broken-build, flaky
> Fix For: Impala 3.1.0
>
>
> Seen in an exhaustive release build:
> {noformat}
> 00:05:35  TestScanMemLimit.test_kudu_scan_mem_usage[exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: avro/snap/block] 
> 00:05:35 [gw6] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/bin/../infra/python/env/bin/python
> 00:05:35 query_test/test_mem_usage_scaling.py:358: in test_kudu_scan_mem_usage
> 00:05:35 self.run_test_case('QueryTest/kudu-scan-mem-usage', vector)
> 00:05:35 common/impala_test_suite.py:408: in run_test_case
> 00:05:35 result = self.__execute_query(target_impalad_client, query, 
> user=user)
> 00:05:35 common/impala_test_suite.py:623: in __execute_query
> 00:05:35 return impalad_client.execute(query, user=user)
> 00:05:35 common/impala_connection.py:160: in execute
> 00:05:35 return self.__beeswax_client.execute(sql_stmt, user=user)
> 00:05:35 beeswax/impala_beeswax.py:176: in execute
> 00:05:35 handle = self.__execute_query(query_string.strip(), user=user)
> 00:05:35 beeswax/impala_beeswax.py:350: in __execute_query
> 00:05:35 self.wait_for_finished(handle)
> 00:05:35 beeswax/impala_beeswax.py:371: in wait_for_finished
> 00:05:35 raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> 00:05:35 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> 00:05:35 EQuery aborted:Memory limit exceeded: Error occurred on backend 
> impala-ec2-centos74-m5-4xlarge-ondemand-0e2c.vpc.cloudera.com:22000 by 
> fragment b34270820f59a0c9:a507139e0001
> 00:05:35 E   Memory left in process limit: 10.12 GB
> 00:05:35 E   Memory left in query limit: -16.92 KB
> 00:05:35 E   Query(b34270820f59a0c9:a507139e): memory limit exceeded. 
> Limit=4.00 MB Reservation=0 ReservationLimit=0 OtherMemory=4.02 MB Total=4.02 
> MB Peak=4.02 MB
> 00:05:35 E Fragment b34270820f59a0c9:a507139e: Reservation=0 
> OtherMemory=40.10 KB Total=40.10 KB Peak=340.00 KB
> 00:05:35 E   EXCHANGE_NODE (id=2): Reservation=32.00 KB OtherMemory=0 
> Total=32.00 KB Peak=32.00 KB
> 00:05:35 E KrpcDeferredRpcs: Total=0 Peak=0
> 00:05:35 E   PLAN_ROOT_SINK: Total=0 Peak=0
> 00:05:35 E   CodeGen: Total=103.00 B Peak=332.00 KB
> 00:05:35 E Fragment b34270820f59a0c9:a507139e0001: Reservation=0 
> OtherMemory=3.98 MB Total=3.98 MB Peak=3.98 MB
> 00:05:35 E   SORT_NODE (id=1): Total=342.00 KB Peak=342.00 KB
> 00:05:35 E   KUDU_SCAN_NODE (id=0): Total=3.63 MB Peak=3.63 MB
> 00:05:35 E Queued Batches: Total=3.30 MB Peak=3.63 MB
> 00:05:35 E   KrpcDataStreamSender (dst_id=2): Total=1.16 KB Peak=1.16 KB
> 00:05:35 E   CodeGen: Total=3.66 KB Peak=1.14 MB
> 00:05:35 E   
> 00:05:35 E   Memory limit exceeded: Error occurred on backend 
> impala-ec2-centos74-m5-4xlarge-ondemand-0e2c.vpc.cloudera.com:22000 by 
> fragment b34270820f59a0c9:a507139e0001
> 00:05:35 E   Memory left in process limit: 10.12 GB
> 00:05:35 E   Memory left in query limit: -16.92 KB
> 00:05:35 E   Query(b34270820f59a0c9:a507139e): memory limit exceeded. 
> Limit=4.00 MB Reservation=0 ReservationLimit=0 OtherMemory=4.02 MB Total=4.02 
> MB Peak=4.02 MB
> 00:05:35 E Fragment b34270820f59a0c9:a507139e: Reservation=0 
> OtherMemory=40.10 KB Total=40.10 KB Peak=340.00 KB
> 00:05:35 E   EXCHANGE_NODE (id=2): Reservation=32.00 KB OtherMemory=0 
> Total=32.00 KB Peak=32.00 KB
> 00:05:35 E KrpcDeferredRpcs: Total=0 Peak=0
> 00:05:35 E   PLAN_ROOT_SINK: Total=0 Peak=0
> 00:05:35 E   CodeGen: Total=103.00 B Peak=332.00 KB
> 00:05:35 E Fragment b34270820f59a0c9:a507139e0001: Reservation=0 
> OtherMemory=3.98 MB Total=3.98 MB Peak=3.98 MB
> 00:05:35 E   SORT_NODE (id=1): Total=342.00 KB Peak=342.00 KB
> 00:05:35 E   KUDU_SCAN_NODE (id=0): Total=3.63 MB Peak=3.63 MB
> 00:05:35 E Queued Batches: Total=3.30 MB Peak=3.63 MB
> 00:05:35 E   KrpcDataStreamSender (dst_id=2): Total=1.16 KB Peak=1.16 KB
> 00:05:35 E   CodeGen: Total=3.66 KB Peak=1.14 MB (1 of 2 similar)
> 

[jira] [Updated] (IMPALA-1760) Add Impala command to gracefully shut down an Impala daemon

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-1760:
--
Summary: Add Impala command to gracefully shut down an Impala daemon  (was: 
Add decommissioning support / graceful shutdown / quiesce)

> Add Impala command to gracefully shut down an Impala daemon
> ---
>
> Key: IMPALA-1760
> URL: https://issues.apache.org/jira/browse/IMPALA-1760
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Distributed Exec
>Affects Versions: Impala 2.1.1
>Reporter: Henry Robinson
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: resource-management, scalability, scheduler, usability
>
> In larger clusters, node maintenance is a frequent occurrence. There's no way 
> currently to stop an Impala node without failing running queries, without 
> draining queries across the whole cluster first. We should fix that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7595) Check failed: IsValidTime(time_) at timestamp-value.h:322

2018-09-26 Thread Csaba Ringhofer (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629107#comment-16629107
 ] 

Csaba Ringhofer commented on IMPALA-7595:
-

https://gerrit.cloudera.org/#/c/11521/

> Check failed: IsValidTime(time_) at timestamp-value.h:322 
> --
>
> Key: IMPALA-7595
> URL: https://issues.apache.org/jira/browse/IMPALA-7595
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Csaba Ringhofer
>Priority: Blocker
>  Labels: broken-build, crash
>
> See https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/3197/. hash is 
> 23c7d7e57b7868eedbf5a9a4bc4aafd6066a04fb
> Some of the fuzz tests stand out amongst the tests that were running at the 
> same time as the crash, particularly:
>  19:12:17 [gw4] PASSED 
> query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_alltypes[exec_option:
>  {'debug_action': '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', 
> 'abort_on_error': False, 'mem_limit': '512m', 'num_nodes': 0} | table_format: 
> parquet/none] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-7595) Check failed: IsValidTime(time_) at timestamp-value.h:322

2018-09-26 Thread Csaba Ringhofer (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7595 started by Csaba Ringhofer.
---
> Check failed: IsValidTime(time_) at timestamp-value.h:322 
> --
>
> Key: IMPALA-7595
> URL: https://issues.apache.org/jira/browse/IMPALA-7595
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Csaba Ringhofer
>Priority: Blocker
>  Labels: broken-build, crash
>
> See https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/3197/. hash is 
> 23c7d7e57b7868eedbf5a9a4bc4aafd6066a04fb
> Some of the fuzz tests stand out amongst the tests that were running at the 
> same time as the crash, particularly:
>  19:12:17 [gw4] PASSED 
> query_test/test_scanners_fuzz.py::TestScannersFuzzing::test_fuzz_alltypes[exec_option:
>  {'debug_action': '-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0', 
> 'abort_on_error': False, 'mem_limit': '512m', 'num_nodes': 0} | table_format: 
> parquet/none] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7630) Support GENERATED/VIRTUAL columns in Impala tables

2018-09-26 Thread bharath v (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v updated IMPALA-7630:
--
Component/s: Frontend
 Backend

> Support GENERATED/VIRTUAL columns in Impala tables
> --
>
> Key: IMPALA-7630
> URL: https://issues.apache.org/jira/browse/IMPALA-7630
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Affects Versions: Impala 3.1.0
>Reporter: bharath v
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7630) Support GENERATED/VIRTUAL columns in Impala tables

2018-09-26 Thread bharath v (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v updated IMPALA-7630:
--
Summary: Support GENERATED/VIRTUAL columns in Impala tables  (was: Support )

> Support GENERATED/VIRTUAL columns in Impala tables
> --
>
> Key: IMPALA-7630
> URL: https://issues.apache.org/jira/browse/IMPALA-7630
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Affects Versions: Impala 3.1.0
>Reporter: bharath v
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7630) Support GENERATED/VIRTUAL columns in Impala tables

2018-09-26 Thread bharath v (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v updated IMPALA-7630:
--
Affects Version/s: Impala 3.1.0

> Support GENERATED/VIRTUAL columns in Impala tables
> --
>
> Key: IMPALA-7630
> URL: https://issues.apache.org/jira/browse/IMPALA-7630
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Affects Versions: Impala 3.1.0
>Reporter: bharath v
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7630) Support

2018-09-26 Thread bharath v (JIRA)
bharath v created IMPALA-7630:
-

 Summary: Support 
 Key: IMPALA-7630
 URL: https://issues.apache.org/jira/browse/IMPALA-7630
 Project: IMPALA
  Issue Type: New Feature
Reporter: bharath v






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-5142) EventSequence::MarkEvent() may record concurrent events out of serialized order

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-5142:
--
Fix Version/s: Impala 2.11.0

> EventSequence::MarkEvent() may record concurrent events out of serialized 
> order
> ---
>
> Key: IMPALA-5142
> URL: https://issues.apache.org/jira/browse/IMPALA-5142
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.7.0
>Reporter: Mostafa Mokhtar
>Assignee: Pranay Singh
>Priority: Trivial
>  Labels: ramp-up, trivial
> Fix For: Impala 2.11.0
>
>
> When the event for first dynamic filter is received ahead of all fragments 
> started the query timeline prints a negative time value for "fragment 
> instances started"
> {code}
>- Planning finished: 790.555ms (32.872ms)
> Query Timeline: 2m40s
>- Query submitted: 7.295ms (7.295ms)
>- Planning finished: 1s397ms (1s390ms)
>- Submit for admission: 3s059ms (1s661ms)
>- Completed admission: 3s087ms (27.810ms)
>- Ready to start 90 fragment instances: 3s527ms (440.540ms)
>- First dynamic filter received: 7s851ms (4s323ms)
>- All 90 fragment instances started: 7s851ms (-88037.000ns)
>- Rows available: 2m28s (2m20s)
>- First row fetched: 2m28s (51.725ms)
>- Unregister query: 2m30s (1s459ms)
>  - ComputeScanRangeAssignmentTimer: 770.794ms
> {code}
> Query timeline when filter arrive after all fragments starting. 
> {code}
> Query Timeline: 17s011ms
>- Query submitted: 174.449us (174.449us)
>- Planning finished: 209.847ms (209.672ms)
>- Submit for admission: 255.819ms (45.971ms)
>- Completed admission: 256.212ms (393.074us)
>- Ready to start 90 fragment instances: 283.582ms (27.370ms)
>- All 90 fragment instances started: 627.013ms (343.430ms)
>- First dynamic filter received: 954.223ms (327.209ms)
>- Rows available: 16s393ms (15s439ms)
>- First row fetched: 16s705ms (311.586ms)
>- Unregister query: 16s871ms (165.908ms)
>  - ComputeScanRangeAssignmentTimer: 13.125ms
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Issue Comment Deleted] (IMPALA-110) Add support for multiple distinct operators in the same query block

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-110:
-
Comment: was deleted

(was: Thank you for your email. I will be taking time off starting Wednesday 
9/26/2018 and returning Friday, 9/28/2018 and will have limited access to my 
email. If you require immediate assistance, please contact Narmada Gomatam.
)

> Add support for multiple distinct operators in the same query block
> ---
>
> Key: IMPALA-110
> URL: https://issues.apache.org/jira/browse/IMPALA-110
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Affects Versions: Impala 0.5, Impala 1.4, Impala 2.0, Impala 2.2, Impala 
> 2.3.0
>Reporter: Greg Rahn
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>  Labels: sql-language
>
> Impala only allows a single (DISTINCT columns) expression in each query.
> {color:red}Note:
> If you do not need precise accuracy, you can produce an estimate of the 
> distinct values for a column by specifying NDV(column); a query can contain 
> multiple instances of NDV(column). To make Impala automatically rewrite 
> COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query 
> option.
> {color}
> {code}
> [impala:21000] > select count(distinct i_class_id) from item;
> Query: select count(distinct i_class_id) from item
> Query finished, fetching results ...
> 16
> Returned 1 row(s) in 1.51s
> {code}
> {code}
> [impala:21000] > select count(distinct i_class_id), count(distinct 
> i_brand_id) from item;
> Query: select count(distinct i_class_id), count(distinct i_brand_id) from item
> ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in 
> select count(distinct i_class_id), count(distinct i_brand_id) from item)
>   at 
> com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133)
>   at 
> com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:221)
>   at 
> com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:89)
> Caused by: com.cloudera.impala.common.AnalysisException: all DISTINCT 
> aggregate functions need to have the same set of parameters as COUNT(DISTINCT 
> i_class_id); deviating function: COUNT(DISTINCT i_brand_id)
>   at 
> com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo(AggregateInfo.java:196)
>   at 
> com.cloudera.impala.analysis.AggregateInfo.create(AggregateInfo.java:143)
>   at 
> com.cloudera.impala.analysis.SelectStmt.createAggInfo(SelectStmt.java:466)
>   at 
> com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(SelectStmt.java:347)
>   at com.cloudera.impala.analysis.SelectStmt.analyze(SelectStmt.java:155)
>   at 
> com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:130)
>   ... 2 more
> {code}
> Hive supports this:
> {code}
> $ hive -e "select count(distinct i_class_id), count(distinct i_brand_id) from 
> item;"
> Logging initialized using configuration in 
> file:/etc/hive/conf.dist/hive-log4j.properties
> Hive history file=/tmp/grahn/hive_job_log_grahn_201303052234_1625576708.txt
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=
> Starting Job = job_201302081514_0073, Tracking URL = 
> http://impala:50030/jobdetails.jsp?jobid=job_201302081514_0073
> Kill Command = /usr/lib/hadoop/bin/hadoop job  
> -Dmapred.job.tracker=m0525.mtv.cloudera.com:8021 -kill job_201302081514_0073
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 
> 1
> 2013-03-05 22:34:43,255 Stage-1 map = 0%,  reduce = 0%
> 2013-03-05 22:34:49,323 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:50,337 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:51,351 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:52,360 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:53,370 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:54,379 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:55,389 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> 2013-03-05 22:34:56,402 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> 2013-03-05 22:34:57,413 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> 2013-03-05 22:34:58,424 Stage-1 map = 100%,  reduce = 100%, 

[jira] [Commented] (IMPALA-7628) test_tls_ecdh failing on CentOS 6/Python 2.6

2018-09-26 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628984#comment-16628984
 ] 

Tim Armstrong commented on IMPALA-7628:
---

Temporary fix here: https://gerrit.cloudera.org/#/c/11519/

> test_tls_ecdh failing on CentOS 6/Python 2.6
> 
>
> Key: IMPALA-7628
> URL: https://issues.apache.org/jira/browse/IMPALA-7628
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
> Environment: CentOS 6.4, Python 2.6
>Reporter: Tim Armstrong
>Assignee: Thomas Tauber-Marshall
>Priority: Blocker
>
> {noformat}
> custom_cluster/test_client_ssl.py:125: in test_tls_ecdh
> self._validate_positive_cases("%s/server-cert.pem" % self.CERT_DIR)
> custom_cluster/test_client_ssl.py:198: in _validate_positive_cases
> result = run_impala_shell_cmd(shell_options)
> shell/util.py:97: in run_impala_shell_cmd
> result.stderr)
> E   AssertionError: Cmd --ssl -q 'select 1 + 2' was expected to succeed: 
> Starting Impala Shell without Kerberos authentication
> E   SSL is enabled. Impala server certificates will NOT be verified (set 
> --ca_cert to change)
> E   
> /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80:
>  DeprecationWarning: 3th positional argument is deprecated. Use keyward 
> argument insteand.
> E DeprecationWarning)
> E   
> /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80:
>  DeprecationWarning: 4th positional argument is deprecated. Use keyward 
> argument insteand.
> E DeprecationWarning)
> E   
> /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80:
>  DeprecationWarning: 5th positional argument is deprecated. Use keyward 
> argument insteand.
> E DeprecationWarning)
> E   
> /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:216:
>  DeprecationWarning: validate is deprecated. Use cert_reqs=ssl.CERT_NONE 
> instead
> E DeprecationWarning)
> E   No handlers could be found for logger "thrift.transport.TSSLSocket"
> E   Error connecting: TTransportException, Could not connect to 
> localhost:21000: [Errno 1] _ssl.c:490: error:14094410:SSL 
> routines:SSL3_READ_BYTES:sslv3 alert handshake failure
> E   Not connected to Impala, could not execute queries.
> {noformat}
> Git hash is e38715e25297cc3643482be04e3b1b273e339b54
> I'm going to push out a temporary fix to unblock tests (since there are other 
> related tests skipped on this platform) but I'll let Thomas validate the 
> correctness of it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7629) TestClientSsl tests seem to be disabled on non-legacy platforms

2018-09-26 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-7629:
-

 Summary: TestClientSsl tests seem to be disabled on non-legacy 
platforms
 Key: IMPALA-7629
 URL: https://issues.apache.org/jira/browse/IMPALA-7629
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.1.0
 Environment: Ubuntu 16.04, Python 2.7.14
Reporter: Tim Armstrong
Assignee: Philip Zeyliger


I noticed that when I ran some of these tests on Ubuntu 16.04, they were skipped:
{noformat}
$ impala-py.test tests/custom_cluster/test_client_ssl.py -k ecdh
...
tests/custom_cluster/test_client_ssl.py::TestClientSsl::test_tls_ecdh[exec_option:
 {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
'exec_single_node_rows_threshold': 0} | table_format: text/none] SKIPPED
{noformat}

I don't think this is intended. The logic in IMPALA-6990 looks backwards in 
that HAS_LEGACY_OPENSSL is a non-None integer (i.e. truthy) when the version 
field exists.

Assigning to Phil since he reviewed the patch and probably has some context.
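
For illustration, here is a minimal Python sketch of the suspected pitfall; the names and the skip condition are assumptions for illustration, not the actual test code:

{code}
import ssl

# Hypothetical reconstruction of the suspected bug: the version field
# exists on every modern Python build, so getattr() returns a truthy
# integer precisely on NON-legacy platforms.
HAS_LEGACY_OPENSSL = getattr(ssl, 'OPENSSL_VERSION_NUMBER', None)

# A skip condition written this way therefore skips the test on modern
# platforms and runs it only where the field is missing -- the reverse
# of the intent.
if HAS_LEGACY_OPENSSL:
    print("test would be SKIPPED (backwards)")
else:
    print("test would run")
{code}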



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7628) test_tls_ecdh failing on CentOS 6/Python 2.6

2018-09-26 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7628:
--
Summary: test_tls_ecdh failing on CentOS 6/Python 2.6  (was: test_tls_edch 
failing on CentOS 6/Python 2.6)

> test_tls_ecdh failing on CentOS 6/Python 2.6
> 
>
> Key: IMPALA-7628
> URL: https://issues.apache.org/jira/browse/IMPALA-7628
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
> Environment: CentOS 6.4, Python 2.6
>Reporter: Tim Armstrong
>Assignee: Thomas Tauber-Marshall
>Priority: Blocker
>
> {noformat}
> custom_cluster/test_client_ssl.py:125: in test_tls_ecdh
> self._validate_positive_cases("%s/server-cert.pem" % self.CERT_DIR)
> custom_cluster/test_client_ssl.py:198: in _validate_positive_cases
> result = run_impala_shell_cmd(shell_options)
> shell/util.py:97: in run_impala_shell_cmd
> result.stderr)
> E   AssertionError: Cmd --ssl -q 'select 1 + 2' was expected to succeed: 
> Starting Impala Shell without Kerberos authentication
> E   SSL is enabled. Impala server certificates will NOT be verified (set 
> --ca_cert to change)
> E   
> /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80:
>  DeprecationWarning: 3th positional argument is deprecated. Use keyward 
> argument insteand.
> E DeprecationWarning)
> E   
> /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80:
>  DeprecationWarning: 4th positional argument is deprecated. Use keyward 
> argument insteand.
> E DeprecationWarning)
> E   
> /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80:
>  DeprecationWarning: 5th positional argument is deprecated. Use keyward 
> argument insteand.
> E DeprecationWarning)
> E   
> /data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:216:
>  DeprecationWarning: validate is deprecated. Use cert_reqs=ssl.CERT_NONE 
> instead
> E DeprecationWarning)
> E   No handlers could be found for logger "thrift.transport.TSSLSocket"
> E   Error connecting: TTransportException, Could not connect to 
> localhost:21000: [Errno 1] _ssl.c:490: error:14094410:SSL 
> routines:SSL3_READ_BYTES:sslv3 alert handshake failure
> E   Not connected to Impala, could not execute queries.
> {noformat}
> Git hash is e38715e25297cc3643482be04e3b1b273e339b54
> I'm going to push out a temporary fix to unblock tests (since there are other 
> related tests skipped on this platform) but I'll let Thomas validate the 
> correctness of it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7628) test_tls_edch failing on CentOS 6/Python 2.6

2018-09-26 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-7628:
-

 Summary: test_tls_edch failing on CentOS 6/Python 2.6
 Key: IMPALA-7628
 URL: https://issues.apache.org/jira/browse/IMPALA-7628
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.1.0
 Environment: CentOS 6.4, Python 2.6
Reporter: Tim Armstrong
Assignee: Thomas Tauber-Marshall


{noformat}
custom_cluster/test_client_ssl.py:125: in test_tls_ecdh
self._validate_positive_cases("%s/server-cert.pem" % self.CERT_DIR)
custom_cluster/test_client_ssl.py:198: in _validate_positive_cases
result = run_impala_shell_cmd(shell_options)
shell/util.py:97: in run_impala_shell_cmd
result.stderr)
E   AssertionError: Cmd --ssl -q 'select 1 + 2' was expected to succeed: 
Starting Impala Shell without Kerberos authentication
E   SSL is enabled. Impala server certificates will NOT be verified (set 
--ca_cert to change)
E   
/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80:
 DeprecationWarning: 3th positional argument is deprecated. Use keyward 
argument insteand.
E DeprecationWarning)
E   
/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80:
 DeprecationWarning: 4th positional argument is deprecated. Use keyward 
argument insteand.
E DeprecationWarning)
E   
/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:80:
 DeprecationWarning: 5th positional argument is deprecated. Use keyward 
argument insteand.
E DeprecationWarning)
E   
/data/jenkins/workspace/impala-asf-master-exhaustive-centos6/Impala-Toolchain/thrift-0.9.3-p4/python/lib64/python2.6/site-packages/thrift/transport/TSSLSocket.py:216:
 DeprecationWarning: validate is deprecated. Use cert_reqs=ssl.CERT_NONE instead
E DeprecationWarning)
E   No handlers could be found for logger "thrift.transport.TSSLSocket"
E   Error connecting: TTransportException, Could not connect to 
localhost:21000: [Errno 1] _ssl.c:490: error:14094410:SSL 
routines:SSL3_READ_BYTES:sslv3 alert handshake failure
E   Not connected to Impala, could not execute queries.
{noformat}

Git hash is e38715e25297cc3643482be04e3b1b273e339b54

I'm going to push out a temporary fix to unblock tests (since there are other 
related tests skipped on this platform) but I'll let Thomas validate the 
correctness of it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-110) Add support for multiple distinct operators in the same query block

2018-09-26 Thread Eric Campbell (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628966#comment-16628966
 ] 

Eric Campbell commented on IMPALA-110:
--

Thank you for your email. I will be taking time off starting Wednesday 
9/26/2018 and returning Friday, 9/28/2018 and will have limited access to my 
email. If you require immediate assistance, please contact Narmada Gomatam.


> Add support for multiple distinct operators in the same query block
> ---
>
> Key: IMPALA-110
> URL: https://issues.apache.org/jira/browse/IMPALA-110
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Affects Versions: Impala 0.5, Impala 1.4, Impala 2.0, Impala 2.2, Impala 
> 2.3.0
>Reporter: Greg Rahn
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>  Labels: sql-language
>
> Impala only allows a single (DISTINCT columns) expression in each query.
> {color:red}Note:
> If you do not need precise accuracy, you can produce an estimate of the 
> distinct values for a column by specifying NDV(column); a query can contain 
> multiple instances of NDV(column). To make Impala automatically rewrite 
> COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query 
> option.
> {color}
> {code}
> [impala:21000] > select count(distinct i_class_id) from item;
> Query: select count(distinct i_class_id) from item
> Query finished, fetching results ...
> 16
> Returned 1 row(s) in 1.51s
> {code}
> {code}
> [impala:21000] > select count(distinct i_class_id), count(distinct 
> i_brand_id) from item;
> Query: select count(distinct i_class_id), count(distinct i_brand_id) from item
> ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in 
> select count(distinct i_class_id), count(distinct i_brand_id) from item)
>   at 
> com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133)
>   at 
> com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:221)
>   at 
> com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:89)
> Caused by: com.cloudera.impala.common.AnalysisException: all DISTINCT 
> aggregate functions need to have the same set of parameters as COUNT(DISTINCT 
> i_class_id); deviating function: COUNT(DISTINCT i_brand_id)
>   at 
> com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo(AggregateInfo.java:196)
>   at 
> com.cloudera.impala.analysis.AggregateInfo.create(AggregateInfo.java:143)
>   at 
> com.cloudera.impala.analysis.SelectStmt.createAggInfo(SelectStmt.java:466)
>   at 
> com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(SelectStmt.java:347)
>   at com.cloudera.impala.analysis.SelectStmt.analyze(SelectStmt.java:155)
>   at 
> com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:130)
>   ... 2 more
> {code}
> Hive supports this:
> {code}
> $ hive -e "select count(distinct i_class_id), count(distinct i_brand_id) from 
> item;"
> Logging initialized using configuration in 
> file:/etc/hive/conf.dist/hive-log4j.properties
> Hive history file=/tmp/grahn/hive_job_log_grahn_201303052234_1625576708.txt
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=
> Starting Job = job_201302081514_0073, Tracking URL = 
> http://impala:50030/jobdetails.jsp?jobid=job_201302081514_0073
> Kill Command = /usr/lib/hadoop/bin/hadoop job  
> -Dmapred.job.tracker=m0525.mtv.cloudera.com:8021 -kill job_201302081514_0073
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 
> 1
> 2013-03-05 22:34:43,255 Stage-1 map = 0%,  reduce = 0%
> 2013-03-05 22:34:49,323 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:50,337 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:51,351 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:52,360 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:53,370 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:54,379 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:55,389 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> 2013-03-05 22:34:56,402 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> 2013-03-05 22:34:57,413 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> 2013-03-05 22:34:58,424 Stage-1 map = 100%,  reduce = 

[jira] [Commented] (IMPALA-110) Add support for multiple distinct operators in the same query block

2018-09-26 Thread Ruslan Dautkhanov (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628963#comment-16628963
 ] 

Ruslan Dautkhanov commented on IMPALA-110:
--

Hooray!! Congrats - great work everyone involved. 
Will this be part of Impala 2.13 release / CDH 5.16? 
Thank you!

> Add support for multiple distinct operators in the same query block
> ---
>
> Key: IMPALA-110
> URL: https://issues.apache.org/jira/browse/IMPALA-110
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Affects Versions: Impala 0.5, Impala 1.4, Impala 2.0, Impala 2.2, Impala 
> 2.3.0
>Reporter: Greg Rahn
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>  Labels: sql-language
>
> Impala only allows a single (DISTINCT columns) expression in each query.
> {color:red}Note:
> If you do not need precise accuracy, you can produce an estimate of the 
> distinct values for a column by specifying NDV(column); a query can contain 
> multiple instances of NDV(column). To make Impala automatically rewrite 
> COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query 
> option.
> {color}
> {code}
> [impala:21000] > select count(distinct i_class_id) from item;
> Query: select count(distinct i_class_id) from item
> Query finished, fetching results ...
> 16
> Returned 1 row(s) in 1.51s
> {code}
> {code}
> [impala:21000] > select count(distinct i_class_id), count(distinct 
> i_brand_id) from item;
> Query: select count(distinct i_class_id), count(distinct i_brand_id) from item
> ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in 
> select count(distinct i_class_id), count(distinct i_brand_id) from item)
>   at 
> com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133)
>   at 
> com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:221)
>   at 
> com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:89)
> Caused by: com.cloudera.impala.common.AnalysisException: all DISTINCT 
> aggregate functions need to have the same set of parameters as COUNT(DISTINCT 
> i_class_id); deviating function: COUNT(DISTINCT i_brand_id)
>   at 
> com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo(AggregateInfo.java:196)
>   at 
> com.cloudera.impala.analysis.AggregateInfo.create(AggregateInfo.java:143)
>   at 
> com.cloudera.impala.analysis.SelectStmt.createAggInfo(SelectStmt.java:466)
>   at 
> com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(SelectStmt.java:347)
>   at com.cloudera.impala.analysis.SelectStmt.analyze(SelectStmt.java:155)
>   at 
> com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:130)
>   ... 2 more
> {code}
> Hive supports this:
> {code}
> $ hive -e "select count(distinct i_class_id), count(distinct i_brand_id) from 
> item;"
> Logging initialized using configuration in 
> file:/etc/hive/conf.dist/hive-log4j.properties
> Hive history file=/tmp/grahn/hive_job_log_grahn_201303052234_1625576708.txt
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=
> Starting Job = job_201302081514_0073, Tracking URL = 
> http://impala:50030/jobdetails.jsp?jobid=job_201302081514_0073
> Kill Command = /usr/lib/hadoop/bin/hadoop job  
> -Dmapred.job.tracker=m0525.mtv.cloudera.com:8021 -kill job_201302081514_0073
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 
> 1
> 2013-03-05 22:34:43,255 Stage-1 map = 0%,  reduce = 0%
> 2013-03-05 22:34:49,323 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:50,337 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:51,351 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:52,360 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:53,370 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:54,379 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:55,389 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> 2013-03-05 22:34:56,402 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> 2013-03-05 22:34:57,413 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> 2013-03-05 22:34:58,424 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> MapReduce Total cumulative CPU time: 8 seconds 580 msec
> Ended Job 

[jira] [Commented] (IMPALA-7600) Mem limit exceeded in test_kudu_scan_mem_usage

2018-09-26 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628945#comment-16628945
 ] 

ASF subversion and git services commented on IMPALA-7600:
-

Commit ce145ffee6ee68a60c4ef663cb9f47f22d9eb19f in impala's branch 
refs/heads/master from [~tarmstr...@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=ce145ff ]

IMPALA-7600: bump mem_limit for test_kudu_scan_mem_usage

The estimate for memory consumption for this scan is 9 columns * 384kb
per column = 3.375mb. So if we set the mem_limit to 6.5mb, we should
still not get more than one scanner thread, but we can avoid hitting
out-of-memory.

The issue in the JIRA was queued row batches. With this change, and
num_scanner_threads=2, there should be max 12 row batches
(10 in the queue, 2 in the scanner threads about to be enqueued)
and based on the column stats I'd estimate that each row batch is
around 200kb, so this change should provide significantly more headroom.

Change-Id: I6d992cc076bc8678089f765bdffe92e877e9d229
Reviewed-on: http://gerrit.cloudera.org:8080/11513
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
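
As a sanity check on the arithmetic above, a short sketch (all constants are taken from the commit message; the headroom figures are only estimates):

{code}
# Back-of-envelope check of the numbers in the commit message above.
KB = 1024
MB = 1024 * KB

scan_estimate = 9 * 384 * KB            # 9 columns * 384kb per column
assert scan_estimate == 3.375 * MB      # matches the 3.375mb estimate

queued_batches = 10 + 2                 # 10 queued + 2 about to be enqueued
batch_size = 200 * KB                   # rough per-batch size from column stats
queue_usage = queued_batches * batch_size

print("queued batches: %.2f MB" % (queue_usage / float(MB)))        # ~2.34 MB
print("headroom at 4.0 MB limit: %.2f MB" % ((4.0 * MB - queue_usage) / MB))
print("headroom at 6.5 MB limit: %.2f MB" % ((6.5 * MB - queue_usage) / MB))
{code}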


> Mem limit exceeded in test_kudu_scan_mem_usage
> --
>
> Key: IMPALA-7600
> URL: https://issues.apache.org/jira/browse/IMPALA-7600
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: broken-build, flaky
>
> Seen in an exhaustive release build:
> {noformat}
> 00:05:35  TestScanMemLimit.test_kudu_scan_mem_usage[exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: avro/snap/block] 
> 00:05:35 [gw6] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/bin/../infra/python/env/bin/python
> 00:05:35 query_test/test_mem_usage_scaling.py:358: in test_kudu_scan_mem_usage
> 00:05:35 self.run_test_case('QueryTest/kudu-scan-mem-usage', vector)
> 00:05:35 common/impala_test_suite.py:408: in run_test_case
> 00:05:35 result = self.__execute_query(target_impalad_client, query, 
> user=user)
> 00:05:35 common/impala_test_suite.py:623: in __execute_query
> 00:05:35 return impalad_client.execute(query, user=user)
> 00:05:35 common/impala_connection.py:160: in execute
> 00:05:35 return self.__beeswax_client.execute(sql_stmt, user=user)
> 00:05:35 beeswax/impala_beeswax.py:176: in execute
> 00:05:35 handle = self.__execute_query(query_string.strip(), user=user)
> 00:05:35 beeswax/impala_beeswax.py:350: in __execute_query
> 00:05:35 self.wait_for_finished(handle)
> 00:05:35 beeswax/impala_beeswax.py:371: in wait_for_finished
> 00:05:35 raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> 00:05:35 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> 00:05:35 EQuery aborted:Memory limit exceeded: Error occurred on backend 
> impala-ec2-centos74-m5-4xlarge-ondemand-0e2c.vpc.cloudera.com:22000 by 
> fragment b34270820f59a0c9:a507139e0001
> 00:05:35 E   Memory left in process limit: 10.12 GB
> 00:05:35 E   Memory left in query limit: -16.92 KB
> 00:05:35 E   Query(b34270820f59a0c9:a507139e): memory limit exceeded. 
> Limit=4.00 MB Reservation=0 ReservationLimit=0 OtherMemory=4.02 MB Total=4.02 
> MB Peak=4.02 MB
> 00:05:35 E Fragment b34270820f59a0c9:a507139e: Reservation=0 
> OtherMemory=40.10 KB Total=40.10 KB Peak=340.00 KB
> 00:05:35 E   EXCHANGE_NODE (id=2): Reservation=32.00 KB OtherMemory=0 
> Total=32.00 KB Peak=32.00 KB
> 00:05:35 E KrpcDeferredRpcs: Total=0 Peak=0
> 00:05:35 E   PLAN_ROOT_SINK: Total=0 Peak=0
> 00:05:35 E   CodeGen: Total=103.00 B Peak=332.00 KB
> 00:05:35 E Fragment b34270820f59a0c9:a507139e0001: Reservation=0 
> OtherMemory=3.98 MB Total=3.98 MB Peak=3.98 MB
> 00:05:35 E   SORT_NODE (id=1): Total=342.00 KB Peak=342.00 KB
> 00:05:35 E   KUDU_SCAN_NODE (id=0): Total=3.63 MB Peak=3.63 MB
> 00:05:35 E Queued Batches: Total=3.30 MB Peak=3.63 MB
> 00:05:35 E   KrpcDataStreamSender (dst_id=2): Total=1.16 KB Peak=1.16 KB
> 00:05:35 E   CodeGen: Total=3.66 KB Peak=1.14 MB
> 00:05:35 E   
> 00:05:35 E   Memory limit exceeded: Error occurred on backend 
> impala-ec2-centos74-m5-4xlarge-ondemand-0e2c.vpc.cloudera.com:22000 by 
> fragment b34270820f59a0c9:a507139e0001
> 00:05:35 E   Memory left in process limit: 10.12 GB
> 00:05:35 E   Memory left in query limit: -16.92 KB
> 00:05:35 E   Query(b34270820f59a0c9:a507139e): memory limit exceeded. 
> Limit=4.00 MB 

[jira] [Assigned] (IMPALA-7627) Parallelize the permission-fetching process

2018-09-26 Thread Peikai Zheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peikai Zheng reassigned IMPALA-7627:


Assignee: Peikai Zheng

> Parallelize the permission-fetching process
> 
>
> Key: IMPALA-7627
> URL: https://issues.apache.org/jira/browse/IMPALA-7627
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Peikai Zheng
>Assignee: Peikai Zheng
>Priority: Major
>
> There are three phases when the Catalogd loads the metadata of a table.
> First, the Catalogd fetches the metadata from the Hive Metastore;
> then, it fetches the permissions of each partition from the HDFS
> NameNode;
> finally, it loads the file descriptors from the HDFS NameNode.
> According to my test results:
> ||Average Time (GetFileInfoThread=10)||phase 1||phase 2||phase 3||
> |idm.sauron_message|9.9917115|459.2106944|95.0179163|
> |default.revenue_enriched|12.3377474|111.2969046|40.827472|
> |default.upp_raw_prod|1.5143162|50.0251426|12.6805323|
> |default.hit_to_beacon_playback_prod|1.4294509|49.7670539|18.3557858|
> |default.sitetracking_enriched|13.0003804|112.8746656|42.1824032|
> |default.player_custom_event|9.2618705|493.4865302|116.4986184|
> |default.revenue_day_est|57.9116561|106.5028664|24.005822|
> The majority of the time is occupied by the second phase, so I suggest
> parallelizing it.
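
An illustrative sketch of the proposal follows; it is Python with a thread pool standing in for the Java catalog code, and every name in it is hypothetical:

{code}
from concurrent.futures import ThreadPoolExecutor

def load_permission_from_namenode(partition):
    # Hypothetical stand-in for the per-partition RPC to the HDFS
    # NameNode that dominates phase 2.
    return "rwxr-x---"

def fetch_permissions_parallel(partitions, num_threads=10):
    # Issue the phase-2 permission lookups concurrently instead of one
    # partition at a time; phases 1 and 3 are unchanged.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        perms = pool.map(load_permission_from_namenode, partitions)
        return dict(zip(partitions, perms))

print(fetch_permissions_parallel(["p=2018-09-24", "p=2018-09-25", "p=2018-09-26"]))
{code}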



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7624) test-with-docker sometimes hangs creating docker containers

2018-09-26 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628286#comment-16628286
 ] 

ASF subversion and git services commented on IMPALA-7624:
-

Commit 91673fee607b552f142c6ab2aad0e96efa9e0f80 in impala's branch 
refs/heads/master from [~philip]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=91673fe ]

IMPALA-7624: Workaround docker/kernel bug causing test-with-docker to sometimes 
hang.

I've observed that builds of test-with-docker that have "suite
parallelism" sometimes hang when the Docker containers are
being created. (The implementation had multiple threads calling
"docker create" simultaneously.) From trolling the mailing lists,
this may be a bug in Docker or the kernel. I've never caught
it live long enough to strace it.

A hopeful workaround is to serialize the docker create calls, which is
easy and harmless, given that "docker create" is usually pretty quick
(subsecond) and the overall run time here is hours+.

With this change, I was able to run test-with-docker with
--suite-concurrency=6 on a c5.9xlarge in AWS, with a total runtime of
1h35m.

The hangs are intermittent and cause, in the typical case, inconsistency
in runtimes because less parallelism happens when one of the "docker
create" calls hangs. (I've seen them resume after one of the other
containers finishes.) We'll find out with time whether this stabilizes
it or has no effect.

Change-Id: I3e44db7a6ce08a42d6fe574d7348332578cd9e51
Reviewed-on: http://gerrit.cloudera.org:8080/11481
Reviewed-by: Philip Zeyliger 
Tested-by: Impala Public Jenkins 
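
The workaround is easy to picture; a minimal sketch of serializing the docker create calls across worker threads (hypothetical helper, not the actual test-with-docker code):

{code}
import subprocess
import threading

# One process-wide lock so only a single "docker create" runs at a
# time, while the rest of each suite still proceeds in parallel.
_docker_create_lock = threading.Lock()

def docker_create(image, name):
    # "docker create" is usually subsecond, so serializing it costs
    # little against a multi-hour overall run time.
    with _docker_create_lock:
        subprocess.check_call(["docker", "create", "--name", name, image])
{code}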


> test-with-docker sometimes hangs creating docker containers
> ---
>
> Key: IMPALA-7624
> URL: https://issues.apache.org/jira/browse/IMPALA-7624
> Project: IMPALA
>  Issue Type: Task
>Reporter: Philip Zeyliger
>Priority: Major
>
> I've seen the test-with-docker executions hang, or sort of hang, in threads 
> doing {{docker create}}. I think this is ultimately a Docker or kernel bug, 
> but we can work around it by serializing our "docker create" invocations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7456) Deprecated file-based authorization

2018-09-26 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628283#comment-16628283
 ] 

ASF subversion and git services commented on IMPALA-7456:
-

Commit 48640b5dfa131ca0c7ae9e541e376d11ac6e6d33 in impala's branch 
refs/heads/master from [~aholley]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=48640b5 ]

IMPALA-7456: Deprecate file-based authorization

This patch simply adds a warning message to the log when the
authorization_policy_file run-time flag is used. Sentry has
deprecated the use of policy files, and they do not support
user-level privileges, which are required for object ownership.
The JIRA tracking their removal is SENTRY-1922.

Test:
- Added custom cluster test to validate logs
- Ran all custom cluster tests

Change-Id: Ibbb13f3ef1c3a00812c180ecef022ea638c2ebc7
Reviewed-on: http://gerrit.cloudera.org:8080/11502
Reviewed-by: Fredy Wijaya 
Tested-by: Impala Public Jenkins 


> Deprecated file-based authorization
> ---
>
> Key: IMPALA-7456
> URL: https://issues.apache.org/jira/browse/IMPALA-7456
> Project: IMPALA
>  Issue Type: Dependency upgrade
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Adam Holley
>Assignee: Adam Holley
>Priority: Major
>  Labels: security
> Fix For: Impala 3.1.0
>
>
> Sentry has deprecated its support for file-based authorization. Some newer 
> security features, such as object ownership, require user-level authorizations, 
> which file-based security does not support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-2990) Coordinator should timeout a connection for an unresponsive backend

2018-09-26 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628285#comment-16628285
 ] 

ASF subversion and git services commented on IMPALA-2990:
-

Commit f46de21140f3bb483884fc49f5ded7afc466faac in impala's branch 
refs/heads/master from [~tarmstr...@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=f46de21 ]

IMPALA-1760: Implement shutdown command

This is the same patch except with fixes for the test failures
on EC and S3 noted in the JIRA.

This allows graceful shutdown of executors and partially graceful
shutdown of coordinators (new operations fail, old operations can
continue).

Details:
* In order to allow future admin commands, this is implemented with
  function-like syntax and does not add any reserved words.
* ALL privilege is required on the server
* The coordinator impalad that the client is connected to can be shut
  down directly with ":shutdown()".
* Remote shutdown of another impalad is supported, e.g. with
  ":shutdown('hostname')", so that non-coordinators can be shut down
  and for the convenience of the client, which does not have to
  connect to the specific impalad. There is no assumption that the
  other impalad is registered in the statestore; just that the
  coordinator can connect to the other daemon's thrift endpoint.
  This simplifies things and allows shutdown in various important
  cases, e.g. statestore down.
* The shutdown time limit can be overridden to force a quicker or
  slower shutdown by specifying a deadline in seconds after the
  statement is executed.
* If shutting down, a banner is shown on the root debug page.

Workflow:
1. (if a coordinator) clients are prevented from submitting
  queries to this coordinator via some out-of-band mechanism,
  e.g. load balancer
2. the shutdown process is started via ":shutdown()"
3. a bit is set in the statestore and propagated to coordinators,
  which stop scheduling fragment instances on this daemon
  (if an executor).
4. the query startup grace period (which is ideally set to the AC
  queueing delay plus some additional leeway) expires
5. once the daemon is quiesced (i.e. no fragments, no registered
  queries), it shuts itself down.
6. If the daemon does not successfully quiesce (e.g. rogue clients,
  long-running queries), after a longer timeout (counted from the start
  of the shutdown process) it will shut down anyway.

What this does:
* Executors can be shut down without causing a service-wide outage
* Shutting down an executor will not disrupt any short-running queries
  and will wait for long-running queries up to a threshold.
* Coordinators can be shut down without query failures only if
  there is an out-of-band mechanism to prevent submission of more
  queries to the shut down coordinator. If queries are submitted to
  a coordinator after shutdown has started, they will fail.
* Long running queries or other issues (e.g. stuck fragments) will
  slow down but not prevent eventual shutdown.

Limitations:
* The startup grace period needs to be configured to be greater than
  the latency of statestore updates + scheduling + admission +
  coordinator startup. Otherwise a coordinator may send a
  fragment instance to the shutting down impalad. (We could
  automate this configuration as a follow-on)
* The startup grace period means a minimum latency for shutdown,
  even if the cluster is idle.
* We depend on the statestore detecting the process going down
  if queries are still running on that backend when the timeout
  expires. This may still be subject to existing problems,
  e.g. IMPALA-2990.

Tests:
* Added parser, analysis and authorization tests.
* End-to-end test of shutting down impalads.
* End-to-end test of shutting down then restarting an executor while
  queries are running.
* End-to-end test of shutting down a coordinator
  - New queries cannot be started on coord, existing queries continue to run
  - Exercises various Beeswax and HS2 operations.

Change-Id: I8f3679ef442745a60a0ab97c4e9eac437aef9463
Reviewed-on: http://gerrit.cloudera.org:8080/11484
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
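
As a usage sketch, the statements described above can be driven from impala-shell; the hostnames here are made up, and error handling is omitted:

{code}
import subprocess

def impala_shell(coordinator, stmt):
    # Run a single statement through impala-shell against a coordinator.
    subprocess.check_call(["impala-shell", "-i", coordinator, "-q", stmt])

# Gracefully shut down the impalad we are connected to:
impala_shell("coord-1:21000", ":shutdown()")

# Remotely shut down another impalad (e.g. a pure executor):
impala_shell("coord-1:21000", ":shutdown('exec-7')")
{code}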


> Coordinator should timeout a connection for an unresponsive backend
> ---
>
> Key: IMPALA-2990
> URL: https://issues.apache.org/jira/browse/IMPALA-2990
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.3.0
>Reporter: Sailesh Mukil
>Assignee: Michael Ho
>Priority: Critical
>  Labels: hang, observability, supportability
>
> The coordinator currently waits indefinitely if it does not hear back from a 
> backend. This could cause a query to hang indefinitely in case of a network 
> error, etc.
> We should add logic for determining 

[jira] [Commented] (IMPALA-7546) Impala 3.1 Doc: Doc the new query option TIMEZONE

2018-09-26 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628281#comment-16628281
 ] 

ASF subversion and git services commented on IMPALA-7546:
-

Commit 17bc980d9540b29a1667841b7bffc2084204ac35 in impala's branch 
refs/heads/master from [~arodoni_cloudera]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=17bc980 ]

IMPALA-7546: [DOCS] A new TIMEZONE query option

Documented the new TIMEZONE query option, which sets the time zone
to be used in timestamp conversions.

Change-Id: I734b8b37ae2360422fce269ed87507a04e8c05ac
Reviewed-on: http://gerrit.cloudera.org:8080/11505
Tested-by: Impala Public Jenkins 
Reviewed-by: Csaba Ringhofer 
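
For reference, a quick way to exercise the new option from a script; the zone name and query are only examples:

{code}
import subprocess

# Set the TIMEZONE query option for the session, then run a query whose
# timestamp conversions are affected by it.
subprocess.check_call([
    "impala-shell", "-q",
    "set timezone='America/Los_Angeles'; select now();"
])
{code}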


> Impala 3.1 Doc: Doc the new query option TIMEZONE
> -
>
> Key: IMPALA-7546
> URL: https://issues.apache.org/jira/browse/IMPALA-7546
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc
> Fix For: Impala 3.1.0
>
>
> https://gerrit.cloudera.org/#/c/11505/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-1760) Add decommissioning support / graceful shutdown / quiesce

2018-09-26 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628284#comment-16628284
 ] 

ASF subversion and git services commented on IMPALA-1760:
-

Commit f46de21140f3bb483884fc49f5ded7afc466faac in impala's branch 
refs/heads/master from [~tarmstr...@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=f46de21 ]

IMPALA-1760: Implement shutdown command

This is the same patch except with fixes for the test failures
on EC and S3 noted in the JIRA.

This allows graceful shutdown of executors and partially graceful
shutdown of coordinators (new operations fail, old operations can
continue).

Details:
* In order to allow future admin commands, this is implemented with
  function-like syntax and does not add any reserved words.
* ALL privilege is required on the server
* The coordinator impalad that the client is connected to can be shut
  down directly with ":shutdown()".
* Remote shutdown of another impalad is supported, e.g. with
  ":shutdown('hostname')", so that non-coordinators can be shut down
  and for the convenience of the client, which does not have to
  connect to the specific impalad. There is no assumption that the
  other impalad is registered in the statestore; just that the
  coordinator can connect to the other daemon's thrift endpoint.
  This simplifies things and allows shutdown in various important
  cases, e.g. statestore down.
* The shutdown time limit can be overridden to force a quicker or
  slower shutdown by specifying a deadline in seconds after the
  statement is executed.
* If shutting down, a banner is shown on the root debug page.

Workflow:
1. (if a coordinator) clients are prevented from submitting
  queries to this coordinator via some out-of-band mechanism,
  e.g. load balancer
2. the shutdown process is started via ":shutdown()"
3. a bit is set in the statestore and propagated to coordinators,
  which stop scheduling fragment instances on this daemon
  (if an executor).
4. the query startup grace period (which is ideally set to the AC
  queueing delay plus some additional leeway) expires
5. once the daemon is quiesced (i.e. no fragments, no registered
  queries), it shuts itself down.
6. If the daemon does not successfully quiesce (e.g. rogue clients,
  long-running queries), after a longer timeout (counted from the start
  of the shutdown process) it will shut down anyway.

What this does:
* Executors can be shut down without causing a service-wide outage
* Shutting down an executor will not disrupt any short-running queries
  and will wait for long-running queries up to a threshold.
* Coordinators can be shut down without query failures only if
  there is an out-of-band mechanism to prevent submission of more
  queries to the shut down coordinator. If queries are submitted to
  a coordinator after shutdown has started, they will fail.
* Long running queries or other issues (e.g. stuck fragments) will
  slow down but not prevent eventual shutdown.

Limitations:
* The startup grace period needs to be configured to be greater than
  the latency of statestore updates + scheduling + admission +
  coordinator startup. Otherwise a coordinator may send a
  fragment instance to the shutting down impalad. (We could
  automate this configuration as a follow-on)
* The startup grace period means a minimum latency for shutdown,
  even if the cluster is idle.
* We depend on the statestore detecting the process going down
  if queries are still running on that backend when the timeout
  expires. This may still be subject to existing problems,
  e.g. IMPALA-2990.

Tests:
* Added parser, analysis and authorization tests.
* End-to-end test of shutting down impalads.
* End-to-end test of shutting down then restarting an executor while
  queries are running.
* End-to-end test of shutting down a coordinator
  - New queries cannot be started on coord, existing queries continue to run
  - Exercises various Beeswax and HS2 operations.

Change-Id: I8f3679ef442745a60a0ab97c4e9eac437aef9463
Reviewed-on: http://gerrit.cloudera.org:8080/11484
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Add decommissioning support / graceful shutdown / quiesce
> -
>
> Key: IMPALA-1760
> URL: https://issues.apache.org/jira/browse/IMPALA-1760
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Distributed Exec
>Affects Versions: Impala 2.1.1
>Reporter: Henry Robinson
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: resource-management, scalability, scheduler, usability
>
> In larger clusters, node maintenance is a frequent occurrence. There's no way 
> currently to stop an Impala node without failing running queries, without 
> draining queries across the whole 

[jira] [Commented] (IMPALA-110) Add support for multiple distinct operators in the same query block

2018-09-26 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628287#comment-16628287
 ] 

ASF subversion and git services commented on IMPALA-110:


Commit df53ec2385190bba2b3cefb43b094cde6d33642f in impala's branch 
refs/heads/master from [~twmarshall]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=df53ec2 ]

IMPALA-110: Support for multiple DISTINCT

This patch adds support for having multiple aggregate functions in a
single SELECT block that use DISTINCT over different sets of columns.

Planner design:
- The existing tree-based plan shape with a two-phased
  aggregation is maintained.
- Existing plans are not changed.
- Aggregates are grouped into 'aggregation classes' based on their
  expressions in the distinct portion which may be empty for
  non-distinct aggregates.
- The aggregation framework is generalized to simultaneously process
  multiple aggregation classes within the tree-based plan. This
  process splits the results of different aggregation classes into
  separate rows, so a final aggregation is needed to transpose the
  results into the desired form.
- Main challenge: Each aggregation class consumes and produces
  different tuples, so conceptually a union-type of tuples flows
  through the runtime. The tuple union is represented by a TupleRow
  with one tuple per aggregation class. Only one tuple in such a
  TupleRow is non-NULL.
- Backend exec nodes in the aggregation plan will be aware of this
  tuple-union either explicitly in their implementation or by relying
  on expressions that distinguish the aggregation classes.
- To distinguish the aggregation classes, e.g. in hash exchanges,
  CASE expressions are crafted to hash/group on the appropriate slots.

Deferred FE work:
- Beautify/condense the long CASE exprs
- Push applicable conjuncts into individual aggregators before
  the transposition step
- Added a few testing TODOs to reduce the size of this patch
- Decide whether we want to change existing plans to the new model

Execution design:
- Previous patches separated out aggregation logic from the exec node
  into Aggregators. This is extended to support multiple Aggregators
  per node, with different grouping and aggregating functions.
- There is a fast path for aggregations with only one aggregator,
  which leaves the execution essentially unchanged from before.
- When there are multiple aggregators, the first aggregation node in
  the plan replicates its input to each aggregator. The output of this
  step is rows where only a single tuple is non-null, corresponding to
  the aggregator that produced the row.
- A new expr is introduced, ValidTupleId, which takes one of these
  rows and returns which tuple is non-null.
- For additional aggregation nodes, the input is split apart into
  'mini-batches' according to which aggregator the row corresponds to.

Testing:
- Added analyzer and planner tests
- Added end-to-end queries tests
- Ran hdfs/core tests
- Added support in the query generator and ran in a loop.

Change-Id: I055402eaef6d81e5f70e850d9f8a621e766830a4
Reviewed-on: http://gerrit.cloudera.org:8080/10771
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
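
With this patch, the query shape from the issue description below analyzes and runs; a small driver sketch (the item table is from the original report):

{code}
import subprocess

# The formerly rejected shape: two DISTINCT aggregates over different
# column sets, plus a non-distinct aggregate, in one SELECT block.
query = ("select count(distinct i_class_id), count(distinct i_brand_id), "
         "count(*) from item;")
subprocess.check_call(["impala-shell", "-q", query])
{code}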


> Add support for multiple distinct operators in the same query block
> ---
>
> Key: IMPALA-110
> URL: https://issues.apache.org/jira/browse/IMPALA-110
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Affects Versions: Impala 0.5, Impala 1.4, Impala 2.0, Impala 2.2, Impala 
> 2.3.0
>Reporter: Greg Rahn
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>  Labels: sql-language
>
> Impala only allows a single (DISTINCT columns) expression in each query.
> {color:red}Note:
> If you do not need precise accuracy, you can produce an estimate of the 
> distinct values for a column by specifying NDV(column); a query can contain 
> multiple instances of NDV(column). To make Impala automatically rewrite 
> COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query 
> option.
> {color}
> {code}
> [impala:21000] > select count(distinct i_class_id) from item;
> Query: select count(distinct i_class_id) from item
> Query finished, fetching results ...
> 16
> Returned 1 row(s) in 1.51s
> {code}
> {code}
> [impala:21000] > select count(distinct i_class_id), count(distinct 
> i_brand_id) from item;
> Query: select count(distinct i_class_id), count(distinct i_brand_id) from item
> ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in 
> select count(distinct i_class_id), count(distinct i_brand_id) from item)
>   at 
> com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133)
>   at 
> 

[jira] [Commented] (IMPALA-7537) REVOKE GRANT OPTION regression

2018-09-26 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628282#comment-16628282
 ] 

ASF subversion and git services commented on IMPALA-7537:
-

Commit c5dc6ded68c62f9f2138ab3376531c6292d1df78 in impala's branch 
refs/heads/master from [~aholley]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=c5dc6de ]

IMPALA-7537: REVOKE GRANT OPTION regression

This patch fixes several issues around granting and revoking of
privileges.  This includes:
- REVOKE ALL ON SERVER where the privilege has the grant option was
  removing from the cache but not Sentry.
- With the addition of the grantoption to the name in the catalog
  object, refactoring was required to make grants and revokes work
  correctly.

Assertions with regard to granting and revoking:
- If there is a privilege that has the grant option, that privilege
  can be revoked simply with "REVOKE privilege..." or the grant option
  can be removed with "REVOKE GRANT OPTION ON..."
- We should not limit the privilege being revoked simply because it
  has the grant option.
- If a privilege already exists without the grant option, granting the
  privilege with the grant option should add the grant option to it.
- If a privilege already exists with the grant option, granting the
  privilege without the grant option will not change anything as the
  expectation is if you want to remove the grant option, you should
  explicitly use the "REVOKE GRANT OPTION ON...".

Testing:
- Added new grant/revoke tests that validate cache and Sentry refresh
- Ran all FE, E2E, and custom-cluster tests.

Change-Id: I3be5c8f15e9bc53e9661347578832bf446abaedc
Reviewed-on: http://gerrit.cloudera.org:8080/11483
Reviewed-by: Fredy Wijaya 
Tested-by: Impala Public Jenkins 
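
The assertions above translate directly into a statement sequence; a sketch of the regression case from the issue description (after the fix, the final REVOKE succeeds instead of raising IllegalStateException):

{code}
import subprocess

for stmt in [
    "create role foo_role;",
    "grant all on server to foo_role with grant option;",
    "revoke grant option for all on server from foo_role;",
]:
    # Each statement is issued through impala-shell, mirroring the
    # transcript in the issue description below.
    subprocess.check_call(["impala-shell", "-q", stmt])
{code}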


> REVOKE GRANT OPTION regression
> --
>
> Key: IMPALA-7537
> URL: https://issues.apache.org/jira/browse/IMPALA-7537
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Adam Holley
>Assignee: Adam Holley
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> Recent commit ec88aa2 added 'grantoption' to the privilege name.  This name 
> is used by the catalog cache which broke "revoke grant option" since the 
> privilege names do not match.
> [localhost:21000] default> create role foo_role;
> [localhost:21000] default> grant all on server to foo_role with grant option;
> [localhost:21000] default> revoke grant option for all on server from 
> foo_role;
> ERROR: IllegalStateException: null



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org