date:20190812

[jira] [Created] (IMPALA-8859) test_shell_commandline.test_global_config_file fails in remote cluster test

2019-08-12 Thread Tim Armstrong (JIRA)

Tim Armstrong created IMPALA-8859:
-

 Summary: test_shell_commandline.test_global_config_file fails in 
remote cluster test
 Key: IMPALA-8859
 URL: https://issues.apache.org/jira/browse/IMPALA-8859
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.3.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong


{noformat}
test_shell_commandline.test_global_config_file is failing to connect to 
impala-shell. Below is the error:
{code:java}
shell/test_shell_commandline.py:494: in test_global_config_file
    result = run_impala_shell_cmd(vector, args, env=env)
shell/util.py:110: in run_impala_shell_cmd
    expect_success and wait_until_connected)
shell/util.py:128: in run_impala_shell_cmd_no_expect
    p = ImpalaShell(vector, shell_args, env=env, wait_until_connected=wait_until_connected)
shell/util.py:186: in __init__
    assert connected, "Impala shell is not connected"
E   AssertionError: Impala shell is not connected
{code}
 {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Commented] (IMPALA-8766) Change hadoop cloud dependencies to use hadoop-cloud-storage

2019-08-12 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905765#comment-16905765
 ] 

ASF subversion and git services commented on IMPALA-8766:
-

Commit 8094811b5d975e18e20071552f86c2e3f8c0fc8f in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8094811 ]

IMPALA-8766: Undo hadoop-cloud-storage + HWX Nexus

Previous commits for IMPALA-8766 attempted to use hadoop-cloud-storage
to satisfy Impala's cloud dependencies (e.g. hadoop-aws, hadoop-azure,
etc). On builds with USE_CDP_HIVE=true, this adds Knox
gateway-cloud-bindings. However, the entry for hadoop-cloud-storage
artifact in the impala.cdp.repo maven repository introduces
dependencies that are external to that repository. This requires the
HWX Nexus repository to resolve those dangling dependencies.
Unfortunately, HWX Nexus ages out old jars, including the ones we
need.

This stops using hadoop-cloud-storage, and instead adds a direct
dependency to Knox for USE_CDP_HIVE=true. It disables the HWX Nexus
repository and leaves a tombstone explaining why.

Testing:
 - Deleted my .m2 directory and rebuilt Impala with USE_CDP_HIVE=true
 - Verified the CLASSPATH still contains the right jars on USE_CDP_HIVE=true

Change-Id: I79a0c2575fc50bbc3b393c150c0bce22258ea1bd
Reviewed-on: http://gerrit.cloudera.org:8080/14024
Tested-by: Impala Public Jenkins 
Reviewed-by: Vihang Karajgaonkar 


> Change hadoop cloud dependencies to use hadoop-cloud-storage
> 
>
> Key: IMPALA-8766
> URL: https://issues.apache.org/jira/browse/IMPALA-8766
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 3.3.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> Currently, fe/pom.xml specifically includes hadoop-aws, hadoop-azure, and 
> hadoop-azure-datalake directly. There is a meta-package in hadoop called 
> hadoop-cloud-storage that includes these dependencies and others as 
> customized by the hadoop provider, with appropriate exclusions applied to 
> each package.
> Migrating Impala to use this meta-package would make it easier for different 
> providers of hadoop to customize hadoop-cloud-storage and the resulting 
> CLASSPATH without needing to change Impala. For example, a hadoop provider 
> may want to include Apache Knox for cloud identity management.





[jira] [Commented] (IMPALA-8791) Handle the case where there is no fragment scheduled on the coordinator for a query

2019-08-12 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905766#comment-16905766
 ] 

ASF subversion and git services commented on IMPALA-8791:
-

Commit 2df3b8cf82af66199f5851c84f3aa065577f6d7d in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2df3b8c ]

Revert "IMPALA-8791: Handle the case where there is no fragment scheduled on"

This reverts commit 760169edcbca438c5964380a604b6c271c6bd1a3.

Change-Id: Id20cf3581995f450de6f491e7874cbcf23b52cda
Reviewed-on: http://gerrit.cloudera.org:8080/14052
Reviewed-by: Tim Armstrong 
Tested-by: Tim Armstrong 


> Handle the case where there is no fragment scheduled on the coordinator for a 
> query
> ---
>
> Key: IMPALA-8791
> URL: https://issues.apache.org/jira/browse/IMPALA-8791
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Product Backlog
>Reporter: Bikramjeet Vig
>Assignee: Bikramjeet Vig
>Priority: Major
>
> For insert statements executed on a dedicated coord, the fragments get 
> scheduled only on executors but a query state object still gets started up on 
> the coord host with the coord_mem_limit. We end up with a situation where the 
> mem admitted is zero for the coord but the mem_reserved is non-zero, which 
> would affect other admission decisions.
> There is also a case where there is no coordinator fragment but the execution 
> fragment gets scheduled on the coord (e.g. insert into  values). For 
> this case, the mem admitted is per_backend_mem_limit_ but the mem limit 
> applied to the coord query state is coord_backend_mem_limit_, which again 
> causes an inconsistency.




[jira] [Assigned] (IMPALA-8790) IllegalStateException: Illegal reference to non-materialized slot

2019-08-12 Thread Quanlong Huang (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-8790:
--

Assignee: Quanlong Huang

> IllegalStateException: Illegal reference to non-materialized slot
> -
>
> Key: IMPALA-8790
> URL: https://issues.apache.org/jira/browse/IMPALA-8790
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
> Attachments: foo.parq
>
>
> Reproduce:
> {code:sql}
> $ hdfs dfs -put foo.parq hdfs:///tmp
> impala> create table foo (uid string, cid string) stored as parquet;
> impala> load data inpath 'hdfs:///tmp/foo.parq' into table foo;
> {code}
> With the stats, the following query hits an IllegalStateException:
> {code:sql}
> impala> compute stats foo;
> impala> explain select uid, cid,
>rank() over (partition by uid order by count(*) desc)
> from (select uid, cid from foo) w
> group by uid, cid;
> ERROR: IllegalStateException: Illegal reference to non-materialized slot: 
> tid=1 sid=2{code}
> Without the stats, it runs successfully:
> {code:sql}
> impala> drop stats foo;
> impala> explain select uid, cid,
>rank() over (partition by uid order by count(*) desc)
> from (select uid, cid from foo) w
> group by uid, cid;
> +------------------------------------------------------------------------------------+
> | Explain String                                                                     |
> +------------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=84.02MB Threads=5                        |
> | Per-Host Resource Estimates: Memory=304MB                                          |
> | WARNING: The following tables are missing relevant table and/or column statistics. |
> | common_action.foo                                                                  |
> |                                                                                    |
> | PLAN-ROOT SINK                                                                     |
> | |                                                                                  |
> | 07:EXCHANGE [UNPARTITIONED]                                                        |
> | |                                                                                  |
> | 03:ANALYTIC                                                                        |
> | |  functions: rank()                                                               |
> | |  partition by: uid                                                               |
> | |  order by: count(*) DESC                                                         |
> | |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW                       |
> | |  row-size=40B cardinality=1.10K                                                  |
> | |                                                                                  |
> | 02:SORT                                                                            |
> | |  order by: uid ASC NULLS FIRST, count(*) DESC                                    |
> | |  row-size=32B cardinality=1.10K                                                  |
> | |                                                                                  |
> | 06:EXCHANGE [HASH(uid)]                                                            |
> | |                                                                                  |
> | 05:AGGREGATE [FINALIZE]                                                            |
> | |  output: count:merge(*)                                                          |
> | |  group by: uid, cid                                                              |
> | |  row-size=32B cardinality=1.10K                                                  |
> | |                                                                                  |
> | 04:EXCHANGE [HASH(uid,cid)]                                                        |
> | |                                                                                  |
> | 01:AGGREGATE [STREAMING]                                                           |
> | |  output: count(*)                                                                |
> | |  group by: uid, cid                                                              |
> | |  row-size=32B cardinality=1.10K                                                  |
> | |                                                                                  |
> | 00:SCAN HDFS [common_action.foo]                                                   |

[jira] [Reopened] (IMPALA-8791) Handle the case where there is no fragment scheduled on the coordinator for a query

2019-08-12 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reopened IMPALA-8791:
---

> Handle the case where there is no fragment scheduled on the coordinator for a 
> query
> ---
>
> Key: IMPALA-8791
> URL: https://issues.apache.org/jira/browse/IMPALA-8791
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Product Backlog
>Reporter: Bikramjeet Vig
>Assignee: Bikramjeet Vig
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> For insert statements executed on a dedicated coord, the fragments get 
> scheduled only on executors but a query state object still gets started up on 
> the coord host with the coord_mem_limit. We end up with a situation where the 
> mem admitted is zero for the coord but the mem_reserved is non-zero, which 
> would affect other admission decisions.
> There is also a case where there is no coordinator fragment but the execution 
> fragment gets scheduled on the coord (e.g. insert into  values). For 
> this case, the mem admitted is per_backend_mem_limit_ but the mem limit 
> applied to the coord query state is coord_backend_mem_limit_, which again 
> causes an inconsistency.




[jira] [Updated] (IMPALA-8791) Handle the case where there is no fragment scheduled on the coordinator for a query

2019-08-12 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8791:
--
Fix Version/s: (was: Impala 3.3.0)

> Handle the case where there is no fragment scheduled on the coordinator for a 
> query
> ---
>
> Key: IMPALA-8791
> URL: https://issues.apache.org/jira/browse/IMPALA-8791
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Product Backlog
>Reporter: Bikramjeet Vig
>Assignee: Bikramjeet Vig
>Priority: Major
>
> For insert statements executed on a dedicated coord, the fragments get 
> scheduled only on executors but a query state object still gets started up on 
> the coord host with the coord_mem_limit. We end up with a situation where the 
> mem admitted is zero for the coord but the mem_reserved is non-zero, which 
> would affect other admission decisions.
> There is also a case where there is no coordinator fragment but the execution 
> fragment gets scheduled on the coord (e.g. insert into  values). For 
> this case, the mem admitted is per_backend_mem_limit_ but the mem limit 
> applied to the coord query state is coord_backend_mem_limit_, which again 
> causes an inconsistency.




[jira] [Commented] (IMPALA-8791) Handle the case where there is no fragment scheduled on the coordinator for a query

2019-08-12 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905709#comment-16905709
 ] 

Tim Armstrong commented on IMPALA-8791:
---

This hit a DCHECK during data loading:
{noformat}
F0812 17:00:54.053512 18031 coordinator-backend-state.cc:547] Check failed: !IsEmptyBackend() 
*** Check failure stack trace: ***
@  0x4d6ac1c  google::LogMessage::Fail()
@  0x4d6c4c1  google::LogMessage::SendToLog()
@  0x4d6a5f6  google::LogMessage::Flush()
@  0x4d6dbbd  google::LogMessageFatal::~LogMessageFatal()
@  0x2942e6e  impala::Coordinator::BackendState::PublishFilter()
@  0x292f9f3  impala::Coordinator::UpdateFilter()
@  0x2238320  impala::ClientRequestState::UpdateFilter()
@  0x21d1241  impala::ImpalaServer::UpdateFilter()
@  0x227cae3  impala::ImpalaInternalService::UpdateFilter()
@  0x276bd65  impala::ImpalaInternalServiceProcessor::process_UpdateFilter()
@  0x276bab3  impala::ImpalaInternalServiceProcessor::dispatchCall()
@  0x1b29793  apache::thrift::TDispatchProcessor::process()
@  0x1f84119  apache::thrift::server::TAcceptQueueServer::Task::run()
@  0x1f7985e  impala::ThriftThread::RunRunnable()
@  0x1f7af84  boost::_mfi::mf2<>::operator()()
@  0x1f7ae1a  boost::_bi::list3<>::operator()<>()
@  0x1f7ab66  boost::_bi::bind_t<>::operator()()
@  0x1f7aa79  boost::detail::function::void_function_obj_invoker0<>::invoke()
@  0x1e9a331  boost::function0<>::operator()()
@  0x23e2674  impala::Thread::SuperviseThread()
@  0x23ea9f8  boost::_bi::list5<>::operator()<>()
@  0x23ea91c  boost::_bi::bind_t<>::operator()()
@  0x23ea8df  boost::detail::thread_data<>::run()
@  0x3c93a79  thread_proxy
@ 0x7f62f3b86e24  start_thread
@ 0x7f62f02e034c  __clone
{noformat}

> Handle the case where there is no fragment scheduled on the coordinator for a 
> query
> ---
>
> Key: IMPALA-8791
> URL: https://issues.apache.org/jira/browse/IMPALA-8791
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Product Backlog
>Reporter: Bikramjeet Vig
>Assignee: Bikramjeet Vig
>Priority: Major
>
> For insert statements executed on a dedicated coord, the fragments get 
> scheduled only on executors but a query state object still gets started up on 
> the coord host with the coord_mem_limit. We end up with a situation where the 
> mem admitted is zero for the coord but the mem_reserved is non-zero, which 
> would affect other admission decisions.
> There is also a case where there is no coordinator fragment but the execution 
> fragment gets scheduled on the coord (e.g. insert into  values). For 
> this case, the mem admitted is per_backend_mem_limit_ but the mem limit 
> applied to the coord query state is coord_backend_mem_limit_, which again 
> causes an inconsistency.




[jira] [Created] (IMPALA-8857) test_kudu_col_not_null_changed may fail because client reads older timestamp

2019-08-12 Thread Tim Armstrong (JIRA)

Tim Armstrong created IMPALA-8857:
-

 Summary: test_kudu_col_not_null_changed may fail because client 
reads older timestamp
 Key: IMPALA-8857
 URL: https://issues.apache.org/jira/browse/IMPALA-8857
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.3.0
Reporter: Tim Armstrong
Assignee: Thomas Tauber-Marshall


{noformat}
query_test/test_kudu.py:242: in test_kudu_col_not_null_changed
assert len(cursor.fetchall()) == 100
E   assert 61 == 100
E+  where 61 = len([(0, None), (2, None), (4, None), (11, None), (12, 
None), (19, None), ...])
E+where [(0, None), (2, None), (4, None), (11, None), (12, None), (19, 
None), ...] = >()
E+  where > = 
.fetchall
{noformat}

I believe this is a flaky test, since there's no attempt to pass the timestamp 
from the Kudu client that did the insert to the Impala client that's doing the 
reading.
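
The race described above can be illustrated with a small self-contained model. This is only a sketch: ToyStore, write, replicate and read_at_least are hypothetical names, not Kudu or Impala APIs. It shows why a reader that is not given the writer's timestamp may see a stale snapshot, and how handing the timestamp over closes the gap.

```python
class ToyStore:
    """Toy store whose read replica applies writes lazily (hypothetical model)."""

    def __init__(self):
        self.log = []        # (timestamp, row) pairs accepted by the writer side
        self.applied_ts = 0  # highest timestamp the read replica has applied
        self.clock = 0

    def write(self, row):
        # The writer gets back the commit timestamp of its write.
        self.clock += 1
        self.log.append((self.clock, row))
        return self.clock

    def replicate(self, up_to_ts):
        # Simulates the replica catching up to a given timestamp.
        self.applied_ts = max(self.applied_ts, up_to_ts)

    def read(self):
        # Reader without a timestamp: sees only what the replica has applied.
        return [row for ts, row in self.log if ts <= self.applied_ts]

    def read_at_least(self, min_ts):
        # Reader that was handed the writer's timestamp: the replica must
        # catch up to min_ts before it answers.
        if self.applied_ts < min_ts:
            self.replicate(min_ts)
        return self.read()


def demo():
    store = ToyStore()
    last_ts = 0
    for i in range(100):
        last_ts = store.write((i, None))
    store.replicate(61)                         # replica lags behind the writer
    stale = len(store.read())                   # reader unaware of last_ts
    fresh = len(store.read_at_least(last_ts))   # timestamp passed along
    return stale, fresh
```

In this model the first read returns only the rows the replica has applied so far, while the second read waits for the writer's last timestamp and returns every row.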




[jira] [Created] (IMPALA-8856) TestKuduHMSIntegration.test_drop_managed_kudu_table failed with "the table does not exist"

2019-08-12 Thread Tim Armstrong (JIRA)

Tim Armstrong created IMPALA-8856:
-

 Summary: TestKuduHMSIntegration.test_drop_managed_kudu_table 
failed with "the table does not exist"
 Key: IMPALA-8856
 URL: https://issues.apache.org/jira/browse/IMPALA-8856
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Affects Versions: Impala 3.3.0
Reporter: Tim Armstrong
Assignee: Hao Hao
 Attachments: 
catalogd.impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com.jenkins.log.INFO.20190812-144934.5595,
 hive-metastore.log, 
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com.jenkins.log.INFO.20190812-144935.5670,
 
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com.jenkins.log.INFO.20190812-144935.5675,
 
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com.jenkins.log.INFO.20190812-144935.5675,
 
kudu-master.impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com.jenkins.log.INFO.20190812-144746.28998,
 
kudu-tserver.impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com.jenkins.diagnostics.20190812-144746.0.28965,
 
kudu-tserver.impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com.jenkins.log.INFO.20190812-144746.28981,
 
kudu-tserver.impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com.jenkins.log.INFO.20190812-144746.29022

{noformat}
custom_cluster.test_kudu.TestKuduHMSIntegration.test_drop_managed_kudu_table 
(from pytest)
Failing for the past 1 build (Since Failed#31 )
Took 27 sec.
Error Message

KuduNotFound: the table does not exist: table_name: 
"test_drop_managed_kudu_table_a82c250c.foo"

Stacktrace

custom_cluster/test_kudu.py:256: in test_drop_managed_kudu_table
kudu_client.delete_table(kudu_tbl_name)
kudu/client.pyx:392: in kudu.client.Client.delete_table (kudu/client.cpp:7106)
???
kudu/errors.pyx:56: in kudu.errors.check_status (kudu/errors.cpp:904)
???
E   KuduNotFound: the table does not exist: table_name: 
"test_drop_managed_kudu_table_a82c250c.foo"

Standard Error

-- 2019-08-12 14:49:32,854 INFO MainThread: Starting cluster with command: 
/data/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/bin/start-impala-cluster.py
 '--state_store_args=--statestore_update_frequency_ms=50 
--statestore_priority_update_frequency_ms=50 
--statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
--log_dir=/data/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/logs/custom_cluster_tests
 --log_level=1 --impalad_args=--default_query_options=
14:49:33 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
14:49:33 MainThread: Starting State Store logging to 
/data/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/logs/custom_cluster_tests/statestored.INFO
14:49:34 MainThread: Starting Catalog Service logging to 
/data/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
14:49:35 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/logs/custom_cluster_tests/impalad.INFO
14:49:35 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
14:49:35 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-asf-master-core-s3-data-cache/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
14:49:38 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
14:49:38 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
14:49:38 MainThread: Getting num_known_live_backends from 
impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com:25000
14:49:38 MainThread: Debug webpage not yet available: ('Connection aborted.', 
error(111, 'Connection refused'))
14:49:40 MainThread: Debug webpage did not become available in expected time.
14:49:40 MainThread: Waiting for num_known_live_backends=3. Current value: None
14:49:41 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
14:49:41 MainThread: Getting num_known_live_backends from 
impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com:25000
14:49:41 MainThread: Waiting for num_known_live_backends=3. Current value: 0
14:49:42 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
14:49:42 MainThread: Getting num_known_live_backends from 
impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com:25000
14:49:42 MainThread: num_known_live_backends has reached value: 3
14:49:43 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
14:49:43 MainThread: Getting num_known_live_backends from 
impala-ec2-centos74-m5-4xlarge-ondemand-1188.vpc.cloudera.com:25001
14:49:43 MainThread: num_known_live_backends has reache

[jira] [Resolved] (IMPALA-8791) Handle the case where there is no fragment scheduled on the coordinator for a query

2019-08-12 Thread Bikramjeet Vig (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikramjeet Vig resolved IMPALA-8791.

   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Handle the case where there is no fragment scheduled on the coordinator for a 
> query
> ---
>
> Key: IMPALA-8791
> URL: https://issues.apache.org/jira/browse/IMPALA-8791
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Product Backlog
>Reporter: Bikramjeet Vig
>Assignee: Bikramjeet Vig
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> For insert statements executed on a dedicated coord, the fragments get 
> scheduled only on executors but a query state object still gets started up on 
> the coord host with the coord_mem_limit. We end up with a situation where the 
> mem admitted is zero for the coord but the mem_reserved is non-zero, which 
> would affect other admission decisions.
> There is also a case where there is no coordinator fragment but the execution 
> fragment gets scheduled on the coord (e.g. insert into  values). For 
> this case, the mem admitted is per_backend_mem_limit_ but the mem limit 
> applied to the coord query state is coord_backend_mem_limit_, which again 
> causes an inconsistency.




[jira] [Commented] (IMPALA-8791) Handle the case where there is no fragment scheduled on the coordinator for a query

2019-08-12 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905625#comment-16905625
 ] 

ASF subversion and git services commented on IMPALA-8791:
-

Commit 760169edcbca438c5964380a604b6c271c6bd1a3 in impala's branch 
refs/heads/master from Bikramjeet Vig
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=760169e ]

IMPALA-8791: Handle the case where there is no fragment scheduled on
the coordinator

This patch fixes a bug where if an insert or CTAS query has no
fragments scheduled on the coordinator and a mem limit is to be
enforced on the query (either through query option or automatically
through estimates) then the same limit is also applied to the
coordinator backend even though it does not execute anything.

Highlights:
- coord_backend_mem_to_admit_/mem_limit will always refer to the memory
to admit/limit for the coordinator regardless of which fragments are
scheduled on it.

- There will always be a BackendExecParams added for the coordinator
because coordinator always spawns a QueryState object with a mem_tracker
for tracking runtime filter mem and the result set cache. For the case
where this BackendExecParams is empty (no instances scheduled) it would
ensure that some minimal amount of memory is accounted for by the
admission controller and the right mem limit is applied to the
QueryState spawned by the coordinator

- added changes to Coordinator and Coordinator::BackendState classes
to handle an empty BackendExecParams object

Testing:
The following cases need to be tested where the kind of fragments
scheduled on the coordinator backend are:
1. Coordinator fragment + other exec fragments
2. Coordinator fragment only
3. other exec fragments only (e.g. insert into values OR insert
   into select 1)
4. No fragments, but coordinator still creates a QueryState

Case 1 is covered by tests working with non-dedicated coordinators.
Rest are covered by test_mem_limit_dedicated_coordinator in
test_admission_controller.py
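
The four cases listed above, and the invariant the patch is after (the memory admitted for the coordinator backend and the limit applied to its QueryState should come from the same value), can be sketched as a toy model. Everything here is an assumption made for illustration: CoordBackend, scheduling_case, coord_mem_bytes, the 100 MB placeholder and the min() rule are hypothetical and are not taken from the Impala code.

```python
from dataclasses import dataclass

# Assumed placeholder for the "minimal amount of memory"; not Impala's real value.
MIN_COORD_MEM_BYTES = 100 * 1024 * 1024


@dataclass
class CoordBackend:
    """Hypothetical summary of what was scheduled on the coordinator backend."""
    has_coord_fragment: bool
    num_other_instances: int


def scheduling_case(backend):
    """Maps a coordinator backend onto test cases 1-4 from the list above."""
    if backend.has_coord_fragment and backend.num_other_instances > 0:
        return 1
    if backend.has_coord_fragment:
        return 2
    if backend.num_other_instances > 0:
        return 3
    return 4


def coord_mem_bytes(backend, coord_limit):
    """Single source of truth for both the admitted memory and the limit."""
    if scheduling_case(backend) == 4:
        # Nothing is scheduled, but the QueryState still exists, so some
        # small amount is accounted for (assumed rule).
        return min(coord_limit, MIN_COORD_MEM_BYTES)
    return coord_limit


def admit_and_limit(backend, coord_limit):
    mem = coord_mem_bytes(backend, coord_limit)
    # Both numbers are derived from the same value, so they cannot diverge.
    return {"mem_admitted": mem, "mem_limit_applied": mem}
```

Deriving both numbers from one function is the design point of the sketch: the inconsistency in the issue description arises when the admitted amount and the applied limit are computed from different fields.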

Change-Id: If5631fa1490d9612ffac3c4c4715348de47d6df2
Reviewed-on: http://gerrit.cloudera.org:8080/13992
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Handle the case where there is no fragment scheduled on the coordinator for a 
> query
> ---
>
> Key: IMPALA-8791
> URL: https://issues.apache.org/jira/browse/IMPALA-8791
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Product Backlog
>Reporter: Bikramjeet Vig
>Assignee: Bikramjeet Vig
>Priority: Major
>
> For insert statements executed on a dedicated coord, the fragments get 
> scheduled only on executors but a query state object still gets started up on 
> the coord host with the coord_mem_limit. We end up with a situation where the 
> mem admitted is zero for the coord but the mem_reserved is non-zero, which 
> would affect other admission decisions.
> There is also a case where there is no coordinator fragment but the execution 
> fragment gets scheduled on the coord (e.g. insert into  values). For 
> this case, the mem admitted is per_backend_mem_limit_ but the mem limit 
> applied to the coord query state is coord_backend_mem_limit_, which again 
> causes an inconsistency.




[jira] [Commented] (IMPALA-8712) Convert ExecQueryFInstance() RPC to become asynchronous

2019-08-12 Thread Michael Ho (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905619#comment-16905619
 ] 

Michael Ho commented on IMPALA-8712:


Please also see https://issues.apache.org/jira/browse/IMPALA-4475

> Convert ExecQueryFInstance() RPC to become asynchronous
> ---
>
> Key: IMPALA-8712
> URL: https://issues.apache.org/jira/browse/IMPALA-8712
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 3.3.0
>Reporter: Michael Ho
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>
> Now that IMPALA-7467 is fixed, ExecQueryFInstance() can utilize the async RPC 
> capabilities of KRPC instead of relying on the half-baked way of using 
> {{ExecEnv::exec_rpc_thread_pool_}} to start query fragment instances. We 
> already have a reactor thread pool in KRPC to handle sending client RPCs 
> asynchronously. Various tasks under IMPALA-5486 can also benefit from 
> making ExecQueryFInstance() asynchronous so the RPCs can be cancelled.
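
The benefit of asynchronous, cancellable RPCs described above can be sketched with Python's asyncio. This is an analogy only, not KRPC: exec_query_finstances and start_all are hypothetical stand-ins, and the "RPC" is simulated with a sleep.

```python
import asyncio


async def exec_query_finstances(backend, delay):
    # Stand-in for an asynchronous RPC; sleeps instead of doing network I/O.
    await asyncio.sleep(delay)
    return f"{backend}: started"


async def start_all(backends, cancel_after):
    # Issue every RPC without dedicating a blocked thread to each one.
    tasks = {b: asyncio.create_task(exec_query_finstances(b, d))
             for b, d in backends.items()}
    done, pending = await asyncio.wait(tasks.values(), timeout=cancel_after)
    for task in pending:
        task.cancel()  # in-flight calls can be cancelled
    await asyncio.gather(*pending, return_exceptions=True)
    return {b: ("cancelled" if t.cancelled() else t.result())
            for b, t in tasks.items()}


def demo():
    return asyncio.run(start_all({"host-a": 0.01, "host-b": 5.0},
                                 cancel_after=0.2))
```

Here the slow call to host-b is cancelled once the deadline passes instead of tying up a worker until it returns.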




[jira] [Commented] (IMPALA-8847) Add partition events may contain empty partition object list

2019-08-12 Thread Vihang Karajgaonkar (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905611#comment-16905611
 ] 

Vihang Karajgaonkar commented on IMPALA-8847:
-

Patch is in review.

> Add partition events may contain empty partition object list
> 
>
> Key: IMPALA-8847
> URL: https://issues.apache.org/jira/browse/IMPALA-8847
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> When event polling is ON and an external application like Hive issues an 
> {{alter table  add if not exists partition ()}}, it is 
> possible that the command did not add a partition because it already exists. 
> However, the metastore still generates an ADD_PARTITION event in such a case with 
> an empty list of added partitions. Such events cause a Precondition to fail 
> while processing on the EventsProcessor side, and event polling goes into 
> an error state.
> The fix would be simple: ignore such events.
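
The proposed fix (skip ADD_PARTITION events that carry no partitions rather than failing on them) can be sketched as follows. This is a hypothetical Python model, not the EventsProcessor code: MetastoreEvent, should_process and process_events are names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetastoreEvent:
    """Hypothetical, simplified view of a metastore notification event."""
    event_id: int
    event_type: str
    partitions: List[str] = field(default_factory=list)


def should_process(event):
    """Returns False for ADD_PARTITION events with an empty partition list."""
    if event.event_type == "ADD_PARTITION" and not event.partitions:
        return False
    return True


def process_events(events):
    """Splits event ids into processed and skipped, without raising an error."""
    processed, skipped = [], []
    for event in events:
        target = processed if should_process(event) else skipped
        target.append(event.event_id)
    return processed, skipped
```

For example, a batch with ids 1 (ADD_PARTITION with one partition), 2 (ADD_PARTITION with none) and 3 (ALTER_TABLE) would process 1 and 3 and skip 2, so polling keeps going.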




[jira] [Commented] (IMPALA-8845) Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState

2019-08-12 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905607#comment-16905607
 ] 

Tim Armstrong commented on IMPALA-8845:
---

Yeah, I think the issue in IMPALA-3990 should be extremely rare and could be 
mitigated by increasing the stream cache duration, so there's probably a good 
argument for not worrying about it. It just never sat that well with me because 
it was hard to reason about the behaviour. I think the problem that this JIRA 
tracks is a much bigger issue in practice.

Agree with your analysis that the problem is that there's not necessarily 
enough buffering on the coordinator's side of the exchange for all the sender 
fragments to send their final batch and clean themselves up.

We could also consider something like ignoring all error statuses from 
fragments once the last row has been appended to the PRS (we already do 
something like this when the client has hit EOS).

> Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState
> 
>
> Key: IMPALA-8845
> URL: https://issues.apache.org/jira/browse/IMPALA-8845
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While testing IMPALA-8818, I found that IMPALA-8780 does not always cause all 
> non-coordinator fragments to shutdown. In certain setups, TopN queries 
> ({{select * from [table] order by [col] limit [limit]}}) where all results 
> are successfully spooled, still keep non-coordinator fragments alive.
> The issue is that sometimes the {{DATASTREAM SINK}} for the TopN <-- Scan 
> Node fragment ends up blocking waiting for a response to a {{TransmitData()}} 
> RPC. This prevents the fragment from shutting down.
> I haven't traced the issue exactly, but what I *think* is happening is that 
> the {{MERGING-EXCHANGE}} operator in the coordinator fragment hits {{eos}} 
> whenever it has received enough rows to reach the limit defined in the query, 
> which could occur before the {{DATASTREAM SINK}} sends all the rows from the 
> TopN / Scan Node fragment.
> So the TopN / Scan Node fragments end up hanging until they are explicitly 
> closed.
> The fix is to close the {{ExecNode}} tree in {{FragmentInstanceState}} as 
> eagerly as possible. Moving the close call to before the call to 
> {{DataSink::FlushFinal}} fixes the issue. It has the added benefit that it 
> shuts down and releases all {{ExecNode}} resources as soon as it can. When 
> result spooling is enabled, this is particularly important because 
> {{FlushFinal}} might block until the consumer reads all rows.
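
The ordering described above (close the ExecNode tree first, then call FlushFinal) can be sketched as follows. ToyExecTree, ToySink and FinishFragment are hypothetical stand-ins, not Impala classes; the sketch only records the order of the two calls and does not model the blocking behaviour.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the ExecNode tree of a fragment instance.
struct ToyExecTree {
  bool closed = false;
  void Close(std::vector<std::string>* log) {
    closed = true;
    log->push_back("exec tree closed");
  }
};

// Hypothetical stand-in for the data sink.
struct ToySink {
  // In the real system this call may block until the consumer has read all
  // rows; here it only records that it ran.
  void FlushFinal(std::vector<std::string>* log) {
    log->push_back("sink flushed");
  }
};

// Closes the exec tree before flushing, so its resources are released before
// a flush that could block for a long time.
std::vector<std::string> FinishFragment(ToyExecTree* tree, ToySink* sink) {
  std::vector<std::string> log;
  tree->Close(&log);
  sink->FlushFinal(&log);
  return log;
}
```

The point of the ordering is that nothing in the exec tree stays alive while the flush waits on the consumer.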




[jira] [Commented] (IMPALA-8845) Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState

2019-08-12 Thread Michael Ho (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905597#comment-16905597
 ] 

Michael Ho commented on IMPALA-8845:


Oops.. looks like Sahil already updated the JIRA with the same observation. 
Didn't mean to post the same thing above but the observation is the same.





[jira] [Comment Edited] (IMPALA-8845) Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState

2019-08-12 Thread Michael Ho (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905594#comment-16905594
 ] 

Michael Ho edited comment on IMPALA-8845 at 8/12/19 9:31 PM:
-

{quote} I haven't traced the issue exactly, but what I think is happening is 
that the MERGING-EXCHANGE operator in the coordinator fragment hits eos 
whenever it has received enough rows to reach the limit defined in the query, 
which could occur before the DATASTREAM SINK sends all the rows from the TopN / 
Scan Node fragment. {quote}

If I understand the above correctly, your observation was that the 
Merging-Exchange has been closed already and the other fragment instance is 
stuck in an RPC call. Usually, when the receiving fragment is closed, it will 
be put into a "closed receiver cache". Incoming traffic will probe against this 
cache, notice that the receiver is closed already, and short-circuit the reply 
to the DataStreamSender. At that point, the DataStreamSender should skip 
issuing the RPC (see [code here| 
https://github.com/apache/impala/blob/master/be/src/runtime/krpc-data-stream-sender.cc#L410-L411
 ]). However, there is an expiration time (5 minutes) for entries in the cache, 
so expired entries will eventually be removed. Traffic arriving for that 
receiver after its entry has been removed may be stuck for 
{{--datastream_sender_timeout_ms}} before returning with an error.

I probably need to look at the log to confirm whether the latter case is what's 
happening there. Please also see 
https://issues.apache.org/jira/browse/IMPALA-6818




was (Author: kwho):
{quote} I haven't traced the issue exactly, but what I think is happening is 
that the MERGING-EXCHANGE operator in the coordinator fragment hits eos 
whenever it has received enough rows to reach the limit defined in the query, 
which could occur before the DATASTREAM SINK sends all the rows from the TopN / 
Scan Node fragment. {quote}

If I understand the above correctly, your observation was that the 
Merging-Exchange has been closed already and the other fragment instance is 
stuck in an RPC call. Usually, when the receiving fragment is closed, it will 
be put into a "closed receiver cache". Incoming traffic will probe against this 
cache, notice that the receiver is closed already, and short-circuit the reply 
to the DataStreamSender. At that point, the DataStreamSender should skip 
issuing the RPC (see [code here| 
https://github.com/apache/impala/blob/master/be/src/runtime/krpc-data-stream-sender.cc#L410-L411
 ]). However, there is an expiration time (5 minutes) for entries in the cache, 
so expired entries will eventually be removed. Traffic arriving for that 
receiver after its entry has been removed may be stuck for 
{{--datastream_sender_timeout_ms}} before returning with an error.

That said, if the DataStreamSender manages to 







[jira] [Commented] (IMPALA-8845) Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState

2019-08-12 Thread Michael Ho (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905594#comment-16905594
 ] 

Michael Ho commented on IMPALA-8845:


{quote} I haven't traced the issue exactly, but what I think is happening is 
that the MERGING-EXCHANGE operator in the coordinator fragment hits eos 
whenever it has received enough rows to reach the limit defined in the query, 
which could occur before the DATASTREAM SINK sends all the rows from the TopN / 
Scan Node fragment. {quote}

If I understand the above correctly, your observation was that the 
Merging-Exchange has been closed already and the other fragment instance is 
stuck in an RPC call. Usually, when the receiving fragment is closed, it will 
be put into a "closed receiver cache". Incoming traffic will probe against this 
cache, notice that the receiver is closed already, and short-circuit the reply 
to the DataStreamSender. At that point, the DataStreamSender should skip 
issuing the RPC (see [code here| 
https://github.com/apache/impala/blob/master/be/src/runtime/krpc-data-stream-sender.cc#L410-L411
 ]). However, there is an expiration time (5 minutes) for entries in the cache, 
so expired entries will eventually be removed. Traffic arriving for that 
receiver after its entry has been removed may be stuck for 
{{--datastream_sender_timeout_ms}} before returning with an error.

That said, if the DataStreamSender manages to 







[jira] [Commented] (IMPALA-8845) Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState

2019-08-12 Thread Sahil Takiar (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905578#comment-16905578
 ] 

Sahil Takiar commented on IMPALA-8845:
--

IMPALA-6984 looks like it might be relevant as well.





[jira] [Created] (IMPALA-8855) Impala docs do not mention all places VALUES clause can be used

2019-08-12 Thread Tim Armstrong (JIRA)

Tim Armstrong created IMPALA-8855:
-

 Summary: Impala docs do not mention all places VALUES clause can 
be used
 Key: IMPALA-8855
 URL: https://issues.apache.org/jira/browse/IMPALA-8855
 Project: IMPALA
  Issue Type: Improvement
  Components: Docs
Reporter: Tim Armstrong
Assignee: Alex Rodoni


The documentation only mentions the values clause in the context of an INSERT 
statement. https://impala.apache.org/docs/build/html/topics/impala_insert.html

It can actually be used anywhere that a SELECT statement could be used, e.g. 
this is a valid query:
{noformat}
values ('hello', 'world')
{noformat}






[jira] [Commented] (IMPALA-8845) Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState

2019-08-12 Thread Sahil Takiar (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905564#comment-16905564
 ] 

Sahil Takiar commented on IMPALA-8845:
--

As far as I can tell, the issue described in IMPALA-3990 is still there:
 * The kRPC receiver is closed by {{FragmentInstanceState::Close}} --> 
{{ExchangeNode::Close}} --> {{KrpcDataStreamRecvr::Close}} --> 
{{KrpcDataStreamMgr::DeregisterRecvr}}
 ** {{DeregisterRecvr}} adds the receiver to the {{closed_stream_cache_}}
 * Attempts to send data to the closed receiver will initially get a 
{{DATASTREAM_RECVR_CLOSED}} response
 ** The call trace here is {{DataStreamService::TransmitData}} --> 
{{KrpcDataStreamMgr::AddData}} 
 ** If {{AddData}} finds the receiver in the {{closed_stream_cache_}} it 
responds to the sender with a {{DATASTREAM_RECVR_CLOSED}} error
 ** When the sender receives {{DATASTREAM_RECVR_CLOSED}} it will drop all 
incoming data to {{KrpcDataStreamSender::Channel::TransmitData}} (so no more 
RPCs should be sent to the receiver)
 * If an RPC is sent to a receiver after the {{STREAM_EXPIRATION_TIME_MS}} 
timeout is hit, then the query will fail
 ** The maintenance thread in {{KrpcDataStreamMgr::Maintenance}} will 
eventually remove the receiver from the {{closed_stream_cache_}} and attempts 
to send data to that receiver will eventually hit a 
{{DATASTREAM_SENDER_TIMEOUT}} error (after {{datastream_sender_timeout_ms}} has 
elapsed)
 ** This should be rare, because the {{DATASTREAM_RECVR_CLOSED}} handling 
should prevent any more rows from being sent to the exchange, but it can happen 
if there are large delays between when row batches are sent

So (as described in IMPALA-3990) if a fragment sends an RPC to an exchange, the 
exchange hits eos and shuts down the kRPC receiver, the 
{{STREAM_EXPIRATION_TIME_MS}} timeout expires, and then the fragment sends 
another RPC to the exchange, an error will occur after 
{{datastream_sender_timeout_ms}}, and the query will fail.
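
The sequence above can be modelled with a small toy, shown here as a rough sketch only. ToyStreamMgr and AddDataResult are hypothetical names, not the KrpcDataStreamMgr API; time is passed in explicitly instead of read from a clock, and the wait for {{datastream_sender_timeout_ms}} is collapsed into an immediate kSenderTimeout result.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

enum class AddDataResult { kDelivered, kRecvrClosed, kSenderTimeout };

// Hypothetical toy model of a stream manager with a closed-receiver cache.
class ToyStreamMgr {
 public:
  explicit ToyStreamMgr(int64_t expiration_ms) : expiration_ms_(expiration_ms) {}

  void RegisterRecvr(int id) { open_.insert(id); }

  // Closing a receiver moves it into the closed cache with its close time.
  void DeregisterRecvr(int id, int64_t now_ms) {
    open_.erase(id);
    closed_cache_[id] = now_ms;
  }

  // Drops closed-cache entries whose expiration time has elapsed.
  void Maintenance(int64_t now_ms) {
    for (auto it = closed_cache_.begin(); it != closed_cache_.end();) {
      if (now_ms - it->second >= expiration_ms_) {
        it = closed_cache_.erase(it);
      } else {
        ++it;
      }
    }
  }

  // Open receiver: data is delivered. Recently closed receiver: the sender is
  // told to stop. Unknown receiver: the real sender would wait and then time
  // out; the toy reports the timeout directly.
  AddDataResult AddData(int id) const {
    if (open_.count(id) > 0) return AddDataResult::kDelivered;
    if (closed_cache_.count(id) > 0) return AddDataResult::kRecvrClosed;
    return AddDataResult::kSenderTimeout;
  }

 private:
  int64_t expiration_ms_;
  std::unordered_set<int> open_;
  std::unordered_map<int, int64_t> closed_cache_;
};
```

In this model the failure case is a send that arrives after Maintenance has expired the entry: the sender never sees the closed response and ends up with the timeout instead.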





[jira] [Closed] (IMPALA-7374) Impala Doc: Doc DATE type

2019-08-12 Thread Alex Rodoni (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-7374.
---
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Impala Doc: Doc DATE type
> -
>
> Key: IMPALA-7374
> URL: https://issues.apache.org/jira/browse/IMPALA-7374
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
> Fix For: Impala 3.3.0
>
>





[jira] [Closed] (IMPALA-8429) Update docs to reflect default join distribution mode change

2019-08-12 Thread Alex Rodoni (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-8429.
---

> Update docs to reflect default join distribution mode change
> 
>
> Key: IMPALA-8429
> URL: https://issues.apache.org/jira/browse/IMPALA-8429
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Affects Versions: Impala 3.2.0
>Reporter: Balazs Jeszenszky
>Assignee: Alex Rodoni
>Priority: Minor
>
> The 'DEFAULT_JOIN_DISTRIBUTION_MODE Query Option' page needs an update to 
> reflect the changes in IMPALA-5120.




[jira] [Resolved] (IMPALA-8429) Update docs to reflect default join distribution mode change

2019-08-12 Thread Balazs Jeszenszky (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky resolved IMPALA-8429.
---
Resolution: Invalid





[jira] [Commented] (IMPALA-8429) Update docs to reflect default join distribution mode change

2019-08-12 Thread Balazs Jeszenszky (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905518#comment-16905518
 ] 

Balazs Jeszenszky commented on IMPALA-8429:
---

Sorry for the delay. This took me a while to figure out again for some reason. 
I agree the docs are correct, I don't think I was aware of IMPALA-5381 at the 
time of submitting this request.





[jira] [Commented] (IMPALA-8845) Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState

2019-08-12 Thread Sahil Takiar (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905493#comment-16905493
 ] 

Sahil Takiar commented on IMPALA-8845:
--

CC: [~kwho]

Still trying to understand the code here, but looping in Michael as well.





[jira] [Commented] (IMPALA-7374) Impala Doc: Doc DATE type

2019-08-12 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905492#comment-16905492
 ] 

ASF subversion and git services commented on IMPALA-7374:
-

Commit 8eb50076c2b232b60fa5e44fb9341c752e2bf417 in impala's branch 
refs/heads/master from Alex Rodoni
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8eb5007 ]

IMPALA-7374: [DOCS] Document the new DATE data type in Impala

Change-Id: I0c28361c7f0d225708eaf4b955c6704520eaaa68
Reviewed-on: http://gerrit.cloudera.org:8080/13983
Tested-by: Impala Public Jenkins 
Reviewed-by: Attila Jeges 


> Impala Doc: Doc DATE type
> -
>
> Key: IMPALA-7374
> URL: https://issues.apache.org/jira/browse/IMPALA-7374
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
>





[jira] [Commented] (IMPALA-8837) Impala Doc: Document impersonalization via HTTP and Knox authentication

2019-08-12 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905494#comment-16905494
 ] 

ASF subversion and git services commented on IMPALA-8837:
-

Commit 620329f6d72d1a09edf560e7da0bc1d09e13a57f in impala's branch 
refs/heads/master from Alex Rodoni
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=620329f ]

IMPALA-8837: [DOCS] HTTP support for proxy/delegation connection

- Added a line on Knox support.

Change-Id: I591e0fd736ea114aa52a999acf41806a94e49382
Reviewed-on: http://gerrit.cloudera.org:8080/14033
Tested-by: Impala Public Jenkins 
Reviewed-by: Thomas Tauber-Marshall 


> Impala Doc: Document impersonalization via HTTP and Knox authentication
> ---
>
> Key: IMPALA-8837
> URL: https://issues.apache.org/jira/browse/IMPALA-8837
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
> Fix For: Impala 3.3.0
>
>
> https://gerrit.cloudera.org/#/c/14033/




[jira] [Commented] (IMPALA-8846) Undefined behaviour in RleEncoder::Put

2019-08-12 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905495#comment-16905495
 ] 

ASF subversion and git services commented on IMPALA-8846:
-

Commit f26a32f85542bfdbceb7306a06327f66dc30294a in impala's branch 
refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f26a32f ]

IMPALA-8846: Undefined behaviour in RleEncoder::Put

To test for overflow, we used 'repeat_count_ <=
std::numeric_limits::max()', but this is always true as
repeat_count_ is an int. This could have led to undefined behaviour
because we increment repeat_count_ afterwards.

Changed the comparison not to allow equality.

Change-Id: I269443d1f1680e672fde7dd88eab5fcb56c65613
Reviewed-on: http://gerrit.cloudera.org:8080/14042
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Undefined behaviour in RleEncoder::Put
> --
>
> Key: IMPALA-8846
> URL: https://issues.apache.org/jira/browse/IMPALA-8846
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
> Attachments: original.txt, with_check.txt
>
>
> On line 
> [https://github.com/apache/impala/blob/4000da35be69e469500f5f11e0e5fdec119cf5c7/be/src/util/rle-encoding.h#L346],
>  we test repeat_count_ <= std::numeric_limits::max(), which is 
> always true (repeat_count_ is an int), then we increment repeat_count_, which 
> could be std::numeric_limits::max() and overflow, which is undefined 
> behaviour for signed integers.
>  
> We should either change <= to < or if we think that this never happens, 
> remove the misleading check.
> If we correct the check, it may lead to some (probably small) performance 
> regression because the compiler could have optimised this out.
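
The difference between the two comparisons can be sketched as below. This is a minimal illustration, not the RleEncoder code: RepeatCounter and TryIncrement are hypothetical names, and the template argument int is an assumption based on the statement that the counter is an int (the argument is not visible in the text above).

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in for a run-length counter.
struct RepeatCounter {
  int repeat_count = 0;

  // Increments only while the counter is strictly below the maximum, so the
  // increment can never overflow a signed int. With <= instead of <, the
  // condition would hold for every int value and the guard would do nothing.
  bool TryIncrement() {
    if (repeat_count < std::numeric_limits<int>::max()) {
      ++repeat_count;
      return true;
    }
    return false;
  }
};
```

When TryIncrement returns false, the caller has to end the current run and start a new one rather than keep counting.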




[jira] [Closed] (IMPALA-8837) Impala Doc: Document impersonalization via HTTP and Knox authentication

2019-08-12 Thread Alex Rodoni (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-8837.
---
   Resolution: Fixed
Fix Version/s: Impala 3.3.0





[jira] [Commented] (IMPALA-8847) Add partition events may contain empty partition object list

2019-08-12 Thread Vihang Karajgaonkar (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905448#comment-16905448
 ] 

Vihang Karajgaonkar commented on IMPALA-8847:
-

I created a HIVE jira since I believe this behavior is inconsistent and 
misleading from HMS. HMS should not generate such events in the first place.

> Add partition events may contain empty partition object list
> 
>
> Key: IMPALA-8847
> URL: https://issues.apache.org/jira/browse/IMPALA-8847
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> When event polling is ON and an external application like Hive issues an 
> {{alter table  add if not exists partition ()}}, it is 
> possible that the command did not add a partition because it already exists. 
> However, the metastore still generates an ADD_PARTITION event in such a case, with 
> an empty list of added partitions. Such events cause a Precondition to fail 
> while processing on the EventsProcessor side, and event polling goes into 
> the error state.
> The fix would be simple: ignore such events.
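
The proposed fix can be sketched as a filter that skips events carrying no partitions. This is an illustrative Python model only: `AddPartitionEvent` and `process_add_partition_events` are hypothetical names, not the EventsProcessor API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AddPartitionEvent:
    # Hypothetical stand-in for a metastore ADD_PARTITION notification.
    table: str
    partitions: List[str] = field(default_factory=list)


def process_add_partition_events(
    events: List[AddPartitionEvent],
) -> List[Tuple[str, str]]:
    """Return (table, partition) pairs to add, ignoring empty events."""
    added = []
    for event in events:
        if not event.partitions:
            # Nothing was actually added (the partition already existed),
            # so skip the event instead of failing a precondition.
            continue
        added.extend((event.table, p) for p in event.partitions)
    return added
```

Skipping the event keeps polling alive, whereas a failed precondition on an empty list would put it into the error state.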




[jira] [Commented] (IMPALA-8845) Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState

2019-08-12 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905370#comment-16905370
 ] 

Tim Armstrong commented on IMPALA-8845:
---

I remember I had some concerns about doing an early close on a subtree that I 
documented here: IMPALA-3990. The concern was that the datastream sender might 
not tear itself down cleanly. I'm not sure if the code has changed since then.

The reason why we didn't see this problem was that we would only do an early 
close on a subtree either for peculiar queries or when the limit was hit at the 
coordinator, and the coordinator actually issues Cancel() RPCs to all the 
fragments.

My concern here would be that we might mess things up by relying on the Close() 
propagating down the tree instead of the coordinator sending out Cancel() RPCs.

> Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState
> 
>
> Key: IMPALA-8845
> URL: https://issues.apache.org/jira/browse/IMPALA-8845
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> While testing IMPALA-8818, I found that IMPALA-8780 does not always cause all 
> non-coordinator fragments to shut down. In certain setups, TopN queries 
> ({{select * from [table] order by [col] limit [limit]}}), where all results 
> are successfully spooled, still keep non-coordinator fragments alive.
> The issue is that sometimes the {{DATASTREAM SINK}} for the TopN <-- Scan 
> Node fragment ends up blocked, waiting for a response to a {{TransmitData()}} 
> RPC. This prevents the fragment from shutting down.
> I haven't traced the issue exactly, but what I *think* is happening is that 
> the {{MERGING-EXCHANGE}} operator in the coordinator fragment hits {{eos}} 
> whenever it has received enough rows to reach the limit defined in the query, 
> which could occur before the {{DATASTREAM SINK}} sends all the rows from the 
> TopN / Scan Node fragment.
> So the TopN / Scan Node fragments end up hanging until they are explicitly 
> closed.
> The fix is to close the {{ExecNode}} tree in {{FragmentInstanceState}} as 
> eagerly as possible. Moving the close call to before the call to 
> {{DataSink::FlushFinal}} fixes the issue. It has the added benefit that it 
> shuts down and releases all {{ExecNode}} resources as soon as it can. When 
> result spooling is enabled, this is particularly important because 
> {{FlushFinal}} might block until the consumer reads all rows.
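
The ordering change described above can be illustrated with a small model. This is a sketch in Python under stated assumptions, not Impala's C++ implementation: `ExecTree`, `Sink`, and `finish_fragment` are hypothetical names, and the log only records whether the tree was still open while the sink flushed.

```python
class ExecTree:
    """Hypothetical stand-in for the ExecNode tree of a fragment instance."""

    def __init__(self, log):
        self.log = log
        self.closed = False

    def close(self):
        self.closed = True
        self.log.append("exec_tree.close")


class Sink:
    """Hypothetical stand-in for a data sink whose final flush may block."""

    def __init__(self, log, tree):
        self.log = log
        self.tree = tree

    def flush_final(self):
        # Record whether ExecNode resources were still held during the flush.
        state = "closed" if self.tree.closed else "open"
        self.log.append("sink.flush_final (tree " + state + ")")


def finish_fragment(close_before_flush):
    log = []
    tree = ExecTree(log)
    sink = Sink(log, tree)
    if close_before_flush:
        tree.close()  # proposed order: release resources first
    sink.flush_final()
    if not tree.closed:
        tree.close()  # previous order: close only after the flush returns
    return log
```

In the proposed order the tree is already closed when the flush starts, so a flush that blocks on a slow consumer no longer holds ExecNode resources.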




[jira] [Commented] (IMPALA-8852) ImpalaD fail to start on a non-datanode with "Invalid short-circuit reads configuration"

2019-08-12 Thread Adriano (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905346#comment-16905346
 ] 

Adriano commented on IMPALA-8852:
-

[~lv] agree, considering we have a solution adding the CM properties.

> ImpalaD fail to start on a non-datanode with "Invalid short-circuit reads 
> configuration"
> 
>
> Key: IMPALA-8852
> URL: https://issues.apache.org/jira/browse/IMPALA-8852
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0, Impala 3.3.0
>Reporter: Adriano
>Priority: Major
>  Labels: ramp-up
>
> On coordinator only nodes ([typically the edge 
> nodes|https://www.cloudera.com/documentation/enterprise/5-15-x/topics/impala_dedicated_coordinator.html#concept_omm_gf1_n2b]):
> {code:java}
> --is_coordinator=true
> --is_executor=false
> {code}
> the *dfs.domain.socket.path* can be nonexistent on the local FS, because the 
> Datanode role may not be installed on the edge node.
> The nonexistent path prevents ImpalaD from starting, with the message:
> {code:java}
> I0809 04:15:53.899714 25364 status.cc:124] Invalid short-circuit reads 
> configuration:
>   - Impala cannot read or execute the parent directory of 
> dfs.domain.socket.path
> @   0xb35f19
> @  0x100e2fe
> @  0x103f274
> @  0x102836f
> @   0xa9f573
> @ 0x7f97807e93d4
> @   0xafb3b8
> E0809 04:15:53.899749 25364 impala-server.cc:278] Invalid short-circuit reads 
> configuration:
>   - Impala cannot read or execute the parent directory of 
> dfs.domain.socket.path
> {code}
> even though a coordinator-only ImpalaD does not do short-circuit reads.




[jira] [Comment Edited] (IMPALA-8852) ImpalaD fail to start on a non-datanode with "Invalid short-circuit reads configuration"

2019-08-12 Thread Adriano (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903830#comment-16903830
 ] 

Adriano edited comment on IMPALA-8852 at 8/12/19 4:19 PM:
--

WORKAROUND -1-: 
Create the dfs.domain.socket.path manually with the proper hdfs user permission 
on the local fs as:

{code:java}
# mkdir /var/run/hdfs-sockets/
# chown hdfs:hadoop  /var/run/hdfs-sockets/
# chmod 755  /var/run/hdfs-sockets/
# mkdir /var/run/hdfs-sockets/dn
# chown hdfs:hdfs  /var/run/hdfs-sockets/dn
# chmod 1666  /var/run/hdfs-sockets/dn
{code}

*Best Solution*: 
1) On the Dedicated Coordinators, where "-is_executor=false" is set, add the 
following property to "Impala Daemon HDFS Advanced Configuration Snippet 
(Safety Valve)":
{code:java}

<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>false</value>
  <description>Disable shortcircuit on dedicated coordinator</description>
</property>

{code}

2) Save the changes and restart the Impala Daemon Coordinator instances.


was (Author: adrenas):
WORKAROUND -1-: 
Create the dfs.domain.socket.path manually with the proper hdfs user permission 
on the local fs as:

{code:java}
# mkdir /var/run/hdfs-sockets/
# chown hdfs:hadoop  /var/run/hdfs-sockets/
# chmod 755  /var/run/hdfs-sockets/
# mkdir /var/run/hdfs-sockets/dn
# chown hdfs:hdfs  /var/run/hdfs-sockets/dn
# chmod 1666  /var/run/hdfs-sockets/dn
{code}

WORKAROUND -2-: 
1) On the Dedicated Coordinators, where "-is_executor=false" is set, add the 
following property to "Impala Daemon HDFS Advanced Configuration Snippet 
(Safety Valve)":
{code:java}

<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>false</value>
  <description>Disable shortcircuit on dedicated coordinator</description>
</property>

{code}

2) Save the changes and restart the Impala Daemon Coordinator instances.

> ImpalaD fail to start on a non-datanode with "Invalid short-circuit reads 
> configuration"
> 
>
> Key: IMPALA-8852
> URL: https://issues.apache.org/jira/browse/IMPALA-8852
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0, Impala 3.3.0
>Reporter: Adriano
>Priority: Major
>  Labels: ramp-up
>
> On coordinator only nodes ([typically the edge 
> nodes|https://www.cloudera.com/documentation/enterprise/5-15-x/topics/impala_dedicated_coordinator.html#concept_omm_gf1_n2b]):
> {code:java}
> --is_coordinator=true
> --is_executor=false
> {code}
> the *dfs.domain.socket.path* can be nonexistent on the local FS, because the 
> Datanode role may not be installed on the edge node.
> The nonexistent path prevents ImpalaD from starting, with the message:
> {code:java}
> I0809 04:15:53.899714 25364 status.cc:124] Invalid short-circuit reads 
> configuration:
>   - Impala cannot read or execute the parent directory of 
> dfs.domain.socket.path
> @   0xb35f19
> @  0x100e2fe
> @  0x103f274
> @  0x102836f
> @   0xa9f573
> @ 0x7f97807e93d4
> @   0xafb3b8
> E0809 04:15:53.899749 25364 impala-server.cc:278] Invalid short-circuit reads 
> configuration:
>   - Impala cannot read or execute the parent directory of 
> dfs.domain.socket.path
> {code}
> even though a coordinator-only ImpalaD does not do short-circuit reads.




[jira] [Updated] (IMPALA-8852) ImpalaD fail to start on a non-datanode with "Invalid short-circuit reads configuration"

2019-08-12 Thread Lars Volker (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker updated IMPALA-8852:

Affects Version/s: Impala 3.3.0

> ImpalaD fail to start on a non-datanode with "Invalid short-circuit reads 
> configuration"
> 
>
> Key: IMPALA-8852
> URL: https://issues.apache.org/jira/browse/IMPALA-8852
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0, Impala 3.3.0
>Reporter: Adriano
>Priority: Major
>  Labels: ramp-up
>
> On coordinator only nodes ([typically the edge 
> nodes|https://www.cloudera.com/documentation/enterprise/5-15-x/topics/impala_dedicated_coordinator.html#concept_omm_gf1_n2b]):
> {code:java}
> --is_coordinator=true
> --is_executor=false
> {code}
> the *dfs.domain.socket.path* can be nonexistent on the local FS, because the 
> Datanode role may not be installed on the edge node.
> The nonexistent path prevents ImpalaD from starting, with the message:
> {code:java}
> I0809 04:15:53.899714 25364 status.cc:124] Invalid short-circuit reads 
> configuration:
>   - Impala cannot read or execute the parent directory of 
> dfs.domain.socket.path
> @   0xb35f19
> @  0x100e2fe
> @  0x103f274
> @  0x102836f
> @   0xa9f573
> @ 0x7f97807e93d4
> @   0xafb3b8
> E0809 04:15:53.899749 25364 impala-server.cc:278] Invalid short-circuit reads 
> configuration:
>   - Impala cannot read or execute the parent directory of 
> dfs.domain.socket.path
> {code}
> even though a coordinator-only ImpalaD does not do short-circuit reads.




[jira] [Updated] (IMPALA-8852) ImpalaD fail to start on a non-datanode with "Invalid short-circuit reads configuration"

2019-08-12 Thread Lars Volker (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker updated IMPALA-8852:

Priority: Major  (was: Minor)

> ImpalaD fail to start on a non-datanode with "Invalid short-circuit reads 
> configuration"
> 
>
> Key: IMPALA-8852
> URL: https://issues.apache.org/jira/browse/IMPALA-8852
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Adriano
>Priority: Major
>  Labels: ramp-up
>
> On coordinator only nodes ([typically the edge 
> nodes|https://www.cloudera.com/documentation/enterprise/5-15-x/topics/impala_dedicated_coordinator.html#concept_omm_gf1_n2b]):
> {code:java}
> --is_coordinator=true
> --is_executor=false
> {code}
> the *dfs.domain.socket.path* can be nonexistent on the local FS, because the 
> Datanode role may not be installed on the edge node.
> The nonexistent path prevents ImpalaD from starting, with the message:
> {code:java}
> I0809 04:15:53.899714 25364 status.cc:124] Invalid short-circuit reads 
> configuration:
>   - Impala cannot read or execute the parent directory of 
> dfs.domain.socket.path
> @   0xb35f19
> @  0x100e2fe
> @  0x103f274
> @  0x102836f
> @   0xa9f573
> @ 0x7f97807e93d4
> @   0xafb3b8
> E0809 04:15:53.899749 25364 impala-server.cc:278] Invalid short-circuit reads 
> configuration:
>   - Impala cannot read or execute the parent directory of 
> dfs.domain.socket.path
> {code}
> even though a coordinator-only ImpalaD does not do short-circuit reads.




[jira] [Commented] (IMPALA-8852) ImpalaD fail to start on a non-datanode with "Invalid short-circuit reads configuration"

2019-08-12 Thread Lars Volker (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905325#comment-16905325
 ] 

Lars Volker commented on IMPALA-8852:
-

Thanks for filing this issue. As a permanent solution we should only emit a 
warning when the socket cannot be found and {{-is_executor=false}}.
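
The suggested behaviour can be sketched as follows. This is a minimal Python model, not the check in impala-server.cc: `check_short_circuit_config` is a hypothetical name, and the only assumption is that an unreadable parent directory is fatal on executors and a warning on coordinator-only daemons.

```python
import os


def check_short_circuit_config(socket_path, is_executor):
    """Return (ok, message) for the configured dfs.domain.socket.path."""
    parent = os.path.dirname(socket_path)
    if os.access(parent, os.R_OK | os.X_OK):
        return True, ""
    msg = ("Impala cannot read or execute the parent directory of "
           "dfs.domain.socket.path")
    if is_executor:
        # Executors do short-circuit reads, so this stays a startup error.
        return False, "ERROR: " + msg
    # Coordinator-only daemons do not read data blocks; only warn.
    return True, "WARNING: " + msg
```

With this split, a dedicated coordinator on an edge node without the Datanode role would start and log a warning, while an executor with the same misconfiguration would still refuse to start.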

> ImpalaD fail to start on a non-datanode with "Invalid short-circuit reads 
> configuration"
> 
>
> Key: IMPALA-8852
> URL: https://issues.apache.org/jira/browse/IMPALA-8852
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Adriano
>Priority: Minor
>
> On coordinator only nodes ([typically the edge 
> nodes|https://www.cloudera.com/documentation/enterprise/5-15-x/topics/impala_dedicated_coordinator.html#concept_omm_gf1_n2b]):
> {code:java}
> --is_coordinator=true
> --is_executor=false
> {code}
> the *dfs.domain.socket.path* can be nonexistent on the local FS, because the 
> Datanode role may not be installed on the edge node.
> The nonexistent path prevents ImpalaD from starting, with the message:
> {code:java}
> I0809 04:15:53.899714 25364 status.cc:124] Invalid short-circuit reads 
> configuration:
>   - Impala cannot read or execute the parent directory of 
> dfs.domain.socket.path
> @   0xb35f19
> @  0x100e2fe
> @  0x103f274
> @  0x102836f
> @   0xa9f573
> @ 0x7f97807e93d4
> @   0xafb3b8
> E0809 04:15:53.899749 25364 impala-server.cc:278] Invalid short-circuit reads 
> configuration:
>   - Impala cannot read or execute the parent directory of 
> dfs.domain.socket.path
> {code}
> even though a coordinator-only ImpalaD does not do short-circuit reads.




[jira] [Updated] (IMPALA-8852) ImpalaD fail to start on a non-datanode with "Invalid short-circuit reads configuration"

2019-08-12 Thread Lars Volker (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker updated IMPALA-8852:

Labels: ramp-up  (was: )

> ImpalaD fail to start on a non-datanode with "Invalid short-circuit reads 
> configuration"
> 
>
> Key: IMPALA-8852
> URL: https://issues.apache.org/jira/browse/IMPALA-8852
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Adriano
>Priority: Minor
>  Labels: ramp-up
>
> On coordinator only nodes ([typically the edge 
> nodes|https://www.cloudera.com/documentation/enterprise/5-15-x/topics/impala_dedicated_coordinator.html#concept_omm_gf1_n2b]):
> {code:java}
> --is_coordinator=true
> --is_executor=false
> {code}
> the *dfs.domain.socket.path* can be nonexistent on the local FS, because the 
> Datanode role may not be installed on the edge node.
> The nonexistent path prevents ImpalaD from starting, with the message:
> {code:java}
> I0809 04:15:53.899714 25364 status.cc:124] Invalid short-circuit reads 
> configuration:
>   - Impala cannot read or execute the parent directory of 
> dfs.domain.socket.path
> @   0xb35f19
> @  0x100e2fe
> @  0x103f274
> @  0x102836f
> @   0xa9f573
> @ 0x7f97807e93d4
> @   0xafb3b8
> E0809 04:15:53.899749 25364 impala-server.cc:278] Invalid short-circuit reads 
> configuration:
>   - Impala cannot read or execute the parent directory of 
> dfs.domain.socket.path
> {code}
> even though a coordinator-only ImpalaD does not do short-circuit reads.




[jira] [Updated] (IMPALA-8854) test_acid_insert is failing with "Processor has no capabilities"

2019-08-12 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8854:
--
Labels: broken-build  (was: )

> test_acid_insert is failing with "Processor has no capabilities"
> 
>
> Key: IMPALA-8854
> URL: https://issues.apache.org/jira/browse/IMPALA-8854
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Zoltán Borók-Nagy
>Priority: Blocker
>  Labels: broken-build
>
> {noformat}
> query_test.test_insert.TestInsertQueries.test_acid_insert[compression_codec: 
> none | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, 
> 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> text/none] (from pytest)
> Failing for the past 1 build (Since Failed#82 )
> Took 40 ms.
> Error Message
> MetaException: MetaException(_message='Processor has no capabilities, cannot 
> create an ACID table.')
> Stacktrace
> query_test/test_insert.py:155: in test_acid_insert
> multiple_impalad=vector.get_value('exec_option')['sync_ddl'] == 1)
> /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/tests/common/impala_test_suite.py:556:
>  in run_test_case
> self.execute_test_case_setup(test_section['SETUP'], table_format_info)
> /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/tests/common/impala_test_suite.py:656:
>  in execute_test_case_setup
> self.__reset_table(db_name, table_name)
> /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/tests/common/impala_test_suite.py:809:
>  in __reset_table
> self.hive_client.create_table(table)
> /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/shell/gen-py/hive_metastore/ThriftHiveMetastore.py:2483:
>  in create_table
> self.recv_create_table()
> /data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/shell/gen-py/hive_metastore/ThriftHiveMetastore.py:2509:
>  in recv_create_table
> raise result.o3
> E   MetaException: MetaException(_message='Processor has no capabilities, 
> cannot create an ACID table.')
> Standard Error
> SET 
> client_identifier=query_test/test_insert.py::TestInsertQueries::()::test_acid_insert[compression_codec:none|protocol:beeswax|exec_option:{'sync_ddl':0;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':True;'abort_on_error':1;'exec_single_nod;
> -- executing against localhost:21000
> use functional;
> -- 2019-08-11 15:41:53,042 INFO MainThread: Started query 
> 904ec6d54245fbc8:98705d64
> SET 
> client_identifier=query_test/test_insert.py::TestInsertQueries::()::test_acid_insert[compression_codec:none|protocol:beeswax|exec_option:{'sync_ddl':0;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':True;'abort_on_error':1;'exec_single_nod;
> SET sync_ddl=0;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=True;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
>  {noformat}




[jira] [Created] (IMPALA-8854) test_acid_insert is failing with "Processor has no capabilities"

2019-08-12 Thread Tim Armstrong (JIRA)

Tim Armstrong created IMPALA-8854:
-

 Summary: test_acid_insert is failing with "Processor has no 
capabilities"
 Key: IMPALA-8854
 URL: https://issues.apache.org/jira/browse/IMPALA-8854
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 3.3.0
Reporter: Tim Armstrong
Assignee: Zoltán Borók-Nagy


{noformat}
query_test.test_insert.TestInsertQueries.test_acid_insert[compression_codec: 
none | protocol: beeswax | exec_option: {'sync_ddl': 0, 'batch_size': 0, 
'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
text/none] (from pytest)

Failing for the past 1 build (Since Failed#82 )
Took 40 ms.
Error Message
MetaException: MetaException(_message='Processor has no capabilities, cannot 
create an ACID table.')
Stacktrace
query_test/test_insert.py:155: in test_acid_insert
multiple_impalad=vector.get_value('exec_option')['sync_ddl'] == 1)
/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/tests/common/impala_test_suite.py:556:
 in run_test_case
self.execute_test_case_setup(test_section['SETUP'], table_format_info)
/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/tests/common/impala_test_suite.py:656:
 in execute_test_case_setup
self.__reset_table(db_name, table_name)
/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/tests/common/impala_test_suite.py:809:
 in __reset_table
self.hive_client.create_table(table)
/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/shell/gen-py/hive_metastore/ThriftHiveMetastore.py:2483:
 in create_table
self.recv_create_table()
/data/jenkins/workspace/impala-cdpd-master-exhaustive-release/repos/Impala/shell/gen-py/hive_metastore/ThriftHiveMetastore.py:2509:
 in recv_create_table
raise result.o3
E   MetaException: MetaException(_message='Processor has no capabilities, 
cannot create an ACID table.')
Standard Error
SET 
client_identifier=query_test/test_insert.py::TestInsertQueries::()::test_acid_insert[compression_codec:none|protocol:beeswax|exec_option:{'sync_ddl':0;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':True;'abort_on_error':1;'exec_single_nod;
-- executing against localhost:21000
use functional;

-- 2019-08-11 15:41:53,042 INFO MainThread: Started query 
904ec6d54245fbc8:98705d64
SET 
client_identifier=query_test/test_insert.py::TestInsertQueries::()::test_acid_insert[compression_codec:none|protocol:beeswax|exec_option:{'sync_ddl':0;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':True;'abort_on_error':1;'exec_single_nod;
SET sync_ddl=0;
SET batch_size=0;
SET num_nodes=0;
SET disable_codegen_rows_threshold=0;
SET disable_codegen=True;
SET abort_on_error=1;
SET exec_single_node_rows_threshold=0;
 {noformat}




[jira] [Resolved] (IMPALA-8850) TmpFileMgrTest.TestDirectoryLimitParsing failed in asf-master build with error "Value of: dirs3.size() Actual: 3 Expected: 4"

2019-08-12 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8850.
---
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> TmpFileMgrTest.TestDirectoryLimitParsing failed in asf-master build with 
> error "Value of: dirs3.size()   Actual: 3 Expected: 4"
> ---
>
> Key: IMPALA-8850
> URL: https://issues.apache.org/jira/browse/IMPALA-8850
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Xiaomeng Zhang
>Assignee: Tim Armstrong
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.3.0
>
>
> {code:java}
> Error Message
> Value of: dirs3.size()   Actual: 3 Expected: 4
> Stacktrace
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/runtime/tmp-file-mgr-test.cc:834
> Value of: dirs3.size()
>   Actual: 3
> Expected: 4
> {code}
> Looks like due to commit 
> [https://github.com/apache/impala/commit/411189a8d733a66c363c72f8c404123d68640a3e]




[jira] [Resolved] (IMPALA-8848) Cardinality of UnionNode does not handle missing input cardinality correctly

2019-08-12 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8848.
---
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Cardinality of UnionNode does not handle missing input cardinality correctly
> 
>
> Key: IMPALA-8848
> URL: https://issues.apache.org/jira/browse/IMPALA-8848
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: planner
> Fix For: Impala 3.3.0
>
> Attachments: profile_4d48b2a3bb0236f1_bfa8157d
>
>
> {noformat}
> |  35:UNION
> |  |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |  |  tuple-ids=82 row-size=28B cardinality=0
> |  |  in pipelines: 75(GETNEXT)
> |  |
> |  75:AGGREGATE [FINALIZE]
> |  |  output: sum:merge((((ws_ext_list_price - ws_ext_wholesale_cost - 
> ws_ext_discount_amt) + ws_ext_sales_price) / 2))
> |  |  group by: c_customer_id, c_first_name, c_last_name, 
> c_preferred_cust_flag, c_birth_country, c_login, c_email_address, d_year
> |  |  mem-estimate=128.00MB mem-reservation=34.00MB spill-buffer=2.00MB 
> thread-reservation=0
> |  |  tuple-ids=81 row-size=104B cardinality=unavailable
> |  |  in pipelines: 75(GETNEXT), 36(OPEN)
> {noformat}
> I expect that the cardinality should be unavailable, not 0.
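
One plausible way to propagate a missing input cardinality through a union is sketched below. This is an assumption-laden Python model, not the planner's Java code and not necessarily the rule the fix adopted: `UNKNOWN` is an assumed sentinel for "unavailable", and `union_cardinality` is a hypothetical name.

```python
UNKNOWN = -1  # assumed sentinel meaning "cardinality unavailable"


def union_cardinality(child_cardinalities, num_const_rows=0):
    """Estimate a union's cardinality from its inputs.

    Inputs with unknown cardinality are ignored. The result is UNKNOWN when
    there are inputs but none of them has a known cardinality, rather than 0.
    """
    known = [c for c in child_cardinalities if c >= 0]
    if known or not child_cardinalities:
        return sum(known) + num_const_rows
    return UNKNOWN
```

Under this rule, the plan fragment above (a single input with cardinality=unavailable) would yield an unavailable cardinality for the union instead of 0.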




[jira] [Commented] (IMPALA-2426) COMPUTE INCREMENTAL STATS doesn't compute stats for newly discovered partitions

2019-08-12 Thread Tamas Mate (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905305#comment-16905305
 ] 

Tamas Mate commented on IMPALA-2426:


The {{loadTableMetadata}} call passes {{null}} for the parameter where the partitions 
that should be reloaded can be specified. The previous partition list can be 
obtained from {{tbl}}, which should do the trick.
{code:java}
loadTableMetadata(tbl, newCatalogVersion, reloadFileMetadata,
reloadTableSchema, null, "ALTER TABLE " + params.getAlter_type().name());
{code}

> COMPUTE INCREMENTAL STATS doesn't compute stats for newly discovered 
> partitions
> ---
>
> Key: IMPALA-2426
> URL: https://issues.apache.org/jira/browse/IMPALA-2426
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.2.4, Impala 2.3.0
>Reporter: Jim Apple
>Assignee: Tamas Mate
>Priority: Minor
>  Labels: catalog-server, ramp-up
>
> In the following sequence, I expect the stats for partition 333 to be 
> computed, but they are not:
> # In Impala: create table T (x int) partitioned by (y int)
> # In Impala: insert into table T partition (y=42) values (2)
> # In Hive: alter table T add partition (y=333)
> # In Impala: compute incremental stats T
> # In Impala: show table stats T




[jira] [Commented] (IMPALA-2426) COMPUTE INCREMENTAL STATS doesn't compute stats for newly discovered partitions

2019-08-12 Thread Tamas Mate (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905295#comment-16905295
 ] 

Tamas Mate commented on IMPALA-2426:


An alter table {{UPDATE_STATS}} is called to persist the stats. At the 
moment it leaves {{reloadMetadata}} set to true, which later causes a table reload 
from HMS. Because the alter table {{UPDATE_STATS}} is called after the stats collection 
queries are executed, Impala does not have stats for a new partition.

These are the related code parts from 
[CatalogOpExecutor|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L685]:
{code:java}
case UPDATE_STATS:
  Preconditions.checkState(params.isSetUpdate_stats_params());
  Reference<Long> numUpdatedColumns = new Reference<>(0L);
  alterTableUpdateStats(tbl, params.getUpdate_stats_params(),
  numUpdatedPartitions, numUpdatedColumns);
  reloadTableSchema = true;
  addSummary(response, "Updated " + numUpdatedPartitions.getRef() +
  " partition(s) and " + numUpdatedColumns.getRef() + " column(s).");
  break;
{code}
{code:java}
if (reloadMetadata) {
  loadTableMetadata(tbl, newCatalogVersion, reloadFileMetadata,
  reloadTableSchema, null, "ALTER TABLE " + params.getAlter_type().name());
  addTableToCatalogUpdate(tbl, response.result);
}
{code}
We talked about this Jira during a discussion with [~balazsj_impala_220b], and 
this unexpected side effect should possibly be removed. Compute stats 
refreshing the metadata could cause trouble, for example during a Hive 
ingestion.
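
The ordering problem can be modelled in a few lines. This is a simplified Python sketch of the sequence described in this comment, not CatalogOpExecutor itself: `compute_incremental_stats` is a hypothetical function, and partitions are represented as plain strings.

```python
def compute_incremental_stats(catalog_partitions, hms_partitions):
    """Model the described sequence; returns (partitions_after, stats)."""
    # Step 1: the stats queries only cover partitions the catalog already knows.
    stats = {p: "computed" for p in catalog_partitions}
    # Step 2: UPDATE_STATS persists the stats and, with reloadMetadata left
    # true, reloads the table from HMS. New partitions only appear here,
    # after the stats were collected.
    partitions_after = set(hms_partitions)
    return partitions_after, stats
```

For the reproduction steps in the issue, the catalog knows only y=42 while HMS also has y=333, so after the command the table lists y=333 but holds no stats for it.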

> COMPUTE INCREMENTAL STATS doesn't compute stats for newly discovered 
> partitions
> ---
>
> Key: IMPALA-2426
> URL: https://issues.apache.org/jira/browse/IMPALA-2426
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.2.4, Impala 2.3.0
>Reporter: Jim Apple
>Assignee: Tamas Mate
>Priority: Minor
>  Labels: catalog-server, ramp-up
>
> In the following sequence, I expect the stats for partition 333 to be 
> computed, but they are not:
> # In Impala: create table T (x int) partitioned by (y int)
> # In Impala: insert into table T partition (y=42) values (2)
> # In Hive: alter table T add partition (y=333)
> # In Impala: compute incremental stats T
> # In Impala: show table stats T




[jira] [Updated] (IMPALA-8852) ImpalaD fail to start on a non-datanode with "Invalid short-circuit reads configuration"

2019-08-12 Thread Adriano (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adriano updated IMPALA-8852:

Summary: ImpalaD fail to start on a non-datanode with "Invalid 
short-circuit reads configuration"  (was: The dfs.domain.socket.path can be 
nonexistent on coordinator only nodes)

> ImpalaD fail to start on a non-datanode with "Invalid short-circuit reads 
> configuration"
> 
>
> Key: IMPALA-8852
> URL: https://issues.apache.org/jira/browse/IMPALA-8852
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Adriano
>Priority: Minor
>
> On coordinator only nodes ([typically the edge 
> nodes|https://www.cloudera.com/documentation/enterprise/5-15-x/topics/impala_dedicated_coordinator.html#concept_omm_gf1_n2b]):
> {code:java}
> --is_coordinator=true
> --is_executor=false
> {code}
> the *dfs.domain.socket.path* can be nonexistent on the local FS, as the 
> Datanode role may not be installed on the edge node.
> The nonexistent path prevents the ImpalaD from starting, with the message:
> {code:java}
> I0809 04:15:53.899714 25364 status.cc:124] Invalid short-circuit reads 
> configuration:
>   - Impala cannot read or execute the parent directory of 
> dfs.domain.socket.path
> @   0xb35f19
> @  0x100e2fe
> @  0x103f274
> @  0x102836f
> @   0xa9f573
> @ 0x7f97807e93d4
> @   0xafb3b8
> E0809 04:15:53.899749 25364 impala-server.cc:278] Invalid short-circuit reads 
> configuration:
>   - Impala cannot read or execute the parent directory of 
> dfs.domain.socket.path
> {code}
> even though a coordinator-only ImpalaD does not do short-circuit reads.
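
The behaviour the report asks for can be sketched in Python as follows. This is an illustration only: the function name and its `is_executor` parameter are hypothetical, and the real check lives in Impala's C++ backend. The sketch assumes the validation amounts to testing read and execute access on the parent directory of dfs.domain.socket.path, as the log message suggests, and skips it on coordinator-only daemons.

```python
import os


def short_circuit_config_errors(socket_path, is_executor):
    """Return a list of short-circuit read configuration problems.

    Hypothetical helper: a coordinator-only daemon (is_executor=False)
    does no short-circuit reads, so the check is skipped for it.
    """
    if not is_executor:
        return []
    parent = os.path.dirname(socket_path)
    # A missing directory also fails this access test.
    if not os.access(parent, os.R_OK | os.X_OK):
        return ["Impala cannot read or execute the parent directory of "
                "dfs.domain.socket.path"]
    return []


# On an edge node without the Datanode role the parent may not exist.
errors = short_circuit_config_errors("/var/run/hdfs-sockets/dn",
                                     is_executor=False)
```

With `is_executor=False` the function returns no errors regardless of whether the directory exists, which is the outcome the reporter expects for dedicated coordinators.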




[jira] [Comment Edited] (IMPALA-8852) The dfs.domain.socket.path can be nonexistent on coordinator only nodes

2019-08-12 Thread Adriano (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903830#comment-16903830
 ] 

Adriano edited comment on IMPALA-8852 at 8/12/19 8:33 AM:
--

WORKAROUND -1-: 
Create the dfs.domain.socket.path manually on the local FS, with the proper 
hdfs user permissions:

{code:java}
# mkdir /var/run/hdfs-sockets/
# chown hdfs:hadoop  /var/run/hdfs-sockets/
# chmod 755  /var/run/hdfs-sockets/
# mkdir /var/run/hdfs-sockets/dn
# chown hdfs:hdfs  /var/run/hdfs-sockets/dn
# chmod 1666  /var/run/hdfs-sockets/dn
{code}

WORKAROUND -2-: 
1) On the Dedicated Coordinators (where "-is_executor=false" is set), add the 
following property to "Impala Daemon HDFS Advanced Configuration Snippet 
(Safety Valve)":
{code:java}
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>false</value>
  <description>Disable shortcircuit on dedicated coordinator</description>
</property>
{code}

2) Save the changes and restart the Impala Daemon Coordinator instances.
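
For anyone scripting this workaround, the property from step 1 can be generated with a short Python sketch. It assumes the safety valve accepts a standard Hadoop-style `<property>` element; the helper name `hadoop_property` is hypothetical and only the standard library is used.

```python
import xml.etree.ElementTree as ET


def hadoop_property(name, value, description=None):
    """Build a Hadoop-style <property> element and return it as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    if description is not None:
        ET.SubElement(prop, "description").text = description
    # encoding="unicode" makes tostring() return str instead of bytes.
    return ET.tostring(prop, encoding="unicode")


snippet = hadoop_property(
    "dfs.client.read.shortcircuit",
    "false",
    "Disable shortcircuit on dedicated coordinator",
)
print(snippet)
```

The printed element can then be pasted into the HDFS safety valve on the dedicated coordinators before restarting them.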


was (Author: adrenas):
WORKAROUND -1-: 
Create the dfs.domain.socket.path manually on the local FS, with the proper 
hdfs user permissions:

{code:java}
# mkdir /var/run/hdfs-sockets/
# chown hdfs:hadoop  /var/run/hdfs-sockets/
# chmod 755  /var/run/hdfs-sockets/
# mkdir /var/run/hdfs-sockets/dn
# chown hdfs:hdfs  /var/run/hdfs-sockets/dn
# chmod 1666  /var/run/hdfs-sockets/dn
{code}

WORKAROUND -2-: 
1) On the Dedicated Coordinators, add the following line into "Impala Daemon 
Command Line Argument Advanced Configuration Snippet (Safety Valve)":
-is_executor=false

2) Add the following property into "Impala Daemon HDFS Advanced Configuration 
Snippet (Safety Valve)":
{code:java}
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>false</value>
  <description>Disable shortcircuit on dedicated coordinator</description>
</property>
{code}

3) Save the changes and restart the Impala Daemon Coordinator instances.





[jira] [Comment Edited] (IMPALA-8852) The dfs.domain.socket.path can be nonexistent on coordinator only nodes

2019-08-12 Thread Adriano (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903830#comment-16903830
 ] 

Adriano edited comment on IMPALA-8852 at 8/12/19 8:31 AM:
--

WORKAROUND -1-: 
Create the dfs.domain.socket.path manually on the local FS, with the proper 
hdfs user permissions:

{code:java}
# mkdir /var/run/hdfs-sockets/
# chown hdfs:hadoop  /var/run/hdfs-sockets/
# chmod 755  /var/run/hdfs-sockets/
# mkdir /var/run/hdfs-sockets/dn
# chown hdfs:hdfs  /var/run/hdfs-sockets/dn
# chmod 1666  /var/run/hdfs-sockets/dn
{code}

WORKAROUND -2-: 
1) On the Dedicated Coordinators, add the following line into "Impala Daemon 
Command Line Argument Advanced Configuration Snippet (Safety Valve)":
-is_executor=false

2) Add the following property into "Impala Daemon HDFS Advanced Configuration 
Snippet (Safety Valve)":
{code:java}
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>false</value>
  <description>Disable shortcircuit on dedicated coordinator</description>
</property>
{code}

3) Save the changes and restart the Impala Daemon Coordinator instances.


was (Author: adrenas):
WORKAROUND: 
Create the dfs.domain.socket.path manually on the local FS, with the proper 
hdfs user permissions:

{code:java}
# mkdir /var/run/hdfs-sockets/
# chown hdfs:hadoop  /var/run/hdfs-sockets/
# chmod 755  /var/run/hdfs-sockets/
# mkdir /var/run/hdfs-sockets/dn
# chown hdfs:hdfs  /var/run/hdfs-sockets/dn
# chmod 1666  /var/run/hdfs-sockets/dn
{code}





