[jira] [Commented] (IMPALA-7534) Handle invalidation races in CatalogdMetaProvider cache
[ https://issues.apache.org/jira/browse/IMPALA-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720802#comment-16720802 ] Paul Rogers commented on IMPALA-7534: - As it turns out, it is not the Guava cache that has the race condition; it is our code. We do not use the "loading" feature of the cache. Instead, we use the unprotected get/check/put pattern. The correct usage is illustrated in the [Drill project|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/compile/CodeCompiler.java#L145], where the cache has provided rock-solid, highly multi-threaded service as Drill's generated-code cache. To avoid the race condition, we need to use a loader; see the [Guava docs|https://github.com/google/guava/wiki/CachesExplained#from-a-cacheloader] for details. Without a loader, there is no way for the Guava cache to ensure concurrency control, since the get/check/put operations occur in our code, not Guava's. With a cache loader, Guava handles the locking: the first reader loads the value, and any concurrent readers wait for that load to complete. Error handling is built in: if the first reader encounters an error, the slot is not populated and the next reader retries the load. We likely have our own concurrency control outside the Guava cache; if it amounts to a global lock, perhaps we can eliminate it and rely on the Guava cache instead. Adding this to our code looks like a non-trivial exercise. We will also want detailed concurrency tests, since if we did not notice this problem before now, we likely do not have such tests. Will do more research and either propose a concrete set of changes to make full use of the Guava loading cache, or explain why we cannot.
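The load-inside-the-cache idea can be sketched in a few lines. This is a toy, hypothetical stand-in for Guava's LoadingCache (one global lock instead of Guava's per-key locking), just to show why running the loader inside the cache's critical section removes the get/check/put race:

```python
import threading

class LoadingCache:
    """Toy sketch of a loading cache: the loader runs inside the cache's
    critical section, so get/check/put is atomic from the caller's view.
    (Real Guava uses finer-grained per-key locking, not one global lock.)"""

    def __init__(self, loader):
        self._loader = loader
        self._lock = threading.Lock()
        self._values = {}

    def get(self, key):
        with self._lock:
            if key not in self._values:
                # If the loader raises, nothing is cached and the next
                # caller simply retries the load.
                self._values[key] = self._loader(key)
            return self._values[key]

load_count = {"n": 0}

def load(key):
    load_count["n"] += 1
    return key.upper()

cache = LoadingCache(load)
assert cache.get("tbl") == "TBL"
assert cache.get("tbl") == "TBL"
assert load_count["n"] == 1  # second get hits the cache, no reload
```

Concurrent readers of the same key block on the lock until the first load completes, which is the behavior the Guava docs describe for CacheLoader.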
> Handle invalidation races in CatalogdMetaProvider cache > --- > > Key: IMPALA-7534 > URL: https://issues.apache.org/jira/browse/IMPALA-7534 > Project: IMPALA > Issue Type: Sub-task >Reporter: Todd Lipcon >Assignee: Paul Rogers >Priority: Major > > There is a well-known race in Guava's LoadingCache that we are using for > CatalogdMetaProvider which we are not currently handling: > - thread 1 gets a cache miss and makes a request to fetch some data from the > catalogd. It fetches the catalog object with version 1 and then gets context > switched out or otherwise slow > - thread 2 receives an invalidation for the same object, because it has > changed to v2. It calls 'invalidate' on the cache, but nothing is yet cached. > - thread 1 puts back v1 of the object into the cache > In essence we've "missed" an invalidation. This is also described in this > nice post: https://softwaremill.com/race-condition-cache-guava-caffeine/ > The race is quite unlikely but could cause some unexpected results that are > hard to reason about, so we should look into a fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
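The race in the description above can be replayed step by step with a plain dict standing in for the cache. This is a deliberately sequential sketch with illustrative names, not Impala code:

```python
# Replay of the missed-invalidation race, with the two threads' steps
# interleaved by hand. A plain dict stands in for the cache.
cache = {}

# Thread 1: cache miss; it fetches the object at version 1 from the
# catalogd, then gets context-switched out before it can cache it.
fetched_version = "v1"

# Thread 2: the object has changed to v2, so an invalidation arrives.
# Nothing is cached yet, so the invalidate is a no-op.
cache.pop("obj", None)

# Thread 1 resumes and installs the stale value it fetched earlier.
cache["obj"] = fetched_version

# The invalidation has been "missed": the cache now serves v1 even
# though the catalog is already at v2.
assert cache["obj"] == "v1"
```

The unprotected put at the end is what loses the invalidation; a loading cache (or a version-checked put) closes the window between fetch and install.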
[jira] [Updated] (IMPALA-7974) Impala Doc: Doc automatic invalidates using metastore notification events
[ https://issues.apache.org/jira/browse/IMPALA-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni updated IMPALA-7974: Summary: Impala Doc: Doc automatic invalidates using metastore notification events (was: Impala Doc: Support automatic invalidates using metastore notification events ) > Impala Doc: Doc automatic invalidates using metastore notification events > - > > Key: IMPALA-7974 > URL: https://issues.apache.org/jira/browse/IMPALA-7974 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc, in_32 >
[jira] [Closed] (IMPALA-7977) Impala Doc: Doc the support for fine-grained updates at partition level
[ https://issues.apache.org/jira/browse/IMPALA-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni closed IMPALA-7977. --- Resolution: Invalid > Impala Doc: Doc the support for fine-grained updates at partition level > --- > > Key: IMPALA-7977 > URL: https://issues.apache.org/jira/browse/IMPALA-7977 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc, in_32 >
[jira] [Assigned] (IMPALA-7954) Support automatic invalidates using metastore notification events
[ https://issues.apache.org/jira/browse/IMPALA-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni reassigned IMPALA-7954: --- Assignee: Vihang Karajgaonkar (was: Alex Rodoni) > Support automatic invalidates using metastore notification events > - > > Key: IMPALA-7954 > URL: https://issues.apache.org/jira/browse/IMPALA-7954 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Affects Versions: Impala 3.1.0 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Attachments: Automatic_invalidate_DesignDoc_v1.pdf > > > Currently, in Impala there are multiple ways to invalidate or refresh the > metadata stored in Catalog for Tables. Objects in Catalog can be invalidated > either on usage based approach (invalidate_tables_timeout_s) or when there is > GC pressure (invalidate_tables_on_memory_pressure) as added in IMPALA-7448. > However, most users issue invalidate commands when they want to sync to the > latest information from HDFS or HMS. Unfortunately, when data is modified or > new data is added outside Impala (eg. Hive) or a different Impala cluster, > users don't have a clear idea on whether they have to issue invalidate or > not. To be on the safer side, users keep issuing invalidate commands more > than necessary and it causes performance as well as stability issues. > Hive Metastore provides a simple API to get incremental updates to the > metadata information stored in its database. Each API which does a > add/alter/drop operation in metastore generates event(s) which can be fetched > using {{get_next_notification}} API. Each event has a unique and increasing > event_id. The current notification event id can be fetched using > {{get_current_notificationEventId}} API. > This JIRA proposes to make use of such events from metastore to proactively > either invalidate or refresh information in the catalogD. 
When configured, > CatalogD could poll for such events and take action (like add/drop/refresh > partition, add/drop/invalidate tables and databases) based on the events. > This way we can automatically refresh the catalogD state using events and it > would greatly help the use-cases where users want to see the latest > information (within a configurable interval of time delay) without flooding > the system with invalidate requests. > I will be attaching a design doc to this JIRA and create subtasks for the > work. Feel free to make comments on the JIRA or make suggestions to improve > the design.
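Hedging on the exact HMS client bindings, the poll loop the description proposes might look roughly like this. The client class and handler names here are illustrative stand-ins, not real Impala or metastore APIs; only the {{get_next_notification}} / event-id semantics come from the description above:

```python
def process_events(client, last_event_id, handlers):
    """Fetch notification events newer than last_event_id, dispatch each
    to the handler registered for its type (e.g. refresh a partition or
    invalidate a table), and return the new high-water-mark event id."""
    for event in client.get_next_notification(last_event_id):
        handler = handlers.get(event["eventType"])
        if handler is not None:
            handler(event)
        # Event ids are unique and increasing, so the last one seen is
        # the resume point for the next poll.
        last_event_id = event["eventId"]
    return last_event_id

# A fake client and a recording handler, just to exercise the loop.
class FakeClient:
    def __init__(self, events):
        self._events = events

    def get_next_notification(self, after_id):
        return [e for e in self._events if e["eventId"] > after_id]

seen = []
client = FakeClient([
    {"eventId": 11, "eventType": "ADD_PARTITION", "table": "t1"},
    {"eventId": 12, "eventType": "DROP_TABLE", "table": "t2"},
])
new_id = process_events(client, 10, {"ADD_PARTITION": seen.append})
assert new_id == 12
assert seen[0]["table"] == "t1"
```

Persisting the returned high-water mark between polls is what lets CatalogD resume incrementally rather than re-reading all events.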
[jira] [Commented] (IMPALA-7977) Impala Doc: Doc the support for fine-grained updates at partition level
[ https://issues.apache.org/jira/browse/IMPALA-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720722#comment-16720722 ] Vihang Karajgaonkar commented on IMPALA-7977: - Not sure if we need two JIRAs for documentation. This is a sub-task of the main feature, so I think one documentation JIRA should suffice. Sorry, I am new to the Impala world, so I don't know how this works. > Impala Doc: Doc the support for fine-grained updates at partition level > --- > > Key: IMPALA-7977 > URL: https://issues.apache.org/jira/browse/IMPALA-7977 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc, in_32 >
[jira] [Created] (IMPALA-7978) Impala Doc: Clarify Impala Memory requirements
Alex Rodoni created IMPALA-7978: --- Summary: Impala Doc: Clarify Impala Memory requirements Key: IMPALA-7978 URL: https://issues.apache.org/jira/browse/IMPALA-7978 Project: IMPALA Issue Type: Bug Components: Docs Reporter: Alex Rodoni Assignee: Alex Rodoni
[jira] [Work started] (IMPALA-7954) Support automatic invalidates using metastore notification events
[ https://issues.apache.org/jira/browse/IMPALA-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-7954 started by Alex Rodoni. --- > Support automatic invalidates using metastore notification events > - > > Key: IMPALA-7954 > URL: https://issues.apache.org/jira/browse/IMPALA-7954 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Affects Versions: Impala 3.1.0 >Reporter: Vihang Karajgaonkar >Assignee: Alex Rodoni >Priority: Major > Attachments: Automatic_invalidate_DesignDoc_v1.pdf
[jira] [Created] (IMPALA-7977) Impala Doc: Doc the support for fine-grained updates at partition level
Alex Rodoni created IMPALA-7977: --- Summary: Impala Doc: Doc the support for fine-grained updates at partition level Key: IMPALA-7977 URL: https://issues.apache.org/jira/browse/IMPALA-7977 Project: IMPALA Issue Type: Sub-task Components: Docs Reporter: Alex Rodoni Assignee: Alex Rodoni
[jira] [Commented] (IMPALA-7973) Add support for fine-grained updates at partition level
[ https://issues.apache.org/jira/browse/IMPALA-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720713#comment-16720713 ] Vihang Karajgaonkar commented on IMPALA-7973: - Yes, this would introduce a new configuration and it would need documentation support so that users are aware of what to expect and how to use this feature. > Add support for fine-grained updates at partition level > --- > > Key: IMPALA-7973 > URL: https://issues.apache.org/jira/browse/IMPALA-7973 > Project: IMPALA > Issue Type: Sub-task >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > > When data is inserted into a partition or a new partition is created in a > large table, we should not be invalidating the whole table. Instead it should > be possible to refresh/add/drop certain partitions on the table directly > based on the event information. This would help with the performance of > subsequent access to the table by avoiding reloading the large table.
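The routing decision described in the issue can be pictured as follows. This is a purely illustrative sketch with hypothetical event-type and action names, not Impala code: partition-scoped events get a partition-level action, and only events that cannot be scoped fall back to a whole-table invalidate:

```python
# Sketch: choose a partition-scoped action where the event carries enough
# information, falling back to a whole-table invalidate otherwise.
PARTITION_SCOPED = {"ADD_PARTITION", "DROP_PARTITION", "INSERT"}

def plan_action(event):
    table = event["table"]
    if event["eventType"] in PARTITION_SCOPED and "partition" in event:
        # Reload just the affected partition of a possibly huge table.
        return ("refresh_partition", table, event["partition"])
    # Coarse events (e.g. an ALTER of table-wide metadata) still
    # invalidate the whole table.
    return ("invalidate_table", table, None)

assert plan_action({"eventType": "INSERT", "table": "t", "partition": "p=1"}) \
    == ("refresh_partition", "t", "p=1")
assert plan_action({"eventType": "ALTER_TABLE", "table": "t"}) \
    == ("invalidate_table", "t", None)
```

The point of the branch is exactly the performance claim above: subsequent reads of a large table avoid a full metadata reload when only one partition changed.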
[jira] [Comment Edited] (IMPALA-6692) When partition exchange is followed by sort each sort node becomes a synchronization point across the cluster
[ https://issues.apache.org/jira/browse/IMPALA-6692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720709#comment-16720709 ] Michael Ho edited comment on IMPALA-6692 at 12/13/18 11:55 PM: --- {quote} Have we seen problems with any operators except for sort? The agg and join should be better behaved in that they don't do such large chunks of work in-between pulling batches from their children. There are probably some tweaks we could do to improve those further. {quote} Just attached [^profile-spilling.txt] as an example of the same behavior due to spilling. In particular, multiple aggs running simultaneously on {{philip-bigg-12.vpc.cloudera.com}} were spilling heavily, and that indirectly slowed down other fragments. {quote} So even though other operators like join/groupby try to be more incremental in work, I'd be surprised if they don't have 99th percentile latencies in the 100s of milliseconds just due to process-level or OS-level stuff, at which point most of your servers will be 90% idle with no apparent bottleneck. {quote} I agree. That's the nature of distributed execution: it's as fast as the slowest node. In [^profile-spilling.txt], a single instance of a slow agg operator slows down all other instances of the fragments which feed into the agg operators. That's bad, as resources (e.g. memory) are held up unnecessarily by the stalled fragments, which may eventually make the problem worse. As discussed in previous comments, the general fix proposed is to add more buffering on either the sender or receiver side. We already have some limited buffering (10MB) at the receiver side. It seems natural to extend the Exchange operator to support spilling, which enables more aggressive buffering at the receiver side. Doing so will also make the code simpler as we can remove the logic for deferred RPC replies from {{KrpcDataStreamRecvr}}. On the other hand, adding spilling to the Exchange operator may make the slow nodes even slower by piling more IO on them. An alternative is to add spilling on the sender side, but if we have to do so per-channel, it's unclear how it will affect the minimum memory reservation for queries in large cluster deployments. [~tarmstrong], any idea? > When partition exchange is followed by sort each sort node becomes a > synchronization point across the cluster > - > > Key: IMPALA-6692 > URL: https://issues.apache.org/jira/browse/IMPALA-6692 > Project: IMPALA > Issue Type: Sub-task > Components: Backend, Distributed Exec >Affects Versions: Impala 2.10.0 >Reporter: Mostafa Mokhtar >Priority: Critical > Labels: perf,
[jira] [Commented] (IMPALA-6692) When partition exchange is followed by sort each sort node becomes a synchronization point across the cluster
[ https://issues.apache.org/jira/browse/IMPALA-6692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720709#comment-16720709 ] Michael Ho commented on IMPALA-6692: {quote} Have we seen problems with any operators except for sort? The agg and join should be better behaved in that they don't do such large chunks of work in-between pulling batches from their children. There are probably some tweaks we could do to improve those further. {quote} Just attached [^profile-spilling.txt] as an example of the same behavior due to spilling. In particular, multiple aggs running simultaneously on {{philip-bigg-12.vpc.cloudera.com}} were spilling heavily, and that indirectly slowed down other fragments. {quote} So even though other operators like join/groupby try to be more incremental in work, I'd be surprised if they don't have 99th percentile latencies in the 100s of milliseconds just due to process-level or OS-level stuff, at which point most of your servers will be 90% idle with no apparent bottleneck. {quote} I agree. That's the nature of distributed execution: it's as fast as the slowest node. In [^profile-spilling.txt], a single instance of a slow agg operator slows down all other instances of the fragments which feed into the agg operators. That's bad, as resources (e.g. memory) are held up unnecessarily by the stalled fragments, which may eventually make the problem worse. As discussed in previous comments, the general fix proposed is to add more buffering on either the sender or receiver side. We already have some limited buffering (10MB) at the receiver side. It seems natural to extend the Exchange operator to support spilling, which enables more aggressive buffering at the receiver side. Doing so will also make the code simpler as we can remove the logic for deferred RPC replies from {{KrpcDataStreamRecvr}}. On the other hand, adding spilling to the Exchange operator may make the slow nodes even slower by piling more IO on them.
An alternative is to add spilling on the sender side, but if we have to do so per-channel, it's unclear how it will affect the minimum memory reservation for queries in large cluster deployments. [~tarmstrong], any idea? > When partition exchange is followed by sort each sort node becomes a > synchronization point across the cluster > - > > Key: IMPALA-6692 > URL: https://issues.apache.org/jira/browse/IMPALA-6692 > Project: IMPALA > Issue Type: Sub-task > Components: Backend, Distributed Exec >Affects Versions: Impala 2.10.0 >Reporter: Mostafa Mokhtar >Priority: Critical > Labels: perf, resource-management > Attachments: Kudu table insert without KRPC no sort.txt, Kudu table > insert without KRPC.txt, kudu_partial_sort_insert_vd1129.foo.com_2.txt, > profile-spilling.txt > > > Issue described in this JIRA applies to > * Analytical functions > * Writes to Partitioned Parquet tables > * Writes to Kudu tables > When inserting into a Kudu table from Impala the plan is something like HDFS > SCAN -> Partition Exchange -> Partial Sort -> Kudu Insert. > The query initially makes good progress then significantly slows down and > very few nodes make progress.
> While the insert is running the query goes through different phases > * Phase 1 > ** Scan is reading data fast, sending data through to exchange > ** Partial Sort keeps accumulating batches > ** Network and CPU is busy, life appears to be OK > * Phase 2 > ** One of the Sort operators reaches its memory limit and stops calling > ExchangeNode::GetNext for a while > ** This creates back pressure against the DataStreamSenders > ** The Partial Sort doesn't call GetNext until it has finished sorting GBs of > data (Partial sort memory is unbounded as of 03/16/2018) > ** All exchange operators in the cluster eventually get blocked on that Sort > operator and can no longer make progress > ** After a while the Sort is able to accept more batches which temporarily > unblocks execution across the cluster > ** Another sort operator reaches its memory limit and this loop repeats itself > Below are stacks from one of the blocked hosts > _Sort node waiting on data from exchange node as it didn't start sorting > since the memory limit for the sort wasn't reached_ > {code} > Thread 90 (Thread 0x7f8d7d233700 (LWP 21625)): > #0 0x003a6f00b68c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x7fab1422174c in > std::condition_variable::wait(std::unique_lock&) () from > /opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.205/lib/impala/lib/libstdc++.so.6 > #2 0x00b4d5aa in void > std::_V2::condition_variable_any::wait > >(boost::unique_lock&) () > #3 0x00b4ab6a in >
[jira] [Updated] (IMPALA-7931) test_shutdown_executor fails with timeout waiting for query target state
[ https://issues.apache.org/jira/browse/IMPALA-7931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-7931: -- Description: On a recent S3 test run test_shutdown_executor hit a timeout waiting for a query to reach state FINISHED. Instead the query stays at state 5 (EXCEPTION). {noformat} 12:51:11 __ TestShutdownCommand.test_shutdown_executor __ 12:51:11 custom_cluster/test_restart_services.py:209: in test_shutdown_executor 12:51:11 assert self.__fetch_and_get_num_backends(QUERY, before_shutdown_handle) == 3 12:51:11 custom_cluster/test_restart_services.py:356: in __fetch_and_get_num_backends 12:51:11 self.client.QUERY_STATES['FINISHED'], timeout=20) 12:51:11 common/impala_service.py:267: in wait_for_query_state 12:51:11 target_state, query_state) 12:51:11 E AssertionError: Did not reach query state in time target=4 actual=5 {noformat} >From the logs I can see that the query fails because one of the executors >becomes unreachable: {noformat} I1204 12:31:39.954125 5609 impala-server.cc:1792] Query a34c3a84775e5599:b2b25eb9: Failed due to unreachable impalad(s): jenkins-worker:22001 {noformat} The query was {{select count\(*) from functional_parquet.alltypes where sleep(1) = bool_col}}. It seems that the query took longer than expected and was still running when the executor shut down. I can reproduce by adding a sleep to the test: {noformat} diff --git a/tests/custom_cluster/test_restart_services.py b/tests/custom_cluster/test_restart_services.py index e441cbc..32bc8a1 100644 --- a/tests/custom_cluster/test_restart_services.py +++ b/tests/custom_cluster/test_restart_services.py @@ -206,7 +206,7 @@ class TestShutdownCommand(CustomClusterTestSuite, HS2TestSuite): after_shutdown_handle = self.__exec_and_wait_until_running(QUERY) # Finish executing the first query before the backend exits. 
-assert self.__fetch_and_get_num_backends(QUERY, before_shutdown_handle) == 3 +assert self.__fetch_and_get_num_backends(QUERY, before_shutdown_handle, delay=5) == 3 # Wait for the impalad to exit, then start it back up and run another query, which # should be scheduled on it again. @@ -349,11 +349,14 @@ class TestShutdownCommand(CustomClusterTestSuite, HS2TestSuite): self.client.QUERY_STATES['RUNNING'], timeout=20) return handle - def __fetch_and_get_num_backends(self, query, handle): + def __fetch_and_get_num_backends(self, query, handle, delay=0): """Fetch the results of 'query' from the beeswax handle 'handle', close the query and return the number of backends obtained from the profile.""" self.impalad_test_service.wait_for_query_state(self.client, handle, self.client.QUERY_STATES['FINISHED'], timeout=20) +if delay > 0: + LOG.info("sleeping for {0}".format(delay)) + time.sleep(delay) self.client.fetch(query, handle) profile = self.client.get_runtime_profile(handle) self.client.close_query(handle) {noformat} > test_shutdown_executor fails with timeout waiting for query target state > > > Key: IMPALA-7931 > URL: https://issues.apache.org/jira/browse/IMPALA-7931 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.2.0 >Reporter: Lars Volker >Assignee: Tim Armstrong >Priority: Critical > Labels: broken-build > Attachments: impala-7931-impalad-logs.tar.gz > > > On a recent S3
[jira] [Commented] (IMPALA-7931) test_shutdown_executor fails with timeout waiting for query target state
[ https://issues.apache.org/jira/browse/IMPALA-7931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720701#comment-16720701 ] Tim Armstrong commented on IMPALA-7931: --- I see what you mean with IMPALA_SERVER_NUM_FRAGMENTS_IN_FLIGHT being decremented before the final status report is sent. Would it make sense to decrement that later, or is there some reason to decrement it before sending the final status report? Which new metrics are you referring to? I didn't see much in QueryExecMgr > test_shutdown_executor fails with timeout waiting for query target state > > > Key: IMPALA-7931 > URL: https://issues.apache.org/jira/browse/IMPALA-7931 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.2.0 >Reporter: Lars Volker >Assignee: Tim Armstrong >Priority: Critical > Labels: broken-build > Attachments: impala-7931-impalad-logs.tar.gz > > > On a recent S3 test run test_shutdown_executor hit a timeout waiting for a > query to reach state FINISHED. Instead the query stays at state 5 (EXCEPTION). 
> {noformat} > 12:51:11 __ TestShutdownCommand.test_shutdown_executor > __ > 12:51:11 custom_cluster/test_restart_services.py:209: in > test_shutdown_executor > 12:51:11 assert self.__fetch_and_get_num_backends(QUERY, > before_shutdown_handle) == 3 > 12:51:11 custom_cluster/test_restart_services.py:356: in > __fetch_and_get_num_backends > 12:51:11 self.client.QUERY_STATES['FINISHED'], timeout=20) > 12:51:11 common/impala_service.py:267: in wait_for_query_state > 12:51:11 target_state, query_state) > 12:51:11 E AssertionError: Did not reach query state in time target=4 > actual=5 > {noformat} > From the logs I can see that the query fails because one of the executors > becomes unreachable: > {noformat} > I1204 12:31:39.954125 5609 impala-server.cc:1792] Query > a34c3a84775e5599:b2b25eb9: Failed due to unreachable impalad(s): > jenkins-worker:22001 > {noformat} > The query was {{select count\(*) from functional_parquet.alltypes where > sleep(1) = bool_col}}. > It seems that the query took longer than expected and was still running when > the executor shut down.
[jira] [Updated] (IMPALA-6692) When partition exchange is followed by sort each sort node becomes a synchronization point across the cluster
[ https://issues.apache.org/jira/browse/IMPALA-6692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Ho updated IMPALA-6692: --- Attachment: profile-spilling.txt > When partition exchange is followed by sort each sort node becomes a > synchronization point across the cluster > - > > Key: IMPALA-6692 > URL: https://issues.apache.org/jira/browse/IMPALA-6692 > Project: IMPALA > Issue Type: Sub-task > Components: Backend, Distributed Exec >Affects Versions: Impala 2.10.0 >Reporter: Mostafa Mokhtar >Priority: Critical > Labels: perf, resource-management > Attachments: Kudu table insert without KRPC no sort.txt, Kudu table > insert without KRPC.txt, kudu_partial_sort_insert_vd1129.foo.com_2.txt, > profile-spilling.txt > > > Issue described in this JIRA applies to > * Analytical functions > * Writes to Partitioned Parquet tables > * Writes to Kudu tables > When inserting into a Kudu table from Impala the plan is something like HDFS > SCAN -> Partition Exchange -> Partial Sort -> Kudu Insert. > The query initially makes good progress then significantly slows down and > very few nodes make progress. 
> While the insert is running the query goes through different phases > * Phase 1 > ** Scan is reading data fast, sending data through to exchange > ** Partial Sort keeps accumulating batches > ** Network and CPU is busy, life appears to be OK > * Phase 2 > ** One of the Sort operators reaches its memory limit and stops calling > ExchangeNode::GetNext for a while > ** This creates back pressure against the DataStreamSenders > ** The Partial Sort doesn't call GetNext until it has finished sorting GBs of > data (Partial sort memory is unbounded as of 03/16/2018) > ** All exchange operators in the cluster eventually get blocked on that Sort > operator and can no longer make progress > ** After a while the Sort is able to accept more batches which temporarily > unblocks execution across the cluster > ** Another sort operator reaches its memory limit and this loop repeats itself > Below are stacks from one of the blocked hosts > _Sort node waiting on data from exchange node as it didn't start sorting > since the memory limit for the sort wasn't reached_ > {code} > Thread 90 (Thread 0x7f8d7d233700 (LWP 21625)): > #0 0x003a6f00b68c in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x7fab1422174c in > std::condition_variable::wait(std::unique_lock&) () from > /opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.205/lib/impala/lib/libstdc++.so.6 > #2 0x00b4d5aa in void > std::_V2::condition_variable_any::wait > >(boost::unique_lock&) () > #3 0x00b4ab6a in > impala::KrpcDataStreamRecvr::SenderQueue::GetBatch(impala::RowBatch**) () > #4 0x00b4b0c8 in > impala::KrpcDataStreamRecvr::GetBatch(impala::RowBatch**) () > #5 0x00dca7c5 in > impala::ExchangeNode::FillInputRowBatch(impala::RuntimeState*) () > #6 0x00dcacae in > impala::ExchangeNode::GetNext(impala::RuntimeState*, impala::RowBatch*, > bool*) () > #7 0x01032ac3 in > impala::PartialSortNode::GetNext(impala::RuntimeState*, impala::RowBatch*, > bool*) () > #8 0x00ba9c92 in 
impala::FragmentInstanceState::ExecInternal() () > #9 0x00bac7df in impala::FragmentInstanceState::Exec() () > #10 0x00b9ab1a in > impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) () > #11 0x00d5da9f in > impala::Thread::SuperviseThread(std::basic_string std::char_traits, std::allocator > const&, > std::basic_string, std::allocator > > const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*) () > #12 0x00d5e29a in boost::detail::thread_data void (*)(std::basic_string, std::allocator > > const&, std::basic_string, > std::allocator > const&, boost::function, > impala::ThreadDebugInfo const*, impala::Promise*), > boost::_bi::list5 std::char_traits, std::allocator > >, > boost::_bi::value, > std::allocator > >, boost::_bi::value >, > boost::_bi::value, > boost::_bi::value*> > > >::run() () > #13 0x012d70ba in thread_proxy () > #14 0x003a6f007aa1 in start_thread () from /lib64/libpthread.so.0 > #15 0x003a6ece893d in clone () from /lib64/libc.so.6 > {code} > _DataStreamSender blocked due to back pressure from the DataStreamRecvr on > the node which has a Sort that is spilling_ > {code} > Thread 89 (Thread 0x7fa8f6a15700 (LWP 21626)): > #0 0x003a6f00ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x01237e77 in > impala::KrpcDataStreamSender::Channel::WaitForRpc(std::unique_lock*) > () > #2 0x01238b8d in >
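The phase description above boils down to bounded buffers between senders and the sort: once one sort stops draining its receiver, every sender upstream eventually blocks, and the whole cluster stalls on that one operator. A minimal sketch of that back-pressure mechanism (illustrative Python; the real implementation is the C++ KRPC data stream code shown in the stacks):

```python
import queue
import threading

def run_sender(exchange, batches, sent):
    """Producer side of an exchange: put() blocks once the receiver-side
    buffer is full, which is how back-pressure reaches the senders."""
    for b in batches:
        exchange.put(b)          # blocks while the consumer isn't draining
        sent.append(b)

# A tiny bounded queue stands in for the receiver's buffer in front of the
# sort: while the sort isn't calling GetNext(), senders fill it and stall.
exchange = queue.Queue(maxsize=2)
sent = []
t = threading.Thread(target=run_sender, args=(exchange, range(10), sent),
                     daemon=True)
t.start()
t.join(timeout=0.5)
stalled = t.is_alive()           # sender is stuck after filling the buffer

# The moment the "sort" drains batches again, the pipeline resumes.
for _ in range(10):
    exchange.get()
t.join(timeout=5)
finished = not t.is_alive()
```

The periodic unblock/stall loop described in Phase 2 is this same cycle repeating each time a different sort hits its memory limit.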
[jira] [Commented] (IMPALA-7351) Add memory estimates for plan nodes and sinks with missing estimates
[ https://issues.apache.org/jira/browse/IMPALA-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720673#comment-16720673 ] Bikramjeet Vig commented on IMPALA-7351: All of the missing estimates seem to be covered now except for the PlanRootSink. I am deferring that to IMPALA-4268 since that can change the mem requirement. > Add memory estimates for plan nodes and sinks with missing estimates > > > Key: IMPALA-7351 > URL: https://issues.apache.org/jira/browse/IMPALA-7351 > Project: IMPALA > Issue Type: Sub-task > Components: Frontend >Reporter: Tim Armstrong >Assignee: Bikramjeet Vig >Priority: Major > Labels: admission-control, resource-management > > Many plan nodes and sinks, e.g. KuduScanNode, KuduTableSink, ExchangeNode, > etc are missing memory estimates entirely. > We should add a basic estimate for all these cases based on experiments and > data from real workloads. In some cases 0 may be the right estimate (e.g. for > streaming nodes like SelectNode that just pass through data) but we should > remove TODOs and document the reasoning in those cases. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
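The goal stated above is that every operator gets an explicit estimate, including an explicit, documented 0 for streaming pass-through nodes. A sketch of that policy (illustrative Python; the node names and byte values are hypothetical, not Impala's actual numbers):

```python
# Hypothetical defaults: the point is that a missing estimate is an error,
# and 0 is a deliberate, documented choice rather than an absent value.
DEFAULT_ESTIMATE_BYTES = {
    "KUDU_SCAN_NODE": 128 * 1024 * 1024,
    "KUDU_TABLE_SINK": 20 * 1024 * 1024,
    "EXCHANGE_NODE": 10 * 1024 * 1024,
    "SELECT_NODE": 0,   # streaming pass-through: buffers nothing of its own
}

def mem_estimate(node):
    """Fail loudly instead of silently treating a missing estimate as 0."""
    if node not in DEFAULT_ESTIMATE_BYTES:
        raise KeyError("no memory estimate defined for %s" % node)
    return DEFAULT_ESTIMATE_BYTES[node]
```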
[jira] [Resolved] (IMPALA-7938) jenkins.impala.io returns 504, not accepting new pre-submit jobs
[ https://issues.apache.org/jira/browse/IMPALA-7938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Apple resolved IMPALA-7938. --- Resolution: Cannot Reproduce After some reboots, this seems to be resolved. > jenkins.impala.io returns 504, not accepting new pre-submit jobs > > > Key: IMPALA-7938 > URL: https://issues.apache.org/jira/browse/IMPALA-7938 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Reporter: Lars Volker >Priority: Blocker > Labels: jenkins > > Jenkins doesn't accept new jobs, > https://jenkins.impala.io/job/gerrit-verify-dryrun/build?delay=0sec returns a > 504 Gateway Timeout error. > The logs have this: > {noformat} > Dec 06, 2018 10:32:56 PM > org.jenkinsci.plugins.workflow.support.concurrent.Timeout lambda$ping$0 > INFO: Running > CpsFlowExecution[Owner[parallel-all-tests/4671:parallel-all-tests #4671]] > unresponsive for 4 hr 44 min > Dec 06, 2018 10:33:01 PM > org.jenkinsci.plugins.workflow.support.concurrent.Timeout lambda$ping$0 > INFO: Running > CpsFlowExecution[Owner[parallel-all-tests/4671:parallel-all-tests #4671]] > unresponsive for 4 hr 44 min > Dec 06, 2018 10:33:03 PM > com.sonyericsson.hudson.plugins.gerrit.trigger.GerritProjectListUpdater > tryLoadProjectList > INFO: Not connected to , waiting for 64 second(s) > Dec 06, 2018 10:33:06 PM > org.jenkinsci.plugins.workflow.support.concurrent.Timeout lambda$ping$0 > INFO: Running > CpsFlowExecution[Owner[parallel-all-tests/4671:parallel-all-tests #4671]] > unresponsive for 4 hr 44 min > Dec 06, 2018 10:33:11 PM > org.jenkinsci.plugins.workflow.support.concurrent.Timeout lambda$ping$0 > INFO: Running > CpsFlowExecution[Owner[parallel-all-tests/4671:parallel-all-tests #4671]] > unresponsive for 4 hr 44 min > {noformat} > This could be > [JENKINS-52362|https://issues.jenkins-ci.org/browse/JENKINS-52362].
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-5397) Set "End Time" earlier rather than on unregistration.
[ https://issues.apache.org/jira/browse/IMPALA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-5397: - Assignee: Pooja Nilangekar > Set "End Time" earlier rather than on unregistration. > - > > Key: IMPALA-5397 > URL: https://issues.apache.org/jira/browse/IMPALA-5397 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.9.0 >Reporter: Mostafa Mokhtar >Assignee: Pooja Nilangekar >Priority: Major > Labels: admission-control, query-lifecycle > > When queries are executed from Hue and hit the idle query timeout then the > query duration keeps going up even though the query was cancelled and it is > not actually doing any more work. The end time is only set when the query is > actually unregistered. > Queries below finished in 1s640ms while the reported time is much longer. > |User||Default Db||Statement||Query Type||Start Time||Waiting > Time||Duration||Scan Progress||State||Last Event||# rows fetched||Resource > Pool||Details||Action| > |hue/va1026.halxg.cloudera@halxg.cloudera.com|tpcds_1000_parquet|select > count(*) from tpcds_1000_parquet.inventory|QUERY|2017-05-31 > 09:38:20.472804000|4m27s|4m32s|261 / 261 ( 100%)|FINISHED|First row > fetched|1|root.default|Details|Close| > |hue/va1026.halxg.cloudera@halxg.cloudera.com|tpcds_1000_parquet|select > count(*) from tpcds_1000_parquet.inventory|QUERY|2017-05-31 > 08:38:52.780237000|2017-05-31 09:38:20.289582000|59m27s|261 / 261 ( > 100%)|FINISHED|1|root.default|Details| -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-6591) TestClientSsl hung for a long time
[ https://issues.apache.org/jira/browse/IMPALA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fredy Wijaya reassigned IMPALA-6591: Assignee: Fredy Wijaya (was: Csaba Ringhofer) > TestClientSsl hung for a long time > -- > > Key: IMPALA-6591 > URL: https://issues.apache.org/jira/browse/IMPALA-6591 > Project: IMPALA > Issue Type: Bug > Components: Distributed Exec >Affects Versions: Impala 2.12.0, Impala 3.1.0, Impala 3.2.0 >Reporter: Tim Armstrong >Assignee: Fredy Wijaya >Priority: Blocker > Labels: broken-build, flaky, hang > Fix For: Impala 3.2.0 > > > {noformat} > 18:49:13 > custom_cluster/test_catalog_wait.py::TestCatalogWait::test_delayed_catalog > PASSED > 18:49:53 > custom_cluster/test_client_ssl.py::TestClientSsl::test_ssl[exec_option: > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, > 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, > 'exec_single_node_rows_threshold': 0} | table_format: text/none] Build timed > out (after 1,440 minutes). Marking the build as failed. > 12:20:15 Build was aborted > 12:20:15 Archiving artifacts > {noformat} > I unfortunately wasn't able to get any logs... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-6591) TestClientSsl hung for a long time
[ https://issues.apache.org/jira/browse/IMPALA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fredy Wijaya resolved IMPALA-6591. -- Resolution: Fixed Fix Version/s: Impala 3.2.0 > TestClientSsl hung for a long time > -- > > Key: IMPALA-6591 > URL: https://issues.apache.org/jira/browse/IMPALA-6591 > Project: IMPALA > Issue Type: Bug > Components: Distributed Exec >Affects Versions: Impala 2.12.0, Impala 3.1.0, Impala 3.2.0 >Reporter: Tim Armstrong >Assignee: Fredy Wijaya >Priority: Blocker > Labels: broken-build, flaky, hang > Fix For: Impala 3.2.0 > > > {noformat} > 18:49:13 > custom_cluster/test_catalog_wait.py::TestCatalogWait::test_delayed_catalog > PASSED > 18:49:53 > custom_cluster/test_client_ssl.py::TestClientSsl::test_ssl[exec_option: > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, > 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, > 'exec_single_node_rows_threshold': 0} | table_format: text/none] Build timed > out (after 1,440 minutes). Marking the build as failed. > 12:20:15 Build was aborted > 12:20:15 Archiving artifacts > {noformat} > I unfortunately wasn't able to get any logs... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IMPALA-4268) Rework coordinator buffering to buffer more data
[ https://issues.apache.org/jira/browse/IMPALA-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-4268: -- Target Version: Impala 3.2.0 (was: Product Backlog) > Rework coordinator buffering to buffer more data > > > Key: IMPALA-4268 > URL: https://issues.apache.org/jira/browse/IMPALA-4268 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.8.0 >Reporter: Henry Robinson >Assignee: Pooja Nilangekar >Priority: Major > Labels: query-lifecycle, resource-management > Attachments: rows-produced-histogram.png > > > {{PlanRootSink}} executes the producer thread (the coordinator fragment > execution thread) in a separate thread to the consumer (i.e. the thread > handling the fetch RPC), which calls {{GetNext()}} to retrieve the rows. The > implementation was simplified by handing off a single batch at a time from > the producers to consumer. > This decision causes some problems: > * Many context switches for the sender. Adding buffering would allow the > sender to append to the buffer and continue progress without a context switch. > * Query execution can't release resources until the client has fetched the > final batch, because the coordinator fragment thread is still running and > potentially producing backpressure all the way down the plan tree. > * The consumer can't fulfil fetch requests greater than Impala's internal > BATCH_SIZE, because it is only given one batch at a time. > The tricky part is managing the mismatch between the size of the row batches > processed in {{Send()}} and the size of the fetch result asked for by the > client without impacting performance too badly. The sender materializes > output rows in a {{QueryResultSet}} that is owned by the coordinator. That is > not, currently, a splittable object - instead it contains the actual RPC > response struct that will hit the wire when the RPC completes. 
An > asynchronous sender does not know the batch size, because it can in theory > change on every fetch call (although most reasonable clients will not > randomly change the fetch size). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
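The buffering rework described above amounts to letting the producer hand off many batches and letting fetches cut across batch boundaries. A minimal sketch (illustrative Python; `BufferedRootSink` and its methods are assumptions, not the PlanRootSink API):

```python
from collections import deque

class BufferedRootSink:
    """Sketch: buffer several row batches so the producer can keep going,
    and serve client fetches of arbitrary size from the buffered rows."""

    def __init__(self, max_batches=16):
        self.batches = deque()
        self.max_batches = max_batches

    def send(self, batch):
        # The real sink would block (or spill) when full; the sketch just
        # reports whether the producer can continue without a context switch.
        if len(self.batches) >= self.max_batches:
            return False
        self.batches.append(list(batch))
        return True

    def fetch(self, num_rows):
        # Split/merge buffered batches so the fetch size need not match
        # the internal BATCH_SIZE.
        out = []
        while self.batches and len(out) < num_rows:
            batch = self.batches[0]
            take = min(num_rows - len(out), len(batch))
            out.extend(batch[:take])
            if take == len(batch):
                self.batches.popleft()
            else:
                self.batches[0] = batch[take:]
        return out
```

The hard part the description calls out — that a `QueryResultSet` is not splittable — is what the `fetch()` loop glosses over here.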
[jira] [Assigned] (IMPALA-7864) TestLocalCatalogRetries::test_replan_limit is flaky
[ https://issues.apache.org/jira/browse/IMPALA-7864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bharath v reassigned IMPALA-7864: - Assignee: bharath v (was: Todd Lipcon) > TestLocalCatalogRetries::test_replan_limit is flaky > --- > > Key: IMPALA-7864 > URL: https://issues.apache.org/jira/browse/IMPALA-7864 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Affects Versions: Impala 3.0, Impala 2.12.0 > Environment: Ubuntu 16.04 >Reporter: Jim Apple >Assignee: bharath v >Priority: Critical > Labels: broken-build, flaky > > In https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/3605/, > TestLocalCatalogRetries::test_replan_limit failed on an unrelated patch. On > my development machine, the test passed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-7973) Add support for fine-grained updates at partition level
[ https://issues.apache.org/jira/browse/IMPALA-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720538#comment-16720538 ] Alex Rodoni commented on IMPALA-7973: - [~vihangk1] Is this a user-facing feature that has a doc impact? > Add support for fine-grained updates at partition level > --- > > Key: IMPALA-7973 > URL: https://issues.apache.org/jira/browse/IMPALA-7973 > Project: IMPALA > Issue Type: Sub-task >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > > When data is inserted into a partition or a new partition is created in a > large table, we should not be invalidating the whole table. Instead it should > be possible to refresh/add/drop certain partitions on the table directly > based on the event information. This would help with the performance of > subsequent access to the table by avoiding reloading the large table. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
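The fine-grained behaviour described above can be sketched as a dispatch on event type: partition-scoped events refresh only the affected partition, and only table-level changes invalidate the whole table (illustrative Python; the event type names and catalog methods are assumptions, not Impala's actual API):

```python
# Partition-scoped events touch one partition; everything else falls back
# to a whole-table invalidate.
PARTITION_EVENTS = ("ADD_PARTITION", "DROP_PARTITION", "INSERT")

def apply_event(catalog, event):
    if event["type"] in PARTITION_EVENTS and event.get("partition"):
        catalog.refresh_partition(event["table"], event["partition"])
    else:
        catalog.invalidate_table(event["table"])
```

For a large table this avoids reloading all of its metadata when only one partition changed, which is the performance win the description is after.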
[jira] [Created] (IMPALA-7973) Add support for fine-grained updates at partition level
Vihang Karajgaonkar created IMPALA-7973: --- Summary: Add support for fine-grained updates at partition level Key: IMPALA-7973 URL: https://issues.apache.org/jira/browse/IMPALA-7973 Project: IMPALA Issue Type: Sub-task Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar When data is inserted into a partition or a new partition is created in a large table, we should not be invalidating the whole table. Instead it should be possible to refresh/add/drop certain partitions on the table directly based on the event information. This would help with the performance of subsequent access to the table by avoiding reloading the large table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-7976) Add a flag to disable sync using events at a table level
Vihang Karajgaonkar created IMPALA-7976: --- Summary: Add a flag to disable sync using events at a table level Key: IMPALA-7976 URL: https://issues.apache.org/jira/browse/IMPALA-7976 Project: IMPALA Issue Type: Sub-task Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar It is possible that certain tables need not be updated by the event mechanism. Possible use-cases could be frequently updated tables (from multiple systems) or tables where we don't really need the latest data. We should be able to add a flag (parameter) to the table which can be used to skip event processing for a given table. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-7975) Improve supportability of the automatic invalidate feature
[ https://issues.apache.org/jira/browse/IMPALA-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar reassigned IMPALA-7975: --- Assignee: (was: Vihang Karajgaonkar) > Improve supportability of the automatic invalidate feature > -- > > Key: IMPALA-7975 > URL: https://issues.apache.org/jira/browse/IMPALA-7975 > Project: IMPALA > Issue Type: Sub-task >Reporter: Vihang Karajgaonkar >Priority: Major > > Some of the things which can be done to improve supportability of this > feature: > * Add metrics to detect issues pertaining to this feature > # Time taken to fetch the notifications (Would be useful to have average, > min, max) > # Time taken to process a batch of events received > # Number of times particular table was invalidated (would be useful to have > some rate metric like number_of_invalidates/per_hour) > * Ability to turn ON/OFF for this feature (possibly without the need of a > restart) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
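The metrics listed above — min/avg/max fetch time and a per-table invalidation rate — can be sketched as follows (illustrative Python; the class and method names are made up, not Impala's metrics API):

```python
import time

class EventMetrics:
    """Sketch of the supportability metrics suggested in the issue."""

    def __init__(self):
        self.fetch_times = []        # seconds per notification fetch
        self.invalidates = {}        # table name -> list of timestamps

    def record_fetch(self, seconds):
        self.fetch_times.append(seconds)

    def fetch_stats(self):
        """Return (min, average, max) fetch time."""
        t = self.fetch_times
        return (min(t), sum(t) / len(t), max(t))

    def record_invalidate(self, table, now=None):
        self.invalidates.setdefault(table, []).append(
            now if now is not None else time.time())

    def invalidates_per_hour(self, table, now=None):
        """Rate metric: invalidations of this table in the last hour."""
        now = now if now is not None else time.time()
        stamps = self.invalidates.get(table, [])
        return len([s for s in stamps if now - s <= 3600])
```

A table whose `invalidates_per_hour` is persistently high is exactly the kind of candidate for the per-table opt-out flag proposed in IMPALA-7976.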
[jira] [Created] (IMPALA-7974) Impala Doc: Support automatic invalidates using metastore notification events
Alex Rodoni created IMPALA-7974: --- Summary: Impala Doc: Support automatic invalidates using metastore notification events Key: IMPALA-7974 URL: https://issues.apache.org/jira/browse/IMPALA-7974 Project: IMPALA Issue Type: Sub-task Components: Docs Reporter: Alex Rodoni Assignee: Alex Rodoni -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-7975) Improve supportability of the automatic invalidate feature
Vihang Karajgaonkar created IMPALA-7975: --- Summary: Improve supportability of the automatic invalidate feature Key: IMPALA-7975 URL: https://issues.apache.org/jira/browse/IMPALA-7975 Project: IMPALA Issue Type: Sub-task Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar Some of the things which can be done to improve supportability of this feature: * Add metrics to detect issues pertaining to this feature # Time taken to fetch the notifications (Would be useful to have average, min, max) # Time taken to process a batch of events received # Number of times particular table was invalidated (would be useful to have some rate metric like number_of_invalidates/per_hour) * Ability to turn ON/OFF for this feature (possibly without the need of a restart) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-7972) Detect self-events to avoid unnecessary invalidates
Vihang Karajgaonkar created IMPALA-7972: --- Summary: Detect self-events to avoid unnecessary invalidates Key: IMPALA-7972 URL: https://issues.apache.org/jira/browse/IMPALA-7972 Project: IMPALA Issue Type: Sub-task Reporter: Vihang Karajgaonkar When metastore objects are created, altered, or dropped, they generate an event which will be polled by the same Catalog server. Such self-events should be detected and we should avoid invalidating the tables when they are received. See the design doc attached to the main JIRA for more details. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
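One common way to detect self-events — assumed here for illustration; the attached design doc describes the actual scheme — is to tag each operation the catalog performs with a unique marker and skip notification-log events carrying a marker we issued ourselves:

```python
import uuid

class SelfEventFilter:
    """Sketch: remember markers for our own writes, consume them on match."""

    def __init__(self):
        self.pending = set()

    def mark_own_operation(self):
        marker = uuid.uuid4().hex
        self.pending.add(marker)
        return marker        # would be stored in the object's parameters

    def is_self_event(self, event):
        marker = event.get("marker")
        if marker in self.pending:
            self.pending.discard(marker)   # each marker matches one event
            return True
        return False
```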
[jira] [Created] (IMPALA-7971) Add support for insert operations from Impala
Vihang Karajgaonkar created IMPALA-7971: --- Summary: Add support for insert operations from Impala Key: IMPALA-7971 URL: https://issues.apache.org/jira/browse/IMPALA-7971 Project: IMPALA Issue Type: Sub-task Reporter: Vihang Karajgaonkar When data is inserted into existing tables and partitions, Catalog does not issue any metastore API calls. Metastore provides an API called {{fire_listener_event}} which can be used to add an {{INSERT_EVENT}} to the metastore notification log. This event can be used by other Impala instances to invalidate or update the file metadata information when data is inserted or overwritten on a given table or partition. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-7970) Add support for automatic invalidates by polling metastore events
Vihang Karajgaonkar created IMPALA-7970: --- Summary: Add support for automatic invalidates by polling metastore events Key: IMPALA-7970 URL: https://issues.apache.org/jira/browse/IMPALA-7970 Project: IMPALA Issue Type: Sub-task Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar This JIRA will add the infrastructure pieces needed to poll the metastore for events at a configurable interval and issue {{invalidate}} on the table objects. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
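The polling infrastructure described above can be sketched as a loop that fetches events after a remembered last-event id at a configurable interval (illustrative Python; `EventPoller` and its callbacks are assumptions, not the catalog server's actual classes):

```python
import threading

class EventPoller:
    """Sketch of a metastore-notification poller."""

    def __init__(self, fetch_events, apply_event, interval_s=5.0):
        self.fetch_events = fetch_events    # callable: last_id -> [events]
        self.apply_event = apply_event      # e.g. issue invalidate on a table
        self.interval_s = interval_s
        self.last_event_id = 0
        self._stop = threading.Event()

    def poll_once(self):
        # Fetch only events newer than the last one we processed.
        for event in self.fetch_events(self.last_event_id):
            self.apply_event(event)
            self.last_event_id = event["id"]

    def run(self):
        # wait() doubles as the configurable sleep and the stop signal.
        while not self._stop.wait(self.interval_s):
            self.poll_once()

    def stop(self):
        self._stop.set()
```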
[jira] [Commented] (IMPALA-4268) Rework coordinator buffering to buffer more data
[ https://issues.apache.org/jira/browse/IMPALA-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720508#comment-16720508 ] Tim Armstrong commented on IMPALA-4268: --- Another option to consider is whether the above design could be simplified by doing the buffering in a more convenient intermediate format that is more of a straightforward buffer implementation, e.g. a ring of buffers containing rows serialised in Impala's internal format (we have utilities to do this serialisation). The extra copy does have some inherent cost, but could be made fast and might simplify the code. We also don't need to optimise return of large result sets via the coordinator to the nth degree, since the bottleneck is usually going to be on the client side or the network, rather than in the Impala C++ code. > Rework coordinator buffering to buffer more data > > > Key: IMPALA-4268 > URL: https://issues.apache.org/jira/browse/IMPALA-4268 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.8.0 >Reporter: Henry Robinson >Assignee: Pooja Nilangekar >Priority: Major > Labels: query-lifecycle, resource-management > Attachments: rows-produced-histogram.png > > > {{PlanRootSink}} executes the producer thread (the coordinator fragment > execution thread) in a separate thread to the consumer (i.e. the thread > handling the fetch RPC), which calls {{GetNext()}} to retrieve the rows. The > implementation was simplified by handing off a single batch at a time from > the producers to consumer. > This decision causes some problems: > * Many context switches for the sender. Adding buffering would allow the > sender to append to the buffer and continue progress without a context switch. > * Query execution can't release resources until the client has fetched the > final batch, because the coordinator fragment thread is still running and > potentially producing backpressure all the way down the plan tree. 
> * The consumer can't fulfil fetch requests greater than Impala's internal > BATCH_SIZE, because it is only given one batch at a time. > The tricky part is managing the mismatch between the size of the row batches > processed in {{Send()}} and the size of the fetch result asked for by the > client without impacting performance too badly. The sender materializes > output rows in a {{QueryResultSet}} that is owned by the coordinator. That is > not, currently, a splittable object - instead it contains the actual RPC > response struct that will hit the wire when the RPC completes. An > asynchronous sender does not know the batch size, because it can in theory > change on every fetch call (although most reasonable clients will not > randomly change the fetch size). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
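The ring-of-buffers idea in the comment above can be sketched as follows. This is a hypothetical illustration in Python rather than Impala's C++ backend: the class and method names are invented, and the "serialised batches" are plain byte strings standing in for rows serialised in Impala's internal format.

```python
import threading
from collections import deque

class SerializedBatchRing:
    """Bounded ring of serialised row-batch buffers (hypothetical sketch).

    The producer appends serialised batches and blocks once the total
    buffered bytes would exceed the cap; the consumer pops batches FIFO.
    """

    def __init__(self, max_bytes):
        self._max_bytes = max_bytes
        self._buffers = deque()
        self._bytes = 0
        self._cv = threading.Condition()

    def put(self, batch):
        with self._cv:
            # Block the producer while over budget; always admit a batch
            # into an empty ring so an oversized batch cannot deadlock.
            while self._bytes + len(batch) > self._max_bytes and self._buffers:
                self._cv.wait()
            self._buffers.append(batch)
            self._bytes += len(batch)
            self._cv.notify_all()

    def get(self):
        with self._cv:
            while not self._buffers:
                self._cv.wait()
            batch = self._buffers.popleft()
            self._bytes -= len(batch)
            self._cv.notify_all()
            return batch
```

The key property is that the producer blocks on a byte budget rather than on each individual fetch, which is what removes the per-batch context switch; the extra serialisation copy is the cost the comment acknowledges.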
[jira] [Updated] (IMPALA-4268) Rework coordinator buffering to buffer more data
[ https://issues.apache.org/jira/browse/IMPALA-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-4268: -- Description: {{PlanRootSink}} executes the producer thread (the coordinator fragment execution thread) in a separate thread to the consumer (i.e. the thread handling the fetch RPC), which calls {{GetNext()}} to retrieve the rows. The implementation was simplified by handing off a single batch at a time from the producer to the consumer. This decision causes some problems: * Many context switches for the sender. Adding buffering would allow the sender to append to the buffer and continue progress without a context switch. * Query execution can't release resources until the client has fetched the final batch, because the coordinator fragment thread is still running and potentially producing backpressure all the way down the plan tree. * The consumer can't fulfil fetch requests greater than Impala's internal BATCH_SIZE, because it is only given one batch at a time. The tricky part is managing the mismatch between the size of the row batches processed in {{Send()}} and the size of the fetch result asked for by the client without impacting performance too badly. The sender materializes output rows in a {{QueryResultSet}} that is owned by the coordinator. That is not, currently, a splittable object - instead it contains the actual RPC response struct that will hit the wire when the RPC completes. An asynchronous sender does not know the batch size, because it can in theory change on every fetch call (although most reasonable clients will not randomly change the fetch size). was: In IMPALA-2905, we are introducing a {{PlanRootSink}} that handles the production of output rows at the root of a plan. The implementation in IMPALA-2905 has the plan execute in a separate thread to the consumer, which calls {{GetNext()}} to retrieve the rows.
However, the sender thread will block until {{GetNext()}} is called, so that there are no complications about memory usage and ownership due to having several batches in flight at one time. However, this also leads to many context switches, as each {{GetNext()}} call yields to the sender to produce the rows. If the sender were to fill a buffer asynchronously, the consumer could pull out of that buffer without taking a context switch in many cases (and the extra buffering might smooth out any performance spikes due to client delays, which currently directly affect plan execution). The tricky part is managing the mismatch between the size of the row batches processed in {{Send()}} and the size of the fetch result asked for by the client. The sender materializes output rows in a {{QueryResultSet}} that is owned by the coordinator. That is not, currently, a splittable object - instead it contains the actual RPC response struct that will hit the wire when the RPC completes. An asynchronous sender cannot know the batch size, which may change on every fetch call, so the {{GetNext()}} implementation would need to be able to split out the {{QueryResultSet}} to match the correct fetch size, and handle stitching together other {{QueryResultSets}} - without doing extra copies. > Rework coordinator buffering to buffer more data > > > Key: IMPALA-4268 > URL: https://issues.apache.org/jira/browse/IMPALA-4268 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.8.0 >Reporter: Henry Robinson >Assignee: Pooja Nilangekar >Priority: Major > Labels: query-lifecycle, resource-management > Attachments: rows-produced-histogram.png > > > {{PlanRootSink}} executes the producer thread (the coordinator fragment > execution thread) in a separate thread to the consumer (i.e. the thread > handling the fetch RPC), which calls {{GetNext()}} to retrieve the rows.
The > implementation was simplified by handing off a single batch at a time from > the producers to consumer. > This decision causes some problems: > * Many context switches for the sender. Adding buffering would allow the > sender to append to the buffer and continue progress without a context switch. > * Query execution can't release resources until the client has fetched the > final batch, because the coordinator fragment thread is still running and > potentially producing backpressure all the way down the plan tree. > * The consumer can't fulfil fetch requests greater than Impala's internal > BATCH_SIZE, because it is only given one batch at a time. > The tricky part is managing the mismatch between the size of the row batches > processed in {{Send()}} and the size of the fetch result asked for by the > client without impacting performance too badly. The sender materializes > output rows in a
[jira] [Assigned] (IMPALA-4268) Rework coordinator buffering to buffer more data
[ https://issues.apache.org/jira/browse/IMPALA-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-4268: - Assignee: Pooja Nilangekar (was: Bikramjeet Vig) > Rework coordinator buffering to buffer more data > > > Key: IMPALA-4268 > URL: https://issues.apache.org/jira/browse/IMPALA-4268 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.8.0 >Reporter: Henry Robinson >Assignee: Pooja Nilangekar >Priority: Major > Labels: query-lifecycle, resource-management > Attachments: rows-produced-histogram.png > > > In IMPALA-2905, we are introducing a {{PlanRootSink}} that handles the > production of output rows at the root of a plan. > The implementation in IMPALA-2905 has the plan execute in a separate thread > to the consumer, which calls {{GetNext()}} to retrieve the rows. However, the > sender thread will block until {{GetNext()}} is called, so that there are no > complications about memory usage and ownership due to having several batches > in flight at one time. > However, this also leads to many context switches, as each {{GetNext()}} call > yields to the sender to produce the rows. If the sender was to fill a buffer > asynchronously, the consumer could pull out of that buffer without taking a > context switch in many cases (and the extra buffering might smooth out any > performance spikes due to client delays, which currently directly affect plan > execution). > The tricky part is managing the mismatch between the size of the row batches > processed in {{Send()}} and the size of the fetch result asked for by the > client. The sender materializes output rows in a {{QueryResultSet}} that is > owned by the coordinator. That is not, currently, a splittable object - > instead it contains the actual RPC response struct that will hit the wire > when the RPC completes. As asynchronous sender cannot know the batch size, > which may change on every fetch call. 
So the {{GetNext()}} implementation > would need to be able to split out the {{QueryResultSet}} to match the > correct fetch size, and handle stitching together other {{QueryResultSets}} - > without doing extra copies. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
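The splitting/stitching requirement on {{GetNext()}} described above amounts to simple row arithmetic. A minimal sketch, assuming each buffered result set is just a Python list of rows; the real {{QueryResultSet}} holds the serialised RPC response struct, which is exactly what makes splitting it without extra copies hard in practice:

```python
from collections import deque

def fetch_rows(buffered, fetch_size):
    """Return up to fetch_size rows by splitting/stitching buffered
    result sets (each a list of rows).  `buffered` is a deque of row
    lists and is mutated in place.  Illustrative sketch only, not
    Impala's QueryResultSet API.
    """
    out = []
    while buffered and len(out) < fetch_size:
        head = buffered[0]
        need = fetch_size - len(out)
        if len(head) <= need:
            out.extend(buffered.popleft())   # stitch: take the whole set
        else:
            out.extend(head[:need])          # split: take a prefix ...
            buffered[0] = head[need:]        # ... leave the rest queued
    return out
```

In the list-of-rows model the split is a slice; on the real RPC response struct each of these operations implies a copy, which is the cost the description is trying to avoid.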
[jira] [Resolved] (IMPALA-4889) Use sidecars for Thrift-wrapped RPC payloads
[ https://issues.apache.org/jira/browse/IMPALA-4889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Ho resolved IMPALA-4889. Resolution: Fixed Fix Version/s: (was: Not Applicable) Impala 3.2.0 This was done as part of IMPALA-7213 > Use sidecars for Thrift-wrapped RPC payloads > > > Key: IMPALA-4889 > URL: https://issues.apache.org/jira/browse/IMPALA-4889 > Project: IMPALA > Issue Type: Sub-task > Components: Distributed Exec >Affects Versions: Impala 2.9.0 >Reporter: Henry Robinson >Priority: Major > Fix For: Impala 3.2.0 > > > When Kudu supports client-side payloads > (https://issues.apache.org/jira/browse/KUDU-1866), we should use that for all > Thrift-wrapped RPCs to save one copy / serialization step. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-4268) Rework coordinator buffering to buffer more data
[ https://issues.apache.org/jira/browse/IMPALA-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720496#comment-16720496 ] Tim Armstrong commented on IMPALA-4268: --- I added a few related issues as subtasks since they're likely to either be fixed as part of the solution above or are dependent on it. > Rework coordinator buffering to buffer more data > > > Key: IMPALA-4268 > URL: https://issues.apache.org/jira/browse/IMPALA-4268 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.8.0 >Reporter: Henry Robinson >Assignee: Pooja Nilangekar >Priority: Major > Labels: query-lifecycle, resource-management > Attachments: rows-produced-histogram.png > > > In IMPALA-2905, we are introducing a {{PlanRootSink}} that handles the > production of output rows at the root of a plan. > The implementation in IMPALA-2905 has the plan execute in a separate thread > to the consumer, which calls {{GetNext()}} to retrieve the rows. However, the > sender thread will block until {{GetNext()}} is called, so that there are no > complications about memory usage and ownership due to having several batches > in flight at one time. > However, this also leads to many context switches, as each {{GetNext()}} call > yields to the sender to produce the rows. If the sender was to fill a buffer > asynchronously, the consumer could pull out of that buffer without taking a > context switch in many cases (and the extra buffering might smooth out any > performance spikes due to client delays, which currently directly affect plan > execution). > The tricky part is managing the mismatch between the size of the row batches > processed in {{Send()}} and the size of the fetch result asked for by the > client. The sender materializes output rows in a {{QueryResultSet}} that is > owned by the coordinator. 
That is not, currently, a splittable object - > instead it contains the actual RPC response struct that will hit the wire > when the RPC completes. An asynchronous sender cannot know the batch size, > which may change on every fetch call, so the {{GetNext()}} implementation > would need to be able to split out the {{QueryResultSet}} to match the > correct fetch size, and handle stitching together other {{QueryResultSets}} - > without doing extra copies. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-7312) Non-blocking mode for Fetch() RPC
[ https://issues.apache.org/jira/browse/IMPALA-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-7312: - Assignee: Pooja Nilangekar > Non-blocking mode for Fetch() RPC > - > > Key: IMPALA-7312 > URL: https://issues.apache.org/jira/browse/IMPALA-7312 > Project: IMPALA > Issue Type: Sub-task > Components: Clients >Reporter: Tim Armstrong >Assignee: Pooja Nilangekar >Priority: Major > Labels: resource-management > > Currently Fetch() can block for an arbitrary amount of time until a batch of > rows is produced. It might be helpful to have a mode where it returns quickly > when there is no data available, so that threads and RPC slots are not tied > up. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-4268) Rework coordinator buffering to buffer more data
[ https://issues.apache.org/jira/browse/IMPALA-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-4268: -- Summary: Rework coordinator buffering to buffer more data (was: buffer more than a batch of rows at coordinator) > Rework coordinator buffering to buffer more data > > > Key: IMPALA-4268 > URL: https://issues.apache.org/jira/browse/IMPALA-4268 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.8.0 >Reporter: Henry Robinson >Assignee: Bikramjeet Vig >Priority: Major > Labels: query-lifecycle, resource-management > Attachments: rows-produced-histogram.png > > > In IMPALA-2905, we are introducing a {{PlanRootSink}} that handles the > production of output rows at the root of a plan. > The implementation in IMPALA-2905 has the plan execute in a separate thread > to the consumer, which calls {{GetNext()}} to retrieve the rows. However, the > sender thread will block until {{GetNext()}} is called, so that there are no > complications about memory usage and ownership due to having several batches > in flight at one time. > However, this also leads to many context switches, as each {{GetNext()}} call > yields to the sender to produce the rows. If the sender was to fill a buffer > asynchronously, the consumer could pull out of that buffer without taking a > context switch in many cases (and the extra buffering might smooth out any > performance spikes due to client delays, which currently directly affect plan > execution). > The tricky part is managing the mismatch between the size of the row batches > processed in {{Send()}} and the size of the fetch result asked for by the > client. The sender materializes output rows in a {{QueryResultSet}} that is > owned by the coordinator. That is not, currently, a splittable object - > instead it contains the actual RPC response struct that will hit the wire > when the RPC completes. 
An asynchronous sender cannot know the batch size, > which may change on every fetch call, so the {{GetNext()}} implementation > would need to be able to split out the {{QueryResultSet}} to match the > correct fetch size, and handle stitching together other {{QueryResultSets}} - > without doing extra copies. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-1618) Impala server should always try to fulfill requested fetch size
[ https://issues.apache.org/jira/browse/IMPALA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-1618: - Assignee: Pooja Nilangekar > Impala server should always try to fulfill requested fetch size > --- > > Key: IMPALA-1618 > URL: https://issues.apache.org/jira/browse/IMPALA-1618 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Affects Versions: Impala 2.0.1 >Reporter: casey >Assignee: Pooja Nilangekar >Priority: Minor > Labels: usability > > The thrift fetch request specifies the number of rows that it would like but > the Impala server may return fewer even though more results are available. > For example, using the default row_batch size of 1024, if the client requests > 1023 rows, the first response contains 1023 rows but the second response > contains only 1 row. This is because the server internally uses row_batch > (1024), returns the requested count (1023) and caches the remaining row, then > the next time around only uses the cache. > In general the end user should set both the row batch size and the thrift > request size. In practice the query writer setting row_batch and the > driver/programmer setting fetch size may often be different people. > There is one case that works fine now though - setting the batch size to less > than the thrift req size. In this case the thrift response is always the same > as batch size. 
> Code example: > {noformat} > dev@localhost:~/impyla$ git diff > diff --git a/impala/_rpc/hiveserver2.py b/impala/_rpc/hiveserver2.py > index 6139002..31fdab7 100644 > --- a/impala/_rpc/hiveserver2.py > +++ b/impala/_rpc/hiveserver2.py > @@ -265,6 +265,7 @@ def fetch_results(service, operation_handle, > hs2_protocol_version, schema=None, > req = TFetchResultsReq(operationHandle=operation_handle, > orientation=orientation, > maxRows=max_rows) > +print("req: " + str(max_rows)) > resp = service.FetchResults(req) > err_if_rpc_not_ok(resp) > > @@ -273,6 +274,7 @@ def fetch_results(service, operation_handle, > hs2_protocol_version, schema=None, > for (i, col) in enumerate(resp.results.columns)] > num_cols = len(tcols) > num_rows = len(tcols[0].values) > +print("rec: " + str(num_rows)) > rows = [] > for i in xrange(num_rows): > row = [] > dev@localhost:~/impyla$ cat test.py > from impala.dbapi import connect > conn = connect() > cur = conn.cursor() > cur.set_arraysize(1024) > cur.execute("set batch_size=1025") > cur.execute("select * from tpch.lineitem") > while True: > rows = cur.fetchmany() > if not rows: > break > cur.close() > conn.close() > dev@localhost:~/impyla$ python test.py | head > Failed to import pandas > req: 1024 > rec: 1024 > req: 1024 > rec: 1 > req: 1024 > rec: 1024 > req: 1024 > rec: 1 > req: 1024 > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
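The 1023/1 alternation shown in the impyla output above can be reproduced with a toy model of the server-side fetch path. All names here are hypothetical; the model only captures the behaviour the report describes: materialise a full row_batch internally, serve the requested count, cache the remainder, and serve only from the cache on the next fetch.

```python
class RowCachingServer:
    """Toy model of the fetch path described in IMPALA-1618 (not the
    real Impala server code)."""

    def __init__(self, total_rows, row_batch=1024):
        self.remaining = total_rows
        self.row_batch = row_batch
        self.cache = 0  # rows left over from the last materialised batch

    def fetch(self, max_rows):
        if self.cache:
            # Serve only from the cache, as the report observes.
            n = min(self.cache, max_rows)
            self.cache -= n
            return n
        produced = min(self.row_batch, self.remaining)
        self.remaining -= produced
        n = min(produced, max_rows)
        self.cache = produced - n
        return n

server = RowCachingServer(total_rows=4096)
sizes = [server.fetch(1023) for _ in range(4)]  # -> [1023, 1, 1023, 1]
```

A fix in the spirit of the issue title would keep producing batches inside `fetch()` until `max_rows` rows are accumulated or end-of-stream is reached, instead of returning as soon as the cache or a single batch is exhausted.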
[jira] [Updated] (IMPALA-7312) Non-blocking mode for Fetch() RPC
[ https://issues.apache.org/jira/browse/IMPALA-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-7312: -- Issue Type: Sub-task (was: Improvement) Parent: IMPALA-4268 > Non-blocking mode for Fetch() RPC > - > > Key: IMPALA-7312 > URL: https://issues.apache.org/jira/browse/IMPALA-7312 > Project: IMPALA > Issue Type: Sub-task > Components: Clients >Reporter: Tim Armstrong >Priority: Major > Labels: resource-management > > Currently Fetch() can block for an arbitrary amount of time until a batch of > rows is produced. It might be helpful to have a mode where it returns quickly > when there is no data available, so that threads and RPC slots are not tied > up. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-558) HS2::FetchResults sets hasMoreRows in many cases where no more rows are to be returned
[ https://issues.apache.org/jira/browse/IMPALA-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-558: Assignee: Pooja Nilangekar > HS2::FetchResults sets hasMoreRows in many cases where no more rows are to be > returned > -- > > Key: IMPALA-558 > URL: https://issues.apache.org/jira/browse/IMPALA-558 > Project: IMPALA > Issue Type: Sub-task > Components: Clients >Affects Versions: Impala 1.1 >Reporter: Henry Robinson >Assignee: Pooja Nilangekar >Priority: Minor > Labels: query-lifecycle > > The first call to {{FetchResults}} always sets {{hasMoreRows}} even when 0 > rows should be returned. The next call correctly sets {{hasMoreRows == > False}}. The upshot is there's always an extra round-trip, although > correctness isn't affected. > {code} > execute_statement_req = TCLIService.TExecuteStatementReq() > execute_statement_req.sessionHandle = resp.sessionHandle > execute_statement_req.statement = "SELECT COUNT(*) FROM > functional.alltypes WHERE 1 = 2" > execute_statement_resp = > self.hs2_client.ExecuteStatement(execute_statement_req) > > fetch_results_req = TCLIService.TFetchResultsReq() > fetch_results_req.operationHandle = execute_statement_resp.operationHandle > fetch_results_req.maxRows = 100 > fetch_results_resp = self.hs2_client.FetchResults(fetch_results_req) > > assert not fetch_results_resp.hasMoreRows # Fails > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
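The expected behaviour in the report above is that {{hasMoreRows}} is set only when rows actually remain after the current fetch. A minimal sketch of that contract, using a plain list as the server's result cache (illustrative only, not the real HS2 server code):

```python
def fetch_response(result_cache, max_rows, eos):
    """Return (rows, has_more_rows) for one FetchResults call.

    has_more_rows is true only if rows remain in the cache after this
    fetch, or the query has not yet reached end-of-stream (eos).
    Hypothetical sketch of the correct contract from IMPALA-558.
    """
    rows = result_cache[:max_rows]
    del result_cache[:max_rows]
    has_more_rows = bool(result_cache) or not eos
    return rows, has_more_rows
```

Under this contract the {{SELECT COUNT(*) ... WHERE 1 = 2}} example would return its single count row with {{hasMoreRows == False}} on the first call, avoiding the extra round-trip.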
[jira] [Closed] (IMPALA-7728) Impala Doc: Add the Changing Privileges section in Impala Sentry doc
[ https://issues.apache.org/jira/browse/IMPALA-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni closed IMPALA-7728. --- Resolution: Fixed Fix Version/s: (was: Impala 3.1.0) Impala 3.2.0 > Impala Doc: Add the Changing Privileges section in Impala Sentry doc > > > Key: IMPALA-7728 > URL: https://issues.apache.org/jira/browse/IMPALA-7728 > Project: IMPALA > Issue Type: Task > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Fix For: Impala 3.2.0 > > > https://gerrit.cloudera.org/#/c/12071/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IMPALA-4268) buffer more than a batch of rows at coordinator
[ https://issues.apache.org/jira/browse/IMPALA-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720490#comment-16720490 ] Tim Armstrong commented on IMPALA-4268: --- Here's a rough approach that I think would work reasonably well in practice. The goal is to avoid multiple copies of the results in cases where it is likely to be a performance bottleneck, i.e. when the result set is large and when the client is rapidly fetching the results. We should assume that the common case is clients repeatedly fetching the same number of rows. * The producer, at every point in time, has a target batch size which is a guess about what the next Fetch() size is likely to be. ** Initially this would be some hardcoded value, based on common client behaviour ** It would be updated to the last fetch request size every time the client fetches. * The ClientRequestState or PlanRootSink (tbd) has some number of buffered QueryResultSet objects. We want this to be bounded to a threshold in bytes. * The producer tries to fill the last queued QueryResultSet object up to the target fetch size, moving onto a new one once it is full. * The consumer tries to fulfil the requested fetch size by either returning a queued QueryResultSet directly (if it is close enough to the right size) or by copying rows from the queued QueryResultsSets into its own output QueryResultSet. There are some subtle issues to think about: ** Should the consumer block waiting for more rows if its target number of rows are not yet available? Returning too eagerly results in many tiny batches being returned, but blocking indefinitely delays return of results to the client. One option is to, initially, block until either there are enough rows to fill the target request size, the producer has hit eos, or until some timeout has elapsed. ** We need to have some way to get the batches aligned if they get misaligned earlier in the process. 
Maybe a heuristic is, when reaching a batch boundary, only continue consuming that batch if it will fit entirely in the output. One thing to keep in mind is that, ideally, the client result cache and the buffer should be the same thing. The difference is that the client result cache retains all results and discards the cache if it overflows, whereas the PlanRootSink buffering is a bounded queue where the producer blocks when it gets full. We could combine those behaviours if we have a single buffer with a read cursor that is the next row to be returned. We would only advance the read cursor when returning rows instead of discarding the read rows. Then, when the buffer is full, the producer first discards the already-read cached rows to free up space, then blocks if the buffer is still full. We should look at staging the work - maybe we don't need to merge them as part of the first bit of work. > buffer more than a batch of rows at coordinator > --- > > Key: IMPALA-4268 > URL: https://issues.apache.org/jira/browse/IMPALA-4268 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.8.0 >Reporter: Henry Robinson >Assignee: Bikramjeet Vig >Priority: Major > Labels: query-lifecycle, resource-management > Attachments: rows-produced-histogram.png > > > In IMPALA-2905, we are introducing a {{PlanRootSink}} that handles the > production of output rows at the root of a plan. > The implementation in IMPALA-2905 has the plan execute in a separate thread > to the consumer, which calls {{GetNext()}} to retrieve the rows. However, the > sender thread will block until {{GetNext()}} is called, so that there are no > complications about memory usage and ownership due to having several batches > in flight at one time. > However, this also leads to many context switches, as each {{GetNext()}} call > yields to the sender to produce the rows. 
If the sender were to fill a buffer > asynchronously, the consumer could pull out of that buffer without taking a > context switch in many cases (and the extra buffering might smooth out any > performance spikes due to client delays, which currently directly affect plan > execution). > The tricky part is managing the mismatch between the size of the row batches > processed in {{Send()}} and the size of the fetch result asked for by the > client. The sender materializes output rows in a {{QueryResultSet}} that is > owned by the coordinator. That is not, currently, a splittable object - > instead it contains the actual RPC response struct that will hit the wire > when the RPC completes. An asynchronous sender cannot know the batch size, > which may change on every fetch call, so the {{GetNext()}} implementation > would need to be able to split out the {{QueryResultSet}} to match the > correct fetch
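The scheme proposed in the comment above - a target batch size that tracks the client's last fetch, a copy-free hand-off when a queued result set matches the request, and a copying slow path otherwise - can be sketched as follows. Class and method names are invented for illustration; the real work would live in PlanRootSink/ClientRequestState.

```python
class AdaptiveResultBuffer:
    """Sketch of the adaptive-target buffering proposed in IMPALA-4268
    (hypothetical, not Impala's actual classes)."""

    DEFAULT_TARGET = 1024  # initial guess, based on common client behaviour

    def __init__(self):
        self.target = self.DEFAULT_TARGET  # guessed next fetch size
        self.queue = []                    # queued result sets (row lists)

    def producer_add(self, rows):
        # Fill the last queued set up to the target, then start a new one.
        for row in rows:
            if not self.queue or len(self.queue[-1]) >= self.target:
                self.queue.append([])
            self.queue[-1].append(row)

    def consumer_fetch(self, fetch_size):
        self.target = fetch_size  # refine the guess for future sets
        if self.queue and len(self.queue[0]) == fetch_size:
            return self.queue.pop(0)  # fast path: hand off without copying
        out = []                      # slow path: copy rows across sets
        while self.queue and len(out) < fetch_size:
            head = self.queue[0]
            need = fetch_size - len(out)
            out.extend(head[:need])
            if need >= len(head):
                self.queue.pop(0)
            else:
                self.queue[0] = head[need:]
        return out
```

Once the client's fetch size stabilises, newly produced sets line up with requests and the fast path dominates, which is the heuristic's goal.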
[jira] [Resolved] (IMPALA-7926) test_reconnect failing
[ https://issues.apache.org/jira/browse/IMPALA-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Tauber-Marshall resolved IMPALA-7926. Resolution: Fixed Fix Version/s: Impala 3.2.0 > test_reconnect failing > -- > > Key: IMPALA-7926 > URL: https://issues.apache.org/jira/browse/IMPALA-7926 > Project: IMPALA > Issue Type: Bug >Reporter: Thomas Tauber-Marshall >Assignee: Thomas Tauber-Marshall >Priority: Major > Labels: broken-build > Fix For: Impala 3.2.0 > > > {noformat} > 00:52:58 __ TestImpalaShellInteractive.test_reconnect > ___ > 00:52:58 shell/test_shell_interactive.py:191: in test_reconnect > 00:52:58 assert get_num_open_sessions(initial_impala_service) == > num_sessions_initial, \ > 00:52:58 E AssertionError: Connection to localhost.localdomain:21000 should > have been closed > 00:52:58 E assert 0 == 1 > 00:52:58 E+ where 0 = 0xccdf848>() > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-7969) Always admit trivial queries immediately
Tim Armstrong created IMPALA-7969: - Summary: Always admit trivial queries immediately Key: IMPALA-7969 URL: https://issues.apache.org/jira/browse/IMPALA-7969 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Tim Armstrong Assignee: Bikramjeet Vig Here are two common query types that consume minimal resources: * {{select ... from ... limit 0}}, which is used by some clients to determine column types * {{select <expr>, <expr>, ...}}, which just evaluates some constant expressions on the coordinator Currently these queries get queued if there are existing queued queries or the limit on the number of queries is exceeded, which is inconvenient for use cases where latency is important. I think the planner should identify trivial queries and the admission controller should admit them immediately. Here's an initial thought on the definition of a trivial query: * Must have PLAN ROOT SINK as the root * Can contain UNION and EMPTYSET nodes only -- This message was sent by Atlassian JIRA (v7.6.3#76005)
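The proposed trivial-query definition could be checked with a simple plan-tree walk. The tuple-based plan representation here is purely illustrative (Impala's planner is Java); the node names follow the issue text:

```python
def is_trivial_query(node):
    """Sketch of the trivial-query test from IMPALA-7969: the root must
    be a PLAN ROOT SINK and the tree below it may contain only UNION and
    EMPTYSET nodes.  A node is a (kind, children) tuple (hypothetical
    representation)."""
    kind, children = node
    if kind != "PLAN ROOT SINK":
        return False

    def only_union_emptyset(n):
        k, ch = n
        return k in ("UNION", "EMPTYSET") and all(
            only_union_emptyset(c) for c in ch)

    return all(only_union_emptyset(c) for c in children)
```

A {{limit 0}} query would plan to an EMPTYSET node and a constant-expression select to a UNION over constants, so both examples in the issue pass this test while anything touching table data fails it.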
[jira] [Commented] (IMPALA-7926) test_reconnect failing
[ https://issues.apache.org/jira/browse/IMPALA-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720166#comment-16720166 ] ASF subversion and git services commented on IMPALA-7926: - Commit 1cbcd0c37d47b1b22780a788e59f0f3e67d2a626 in impala's branch refs/heads/master from [~twmarshall] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=1cbcd0c ] IMPALA-7926: Fix flakiness in test_reconnect test_reconnect launches a shell that connects to one impalad in the minicluster then reconnects to a different impalad while checking that the impalad's open session metric changes accordingly. To do this, the test gets the number of open sessions at the start of the test and then expects that the number of sessions will have increased by 1 on the impalad that the shell is currently connected to. This can be a problem if there is a session left over from another test that is still active when test_reconnect starts but exits while it's running. test_reconnect is already marked to run serially, so there shouldn't be any other sessions open while it runs anyways. The solution is to wait at the start of the test until any sessions left over from other tests have exited. Testing: - Ran the test in an environment where the timing was previously causing it to fail almost deterministically and it now passes. 
Change-Id: I3017ca3bf7b4e33440cffb80e9a48a63bec14434
Reviewed-on: http://gerrit.cloudera.org:8080/12045
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> test_reconnect failing
> --
> Key: IMPALA-7926
> URL: https://issues.apache.org/jira/browse/IMPALA-7926
> Project: IMPALA
> Issue Type: Bug
> Reporter: Thomas Tauber-Marshall
> Assignee: Thomas Tauber-Marshall
> Priority: Major
> Labels: broken-build
>
> {noformat}
> 00:52:58 __ TestImpalaShellInteractive.test_reconnect ___
> 00:52:58 shell/test_shell_interactive.py:191: in test_reconnect
> 00:52:58 assert get_num_open_sessions(initial_impala_service) == num_sessions_initial, \
> 00:52:58 E AssertionError: Connection to localhost.localdomain:21000 should have been closed
> 00:52:58 E assert 0 == 1
> 00:52:58 E + where 0 = 0xccdf848>()
> {noformat}
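The fix's "wait at the start until leftover sessions have exited" step is a metric poll with a deadline. A generic sketch of that pattern, with names and signature that are illustrative assumptions rather than the actual Impala test-framework API:

```python
# Hypothetical sketch: poll a metric getter until it reaches an expected
# value, raising on timeout so the test cannot silently proceed with a
# stale session count. Not the actual Impala test utility.
import time

def wait_for_metric(get_value, expected, timeout_s=30.0, interval_s=0.5):
    """Poll get_value() until it equals `expected` or the deadline passes.
    Raises AssertionError on timeout instead of returning silently."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        value = get_value()
        if value == expected:
            return value
        time.sleep(interval_s)
    raise AssertionError(
        "metric did not reach %r within %ss (last value: %r)"
        % (expected, timeout_s, value))
```

Raising on timeout (rather than returning a flag) means a leftover session turns into a clear setup failure instead of a confusing assertion later in the test body.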
[jira] [Commented] (IMPALA-7351) Add memory estimates for plan nodes and sinks with missing estimates
[ https://issues.apache.org/jira/browse/IMPALA-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720167#comment-16720167 ] ASF subversion and git services commented on IMPALA-7351:
-
Commit c209fed5350666c0b62004670714bdb404c39e06 in impala's branch refs/heads/master from [~bikram.sngh91] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=c209fed ]

IMPALA-7351: Add estimates to kudu table sink

The Kudu table sink allocates untracked memory that is bounded by limits Impala enforces through the Kudu client API. This patch adds a constant estimate for the table sink based on those limits.

Testing: Modified planner tests accordingly.

Change-Id: I89a45dce0cfbbe3cc0bc17d55ffdbd41cd7dbfbd
Reviewed-on: http://gerrit.cloudera.org:8080/12077
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> Add memory estimates for plan nodes and sinks with missing estimates
> --
> Key: IMPALA-7351
> URL: https://issues.apache.org/jira/browse/IMPALA-7351
> Project: IMPALA
> Issue Type: Sub-task
> Components: Frontend
> Reporter: Tim Armstrong
> Assignee: Bikramjeet Vig
> Priority: Major
> Labels: admission-control, resource-management
>
> Many plan nodes and sinks, e.g. KuduScanNode, KuduTableSink, ExchangeNode, etc., are missing memory estimates entirely.
> We should add a basic estimate for all these cases based on experiments and data from real workloads. In some cases 0 may be the right estimate (e.g. for streaming nodes like SelectNode that just pass through data), but we should remove the TODOs and document the reasoning in those cases.
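The idea of a "constant estimate based on enforced limits" is that if the client caps both the buffer size and the number of buffered batches, their product bounds the untracked memory. A hypothetical illustration of that reasoning (the limit names and values below are assumptions, not actual Impala or Kudu configuration):

```python
# Hypothetical sketch of deriving a constant sink memory estimate from
# client-enforced limits. The specific limits and values are assumed
# for illustration only.
ASSUMED_MUTATION_BUFFER_BYTES = 10 * 1024 * 1024  # assumed per-buffer cap
ASSUMED_MAX_BUFFERED_BATCHES = 10                 # assumed batch-count cap

def kudu_sink_mem_estimate():
    """Untracked client memory cannot exceed the product of the enforced
    limits, so that product serves as a safe constant estimate."""
    return ASSUMED_MUTATION_BUFFER_BYTES * ASSUMED_MAX_BUFFERED_BATCHES
```

The estimate stays constant regardless of input cardinality, which is what lets the planner emit it without any per-table statistics.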
[jira] [Commented] (IMPALA-7728) Impala Doc: Add the Changing Privileges section in Impala Sentry doc
[ https://issues.apache.org/jira/browse/IMPALA-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720164#comment-16720164 ] ASF subversion and git services commented on IMPALA-7728:
-
Commit 50ee98a0fae329c139b19761326e5263f179cf24 in impala's branch refs/heads/master from [~arodoni_cloudera] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=50ee98a ]

IMPALA-7728: [DOCS] Added a section on Changing Privileges

Change-Id: I955cb49cae24be6a93a90ccb5f2aa6ceb29cee8b
Reviewed-on: http://gerrit.cloudera.org:8080/12071
Tested-by: Impala Public Jenkins
Reviewed-by: Alex Rodoni

> Impala Doc: Add the Changing Privileges section in Impala Sentry doc
> --
> Key: IMPALA-7728
> URL: https://issues.apache.org/jira/browse/IMPALA-7728
> Project: IMPALA
> Issue Type: Task
> Components: Docs
> Reporter: Alex Rodoni
> Assignee: Alex Rodoni
> Priority: Major
> Fix For: Impala 3.1.0
>
> https://gerrit.cloudera.org/#/c/12071/
[jira] [Commented] (IMPALA-6591) TestClientSsl hung for a long time
[ https://issues.apache.org/jira/browse/IMPALA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720165#comment-16720165 ] ASF subversion and git services commented on IMPALA-6591:
-
Commit 9c44853998705172015626e6e29d1d23a7f9e53e in impala's branch refs/heads/master from [~fredyw] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=9c44853 ]

IMPALA-6591: Fix test_ssl flaky test

test_ssl has logic that waits for the number of in-flight queries to reach 1. However, wait_for_num_in_flight_queries(1) only waits for the condition for a period of time and does not throw an exception when the time elapses without the condition being met. In other words, the logic in test_ssl that loops while the number of in-flight queries is 1 never gets executed. I was able to reproduce this issue by making the Impala shell take much longer to start. Prior to this patch, when the Impala shell took much longer to start, the test began sending commands to the shell even though it was not yet ready to receive them.

The patch fixes the issue by waiting until the Impala shell is connected. It also adds asserts in other places that call wait_for_num_in_flight_queries and updates the default behavior for the Impala shell to wait until it is connected.
Testing:
- Ran core and exhaustive tests several times on CentOS 6 without any issue

Change-Id: I9805269d8b806aecf5d744c219967649a041d49f
Reviewed-on: http://gerrit.cloudera.org:8080/12047
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> TestClientSsl hung for a long time
> --
> Key: IMPALA-6591
> URL: https://issues.apache.org/jira/browse/IMPALA-6591
> Project: IMPALA
> Issue Type: Bug
> Components: Distributed Exec
> Affects Versions: Impala 2.12.0, Impala 3.1.0, Impala 3.2.0
> Reporter: Tim Armstrong
> Assignee: Csaba Ringhofer
> Priority: Blocker
> Labels: broken-build, flaky, hang
>
> {noformat}
> 18:49:13 custom_cluster/test_catalog_wait.py::TestCatalogWait::test_delayed_catalog PASSED
> 18:49:53 custom_cluster/test_client_ssl.py::TestClientSsl::test_ssl[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: text/none] Build timed out (after 1,440 minutes). Marking the build as failed.
> 12:20:15 Build was aborted
> 12:20:15 Archiving artifacts
> {noformat}
> I unfortunately wasn't able to get any logs...
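The bug class fixed above is a wait helper whose timeout is reported only through an ignored return value. A simplified sketch of that pattern and the fix of asserting on the result (the signature below is a hypothetical stand-in, not the actual test utility):

```python
# Hypothetical sketch of a wait helper that signals timeout via its
# return value, mirroring the failure mode described above. Not the
# actual Impala test utility.
import time

def wait_for_num_in_flight_queries(get_count, expected, timeout_s=10.0):
    """Poll until the in-flight query count equals `expected`.
    Returns True on success and False on timeout; the caller MUST
    check the result, or a timeout passes silently."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_count() == expected:
            return True
        time.sleep(0.1)
    return False

# Buggy pattern: return value dropped, so a timeout is silent.
#   wait_for_num_in_flight_queries(get_count, 1)
# Fixed pattern: assert on the result, so a timeout fails the test.
#   assert wait_for_num_in_flight_queries(get_count, 1)
```

This is why the patch adds asserts at the call sites: the helper's contract makes silent failure the default unless every caller checks its result.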