[jira] [Created] (IMPALA-7777) Fail queries where the sum of offset and limit exceed the max value of int64
Sahil Takiar created IMPALA-7777: Summary: Fail queries where the sum of offset and limit exceed the max value of int64 Key: IMPALA-7777 URL: https://issues.apache.org/jira/browse/IMPALA-7777 Project: IMPALA Issue Type: Improvement Reporter: Sahil Takiar Assignee: Sahil Takiar A follow up to IMPALA-5004. We should prevent users from running queries where the sum of the offset and limit exceeds some threshold (e.g. {{Long.MAX_VALUE}}). If a user tries to run such a query the impalad will crash, so we should reject queries that exceed the threshold. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
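The proposed validation can be sketched as follows (illustrative Python only; the function name is hypothetical and the actual check would live in the Java frontend, where the subtraction form avoids overflowing during the check itself):

```python
# Hypothetical sketch of the proposed frontend check, not actual Impala code.
INT64_MAX = 2**63 - 1  # Long.MAX_VALUE

def validate_limit_offset(limit, offset):
    # Compare via subtraction: in Java, `limit + offset` itself could
    # overflow, so check `offset > MAX - limit` instead.
    if offset > INT64_MAX - limit:
        raise ValueError(
            "sum of limit (%d) and offset (%d) exceeds %d"
            % (limit, offset, INT64_MAX))

validate_limit_offset(10, 100)  # a reasonable query passes silently
```

A query with, say, `OFFSET` just below 2^63 would be rejected at analysis time instead of crashing the impalad.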
[jira] [Created] (IMPALA-7776) Fail queries where the sum of offset and limit exceed the max value of int64
Sahil Takiar created IMPALA-7776: Summary: Fail queries where the sum of offset and limit exceed the max value of int64 Key: IMPALA-7776 URL: https://issues.apache.org/jira/browse/IMPALA-7776 Project: IMPALA Issue Type: Improvement Reporter: Sahil Takiar Assignee: Sahil Takiar A follow up to IMPALA-5004. We should prevent users from running queries where the sum of the offset and limit exceeds some threshold (e.g. {{Long.MAX_VALUE}}). If a user tries to run this query the impalad will crash, so we should reject queries that exceed the threshold. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IMPALA-7776) Fail queries where the sum of offset and limit exceed the max value of int64
[ https://issues.apache.org/jira/browse/IMPALA-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-7776. -- Resolution: Duplicate > Fail queries where the sum of offset and limit exceed the max value of int64 > > > Key: IMPALA-7776 > URL: https://issues.apache.org/jira/browse/IMPALA-7776 > Project: IMPALA > Issue Type: Improvement >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > A follow up to IMPALA-5004. We should prevent users from running queries > where the sum of the offset and limit exceeds some threshold (e.g. > {{Long.MAX_VALUE}}). If a user tries to run this query the impalad will > crash, so we should reject queries that exceed the threshold. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IMPALA-6249) Expose several build flags via web UI
[ https://issues.apache.org/jira/browse/IMPALA-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-6249. -- Resolution: Fixed Fix Version/s: Impala 3.1.0 > Expose several build flags via web UI > - > > Key: IMPALA-6249 > URL: https://issues.apache.org/jira/browse/IMPALA-6249 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Reporter: Tim Armstrong >Assignee: Sahil Takiar >Priority: Minor > Fix For: Impala 3.1.0 > > Attachments: Screen Shot 2018-09-06 at 11.47.45 AM.png > > > IMPALA-6241 added a .cmake_build_type file with the CMAKE_BUILD_TYPE value > for the last build. The file is used to detect the type of the build that the > python tests are running against. However, this assumes that the tests are > running from the same directory that the Impala cluster under test was built > from, which isn't necessarily true for all dev workflows and for remote > cluster tests. > It would be convenient if CMAKE_BUILD_TYPE was exposed from the Impalad web > UI. Currently we expose DEBUG/RELEASE depending on the value of NDEBUG - see > GetVersionString() and impalad-host:25000/?json=true, but we could expose the > precise build type, then allow the python tests to parse it from the web UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IMPALA-7691) test_web_pages not being run
[ https://issues.apache.org/jira/browse/IMPALA-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-7691. -- Resolution: Fixed Fix Version/s: Impala 3.1.0 > test_web_pages not being run > > > Key: IMPALA-7691 > URL: https://issues.apache.org/jira/browse/IMPALA-7691 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Reporter: Thomas Tauber-Marshall >Assignee: Sahil Takiar >Priority: Blocker > Fix For: Impala 3.1.0 > > > test_web_pages.py is not being run by test/run-tests.py because the > 'webserver' directory is missing from VALID_TEST_DIRS -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IMPALA-7777) Fix crash due to arithmetic overflows in Exchange Node
[ https://issues.apache.org/jira/browse/IMPALA-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-7777. -- Resolution: Fixed Fix Version/s: Impala 3.1.0 > Fix crash due to arithmetic overflows in Exchange Node > -- > > Key: IMPALA-7777 > URL: https://issues.apache.org/jira/browse/IMPALA-7777 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0 >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.1.0 > > > A follow up to IMPALA-5004. Impala allows a value of LIMIT and OFFSET up to > 2^63. However, if a user tries to run a query with a large offset (e.g. > slightly lower than 2^63), the query will crash the impalad due to a > {{DCHECK_LE}} in {{row-batch.h}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IMPALA-5004) Switch to sorting node for large TopN queries
[ https://issues.apache.org/jira/browse/IMPALA-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-5004. -- Resolution: Fixed Fix Version/s: Impala 3.1.0 > Switch to sorting node for large TopN queries > - > > Key: IMPALA-5004 > URL: https://issues.apache.org/jira/browse/IMPALA-5004 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 2.9.0 >Reporter: Lars Volker >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.1.0 > > > As explained by [~tarmstrong] in IMPALA-4995: > bq. We should also consider switching to the sort operator for large limits. > This allows it to spill. The memory requirements for TopN also are > problematic for large limits, since it would allocate large vectors that are > untracked and also require a large amount of contiguous memory. > There's already logic to select TopN vs. Sort: > [planner/SingleNodePlanner.java#L289|https://github.com/apache/incubator-impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L289] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-7836) Impala 3.1 Doc: New query option 'topn_bytes_limit' for TopN to Sort conversion
Sahil Takiar created IMPALA-7836: Summary: Impala 3.1 Doc: New query option 'topn_bytes_limit' for TopN to Sort conversion Key: IMPALA-7836 URL: https://issues.apache.org/jira/browse/IMPALA-7836 Project: IMPALA Issue Type: Sub-task Components: Frontend Affects Versions: Impala 2.9.0 Reporter: Sahil Takiar Assignee: Alex Rodoni IMPALA-5004 adds a new query level option called 'topn_bytes_limit' that we should document. The changes in IMPALA-5004 work by estimating the amount of memory required to run a TopN operator. The memory estimate is based on the size of the individual tuples that need to be processed by the TopN operator, as well as the sum of the limit and offset in the query. TopN operators don't spill to disk, so they have to keep all rows they process in memory. If the estimated size of the working set of the TopN operator exceeds the threshold of 'topn_bytes_limit', the TopN operator will be replaced with a Sort operator. The Sort operator can spill to disk, but it processes all the data (the limit and offset have no effect). So switching to Sort might incur a performance penalty, but it will require less memory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
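The conversion logic described above can be sketched like this (illustrative Python; the function name and the exact estimate are assumptions, not the actual planner code, which lives in the Java frontend):

```python
# Hypothetical sketch of the TopN-vs-Sort decision described in IMPALA-5004.
def choose_operator(tuple_size_bytes, limit, offset, topn_bytes_limit):
    # TopN must hold its entire working set of (limit + offset) rows in
    # memory, since it cannot spill to disk.
    estimated_bytes = tuple_size_bytes * (limit + offset)
    if estimated_bytes > topn_bytes_limit:
        return "SORT"   # spillable, but sorts all input rows
    return "TOPN"       # in-memory, bounded working set
```

For example, with 100-byte tuples and a 10 KB threshold, `LIMIT 10` stays a TopN while `LIMIT 1000` falls back to a Sort.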
[jira] [Created] (IMPALA-7816) Race condition in HdfsScanNodeBase::StopAndFinalizeCounters
Sahil Takiar created IMPALA-7816: Summary: Race condition in HdfsScanNodeBase::StopAndFinalizeCounters Key: IMPALA-7816 URL: https://issues.apache.org/jira/browse/IMPALA-7816 Project: IMPALA Issue Type: Bug Components: Backend Affects Versions: Impala 3.1.0 Reporter: Sahil Takiar Assignee: Sahil Takiar While working on IMPALA-6964, I noticed that sometimes the runtime profile for a {{HDFS_SCAN_NODE}} will include {{File Formats: PARQUET/NONE:2}} and sometimes it won't (depending on the query). However, looking at the code, any scan of Parquet files should include this line. I debugged the code and there seems to be a race condition where {{HdfsScanNodeBase::StopAndFinalizeCounters}} can be called before {{HdfsParquetScanner::Close}} is called for all the scan ranges. This causes the {{File Formats}} issue above because {{HdfsParquetScanner::Close}} calls {{HdfsScanNodeBase::RangeComplete}}, which updates the shared object {{file_type_counts_}}, which is read in {{StopAndFinalizeCounters}} (so {{StopAndFinalizeCounters}} will write out the contents of {{file_type_counts_}} before all scanners can update it). {{StopAndFinalizeCounters}} can be called in two places: {{HdfsScanNodeBase::Close}} and {{HdfsScanNode::GetNext}}. It can be called in {{GetNext}} when {{GetNextInternal}} reads enough rows to cross the query-defined limit. So {{GetNext}} will call {{StopAndFinalizeCounters}} once the limit is reached, but not necessarily before the scanners are closed. I'm able to reproduce this locally using the queries: {code:java} select * from functional_parquet.lineitem_sixblocks limit 10 {code} The runtime profile does not include {{File Formats}} {code:java} select * from functional_parquet.lineitem_sixblocks order by l_orderkey limit 10 {code} The runtime profile does include {{File Formats}} I tried to simply remove the call to {{StopAndFinalizeCounters}} from {{GetNext}}, but that doesn't seem to work: it caused several other runtime profile messages to get deleted (not entirely sure why). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-7625) test_web_pages.py backend tests are failing
Sahil Takiar created IMPALA-7625: Summary: test_web_pages.py backend tests are failing Key: IMPALA-7625 URL: https://issues.apache.org/jira/browse/IMPALA-7625 Project: IMPALA Issue Type: Test Components: Infrastructure Reporter: Sahil Takiar Assignee: Sahil Takiar While working on IMPALA-6249, we found that the tests under {{webserver/test_web_pages.py}} are not being run by Jenkins. We re-enabled the tests, however, a few of the backend specific tests are failing. IMPALA-6249 disabled these tests. This JIRA is to follow up on these tests and fix them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-8101) Thrift 11 compilation and Thrift ext-data-source compilation are always run
Sahil Takiar created IMPALA-8101: Summary: Thrift 11 compilation and Thrift ext-data-source compilation are always run Key: IMPALA-8101 URL: https://issues.apache.org/jira/browse/IMPALA-8101 Project: IMPALA Issue Type: Task Reporter: Sahil Takiar Assignee: Sahil Takiar [~tarmstrong] pointed out that after IMPALA-7924 the build output started displaying lines such as: "Running thrift 11 compiler on..." even during builds when Thrift files were not modified. I dug a bit deeper and found the following: * This seems to be happening for Thrift compilation of {{ext-data-source}} files as well (e.g. ExternalDataSource.thrift, Types.thrift, etc.); "Running thrift compiler for ext-data-source on..." is always printed * The issue is that the [custom command|https://cmake.org/cmake/help/v3.8/command/add_custom_command.html] for ext-data-source and Thrift 11 compilation specify an {{OUTPUT}} file that does not exist (and is not generated by Thrift) * According to the CMake docs "if the command does not actually create the {{OUTPUT}} then the rule will always run" - so Thrift compilation will run during every build * The issue is that you don't really know what files Thrift is going to generate without actually looking into the Thrift file and understanding Thrift internals * For C++ and Python there is a workaround; for C++ Thrift always generates a file \{THRIFT_FILE_NAME}_types.h (similar situation for Python); however, for Java no such file necessarily exists (ext-data-source only does Java gen) ** This is how regular Thrift compilation works (e.g. compilation of beeswax.thrift, ImpalaService.thrift, etc.); which is why we don't see the issue for regular Thrift compilation A solution for Thrift 11 compilation is to just add generated Python files to the {{OUTPUT}} for the custom_command. A solution for Thrift compilation of ext-data-source seems trickier, so open to suggestions. 
Ideally, Thrift would provide a way to return the list of files generated from a .thrift file, without actually generating the files, but I don't see a way to do that. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IMPALA-6964) Track stats about column and page sizes in Parquet reader
[ https://issues.apache.org/jira/browse/IMPALA-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-6964. -- Resolution: Fixed Fix Version/s: Impala 3.2.0 > Track stats about column and page sizes in Parquet reader > - > > Key: IMPALA-6964 > URL: https://issues.apache.org/jira/browse/IMPALA-6964 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Assignee: Sahil Takiar >Priority: Major > Labels: observability, parquet, ramp-up > Fix For: Impala 3.2.0 > > > It would be good to have stats for scanned parquet data about page sizes. We > currently can't tell much about the "shape" of the parquet pages from the > profile. Some questions that are interesting: > * How big is each column? I.e. total compressed and decompressed size read. > * How big are pages on average? Either compressed or decompressed size > * What is the compression ratio for pages? Could be inferred from the above > two. > I think storing all the stats in the profile per-column would be too much > data, but we could probably infer most useful things from higher-level > aggregates. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-7924) Generate Thrift 11 Python Code
Sahil Takiar created IMPALA-7924: Summary: Generate Thrift 11 Python Code Key: IMPALA-7924 URL: https://issues.apache.org/jira/browse/IMPALA-7924 Project: IMPALA Issue Type: Task Components: Infrastructure Reporter: Sahil Takiar Assignee: Sahil Takiar Until IMPALA-7825 has been completed, it would be good to add the ability to generate Python code using Thrift 11. As stated in IMPALA-7825, Thrift has added performance improvements to its Python deserialization code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IMPALA-7625) test_web_pages.py backend tests are failing
[ https://issues.apache.org/jira/browse/IMPALA-7625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-7625. -- Resolution: Fixed Fix Version/s: Impala 3.2.0 > test_web_pages.py backend tests are failing > --- > > Key: IMPALA-7625 > URL: https://issues.apache.org/jira/browse/IMPALA-7625 > Project: IMPALA > Issue Type: Test > Components: Infrastructure >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.2.0 > > > While working on IMPALA-6249, we found that the tests under > {{webserver/test_web_pages.py}} are not being run by Jenkins. We re-enabled > the tests, however, a few of the backend specific tests are failing. > IMPALA-6249 disabled these tests. This JIRA is to follow up on these tests > and fix them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-8360) SynchronousThreadPoolTest ASSERT_TRUE(*no_sleep_destroyed) failed
Sahil Takiar created IMPALA-8360: Summary: SynchronousThreadPoolTest ASSERT_TRUE(*no_sleep_destroyed) failed Key: IMPALA-8360 URL: https://issues.apache.org/jira/browse/IMPALA-8360 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Sahil Takiar Jenkins output: {code} Error Message Value of: *no_sleep_destroyed Actual: false Expected: true Stacktrace /data/jenkins/workspace/impala-cdh6.x-core-data-load/repos/Impala/be/src/util/thread-pool-test.cc:112 Value of: *no_sleep_destroyed Actual: false Expected: true {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-8343) TestParquetArrayEncodings parquet-ambiguous-list-modern.test failure
Sahil Takiar created IMPALA-8343: Summary: TestParquetArrayEncodings parquet-ambiguous-list-modern.test failure Key: IMPALA-8343 URL: https://issues.apache.org/jira/browse/IMPALA-8343 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Sahil Takiar The following query block in {{parquet-ambiguous-list-modern.test}} failed: {code} QUERY # 'f21' does not resolve with the 2-level encoding because it matches # a Parquet group in the schema. set parquet_fallback_schema_resolution=position; set parquet_array_resolution=two_level; select s2.f21 from ambig_modern.ambigarray; RESULTS CATCH has an incompatible Parquet schema TYPES int {code} With the error: {code} query_test/test_nested_types.py:556: in test_ambiguous_list vector, unique_database) common/impala_test_suite.py:415: in run_test_case assert False, "Expected exception: %s" % expected_str E AssertionError: Expected exception: has an incompatible Parquet schema {code} The full pytest configuration was: {code} query_test.test_nested_types.TestParquetArrayEncodings.test_ambiguous_list[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] {code} Seen once on centos 6. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-8391) Impala Doc
Sahil Takiar created IMPALA-8391: Summary: Impala Doc Key: IMPALA-8391 URL: https://issues.apache.org/jira/browse/IMPALA-8391 Project: IMPALA Issue Type: Sub-task Components: Docs Reporter: Sahil Takiar Assignee: Alex Rodoni The Impala-Kudu docs: [http://impala.apache.org/docs/build/html/topics/impala_kudu.html] [http://impala.apache.org/docs/build/html/topics/impala_tables.html] need to be updated after IMPALA-7640 is merged. Specifically, this part of the docs will no longer be accurate: {quote} When you create a Kudu table through Impala, it is assigned an internal Kudu table name of the form {{impala::db_name.table_name}}. You can see the Kudu-assigned name in the output of {{DESCRIBE FORMATTED}}, in the {{kudu.table_name}} field of the table properties. The Kudu-assigned name remains the same even if you use {{ALTER TABLE}} to rename the Impala table or move it to a different Impala database. You can issue the statement {{ALTER TABLE impala_name SET TBLPROPERTIES('kudu.table_name' = 'different_kudu_table_name')}} for the external tables created with the {{CREATE EXTERNAL TABLE}} statement. Changing the {{kudu.table_name}} property of an external table switches which underlying Kudu table the Impala table refers to. The underlying Kudu table must already exist. {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IMPALA-7640) ALTER TABLE RENAME on managed Kudu table should rename underlying Kudu table
[ https://issues.apache.org/jira/browse/IMPALA-7640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-7640. -- Resolution: Fixed Fix Version/s: Impala 3.3.0 > ALTER TABLE RENAME on managed Kudu table should rename underlying Kudu table > > > Key: IMPALA-7640 > URL: https://issues.apache.org/jira/browse/IMPALA-7640 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Affects Versions: Impala 2.12.0 >Reporter: Mike Percy >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.3.0 > > > Currently, when I execute ALTER TABLE RENAME on a managed Kudu table it will > not rename the underlying Kudu table. Because of IMPALA-5654 it becomes > nearly impossible to rename the underlying Kudu table, which is confusing and > makes the Kudu tables harder to identify and manage. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IMPALA-8101) Thrift 11 compilation and Thrift ext-data-source compilation are always run
[ https://issues.apache.org/jira/browse/IMPALA-8101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8101. -- Resolution: Fixed Fix Version/s: Impala 3.3.0 > Thrift 11 compilation and Thrift ext-data-source compilation are always run > --- > > Key: IMPALA-8101 > URL: https://issues.apache.org/jira/browse/IMPALA-8101 > Project: IMPALA > Issue Type: Task >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.3.0 > > > [~tarmstrong] pointed out that after IMPALA-7924 the build output started > displaying lines such as: "Running thrift 11 compiler on..." even during > builds when Thrift files were not modified. > I dug a bit deeper and found the following: > * This seems to be happening for Thrift compilation of {{ext-data-source}} > files as well (e.g. ExternalDataSource.thrift, Types.thrift, etc.); "Running > thrift compiler for ext-data-source on..." is always printed > * The issue is that the [custom > command|https://cmake.org/cmake/help/v3.8/command/add_custom_command.html] > for ext-data-source and Thrift 11 compilation specify an {{OUTPUT}} file that > does not exist (and is not generated by Thrift) > * According to the CMake docs "if the command does not actually create the > {{OUTPUT}} then the rule will always run" - so Thrift compilation will run > during every build > * The issue is that you don't really know what files Thrift is going to > generate without actually looking into the Thrift file and understanding > Thrift internals > * For C++ and Python there is a workaround; for C++ Thrift always generates > a file \{THRIFT_FILE_NAME}_types.h (similar situation for Python); however, > for Java no such file necessarily exists (ext-data-source only does Java gen) > ** This is how regular Thrift compilation works (e.g. 
compilation of > beeswax.thrift, ImpalaService.thrift, etc.); which is why we don't see the > issue for regular Thrift compilation > A solution for Thrift 11 compilation is to just add generated Python files to > the {{OUTPUT}} for the custom_command. > A solution for Thrift compilation of ext-data-source seems trickier, so open > to suggestions. > Ideally, Thrift would provide a way to return the list of files generated > from a .thrift file, without actually generating the files, but I don't see a > way to do that. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-8237) Enabling preads always fetches hedged reads metrics
Sahil Takiar created IMPALA-8237: Summary: Enabling preads always fetches hedged reads metrics Key: IMPALA-8237 URL: https://issues.apache.org/jira/browse/IMPALA-8237 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar In {{HdfsFileReader}}, if preads are enabled, we assume that hedged reads are enabled as well, so whenever we close a file we make a libhdfs call to collect a few hedged read metrics from the underlying {{FileSystem}} object. However, as part of IMPALA-5212 we may want to enable preads even when hedged reads are disabled, so making the call to libhdfs to fetch hedged read metrics would be a waste. Digging through the HDFS code, it seems the HDFS client triggers hedged reads only if {{dfs.client.hedged.read.threadpool.size}} is greater than 0. We can use the same check in {{HdfsFileReader}} to trigger the fetch of hedged read metrics. The issue is that libhdfs currently does not provide a good way of getting the value of {{dfs.client.hedged.read.threadpool.size}}. It provides a method called {{hdfsConfGetInt}}, but that method simply calls {{new Configuration()}} and fetches the value of {{dfs.client.hedged.read.threadpool.size}} from it. The problem is that calling {{new Configuration}} simply loads the current {{hdfs-site.xml}}, {{core-site.xml}}, etc., which does not take into account the scenario where the default configuration has been modified for specific filesystem objects - e.g. using {{hdfsBuilder}} to set non-default configuration parameters (see HDFS-14301 for more details). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
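The proposed gating can be sketched as follows (illustrative Python; `should_fetch_hedged_metrics` and the dict-style config lookup are hypothetical stand-ins for the libhdfs call, which would need a config API that respects per-filesystem overrides):

```python
# Hypothetical sketch of the check proposed above: only fetch hedged-read
# metrics when the HDFS client can actually perform hedged reads, i.e.
# when the hedged-read threadpool size is configured to be > 0.
HEDGED_READ_POOL_KEY = "dfs.client.hedged.read.threadpool.size"

def should_fetch_hedged_metrics(conf):
    # conf stands in for the effective FileSystem configuration; the hard
    # part noted in the report is obtaining this effective view via libhdfs.
    return int(conf.get(HEDGED_READ_POOL_KEY, 0)) > 0
```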
[jira] [Created] (IMPALA-8232) Custom cluster tests should allow setting dfs.client settings for impalads
Sahil Takiar created IMPALA-8232: Summary: Custom cluster tests should allow setting dfs.client settings for impalads Key: IMPALA-8232 URL: https://issues.apache.org/jira/browse/IMPALA-8232 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar Right now, custom cluster tests only allow specifying impalad startup options; however, it would be nice if the tests could specify arbitrary HDFS client configs as well (e.g. {{dfs.client}} options). This would allow us to increase our test integration coverage with different HDFS client setups, such as: (1) disabling short-circuit reads, which triggers the code path for a remote read (requires setting {{dfs.client.read.shortcircuit}} to false), and (2) enabling hedged reads (requires setting {{dfs.client.hedged.read.threadpool.size}} to a value greater than 0). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-8166) ParquetBytesReadPerColumn is displayed for non-Parquet scans
Sahil Takiar created IMPALA-8166: Summary: ParquetBytesReadPerColumn is displayed for non-Parquet scans Key: IMPALA-8166 URL: https://issues.apache.org/jira/browse/IMPALA-8166 Project: IMPALA Issue Type: Sub-task Reporter: Sahil Takiar Assignee: Sahil Takiar The issue is that these counters are added in {{hdfs-scan-node-base.h}}. These counters are only updated for Parquet, so we should only display them if the Scan Node is scanning Parquet data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-8117) KuduCatalogOpExecutor.validateKuduTblExists does not check if Kudu table exists
Sahil Takiar created IMPALA-8117: Summary: KuduCatalogOpExecutor.validateKuduTblExists does not check if Kudu table exists Key: IMPALA-8117 URL: https://issues.apache.org/jira/browse/IMPALA-8117 Project: IMPALA Issue Type: Bug Components: Catalog Reporter: Sahil Takiar Assignee: Sahil Takiar -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IMPALA-6050) Query profiles should clearly indicate storage layer(s) used
[ https://issues.apache.org/jira/browse/IMPALA-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-6050. -- Resolution: Fixed Fix Version/s: Impala 3.3.0 > Query profiles should clearly indicate storage layer(s) used > > > Key: IMPALA-6050 > URL: https://issues.apache.org/jira/browse/IMPALA-6050 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Sailesh Mukil >Assignee: Sahil Takiar >Priority: Major > Labels: adls, profile, s3, supportability > Fix For: Impala 3.3.0 > > > Currently, the query profile doesn't have the location of tables and > partitions, which makes it hard to figure out what storage layer a > table/partition that was queried was on. > As we're seeing more users run Impala workloads against cloud based storage > like S3 and ADLS, we should have the query profiles show this information. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-8544) Expose additional S3A / S3Guard metrics
Sahil Takiar created IMPALA-8544: Summary: Expose additional S3A / S3Guard metrics Key: IMPALA-8544 URL: https://issues.apache.org/jira/browse/IMPALA-8544 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar S3A / S3Guard internally collects several useful metrics that we should consider exposing to Impala users. The full list of statistics can be found in {{o.a.h.fs.s3a.Statistic}}. The stats include: the number of S3 operations performed (put, get, etc.), invocation counts for various {{FileSystem}} methods, stream statistics (bytes read, written, etc.), etc. Some interesting stats that stand out: * "stream_aborted": "Count of times the TCP stream was aborted" - the number of TCP connection aborts, a high value would indicate performance issues * "stream_read_exceptions": "Number of exceptions invoked on input streams" - incremented whenever an {{IOException}} is caught while reading (these exceptions don't always get propagated to Impala because they trigger a retry) * "store_io_throttled": "Requests throttled and retried" - looks like it tracks the number of times the fs retries an operation because the original request hit a throttling exception * "s3guard_metadatastore_retry": "S3Guard metadata store retry events" - looks like it tracks the number of times the fs retries S3Guard operations * "s3guard_metadatastore_throttled": "S3Guard metadata store throttled events" - similar to "store_io_throttled" but looks like it is specific to S3Guard We should consider how to expose these metrics via Impala logs / runtime profiles. 
There are a few options: * {{S3AFileSystem}} exposes {{StorageStatistics}} specific to S3A / S3Guard via the {{FileSystem#getStorageStatistics}} method; the {{S3AStorageStatistics}} seems to include all the S3A / S3Guard metrics, however, I think the stats might be aggregated globally, which would make it hard to create per-query specific metrics * {{S3AInstrumentation}} exposes all the metrics as well, and looks like it is per-fs instance, so it is not aggregated globally; {{S3AInstrumentation}} extends {{o.a.h.metrics2.MetricsSource}} so perhaps it is exposed via some API (haven't looked into this yet) * {{S3AInputStream#toString}} dumps the statistics from {{o.a.h.fs.s3a.S3AInstrumentation.InputStreamStatistics}} and {{S3AFileSystem#toString}} dumps them all as well * {{S3AFileSystem}} updates the stats in {{o.a.h.fs.Statistics.StatisticsData}} as well (e.g. bytesRead, bytesWritten, etc.) Impala has a {{hdfs-fs-cache}} as well, so {{hdfsFs}} objects get shared across threads. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-8523) Migrate hdfsOpen to builder-based openFile API
Sahil Takiar created IMPALA-8523: Summary: Migrate hdfsOpen to builder-based openFile API Key: IMPALA-8523 URL: https://issues.apache.org/jira/browse/IMPALA-8523 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar When opening files via libhdfs we call {{hdfsOpen}}, which ultimately calls {{FileSystem#open(Path f, int bufferSize)}}. As of HADOOP-15229, the HDFS client now exposes a new API for opening files called {{openFile}}. The new API has a few advantages: (1) it is capable of specifying file-specific configuration values in a builder-based manner (see {{o.a.h.fs.FSBuilder}} for details), and (2) it can open files asynchronously (e.g. see {{o.a.h.fs.FutureDataInputStreamBuilder}} for details). The async file opens are similar to IMPALA-7738 (Implement timeouts for HDFS open calls). To avoid overlap between IMPALA-7738 and the async file opens in {{openFile}}, HADOOP-15691 can be used to check which filesystems open files asynchronously and which ones don't (currently only S3A opens files asynchronously). The main use case for the new {{openFile}} API is Impala-S3 performance. Performance benchmarks have shown that setting {{fs.s3a.experimental.input.fadvise}} to {{RANDOM}} for Parquet files can significantly improve performance; however, this setting also adversely affects scans of non-splittable file formats such as gzipped files (see HADOOP-13203). One solution is to just document that setting {{fs.s3a.experimental.input.fadvise}} to {{RANDOM}} for Parquet improves performance, but a better solution would be to use the new {{openFile}} API to specify different values of fadvise depending on the file type. This work is dependent on exposing the new {{openFile}} API via libhdfs (HDFS-14478). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
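The builder pattern described above can be illustrated with a minimal sketch (Python for illustration; the class and option names are hypothetical stand-ins that mirror, but do not reproduce, the Hadoop {{openFile}} builder):

```python
# Hypothetical sketch of a builder-style file open, in the spirit of the
# Hadoop openFile() API described above. Not libhdfs and not the Hadoop API.
class OpenFileBuilder:
    def __init__(self, path):
        self.path = path
        self.opts = {}

    def opt(self, key, value):
        # Per-file configuration, instead of a single global setting.
        self.opts[key] = value
        return self  # allow chaining, as the Hadoop builder does

    def build(self):
        # A real implementation would return an open (possibly async) stream.
        return (self.path, dict(self.opts))

# A Parquet scan could request random-access fadvise while a gzip scan
# keeps the sequential default, which is the per-file-type idea above.
handle = (OpenFileBuilder("s3a://bucket/file.parquet")
          .opt("fs.s3a.experimental.input.fadvise", "random")
          .build())
```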
[jira] [Created] (IMPALA-8525) preads should use hdfsPreadFully rather than hdfsPread
Sahil Takiar created IMPALA-8525: Summary: preads should use hdfsPreadFully rather than hdfsPread Key: IMPALA-8525 URL: https://issues.apache.org/jira/browse/IMPALA-8525 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar Impala preads (only enabled if {{use_hdfs_pread}} is true) use the {{hdfsPread}} API from libhdfs, which ultimately invokes {{PositionedReadable#read(long position, byte[] buffer, int offset, int length)}} in the HDFS-client. {{PositionedReadable}} also exposes the method {{readFully(long position, byte[] buffer, int offset, int length)}}. The difference is that {{#read}} will "Read up to the specified number of bytes" whereas {{#readFully}} will "Read the specified number of bytes". So there is no guarantee that {{#read}} will read *all* of the requested bytes. Impala calls {{hdfsPread}} inside {{hdfs-file-reader.cc}} and invokes it inside a while loop until all the requested bytes have been read from the file. This can cause a few performance issues: (1) if the underlying {{FileSystem}} does not support ByteBuffer reads (HDFS-2834) (e.g. S3A does not support this feature) then {{hdfsPread}} will allocate a Java array equal in size to the specified length of the buffer; the call to {{PositionedReadable#read}} may only fill up the buffer partially; Impala will repeat the call to {{hdfsPread}} since the buffer was not filled, which will cause another large array allocation; this can result in a lot of wasted time doing unnecessary array allocations; (2) given that Impala calls {{hdfsPread}} in a while loop, there is no point in continuously calling {{hdfsPread}} when a single call to {{hdfsPreadFully}} will achieve the same thing (this doesn't actually affect performance much, but is unnecessary) Prior solutions to this problem have been to introduce a "chunk-size" to Impala reads (https://gerrit.cloudera.org/#/c/63/ - S3: DiskIoMgr related changes for S3). 
However, with the migration to {{hdfsPreadFully}} the chunk-size is no longer necessary. Furthermore, preads are most effective when the data is read all at once (e.g. in 8 MB chunks as specified by {{read_size}}) rather than in smaller chunks (typically 128K). For example, {{DFSInputStream#read(long position, byte[] buffer, int offset, int length)}} opens up remote block readers with a byte range determined by the value of {{length}} passed into the {{#read}} call. Similarly, {{S3AInputStream#readFully}} will issue an HTTP GET request with the size of the read specified by the given {{length}} (although fadvise must be set to RANDOM for this to work). This work is dependent on exposing {{readFully}} via libhdfs first: HDFS-14478 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
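The difference between looping over {{hdfsPread}} and making one {{hdfsPreadFully}} call can be modeled with a toy reader. This is a self-contained sketch, not Impala or libhdfs code; it just counts the per-call buffer allocations described in issue (1) above:

```python
class PartialReader:
    """Models PositionedReadable#read: returns *up to* `length` bytes."""
    def __init__(self, data, max_chunk):
        self.data, self.max_chunk = data, max_chunk
        self.allocations = 0   # counts the per-call Java array allocations

    def read(self, pos, length):
        self.allocations += 1  # hdfsPread allocates a length-sized array each call
        n = min(length, self.max_chunk, len(self.data) - pos)
        return self.data[pos:pos + n]

def pread_loop(reader, pos, length):
    """Impala's current pattern: a while loop around hdfsPread."""
    out = b""
    while len(out) < length:
        out += reader.read(pos + len(out), length - len(out))
    return out

def pread_fully(reader, pos, length):
    """hdfsPreadFully: one call that internally loops until `length` bytes."""
    out = b""
    while len(out) < length:
        n = min(reader.max_chunk, length - len(out))
        out += reader.data[pos + len(out):pos + len(out) + n]
    reader.allocations += 1    # a single buffer allocation for the whole read
    return out

reader = PartialReader(b"x" * 1024, max_chunk=128)
assert pread_loop(reader, 0, 1024) == b"x" * 1024      # 8 calls, 8 allocations
reader2 = PartialReader(b"x" * 1024, max_chunk=128)
assert pread_fully(reader2, 0, 1024) == b"x" * 1024    # 1 allocation
```

With an 8 MB read arriving in 128 KB chunks, the loop would pay 64 allocations where a single {{hdfsPreadFully}} pays one.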
[jira] [Resolved] (IMPALA-8250) Impala crashes with -Xcheck:jni
[ https://issues.apache.org/jira/browse/IMPALA-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8250. -- Resolution: Fixed I'm closing this. I re-ran exhaustive tests against Impala with -Xcheck:jni and Impala no longer crashes; all tests pass. There are still a ton of JNI warnings, so I'm going to file a follow up JIRA to fix them. > Impala crashes with -Xcheck:jni > --- > > Key: IMPALA-8250 > URL: https://issues.apache.org/jira/browse/IMPALA-8250 > Project: IMPALA > Issue Type: Task >Reporter: Philip Zeyliger >Priority: Major > Fix For: Impala 3.2.0 > > > The JVM has a checker for JNI usage, and Impala (and libhdfs) have some > violations. This ticket captures figuring that out. At least one of the > issues can crash Impala. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-8568) Fix Impala JNI warnings when -Xcheck:jni is enabled
Sahil Takiar created IMPALA-8568: Summary: Fix Impala JNI warnings when -Xcheck:jni is enabled Key: IMPALA-8568 URL: https://issues.apache.org/jira/browse/IMPALA-8568 Project: IMPALA Issue Type: Improvement Reporter: Sahil Takiar Assignee: Sahil Takiar IMPALA-8250 made a lot of improvements to our usage of the JNI. Impala no longer crashes when running exhaustive tests with -Xcheck:jni enabled. We made some progress in cleaning up libhdfs JNI usage in HDFS-14321 and HDFS-14348 as well. However, re-running exhaustive tests with -Xcheck:jni still shows a lot of warnings. It's not clear if these warnings are from libhdfs or Impala, but either way we should drive a fix. The most concerning of the current list of JNI warnings produced by Impala are the "JNI call made without checking exceptions when required to from ..." warnings. Essentially, this means that when making a JNI call, we are not properly checking for exceptions. This can be problematic because a JNI call may throw an exception, and we end up swallowing it. There are lots of warnings about "WARNING: JNI local refs: [x], exceeds capacity: [y]". Based on some digging (e.g. https://community.oracle.com/message/13290783) it looks like these warnings aren't fatal, but are just bad practice. I think we can fix the most egregious offenders (looks like the HBase code is one of them), and hopefully live with the rest (a lot of the warnings are thrown by internal Java code as well). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IMPALA-8428) Add support for caching file handles on s3
[ https://issues.apache.org/jira/browse/IMPALA-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8428. -- Resolution: Fixed Fix Version/s: Impala 3.3.0 > Add support for caching file handles on s3 > -- > > Key: IMPALA-8428 > URL: https://issues.apache.org/jira/browse/IMPALA-8428 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 3.3.0 >Reporter: Joe McDonnell >Assignee: Sahil Takiar >Priority: Critical > Fix For: Impala 3.3.0 > > > The file handle cache is currently disabled for S3, as the S3 connector > needed to implement proper unbuffer support. Now that > https://issues.apache.org/jira/browse/HADOOP-14747 is fixed, Impala should > provide an option to cache S3 file handles. > This is particularly important for data caching, as accessing the data cache > happens after obtaining a file handle. If getting a file handle is slow, the > caching will be less effective. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IMPALA-8490) Impala Doc: the file handle cache now supports S3
Sahil Takiar created IMPALA-8490: Summary: Impala Doc: the file handle cache now supports S3 Key: IMPALA-8490 URL: https://issues.apache.org/jira/browse/IMPALA-8490 Project: IMPALA Issue Type: Sub-task Reporter: Sahil Takiar Assignee: Alex Rodoni https://impala.apache.org/docs/build/html/topics/impala_scalability.html states: {quote} Because this feature only involves HDFS data files, it does not apply to non-HDFS tables, such as Kudu or HBase tables, or tables that store their data on cloud services such as S3 or ADLS. {quote} This section should be updated because the file handle cache now supports S3 files. We should add a section to the docs similar to what we added when support for remote HDFS files was added to the file handle cache: {quote} In Impala 3.2 and higher, file handle caching also applies to remote HDFS file handles. This is controlled by the cache_remote_file_handles flag for an impalad. It is recommended that you use the default value of true as this caching prevents your NameNode from overloading when your cluster has many remote HDFS reads. {quote} Like {{cache_remote_file_handles}}, the flag {{cache_s3_file_handles}} has been added as an impalad startup option (the flag is enabled by default). Unlike HDFS, S3 has no NameNode; the benefit is that it eliminates a call to {{getFileStatus()}} on the target S3 file. So "prevents your NameNode from overloading when your cluster has many remote HDFS reads" should be changed to something like "avoids an unnecessary call to S3AFileSystem#getFileStatus() which reduces the number of API calls made to S3." -- This message was sent by Atlassian JIRA (v7.6.3#76005)
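The benefit described above (skipping {{getFileStatus()}} on cache hits) can be illustrated with a toy handle cache. Everything here is hypothetical scaffolding; only the fact that an uncached S3A open pays a {{getFileStatus()}} call comes from the ticket:

```python
class FakeS3:
    """Stand-in for S3AFileSystem, counting metadata requests."""
    def __init__(self):
        self.get_file_status_calls = 0

    def open(self, path):
        # Opening an S3 file first issues a getFileStatus() (an S3 API call).
        self.get_file_status_calls += 1
        return ("handle", path)

class FileHandleCache:
    """Toy per-path handle cache: the first open pays the metadata call,
    later opens of the same path reuse the cached handle."""
    def __init__(self, fs):
        self.fs, self.cache = fs, {}

    def open(self, path):
        if path not in self.cache:
            self.cache[path] = self.fs.open(path)
        return self.cache[path]

fs = FakeS3()
cache = FileHandleCache(fs)
for _ in range(3):
    cache.open("s3a://bucket/table/part-0.parq")
# 3 opens, but only the first one hit getFileStatus()
```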
[jira] [Created] (IMPALA-8760) TestAdmissionControllerStress.test_mem_limit timed out waiting for query to end
Sahil Takiar created IMPALA-8760: Summary: TestAdmissionControllerStress.test_mem_limit timed out waiting for query to end Key: IMPALA-8760 URL: https://issues.apache.org/jira/browse/IMPALA-8760 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Sahil Takiar Assignee: Bikramjeet Vig {code} custom_cluster.test_admission_controller.TestAdmissionControllerStress.test_mem_limit[num_queries: 30 | protocol: beeswax | table_format: text/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | submission_delay_ms: 150 | round_robin_submission: True]{code} Is failing with the exception: {code} Error Message AssertionError: Timed out waiting 90 seconds for query end assert (1562916293.1308379 - 1562916203.0256219) < 90 + where 1562916293.1308379 = time() Stacktrace custom_cluster/test_admission_controller.py:1649: in test_mem_limit {'request_pool': self.pool_name, 'mem_limit': query_mem_limit}) custom_cluster/test_admission_controller.py:1541: in run_admission_test self.end_admitted_queries(num_to_end) custom_cluster/test_admission_controller.py:1320: in end_admitted_queries assert (time() - start_time < STRESS_TIMEOUT),\ E AssertionError: Timed out waiting 90 seconds for query end E assert (1562916293.1308379 - 1562916203.0256219) < 90 E+ where 1562916293.1308379 = time() {code} Looks like the timeout of 90 seconds isn't enough. Looks similar to IMPALA-8295 -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IMPALA-8767) Jenkins failures due to "Could not get lock /var/lib/dpkg/lock"
Sahil Takiar created IMPALA-8767: Summary: Jenkins failures due to "Could not get lock /var/lib/dpkg/lock" Key: IMPALA-8767 URL: https://issues.apache.org/jira/browse/IMPALA-8767 Project: IMPALA Issue Type: Bug Reporter: Sahil Takiar -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IMPALA-8826) Impala Doc: Add docs for PLAN_ROOT_SINK
Sahil Takiar created IMPALA-8826: Summary: Impala Doc: Add docs for PLAN_ROOT_SINK Key: IMPALA-8826 URL: https://issues.apache.org/jira/browse/IMPALA-8826 Project: IMPALA Issue Type: Sub-task Components: Docs Reporter: Sahil Takiar Currently, I don't see many docs explaining what a {{PLAN_ROOT_SINK}} is, even though it shows up in explain plans and runtime profiles. After more of the changes in IMPALA-8656 are merged, understanding what {{PLAN_ROOT_SINK}} is will be more important, because it will start taking up a memory reservation and possibly spilling to disk. I don't see any docs on data sinks in general, so perhaps it would be useful to create a dedicated page for explaining data sinks and how they work. We can start by documenting the {{PLAN_ROOT_SINK}} as that may be the most commonly used one. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IMPALA-8825) Add additional counters to PlanRootSink
Sahil Takiar created IMPALA-8825: Summary: Add additional counters to PlanRootSink Key: IMPALA-8825 URL: https://issues.apache.org/jira/browse/IMPALA-8825 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar The current entry in the runtime profile for {{PLAN_ROOT_SINK}} does not contain much useful information: {code:java} PLAN_ROOT_SINK:(Total: 234.996ms, non-child: 234.996ms, % non-child: 100.00%) - PeakMemoryUsage: 0{code} There are several additional counters we could add to the {{PlanRootSink}} (either the {{BufferedPlanRootSink}} or {{BlockingPlanRootSink}}):
* Amount of time spent blocking inside the {{PlanRootSink}} - both the time spent by the client thread waiting for rows to become available and the time spent by the impala thread waiting for the client to consume rows
** So similar to the {{RowBatchQueueGetWaitTime}} and {{RowBatchQueuePutWaitTime}} inside the scan nodes
** The difference between these counters and the ones in {{ClientRequestState}} (e.g. {{ClientFetchWaitTimer}} and {{RowMaterializationTimer}}) should be documented
* For {{BufferedPlanRootSink}} there are already several {{Buffer pool}} counters, we should make sure they are exposed in the {{PLAN_ROOT_SINK}} section
* Track the number of rows sent (e.g. rows sent to {{PlanRootSink::Send}}) and the number of rows fetched (might need to be tracked in the {{ClientRequestState}})
** For {{BlockingPlanRootSink}} the sent and fetched values should be pretty much the same, but for {{BufferedPlanRootSink}} this is more useful
** Similar to {{RowsReturned}} in each exec node
* The rate at which rows are sent and fetched
** Should be useful when attempting to debug the performance of fetching rows (e.g. if the send rate is much higher than the fetch rate, then maybe there is something wrong with the client)
** Similar to {{RowsReturnedRate}} in each exec node
Open to other suggestions for counters that folks think are useful. 
-- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IMPALA-8871) Upgrade Thrift version in fe
Sahil Takiar created IMPALA-8871: Summary: Upgrade Thrift version in fe Key: IMPALA-8871 URL: https://issues.apache.org/jira/browse/IMPALA-8871 Project: IMPALA Issue Type: Improvement Components: Frontend Reporter: Sahil Takiar Assignee: Sahil Takiar We should upgrade the Thrift version in the frontend to 0.9.3-1. Since this is just a fe/ change, it does not require upgrading the toolchain. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (IMPALA-8890) DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch
[ https://issues.apache.org/jira/browse/IMPALA-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8890. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed > DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch > > > Key: IMPALA-8890 > URL: https://issues.apache.org/jira/browse/IMPALA-8890 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Affects Versions: Impala 3.4.0 >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Blocker > Fix For: Impala 3.4.0 > > Attachments: impalad.INFO, resolved.txt > > > Full stack: > {code} > F0823 13:19:42.262142 60340 buffered-tuple-stream.cc:291] > 6a4941285b46788d:68021ec6] Check failed: > !page->attached_to_output_batch > *** Check failure stack trace: *** > @ 0x4c987cc google::LogMessage::Fail() > @ 0x4c9a071 google::LogMessage::SendToLog() > @ 0x4c981a6 google::LogMessage::Flush() > @ 0x4c9b76d google::LogMessageFatal::~LogMessageFatal() > @ 0x2917f78 impala::BufferedTupleStream::ExpectedPinCount() > @ 0x29181ec impala::BufferedTupleStream::UnpinPageIfNeeded() > @ 0x291b27b impala::BufferedTupleStream::UnpinStream() > @ 0x297d429 impala::SpillableRowBatchQueue::AddBatch() > @ 0x25d5537 impala::BufferedPlanRootSink::Send() > @ 0x207e94c impala::FragmentInstanceState::ExecInternal() > @ 0x207afac impala::FragmentInstanceState::Exec() > @ 0x208e854 impala::QueryState::ExecFInstance() > @ 0x208cb21 > _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv > @ 0x2090536 > _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE > @ 0x1e9830b boost::function0<>::operator()() > @ 0x23e2d38 impala::Thread::SuperviseThread() > @ 0x23eb0bc boost::_bi::list5<>::operator()<>() > @ 0x23eafe0 boost::_bi::bind_t<>::operator()() > @ 0x23eafa3 boost::detail::thread_data<>::run() > @ 0x3bc1629 thread_proxy > @ 0x7f920a3786b9 start_thread > @ 0x7f9206b5741c clone > {code} > 
Happened once while I was running a full table scan of > {{tpch_parquet.orders}} via JDBC (client was running on another EC2 machine). > This was running on top of IMPALA-8819 with a fetch size of 32768. > Attached full logs and mini-dump stack. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-558) HS2::FetchResults sets hasMoreRows in many cases where no more rows are to be returned
[ https://issues.apache.org/jira/browse/IMPALA-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-558. - Resolution: Won't Fix Marking this as "Won't Fix" since it is not a major issue, and fixing this requires a decent amount of code re-factoring. Furthermore, clients that enable result spooling should see this issue significantly less often. > HS2::FetchResults sets hasMoreRows in many cases where no more rows are to be > returned > -- > > Key: IMPALA-558 > URL: https://issues.apache.org/jira/browse/IMPALA-558 > Project: IMPALA > Issue Type: Sub-task > Components: Clients >Affects Versions: Impala 1.1 >Reporter: Henry Robinson >Priority: Minor > Labels: query-lifecycle > > The first call to {{FetchResults}} always sets {{hasMoreRows}} even when 0 > rows should be returned. The next call correctly sets {{hasMoreRows == > False}}. The upshot is there's always an extra round-trip, although > correctness isn't affected. > {code} > execute_statement_req = TCLIService.TExecuteStatementReq() > execute_statement_req.sessionHandle = resp.sessionHandle > execute_statement_req.statement = "SELECT COUNT(*) FROM > functional.alltypes WHERE 1 = 2" > execute_statement_resp = > self.hs2_client.ExecuteStatement(execute_statement_req) > > fetch_results_req = TCLIService.TFetchResultsReq() > fetch_results_req.operationHandle = execute_statement_resp.operationHandle > fetch_results_req.maxRows = 100 > fetch_results_resp = self.hs2_client.FetchResults(fetch_results_req) > > assert not fetch_results_resp.hasMoreRows # Fails > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-8902) TestResultSpooling.test_spilling is flaky
[ https://issues.apache.org/jira/browse/IMPALA-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8902. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed Bumped up the timeout in the test, flakiness should be resolved. > TestResultSpooling,test_spilling is flaky > - > > Key: IMPALA-8902 > URL: https://issues.apache.org/jira/browse/IMPALA-8902 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 3.4.0 >Reporter: Attila Jeges >Assignee: Sahil Takiar >Priority: Critical > Fix For: Impala 3.4.0 > > > Error: > {code:java} > 17:45:10 FAIL > query_test/test_result_spooling.py::TestResultSpooling::()::test_spilling[protocol: > beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > parquet/none] > 17:45:10 === FAILURES > === > 17:45:10 TestResultSpooling.test_spilling[protocol: beeswax | exec_option: > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, > 'disable_codegen': False, 'abort_on_error': 1, > 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] > 17:45:10 [gw1] linux2 -- Python 2.7.5 > /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/bin/../infra/python/env/bin/python > 17:45:10 query_test/test_result_spooling.py:104: in test_spilling > 17:45:10 .format(query, timeout)) > 17:45:10 E Timeout: Query select * from functional.alltypes order by id > limit 1500 did not spill spooled results within the timeout 10 > 17:45:10 - Captured stderr call > - > 17:45:10 SET > client_identifier=query_test/test_result_spooling.py::TestResultSpooling::()::test_spilling[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0}|table_f; > 17:45:10 SET min_spillable_buffer_size=8192; > 17:45:10 SET batch_size=0; > 
17:45:10 SET num_nodes=0; > 17:45:10 SET disable_codegen_rows_threshold=0; > 17:45:10 SET disable_codegen=False; > 17:45:10 SET abort_on_error=1; > 17:45:10 SET default_spillable_buffer_size=8192; > 17:45:10 SET max_result_spooling_mem=32768; > 17:45:10 SET exec_single_node_rows_threshold=0; > 17:45:10 -- executing against localhost:21000 > 17:45:10 > 17:45:10 select * from functional.alltypes order by id limit 1500; > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-1618) Impala server should always try to fulfill requested fetch size
[ https://issues.apache.org/jira/browse/IMPALA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-1618. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed > Impala server should always try to fulfill requested fetch size > --- > > Key: IMPALA-1618 > URL: https://issues.apache.org/jira/browse/IMPALA-1618 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Affects Versions: Impala 2.0.1 >Reporter: casey >Priority: Minor > Labels: usability > Fix For: Impala 3.4.0 > > > The thrift fetch request specifies the number of rows that it would like but > the Impala server may return fewer even though more results are available. > For example, using the default row_batch size of 1024, if the client requests > 1023 rows, the first response contains 1023 rows but the second response > contains only 1 row. This is because the server internally uses row_batch > (1024), returns the requested count (1023) and caches the remaining row, then > the next time around only uses the cache. > In general the end user should set both the row batch size and the thrift > request size. In practice the query writer setting row_batch and the > driver/programmer setting fetch size may often be different people. > There is one case that works fine now though - setting the batch size to less > than the thrift req size. In this case the thrift response is always the same > as batch size. 
> Code example: > {noformat} > dev@localhost:~/impyla$ git diff > diff --git a/impala/_rpc/hiveserver2.py b/impala/_rpc/hiveserver2.py > index 6139002..31fdab7 100644 > --- a/impala/_rpc/hiveserver2.py > +++ b/impala/_rpc/hiveserver2.py > @@ -265,6 +265,7 @@ def fetch_results(service, operation_handle, > hs2_protocol_version, schema=None, > req = TFetchResultsReq(operationHandle=operation_handle, > orientation=orientation, > maxRows=max_rows) > +print("req: " + str(max_rows)) > resp = service.FetchResults(req) > err_if_rpc_not_ok(resp) > > @@ -273,6 +274,7 @@ def fetch_results(service, operation_handle, > hs2_protocol_version, schema=None, > for (i, col) in enumerate(resp.results.columns)] > num_cols = len(tcols) > num_rows = len(tcols[0].values) > +print("rec: " + str(num_rows)) > rows = [] > for i in xrange(num_rows): > row = [] > dev@localhost:~/impyla$ cat test.py > from impala.dbapi import connect > conn = connect() > cur = conn.cursor() > cur.set_arraysize(1024) > cur.execute("set batch_size=1025") > cur.execute("select * from tpch.lineitem") > while True: > rows = cur.fetchmany() > if not rows: > break > cur.close() > conn.close() > dev@localhost:~/impyla$ python test.py | head > Failed to import pandas > req: 1024 > rec: 1024 > req: 1024 > rec: 1 > req: 1024 > rec: 1024 > req: 1024 > rec: 1 > req: 1024 > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003)
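The req/rec trace above can be reproduced with a toy model of the pre-fix server behavior: materialize a full row batch, serve the requested count, cache the leftover, and serve only the cache on the next fetch. The class and its names are illustrative, not Impala code:

```python
class ToyServer:
    """Models the pre-IMPALA-1618 fetch path: results arrive in row
    batches of `batch_size`, and a fetch drains the cached leftover
    from the previous batch before a new batch is materialized."""
    def __init__(self, total_rows, batch_size):
        self.remaining, self.batch_size, self.cache = total_rows, batch_size, 0

    def fetch(self, max_rows):
        if self.cache == 0 and self.remaining > 0:
            n = min(self.batch_size, self.remaining)    # materialize one batch
            self.cache, self.remaining = n, self.remaining - n
        returned = min(max_rows, self.cache)            # serve from the cache only
        self.cache -= returned
        return returned

# batch_size=1025 with maxRows=1024, as in the impyla test above
server = ToyServer(total_rows=4100, batch_size=1025)
pattern = [server.fetch(1024) for _ in range(6)]
# reproduces the alternating req/rec trace: [1024, 1, 1024, 1, 1024, 1]
```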
[jira] [Resolved] (IMPALA-8819) BufferedPlanRootSink should handle non-default fetch sizes
[ https://issues.apache.org/jira/browse/IMPALA-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8819. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed > BufferedPlanRootSink should handle non-default fetch sizes > -- > > Key: IMPALA-8819 > URL: https://issues.apache.org/jira/browse/IMPALA-8819 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.4.0 > > > As of IMPALA-8780, the {{BufferedPlanRootSink}} returns an error whenever a > client sets the fetch size to a value lower than the {{BATCH_SIZE}}. The > issue is that when reading from a {{RowBatch}} from the queue, the batch > might contain more rows than the number requested by the client. So the > {{BufferedPlanRootSink}} needs to be able to partially read a {{RowBatch}} > and remember the index of the rows it read. Furthermore, {{num_results}} in > {{BufferedPlanRootSink::GetNext}} could be lower than {{BATCH_SIZE}} if the > query results cache in {{ClientRequestState}} has a cache hit (only happens > if the client cursor is reset). > Another issue is that the {{BufferedPlanRootSink}} can only read up to a > single {{RowBatch}} at a time. So if a fetch size larger than {{BATCH_SIZE}} > is specified, only {{BATCH_SIZE}} rows will be written to the given > {{QueryResultSet}}. This is consistent with the legacy behavior of > {{PlanRootSink}} (now {{BlockingPlanRootSink}}), but is not ideal because > that means clients can only read {{BATCH_SIZE}} rows at a time. A higher > fetch size would potentially reduce the number of round-trips necessary > between the client and the coordinator, which could improve fetch performance > (but only if the {{BlockingPlanRootSink}} is capable of filling all the > requested rows). -- This message was sent by Atlassian Jira (v8.3.2#803003)
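The partial-read fix described above amounts to keeping a cursor into the current row batch so a fetch can stop mid-batch and resume where it left off; the same loop also lets one fetch span multiple batches. A hypothetical sketch (names are not Impala's):

```python
class BatchCursor:
    """Reads up to `num_results` rows at a time from a queue of row
    batches, resuming mid-batch via a saved index."""
    def __init__(self, batches):
        self.batches, self.idx = batches, 0

    def get_next(self, num_results):
        out = []
        while self.batches and len(out) < num_results:
            batch = self.batches[0]
            take = min(num_results - len(out), len(batch) - self.idx)
            out.extend(batch[self.idx:self.idx + take])
            self.idx += take
            if self.idx == len(batch):   # batch exhausted: pop and reset index
                self.batches.pop(0)
                self.idx = 0
        return out

cursor = BatchCursor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
first = cursor.get_next(3)    # stops mid-batch
second = cursor.get_next(4)   # resumes at the saved index, spans two batches
```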
[jira] [Created] (IMPALA-8890) DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch
Sahil Takiar created IMPALA-8890: Summary: DCHECK(!page->attached_to_output_batch) in SpillableRowBatchQueue::AddBatch Key: IMPALA-8890 URL: https://issues.apache.org/jira/browse/IMPALA-8890 Project: IMPALA Issue Type: Sub-task Components: Backend Affects Versions: Impala 3.4.0 Reporter: Sahil Takiar Assignee: Sahil Takiar Attachments: impalad.INFO, resolved.txt Full stack: {code} F0823 13:19:42.262142 60340 buffered-tuple-stream.cc:291] 6a4941285b46788d:68021ec6] Check failed: !page->attached_to_output_batch *** Check failure stack trace: *** @ 0x4c987cc google::LogMessage::Fail() @ 0x4c9a071 google::LogMessage::SendToLog() @ 0x4c981a6 google::LogMessage::Flush() @ 0x4c9b76d google::LogMessageFatal::~LogMessageFatal() @ 0x2917f78 impala::BufferedTupleStream::ExpectedPinCount() @ 0x29181ec impala::BufferedTupleStream::UnpinPageIfNeeded() @ 0x291b27b impala::BufferedTupleStream::UnpinStream() @ 0x297d429 impala::SpillableRowBatchQueue::AddBatch() @ 0x25d5537 impala::BufferedPlanRootSink::Send() @ 0x207e94c impala::FragmentInstanceState::ExecInternal() @ 0x207afac impala::FragmentInstanceState::Exec() @ 0x208e854 impala::QueryState::ExecFInstance() @ 0x208cb21 _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv @ 0x2090536 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE @ 0x1e9830b boost::function0<>::operator()() @ 0x23e2d38 impala::Thread::SuperviseThread() @ 0x23eb0bc boost::_bi::list5<>::operator()<>() @ 0x23eafe0 boost::_bi::bind_t<>::operator()() @ 0x23eafa3 boost::detail::thread_data<>::run() @ 0x3bc1629 thread_proxy @ 0x7f920a3786b9 start_thread @ 0x7f9206b5741c clone {code} Happened once while I was running a full table scan of {{tpch_parquet.orders}} via JDBC (client was running on another EC2 machine). This was running on top of IMPALA-8819 with a fetch size of 32768. Attached full logs and mini-dump stack. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IMPALA-8888) Profile fetch performance when result spooling is enabled
Sahil Takiar created IMPALA-8888: Summary: Profile fetch performance when result spooling is enabled Key: IMPALA-8888 URL: https://issues.apache.org/jira/browse/IMPALA-8888 Project: IMPALA Issue Type: Sub-task Reporter: Sahil Takiar Assignee: Sahil Takiar Profile the performance of fetching rows when result spooling is enabled. There are a few queries that can be used to benchmark the performance: {{time ./bin/impala-shell.sh -B -q "select l_orderkey from tpch_parquet.lineitem" > /dev/null}} {{time ./bin/impala-shell.sh -B -q "select * from tpch_parquet.orders" > /dev/null}} The first fetches one column and 6,001,215 rows; the second fetches 9 columns and 1,500,000 rows - so a mix of rows fetched vs. columns fetched. The baseline for the benchmark should be the commit prior to IMPALA-8780. The benchmark should check for both latency and CPU usage (to see if the copy into {{BufferedTupleStream}} has a significant overhead). Various fetch sizes should be used in the benchmark as well to see if increasing the fetch size for result spooling improves performance (ideally it should) (it would be nice to run some fetches between machines as well as that will better reflect network round trip latencies). -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Closed] (IMPALA-8818) Replace deque queue with spillable queue in BufferedPlanRootSink
[ https://issues.apache.org/jira/browse/IMPALA-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar closed IMPALA-8818. Fix Version/s: Impala 3.4.0 Resolution: Fixed > Replace deque queue with spillable queue in BufferedPlanRootSink > > > Key: IMPALA-8818 > URL: https://issues.apache.org/jira/browse/IMPALA-8818 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.4.0 > > > Add a {{SpillableRowBatchQueue}} to replace the {{DequeRowBatchQueue}} in > {{BufferedPlanRootSink}}. The {{SpillableRowBatchQueue}} will wrap a > {{BufferedTupleStream}} and take in a {{TBackendResourceProfile}} created by > {{PlanRootSink#computeResourceProfile}}. > *BufferedTupleStream Usage*: > The wrapped {{BufferedTupleStream}} should be created in 'attach_on_read' > mode so that pages are attached to the output {{RowBatch}} in > {{BufferedTupleStream::GetNext}}. The BTS should start off as pinned (e.g. > all pages are pinned). If a call to {{BufferedTupleStream::AddRow}} returns > false (it returns false if "the unused reservation was not sufficient to add > a new page to the stream large enough to fit 'row' and the stream could not > increase the reservation to get enough unused reservation"), it should unpin > the stream ({{BufferedTupleStream::UnpinStream}}) and then add the row (if > the row still could not be added, then an error must have occurred, perhaps > an IO error, in which case return the error and fail the query). > *Constraining Resources*: > When result spooling is disabled, a user can run a {{select * from > [massive-fact-table]}} and scroll through the results without affecting the > health of the Impala cluster (assuming they close the query promptly). > Impala will stream the results one batch at a time to the user. 
> With result spooling, a naive implementation might try and buffer the entire > fact table, and end up spilling all the contents to disk, which can > potentially take up a large amount of space. So there needs to be > restrictions on the memory and disk space used by the {{BufferedTupleStream}} > in order to ensure a scan of a massive table does not consume all the memory > or disk space of the Impala coordinator. > This problem can be solved by placing a max size on the amount of unpinned > memory (perhaps through a new config option > {{MAX_PINNED_RESULT_SPOOLING_MEMORY}}, maybe set to a few GBs by default). > The max amount of pinned memory should already be constrained by the > reservation (see next paragraph). NUM_ROWS_PRODUCED_LIMIT already limits the > number of rows returned by a query, and so it should limit the number of rows > buffered by the BTS as well (although it is set to 0 by default). > SCRATCH_LIMIT already limits the amount of disk space used for spilling > (although it is set to -1 by default). > The {{PlanRootSink}} should attempt to accurately estimate how much memory it > needs to buffer all results in memory. This requires setting an accurate > value of {{ResourceProfile#memEstimateBytes_}} in > {{PlanRootSink#computeResourceProfile}}. If statistics are available, the > estimate can be based on the number of estimated rows returned multiplied by > the size of the rows returned. The min reservation should account for a read > and write page for the {{BufferedTupleStream}}. -- This message was sent by Atlassian Jira (v8.3.2#803003)
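The AddRow/UnpinStream control flow described above (try pinned, unpin on failure, and treat a second failure as a real error) can be sketched with a toy stream. {{ToyStream}} and {{add_batch}} are illustrative stand-ins, not the real {{BufferedTupleStream}} / {{SpillableRowBatchQueue}} API:

```python
class ToyStream:
    """Stand-in for BufferedTupleStream: add_row() fails once the pinned
    reservation is exhausted; unpinning lifts that limit (pages may spill)."""
    def __init__(self, pinned_capacity):
        self.rows, self.pinned, self.capacity = [], True, pinned_capacity

    def add_row(self, row):
        if self.pinned and len(self.rows) >= self.capacity:
            return False            # not enough unused reservation
        self.rows.append(row)
        return True

    def unpin_stream(self):
        self.pinned = False         # pages may now be written to disk

def add_batch(stream, batch):
    """The AddBatch strategy quoted above: retry each failed row once
    after unpinning; a second failure is a genuine (e.g. IO) error."""
    for row in batch:
        if not stream.add_row(row):
            stream.unpin_stream()
            if not stream.add_row(row):
                raise IOError("row could not be added even after unpinning")

stream = ToyStream(pinned_capacity=2)
add_batch(stream, ["r%d" % i for i in range(5)])
# first two rows fit pinned; the third triggers the unpin, the rest follow
```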
[jira] [Created] (IMPALA-8906) TestObservability.test_query_profile_contains_query_compilation_metadata_load_events is flaky
Sahil Takiar created IMPALA-8906: Summary: TestObservability.test_query_profile_contains_query_compilation_metadata_load_events is flaky Key: IMPALA-8906 URL: https://issues.apache.org/jira/browse/IMPALA-8906 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Sahil Takiar Assignee: Tamas Mate This test failed in a recent run of ubuntu-16.04-dockerised-tests: [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/1100/testReport/junit/query_test.test_observability/TestObservability/test_query_profile_contains_query_compilation_metadata_load_events/] Error Message: {code:java} query_test/test_observability.py:340: in test_query_profile_contains_query_compilation_metadata_load_events self.__verify_profile_event_sequence(load_event_regexes, runtime_profile) query_test/test_observability.py:432: in __verify_profile_event_sequence assert event_regex_index == 0, \ E AssertionError: CatalogFetch.PartitionLists.Misses not in- CatalogFetch.PartitionLists.Hits: 1 E Query (id=56480a470616cf3c:7cfadfbe): E DEBUG MODE WARNING: Query profile created while running a DEBUG build of Impala. Use RELEASE builds to measure query performance. 
E Summary: E Session ID: 854d1d6ab3cb65b7:9ba11e621c088385 E Session Type: BEESWAX E Start Time: 2019-08-28 20:01:05.725329000 E End Time: 2019-08-28 20:01:07.305869000 E Query Type: QUERY E Query State: FINISHED E Query Status: OK E Impala Version: impalad version 3.4.0-SNAPSHOT DEBUG (build 207b1443ff1b116d2d031dc5325ce971af80c4a6) E User: ubuntu E Connected User: ubuntu E Delegated User: E Network Address: 172.18.0.1:44044 E Default Db: default E Sql Statement: select * from functional.alltypes E Coordinator: f6d78aab23cf:22000 E Query Options (set by configuration): DEBUG_ACTION=CRS_BEFORE_ADMISSION:SLEEP@1000,TIMEZONE=Zulu,CLIENT_IDENTIFIER=query_test/test_observability.py::TestObservability::()::test_exec_summary_in_runtime_profile E Query Options (set by configuration and planner): DEBUG_ACTION=CRS_BEFORE_ADMISSION:SLEEP@1000,MT_DOP=0,TIMEZONE=Zulu,CLIENT_IDENTIFIER=query_test/test_observability.py::TestObservability::()::test_exec_summary_in_runtime_profile E Plan: E E Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 E Per-Host Resource Estimates: Memory=160MB E Codegen disabled by planner E Analyzed query: SELECT * FROM functional.alltypes E E F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 E | Per-Host Resources: mem-estimate=490.49KB mem-reservation=0B thread-reservation=1 E PLAN-ROOT SINK E | mem-estimate=0B mem-reservation=0B thread-reservation=0 E | E 01:EXCHANGE [UNPARTITIONED] E | mem-estimate=490.49KB mem-reservation=0B thread-reservation=0 E | tuple-ids=0 row-size=89B cardinality=7.30K E | in pipelines: 00(GETNEXT) E | E F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3 E Per-Host Resources: mem-estimate=160.00MB mem-reservation=32.00KB thread-reservation=2 E 00:SCAN HDFS [functional.alltypes, RANDOM] E HDFS partitions=24/24 files=24 size=478.45KB E stored statistics: E table: rows=7.30K size=478.45KB E partitions: 24/24 rows=7.30K E columns: all E extrapolated-rows=disabled max-scan-range-rows=310 E mem-estimate=160.00MB 
mem-reservation=32.00KB thread-reservation=1 E tuple-ids=0 row-size=89B cardinality=7.30K E in pipelines: 00(GETNEXT) E E Estimated Per-Host Mem: 168274422 E Request Pool: default-pool E Per Host Min Memory Reservation: 6db176633e3a:22000(32.00 KB) bf5c6b4d70c3:22000(32.00 KB) f6d78aab23cf:22000(32.00 KB) E Per Host Number of Fragment Instances: 6db176633e3a:22000(1) bf5c6b4d70c3:22000(1) f6d78aab23cf:22000(2) E Admission result: Admitted immediately E Cluster Memory Admitted: 481.44 MB E Executor Group: default E ExecSummary: E Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail E - E F01:ROOT 1 323.998ms 323.998ms 0 0 E 01:EXCHANGE 1 3.999ms 3.999ms 7.30K 7.30K 776.00 KB 490.49 KB UNPARTITIONED E F00:EXCHANGE SENDER 3 7.999ms 19.999ms 1.55 KB 0 E 00:SCAN HDFS 3 66.666ms
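The assertion that failed above checks that a sequence of event regexes appears in order in the runtime profile. A minimal Python sketch of that ordered-sequence check (simplified from the test's `__verify_profile_event_sequence`; the function name and return convention here are assumptions):

```python
import re

def verify_event_sequence(event_regexes, profile):
    """Return True iff each regex in event_regexes matches some line of the
    profile, and the matches occur in the same order as the regex list."""
    idx = 0  # index of the next regex we expect to see
    for line in profile.splitlines():
        if idx < len(event_regexes) and re.search(event_regexes[idx], line):
            idx += 1
    return idx == len(event_regexes)
```

The flaky failure corresponds to `CatalogFetch.PartitionLists.Misses` never matching after `...Hits`, leaving the expected-regex index non-zero.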
[jira] [Created] (IMPALA-8907) TestResultSpooling.test_slow_query is flaky
Sahil Takiar created IMPALA-8907: Summary: TestResultSpooling.test_slow_query is flaky Key: IMPALA-8907 URL: https://issues.apache.org/jira/browse/IMPALA-8907 Project: IMPALA Issue Type: Bug Reporter: Sahil Takiar Assignee: Sahil Takiar Recently failed in an ubuntu-16.04-dockerised-tests job: [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/1102/testReport/junit/query_test.test_result_spooling/TestResultSpooling/test_slow_query_protocol__beeswax___exec_optionbatch_size___0___num_nodes___0___disable_codegen_rows_threshold___0___disable_codegen___False___abort_on_error___1___exec_single_node_rows_threshold___0table_format__parquet_none_/] Error Message: {code:java} query_test/test_result_spooling.py:172: in test_slow_query assert re.search(get_wait_time_regex, self.client.get_runtime_profile(handle)) \ E assert None is not None E+ where None = ('RowBatchGetWaitTime: [1-9]', 'Query (id=7f47e1d6a1a1c804:492214eb):\n DEBUG MODE WARNING: Query profile created while running a DEBUG buil... - OptimizationTime: 331.998ms\n - PeakMemoryUsage: 1.09 MB (1144320)\n - PrepareTime: 31.999ms\n') E +where = re.search E+and 'Query (id=7f47e1d6a1a1c804:492214eb):\n DEBUG MODE WARNING: Query profile created while running a DEBUG buil... - OptimizationTime: 331.998ms\n - PeakMemoryUsage: 1.09 MB (1144320)\n - PrepareTime: 31.999ms\n' = >() E+ where > = .get_runtime_profile E+where = .client {code} Stacktrace: {code:java} query_test/test_result_spooling.py:172: in test_slow_query assert re.search(get_wait_time_regex, self.client.get_runtime_profile(handle)) \ E assert None is not None E+ where None = ('RowBatchGetWaitTime: [1-9]', 'Query (id=7f47e1d6a1a1c804:492214eb):\n DEBUG MODE WARNING: Query profile created while running a DEBUG buil... - OptimizationTime: 331.998ms\n - PeakMemoryUsage: 1.09 MB (1144320)\n - PrepareTime: 31.999ms\n') E+where = re.search E+and 'Query (id=7f47e1d6a1a1c804:492214eb):\n DEBUG MODE WARNING: Query profile created while running a DEBUG buil... 
- OptimizationTime: 331.998ms\n - PeakMemoryUsage: 1.09 MB (1144320)\n - PrepareTime: 31.999ms\n' = >() E+ where > = .get_runtime_profile E+where = .client {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-8845) Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState
[ https://issues.apache.org/jira/browse/IMPALA-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8845. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed > Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState > > > Key: IMPALA-8845 > URL: https://issues.apache.org/jira/browse/IMPALA-8845 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Assignee: Michael Ho >Priority: Major > Fix For: Impala 3.4.0 > > > While testing IMPALA-8818, I found that IMPALA-8780 does not always cause all > non-coordinator fragments to shut down. In certain setups, TopN queries > ({{select * from [table] order by [col] limit [limit]}}) where all results > are successfully spooled still keep non-coordinator fragments alive. > The issue is that sometimes the {{DATASTREAM SINK}} for the TopN <-- Scan > Node fragment ends up blocking waiting for a response to a {{TransmitData()}} > RPC. This prevents the fragment from shutting down. > I haven't traced the issue exactly, but what I *think* is happening is that > the {{MERGING-EXCHANGE}} operator in the coordinator fragment hits {{eos}} > whenever it has received enough rows to reach the limit defined in the query, > which could occur before the {{DATASTREAM SINK}} sends all the rows from the > TopN / Scan Node fragment. > So the TopN / Scan Node fragments end up hanging until they are explicitly > closed. > The fix is to close the {{ExecNode}} tree in {{FragmentInstanceState}} as > eagerly as possible. Moving the close call to before the call to > {{DataSink::FlushFinal}} fixes the issue. It has the added benefit that it > shuts down and releases all {{ExecNode}} resources as soon as it can. When > result spooling is enabled, this is particularly important because > {{FlushFinal}} might block until the consumer reads all rows. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-8907) TestResultSpooling.test_slow_query is flaky
[ https://issues.apache.org/jira/browse/IMPALA-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8907. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed > TestResultSpooling.test_slow_query is flaky > --- > > Key: IMPALA-8907 > URL: https://issues.apache.org/jira/browse/IMPALA-8907 > Project: IMPALA > Issue Type: Bug >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.4.0 > > > Recently failed in an ubuntu-16.04-dockerised-tests job: > [https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/1102/testReport/junit/query_test.test_result_spooling/TestResultSpooling/test_slow_query_protocol__beeswax___exec_optionbatch_size___0___num_nodes___0___disable_codegen_rows_threshold___0___disable_codegen___False___abort_on_error___1___exec_single_node_rows_threshold___0table_format__parquet_none_/] > Error Message: > {code:java} > query_test/test_result_spooling.py:172: in test_slow_query assert > re.search(get_wait_time_regex, self.client.get_runtime_profile(handle)) \ E > assert None is not None E+ where None = 0x7f0da4115c08>('RowBatchGetWaitTime: [1-9]', 'Query > (id=7f47e1d6a1a1c804:492214eb):\n DEBUG MODE WARNING: Query profile > created while running a DEBUG buil... - OptimizationTime: 331.998ms\n >- PeakMemoryUsage: 1.09 MB (1144320)\n - PrepareTime: > 31.999ms\n') E+where = re.search > E+and 'Query (id=7f47e1d6a1a1c804:492214eb):\n DEBUG MODE > WARNING: Query profile created while running a DEBUG buil... 
- > OptimizationTime: 331.998ms\n - PeakMemoryUsage: 1.09 MB > (1144320)\n - PrepareTime: 31.999ms\n' = BeeswaxConnection.get_runtime_profile of > 0x7f0d94afa7d0>>( 0x7f0d94afffd0>) E+ where BeeswaxConnection.get_runtime_profile of > > > = 0x7f0d94afa7d0>.get_runtime_profile E+where > = > .client > {code} > Stacktrace: > {code:java} > query_test/test_result_spooling.py:172: in test_slow_query > assert re.search(get_wait_time_regex, > self.client.get_runtime_profile(handle)) \ > E assert None is not None > E+ where None = 0x7f0da4115c08>('RowBatchGetWaitTime: [1-9]', 'Query > (id=7f47e1d6a1a1c804:492214eb):\n DEBUG MODE WARNING: Query profile > created while running a DEBUG buil... - OptimizationTime: 331.998ms\n >- PeakMemoryUsage: 1.09 MB (1144320)\n - PrepareTime: > 31.999ms\n') > E+where = re.search > E+and 'Query (id=7f47e1d6a1a1c804:492214eb):\n DEBUG MODE > WARNING: Query profile created while running a DEBUG buil... - > OptimizationTime: 331.998ms\n - PeakMemoryUsage: 1.09 MB > (1144320)\n - PrepareTime: 31.999ms\n' = BeeswaxConnection.get_runtime_profile of > 0x7f0d94afa7d0>>( 0x7f0d94afffd0>) > E+ where > > = 0x7f0d94afa7d0>.get_runtime_profile > E+where at 0x7f0d94afa7d0> = 0x7f0d94af3d50>.client {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IMPALA-8925) Consider replacing ClientRequestState ResultCache with result spooling
Sahil Takiar created IMPALA-8925: Summary: Consider replacing ClientRequestState ResultCache with result spooling Key: IMPALA-8925 URL: https://issues.apache.org/jira/browse/IMPALA-8925 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Sahil Takiar The {{ClientRequestState}} maintains an internal results cache (which is really just a {{QueryResultSet}}) in order to provide support for the {{TFetchOrientation.FETCH_FIRST}} fetch orientation (used by Hue - see [https://github.com/apache/impala/commit/6b769d011d2016a73483f63b311e108d17d9a083]). The cache itself has some limitations: * It caches all results in a {{QueryResultSet}} with limited admission control integration * It has a max size; if the size is exceeded the cache is emptied * It cannot spill to disk Result spooling could potentially replace the query result cache and provide a few benefits; it should be able to fit more rows since it can spill to disk. The memory is better tracked as well since it integrates with both admitted and reserved memory. Hue currently sets the max result set fetch size to the value at [https://github.com/cloudera/hue/blob/master/apps/impala/src/impala/impala_flags.py#L61]; it would be good to check how well that value works for Hue users so we can decide if replacing the current result cache with result spooling makes sense. This would require some changes to result spooling as well: currently it discards rows whenever it reads them from the underlying {{BufferedTupleStream}}. It would need the ability to reset the read cursor, which would require some changes to the {{PlanRootSink}} interface as well. -- This message was sent by Atlassian Jira (v8.3.2#803003)
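The resettable read cursor that FETCH_FIRST would require can be sketched in a few lines of Python (hypothetical names; this is the behavioral contract, not Impala's API — the real change would be to {{PlanRootSink}} and the {{BufferedTupleStream}} read path):

```python
class SpoolingResultSet:
    """Spooled result set that retains rows after reads so the read
    cursor can be reset (instead of discarding rows in attach_on_read mode)."""

    def __init__(self):
        self._rows = []
        self._cursor = 0

    def add(self, row):
        self._rows.append(row)

    def fetch(self, n):
        # Advance the cursor instead of destroying rows as they are read.
        out = self._rows[self._cursor:self._cursor + n]
        self._cursor += len(out)
        return out

    def reset_cursor(self):
        # FETCH_FIRST: restart reading from the beginning.
        self._cursor = 0
```

The trade-off versus the current discard-on-read behavior is retention: rows must stay addressable (in memory or spilled) until the query is closed.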
[jira] [Created] (IMPALA-8924) DCHECK(!closed_) in SpillableRowBatchQueue::IsEmpty
Sahil Takiar created IMPALA-8924: Summary: DCHECK(!closed_) in SpillableRowBatchQueue::IsEmpty Key: IMPALA-8924 URL: https://issues.apache.org/jira/browse/IMPALA-8924 Project: IMPALA Issue Type: Sub-task Components: Backend Affects Versions: Impala 3.4.0 Reporter: Sahil Takiar Assignee: Sahil Takiar When running exhaustive tests with result spooling enabled, there are several impalad crashes with the following stack: {code:java} #0 0x7f5e797541f7 in raise () from /lib64/libc.so.6 #1 0x7f5e797558e8 in abort () from /lib64/libc.so.6 #2 0x04cc5834 in google::DumpStackTraceAndExit() () #3 0x04cbc28d in google::LogMessage::Fail() () #4 0x04cbdb32 in google::LogMessage::SendToLog() () #5 0x04cbbc67 in google::LogMessage::Flush() () #6 0x04cbf22e in google::LogMessageFatal::~LogMessageFatal() () #7 0x029a16cd in impala::SpillableRowBatchQueue::IsEmpty (this=0x13d504e0) at /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/spillable-row-batch-queue.cc:128 #8 0x025f5610 in impala::BufferedPlanRootSink::IsQueueEmpty (this=0x13943000) at /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/buffered-plan-root-sink.h:147 #9 0x025f4e81 in impala::BufferedPlanRootSink::GetNext (this=0x13943000, state=0x13d2a1c0, results=0x173c8520, num_results=-1, eos=0xd30cde1) at /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/buffered-plan-root-sink.cc:158 #10 0x0294ef4d in impala::Coordinator::GetNext (this=0xe4ed180, results=0x173c8520, max_rows=-1, eos=0xd30cde1) at /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/coordinator.cc:683 #11 0x02251043 in impala::ClientRequestState::FetchRowsInternal (this=0xd30c800, max_rows=-1, fetched_rows=0x173c8520) at /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/client-request-state.cc:959 #12 0x022503e7 in impala::ClientRequestState::FetchRows (this=0xd30c800, max_rows=-1, fetched_rows=0x173c8520) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/client-request-state.cc:851 #13 0x0226a36d in impala::ImpalaServer::FetchInternal (this=0x12d14800, request_state=0xd30c800, start_over=false, fetch_size=-1, query_results=0x7f5daf861138) at /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/impala-beeswax-server.cc:582 #14 0x02264970 in impala::ImpalaServer::fetch (this=0x12d14800, query_results=..., query_handle=..., start_over=false, fetch_size=-1) at /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/impala-beeswax-server.cc:188 #15 0x027caf09 in beeswax::BeeswaxServiceProcessor::process_fetch (this=0x12d6fc20, seqid=0, iprot=0x119f5780, oprot=0x119f56c0, callContext=0xdf92060) at /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/BeeswaxService.cpp:3398 #16 0x027c94e6 in beeswax::BeeswaxServiceProcessor::dispatchCall (this=0x12d6fc20, iprot=0x119f5780, oprot=0x119f56c0, fname=..., seqid=0, callContext=0xdf92060) at /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/BeeswaxService.cpp:3200 #17 0x02796f13 in impala::ImpalaServiceProcessor::dispatchCall (this=0x12d6fc20, iprot=0x119f5780, oprot=0x119f56c0, fname=..., seqid=0, callContext=0xdf92060) at /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/ImpalaService.cpp:1824 #18 0x01b3cee4 in apache::thrift::TDispatchProcessor::process (this=0x12d6fc20, in=..., out=..., connectionContext=0xdf92060) at /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/thrift-0.9.3-p7/include/thrift/TDispatchProcessor.h:121 #19 0x01f9bf28 in apache::thrift::server::TAcceptQueueServer::Task::run (this=0xdf92000) at /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/rpc/TAcceptQueueServer.cpp:84 #20 0x01f9166d in impala::ThriftThread::RunRunnable (this=0x116ddfc0, 
runnable=..., promise=0x7f5db0862e90) at /data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/rpc/thrift-thread.cc:74 #21 0x01f92d93 in boost::_mfi::mf2, impala::Promise*>::operator() (this=0x121e7800, p=0x116ddfc0, a1=..., a2=0x7f5db0862e90) at /data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.57.0-p3/include/boost/bind/mem_fn_template.hpp:280 #22 0x01f92c29 in boost::_bi::list3, boost::_bi::value >, boost::_bi::value*> >::operator(), impala::Promise*>, boost::_bi::list0> (this=0x121e7810, f=..., a=...) at
[jira] [Created] (IMPALA-8926) TestResultSpooling::_test_full_queue is flaky
Sahil Takiar created IMPALA-8926: Summary: TestResultSpooling::_test_full_queue is flaky Key: IMPALA-8926 URL: https://issues.apache.org/jira/browse/IMPALA-8926 Project: IMPALA Issue Type: Bug Components: Backend Affects Versions: Impala 3.4.0 Reporter: Sahil Takiar Assignee: Sahil Takiar Has happened a few times, error message is: {code:java} query_test/test_result_spooling.py:116: in test_full_queue_large_fetch self._test_full_queue(vector, query, fetch_size=num_rows) query_test/test_result_spooling.py:148: in _test_full_queue assert re.search(send_wait_time_regex, self.client.get_runtime_profile(handle)) \ E assert None is not None E+ where None = ('RowBatchSendWaitTime: [1-9]', 'Query (id=e948cdd2bbde9430:082830be):\n DEBUG MODE WARNING: Query profile created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: 0.000ns\n') E+where = re.search E +and 'Query (id=e948cdd2bbde9430:082830be):\n DEBUG MODE WARNING: Query profile created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: 0.000ns\n' = >() E+ where > = .get_runtime_profile E+where = .client {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IMPALA-8942) Set file format specific values for split sizes on non-block stores
Sahil Takiar created IMPALA-8942: Summary: Set file format specific values for split sizes on non-block stores Key: IMPALA-8942 URL: https://issues.apache.org/jira/browse/IMPALA-8942 Project: IMPALA Issue Type: Improvement Components: Frontend Reporter: Sahil Takiar Assignee: Sahil Takiar Parquet scans on non-block based storage systems (e.g. S3, ADLS, etc.) can suffer from uneven scan range assignment due to the behavior described in IMPALA-3453. The frontend should set different split sizes depending on the file type and file system. -- This message was sent by Atlassian Jira (v8.3.2#803003)
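The proposed frontend change amounts to a lookup keyed on file format and filesystem. A hedged Python sketch (function name, constants, and the specific values are assumptions for illustration, not Impala's actual defaults — the real logic would live in the Java frontend):

```python
MB = 1024 * 1024

def choose_split_size(file_format, filesystem,
                      default=128 * MB,
                      parquet_object_store=256 * MB):
    """Pick a larger split size for Parquet on non-block object stores
    (e.g. S3/ADLS) to reduce the uneven-assignment effect of IMPALA-3453."""
    if file_format == "parquet" and filesystem in ("s3", "adls"):
        return parquet_object_store
    return default
```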
[jira] [Created] (IMPALA-8934) Add failpoint tests to result spooling code
Sahil Takiar created IMPALA-8934: Summary: Add failpoint tests to result spooling code Key: IMPALA-8934 URL: https://issues.apache.org/jira/browse/IMPALA-8934 Project: IMPALA Issue Type: Sub-task Affects Versions: Impala 3.2.0 Reporter: Sahil Takiar Assignee: Sahil Takiar IMPALA-8924 was discovered while running {{test_failpoints.py}} with result spooling enabled. The goal of this JIRA is to add similar failpoint coverage to {{test_result_spooling.py}} so that we have sufficient coverage for the various failure paths when result spooling is enabled. The failure paths that should be covered include: * Failures while executing the exec tree should be handled correctly -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-7312) Non-blocking mode for Fetch() RPC
[ https://issues.apache.org/jira/browse/IMPALA-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-7312. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed > Non-blocking mode for Fetch() RPC > - > > Key: IMPALA-7312 > URL: https://issues.apache.org/jira/browse/IMPALA-7312 > Project: IMPALA > Issue Type: Sub-task > Components: Clients >Reporter: Tim Armstrong >Assignee: Sahil Takiar >Priority: Major > Labels: resource-management > Fix For: Impala 3.4.0 > > > Currently Fetch() can block for an arbitrary amount of time until a batch of > rows is produced. It might be helpful to have a mode where it returns quickly > when there is no data available, so that threads and RPC slots are not tied > up. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-8825) Add additional counters to PlanRootSink
[ https://issues.apache.org/jira/browse/IMPALA-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8825. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed > Add additional counters to PlanRootSink > --- > > Key: IMPALA-8825 > URL: https://issues.apache.org/jira/browse/IMPALA-8825 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.4.0 > > > The current entry in the runtime profile for {{PLAN_ROOT_SINK}} does not > contain much useful information: > {code:java} > PLAN_ROOT_SINK:(Total: 234.996ms, non-child: 234.996ms, % non-child: 100.00%) > - PeakMemoryUsage: 0{code} > There are several additional counters we could add to the {{PlanRootSink}} > (either the {{BufferedPlanRootSink}} or {{BlockingPlanRootSink}}): > * Amount of time spent blocking inside the {{PlanRootSink}} - both the time > spent by the client thread waiting for rows to become available and the time > spent by the impala thread waiting for the client to consume rows > ** So similar to the {{RowBatchQueueGetWaitTime}} and > {{RowBatchQueuePutWaitTime}} inside the scan nodes > ** The difference between these counters and the ones in > {{ClientRequestState}} (e.g. {{ClientFetchWaitTimer}} and > {{RowMaterializationTimer}}) should be documented > * For {{BufferedPlanRootSink}} there are already several {{Buffer pool}} > counters, we should make sure they are exposed in the {{PLAN_ROOT_SINK}} > section > * Track the number of rows sent (e.g. 
rows sent to {{PlanRootSink::Send}} > and the number of rows fetched (might need to be tracked in the > {{ClientRequestState}}) > ** For {{BlockingPlanRootSink}} the sent and fetched values should be pretty > much the same, but for {{BufferedPlanRootSink}} this is more useful > ** Similar to {{RowsReturned}} in each exec node > * The rate at which rows are sent and fetched > ** Should be useful when attempting to debug the perf of fetching rows (e.g. > if the send rate is much higher than the fetch rate, then maybe there is > something wrong with the client) > ** Similar to {{RowsReturnedRate}} in each exec node > Open to other suggestions for counters that folks think are useful. -- This message was sent by Atlassian Jira (v8.3.2#803003)
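The two proposed wait-time counters can be sketched as timing wrappers around the blocking calls (a Python illustration with assumed names modeled on {{RowBatchQueueGetWaitTime}}/{{RowBatchQueuePutWaitTime}}; Impala's counters are C++ {{RuntimeProfile}} counters):

```python
import time

class WaitCounters:
    """Accumulate time the client thread waits for rows (get) and time the
    impalad thread waits for the client to consume rows (send)."""

    def __init__(self):
        self.row_batch_get_wait_ns = 0
        self.row_batch_send_wait_ns = 0

    def time_get_wait(self, wait_fn):
        start = time.monotonic_ns()
        result = wait_fn()  # blocks until rows are available
        self.row_batch_get_wait_ns += time.monotonic_ns() - start
        return result

    def time_send_wait(self, wait_fn):
        start = time.monotonic_ns()
        result = wait_fn()  # blocks until the client frees queue space
        self.row_batch_send_wait_ns += time.monotonic_ns() - start
        return result
```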
[jira] [Resolved] (IMPALA-8818) Replace deque queue with spillable queue in BufferedPlanRootSink
[ https://issues.apache.org/jira/browse/IMPALA-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8818. -- Resolution: Fixed > Replace deque queue with spillable queue in BufferedPlanRootSink > > > Key: IMPALA-8818 > URL: https://issues.apache.org/jira/browse/IMPALA-8818 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.4.0 > > > Add a {{SpillableRowBatchQueue}} to replace the {{DequeRowBatchQueue}} in > {{BufferedPlanRootSink}}. The {{SpillableRowBatchQueue}} will wrap a > {{BufferedTupleStream}} and take in a {{TBackendResourceProfile}} created by > {{PlanRootSink#computeResourceProfile}}. > *BufferedTupleStream Usage*: > The wrapped {{BufferedTupleStream}} should be created in 'attach_on_read' > mode so that pages are attached to the output {{RowBatch}} in > {{BufferedTupleStream::GetNext}}. The BTS should start off as pinned (i.e. > all pages are pinned). If a call to {{BufferedTupleStream::AddRow}} returns > false (it returns false if "the unused reservation was not sufficient to add > a new page to the stream large enough to fit 'row' and the stream could not > increase the reservation to get enough unused reservation"), it should unpin > the stream ({{BufferedTupleStream::UnpinStream}}) and then add the row (if > the row still could not be added, then an error must have occurred, perhaps > an IO error, in which case return the error and fail the query). > *Constraining Resources*: > When result spooling is disabled, a user can run a {{select * from > [massive-fact-table]}} and scroll through the results without affecting the > health of the Impala cluster (assuming they close the query promptly). > Impala will stream the results one batch at a time to the user. 
> With result spooling, a naive implementation might try to buffer the entire > fact table, and end up spilling all the contents to disk, which can > potentially take up a large amount of space. So there need to be > restrictions on the memory and disk space used by the {{BufferedTupleStream}} > in order to ensure a scan of a massive table does not consume all the memory > or disk space of the Impala coordinator. > This problem can be solved by placing a max size on the amount of unpinned > memory (perhaps through a new config option > {{MAX_PINNED_RESULT_SPOOLING_MEMORY}}, maybe set to a few GBs by default). > The max amount of pinned memory should already be constrained by the > reservation (see next paragraph). NUM_ROWS_PRODUCED_LIMIT already limits the > number of rows returned by a query, and so it should limit the number of rows > buffered by the BTS as well (although it is set to 0 by default). > SCRATCH_LIMIT already limits the amount of disk space used for spilling > (although it is set to -1 by default). > The {{PlanRootSink}} should attempt to accurately estimate how much memory it > needs to buffer all results in memory. This requires setting an accurate > value of {{ResourceProfile#memEstimateBytes_}} in > {{PlanRootSink#computeResourceProfile}}. If statistics are available, the > estimate can be based on the number of estimated rows returned multiplied by > the size of the rows returned. The min reservation should account for a read > and write page for the {{BufferedTupleStream}}. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-8779) Add RowBatchQueue interface with an implementation backed by a std::queue
[ https://issues.apache.org/jira/browse/IMPALA-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8779. -- Resolution: Won't Fix Marking this as 'Won't Fix' for now. There does not seem to be a strong need to add this in right now, given that there is no other use case for a generic {{RowBatch}} queue. The one used in the scan nodes has some unique requirements and re-factoring it to use a generic interface does not seem worth it. We can re-visit this later if we find a stronger use case for it. > Add RowBatchQueue interface with an implementation backed by a std::queue > - > > Key: IMPALA-8779 > URL: https://issues.apache.org/jira/browse/IMPALA-8779 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > Add a {{RowBatchQueue}} interface with an implementation backed by a > {{std::queue}}. Introducing a generic queue that can buffer {{RowBatch}}-es > will help with the implementation of {{BufferedPlanRootSink}}. Rather than > tie the {{BufferedPlanRootSink}} to a specific method of queuing row batches, > we can use an interface. In future patches, a {{RowBatchQueue}} backed by a > {{BufferedTupleStream}} can easily be switched out in > {{BufferedPlanRootSink}}. > We should consider re-factoring the existing {{RowBatchQueue}} to use the new > interface. The KRPC receiver does some buffering of {{RowBatch}}-es as well > which might benefit from the new RowBatchQueue interface, and some more KRPC > buffering might be added in IMPALA-6692. -- This message was sent by Atlassian Jira (v8.3.2#803003)
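The interface proposed (and ultimately not pursued) in this issue is small; a Python sketch of its shape, with a deque-backed implementation standing in for {{DequeRowBatchQueue}} (names are illustrative — the real interface would be a C++ abstract class):

```python
from abc import ABC, abstractmethod
from collections import deque

class RowBatchQueue(ABC):
    """Generic queue of row batches; a BufferedTupleStream-backed
    implementation could later be swapped in behind the same methods."""

    @abstractmethod
    def add_batch(self, batch): ...

    @abstractmethod
    def get_batch(self): ...

    @abstractmethod
    def is_empty(self): ...

class DequeRowBatchQueue(RowBatchQueue):
    """In-memory FIFO implementation backed by collections.deque."""

    def __init__(self):
        self._q = deque()

    def add_batch(self, batch):
        self._q.append(batch)

    def get_batch(self):
        # Return None when empty rather than raising, mirroring a
        # non-blocking dequeue.
        return self._q.popleft() if self._q else None

    def is_empty(self):
        return not self._q
```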
[jira] [Created] (IMPALA-8949) PlannerTest differences when running on S3 vs HDFS
Sahil Takiar created IMPALA-8949: Summary: PlannerTest differences when running on S3 vs HDFS Key: IMPALA-8949 URL: https://issues.apache.org/jira/browse/IMPALA-8949 Project: IMPALA Issue Type: Bug Components: Frontend Reporter: Sahil Takiar While re-enabling the {{S3PlannerTest}} in IMPALA-8944, there are several tests that are consistently failing due to actual diffs in the explain plan: * org.apache.impala.planner.S3PlannerTest.testTpcds * org.apache.impala.planner.S3PlannerTest.testTpch * org.apache.impala.planner.S3PlannerTest.testJoinOrder * org.apache.impala.planner.S3PlannerTest.testSubqueryRewrite All are failing for non-trivial reasons - e.g. differences in memory estimates, join orders, etc. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IMPALA-8950) Add -d and -f option to copyFromLocal and re-enable disabled S3 tests
Sahil Takiar created IMPALA-8950: Summary: Add -d and -f option to copyFromLocal and re-enable disabled S3 tests Key: IMPALA-8950 URL: https://issues.apache.org/jira/browse/IMPALA-8950 Project: IMPALA Issue Type: Test Reporter: Sahil Takiar Assignee: Sahil Takiar The {{-d}} option for {{hdfs dfs -copyFromLocal}} "Skip[s] creation of temporary file with the suffix ._COPYING_". The {{-f}} option "Overwrites the destination if it already exists". By using the {{-d}} option, copies to S3 avoid the additional overhead of copying data to a tmp file and then renaming the file. The {{-f}} option overwrites the file if it exists, which should be safe since tests should be writing to unique directories anyway. With HADOOP-16490, {{create(overwrite=true)}} avoids issuing a HEAD request on the path, which prevents any cached 404s on the S3 key. After these changes, the tests disabled by IMPALA-8189 can be re-enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-8944) Update and re-enable S3PlannerTest
Sahil Takiar created IMPALA-8944: Summary: Update and re-enable S3PlannerTest Key: IMPALA-8944 URL: https://issues.apache.org/jira/browse/IMPALA-8944 Project: IMPALA Issue Type: Test Reporter: Sahil Takiar Assignee: Sahil Takiar It looks like we don't run {{S3PlannerTest}} in our regular Jenkins jobs. When run against an HDFS mini-cluster, they are skipped because {{TARGET_FILESYSTEM}} is not S3. On our S3 jobs, they don't run either because we skip all fe/ tests (most of them don't work against S3 / assume they are running on HDFS). A few things need to be fixed to get this working: * The test cases in {{S3PlannerTest}} need to be fixed * The Jenkins job that runs the S3 tests needs the ability to run specific fe/ tests (e.g. run just the {{S3PlannerTest}} and skip the rest) -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (IMPALA-8803) Coordinator should release admitted memory per-backend rather than per-query
[ https://issues.apache.org/jira/browse/IMPALA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8803. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed > Coordinator should release admitted memory per-backend rather than per-query > > > Key: IMPALA-8803 > URL: https://issues.apache.org/jira/browse/IMPALA-8803 > Project: IMPALA > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.4.0 > > > When {{SPOOL_QUERY_RESULTS}} is true, the coordinator backend may be long > lived, even though all other backends for the query have completed. > Currently, the Coordinator only releases admitted memory when the entire > query has completed (including the coordinator fragment) - > https://github.com/apache/impala/blob/72c9370856d7436885adbee3e8da7e7d9336df15/be/src/runtime/coordinator.cc#L562 > In order to more aggressively return admitted memory, the coordinator should > release memory when each backend for a query completes, rather than waiting > for the entire query to complete. > Releasing memory per backend should be batched because releasing admitted > memory in the admission controller requires obtaining a global lock and > refreshing the internal stats of the admission controller. Batching will help > mitigate any additional overhead from releasing admitted memory per backend. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IMPALA-8845) Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState
Sahil Takiar created IMPALA-8845: Summary: Close ExecNode tree prior to calling FlushFinal in FragmentInstanceState Key: IMPALA-8845 URL: https://issues.apache.org/jira/browse/IMPALA-8845 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar While testing IMPALA-8818, I found that IMPALA-8780 does not always cause all non-coordinator fragments to shut down. In certain setups, TopN queries ({{select * from [table] order by [col] limit [limit]}}) where all results are successfully spooled still keep non-coordinator fragments alive. The issue is that sometimes the {{DATASTREAM SINK}} for the TopN <-- Scan Node fragment ends up blocking waiting for a response to a {{TransmitData()}} RPC. This prevents the fragment from shutting down. I haven't traced the issue exactly, but what I *think* is happening is that the {{MERGING-EXCHANGE}} operator in the coordinator fragment hits {{eos}} whenever it has received enough rows to reach the limit defined in the query, which could occur before the {{DATASTREAM SINK}} sends all the rows from the TopN / Scan Node fragment. So the TopN / Scan Node fragments end up hanging until they are explicitly closed. The fix is to close the {{ExecNode}} tree in {{FragmentInstanceState}} as eagerly as possible. Moving the close call to before the call to {{DataSink::FlushFinal}} fixes the issue. It has the added benefit that it shuts down and releases all {{ExecNode}} resources as soon as it can. When result spooling is enabled, this is particularly important because {{FlushFinal}} might block until the consumer reads all rows. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
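The ordering fix can be sketched as follows. This is a toy Python model, not the actual C++ in {{FragmentInstanceState}}; it just records the teardown order to show the exec tree closing before the potentially blocking {{FlushFinal}}:

```python
class FragmentInstanceModel:
    """Toy model of the fix: 'events' records the order of teardown steps."""
    def __init__(self):
        self.events = []

    def close_exec_tree(self):
        # Releases ExecNode resources (scanner threads, buffers) eagerly.
        self.events.append("exec_tree_closed")

    def flush_final(self):
        # With result spooling, this can block until the client fetches
        # every row, so it must come after the exec tree is closed.
        self.events.append("flush_final")

    def run(self):
        # The fix: close the tree *before* FlushFinal, not after.
        self.close_exec_tree()
        self.flush_final()
```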
[jira] [Resolved] (IMPALA-8784) Implement a RowBatchQueue backed by a BufferedTupleStream
[ https://issues.apache.org/jira/browse/IMPALA-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8784. -- Resolution: Duplicate > Implement a RowBatchQueue backed by a BufferedTupleStream > - > > Key: IMPALA-8784 > URL: https://issues.apache.org/jira/browse/IMPALA-8784 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > The {{BufferedPlanRootSink}} should use a {{RowBatchQueue}} backed by a > {{BufferedTupleStream}}. This requires the following changes: > * Creating a new {{SpillableRowBatchQueue}} that implements > {{RowBatchQueue}} and internally uses a {{BufferedTupleStream}} > * Changing the implementation of {{RowBatchQueue}} used by > {{BufferedPlanRootSink}} to {{SpillableRowBatchQueue}} > * Update {{PlanRootSink.java}} so that it sets a {{ResourceProfile}} that > should be used by the {{BufferedPlanRootSink}} > * Update {{DataSinks.thrift}} so that it passes {{ResourceProfile}}-s from > the fe/ to the be/ > * {{BufferedPlanRootSink}} should Initialize and close a > {{ReservationManager}} to be used by the {{BufferedTupleStream}} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IMPALA-8818) Replace deque queue with spillable queue in BufferedPlanRootSink
Sahil Takiar created IMPALA-8818: Summary: Replace deque queue with spillable queue in BufferedPlanRootSink Key: IMPALA-8818 URL: https://issues.apache.org/jira/browse/IMPALA-8818 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar Add a {{SpillableRowBatchQueue}} to replace the {{DequeRowBatchQueue}} in {{BufferedPlanRootSink}}. The {{SpillableRowBatchQueue}} will wrap a {{BufferedTupleStream}} and take in a {{TBackendResourceProfile}} created by {{PlanRootSink#computeResourceProfile}}. *BufferedTupleStream Usage*: The wrapped {{BufferedTupleStream}} should be created in 'attach_on_read' mode so that pages are attached to the output {{RowBatch}} in {{BufferedTupleStream::GetNext}}. The BTS should start off as pinned (e.g. all pages are pinned). If a call to {{BufferedTupleStream::AddRow}} returns false (it returns false if "the unused reservation was not sufficient to add a new page to the stream large enough to fit 'row' and the stream could not increase the reservation to get enough unused reservation"), it should unpin the stream ({{BufferedTupleStream::UnpinStream}}) and then add the row (if the row still could not be added, then an error must have occurred, perhaps an IO error, in which case return the error and fail the query). *Constraining Resources*: When result spooling is disabled, a user can run a {{select * from [massive-fact-table]}} and scroll through the results without affecting the health of the Impala cluster (assuming they close the query promptly). Impala will stream the results one batch at a time to the user. With result spooling, a naive implementation might try to buffer the entire fact table, and end up spilling all the contents to disk, which can potentially take up a large amount of space. 
So there need to be restrictions on the memory and disk space used by the {{BufferedTupleStream}} in order to ensure a scan of a massive table does not consume all the memory or disk space of the Impala coordinator. This problem can be solved by placing a cap on the amount of unpinned memory, perhaps through a new config option {{MAX_PINNED_RESULT_SPOOLING_MEMORY}} (maybe set to a few GBs by default). The max amount of pinned memory should already be constrained by the reservation (see next paragraph). NUM_ROWS_PRODUCED_LIMIT already limits the number of rows returned by a query, and so it should limit the number of rows buffered by the BTS as well (although it is set to 0 by default). SCRATCH_LIMIT already limits the amount of disk space used for spilling (although it is set to -1 by default). The {{PlanRootSink}} should attempt to accurately estimate how much memory it needs to buffer all results in memory. This requires setting an accurate value of {{ResourceProfile#memEstimateBytes_}} in {{PlanRootSink#computeResourceProfile}}. If statistics are available, the estimate can be based on the number of estimated rows returned multiplied by the size of the rows returned. The min reservation should account for a read and write page for the {{BufferedTupleStream}}. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
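The AddRow-then-unpin fallback described above can be sketched as a toy model. This is illustrative Python, not Impala's {{BufferedTupleStream}}; "pinned capacity" stands in for the buffer reservation:

```python
class SpillableQueueModel:
    """Toy model of the fallback: try to add a row while fully pinned; if
    that fails for lack of reservation, unpin (allowing pages to spill to
    disk) and retry once."""
    def __init__(self, pinned_capacity):
        self.pinned_capacity = pinned_capacity  # rows we can hold pinned
        self.pinned = True
        self.rows = []

    def _try_add(self, row):
        if self.pinned and len(self.rows) >= self.pinned_capacity:
            return False  # no unused reservation for another pinned page
        self.rows.append(row)
        return True

    def unpin(self):
        # Stand-in for BufferedTupleStream::UnpinStream: pages may now
        # spill, so the pinned-capacity limit no longer applies.
        self.pinned = False

    def add_row(self, row):
        if self._try_add(row):
            return "added"
        self.unpin()
        if self._try_add(row):
            return "added-after-unpin"
        # In the real code this is a query-failing error (e.g. an IO error).
        raise IOError("could not add row even after unpinning")
```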
[jira] [Created] (IMPALA-8819) BufferedPlanRootSink should handle non-default fetch sizes
Sahil Takiar created IMPALA-8819: Summary: BufferedPlanRootSink should handle non-default fetch sizes Key: IMPALA-8819 URL: https://issues.apache.org/jira/browse/IMPALA-8819 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar As of IMPALA-8780, the {{BufferedPlanRootSink}} returns an error whenever a client sets the fetch size to a value lower than the {{BATCH_SIZE}}. The issue is that when reading a {{RowBatch}} from the queue, the batch might contain more rows than the number requested by the client. So the {{BufferedPlanRootSink}} needs to be able to partially read a {{RowBatch}} and remember the index of the rows it read. Furthermore, {{num_results}} in {{BufferedPlanRootSink::GetNext}} could be lower than {{BATCH_SIZE}} if the query results cache in {{ClientRequestState}} has a cache hit (only happens if the client cursor is reset). Another issue is that the {{BufferedPlanRootSink}} can only read up to a single {{RowBatch}} at a time. So if a fetch size larger than {{BATCH_SIZE}} is specified, only {{BATCH_SIZE}} rows will be written to the given {{QueryResultSet}}. This is consistent with the legacy behavior of {{PlanRootSink}} (now {{BlockingPlanRootSink}}), but is not ideal because that means clients can only read {{BATCH_SIZE}} rows at a time. A higher fetch size would potentially reduce the number of round-trips necessary between the client and the coordinator, which could improve fetch performance (but only if the {{BlockingPlanRootSink}} is capable of filling all the requested rows). -- This message was sent by Atlassian JIRA (v7.6.14#76016)
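The partial-read idea can be sketched as a toy cursor over a batch. Again, this is illustrative Python (names are made up), not the C++ fix:

```python
class PartialBatchCursor:
    """Toy model of the fix: remember the read index into the current
    RowBatch so a fetch size smaller than BATCH_SIZE doesn't drop rows."""
    def __init__(self, batch):
        self.batch = batch
        self.idx = 0

    def fetch(self, num_rows):
        # Hand out at most num_rows, resuming from where we left off.
        out = self.batch[self.idx:self.idx + num_rows]
        self.idx += len(out)
        return out

    def exhausted(self):
        return self.idx >= len(self.batch)
```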
[jira] [Resolved] (IMPALA-8780) Implementation of BufferedPlanRootSink where FlushFinal blocks until all rows are fetched
[ https://issues.apache.org/jira/browse/IMPALA-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8780. -- Resolution: Fixed Fix Version/s: Impala 3.3.0 Closing as fixed. We ended up deferring the complete re-factoring of IMPALA-8779 to a later patch. > Implementation of BufferedPlanRootSink where FlushFinal blocks until all rows > are fetched > - > > Key: IMPALA-8780 > URL: https://issues.apache.org/jira/browse/IMPALA-8780 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.3.0 > > > Implement {{BufferedPlanRootSink}} so that {{FlushFinal}} blocks until all > rows are fetched. The implementation should use the {{RowBatchQueue}} > introduced by IMPALA-8779. By blocking in {{FlushFinal}} all non-coordinator > fragments will be closed if all results fit in the {{RowBatchQueue}}. > {{BufferedPlanRootSink::Send}} should enqueue each given {{RowBatch}} onto > the queue and then return. If the queue is full, it should block until there > is more space left in the queue. {{BufferedPlanRootSink::GetNext}} reads from > the queue and then fills in the given {{QueryResultSet}} by using the > {{DataSink}} {{ScalarExprEvaluator}}-s. Since the producer thread can call > {{BufferedPlanRootSink::Close}} while the consumer is calling > {{BufferedPlanRootSink::GetNext}} the two methods need to be synchronized so > that the {{DataSink}} {{MemTracker}}-s are not closed while {{GetNext}} is > running. > The implementation of {{BufferedPlanRootSink}} should remain the same > regardless of whether a {{std::queue}} backed {{RowBatchQueue}} or a > {{BufferedTupleStream}} backed {{RowBatchQueue}} is used. 
> {{BufferedPlanRootSink}} and {{BlockingPlanRootSink}} are similar in the > sense that {{BlockingPlanRootSink}} buffers one {{RowBatch}}, so for queries > that return under 1024 rows, all non-coordinator fragments are closed > immediately as well. The advantage of {{BufferedPlanRootSink}} is that it > allows buffering of 1+ {{RowBatch}}-es. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
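The Send/GetNext/Close synchronization described above can be sketched as a toy bounded queue. This is illustrative Python with a shared lock and condition variables, not the C++ implementation:

```python
import threading
from collections import deque

class BufferedPlanRootSinkModel:
    """Toy model of the synchronization: Send(), GetNext() and Close()
    share one lock, so the sink can't be torn down mid-read."""
    def __init__(self, capacity):
        self._lock = threading.Lock()
        self._not_full = threading.Condition(self._lock)
        self._not_empty = threading.Condition(self._lock)
        self._queue = deque()
        self._capacity = capacity
        self._closed = False

    def send(self, batch):
        # Producer: block while the queue is full.
        with self._not_full:
            while len(self._queue) >= self._capacity and not self._closed:
                self._not_full.wait()
            if not self._closed:
                self._queue.append(batch)
                self._not_empty.notify()

    def get_next(self):
        # Consumer: return the next batch, or None once closed and drained.
        with self._not_empty:
            while not self._queue and not self._closed:
                self._not_empty.wait()
            if self._queue:
                batch = self._queue.popleft()
                self._not_full.notify()
                return batch
            return None

    def close(self):
        with self._lock:
            self._closed = True
            self._not_empty.notify_all()
            self._not_full.notify_all()
```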
[jira] [Created] (IMPALA-8803) Coordinator should release admitted memory per-backend rather than per-query
Sahil Takiar created IMPALA-8803: Summary: Coordinator should release admitted memory per-backend rather than per-query Key: IMPALA-8803 URL: https://issues.apache.org/jira/browse/IMPALA-8803 Project: IMPALA Issue Type: Sub-task Reporter: Sahil Takiar Assignee: Sahil Takiar When {{SPOOL_QUERY_RESULTS}} is true, the coordinator backend may be long lived, even though all other backends for the query have completed. Currently, the Coordinator only releases admitted memory when the entire query has completed (including the coordinator fragment) - https://github.com/apache/impala/blob/72c9370856d7436885adbee3e8da7e7d9336df15/be/src/runtime/coordinator.cc#L562 In order to more aggressively return admitted memory, the coordinator should release memory when each backend for a query completes, rather than waiting for the entire query to complete. Releasing memory per backend should be batched because releasing admitted memory in the admission controller requires obtaining a global lock and refreshing the internal stats of the admission controller. Batching will help mitigate any additional overhead from releasing admitted memory per backend. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
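The batching idea can be sketched as a toy model. This is illustrative Python (class and method names are made up), counting global-lock acquisitions to show why batching helps:

```python
import threading

class AdmissionControllerModel:
    """Toy model: releasing admitted memory requires a global lock, so we
    count acquisitions to show the benefit of batching releases."""
    def __init__(self):
        self._lock = threading.Lock()
        self.admitted_bytes = 0
        self.lock_acquisitions = 0

    def release(self, nbytes):
        with self._lock:
            self.lock_acquisitions += 1
            self.admitted_bytes -= nbytes

class CoordinatorReleaser:
    """Accumulates completed backends and releases their memory in groups
    instead of taking the global lock once per backend."""
    def __init__(self, ac, batch_size=4):
        self.ac = ac
        self.batch_size = batch_size
        self.pending = []

    def backend_completed(self, admitted_bytes):
        self.pending.append(admitted_bytes)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.ac.release(sum(self.pending))
            self.pending = []
```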
[jira] [Resolved] (IMPALA-8781) Add additional tests in test_result_spooling.py and validate cancellation logic
[ https://issues.apache.org/jira/browse/IMPALA-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8781. -- Resolution: Fixed Fix Version/s: Impala 3.3.0 Commit Hash: bbec8fa74961755269298706302477780019e7d5 IMPALA-8781: Result spooling tests to cover edge cases and cancellation Adds additional tests to test_result_spooling.py to cover various edge cases when fetching query results (ensure all Impala types are returned properly, UDFs are evaluated correctly, etc.). A new QueryTest file result-spooling.test is added to encapsulate all these tests. Tests with a decreased ROW_BATCH_SIZE are added as well to validate that BufferedPlanRootSink buffers row batches correctly. BufferedPlanRootSink requires careful synchronization of the producer and consumer threads, especially when queries are cancelled. The TestResultSpoolingCancellation class is dedicated to running cancellation tests with SPOOL_QUERY_RESULTS = true. The implementation is heavily borrowed from test_cancellation.py and some of the logic is re-factored into a new utility class called cancel_utils.py to avoid code duplication between test_cancellation.py and test_result_spooling.py. Testing: * Looped test_result_spooling.py overnight with no failures * Core tests passed Change-Id: Ib3b3a1539c4a5fa9b43c8ca315cea16c9701e283 Reviewed-on: http://gerrit.cloudera.org:8080/13907 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Add additional tests in test_result_spooling.py and validate cancellation > logic > --- > > Key: IMPALA-8781 > URL: https://issues.apache.org/jira/browse/IMPALA-8781 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.3.0 > > > {{test_result_spooling.py}} currently runs a few basic tests with result > spooling enabled. 
We should add some more to cover all necessary edge cases > (ensure all Impala types are returned correctly, UDFs are evaluated > correctly, etc.) and add tests to validate the cancellation logic in > {{PlanRootSink}}. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IMPALA-8779) Add RowBatchQueue interface with an implementation backed by a std::queue
Sahil Takiar created IMPALA-8779: Summary: Add RowBatchQueue interface with an implementation backed by a std::queue Key: IMPALA-8779 URL: https://issues.apache.org/jira/browse/IMPALA-8779 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar Add a {{RowBatchQueue}} interface with an implementation backed by a {{std::queue}}. Introducing a generic queue that can buffer {{RowBatch}}es will help with the implementation of {{BufferedTupleSink}}. Rather than tie the {{BufferedTupleSink}} to a specific method of queuing row batches, we can use an interface. In future patches, a {{RowBatchQueue}} backed by a {{BufferedTupleStream}} can easily be swapped into {{BufferedTupleSink}}. We should consider re-factoring the existing {{RowBatchQueue}} to use the new interface as well. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
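The interface-plus-implementation split can be sketched as follows. This is a toy Python model of the idea (the real interface is C++); a BufferedTupleStream-backed implementation would later plug in behind the same interface:

```python
from abc import ABC, abstractmethod
from collections import deque

class RowBatchQueue(ABC):
    """Toy model of the proposed interface."""
    @abstractmethod
    def add_batch(self, batch): ...
    @abstractmethod
    def get_batch(self): ...
    @abstractmethod
    def is_empty(self): ...

class DequeRowBatchQueue(RowBatchQueue):
    """Stand-in for the std::queue-backed implementation."""
    def __init__(self):
        self._q = deque()
    def add_batch(self, batch):
        self._q.append(batch)
    def get_batch(self):
        # Returns None when empty rather than blocking.
        return self._q.popleft() if self._q else None
    def is_empty(self):
        return not self._q
```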
[jira] [Created] (IMPALA-8780) Implementation of BufferedPlanRootSink where FlushFinal blocks until all rows are fetched
Sahil Takiar created IMPALA-8780: Summary: Implementation of BufferedPlanRootSink where FlushFinal blocks until all rows are fetched Key: IMPALA-8780 URL: https://issues.apache.org/jira/browse/IMPALA-8780 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar Implement {{BufferedPlanRootSink}} so that {{FlushFinal}} blocks until all rows are fetched. The implementation should use the {{RowBatchQueue}} introduced by IMPALA-8779. By blocking in {{FlushFinal}} all non-coordinator fragments will be closed if all results fit in the {{RowBatchQueue}}. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IMPALA-8786) BufferedPlanRootSink should directly write to a QueryResultSet if one is available
Sahil Takiar created IMPALA-8786: Summary: BufferedPlanRootSink should directly write to a QueryResultSet if one is available Key: IMPALA-8786 URL: https://issues.apache.org/jira/browse/IMPALA-8786 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar {{BufferedPlanRootSink}} uses a {{RowBatchQueue}} to buffer {{RowBatch}}-es and then the consumer thread reads them and writes them to a given {{QueryResultSet}}. Implementations of {{RowBatchQueue}} might end up copying the buffered {{RowBatch}}-es (e.g. if the queue is backed by a {{BufferedTupleStream}}). An optimization would be for the producer thread to directly write to the consumer {{QueryResultSet}}. This optimization would only be triggered if (1) the queue is empty, and (2) the consumer thread has a {{QueryResultSet}} available for writing. This "fast path" is useful in a few different scenarios: * If the consumer is faster at reading rows than the producer is at sending them; in this case, the overhead of buffering rows in a {{RowBatchQueue}} can be completely avoided * For queries that return under 1024 rows, it's likely that the consumer will produce a {{QueryResultSet}} before the first {{RowBatch}} is returned (except perhaps for very trivial queries) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
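The proposed fast path can be sketched as a toy model (illustrative Python, not the C++ sink; synchronization is omitted for brevity):

```python
class FastPathSinkModel:
    """Toy model: if the queue is empty and the consumer has a result set
    waiting, write rows straight into it and skip the queue (and any copy
    the queue would make)."""
    def __init__(self):
        self.queue = []
        self.pending_result_set = None  # set by the consumer when it blocks

    def send(self, rows):
        if not self.queue and self.pending_result_set is not None:
            self.pending_result_set.extend(rows)  # fast path: no buffering
            return "fast_path"
        self.queue.append(rows)                   # normal path: buffer
        return "queued"
```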
[jira] [Created] (IMPALA-8764) Kudu data load failures due to "Clock considered unsynchronized"
Sahil Takiar created IMPALA-8764: Summary: Kudu data load failures due to "Clock considered unsynchronized" Key: IMPALA-8764 URL: https://issues.apache.org/jira/browse/IMPALA-8764 Project: IMPALA Issue Type: Bug Components: Infrastructure Affects Versions: Impala 3.3.0 Reporter: Sahil Takiar Dataload error: {code} 03:08:38 03:08:38 Error executing impala SQL: Impala/logs/data_loading/sql/functional/create-functional-query-exhaustive-impala-generated-kudu-none-none.sql See: Impala/logs/data_loading/sql/functional/create-functional-query-exhaustive-impala-generated-kudu-none-none.sql.log {code} Digging through the mini-cluster logs, I see that the Kudu tservers crashed with this error: {code} F0715 02:58:43.202059 649 hybrid_clock.cc:339] Check failed: _s.ok() unable to get current time with error bound: Service unavailable: could not read system time source: Error reading clock. Clock considered unsynchronized *** Check failure stack trace: *** Wrote minidump to Impala/testdata/cluster/cdh6/node-3/var/log/kudu/ts/minidumps/kudu-tserver/395e6bb9-9b2f-468e-4d37d898-74b96d61.dmp Wrote minidump to Impala/testdata/cluster/cdh6/node-3/var/log/kudu/ts/minidumps/kudu-tserver/395e6bb9-9b2f-468e-4d37d898-74b96d61.dmp *** Aborted at 1563184723 (unix time) try "date -d @1563184723" if you are using GNU date *** PC: @ 0x7ff75ed631f7 __GI_raise *** SIGABRT (@0x7d10232) received by PID 562 (TID 0x7ff756c1e700) from PID 562; stack trace: *** @ 0x7ff760b545e0 (unknown) @ 0x7ff75ed631f7 __GI_raise @ 0x7ff75ed648e8 __GI_abort @ 0x1fb7309 kudu::AbortFailureFunction() @ 0x9c054d google::LogMessage::Fail() @ 0x9c240d google::LogMessage::SendToLog() @ 0x9c0089 google::LogMessage::Flush() @ 0x9c2eaf google::LogMessageFatal::~LogMessageFatal() @ 0xc0c60e kudu::clock::HybridClock::WalltimeWithErrorOrDie() @ 0xc0c67e kudu::clock::HybridClock::NowWithError() @ 0xc0d4aa kudu::clock::HybridClock::NowForMetrics() @ 0x9a29c0 kudu::FunctionGauge<>::WriteValue() @ 0x1fb0dc0 
kudu::Gauge::WriteAsJson() @ 0x1fb3212 kudu::MetricEntity::WriteAsJson() @ 0x1fb390e kudu::MetricRegistry::WriteAsJson() @ 0xa856a3 kudu::server::DiagnosticsLog::LogMetrics() @ 0xa8789a kudu::server::DiagnosticsLog::RunThread() @ 0x1ff44d7 kudu::Thread::SuperviseThread() @ 0x7ff760b4ce25 start_thread @ 0x7ff75ee2634d __clone {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IMPALA-8962) FETCH_ROWS_TIMEOUT_MS should apply before rows are available
Sahil Takiar created IMPALA-8962: Summary: FETCH_ROWS_TIMEOUT_MS should apply before rows are available Key: IMPALA-8962 URL: https://issues.apache.org/jira/browse/IMPALA-8962 Project: IMPALA Issue Type: Bug Components: Clients Reporter: Sahil Takiar Assignee: Sahil Takiar IMPALA-7312 added a fetch timeout controlled by the query option {{FETCH_ROWS_TIMEOUT_MS}}. The issue is that the timeout only applies after the *first* batch of rows is available. The root cause is that both Beeswax and HS2 clients call {{request_state->BlockOnWait}} inside {{ImpalaServer::FetchInternal}}. The call to {{BlockOnWait}} blocks until rows are ready to be consumed via {{ClientRequestState::FetchRows}}. So clients can still end up blocking indefinitely waiting for the first row batch to appear. -- This message was sent by Atlassian Jira (v8.3.4#803005)
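One shape the fix could take is a timed wait instead of an indefinite block. This is a toy Python model (names are made up, the real code is C++):

```python
import threading

class RequestStateModel:
    """Toy model: a BlockOnWait-style wait that gives up after
    FETCH_ROWS_TIMEOUT_MS even before the first batch arrives."""
    def __init__(self):
        self._cond = threading.Condition()
        self._rows_available = False

    def block_on_wait(self, timeout_ms):
        # Returns True if rows became available within the timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self._rows_available,
                                       timeout=timeout_ms / 1000.0)

    def rows_ready(self):
        with self._cond:
            self._rows_available = True
            self._cond.notify_all()
```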
[jira] [Resolved] (IMPALA-8944) Update and re-enable S3PlannerTest
[ https://issues.apache.org/jira/browse/IMPALA-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8944. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed > Update and re-enable S3PlannerTest > -- > > Key: IMPALA-8944 > URL: https://issues.apache.org/jira/browse/IMPALA-8944 > Project: IMPALA > Issue Type: Test >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.4.0 > > > It looks like we don't run {{S3PlannerTest}} in our regular Jenkins jobs. > When run against an HDFS mini-cluster, they are skipped because the > {{TARGET_FILESYSTEM}} is not S3. On our S3 jobs, they don't run either > because we skip all fe/ tests (most of them don't work against S3 / assume > they are running on HDFS). > A few things need to be fixed to get this working: > * The test cases in {{S3PlannerTest}} need to be fixed > * The Jenkins job that runs the S3 tests needs the ability to run specific > fe/ tests (e.g. run just the {{S3PlannerTest}} and skip the rest) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-8934) Add failpoint tests to result spooling code
[ https://issues.apache.org/jira/browse/IMPALA-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8934. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed > Add failpoint tests to result spooling code > --- > > Key: IMPALA-8934 > URL: https://issues.apache.org/jira/browse/IMPALA-8934 > Project: IMPALA > Issue Type: Sub-task >Affects Versions: Impala 3.2.0 >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.4.0 > > > IMPALA-8924 was discovered while running {{test_failpoints.py}} with results > spooling enabled. The goal of this JIRA is to add similar failpoint coverage > to {{test_result_spooling.py}} so that we have sufficient coverage for the > various failure paths when result spooling is enabled. > The failure paths that should be covered include: > * Failures while executing the exec tree should be handled correctly -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-8957) TestFetchAndSpooling.test_rows_sent_counters is flaky
Sahil Takiar created IMPALA-8957: Summary: TestFetchAndSpooling.test_rows_sent_counters is flaky Key: IMPALA-8957 URL: https://issues.apache.org/jira/browse/IMPALA-8957 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar Error Details {noformat} query_test/test_fetch.py:77: in test_rows_sent_counters assert re.search("RowsSentRate: [1-9]", result.runtime_profile) E assert None E + where None = ('RowsSentRate: [1-9]', 'Query (id=3946b19649af9ce3:7f38be67):\n DEBUG MODE WARNING: Query profile created while running a DEBUG buil... - OptimizationTime: 59.000ms\n - PeakMemoryUsage: 213.50 KB (218624)\n - PrepareTime: 26.000ms\n') E + where = re.search E + and 'Query (id=3946b19649af9ce3:7f38be67):\n DEBUG MODE WARNING: Query profile created while running a DEBUG buil... - OptimizationTime: 59.000ms\n - PeakMemoryUsage: 213.50 KB (218624)\n - PrepareTime: 26.000ms\n' = .runtime_profile{noformat} Stack Trace {noformat} query_test/test_fetch.py:77: in test_rows_sent_counters assert re.search("RowsSentRate: [1-9]", result.runtime_profile) E assert None E+ where None = ('RowsSentRate: [1-9]', 'Query (id=3946b19649af9ce3:7f38be67):\n DEBUG MODE WARNING: Query profile created while running a DEBUG buil... - OptimizationTime: 59.000ms\n - PeakMemoryUsage: 213.50 KB (218624)\n - PrepareTime: 26.000ms\n') E+where = re.search E+and 'Query (id=3946b19649af9ce3:7f38be67):\n DEBUG MODE WARNING: Query profile created while running a DEBUG buil... 
- OptimizationTime: 59.000ms\n - PeakMemoryUsage: 213.50 KB (218624)\n - PrepareTime: 26.000ms\n' = .runtime_profile{noformat} Standard Error {noformat} SET client_identifier=query_test/test_fetch.py::TestFetchAndSpooling::()::test_rows_sent_counters[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0}|table; SET batch_size=0; SET num_nodes=0; SET disable_codegen_rows_threshold=0; SET disable_codegen=False; SET abort_on_error=1; SET exec_single_node_rows_threshold=0; -- executing against localhost:21000 select id from functional.alltypes limit 10; -- 2019-09-18 18:51:20,759 INFO MainThread: Started query 3946b19649af9ce3:7f38be67{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
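The flaky assertion can be reproduced in isolation: the regex requires a nonzero leading digit, so the test fails whenever the counter legitimately reads zero (one plausible cause on fast runs). Profile lines below are illustrative, not copied from a real profile:

```python
import re

PATTERN = r"RowsSentRate: [1-9]"

# Passes when the rate's first digit is nonzero:
assert re.search(PATTERN, "- RowsSentRate: 1.20K/sec") is not None
# Fails (search returns None) when the counter reads zero, which is one
# plausible way the test goes flaky when the query finishes very quickly:
assert re.search(PATTERN, "- RowsSentRate: 0") is None
```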
[jira] [Resolved] (IMPALA-8939) TestResultSpooling.test_full_queue_large_fetch is flaky
[ https://issues.apache.org/jira/browse/IMPALA-8939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8939. -- Resolution: Duplicate > TestResultSpooling.test_full_queue_large_fetch is flaky > --- > > Key: IMPALA-8939 > URL: https://issues.apache.org/jira/browse/IMPALA-8939 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.3.0 >Reporter: Csaba Ringhofer >Priority: Critical > > The query profile contains RowBatchSendWaitTime: 0.000ns time to time, which > causes this test to fail. > This seems to be common when USE_CDP_HIVE=true, but I also seen it in non-CDP > builds. > I did not investigate the cause, so I don't know whether CDPness should have > any effect. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-8634) Catalog client should be resilient to temporary Catalog outage
[ https://issues.apache.org/jira/browse/IMPALA-8634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8634. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed > Catalog client should be resilient to temporary Catalog outage > -- > > Key: IMPALA-8634 > URL: https://issues.apache.org/jira/browse/IMPALA-8634 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Affects Versions: Impala 3.2.0 >Reporter: Michael Ho >Assignee: Sahil Takiar >Priority: Critical > Fix For: Impala 3.4.0 > > > Currently, when the catalog server is down, catalog clients will fail all > RPCs sent to it. In essence, DDL queries will fail and the Impala service > becomes a lot less functional. Catalog clients should consider retrying > failed RPCs with some exponential backoff in between while catalog server is > being restarted after crashing. We probably need to add [a test > |https://github.com/apache/impala/blob/master/tests/custom_cluster/test_restart_services.py] > to exercise the paths of catalog restart to verify coordinators are > resilient to it. > cc'ing [~stakiar], [~joemcdonnell], [~twm378] -- This message was sent by Atlassian Jira (v8.3.4#803005)
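The retry-with-exponential-backoff idea suggested above can be sketched as follows. The function name and signature here are illustrative, not Impala's actual catalog-client API:

```python
import time

def rpc_with_retries(do_rpc, max_attempts=5, base_delay_s=0.1, sleep=time.sleep):
    """Retry a failed RPC with exponential backoff between attempts, so a
    catalogd restart doesn't immediately fail DDL queries."""
    for attempt in range(max_attempts):
        try:
            return do_rpc()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the failure to the query
            sleep(base_delay_s * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
```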
[jira] [Resolved] (IMPALA-8888) Profile fetch performance when result spooling is enabled
[ https://issues.apache.org/jira/browse/IMPALA-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8888. -- Fix Version/s: Not Applicable Resolution: Fixed > Profile fetch performance when result spooling is enabled > - > > Key: IMPALA-8888 > URL: https://issues.apache.org/jira/browse/IMPALA-8888 > Project: IMPALA > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Not Applicable > > > Profile the performance of fetching rows when result spooling is enabled. > There are a few queries that can be used to benchmark the performance: > {{time ./bin/impala-shell.sh -B -q "select l_orderkey from > tpch_parquet.lineitem" > /dev/null}} > {{time ./bin/impala-shell.sh -B -q "select * from tpch_parquet.orders" > > /dev/null}} > The first fetches one column and 6,001,215 rows; the second fetches 9 columns and > 1,500,000 rows - so a mix of rows fetched vs. columns fetched. > The baseline for the benchmark should be the commit prior to IMPALA-8780. > The benchmark should check for both latency and CPU usage (to see if the copy > into {{BufferedTupleStream}} has a significant overhead). > Various fetch sizes should be used in the benchmark as well to see if > increasing the fetch size for result spooling improves performance (ideally > it should) (it would be nice to run some fetches between machines as well as > that will better reflect network round trip latencies). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-8786) BufferedPlanRootSink should directly write to a QueryResultSet if one is available
[ https://issues.apache.org/jira/browse/IMPALA-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8786. -- Fix Version/s: Not Applicable Resolution: Later After doing lots of perf profiling (some results are in IMPALA-8888), I have concluded that result spooling does not add significant overhead; in some cases it actually *improves* performance (seen mostly when selecting a large number of rows from Impala). So while there are some interesting ideas of possible optimizations here, I am going to close this JIRA and mark the 'Resolution' as 'Later'. We can re-visit these optimizations later if we think they add significant benefit. > BufferedPlanRootSink should directly write to a QueryResultSet if one is > available > -- > > Key: IMPALA-8786 > URL: https://issues.apache.org/jira/browse/IMPALA-8786 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Not Applicable > > > {{BufferedPlanRootSink}} uses a {{RowBatchQueue}} to buffer {{RowBatch}}-es > and then the consumer thread reads them and writes them to a given > {{QueryResultSet}}. Implementations of {{RowBatchQueue}} might end up copying > the buffered {{RowBatch}}-es (e.g. if the queue is backed by a > {{BufferedTupleStream}}). An optimization would be for the producer thread to > directly write to the consumer {{QueryResultSet}}. This optimization would > only be triggered if (1) the queue is empty, and (2) the consumer thread has > a {{QueryResultSet}} available for writing. 
> This "fast path" is useful in a few different scenarios: > * If the consumer is faster at reading rows than the producer is at > sending them; in this case, the overhead of buffering rows in a > {{RowBatchQueue}} can be completely avoided > * For queries that return fewer than 1024 rows, it's likely that the consumer will > produce a {{QueryResultSet}} before the first {{RowBatch}} is returned > (except perhaps for very trivial queries) -- This message was sent by Atlassian Jira (v8.3.4#803005)
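The fast-path condition described above can be sketched as follows. This is a hypothetical Java illustration only: Impala's actual BufferedPlanRootSink is C++, and the class and method names here are invented for the sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch of the proposed fast path: the producer writes rows
// straight into a waiting consumer's result set when no batches are queued.
public class FastPathSinkSketch {
    private final Deque<List<String>> queue = new ArrayDeque<>();
    private List<String> consumerResultSet; // non-null while a fetch is waiting

    public void attachResultSet(List<String> resultSet) {
        this.consumerResultSet = resultSet;
    }

    // Fast path is legal only if (1) the queue is empty and (2) a consumer
    // result set is available -- the two conditions the JIRA describes.
    public boolean canUseFastPath() {
        return queue.isEmpty() && consumerResultSet != null;
    }

    public void send(List<String> batch) {
        if (canUseFastPath()) {
            consumerResultSet.addAll(batch); // avoids the copy into the queue
        } else {
            queue.addLast(new ArrayList<>(batch)); // normal buffered path
        }
    }

    public int queuedBatches() {
        return queue.size();
    }
}
```

Note the ordering requirement this sketch glosses over: the fast path must only trigger when the queue is empty, or rows would be returned out of order relative to the buffered batches.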
[jira] [Resolved] (IMPALA-4268) Rework coordinator buffering to buffer more data
[ https://issues.apache.org/jira/browse/IMPALA-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-4268. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed All related tasks and sub-tasks have been completed. The majority of this work was done in IMPALA-8656, which is now complete, so closing this JIRA. > Rework coordinator buffering to buffer more data > > > Key: IMPALA-4268 > URL: https://issues.apache.org/jira/browse/IMPALA-4268 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.8.0 >Reporter: Henry Robinson >Priority: Major > Labels: query-lifecycle, resource-management > Fix For: Impala 3.4.0 > > Attachments: rows-produced-histogram.png > > > {{PlanRootSink}} executes the producer thread (the coordinator fragment > execution thread) in a separate thread to the consumer (i.e. the thread > handling the fetch RPC), which calls {{GetNext()}} to retrieve the rows. The > implementation was simplified by handing off a single batch at a time from > the producer to the consumer. > This decision causes some problems: > * Many context switches for the sender. Adding buffering would allow the > sender to append to the buffer and continue progress without a context switch. > * Query execution can't release resources until the client has fetched the > final batch, because the coordinator fragment thread is still running and > potentially producing backpressure all the way down the plan tree. > * The consumer can't fulfil fetch requests greater than Impala's internal > BATCH_SIZE, because it is only given one batch at a time. > The tricky part is managing the mismatch between the size of the row batches > processed in {{Send()}} and the size of the fetch result asked for by the > client without impacting performance too badly. The sender materializes > output rows in a {{QueryResultSet}} that is owned by the coordinator. 
That is > not, currently, a splittable object - instead it contains the actual RPC > response struct that will hit the wire when the RPC completes. The > asynchronous sender also does not know the batch size, because it can in theory > change on every fetch call (although most reasonable clients will not > randomly change the fetch size). -- This message was sent by Atlassian Jira (v8.3.4#803005)
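The buffering rework described above can be illustrated with a bounded multi-batch buffer between the producer (coordinator fragment thread) and the consumer (fetch RPC handler). This is a simplified Java sketch under assumptions: the real implementation (IMPALA-8656) buffers into a C++ BufferedTupleStream, and the names here are invented.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: a buffer holding several batches instead of one, so the producer
// appends without a per-batch handoff, and the consumer can satisfy fetch
// requests larger than a single internal BATCH_SIZE.
public class CoordinatorBufferSketch {
    private final BlockingQueue<List<String>> buffer;

    public CoordinatorBufferSketch(int capacityInBatches) {
        this.buffer = new ArrayBlockingQueue<>(capacityInBatches);
    }

    // Producer side: succeeds without blocking until the buffer is full.
    public boolean send(List<String> batch) {
        return buffer.offer(batch); // false when the buffer is at capacity
    }

    // Consumer side: drains whole batches until roughly maxRows are
    // collected (may overshoot by up to one batch, mirroring the
    // batch-granularity handoff the JIRA discusses).
    public int fetch(List<String> out, int maxRows) {
        int fetched = 0;
        List<String> batch;
        while (fetched < maxRows && (batch = buffer.poll()) != null) {
            out.addAll(batch);
            fetched += batch.size();
        }
        return fetched;
    }
}
```

With capacity 1 this degenerates to the old single-batch handoff; a larger capacity is what lets the fragment thread finish (and release resources) before the client has fetched everything.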
[jira] [Resolved] (IMPALA-8942) Set file format specific values for split sizes on non-block stores
[ https://issues.apache.org/jira/browse/IMPALA-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8942. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed > Set file format specific values for split sizes on non-block stores > --- > > Key: IMPALA-8942 > URL: https://issues.apache.org/jira/browse/IMPALA-8942 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.4.0 > > > Parquet scans on non-block based storage systems (e.g. S3, ADLS, etc.) can > suffer from uneven scan range assignment due to the behavior described in > IMPALA-3453. The frontend should set different split sizes depending on the > file type and file system. -- This message was sent by Atlassian Jira (v8.3.4#803005)
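The idea can be sketched as a small selection function. The constants and names below are illustrative assumptions only; the values Impala actually ships for IMPALA-8942 live in the frontend planner and are not shown here.

```java
// Sketch of per-format, per-filesystem split-size selection. All sizes are
// hypothetical placeholders, not Impala's real defaults.
public class SplitSizeSketch {
    static final long MB = 1024L * 1024L;

    // On object stores (S3, ADLS) there are no real blocks, so the "block
    // size" is synthetic; choosing a larger split for Parquet reduces the
    // chance of the uneven scan-range assignment described in IMPALA-3453.
    public static long chooseSplitSize(String fileFormat, boolean blockBasedFs) {
        if (blockBasedFs) {
            return 128 * MB; // defer to the store's real block size on HDFS-like systems
        }
        if ("PARQUET".equalsIgnoreCase(fileFormat)) {
            return 256 * MB; // assumed: larger splits for Parquet on object stores
        }
        return 128 * MB;     // assumed default for other formats
    }
}
```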
[jira] [Created] (IMPALA-9117) test_lineage.py and test_mt_dop.py are failing on ABFS
Sahil Takiar created IMPALA-9117: Summary: test_lineage.py and test_mt_dop.py are failing on ABFS Key: IMPALA-9117 URL: https://issues.apache.org/jira/browse/IMPALA-9117 Project: IMPALA Issue Type: Test Reporter: Sahil Takiar Assignee: Sahil Takiar Both failures are known issues. {{TestLineage::test_lineage_output}} is failing because the test requires HBase to run (this test is already disabled for S3). {{TestMtDopFlags::test_mt_dop_all}} is failing because it runs {{QueryTest/insert}} which includes a query that writes a folder that ends in a dot. ABFS does not allow files or directories to end in a dot - IMPALA-7860 / IMPALA-7681 / HADOOP-15860. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-9120) Refreshing an ABFS table with a deleted directory fails
Sahil Takiar created IMPALA-9120: Summary: Refreshing an ABFS table with a deleted directory fails Key: IMPALA-9120 URL: https://issues.apache.org/jira/browse/IMPALA-9120 Project: IMPALA Issue Type: Bug Components: Catalog Reporter: Sahil Takiar Assignee: Sahil Takiar The following fails on ABFS (but succeeds on HDFS): {code:java} hdfs dfs -mkdir /test-external-table ./bin/impala-shell.sh [localhost:21000] default> create external table test (col int) location '/test-external-table'; [localhost:21000] default> select * from test; hdfs dfs -rm -r -skipTrash /test-external-table ./bin/impala-shell.sh [localhost:21000] default> refresh test; ERROR: TableLoadingException: Refreshing file and block metadata for 1 paths for table default.test: failed to load 1 paths. Check the catalog server log for more details.{code} This causes the test tests/query_test/test_hdfs_file_mods.py::TestHdfsFileMods::test_file_modifications[modification_type: delete_directory | ...] to fail on ABFS as well. The error from catalogd is: {code:java} E1104 22:38:53.748571 87486 ParallelFileMetadataLoader.java:102] Loading file and block metadata for 1 paths for table test_file_modifications_d0471c2c.t1 encountered an error loading data for path abfss://[]@[].dfs.core.windows.net/test-warehouse/test_file_modifications_d0471c2c Java exception follows: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: GET https://[].dfs.core.windows.net/[]?resource=filesystem=5000=test-warehouse/test_file_modifications_d0471c2c=90=false StatusCode=404 StatusDescription=The specified path does not exist. ErrorCode=PathNotFound ErrorMessage=The specified path does not exist. 
RequestId:[] Time:2019-11-04T22:38:53.7469083Z at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:99) at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:606) at org.apache.impala.catalog.HdfsTable.loadAllPartitions(HdfsTable.java:547) at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:973) at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:896) at org.apache.impala.catalog.TableLoader.load(TableLoader.java:83) at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:244) at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:241) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.FileNotFoundException: GET https://[].dfs.core.windows.net/[]?resource=filesystem=5000=test-warehouse/test_file_modifications_d0471c2c=90=false StatusCode=404 StatusDescription=The specified path does not exist. ErrorCode=PathNotFound ErrorMessage=The specified path does not exist. 
RequestId:[] Time:2019-11-04T22:38:53.7469083Z at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:957) at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.listStatus(AzureBlobFileSystem.java:351) at org.apache.hadoop.fs.FileSystem.listStatusBatch(FileSystem.java:1790) at org.apache.hadoop.fs.FileSystem$DirListingIterator.fetchMore(FileSystem.java:2058) at org.apache.hadoop.fs.FileSystem$DirListingIterator.hasNext(FileSystem.java:2047) at org.apache.impala.common.FileSystemUtil$RecursingIterator.hasNext(FileSystemUtil.java:722) at org.apache.impala.common.FileSystemUtil$FilterIterator.hasNext(FileSystemUtil.java:679) at org.apache.impala.catalog.FileMetadataLoader.load(FileMetadataLoader.java:166) at org.apache.impala.catalog.ParallelFileMetadataLoader.lambda$load$0(ParallelFileMetadataLoader.java:93) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293) at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61) at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:45) at org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:93) ... 11 more Caused by: GET https://[].dfs.core.windows.net/[]?resource=filesystem=5000=test-warehouse/test_file_modifications_d0471c2c=90=false StatusCode=404 StatusDescription=The specified path does not
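One possible mitigation, sketched here in Java, is to treat a FileNotFoundException during listing as a now-deleted location rather than failing the whole refresh. This is a hypothetical sketch only, not Impala's actual fix; the real loading path is in ParallelFileMetadataLoader.java and FileMetadataLoader.java, and the interface below is invented for illustration.

```java
import java.io.FileNotFoundException;
import java.util.Collections;
import java.util.List;

// Sketch: wrap the directory listing so a deleted table location yields an
// empty file list instead of aborting the refresh with TableLoadingException.
public class RefreshSketch {
    // Hypothetical stand-in for a filesystem listing call.
    public interface Lister {
        List<String> list(String path) throws FileNotFoundException;
    }

    public static List<String> loadFileMetadata(Lister fs, String path) {
        try {
            return fs.list(path);
        } catch (FileNotFoundException e) {
            // The location was deleted out from under us (the ABFS case
            // above); report the table as having no files, matching the
            // HDFS behavior where the refresh succeeds.
            return Collections.emptyList();
        }
    }
}
```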
[jira] [Created] (IMPALA-9137) Blacklist node if a DataStreamService RPC to the node fails
Sahil Takiar created IMPALA-9137: Summary: Blacklist node if a DataStreamService RPC to the node fails Key: IMPALA-9137 URL: https://issues.apache.org/jira/browse/IMPALA-9137 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar If a query fails because an RPC to a specific node failed, the query error message will be of the form: {{ERROR: TransmitData() to 10.65.30.141:27000 failed: Network error: recv got EOF from 10.65.30.141:27000 (error 108)}} or {{ERROR: TransmitData() to 10.65.29.251:27000 failed: Network error: recv error from 0.0.0.0:0: Transport endpoint is not connected (error 107)}} or {{ERROR: TransmitData() to 10.65.26.254:27000 failed: Network error: Client connection negotiation failed: client connection to 10.65.26.254:27000: connect: Connection refused (error 111)}} or {{ERROR: EndDataStream() to 127.0.0.1:27002 failed: Network error: recv error from 0.0.0.0:0: Transport endpoint is not connected (error 107)}} RPCs are already retried, so it is likely that something is wrong with the target node. Perhaps it crashed or is so overloaded that it can't process RPC requests. In any case, the Impala Coordinator should blacklist the target of the failed RPC so that future queries don't fail with the same error. If the node crashed, the statestore will eventually remove the failed node from the cluster as well. However, the statestore can take a while to detect a failed node because it has a long timeout. The issue is that queries can still fail within the timeout window. This is necessary for transparent query retries because if a node does crash, it will take too long for the statestore to remove the crashed node from the cluster, so any attempt at retrying a query would just fail. -- This message was sent by Atlassian Jira (v8.3.4#803005)
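The coordinator-side bookkeeping this implies can be sketched as follows. This is a hypothetical Java sketch: the JIRA does not specify an expiry mechanism, so the time-based blacklist duration below is an assumption made for illustration (in practice the statestore membership update would also clear entries).

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of coordinator-side blacklisting: record the target of a failed
// RPC with an assumed expiry, and skip it when scheduling new queries
// until the entry lapses. Names and the timeout are illustrative.
public class BlacklistSketch {
    private final Map<String, Long> blacklistedUntilMs = new HashMap<>();
    private final long blacklistDurationMs;

    public BlacklistSketch(long blacklistDurationMs) {
        this.blacklistDurationMs = blacklistDurationMs;
    }

    // Called when a TransmitData()/EndDataStream() RPC to 'address' fails
    // even after the built-in RPC retries.
    public void reportRpcFailure(String address, long nowMs) {
        blacklistedUntilMs.put(address, nowMs + blacklistDurationMs);
    }

    // Consulted by the scheduler so new queries avoid the suspect node
    // during the window before the statestore notices the failure.
    public boolean isUsable(String address, long nowMs) {
        Long until = blacklistedUntilMs.get(address);
        return until == null || nowMs >= until;
    }
}
```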
[jira] [Created] (IMPALA-9138) Classify certain errors as retryable
Sahil Takiar created IMPALA-9138: Summary: Classify certain errors as retryable Key: IMPALA-9138 URL: https://issues.apache.org/jira/browse/IMPALA-9138 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar Impala should be able to classify certain errors as "retryable". This can be done by modifying the {{TStatus}} object to have a "type". For now, the only types can be "GENERAL" and "RETRYABLE". This way when a {{TStatus}} object is created, it can be marked as retryable. If the {{TStatus}} is retryable, the Coordinator can trigger a retry of the query. This approach allows us to incrementally mark more and more errors as retryable as necessary. For now, just RPC failures will be marked as retryable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
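The classification scheme can be sketched like this. The names mirror the JIRA's description ("GENERAL", "RETRYABLE"), but this is plain Java for illustration, not the actual generated Thrift {{TStatus}} code.

```java
// Sketch of the proposed "type" field on a status object: a coarse
// classification starting with just GENERAL and RETRYABLE, so error sites
// can be marked retryable incrementally.
public class StatusSketch {
    public enum ErrorType { GENERAL, RETRYABLE }

    private final ErrorType type;
    private final String message;

    public StatusSketch(ErrorType type, String message) {
        this.type = type;
        this.message = message;
    }

    // The coordinator consults this when deciding whether to retry a query.
    public boolean isRetryable() {
        return type == ErrorType.RETRYABLE;
    }

    public String getMessage() { return message; }

    // Per the JIRA, only RPC failures are marked retryable for now.
    public static StatusSketch rpcFailure(String message) {
        return new StatusSketch(ErrorType.RETRYABLE, message);
    }
}
```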
[jira] [Created] (IMPALA-9124) Transparently retry queries that fail due to cluster membership changes
Sahil Takiar created IMPALA-9124: Summary: Transparently retry queries that fail due to cluster membership changes Key: IMPALA-9124 URL: https://issues.apache.org/jira/browse/IMPALA-9124 Project: IMPALA Issue Type: New Feature Components: Backend, Clients Reporter: Sahil Takiar Assignee: Sahil Takiar Currently, if the Impala Coordinator or any Executors run into errors during query execution, Impala will fail the entire query. It would improve user experience to transparently retry the query for some transient, recoverable errors. This JIRA focuses on retrying queries that would otherwise fail due to cluster membership changes. Specifically, node failures that cause changes in the cluster membership (currently the Coordinator cancels all queries running on a node if it detects that the node is no longer part of the cluster) and node blacklisting (the Coordinator blacklists a node because it detects a problem with that node - can’t execute RPCs against the node). It is not focused on retrying general errors (e.g. any frontend errors, MemLimitExceeded exceptions, etc.). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-7860) Tests use partition name that isn't supported on ABFS
[ https://issues.apache.org/jira/browse/IMPALA-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-7860. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed Closing this as Fixed. HADOOP-15860 was done a while ago, and now Impala-on-ABFS cannot write files / directories that end with a period (which is expected). There was one bug that was introduced to Impala due to this change: IMPALA-8557 - but that has been fixed now as well. In IMPALA-9117 I created a new skip flag for ABFS tests for the "cannot write trailing periods" behavior, and added it to any affected tests. > Tests use partition name that isn't supported on ABFS > - > > Key: IMPALA-7860 > URL: https://issues.apache.org/jira/browse/IMPALA-7860 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Reporter: Sean Mackrory >Priority: Major > Fix For: Impala 3.4.0 > > > IMPALA-7681 introduced support for the ADLS Gen2 service / ABFS client. As > mentioned in the code review for that > (https://gerrit.cloudera.org/#/c/11630/) a couple of tests were failing > because they use a partition name that ends with a period. If the tests are > modified to end with anything other than a period, they work just fine. > In HADOOP-15860, that's sounding like it's just a known limitation of the > blob storage that shares infrastructure with ADLS Gen2 that won't be changing > any time soon. I propose we modify the tests to just use a slightly different > partition name. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-9112) Consider removing hdfsExists calls when writing out files
Sahil Takiar created IMPALA-9112: Summary: Consider removing hdfsExists calls when writing out files Key: IMPALA-9112 URL: https://issues.apache.org/jira/browse/IMPALA-9112 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Sahil Takiar Assignee: Sahil Takiar There are a few places in the backend where we call {{hdfsExists}} before writing out a file. This can cause issues when writing data to S3, because S3 can cache 404 Not Found errors. This issue manifests itself with errors such as: {code:java} ERROR: Error(s) moving partition files. First error (of 1) was: Hdfs op (RENAME s3a://[bucket-name]/[table-name]/_impala_insert_staging/3943ae7ccf00711e_59606d88/.3943ae7ccf00711e-59606d88000b_562151879_dir/year=2015/3943ae7ccf00711e-59606d88000b_1994902389_data.0.parq TO s3a://[bucket-name]/[table-name]/3943ae7ccf00711e-59606d88000b_1994902389_data.0.parq) failed, error was: s3a://[bucket-name]/[table-name]/_impala_insert_staging/3943ae7ccf00711e_59606d88/.3943ae7ccf00711e-59606d88000b_562151879_dir/year=2015/3943ae7ccf00711e-59606d88000b_1994902389_data.0.parq Error(5): Input/output error Root cause: AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: []; S3 Extended Request ID: []){code} HADOOP-13884, HADOOP-13950, HADOOP-16490 - the HDFS clients allow specifying an "overwrite" option when creating a file; this can avoid doing any HEAD requests when opening a file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
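The exists-then-create vs. create-with-overwrite pattern can be shown with standard Java NIO as a stand-in; the Hadoop-client analog is {{FileSystem.create(path, /*overwrite=*/true)}}, which avoids the existence probe (and thus the cached 404 HEAD result) entirely. This sketch uses the local filesystem only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: create the file with an explicit overwrite option instead of
// probing existence first. No exists() call means no extra HEAD request
// against S3, and nothing for the 404 cache to poison.
public class OverwriteWriteSketch {
    public static boolean writeOverwriting(Path file, byte[] data) {
        try {
            // CREATE + TRUNCATE_EXISTING handles both the fresh and the
            // pre-existing case in a single call.
            Files.write(file, data,
                    StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING,
                    StandardOpenOption.WRITE);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```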
[jira] [Created] (IMPALA-9113) Queries can hang if an impalad is killed after a query has FINISHED
Sahil Takiar created IMPALA-9113: Summary: Queries can hang if an impalad is killed after a query has FINISHED Key: IMPALA-9113 URL: https://issues.apache.org/jira/browse/IMPALA-9113 Project: IMPALA Issue Type: Bug Components: Backend, Clients Reporter: Sahil Takiar Assignee: Sahil Takiar There is a race condition in the query coordination code that could cause queries to hang indefinitely in an un-cancellable state if an impalad crashes after the query has transitioned to the FINISHED state, but before all backends have completed. The issue occurs if: * A query produces all results * A client issues a fetch request to read all of those results * The client fetch request fetches all available rows (e.g. eos is hit) * {{Coordinator::GetNext}} then calls {{SetNonErrorTerminalState(ExecState::RETURNED_RESULTS)}} which eventually calls {{WaitForBackends()}} * {{WaitForBackends()}} will block until all backends have completed * One of the impalads running the query crashes, and thus never reports success for the query fragment it was running * The {{WaitForBackends()}} call will then block indefinitely * Any attempt to cancel the query fails because the original fetch request that drove the {{WaitForBackends()}} call has acquired the {{ClientRequestState}} lock, which thus prevents any cancellation from occurring. Implementing IMPALA-6984 should theoretically fix this because as soon as eos is hit, it would call {{CancelBackends()}} rather than {{WaitForBackends()}}. Another solution would be to add a timeout to the {{WaitForBackends()}} call so that it returns after the timeout is hit; this would force the fetch request to return 0 rows with {{hasMoreRows=true}}, and unblock any cancellation threads. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-9113) Queries can hang if an impalad is killed after a query has FINISHED
[ https://issues.apache.org/jira/browse/IMPALA-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9113. -- Resolution: Not A Problem Yup, you are right, the lock is released in {{ClientRequestState::FetchRowsInternal}}. There is even a comment in the code for this: [https://github.com/apache/impala/blob/master/be/src/service/client-request-state.cc#L1000] > Queries can hang if an impalad is killed after a query has FINISHED > --- > > Key: IMPALA-9113 > URL: https://issues.apache.org/jira/browse/IMPALA-9113 > Project: IMPALA > Issue Type: Bug > Components: Backend, Clients >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > There is a race condition in the query coordination code that could cause > queries to hang indefinitely in an un-cancellable state if an impalad crashes > after the query has transitioned to the FINISHED state, but before all > backends have completed. > The issue occurs if: > * A query produces all results > * A client issues a fetch request to read all of those results > * The client fetch request fetches all available rows (e.g. eos is hit) > * {{Coordinator::GetNext}} then calls > {{SetNonErrorTerminalState(ExecState::RETURNED_RESULTS)}} which eventually > calls {{WaitForBackends()}} > * {{WaitForBackends()}} will block until all backends have completed > * One of the impalads running the query crashes, and thus never reports > success for the query fragment it was running > * The {{WaitForBackends()}} call will then block indefinitely > * Any attempt to cancel the query fails because the original fetch request > that drove the {{WaitForBackends()}} call has acquired the > {{ClientRequestState}} lock, which thus prevents any cancellation from > occurring. > Implementing IMPALA-6984 should theoretically fix this because as soon as eos > is hit, the coordinator will call {{CancelBackends()}} rather than > {{WaitForBackends()}}. 
Another solution would be to add a timeout to the > {{WaitForBackends()}} so that it returns after the timeout is hit, this would > force the fetch request to return 0 rows with {{hasMoreRows=true}}, and > unblock any cancellation threads. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-8557) Impala on ABFS failed with error "IllegalArgumentException: ABFS does not allow files or directories to end with a dot."
[ https://issues.apache.org/jira/browse/IMPALA-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8557. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed > Impala on ABFS failed with error "IllegalArgumentException: ABFS does not > allow files or directories to end with a dot." > > > Key: IMPALA-8557 > URL: https://issues.apache.org/jira/browse/IMPALA-8557 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.2.0 >Reporter: Eric Lin >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.4.0 > > > HDFS introduced below feature to stop users from creating a file that ends > with "." on ABFS: > https://issues.apache.org/jira/browse/HADOOP-15860 > As a result of this change, Impala writes to ABFS now fail with this error. > I can see that it generates the temp file using the format "$0.$1.$2": > https://github.com/cloudera/Impala/blob/cdh6.2.0/be/src/exec/hdfs-table-sink.cc#L329 > $2 is the file extension and will be empty if it is TEXT file format: > https://github.com/cloudera/Impala/blob/cdh6.2.0/be/src/exec/hdfs-text-table-writer.cc#L65 > Since HADOOP-15860 was backported into CDH6.2, this currently only affects > 6.2; older versions still work. > There is no way to override this empty file extension, so no workaround is > possible unless the user chooses another file format. -- This message was sent by Atlassian Jira (v8.3.4#803005)
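The failure mode is easy to demonstrate: the "$0.$1.$2" temp-file template yields a trailing dot when the format has no extension (as with TEXT tables), which ABFS rejects after HADOOP-15860. This Java sketch shows the bug and one possible guard; the guard is illustrative, not the actual fix applied to hdfs-table-sink.cc.

```java
// Sketch of the "$0.$1.$2" temp-file name template and a guard against the
// trailing-dot case. Names are illustrative, not Impala's backend code.
public class TempFileNameSketch {
    // The buggy form: always three components joined by dots, so an empty
    // extension produces "prefix.id." - rejected by ABFS.
    public static String buggyName(String prefix, String id, String extension) {
        return prefix + "." + id + "." + extension;
    }

    // A possible guard: drop the final separator when there is no extension.
    public static String buildName(String prefix, String id, String extension) {
        if (extension == null || extension.isEmpty()) {
            return prefix + "." + id;
        }
        return prefix + "." + id + "." + extension;
    }
}
```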
[jira] [Resolved] (IMPALA-7726) Drop with purge tests fail against ABFS due to trash misbehavior
[ https://issues.apache.org/jira/browse/IMPALA-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-7726. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed Closing as Fixed. I re-enabled the tests, looped them overnight, and didn't hit any failures. So it is likely whatever bug was causing these issues has been resolved. > Drop with purge tests fail against ABFS due to trash misbehavior > > > Key: IMPALA-7726 > URL: https://issues.apache.org/jira/browse/IMPALA-7726 > Project: IMPALA > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory >Priority: Major > Labels: flaky > Fix For: Impala 3.4.0 > > > In testing IMPALA-7681, I've seen test_drop_partition_with_purge and > test_drop_table_with_purge fail because files are not found in the trash after a > drop without purge. I've traced that functionality through Hive, which uses > Hadoop's Trash API, and traced through a bunch of scenarios in that API with > ABFS, and I can't see it misbehaving in any way. It also should be pretty > FS-agnostic. I also suspected a bug in abfs_utils.py's exists() function, but > have not been able to find one. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-9117) test_lineage.py and test_mt_dop.py are failing on ABFS
[ https://issues.apache.org/jira/browse/IMPALA-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-9117. -- Fix Version/s: Impala 3.4.0 Resolution: Fixed > test_lineage.py and test_mt_dop.py are failing on ABFS > -- > > Key: IMPALA-9117 > URL: https://issues.apache.org/jira/browse/IMPALA-9117 > Project: IMPALA > Issue Type: Test >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.4.0 > > > Both failures are known issues. > {{TestLineage::test_lineage_output}} is failing because the test requires > HBase to run (this test is already disabled for S3). > {{TestMtDopFlags::test_mt_dop_all}} is failing because it runs > {{QueryTest/insert}} which includes a query that writes a folder that ends in > a dot. ABFS does not allow files or directories to end in a dot - IMPALA-7860 > / IMPALA-7681 / HADOOP-15860. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-9190) CatalogdMetaProviderTest.testPiggybackFailure is flaky
Sahil Takiar created IMPALA-9190: Summary: CatalogdMetaProviderTest.testPiggybackFailure is flaky Key: IMPALA-9190 URL: https://issues.apache.org/jira/browse/IMPALA-9190 Project: IMPALA Issue Type: Bug Reporter: Sahil Takiar The following test is flaky: org.apache.impala.catalog.local.CatalogdMetaProviderTest.testPiggybackFailure Error Message {code} Did not see enough piggybacked loads! Stacktrace java.lang.AssertionError: Did not see enough piggybacked loads! at org.junit.Assert.fail(Assert.java:88) at org.apache.impala.catalog.local.CatalogdMetaProviderTest.doTestPiggyback(CatalogdMetaProviderTest.java:314) at org.apache.impala.catalog.local.CatalogdMetaProviderTest.testPiggybackFailure(CatalogdMetaProviderTest.java:273) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:236) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:386) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:323) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:143) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-9187) TestExecutorGroups.test_executor_group_shutdown is flaky
Sahil Takiar created IMPALA-9187: Summary: TestExecutorGroups.test_executor_group_shutdown is flaky Key: IMPALA-9187 URL: https://issues.apache.org/jira/browse/IMPALA-9187 Project: IMPALA Issue Type: Bug Reporter: Sahil Takiar Assignee: Lars Volker The following test is flaky: custom_cluster.test_executor_groups.TestExecutorGroups.test_executor_group_shutdown (from pytest) Error Message {code} AssertionError: Query (id=6c4bb1c6f501bae4:ee491183): DEBUG MODE WARNING: Query profile created while running a DEBUG build of Impala. Use RELEASE builds to measure query performance. Summary: Session ID: 104c00e26afad563:fad6988e52bf9cba Session Type: BEESWAX Start Time: 2019-11-22 00:19:26.497324000 End Time: Query Type: QUERY Query State: COMPILED Query Status: OK Impala Version: impalad version 3.4.0-SNAPSHOT DEBUG (build 2bdca39a8b178b7186dd24141a8e97fa0c46358f) User: jenkins Connected User: jenkins Delegated User: Network Address: 127.0.0.1:59977 Default Db: default Sql Statement: select sleep(3) Coordinator: []:22000 Query Options (set by configuration): TIMEZONE=America/Los_Angeles,CLIENT_IDENTIFIER=custom_cluster/test_executor_groups.py::TestExecutorGroups::()::test_executor_group_shutdown Query Options (set by configuration and planner): NUM_NODES=1,NUM_SCANNER_THREADS=1,RUNTIME_FILTER_MODE=0,MT_DOP=0,TIMEZONE=America/Los_Angeles,CLIENT_IDENTIFIER=custom_cluster/test_executor_groups.py::TestExecutorGroups::()::test_executor_group_shutdown Plan: Max Per-Host Resource Reservation: Memory=0B Threads=1 Per-Host Resource Estimates: Memory=10MB Dedicated Coordinator Resource Estimate: Memory=100MB Codegen disabled by planner Analyzed query: SELECT sleep(CAST(3 AS INT)) F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 | Per-Host Resources: mem-estimate=0B mem-reservation=0B thread-reservation=1 PLAN-ROOT SINK | output exprs: sleep(3) | mem-estimate=0B mem-reservation=0B thread-reservation=0 | 00:UNION constant-operands=1 mem-estimate=0B mem-reservation=0B 
thread-reservation=0 tuple-ids=0 row-size=1B cardinality=1 in pipelines: Estimated Per-Host Mem: 10485760 Request Pool: default-pool Per Host Min Memory Reservation: []:22000(0) Per Host Number of Fragment Instances: []:22000(1) Admission result: Queued Query Compilation: 5.077ms - Metadata of all 0 tables cached: 679.990us (679.990us) - Analysis finished: 1.269ms (589.508us) - Authorization finished (noop): 1.350ms (81.387us) - Value transfer graph computed: 1.681ms (330.356us) - Single node plan created: 1.801ms (120.709us) - Distributed plan created: 1.880ms (78.868us) - Planning finished: 5.077ms (3.196ms) Query Timeline: 11.000ms - Query submitted: 0.000ns (0.000ns) - Planning finished: 7.000ms (7.000ms) - Submit for admission: 9.000ms (2.000ms) - Queued: 11.000ms (2.000ms) - ComputeScanRangeAssignmentTimer: 0.000ns Frontend: ImpalaServer: - ClientFetchWaitTimer: 0.000ns - NumRowsFetched: 0 (0) - NumRowsFetchedFromCache: 0 (0) - RowMaterializationRate: 0 - RowMaterializationTimer: 0.000ns assert 'Initial admission queue reason: number of running queries' in 'Query (id=6c4bb1c6f501bae4:ee491183):\n DEBUG MODE WARNING: Query profile created while running a DEBUG buil...0)\n - NumRowsFetchedFromCache: 0 (0)\n - RowMaterializationRate: 0\n - RowMaterializationTimer: 0.000ns\n' {code} Stacktrace {code} custom_cluster/test_executor_groups.py:185: in test_executor_group_shutdown assert "Initial admission queue reason: number of running queries" in profile, profile E AssertionError: Query (id=6c4bb1c6f501bae4:ee491183): E DEBUG MODE WARNING: Query profile created while running a DEBUG build of Impala. Use RELEASE builds to measure query performance. E Summary: E Session ID: 104c00e26afad563:fad6988e52bf9cba E Session Type: BEESWAX E Start Time: 2019-11-22 00:19:26.497324000 E End Time: E Query Type: QUERY E Query State: COMPILED E Query Status: OK {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-9188) Dataload is failing when USE_CDP_HIVE=true
Sahil Takiar created IMPALA-9188:
Summary: Dataload is failing when USE_CDP_HIVE=true
Key: IMPALA-9188
URL: https://issues.apache.org/jira/browse/IMPALA-9188
Project: IMPALA
Issue Type: Bug
Reporter: Sahil Takiar
Assignee: Anurag Mantripragada

When USE_CDP_HIVE=true, Impala builds are failing during dataload when creating tables with PK/FK constraints. The error is:
{code:java}
ERROR: CREATE EXTERNAL TABLE IF NOT EXISTS functional_seq_record_snap.child_table (
  seq int,
  id int,
  year string,
  a int,
  primary key(seq) DISABLE NOVALIDATE RELY,
  foreign key (id, year) references functional_seq_record_snap.parent_table(id, year) DISABLE NOVALIDATE RELY,
  foreign key(a) references functional_seq_record_snap.parent_table_2(a) DISABLE NOVALIDATE RELY)
row format delimited fields terminated by ','
LOCATION '/test-warehouse/child_table'
Traceback (most recent call last):
  File "Impala/bin/load-data.py", line 208, in exec_impala_query_from_file
    result = impala_client.execute(query)
  File "Impala/tests/beeswax/impala_beeswax.py", line 187, in execute
    handle = self.__execute_query(query_string.strip(), user=user)
  File "Impala/tests/beeswax/impala_beeswax.py", line 362, in __execute_query
    handle = self.execute_query_async(query_string, user=user)
  File "Impala/tests/beeswax/impala_beeswax.py", line 356, in execute_query_async
    handle = self.__do_rpc(lambda: self.imp_service.query(query,))
  File "Impala/tests/beeswax/impala_beeswax.py", line 519, in __do_rpc
    raise ImpalaBeeswaxException(self.__build_error_message(b), b)
ImpalaBeeswaxException: ImpalaBeeswaxException: INNER EXCEPTION:
{code}
The corresponding error in HMS is:
{code:java}
2019-11-22T06:36:59,937 INFO [pool-10-thread-13] metastore.HiveMetaStore: 18: source:127.0.0.1 create_table_req: Table(tableName:child_table, dbName:functional_seq_record_gzip, owner:jenkins, createTime:0, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], location:hdfs://localhost:20500/test-warehouse/child_table, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=,, field.delim=,}), bucketCols:null, sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, ownerType:USER, accessType:8)
2019-11-22T06:36:59,937 INFO [pool-10-thread-13] HiveMetaStore.audit: ugi=jenkins ip=127.0.0.1 cmd=source:127.0.0.1 create_table_req: Table(tableName:child_table, dbName:functional_seq_record_gzip, owner:jenkins, createTime:0, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], location:hdfs://localhost:20500/test-warehouse/child_table, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=,, field.delim=,}), bucketCols:null, sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, ownerType:USER, accessType:8)
2019-11-22T06:36:59,937 INFO [pool-10-thread-13] metastore.MetastoreDefaultTransformer: Starting translation for CreateTable for processor Impala3.4.0-SNAPSHOT@localhost with [EXTWRITE, EXTREAD, HIVEMANAGEDINSERTREAD, HIVEMANAGEDINSERTWRITE, HIVESQL, HIVEMQT, HIVEBUCKET2] on table child_table
2019-11-22T06:36:59,937 INFO [pool-10-thread-13] metastore.MetastoreDefaultTransformer: Table to be created is of type EXTERNAL_TABLE but not MANAGED_TABLE
2019-11-22T06:36:59,937 INFO [pool-10-thread-13] metastore.MetastoreDefaultTransformer: Transformer returning table:Table(tableName:child_table, dbName:functional_seq_record_gzip, owner:jenkins, createTime:0, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, type:string, comment:null), FieldSchema(name:a, type:int, comment:null)],
{code}
[jira] [Created] (IMPALA-9184) TestImpalaShellInteractive.test_ddl_queries_are_closed is flaky
Sahil Takiar created IMPALA-9184:
Summary: TestImpalaShellInteractive.test_ddl_queries_are_closed is flaky
Key: IMPALA-9184
URL: https://issues.apache.org/jira/browse/IMPALA-9184
Project: IMPALA
Issue Type: Bug
Reporter: Sahil Takiar

The following test is flaky: shell.test_shell_interactive.TestImpalaShellInteractive.test_ddl_queries_are_closed[table_format_and_file_extension: ('textfile', '.txt') | protocol: beeswax] (from pytest)

Error Message
{code:java}
AssertionError: drop query should be closed
assert >(0)
 + where > = .wait_for_num_in_flight_queries
{code}
Stacktrace
{code:java}
Impala/tests/shell/test_shell_interactive.py:338: in test_ddl_queries_are_closed
    assert impalad.wait_for_num_in_flight_queries(0), MSG % 'drop'
E   AssertionError: drop query should be closed
E   assert >(0)
E    + where > = .wait_for_num_in_flight_queries
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-9213) Client logs should indicate if a query has been retried
Sahil Takiar created IMPALA-9213:
Summary: Client logs should indicate if a query has been retried
Key: IMPALA-9213
URL: https://issues.apache.org/jira/browse/IMPALA-9213
Project: IMPALA
Issue Type: Sub-task
Reporter: Sahil Takiar

The client logs should give some indication that a query has been retried, and should print out information such as the new query id and the link to the retried query on the debug web UI.
[jira] [Created] (IMPALA-9233) Add impalad level metrics for query retries
Sahil Takiar created IMPALA-9233:
Summary: Add impalad level metrics for query retries
Key: IMPALA-9233
URL: https://issues.apache.org/jira/browse/IMPALA-9233
Project: IMPALA
Issue Type: Sub-task
Components: Backend
Reporter: Sahil Takiar

It would be nice to have some impalad level metrics related to query retries. These would help answer questions like: how often are queries retried? How often are the retries actually successful? If queries are constantly being retried, then there is probably something wrong with the cluster.

Some possible metrics to add:
* Query retry rate (the rate at which queries are retried)
** This can be further divided by retry "type" - e.g. what caused the retry
** Potential categories would be:
*** Queries retried due to failed RPCs
*** Queries retried due to faulty disks
*** Queries retried due to statestore detection of cluster membership changes
* A metric that measures how often query retries are actually successful (e.g. if a query is retried, does the retry succeed, or does it just fail again)
** This can help users determine if query retries are actually helping, or just adding overhead (e.g. if retries always fail, something is probably wrong)
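As a rough illustration only, the metrics proposed above (per-cause retry counters plus a retry success ratio) could be sketched as follows. This is a minimal, hypothetical Python sketch; Impala's real metrics live in the C++ backend, and the class, method, and cause names here are invented for the example.

```python
from collections import Counter

class RetryMetrics:
    """Hypothetical sketch of impalad-level query-retry metrics."""

    def __init__(self):
        # Retries bucketed by cause, e.g. 'failed_rpc', 'faulty_disk',
        # 'membership_change' (categories from the issue description).
        self.retries_by_cause = Counter()
        # Outcome of each retry: did the retried query succeed or fail again?
        self.retry_results = Counter()

    def record_retry(self, cause):
        self.retries_by_cause[cause] += 1

    def record_retry_result(self, succeeded):
        self.retry_results['succeeded' if succeeded else 'failed'] += 1

    def retry_success_ratio(self):
        # Fraction of retries that actually succeeded; helps tell whether
        # retries are helping or just adding overhead.
        total = sum(self.retry_results.values())
        return self.retry_results['succeeded'] / total if total else 0.0

m = RetryMetrics()
m.record_retry('failed_rpc')
m.record_retry_result(True)
m.record_retry('faulty_disk')
m.record_retry_result(False)
print(m.retry_success_ratio())  # 0.5
```

A persistently low success ratio combined with a high per-cause counter (say, `faulty_disk`) would point at a cluster problem rather than transient failures, which is exactly the diagnostic question the issue raises.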