[impala] 01/02: IMPALA-9711: incrementally update aggregate profile
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git

commit e60292fb3bd71f25b90119d0d48292f4c49e158f
Author: Tim Armstrong
AuthorDate: Fri May 15 10:02:15 2020 -0700

IMPALA-9711: incrementally update aggregate profile

To avoid additional work in the default mode, we still compute the average only once per instance, when the instance completes or when the query finishes. When --gen_experimental_profile=true, we update the aggregated profile for each status report, so that the live profile can be viewed while the query executes. The implications of this are as follows:

* More work is done on the KRPC control service RPC thread (although this is largely moot after part 2 of IMPALA-9382, where we merge into the aggregated profile directly and so avoid the extra update).
* For complex multi-stage queries, the profile merging work is done earlier, as each stage completes, so the critical path of the query is shortened.
* Multiple RPC threads may be merging profiles concurrently.
* Multiple threads may be calling AggregatedRuntimeProfile::Update() on the same profile, whereas previously all merging was done by a single thread. I looked through the locking in that function to check correctness.

Testing:
Ran core tests. Ran a subset of the Python tests under TSAN and confirmed no races were introduced in this code.
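The concurrency implication in the last bullet — several report-handling threads updating one aggregate — can be pictured as a lock-guarded accumulator. This is a conceptual Python sketch only; the class and counter names are illustrative, not Impala's C++ internals:

```python
import threading

class AggregatedProfile:
    """Toy stand-in for an aggregated runtime profile that several
    status-report threads may update concurrently."""

    def __init__(self):
        self._lock = threading.Lock()
        self._reports = 0
        self._total_time_ns = 0

    def update(self, instance_time_ns):
        # With live aggregation enabled, every status report merges into the
        # aggregate, so updates from different RPC threads are serialized
        # by a lock.
        with self._lock:
            self._reports += 1
            self._total_time_ns += instance_time_ns

    def avg_time_ns(self):
        with self._lock:
            return self._total_time_ns // self._reports if self._reports else 0

prof = AggregatedProfile()
threads = [threading.Thread(target=prof.update, args=(1000,)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
```

The same lock protects both the writers and the reader, which mirrors why the commit author re-audited the locking in AggregatedRuntimeProfile::Update().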
Change-Id: Ib03e79a40a33d8e74464640ae5f95a1467a6713a
Reviewed-on: http://gerrit.cloudera.org:8080/15931
Reviewed-by: Tim Armstrong
Tested-by: Impala Public Jenkins
---
 be/src/runtime/coordinator-backend-state.cc | 71 -
 be/src/runtime/coordinator-backend-state.h  | 36 +--
 be/src/runtime/coordinator.cc               |  4 +-
 be/src/util/runtime-profile.cc              |  1 +
 4 files changed, 85 insertions(+), 27 deletions(-)

diff --git a/be/src/runtime/coordinator-backend-state.cc b/be/src/runtime/coordinator-backend-state.cc
index 440a5f0..602bfca 100644
--- a/be/src/runtime/coordinator-backend-state.cc
+++ b/be/src/runtime/coordinator-backend-state.cc
@@ -381,7 +381,8 @@ bool Coordinator::BackendState::ApplyExecStatusReport(
     const ReportExecStatusRequestPB& backend_exec_status,
     const TRuntimeProfileForest& thrift_profiles, ExecSummary* exec_summary,
     ProgressUpdater* scan_range_progress, DmlExecState* dml_exec_state,
-    vector* aux_error_info) {
+    vector* aux_error_info,
+    const vector& fragment_stats) {
   DCHECK(!IsEmptyBackend());
   // Hold the exec_summary's lock to avoid exposing it half-way through
   // the update loop below.
@@ -478,6 +479,10 @@ bool Coordinator::BackendState::ApplyExecStatusReport(
   backend_utilization_.exchange_bytes_sent = backend_exec_status.exchange_bytes_sent();
   backend_utilization_.scan_bytes_sent = backend_exec_status.scan_bytes_sent();

+  // Update state that depends on the instance profile updates we just received.
+  // Skip this in the edge case where the exec RPC didn't complete.
+  if (exec_done_) UpdateExecStatsLocked(lock, fragment_stats, /*finalize=*/false);
+
   // status_ has incorporated the status from all fragment instances. If the overall
   // backend status is not OK, but no specific fragment instance reported an error, then
   // this is a general backend error. Incorporate the general error into status_.
@@ -502,29 +507,23 @@ void Coordinator::BackendState::UpdateHostProfile(
 }

 void Coordinator::BackendState::UpdateExecStats(
-    const vector& fragment_stats) {
-  lock_guard l(lock_);
+    const vector& fragment_stats, bool finalize) {
+  unique_lock l(lock_);
+  UpdateExecStatsLocked(l, fragment_stats, finalize);
+}
+
+void Coordinator::BackendState::UpdateExecStatsLocked(const unique_lock& lock,
+    const vector& fragment_stats, bool finalize) {
+  DCHECK(lock.owns_lock() && lock.mutex() == _);
   DCHECK(exec_done_) << "May only be called after WaitOnExecRpc() completes.";
-  for (const auto& entry: instance_stats_map_) {
-    const InstanceStats& instance_stats = *entry.second;
-    int fragment_idx = instance_stats.exec_params_.fragment_idx();
-    DCHECK_LT(fragment_idx, fragment_stats.size());
-    FragmentStats* f = fragment_stats[fragment_idx];
-    int64_t completion_time = instance_stats.stopwatch_.ElapsedTime();
-    RuntimeProfile::Counter* completion_timer =
-        PROFILE_CompletionTime.Instantiate(instance_stats.profile_);
-    completion_timer->Set(completion_time);
-    if (!FLAGS_gen_experimental_profile) f->completion_times_(completion_time);
-    if (completion_time > 0) {
-      RuntimeProfile::Counter* execution_rate_counter =
-          PROFILE_ExecutionRate.Instantiate(instance_stats.profil
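The refactor above splits UpdateExecStats() into a public locking wrapper and an UpdateExecStatsLocked() body whose caller must already hold the lock, asserting that precondition up front. A minimal Python analogue of that idiom (names here are illustrative, not the C++ class's members):

```python
import threading

class BackendState:
    def __init__(self):
        self._lock = threading.Lock()
        self.completion_times = []

    def update_exec_stats(self, completion_time, finalize=False):
        # Public entry point: acquire the lock, then delegate to the
        # already-locked variant.
        with self._lock:
            self._update_exec_stats_locked(completion_time, finalize)

    def _update_exec_stats_locked(self, completion_time, finalize):
        # Precondition check mirroring DCHECK(lock.owns_lock()): the caller
        # must already hold self._lock when invoking this method directly
        # (e.g. from a report handler that already took the lock).
        assert self._lock.locked(), "caller must hold the lock"
        self.completion_times.append(completion_time)

state = BackendState()
state.update_exec_stats(42)
```

The split lets ApplyExecStatusReport(), which already holds the lock, call the locked variant directly without re-acquiring it.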
[impala] 02/02: Pin the json-smart version to 2.3
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git

commit d453d52aadcbd158147b906813b22eb2944ac90b
Author: Joe McDonnell
AuthorDate: Thu Oct 1 17:38:25 2020 -0700

Pin the json-smart version to 2.3

With some maven repositories, Impala builds have been picking up json-smart version 2.3-SNAPSHOT. This is not intentional (and it does not reproduce with public repositories). To improve the consistency of the build, pin the json-smart version to 2.3, with appropriate exclusions to prevent alternate versions.

This also fixes bin/jenkins/get_maven_statistics.sh to handle cases where maven didn't download anything.

Testing:
- Ran core job

Change-Id: Iff92a61c9c3164e7e0c63c7569178415dcba9fb4
Reviewed-on: http://gerrit.cloudera.org:8080/16536
Tested-by: Impala Public Jenkins
Reviewed-by: Joe McDonnell
---
 bin/jenkins/get_maven_statistics.sh | 16 +++---
 fe/pom.xml                          | 62 +
 shaded-deps/hive-exec/pom.xml       | 13
 3 files changed, 87 insertions(+), 4 deletions(-)

diff --git a/bin/jenkins/get_maven_statistics.sh b/bin/jenkins/get_maven_statistics.sh
index aff473f..b7a14a9 100755
--- a/bin/jenkins/get_maven_statistics.sh
+++ b/bin/jenkins/get_maven_statistics.sh
@@ -32,11 +32,19 @@ MVN_LOG=$1

 # Dump how many artifacts were downloaded from each repo
 echo "Number of artifacts downloaded from each repo:"
-cat "${MVN_LOG}" | grep "Downloaded from" | sed 's|.* Downloaded from ||' \
-  | cut -d: -f1 | sort | uniq -c
+if grep -q "Downloaded from" "${MVN_LOG}"; then
+  cat "${MVN_LOG}" | grep "Downloaded from" | sed 's|.* Downloaded from ||' \
+    | cut -d: -f1 | sort | uniq -c
+else
+  echo "No artifacts downloaded"
+fi

 # Dump how many artifacts we tried to download from each repo
 echo
 echo "Number of download attempts (successful or unsuccessful) per repo:"
-cat "${MVN_LOG}" | grep "Downloading from" | sed 's|.* Downloading from ||' \
-  | cut -d: -f1 | sort | uniq -c
+if grep -q "Downloading from" "${MVN_LOG}"; then
+  cat "${MVN_LOG}" | grep "Downloading from" | sed 's|.* Downloading from ||' \
+    | cut -d: -f1 | sort | uniq -c
+else
+  echo "No downloads attempted"
+fi

diff --git a/fe/pom.xml b/fe/pom.xml
index 7729651..00b9c61 100644
--- a/fe/pom.xml
+++ b/fe/pom.xml
@@ -35,6 +35,13 @@ under the License. Apache Impala Query Engine Frontend + + + net.minidev + json-smart + 2.3 + + org.apache.impala query-event-hook-api
@@ -58,6 +65,11 @@ under the License. hadoop-common ${hadoop.version} + + + net.minidev + json-smart + org.eclipse.jetty *
@@ -82,6 +94,13 @@ under the License. org.apache.hadoop hadoop-auth ${hadoop.version} + + + + net.minidev + json-smart + +
@@ -120,6 +139,13 @@ under the License. org.apache.hadoop hadoop-azure-datalake ${hadoop.version} + + + + net.minidev + json-smart + +
@@ -160,6 +186,13 @@ under the License. org.apache.ranger ranger-plugins-common ${ranger.version} + + + + net.minidev + json-smart + +
@@ -179,6 +212,11 @@ under the License. org.eclipse.jetty * + + + net.minidev + json-smart +
@@ -243,12 +281,26 @@ under the License. org.apache.hbase hbase-client ${hbase.version} + + + + net.minidev + json-smart + + org.apache.hbase hbase-common ${hbase.version} + + + + net.minidev + json-smart + +
@@ -895,6 +947,11 @@ under the License. io.netty * + + + net.minidev + json-smart +
@@ -911,6 +968,11 @@ under the License. org.apache.logging.log4j log4j-1.2-api + + + net.minidev + json-smart + org.apache.hive hive-serde
diff --git a/shaded-deps/hive-exec/pom.xml b/shaded-deps/hive-exec/pom.xml
index 43be1a0..33cb897 100644
--- a/shaded-deps/hive-ex
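The guarded pipeline above avoids emitting nothing at all when the maven log contains no download lines. The same per-repo tally can be sketched in Python, again handling the empty-log case explicitly (function and variable names are illustrative):

```python
import collections
import re

def count_downloads(log_text, verb="Downloaded"):
    # Tally artifacts per repository name, mirroring the
    # grep | sed | cut | sort | uniq -c pipeline in the script above.
    # An empty log simply yields an empty tally instead of no output.
    counts = collections.Counter()
    pattern = re.compile(r"%s from ([^:]+):" % verb)
    for line in log_text.splitlines():
        m = pattern.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

sample = ("Downloaded from central: https://repo1.maven.org/ foo.jar\n"
          "Downloaded from central: https://repo1.maven.org/ bar.jar\n")
repo_counts = count_downloads(sample)
```

Passing `verb="Downloading"` gives the attempt counts, matching the second half of the script.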
[impala] branch master updated (d09294a -> d453d52)
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git.

 from d09294a  IMPALA-10202: Enable file handle cache for ABFS files
  new e60292f  IMPALA-9711: incrementally update aggregate profile
  new d453d52  Pin the json-smart version to 2.3

The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.

Summary of changes:
 be/src/runtime/coordinator-backend-state.cc | 71 -
 be/src/runtime/coordinator-backend-state.h  | 36 +--
 be/src/runtime/coordinator.cc               |  4 +-
 be/src/util/runtime-profile.cc              |  1 +
 bin/jenkins/get_maven_statistics.sh         | 16 +--
 fe/pom.xml                                  | 62 +
 shaded-deps/hive-exec/pom.xml               | 13 ++
 7 files changed, 172 insertions(+), 31 deletions(-)
[impala] 01/02: IMPALA-8291: Show constraints in DESCRIBE FORMATTED
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git

commit 7b55168894d43b8696ac72f50515f6b842556caa
Author: Shant Hovsepian
AuthorDate: Tue Sep 8 15:27:53 2020 -0400

IMPALA-8291: Show constraints in DESCRIBE FORMATTED

Support for displaying primary and foreign key constraints in DESCRIBE FORMATTED output. The output attempts to stay as close to Hive's implementation as possible. Also includes constraint definitions for the TPC-DS test workload.

Testing:
* Fresh load of testdata
* Metadata query tests comparing the output between Impala and Hive

Change-Id: I676b69c465c46491f870d7fdc894e7474c030356
Reviewed-on: http://gerrit.cloudera.org:8080/16428
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
 .../impala/compat/HiveMetadataFormatUtils.java    | 128 ++-
 .../org/apache/impala/compat/MetastoreShim.java   |  11 +-
 .../impala/service/DescribeResultFactory.java     |  11 +
 testdata/datasets/tpcds/tpcds_schema_template.sql | 880 -
 tests/metadata/test_metadata_query_statements.py  |  13 +
 5 files changed, 649 insertions(+), 394 deletions(-)

diff --git a/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java b/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
index 073031e..a2b1a5e 100644
--- a/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
+++ b/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
@@ -45,6 +45,8 @@ import org.apache.hadoop.hive.metastore.api.LongColumnStatsData;
 import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
 import org.apache.hadoop.hive.metastore.api.StringColumnStatsData;
 import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.ql.metadata.PrimaryKeyInfo;
+import org.apache.hadoop.hive.ql.metadata.ForeignKeyInfo;
 import org.apache.hadoop.hive.serde2.io.DateWritable;

 /**
@@ -98,7 +100,7 @@ public class HiveMetadataFormatUtils {
   private static void formatColumnsHeader(StringBuilder columnInformation,
       List colStats) {
     columnInformation.append("# "); // Easy for shell scripts to ignore
-    formatOutput(getColumnsHeader(colStats), columnInformation, false);
+    formatOutput(getColumnsHeader(colStats), columnInformation, false, true);
     columnInformation.append(LINE_DELIM);
   }

@@ -112,26 +114,36 @@ public class HiveMetadataFormatUtils {
    * contains newlines?
    */
   private static void formatOutput(String[] fields, StringBuilder tableInfo,
-      boolean isLastLinePadded) {
-    int[] paddings = new int[fields.length - 1];
-    if (fields.length > 1) {
-      for (int i = 0; i < fields.length - 1; i++) {
-        if (fields[i] == null) {
-          tableInfo.append(FIELD_DELIM);
-          continue;
+      boolean isLastLinePadded, boolean isFormatted) {
+    if (!isFormatted) {
+      for (int i = 0; i < fields.length; i++) {
+        Object value = StringEscapeUtils.escapeJava(fields[i]);
+        if (value != null) {
+          tableInfo.append(value);
         }
-        tableInfo.append(String.format("%-" + ALIGNMENT + "s", fields[i]))
-            .append(FIELD_DELIM);
-        paddings[i] = ALIGNMENT > fields[i].length() ? ALIGNMENT : fields[i].length();
+        tableInfo.append((i == fields.length - 1) ? LINE_DELIM : FIELD_DELIM);
       }
-    }
-    if (fields.length > 0) {
-      String value = fields[fields.length - 1];
-      String unescapedValue = (isLastLinePadded && value != null) ? value
-          .replaceAll("n|r|rn", "\n") : value;
-      indentMultilineValue(unescapedValue, tableInfo, paddings, false);
     } else {
-      tableInfo.append(LINE_DELIM);
+      int[] paddings = new int[fields.length - 1];
+      if (fields.length > 1) {
+        for (int i = 0; i < fields.length - 1; i++) {
+          if (fields[i] == null) {
+            tableInfo.append(FIELD_DELIM);
+            continue;
+          }
+          tableInfo.append(String.format("%-" + ALIGNMENT + "s", fields[i]))
+              .append(FIELD_DELIM);
+          paddings[i] = ALIGNMENT > fields[i].length() ? ALIGNMENT : fields[i].length();
+        }
+      }
+      if (fields.length > 0) {
+        String value = fields[fields.length - 1];
+        String unescapedValue = (isLastLinePadded && value != null) ? value
+            .replaceAll("n|r|rn", "\n") : value;
+        indentMultilineValue(unescapedValue, tableInfo, paddings, false);
+      } else {
+        tableInfo.append(LINE_DELIM);
+      }
     }
   }

@@ -384,6 +396,75 @@ public class HiveMetadataFormatUtils {
     return null;
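The new isFormatted flag above switches between column-padded, human-readable output and plain delimiter-separated output. A rough Python rendering of the two modes (constants and behavior simplified from the Java shown above):

```python
FIELD_DELIM = "\t"
LINE_DELIM = "\n"
ALIGNMENT = 20

def format_output(fields, formatted=True):
    # formatted=False: emit the fields delimiter-separated on one line,
    # easy for scripts to parse. formatted=True: left-pad every field but
    # the last to a fixed column width, as DESCRIBE FORMATTED output does.
    if not formatted:
        return FIELD_DELIM.join(f or "" for f in fields) + LINE_DELIM
    out = []
    for f in fields[:-1]:
        out.append(FIELD_DELIM if f is None else f.ljust(ALIGNMENT) + FIELD_DELIM)
    out.append((fields[-1] or "") + LINE_DELIM)
    return "".join(out)

row = format_output(["col_name", "data_type", "comment"], formatted=False)
```

The padded mode keeps narrow fields aligned into columns while still delimiting them, which is why the Java version records per-field paddings for multi-line values.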
[impala] branch master updated (40777b7 -> 13f50ea)
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git.

 from 40777b7  IMPALA-9636: Don't run retried query on the blacklisted nodes
  new 7b55168  IMPALA-8291: Show constraints in DESCRIBE FORMATTED
  new 13f50ea  IMPALA-9229: impala-shell 'profile' to show original and retried queries

The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.

Summary of changes:
 be/src/service/client-request-state.cc            |   1 +
 be/src/service/client-request-state.h             |  12 +
 be/src/service/impala-beeswax-server.cc           |   4 +-
 be/src/service/impala-hs2-server.cc               | 112 ++-
 be/src/service/impala-http-handler.cc             |  11 +-
 be/src/service/impala-server.cc                   | 201 +++--
 be/src/service/impala-server.h                    |  67 +-
 common/thrift/ImpalaService.thrift                |  12 +
 .../impala/compat/HiveMetadataFormatUtils.java    | 128 ++-
 .../org/apache/impala/compat/MetastoreShim.java   |  11 +-
 .../impala/service/DescribeResultFactory.java     |  11 +
 shell/impala_client.py                            |  18 +-
 shell/impala_shell.py                             |  60 +-
 testdata/datasets/tpcds/tpcds_schema_template.sql | 880 -
 tests/custom_cluster/test_shell_interactive.py    |  81 +-
 tests/metadata/test_metadata_query_statements.py  |  13 +
 tests/shell/test_shell_commandline.py             |  26 +-
 tests/shell/util.py                               |  25 +
 18 files changed, 1154 insertions(+), 519 deletions(-)
[impala] 02/02: IMPALA-9229: impala-shell 'profile' to show original and retried queries
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git

commit 13f50eaec59d2690dd54acda1bba83eb0aacb972
Author: Sahil Takiar
AuthorDate: Tue Jul 14 09:07:12 2020 -0700

IMPALA-9229: impala-shell 'profile' to show original and retried queries

Currently, the impala-shell 'profile' command only returns the profile for the most recent query attempt. There is no way to get the original query profile (the profile of the first query attempt, the one that failed) from the impala-shell.

This patch modifies TGetRuntimeProfileReq and TGetRuntimeProfileResp to add support for returning both the original and retried profiles for a retried query. When a query is retried, TGetRuntimeProfileResp currently contains the profile for the most recent query attempt. TGetRuntimeProfileReq has a new field called 'include_query_attempts'; when it is set to true, TGetRuntimeProfileResp will include all failed profiles in a new field called failed_profiles / failed_thrift_profiles.

impala-shell has been modified so the 'profile' command has a new set of options. The syntax is now:

PROFILE [ALL | LATEST | ORIGINAL]

If 'ALL' is specified, both the latest and original profiles are printed. If 'LATEST' is specified, only the latest profile is printed. If 'ORIGINAL' is specified, only the original profile is printed. The default behavior is equivalent to specifying 'LATEST' (which is also the behavior before this patch).

Support for this has only been added to HS2, given that Beeswax is being deprecated soon. The new 'profile' options have no effect when the Beeswax protocol is used.

Most of the code change is in impala-hs2-server and impala-server; a lot of the GetRuntimeProfile code has been re-factored.

Testing:
* Added new impala-shell tests
* Ran core tests

Change-Id: I89cee02947b311e7bf9c7274f47dfc7214c1bb65
Reviewed-on: http://gerrit.cloudera.org:8080/16406
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
 be/src/service/client-request-state.cc         |   1 +
 be/src/service/client-request-state.h          |  12 ++
 be/src/service/impala-beeswax-server.cc        |   4 +-
 be/src/service/impala-hs2-server.cc            | 112 +++---
 be/src/service/impala-http-handler.cc          |  11 +-
 be/src/service/impala-server.cc                | 201 +++--
 be/src/service/impala-server.h                 |  67 -
 common/thrift/ImpalaService.thrift             |  12 ++
 shell/impala_client.py                         |  18 ++-
 shell/impala_shell.py                          |  60 ++--
 tests/custom_cluster/test_shell_interactive.py |  81 +-
 tests/shell/test_shell_commandline.py          |  26 +--
 tests/shell/util.py                            |  25 +++
 13 files changed, 505 insertions(+), 125 deletions(-)

diff --git a/be/src/service/client-request-state.cc b/be/src/service/client-request-state.cc
index a55cb37..60acc3c 100644
--- a/be/src/service/client-request-state.cc
+++ b/be/src/service/client-request-state.cc
@@ -1511,6 +1511,7 @@ void ClientRequestState::MarkAsRetried(const TUniqueId& retried_id) {
   summary_profile_->AddInfoString("Retried Query Id", PrintId(retried_id));
   UpdateExecState(ExecState::ERROR);
   block_until_retried_cv_.NotifyOne();
+  retried_id_ = make_unique(retried_id);
 }

 const string& ClientRequestState::effective_user() const {

diff --git a/be/src/service/client-request-state.h b/be/src/service/client-request-state.h
index c5a4004..9cbbd2b 100644
--- a/be/src/service/client-request-state.h
+++ b/be/src/service/client-request-state.h
@@ -380,6 +380,13 @@ class ClientRequestState {
     return *original_id_;
   }

+  /// Can only be called if this query was retried. Returns the query id of the retried
+  /// query.
+  const TUniqueId& retried_id() const {
+    DCHECK(retried_id_ != nullptr);
+    return *retried_id_;
+  }
+
   /// Returns the QueryDriver that owns this ClientRequestState.
   QueryDriver* parent_driver() const { return parent_driver_; }

@@ -630,6 +637,11 @@ class ClientRequestState {
   /// be retried.
   std::unique_ptr original_id_ = nullptr;

+  /// Query id of the retried query. The retried query is the new query that is run
+  /// whenever the original query fails with a retryable error. See 'original_id_' for
+  /// an explanation of what the "original" query is.
+  std::unique_ptr retried_id_ = nullptr;
+
   /// Condition variable used to signal any threads that are waiting until the query has
   /// been retried.
   ConditionVariable block_until_retried_cv_;

diff --git a/be/src/service/impala-beeswax-server.cc b/be/src/service/impala-beeswax-server.cc
index f
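The PROFILE [ALL | LATEST | ORIGINAL] syntax described in the commit message can be pictured as a small dispatcher. This sketch only mirrors the documented semantics; it is not impala-shell's actual implementation, and the function name is illustrative:

```python
def parse_profile_option(arg):
    # An empty argument behaves like LATEST, matching the shell's
    # pre-patch behavior.
    opt = (arg or "LATEST").strip().upper() or "LATEST"
    profiles = {
        "ALL": ("latest", "original"),
        "LATEST": ("latest",),
        "ORIGINAL": ("original",),
    }
    if opt not in profiles:
        raise ValueError("usage: PROFILE [ALL | LATEST | ORIGINAL]")
    return profiles[opt]
```

Only under HS2 would "original" map to the new failed_profiles / failed_thrift_profiles response fields; under Beeswax the option is accepted but has no effect, per the commit message.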
[impala] branch master updated (5e9f10d -> 2359a1b)
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git.

 from 5e9f10d  IMPALA-10064: Support constant propagation for eligible range predicates
  new 99e5f5a  IMPALA-10133: Implement ds_hll_stringify function.
  new 2359a1b  IMPALA-10119: Fix impala-shell history duplication test

The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.

Summary of changes:
 be/src/exprs/datasketches-functions-ir.cc    | 14
 be/src/exprs/datasketches-functions.h        |  6
 common/function-registry/impala_functions.py |  2 ++
 .../queries/QueryTest/datasketches-hll.test  | 37 ++
 tests/shell/test_shell_interactive.py        | 11 ---
 5 files changed, 66 insertions(+), 4 deletions(-)
[impala] 02/02: IMPALA-10119: Fix impala-shell history duplication test
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git

commit 2359a1be9dc491f6c35fe3415265d4a29d6bc939
Author: Tamas Mate
AuthorDate: Tue Sep 1 09:50:44 2020 +0200

IMPALA-10119: Fix impala-shell history duplication test

The flaky test was TestImpalaShellInteractive.test_history_does_not_duplicate_on_interrupt. The test failed with a timeout error when the interrupt signal arrived late, after the next test query had already started: the impala-shell output was ^C instead of the expected query result. This change adds an additional blocking expect call to wait for the interrupt signal to arrive before sending in the next query.

Change-Id: I242eb47cc8093c4566de206f46b75b3feab1183c
Reviewed-on: http://gerrit.cloudera.org:8080/16391
Tested-by: Impala Public Jenkins
Reviewed-by: Tim Armstrong
---
 tests/shell/test_shell_interactive.py | 11 +++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tests/shell/test_shell_interactive.py b/tests/shell/test_shell_interactive.py
index f9668d6..c6fe7e0 100755
--- a/tests/shell/test_shell_interactive.py
+++ b/tests/shell/test_shell_interactive.py
@@ -516,24 +516,27 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
     # readline gets its input from tty, so using stdin does not work.
     shell_cmd = get_shell_cmd(vector)
     child_proc = spawn_shell(shell_cmd)
-    # set up history
+
+    # initialize history
     child_proc.expect(PROMPT_REGEX)
     child_proc.sendline("select 1;")
     child_proc.expect("Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
     child_proc.expect(PROMPT_REGEX)
     child_proc.sendline("quit;")
     child_proc.wait()
+
+    # create a new shell and send SIGINT
     child_proc = spawn_shell(shell_cmd)
     child_proc.expect(PROMPT_REGEX)
-
-    # send SIGINT then quit to save history
     child_proc.sendintr()
+    child_proc.expect("\^C")
     child_proc.sendline("select 2;")
     child_proc.expect("Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
+    child_proc.expect(PROMPT_REGEX)
     child_proc.sendline("quit;")
     child_proc.wait()

-    # check history in a new instance
+    # check history in a new shell instance
     p = ImpalaShell(vector)
     p.send_cmd('history')
     result = p.get_result().stderr.splitlines()
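The fix above is a general synchronization pattern: after sending an interrupt, block until the child acknowledges it (the "^C" echo) before sending the next command. A stdlib-only Python sketch of the same idea, with a toy child process standing in for impala-shell and a minimal expect() standing in for pexpect's:

```python
import subprocess
import sys

# Toy child: echoes "^C" for a simulated interrupt, answers other commands.
child_src = r'''
import sys
for line in sys.stdin:
    cmd = line.strip()
    if cmd == "quit":
        break
    elif cmd == "INTR":
        print("^C")
    else:
        print("Fetched 1 row(s)")
    sys.stdout.flush()
'''

proc = subprocess.Popen([sys.executable, "-c", child_src],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

def expect(substr):
    # Block until the child emits a line containing substr, like pexpect's
    # expect(). This is the synchronization step the test fix adds.
    for line in proc.stdout:
        if substr in line:
            return line
    raise EOFError("child exited before %r appeared" % substr)

proc.stdin.write("INTR\n"); proc.stdin.flush()
expect("^C")                      # wait for the interrupt acknowledgement
proc.stdin.write("select 2;\n"); proc.stdin.flush()
out = expect("Fetched 1 row(s)")  # the next command now runs cleanly
proc.stdin.write("quit\n"); proc.stdin.flush()
proc.wait()
```

Without the intermediate expect("^C"), the "select 2;" could race with the interrupt, which is exactly the flake the commit fixes.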
[impala] 01/02: IMPALA-10133: Implement ds_hll_stringify function.
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git

commit 99e5f5a8859c58641973bc84058eeb15502da96c
Author: Adam Tamas
AuthorDate: Fri Aug 28 15:50:07 2020 +0200

IMPALA-10133: Implement ds_hll_stringify function.

This function receives a string that is a serialized Apache DataSketches HLL sketch and returns it in stringified form. The stringified form looks like the following and contains this data:

select ds_hll_stringify(ds_hll_sketch(float_col))
from functional_parquet.alltypestiny;
+--------------------------------------------+
| ds_hll_stringify(ds_hll_sketch(float_col)) |
+--------------------------------------------+
| ### HLL sketch summary:                    |
| Log Config K : 12                          |
| Hll Target : HLL_4                         |
| Current Mode : LIST                        |
| LB : 2                                     |
| Estimate : 2                               |
| UB : 2.0001                                |
| OutOfOrder flag: false                     |
| Coupon count : 2                           |
| ### End HLL sketch summary                 |
|                                            |
+--------------------------------------------+

Change-Id: I85dbf20b5114dd75c300eef0accabe90eac240a0
Reviewed-on: http://gerrit.cloudera.org:8080/16382
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
 be/src/exprs/datasketches-functions-ir.cc    | 14
 be/src/exprs/datasketches-functions.h        |  6
 common/function-registry/impala_functions.py |  2 ++
 .../queries/QueryTest/datasketches-hll.test  | 37 ++
 4 files changed, 59 insertions(+)

diff --git a/be/src/exprs/datasketches-functions-ir.cc b/be/src/exprs/datasketches-functions-ir.cc
index 4edb83f..1cef6c9 100644
--- a/be/src/exprs/datasketches-functions-ir.cc
+++ b/be/src/exprs/datasketches-functions-ir.cc
@@ -38,6 +38,20 @@ BigIntVal DataSketchesFunctions::DsHllEstimate(FunctionContext* ctx,
   return sketch.get_estimate();
 }

+StringVal DataSketchesFunctions::DsHllStringify(FunctionContext* ctx,
+    const StringVal& serialized_sketch) {
+  if (serialized_sketch.is_null || serialized_sketch.len == 0) return StringVal::null();
+  datasketches::hll_sketch sketch(DS_SKETCH_CONFIG, DS_HLL_TYPE);
+  if (!DeserializeDsSketch(serialized_sketch, )) {
+    LogSketchDeserializationError(ctx);
+    return StringVal::null();
+  }
+  string str = sketch.to_string(true, false, false, false);
+  StringVal dst(ctx, str.size());
+  memcpy(dst.ptr, str.c_str(), str.size());
+  return dst;
+}
+
 FloatVal DataSketchesFunctions::DsKllQuantile(FunctionContext* ctx,
     const StringVal& serialized_sketch, const DoubleVal& rank) {
   if (serialized_sketch.is_null || serialized_sketch.len == 0) return FloatVal::null();

diff --git a/be/src/exprs/datasketches-functions.h b/be/src/exprs/datasketches-functions.h
index c35c3f4..91d9313 100644
--- a/be/src/exprs/datasketches-functions.h
+++ b/be/src/exprs/datasketches-functions.h
@@ -35,6 +35,12 @@ public:
   static BigIntVal DsHllEstimate(FunctionContext* ctx,
       const StringVal& serialized_sketch);

+  /// 'serialized_sketch' is expected as a serialized Apache DataSketches HLL sketch. If
+  /// it is not, then the query fails. This function returns the stringified format of
+  /// an Apache DataSketches HLL sketch.
+  static StringVal DsHllStringify(FunctionContext* ctx,
+      const StringVal& serialized_sketch);
+
   /// 'serialized_sketch' is expected as a serialized Apache DataSketches KLL sketch. If
   /// it is not, then the query fails. 'rank' is used to identify which item (estimate)
   /// to return from the sketched dataset. E.g. 0.1 means the item where 10% of the

diff --git a/common/function-registry/impala_functions.py b/common/function-registry/impala_functions.py
index 93a2926..6a644fe 100644
--- a/common/function-registry/impala_functions.py
+++ b/common/function-registry/impala_functions.py
@@ -933,6 +933,8 @@ visible_functions = [
   # Functions to use Apache DataSketches functionality
   [['ds_hll_estimate'], 'BIGINT', ['STRING'],
    '_ZN6impala21DataSketchesFunctions13DsHllEstimateEPN10impala_udf15FunctionContextERKNS1_9StringValE'],
+  [['ds_hll_stringify'], 'STRING', ['STRING'],
+   '_ZN6impala21DataSketchesFunctions14DsHllStringifyEPN10impala_udf15FunctionContextERKNS1_9StringValE'],
   [['ds_kll_quantile'], 'FLOAT', ['STRING', 'DOUBLE'],
    '_ZN6impala21DataSketchesFunctions13DsKllQuantileEPN10impala_udf15FunctionContextERKNS1_9StringValERKNS1_9DoubleValE'],
   [['ds_kll_n'], 'BIGINT', ['STRING'],

diff --git a/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test b/testdata/
[impala] branch master updated: IMPALA-10106: Upgrade DataSketches to version 2.1.0
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git

The following commit(s) were added to refs/heads/master by this push:
     new f993654  IMPALA-10106: Upgrade DataSketches to version 2.1.0
f993654 is described below

commit f9936549dcab58390c5662ebdedb9c60838185a4
Author: Adam Tamas
AuthorDate: Tue Aug 25 11:46:07 2020 +0200

IMPALA-10106: Upgrade DataSketches to version 2.1.0

Upgrade the external DataSketches files for HLL/KLL to version 2.1.0.

Tests:
- Ran the tests from tests/query_test/test_datasketches.py

Change-Id: I4faa31c0b628a62c7e56a6c4b9549d0aaa8a02ff
Reviewed-on: http://gerrit.cloudera.org:8080/16360
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
 be/src/thirdparty/datasketches/README.md          |   7 +-
 .../datasketches/kll_quantile_calculator.hpp      |  37 +++--
 .../datasketches/kll_quantile_calculator_impl.hpp | 159 +
 be/src/thirdparty/datasketches/kll_sketch_impl.hpp|  23 +--
 4 files changed, 106 insertions(+), 120 deletions(-)

diff --git a/be/src/thirdparty/datasketches/README.md b/be/src/thirdparty/datasketches/README.md
index d5c56ce..a838c0b 100644
--- a/be/src/thirdparty/datasketches/README.md
+++ b/be/src/thirdparty/datasketches/README.md
@@ -8,9 +8,8 @@ changed during this process as originally the following folders were affected:
 I copied the content of these folders into the same directory so that Impala
 can compile them without rewriting the include paths in the files themselves.

-The git hash of the snapshot I used as a source for the files:
-c67d92faad3827932ca3b5d864222e64977f2c20
+The git branch of the snapshot I used as a source for the files: 2.1.0-incubating
+The hash: c1a6f8edb49699520f248d3d02019b87429b4241

 Browse the source files here:
-https://github.com/apache/incubator-datasketches-cpp
-
+https://github.com/apache/incubator-datasketches-cpp/tree/2.1.0-incubating-rc1

diff --git a/be/src/thirdparty/datasketches/kll_quantile_calculator.hpp b/be/src/thirdparty/datasketches/kll_quantile_calculator.hpp
index f77071e..bc60f26 100644
--- a/be/src/thirdparty/datasketches/kll_quantile_calculator.hpp
+++ b/be/src/thirdparty/datasketches/kll_quantile_calculator.hpp
@@ -26,31 +26,38 @@ namespace datasketches {

 template
 class kll_quantile_calculator {
-  typedef typename std::allocator_traits::template rebind_alloc AllocU32;
-  typedef typename std::allocator_traits::template rebind_alloc AllocU64;
   public:
     // assumes that all levels are sorted including level 0
     kll_quantile_calculator(const T* items, const uint32_t* levels, uint8_t num_levels, uint64_t n);
-    ~kll_quantile_calculator();
     T get_quantile(double fraction) const;

   private:
+    using AllocU32 = typename std::allocator_traits::template rebind_alloc;
+    using vector_u32 = std::vector;
+    using Entry = std::pair;
+    using AllocEntry = typename std::allocator_traits::template rebind_alloc;
+    using Container = std::vector;
     uint64_t n_;
-    T* items_;
-    uint64_t* weights_;
-    uint32_t* levels_;
-    uint8_t levels_size_;
-    uint8_t num_levels_;
+    vector_u32 levels_;
+    Container entries_;

-    void populate_from_sketch(const T* items, uint32_t num_items, const uint32_t* levels, uint8_t num_levels);
+    void populate_from_sketch(const T* items, const uint32_t* levels, uint8_t num_levels);
     T approximately_answer_positional_query(uint64_t pos) const;
-    static void convert_to_preceding_cummulative(uint64_t* weights, uint32_t weights_size);
+    void convert_to_preceding_cummulative();
+    uint32_t chunk_containing_pos(uint64_t pos) const;
+    uint32_t search_for_chunk_containing_pos(uint64_t pos, uint32_t l, uint32_t r) const;
+    static void merge_sorted_blocks(Container& entries, const uint32_t* levels, uint8_t num_levels, uint32_t num_items);
+    static void merge_sorted_blocks_direct(Container& orig, Container& temp, const uint32_t* levels, uint8_t starting_level, uint8_t num_levels);
+    static void merge_sorted_blocks_reversed(Container& orig, Container& temp, const uint32_t* levels, uint8_t starting_level, uint8_t num_levels);
     static uint64_t pos_of_phi(double phi, uint64_t n);
-    static uint32_t chunk_containing_pos(uint64_t* weights, uint32_t weights_size, uint64_t pos);
-    static uint32_t search_for_chunk_containing_pos(const uint64_t* arr, uint64_t pos, uint32_t l, uint32_t r);
-    static void blocky_tandem_merge_sort(T* items, uint64_t* weights, uint32_t num_items, const uint32_t* levels, uint8_t num_levels);
-    static void blocky_tandem_merge_sort_recursion(T* items_src, uint64_t* weights_src, T* items_dst, uint64_t* weights_dst, const uint32_t* levels, uint8_t starting_level, uint8_t num_levels);
-    static void tandem_merge(const T* items_src, const uint6
[impala] 02/02: IMPALA-10121: Generate JUnitXML for TSAN messages
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 106dea63ba2f21ea43a580363445d4ad79a9c87c Author: Joe McDonnell AuthorDate: Tue Sep 1 09:40:16 2020 -0700 IMPALA-10121: Generate JUnitXML for TSAN messages This adds logic in bin/jenkins/finalize.sh to check the ERROR log for TSAN messages (i.e. WARNING: ThreadSanitizer: ...) and generate a JUnitXML with the message. This happens when TSAN aborts Impala. Testing: - Ran TSAN build (which is currently failing) Change-Id: I44ea33a78482499decae0ec4c7c44513094b2f44 Reviewed-on: http://gerrit.cloudera.org:8080/16397 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- bin/jenkins/finalize.sh | 25 +++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/bin/jenkins/finalize.sh b/bin/jenkins/finalize.sh index 2c216e7..c8ca198 100755 --- a/bin/jenkins/finalize.sh +++ b/bin/jenkins/finalize.sh @@ -72,14 +72,35 @@ function check_for_asan_error { fi } -# Check for AddressSanitizer messages. ASAN errors can show up in ERROR logs -# (particularly for impalad). Some backend tests generate ERROR logs. +function check_for_tsan_error { + ERROR_LOG=${1} + if grep -q "WARNING: ThreadSanitizer:" ${ERROR_LOG} ; then +# Extract out the TSAN message from the log file into a temp file. +# Starts with WARNING: ThreadSanitizer and then ends with a line with several '=' +# characters (currently 18, we match 10). +tmp_tsan_output=$(mktemp) +sed -n '/ThreadSanitizer:/,/==/p' ${ERROR_LOG} > "${tmp_tsan_output}" +# Make each TSAN issue use its own JUnitXML file by including the log filename +# in the step. 
+base=$(basename ${ERROR_LOG}) +"${IMPALA_HOME}"/bin/generate_junitxml.py --phase finalize \ + --step "tsan_error_${base}" \ + --error "Thread Sanitizer message detected in ${ERROR_LOG}" \ + --stderr "$(cat ${tmp_tsan_output})" +rm "${tmp_tsan_output}" + fi +} + +# Check for AddressSanitizer/ThreadSanitizer messages. ASAN/TSAN errors can show up +# in ERROR logs (particularly for impalad). Some backend tests generate ERROR logs. for error_log in $(find $LOGS_DIR -name "*ERROR*"); do check_for_asan_error ${error_log} + check_for_tsan_error ${error_log} done # Backend tests can also generate output in logs/be_tests/LastTest.log if [[ -f ${LOGS_DIR}/be_tests/LastTest.log ]]; then check_for_asan_error ${LOGS_DIR}/be_tests/LastTest.log + check_for_tsan_error ${LOGS_DIR}/be_tests/LastTest.log fi # Check for DCHECK messages. DCHECKs translate into CHECKs, which log at FATAL level
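The `sed -n '/ThreadSanitizer:/,/==/p'` range in `check_for_tsan_error` above can be mimicked in Python to sanity-check what the range match captures. This is an illustrative sketch only — the function name and the sample log are hypothetical, and it uses the 10-'=' delimiter heuristic mentioned in the shell comment, not Impala's actual code:

```python
def extract_tsan_message(log_text):
    """Collect lines from the first 'WARNING: ThreadSanitizer:' line through
    the next '=' delimiter line, mimicking sed's /start/,/end/p range."""
    out = []
    capturing = False
    for line in log_text.splitlines():
        if not capturing:
            if "WARNING: ThreadSanitizer:" in line:
                capturing = True
                out.append(line)
        else:
            out.append(line)
            # End-of-report delimiter: TSAN prints ~18 '='s; match 10.
            if "==========" in line:
                break
    return "\n".join(out)

# Hypothetical ERROR-log excerpt to exercise the range logic:
sample_log = """I0901 10:00:00 normal log line
WARNING: ThreadSanitizer: data race (pid=12345)
  Write of size 8 at 0x7b04 by thread T1:
==================
I0901 10:00:01 unrelated trailing line"""
tsan_msg = extract_tsan_message(sample_log)
```

The captured text is what would be passed to `generate_junitxml.py` via `--stderr`.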
[impala] 01/02: IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 329bb41294a57bfd63dc0d90d57966e8562686b1 Author: Zoltan Borok-Nagy AuthorDate: Fri Aug 28 18:01:36 2020 +0200 IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files Currently Impala checks the file metadata field 'hive.acid.version' to decide whether a file has a full ACID schema. There are cases when Hive forgets to set this value for full ACID files, e.g. query-based compactions, so it is more robust to check the schema elements instead of the metadata field. Also, Hive sometimes writes the schema with different character cases, e.g. originalTransaction vs originaltransaction, so the column names should be compared in a case-insensitive way. Testing: * added test for full ACID compaction * added test_full_acid_schema_without_file_metadata_tag to test full ACID file without metadata 'hive.acid.version' Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14 Reviewed-on: http://gerrit.cloudera.org:8080/16383 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/exec/hdfs-orc-scanner.cc| 8 ++--- be/src/exec/orc-metadata-utils.cc | 32 +- be/src/exec/orc-metadata-utils.h | 17 ++ testdata/data/README | 5 +++ .../data/full_acid_schema_but_no_acid_version.orc | Bin 0 -> 545 bytes .../queries/QueryTest/acid-compaction.test | 37 + tests/query_test/test_acid.py | 16 + 7 files changed, 88 insertions(+), 27 deletions(-) diff --git a/be/src/exec/hdfs-orc-scanner.cc b/be/src/exec/hdfs-orc-scanner.cc index c3e0baa..fe86013 100644 --- a/be/src/exec/hdfs-orc-scanner.cc +++ b/be/src/exec/hdfs-orc-scanner.cc @@ -190,8 +190,9 @@ Status HdfsOrcScanner::Open(ScannerContext* context) { RETURN_IF_ERROR(footer_status); bool is_table_full_acid = scan_node_->hdfs_table()->IsTableFullAcid(); - bool is_file_full_acid = reader_->hasMetadataValue(HIVE_ACID_VERSION_KEY) && - 
reader_->getMetadataValue(HIVE_ACID_VERSION_KEY) == "2"; + schema_resolver_.reset(new OrcSchemaResolver(*scan_node_->hdfs_table(), + _->getType(), filename(), is_table_full_acid)); + bool is_file_full_acid = schema_resolver_->HasFullAcidV2Schema(); acid_original_file_ = is_table_full_acid && !is_file_full_acid; if (is_table_full_acid) { acid_write_id_range_ = valid_write_ids_.GetWriteIdRange(filename()); @@ -218,9 +219,6 @@ Status HdfsOrcScanner::Open(ScannerContext* context) { filename())); } } - schema_resolver_.reset(new OrcSchemaResolver(*scan_node_->hdfs_table(), - _->getType(), filename(), is_table_full_acid, is_file_full_acid)); - RETURN_IF_ERROR(schema_resolver_->ValidateFullAcidFileSchema()); // Hive Streaming Ingestion allocates multiple write ids, hence create delta directories // like delta_5_10. Then it continuously appends new stripes (and footers) to the diff --git a/be/src/exec/orc-metadata-utils.cc b/be/src/exec/orc-metadata-utils.cc index aa81d7d..400bba0 100644 --- a/be/src/exec/orc-metadata-utils.cc +++ b/be/src/exec/orc-metadata-utils.cc @@ -17,9 +17,13 @@ #include "exec/orc-metadata-utils.h" +#include + #include "util/debug-util.h" #include "common/names.h" +using boost::algorithm::iequals; + namespace impala { Status OrcSchemaResolver::BuildSchemaPaths(int num_partition_keys, @@ -90,7 +94,6 @@ Status OrcSchemaResolver::ResolveColumn(const SchemaPath& col_path, *node = root_; *pos_field = false; *missing_field = false; - DCHECK_OK(ValidateFullAcidFileSchema()); // Should have already been validated. 
if (col_path.empty()) return Status::OK(); SchemaPath table_path, file_path; TranslateColPaths(col_path, _path, _path); @@ -318,28 +321,27 @@ bool OrcSchemaResolver::IsAcidColumn(const SchemaPath& col_path) const { col_path.front() >= num_part_cols && col_path.front() < num_part_cols + 5; } -Status OrcSchemaResolver::ValidateFullAcidFileSchema() const { - if (!is_file_full_acid_) return Status::OK(); - string error_msg = Substitute("File %0 should have full ACID schema.", filename_); - if (root_->getKind() != orc::TypeKind::STRUCT) return Status(error_msg); - if (root_->getSubtypeCount() != 6) return Status(error_msg); +void OrcSchemaResolver::DetermineFullAcidSchema() { + is_file_full_acid_ = false; + if (root_->getKind() != orc::TypeKind::STRUCT) return; + if (root_->getSubtypeCount() != 6) return; if (root_->getSubtype(0)->getKind() != orc::TypeKind::INT || root_->getSubtyp
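The case-insensitive schema check described in the commit message can be sketched outside of C++. The column list below is the standard full ACID v2 ORC layout (operation, originalTransaction, bucket, rowId, currentTransaction, plus the nested `row` struct); the Python helper is a hypothetical stand-in for `DetermineFullAcidSchema()` and checks names only, whereas the real resolver also verifies the field types:

```python
# Top-level columns of a full ACID v2 ORC file, lower-cased, in order.
# The sixth field, 'row', is the nested struct holding the user columns.
FULL_ACID_V2_COLUMNS = [
    "operation", "originaltransaction", "bucket",
    "rowid", "currenttransaction", "row",
]

def has_full_acid_v2_schema(field_names):
    """Compare top-level field names case-insensitively, so that e.g.
    'originalTransaction' and 'originaltransaction' both match."""
    if len(field_names) != len(FULL_ACID_V2_COLUMNS):
        return False
    return [n.lower() for n in field_names] == FULL_ACID_V2_COLUMNS

# A file written by Hive (e.g. by query-based compaction) may use camelCase:
camel_case = ["operation", "originalTransaction", "bucket",
              "rowId", "currentTransaction", "row"]
```

This is why checking schema elements is more robust than trusting the `hive.acid.version` metadata tag, which Hive sometimes omits.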
[impala] branch master updated (69d0d0a -> 106dea6)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from 69d0d0a IMPALA-10087: IMPALA-6050 causes alluxio not to be supported new 329bb41 IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files new 106dea6 IMPALA-10121: Generate JUnitXML for TSAN messages The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/exec/hdfs-orc-scanner.cc| 8 ++--- be/src/exec/orc-metadata-utils.cc | 32 +- be/src/exec/orc-metadata-utils.h | 17 ++ bin/jenkins/finalize.sh| 25 -- testdata/data/README | 5 +++ .../data/full_acid_schema_but_no_acid_version.orc | Bin 0 -> 545 bytes .../queries/QueryTest/acid-compaction.test | 37 + tests/query_test/test_acid.py | 16 + 8 files changed, 111 insertions(+), 29 deletions(-) create mode 100644 testdata/data/full_acid_schema_but_no_acid_version.orc
[impala] branch master updated (f85dbff -> 69d0d0a)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from f85dbff IMPALA-10030: Remove unnecessary jar dependencies new f4273a4 IMPALA-7310: Partial fix for NDV cardinality with NULLs. new 69d0d0a IMPALA-10087: IMPALA-6050 causes alluxio not to be supported The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../java/org/apache/impala/analysis/SlotRef.java | 24 +- .../org/apache/impala/common/FileSystemUtil.java | 4 +- .../impala/analysis/ExprCardinalityTest.java | 22 +- .../org/apache/impala/analysis/ExprNdvTest.java| 16 +- .../apache/impala/common/FileSystemUtilTest.java | 8 + .../org/apache/impala/planner/CardinalityTest.java | 41 +- .../queries/PlannerTest/tpcds/tpcds-q04.test | 840 +++-- .../queries/PlannerTest/tpcds/tpcds-q11.test | 636 8 files changed, 806 insertions(+), 785 deletions(-)
[impala] 02/02: IMPALA-10087: IMPALA-6050 causes alluxio not to be supported
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 69d0d0af471f7013627ead3dff86a402ebc263a6 Author: abeltian AuthorDate: Fri Aug 28 15:07:20 2020 +0800 IMPALA-10087: IMPALA-6050 causes alluxio not to be supported This change adds file type support for alluxio. Alluxio URLs have a different prefix such as:alluxio://zk@zk-1:2181,zk-2:2181,zk-3:2181/path/ Testing: Add unit test for alluxio file system type checks. Change-Id: Id92ec9cb0ee241a039fe4a96e1bc2ab3eaaf8f77 Reviewed-on: http://gerrit.cloudera.org:8080/16379 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- fe/src/main/java/org/apache/impala/common/FileSystemUtil.java | 4 +++- fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java | 8 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java b/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java index 38b1ddd..b4a41b2 100644 --- a/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java +++ b/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java @@ -422,7 +422,8 @@ public class FileSystemUtil { HDFS, LOCAL, S3, -OZONE; +OZONE, +ALLUXIO; private static final Map SCHEME_TO_FS_MAPPING = ImmutableMap.builder() @@ -433,6 +434,7 @@ public class FileSystemUtil { .put("hdfs", HDFS) .put("s3a", S3) .put("o3fs", OZONE) +.put("alluxio", ALLUXIO) .build(); /** diff --git a/fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java b/fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java index 030961a..da5e11e 100644 --- a/fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java +++ b/fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java @@ -21,6 +21,7 @@ import static org.apache.impala.common.FileSystemUtil.HIVE_TEMP_FILE_PREFIX; import static 
org.apache.impala.common.FileSystemUtil.isIgnoredDir; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertTrue; +import static org.junit.Assert.assertEquals; import org.apache.hadoop.fs.FileStatus; import org.apache.hadoop.fs.Path; @@ -84,6 +85,13 @@ public class FileSystemUtilTest { assertFalse(isIgnoredDir(new Path(TEST_TABLE_PATH + "/part=100/datafile"))); } + @Test + public void testAlluxioFsType() { +Path path = new Path("alluxio://zk@zk-1:2181,zk-2:2181,zk-3:2181/path/"); +assertEquals(FileSystemUtil.FsType.ALLUXIO, +FileSystemUtil.FsType.getFsType(path.toUri().getScheme())); + } + private boolean testIsInIgnoredDirectory(Path input) { return testIsInIgnoredDirectory(input, true); }
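The fix above is a plain scheme-to-filesystem lookup table. A minimal Python sketch of the same idea (the enum names mirror `FileSystemUtil.FsType` from the diff; the helper function itself is hypothetical):

```python
from urllib.parse import urlparse

# Mirrors the SCHEME_TO_FS_MAPPING entries visible in the diff.
SCHEME_TO_FS = {
    "hdfs": "HDFS",
    "s3a": "S3",
    "o3fs": "OZONE",
    "alluxio": "ALLUXIO",  # the entry added by this commit
}

def fs_type_for_path(path):
    """Resolve a filesystem type from the URI scheme; None if unrecognized."""
    return SCHEME_TO_FS.get(urlparse(path).scheme)

# Alluxio URLs may embed a ZooKeeper ensemble in the authority part:
fs = fs_type_for_path("alluxio://zk@zk-1:2181,zk-2:2181,zk-3:2181/path/")
```

Note that only the scheme matters for the lookup; the unusual `zk@host:port,host:port` authority is irrelevant to type detection.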
[impala] 01/02: Fix concurrency for docker-based tests on 140+GB memory machines
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 19f16a0f4889a59f7785bb88d059d2d8c335988d Author: Joe McDonnell AuthorDate: Sun Aug 9 19:38:42 2020 -0700 Fix concurrency for docker-based tests on 140+GB memory machines A prior change increased the suite concurrency for the docker-based tests on machines with 140+GB of memory. This new rung should also bump the parallel test concurrency (i.e. for parallel EE tests). This sets the parallel test concurrency to 12 for this rung (which is what we use for the 95GB-140GB rung). Testing: - Ran test-with-docker.py on a m5.12xlarge Change-Id: Ib7299abd585da9ba1a838640dadc0bef9c72a39b Reviewed-on: http://gerrit.cloudera.org:8080/16326 Reviewed-by: Laszlo Gaal Tested-by: Joe McDonnell --- docker/test-with-docker.py | 1 + 1 file changed, 1 insertion(+) diff --git a/docker/test-with-docker.py b/docker/test-with-docker.py index b348d3b..35f64aa 100755 --- a/docker/test-with-docker.py +++ b/docker/test-with-docker.py @@ -253,6 +253,7 @@ def _compute_defaults(): if total_memory_gb >= 140: suite_concurrency = 6 memlimit_gb = 11 +parallel_test_concurrency = min(cpus, 12) elif total_memory_gb >= 95: suite_concurrency = 4 memlimit_gb = 11
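The memory-based rungs in `_compute_defaults()` can be summarized in a small sketch. Only the 140+GB and 95-140GB rungs come from the commit; the fallback rung's numbers are placeholders for illustration, not values from `test-with-docker.py`:

```python
def compute_defaults(total_memory_gb, cpus):
    """Pick docker-test concurrency settings by host memory size.  The fix:
    the 140+GB rung previously forgot to set parallel_test_concurrency, so
    parallel EE tests did not get the same bump as the 95-140GB rung."""
    if total_memory_gb >= 140:
        suite_concurrency = 6
        memlimit_gb = 11
        parallel_test_concurrency = min(cpus, 12)  # the added line
    elif total_memory_gb >= 95:
        suite_concurrency = 4
        memlimit_gb = 11
        parallel_test_concurrency = min(cpus, 12)
    else:
        # Placeholder fallback rung (illustrative values only).
        suite_concurrency = 3
        memlimit_gb = 7
        parallel_test_concurrency = min(cpus, 8)
    return suite_concurrency, memlimit_gb, parallel_test_concurrency
```

On an m5.12xlarge (192GB, 48 vCPUs) this yields suite concurrency 6 and parallel test concurrency 12.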
[impala] branch master updated (f95f794 -> ac63e19)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from f95f794 IMPALA-10017: Implement ds_kll_union() function new 19f16a0 Fix concurrency for docker-based tests on 140+GB memory machines new ac63e19 IMPALA-10043: Keep more logs when using EE_TEST_SHARDS The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: bin/run-all-tests.sh | 5 + docker/test-with-docker.py | 1 + 2 files changed, 6 insertions(+)
[impala] 02/02: IMPALA-10043: Keep more logs when using EE_TEST_SHARDS
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit ac63e19e0d3c797b08dcf80053fc8e3259d8472d Author: Joe McDonnell AuthorDate: Wed Aug 5 14:17:54 2020 -0700 IMPALA-10043: Keep more logs when using EE_TEST_SHARDS IMPALA-9887 introduced the EE_TEST_SHARDS setting, which splits the end-to-end test into shards and restarts Impala in between. In order to keep the logs from all the shards, the value for max_log_files needs to be increased so that none get aged out. This multiplies IMPALA_MAX_LOG_FILES by the number of shards using EE_TEST_SHARDS. Testing: - Ran a test with EE_TEST_SHARDS=6 and verified that the logs are preserved. Change-Id: Ie011b892cd2eb1a528012ec5600e72e44f281a88 Reviewed-on: http://gerrit.cloudera.org:8080/16297 Tested-by: Impala Public Jenkins Reviewed-by: Laszlo Gaal --- bin/run-all-tests.sh | 5 + 1 file changed, 5 insertions(+) diff --git a/bin/run-all-tests.sh b/bin/run-all-tests.sh index 5287861..74f65a9 100755 --- a/bin/run-all-tests.sh +++ b/bin/run-all-tests.sh @@ -254,6 +254,10 @@ do # Some test frameworks (e.g. the docker-based tests) use this. run_ee_tests else + # Increase the maximum number of log files so that the logs from the shards + # don't get aged out. Multiply the default number by the number of shards. + IMPALA_MAX_LOG_FILES_SAVE="${IMPALA_MAX_LOG_FILES:-10}" + export IMPALA_MAX_LOG_FILES="$((${EE_TEST_SHARDS} * ${IMPALA_MAX_LOG_FILES_SAVE}))" # When the EE tests are sharded, it runs 1/Nth of the tests at a time, restarting # Impala between the shards. There are two benefits: # 1. It isolates errors so that if Impala crashes, the next shards will still run @@ -268,6 +272,7 @@ do run_ee_tests "--shard_tests=$shard_idx/${EE_TEST_SHARDS}" start_impala_cluster done + export IMPALA_MAX_LOG_FILES="${IMPALA_MAX_LOG_FILES_SAVE}" fi fi
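The arithmetic in the shell snippet above — take the current `IMPALA_MAX_LOG_FILES` (default 10) and multiply by the shard count, restoring the original afterwards — can be checked directly. A hypothetical sketch:

```python
import os

def scaled_max_log_files(ee_test_shards, env=None):
    """Compute the bumped IMPALA_MAX_LOG_FILES the same way the shell does:
    shard count times the current value, defaulting to 10 when unset."""
    env = env if env is not None else os.environ
    current = int(env.get("IMPALA_MAX_LOG_FILES", "10"))
    return ee_test_shards * current

# With 6 shards and the default of 10, logs from all shards fit in 60 files.
six_shard_limit = scaled_max_log_files(6, env={})
```

The point of the multiplication is that glog-style log rotation ages out the oldest files, so without the bump each Impala restart between shards would evict the previous shard's logs.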
[impala] branch master updated: IMPALA-9645 Port LLVM codegen to adapt aarch64
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new fab251e IMPALA-9645 Port LLVM codegen to adapt aarch64 fab251e is described below commit fab251efe3de449d22439dd17798cd414168748c Author: zhaorenhai AuthorDate: Sun Apr 12 12:05:52 2020 + IMPALA-9645 Port LLVM codegen to adapt aarch64 On aarch64, the lowered type of struct {bool, int128} is of the form { {i8}, {i128} }; no padding is added. This differs from x86-64, where it is { {i8}, {15*i8}, {i128} } with padding added automatically. This patch also adds type conversions between x86-64 and aarch64 data types, and adds some aarch64 CPU features. Change-Id: I3f30ee84ea9bf5245da88154632bb69079103d11 Reviewed-on: http://gerrit.cloudera.org:8080/15718 Tested-by: Impala Public Jenkins Reviewed-by: Tim Armstrong --- be/src/codegen/codegen-anyval.cc | 121 +++ be/src/codegen/llvm-codegen.cc | 7 +++ be/src/exec/text-converter.cc| 19 ++ be/src/exprs/scalar-fn-call.cc | 39 + 4 files changed, 175 insertions(+), 11 deletions(-) diff --git a/be/src/codegen/codegen-anyval.cc b/be/src/codegen/codegen-anyval.cc index 66d79e7..1346f95 100644 --- a/be/src/codegen/codegen-anyval.cc +++ b/be/src/codegen/codegen-anyval.cc @@ -41,28 +41,56 @@ const char* CodegenAnyVal::LLVM_COLLECTIONVAL_NAME = "struct.impala_udf::Collect llvm::Type* CodegenAnyVal::GetLoweredType(LlvmCodeGen* cg, const ColumnType& type) { switch (type.type) { case TYPE_BOOLEAN: // i16 +#ifndef __aarch64__ return cg->i16_type(); +#else + return cg->i64_type(); +#endif case TYPE_TINYINT: // i16 +#ifndef __aarch64__ return cg->i16_type(); +#else + return cg->i64_type(); +#endif case TYPE_SMALLINT: // i32 +#ifndef __aarch64__ return cg->i32_type(); +#else + return cg->i64_type(); +#endif case TYPE_INT: // i64 return cg->i64_type(); case TYPE_BIGINT: // { i8, i64 } +#ifndef __aarch64__
return llvm::StructType::get(cg->i8_type(), cg->i64_type()); +#else + return llvm::ArrayType::get(cg->i64_type(), 2); +#endif case TYPE_FLOAT: // i64 return cg->i64_type(); case TYPE_DOUBLE: // { i8, double } +#ifndef __aarch64__ return llvm::StructType::get(cg->i8_type(), cg->double_type()); +#else + return llvm::ArrayType::get(cg->i64_type(), 2); +#endif case TYPE_STRING: // { i64, i8* } case TYPE_VARCHAR: // { i64, i8* } case TYPE_CHAR: // Uses StringVal, so same as STRING/VARCHAR. case TYPE_FIXED_UDA_INTERMEDIATE: // { i64, i8* } case TYPE_ARRAY: // CollectionVal has same memory layout as StringVal. case TYPE_MAP: // CollectionVal has same memory layout as StringVal. +#ifndef __aarch64__ return llvm::StructType::get(cg->i64_type(), cg->ptr_type()); +#else + return llvm::ArrayType::get(cg->i64_type(), 2); +#endif case TYPE_TIMESTAMP: // { i64, i64 } +#ifndef __aarch64__ return llvm::StructType::get(cg->i64_type(), cg->i64_type()); +#else + return llvm::ArrayType::get(cg->i64_type(), 2); +#endif case TYPE_DECIMAL: // %"struct.impala_udf::DecimalVal" (isn't lowered) // = { {i8}, [15 x i8], {i128} } return cg->GetNamedType(LLVM_DECIMALVAL_NAME); @@ -198,9 +226,14 @@ llvm::Value* CodegenAnyVal::GetIsNull(const char* name) const { case TYPE_BIGINT: case TYPE_DOUBLE: { // Lowered type is of form { i8, * }. Get the i8 value. - llvm::Value* is_null_i8 = builder_->CreateExtractValue(value_, 0); - DCHECK(is_null_i8->getType() == codegen_->i8_type()); - return builder_->CreateTrunc(is_null_i8, codegen_->bool_type(), name); + // On aarch64, Lowered type is of form { i64, * } + llvm::Value* is_null = builder_->CreateExtractValue(value_, 0); +#ifndef __aarch64__ + DCHECK(is_null->getType() == codegen_->i8_type()); +#else + DCHECK(is_null->getType() == codegen_->i64_type()); +#endif + return builder_->CreateTrunc(is_null, codegen_->bool_type(), name); } case TYPE_DECIMAL: { // Lowered type is of the form { {i8}, ... 
} @@ -240,8 +273,14 @@ void CodegenAnyVal::SetIsNull(llvm::Value* is_null) { case TYPE_BIGINT: case TYPE_DOUBLE: { // Lowered type is of form { i8, * }. Set the i8 value to 'is_null'. + // On aarch64, lowered type is of form { i64, * } +#ifndef __aarch64__ llvm::Value* is_null_ext = builder_->CreateZExt(is_null, codegen_->i8_type(), "is_null_ext"); +#else + llvm::Value* is_null_ext = + builder_->CreateZExt(is_null, codegen_->i64
[impala] branch master updated (dbbd403 -> bbec044)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from dbbd403 IMPALA-10005: Fix Snappy decompression for non-block filesystems new 86b70e9 IMPALA-9851: Truncate long error message. new 7a6469e IMPALA-10053: Remove uses of MonoTime::GetDeltaSince() new bbec044 IMPALA-10044: Fix cleanup for bootstrap_toolchain.py failure case The 3 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/runtime/bufferpool/buffer-pool-internal.h | 3 + be/src/runtime/bufferpool/buffer-pool-test.cc| 54 +++ be/src/runtime/bufferpool/buffer-pool.cc | 24 +++-- be/src/runtime/bufferpool/buffer-pool.h | 1 + be/src/runtime/krpc-data-stream-recvr.cc | 3 +- be/src/runtime/krpc-data-stream-sender.cc| 4 + be/src/service/data-stream-service.cc| 2 +- be/src/util/error-util-test.cc | 7 ++ be/src/util/error-util.cc| 112 --- be/src/util/error-util.h | 11 ++- be/src/util/internal-queue.h | 13 +++ bin/bootstrap_toolchain.py | 3 +- 12 files changed, 172 insertions(+), 65 deletions(-)
[impala] 01/03: IMPALA-9851: Truncate long error message.
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 86b70e9850cce0b45194a64cd89ae21df0e82029 Author: Riza Suminto AuthorDate: Wed Aug 5 17:03:08 2020 -0700 IMPALA-9851: Truncate long error message. Error message length was unbounded and could grow to a couple of MB in size. This patch truncates error messages to a maximum of 128KB. It also fixes a potentially long error message related to BufferPool::Client::DebugString(). Before this patch, DebugString() would print all pages in 'pinned_pages_', 'dirty_unpinned_pages_', and 'in_flight_write_pages_' PageList. With this patch, DebugString() only includes at most the first 100 pages in each PageList. Testing: - Add be test BufferPoolTest.ShortDebugString - Add test within ErrorMsg.GenericFormatting to test for truncation. - Run and pass core tests. Change-Id: Ic9fa4d024fb3dc9de03c7484f41b5e420a710e5a Reviewed-on: http://gerrit.cloudera.org:8080/16300 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/runtime/bufferpool/buffer-pool-internal.h | 3 + be/src/runtime/bufferpool/buffer-pool-test.cc| 54 +++ be/src/runtime/bufferpool/buffer-pool.cc | 24 +++-- be/src/runtime/bufferpool/buffer-pool.h | 1 + be/src/util/error-util-test.cc | 7 ++ be/src/util/error-util.cc| 112 --- be/src/util/error-util.h | 11 ++- be/src/util/internal-queue.h | 13 +++ 8 files changed, 163 insertions(+), 62 deletions(-) diff --git a/be/src/runtime/bufferpool/buffer-pool-internal.h b/be/src/runtime/bufferpool/buffer-pool-internal.h index c2caf7b..20d7767 100644 --- a/be/src/runtime/bufferpool/buffer-pool-internal.h +++ b/be/src/runtime/bufferpool/buffer-pool-internal.h @@ -182,6 +182,9 @@ class BufferPool::PageList { } void Iterate(boost::function fn) { list_.Iterate(fn); } + void IterateFirstN(boost::function fn, int n) { +list_.IterateFirstN(fn, n); + } bool Contains(Page* page) { 
return list_.Contains(page); } Page* tail() { return list_.tail(); } bool empty() const { return list_.empty(); } diff --git a/be/src/runtime/bufferpool/buffer-pool-test.cc b/be/src/runtime/bufferpool/buffer-pool-test.cc index 2c9add7..611963c 100644 --- a/be/src/runtime/bufferpool/buffer-pool-test.cc +++ b/be/src/runtime/bufferpool/buffer-pool-test.cc @@ -2353,6 +2353,60 @@ TEST_F(BufferPoolTest, BufferPoolGc) { buffer_pool->FreeBuffer(, ); buffer_pool->DeregisterClient(); } + +/// IMPALA-9851: Cap the number of pages that can be printed at +/// BufferPool::Client::DebugString(). +TEST_F(BufferPoolTest, ShortDebugString) { + // Allocate pages more than BufferPool::MAX_PAGE_ITER_DEBUG. + int num_pages = 105; + int64_t max_page_len = TEST_BUFFER_LEN; + int64_t total_mem = num_pages * max_page_len; + global_reservations_.InitRootTracker(NULL, total_mem); + BufferPool pool(test_env_->metrics(), TEST_BUFFER_LEN, total_mem, total_mem); + BufferPool::ClientHandle client; + ASSERT_OK(pool.RegisterClient("test client", NULL, _reservations_, NULL, + total_mem, NewProfile(), )); + ASSERT_TRUE(client.IncreaseReservation(total_mem)); + + vector handles(num_pages); + + // Create pages of various valid sizes. + for (int i = 0; i < num_pages; ++i) { +int64_t page_len = TEST_BUFFER_LEN; +int64_t used_before = client.GetUsedReservation(); +ASSERT_OK(pool.CreatePage(, page_len, [i])); +ASSERT_TRUE(handles[i].is_open()); +ASSERT_TRUE(handles[i].is_pinned()); +const BufferHandle* buffer; +ASSERT_OK(handles[i].GetBuffer()); +ASSERT_TRUE(buffer->data() != NULL); +ASSERT_EQ(handles[i].len(), page_len); +ASSERT_EQ(buffer->len(), page_len); +ASSERT_EQ(client.GetUsedReservation(), used_before + page_len); + } + + // Verify that only subset of pages are included in DebugString(). 
+ string page_count_substr = Substitute( + "$0 out of $1 pinned pages:", BufferPool::MAX_PAGE_ITER_DEBUG, num_pages); + string debug_string = client.DebugString(); + ASSERT_NE(debug_string.find(page_count_substr), string::npos) + << page_count_substr << " not found at BufferPool::Client::DebugString(). " + << debug_string; + + // Close the handles and check memory consumption. + for (int i = 0; i < num_pages; ++i) { +int64_t used_before = client.GetUsedReservation(); +int page_len = handles[i].len(); +pool.DestroyPage(, [i]); +ASSERT_EQ(client.GetUsedReservation(), used_before - page_len); + } + + pool.DeregisterClient(); + + // All the reservations should be released at this point. + ASSERT_EQ(global_reservations_.GetReservation(), 0); + global_
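The truncation itself — cap a message at a fixed byte budget, as IMPALA-9851 does at 128KB — is easy to sketch. The helper and the marker text below are hypothetical illustrations, not Impala's actual `error-util.cc` implementation:

```python
MAX_ERROR_MSG_BYTES = 128 * 1024  # the 128KB cap from the commit message

def truncate_error_msg(msg, limit=MAX_ERROR_MSG_BYTES):
    """Keep at most 'limit' bytes of the message, appending a marker when
    anything was dropped so readers know the text is incomplete."""
    raw = msg.encode("utf-8")
    if len(raw) <= limit:
        return msg
    marker = "... (truncated)"
    kept = raw[:limit - len(marker)].decode("utf-8", errors="ignore")
    return kept + marker

short_msg = truncate_error_msg("boom")             # unchanged
long_msg = truncate_error_msg("x" * (2 * 1024 * 1024))  # a 2MB message
```

Capping the page lists in `DebugString()` (via `IterateFirstN`) attacks the same problem from the other end: it keeps huge messages from being built in the first place.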
[impala] 02/03: IMPALA-10053: Remove uses of MonoTime::GetDeltaSince()
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 7a6469e44486191cd344e9f7dcf681763d6091db Author: Thomas Tauber-Marshall AuthorDate: Wed Aug 5 16:57:56 2020 -0700 IMPALA-10053: Remove uses of MonoTime::GetDeltaSince() MonoTime is a utility Impala imports from Kudu. The behavior of MonoTime::GetDeltaSince() was accidentally flipped in https://gerrit.cloudera.org/#/c/14932/ so we're getting negative durations where we expect positive durations. The function is deprecated anyway, so this patch removes all uses of it and replaces them with the MonoTime '-' operator. Testing: - Manually ran with and without patch and inspected calculated values. - Added DCHECKs to prevent such an issue from occurring again. Change-Id: If8cd3eb51a4fd101bbe4b9c44ea9be6ea2ea0d06 Reviewed-on: http://gerrit.cloudera.org:8080/16296 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/runtime/krpc-data-stream-recvr.cc | 3 ++- be/src/runtime/krpc-data-stream-sender.cc | 4 be/src/service/data-stream-service.cc | 2 +- 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/be/src/runtime/krpc-data-stream-recvr.cc b/be/src/runtime/krpc-data-stream-recvr.cc index 43a13e4..97aa406 100644 --- a/be/src/runtime/krpc-data-stream-recvr.cc +++ b/be/src/runtime/krpc-data-stream-recvr.cc @@ -749,7 +749,8 @@ Status KrpcDataStreamRecvr::GetNext(RowBatch* output_batch, bool* eos) { void KrpcDataStreamRecvr::AddBatch(const TransmitDataRequestPB* request, TransmitDataResponsePB* response, RpcContext* rpc_context) { - MonoDelta duration(MonoTime::Now().GetDeltaSince(rpc_context->GetTimeReceived())); + MonoDelta duration(MonoTime::Now() - rpc_context->GetTimeReceived()); + DCHECK_GE(duration.ToNanoseconds(), 0); dispatch_timer_->UpdateCounter(duration.ToNanoseconds()); int use_sender_id = is_merging_ ? 
request->sender_id() : 0; // Add all batches to the same queue if is_merging_ is false. diff --git a/be/src/runtime/krpc-data-stream-sender.cc b/be/src/runtime/krpc-data-stream-sender.cc index b795310..9a0f28e 100644 --- a/be/src/runtime/krpc-data-stream-sender.cc +++ b/be/src/runtime/krpc-data-stream-sender.cc @@ -496,6 +496,8 @@ void KrpcDataStreamSender::Channel::TransmitDataCompleteCb() { const kudu::Status controller_status = rpc_controller_.status(); if (LIKELY(controller_status.ok())) { DCHECK(rpc_in_flight_batch_ != nullptr); +// 'receiver_latency_ns' is calculated with MonoTime, so it must be non-negative. +DCHECK_GE(resp_.receiver_latency_ns(), 0); int64_t row_batch_size = RowBatch::GetSerializedSize(*rpc_in_flight_batch_); int64_t network_time = total_time - resp_.receiver_latency_ns(); COUNTER_ADD(parent_->bytes_sent_counter_, row_batch_size); @@ -628,6 +630,8 @@ void KrpcDataStreamSender::Channel::EndDataStreamCompleteCb() { int64_t total_time_ns = MonotonicNanos() - rpc_start_time_ns_; const kudu::Status controller_status = rpc_controller_.status(); if (LIKELY(controller_status.ok())) { +// 'receiver_latency_ns' is calculated with MonoTime, so it must be non-negative. 
+DCHECK_GE(resp_.receiver_latency_ns(), 0); int64_t network_time_ns = total_time_ns - resp_.receiver_latency_ns(); parent_->network_time_stats_->UpdateCounter(network_time_ns); parent_->recvr_time_stats_->UpdateCounter(eos_resp_.receiver_latency_ns()); diff --git a/be/src/service/data-stream-service.cc b/be/src/service/data-stream-service.cc index 76ef7ba..ceea1fa 100644 --- a/be/src/service/data-stream-service.cc +++ b/be/src/service/data-stream-service.cc @@ -143,7 +143,7 @@ void DataStreamService::PublishFilter( template void DataStreamService::RespondRpc(const Status& status, ResponsePBType* response, kudu::rpc::RpcContext* ctx) { - MonoDelta duration(MonoTime::Now().GetDeltaSince(ctx->GetTimeReceived())); + MonoDelta duration(MonoTime::Now() - ctx->GetTimeReceived()); status.ToProto(response->mutable_status()); response->set_receiver_latency_ns(duration.ToNanoseconds()); ctx->RespondSuccess();
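The invariant the new DCHECKs assert — a duration formed as the difference of two readings of the same monotonic clock can never be negative — can be demonstrated with a small hypothetical sketch (the bug fixed here was that `GetDeltaSince()` computed the delta in the flipped direction, violating exactly this property):

```python
import time

def receiver_latency_ns(time_received_ns):
    """Latency since an RPC was received, analogous to
    MonoTime::Now() - ctx->GetTimeReceived().  Both readings come from the
    same monotonic clock, so the difference is always non-negative."""
    return time.monotonic_ns() - time_received_ns

received = time.monotonic_ns()  # stand-in for RpcContext::GetTimeReceived()
latency = receiver_latency_ns(received)
```

With the flipped operands (`received - now`) the result would be negative, which is what made downstream counters like `network_time = total_time - receiver_latency_ns` silently wrong.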
[impala] 03/03: IMPALA-10044: Fix cleanup for bootstrap_toolchain.py failure case
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit bbec0443fcdabf5de6f7ae0e47595414503f30f0 Author: Joe McDonnell AuthorDate: Wed Aug 5 14:02:30 2020 -0700 IMPALA-10044: Fix cleanup for bootstrap_toolchain.py failure case If DownloadUnpackTarball::download()'s wget_and_unpack_package call hits an exception, the exception handler cleans up any created directories. Currently, it erroneously cleans up the directory where the tarballs are downloaded even when it is not a temporary directory. This would delete the entire toolchain. This fixes the cleanup to only delete that directory if it is a temporary directory. Testing: - Simulated exception from wget_and_unpack_package and verified behavior. Change-Id: Ia57f56b6717635af94247fce50b955c07a57d113 Reviewed-on: http://gerrit.cloudera.org:8080/16294 Reviewed-by: Laszlo Gaal Tested-by: Impala Public Jenkins --- bin/bootstrap_toolchain.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/bin/bootstrap_toolchain.py b/bin/bootstrap_toolchain.py index 647fc00..5d59da1 100755 --- a/bin/bootstrap_toolchain.py +++ b/bin/bootstrap_toolchain.py @@ -182,7 +182,8 @@ class DownloadUnpackTarball(object): # Clean up any partially-unpacked result. if os.path.isdir(unpack_dir): shutil.rmtree(unpack_dir) - if os.path.isdir(download_dir): + # Only delete the download directory if it is a temporary directory + if download_dir != self.destination_basedir and os.path.isdir(download_dir): shutil.rmtree(download_dir) raise if self.makedir:
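The fixed cleanup logic can be isolated into a short sketch: the download directory is removed only when it is a temporary directory, i.e. distinct from the permanent destination. The function name below is illustrative, not the actual `bootstrap_toolchain.py` method:

```python
import os
import shutil
import tempfile

def cleanup_after_failure(unpack_dir, download_dir, destination_basedir):
    """Remove partial results after a failed download/unpack.  The download
    dir is deleted only when it differs from the permanent destination --
    deleting the destination would wipe the entire toolchain (the bug)."""
    if os.path.isdir(unpack_dir):
        shutil.rmtree(unpack_dir)
    if download_dir != destination_basedir and os.path.isdir(download_dir):
        shutil.rmtree(download_dir)

# Demonstrate: the destination survives, the temp download dir does not.
dest = tempfile.mkdtemp()
tmp_download = tempfile.mkdtemp()
cleanup_after_failure(os.path.join(dest, "partial-unpack"), tmp_download, dest)
```

Before the fix, the equivalent of the second branch unconditionally removed `download_dir`, which in the non-temporary case was the toolchain directory itself.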
[impala] 02/02: IMPALA-10005: Fix Snappy decompression for non-block filesystems
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit dbbd40308a6d1cef77bfe45e016e775c918e0539 Author: Joe McDonnell AuthorDate: Thu Jul 23 20:44:30 2020 -0700 IMPALA-10005: Fix Snappy decompression for non-block filesystems Snappy-compressed text always uses THdfsCompression::SNAPPY_BLOCKED type compression in the backend. However, for non-block filesystems, the frontend is incorrectly passing THdfsCompression::SNAPPY instead. On debug builds, this leads to a DCHECK when trying to read Snappy-compressed text. On release builds, it fails to decompress the data. This fixes the frontend to always pass THdfsCompression::SNAPPY_BLOCKED for Snappy-compressed text. This reworks query_test/test_compressed_formats.py to provide better coverage: - Changed the RC and Seq test cases to verify that the file extension doesn't matter. Added Avro to this case as well. - Fixed the text case to use appropriate extensions (fixing IMPALA-9004) - Changed the utility function so it doesn't use Hive. This allows it to be enabled on non-HDFS filesystems like S3. - Changed the test to use unique_database and allow parallel execution. - Changed the test to run in the core job, so it now has coverage on the usual S3 test configuration. It is reasonably quick (1-2 minutes) and runs in parallel. Testing: - Exhaustive job - Core s3 job - Changed the frontend to force it to use the code for non-block filesystems (i.e. the TFileSplitGeneratorSpec code) and verified that it is now able to read Snappy-compressed text. 
Change-Id: I0879f2fc0bf75bb5c15cecb845ece46a901601ac Reviewed-on: http://gerrit.cloudera.org:8080/16278 Tested-by: Impala Public Jenkins Reviewed-by: Sahil Takiar --- .../org/apache/impala/catalog/HdfsCompression.java | 20 +- tests/query_test/test_compressed_formats.py| 202 + 2 files changed, 135 insertions(+), 87 deletions(-) diff --git a/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java b/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java index df76463..153106d 100644 --- a/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java +++ b/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java @@ -24,13 +24,15 @@ import com.google.common.base.Preconditions; import com.google.common.collect.ImmutableMap; /** - * Support for recognizing compression suffixes on data files. + * Support for recognizing compression suffixes on data files. This is currently + * limited to text files. Other file formats embed metadata about the compression + * type and do not use the file suffixes. * Compression of a file is recognized in mapreduce by looking for suffixes of * supported codecs. - * For now Impala supports GZIP, SNAPPY, BZIP2 and some additional formats if plugins - * are available. Even if a plugin is available, we need to add the file suffixes here so - * that we can resolve the compression type from the file name. LZO can use the specific - * HIVE input class. + * For now Impala supports GZIP, SNAPPY_BLOCKED, BZIP2 and some additional formats if + * plugins are available. Even if a plugin is available, we need to add the file suffixes + * here so that we can resolve the compression type from the file name. LZO can use the + * specific HIVE input class. * Some compression types here are detected even though they are not supported. This * allows for better error messages (e.g. LZ4, LZO). */ @@ -39,7 +41,7 @@ public enum HdfsCompression { DEFLATE, GZIP, BZIP2, - SNAPPY, + SNAPPY_BLOCKED, LZO, LZO_INDEX, //Lzo index file. 
LZ4, @@ -51,7 +53,7 @@ public enum HdfsCompression { put("deflate", DEFLATE). put("gz", GZIP). put("bz2", BZIP2). - put("snappy", SNAPPY). + put("snappy", SNAPPY_BLOCKED). put("lzo", LZO). put("index", LZO_INDEX). put("lz4", LZ4). @@ -76,7 +78,7 @@ public enum HdfsCompression { case DEFLATE: return THdfsCompression.DEFLATE; case GZIP: return THdfsCompression.GZIP; case BZIP2: return THdfsCompression.BZIP2; -case SNAPPY: return THdfsCompression.SNAPPY_BLOCKED; +case SNAPPY_BLOCKED: return THdfsCompression.SNAPPY_BLOCKED; case LZO: return THdfsCompression.LZO; case LZ4: return THdfsCompression.LZ4; case ZSTD: return THdfsCompression.ZSTD; @@ -90,7 +92,7 @@ public enum HdfsCompression { case DEFLATE: return FbCompression.DEFLATE; case GZIP: return FbCompression.GZIP; case BZIP2: return FbCompression.BZIP2; - case SNAPPY: return FbCompression.SNAPPY; + case SNAPPY_BLOCKED
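The suffix-to-codec resolution this patch fixes boils down to one mapping change: the `snappy` file suffix now resolves to `SNAPPY_BLOCKED`, the only Snappy variant the backend supports for text. A Python sketch of the lookup (illustrative only; the real code is the Java `HdfsCompression` enum above):

```python
# Mirrors the fixed HdfsCompression suffix map; only the snappy entry changed.
SUFFIX_TO_CODEC = {
    "deflate": "DEFLATE",
    "gz": "GZIP",
    "bz2": "BZIP2",
    "snappy": "SNAPPY_BLOCKED",  # was SNAPPY before this fix
    "lzo": "LZO",
    "lz4": "LZ4",
}

def codec_from_filename(filename):
    """Resolve the compression codec from the file suffix; NONE if unknown."""
    suffix = filename.rsplit(".", 1)[-1].lower()
    return SUFFIX_TO_CODEC.get(suffix, "NONE")

print(codec_from_filename("data.snappy"))  # SNAPPY_BLOCKED
```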
[impala] branch master updated (c413f9b -> dbbd403)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from c413f9b IMPALA-10047: Revert core piece of IMPALA-6984 new 87aeb2a IMPALA-9963: Implement ds_kll_n() function new dbbd403 IMPALA-10005: Fix Snappy decompression for non-block filesystems The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/exprs/datasketches-common.h | 2 +- be/src/exprs/datasketches-functions-ir.cc | 11 ++ be/src/exprs/datasketches-functions.h | 5 + common/function-registry/impala_functions.py | 2 + .../org/apache/impala/catalog/HdfsCompression.java | 20 +- .../queries/QueryTest/datasketches-kll.test| 37 tests/query_test/test_compressed_formats.py| 202 + 7 files changed, 191 insertions(+), 88 deletions(-)
[impala] 01/02: IMPALA-9963: Implement ds_kll_n() function
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 87aeb2ad78e2106f1d8df84d4d84975c7cde5b5a Author: Gabor Kaszab AuthorDate: Thu Jul 30 09:41:00 2020 +0200 IMPALA-9963: Implement ds_kll_n() function This function receives a serialized Apache DataSketches KLL sketch and returns how many input values were fed into this sketch. Change-Id: I166e87a468e68e888ac15fca7429ac2552dbb781 Reviewed-on: http://gerrit.cloudera.org:8080/16259 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/exprs/datasketches-common.h | 2 +- be/src/exprs/datasketches-functions-ir.cc | 11 +++ be/src/exprs/datasketches-functions.h | 5 +++ common/function-registry/impala_functions.py | 2 ++ .../queries/QueryTest/datasketches-kll.test| 37 ++ 5 files changed, 56 insertions(+), 1 deletion(-) diff --git a/be/src/exprs/datasketches-common.h b/be/src/exprs/datasketches-common.h index 7560692..37a6458 100644 --- a/be/src/exprs/datasketches-common.h +++ b/be/src/exprs/datasketches-common.h @@ -37,7 +37,7 @@ const int DS_SKETCH_CONFIG = 12; /// Logs a common error message saying that sketch deserialization failed. void LogSketchDeserializationError(FunctionContext* ctx); -/// Receives a serialized DataSketches sketch (either Hll or KLL) in +/// Receives a serialized DataSketches sketch (either Hll or KLL) in /// 'serialized_sketch', deserializes it and puts the deserialized sketch into 'sketch'. /// The outgoing 'sketch' will hold the same configs as 'serialized_sketch' regardless of /// what was provided when it was constructed before this function call. 
Returns false if diff --git a/be/src/exprs/datasketches-functions-ir.cc b/be/src/exprs/datasketches-functions-ir.cc index d2898bc..b76cbe9 100644 --- a/be/src/exprs/datasketches-functions-ir.cc +++ b/be/src/exprs/datasketches-functions-ir.cc @@ -59,5 +59,16 @@ FloatVal DataSketchesFunctions::DsKllQuantile(FunctionContext* ctx, } } +BigIntVal DataSketchesFunctions::DsKllN(FunctionContext* ctx, +const StringVal& serialized_sketch) { + if (serialized_sketch.is_null || serialized_sketch.len == 0) return BigIntVal::null(); + datasketches::kll_sketch<float> sketch; + if (!DeserializeDsSketch(serialized_sketch, &sketch)) { +LogSketchDeserializationError(ctx); +return BigIntVal::null(); + } + return sketch.get_n(); +} + } diff --git a/be/src/exprs/datasketches-functions.h b/be/src/exprs/datasketches-functions.h index 143fd69..bd6b76c 100644 --- a/be/src/exprs/datasketches-functions.h +++ b/be/src/exprs/datasketches-functions.h @@ -42,6 +42,11 @@ public: /// of [0,1]. Otherwise this function returns error. static FloatVal DsKllQuantile(FunctionContext* ctx, const StringVal& serialized_sketch, const DoubleVal& rank); + + /// 'serialized_sketch' is expected as a serialized Apache DataSketches KLL sketch. If + /// it is not, then the query fails. + /// Returns the number of input values fed to 'serialized_sketch'.
+ static BigIntVal DsKllN(FunctionContext* ctx, const StringVal& serialized_sketch); }; } diff --git a/common/function-registry/impala_functions.py b/common/function-registry/impala_functions.py index 8398785..fbed357 100644 --- a/common/function-registry/impala_functions.py +++ b/common/function-registry/impala_functions.py @@ -935,6 +935,8 @@ visible_functions = [ '_ZN6impala21DataSketchesFunctions13DsHllEstimateEPN10impala_udf15FunctionContextERKNS1_9StringValE'], [['ds_kll_quantile'], 'FLOAT', ['STRING', 'DOUBLE'], '_ZN6impala21DataSketchesFunctions13DsKllQuantileEPN10impala_udf15FunctionContextERKNS1_9StringValERKNS1_9DoubleValE'], + [['ds_kll_n'], 'BIGINT', ['STRING'], + '_ZN6impala21DataSketchesFunctions6DsKllNEPN10impala_udf15FunctionContextERKNS1_9StringValE'], ] invisible_functions = [ diff --git a/testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test b/testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test index b7b734b..ee240bf 100644 --- a/testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test +++ b/testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test @@ -144,3 +144,40 @@ FLOAT,FLOAT,FLOAT,FLOAT,FLOAT,FLOAT RESULTS 100.169482422,25000.099609375,50.9152587891,NULL,50.5,NULL + QUERY +# Check that ds_kll_n() returns null for an empty sketch. +select ds_kll_n(ds_kll_sketch(cast(f2 as float))) from functional_parquet.emptytable; + RESULTS +NULL + TYPES +BIGINT + + QUERY +# Check that ds_kll_n() returns null for a null input. +select ds_kll_n(c) from functional_parquet.nulltable; + RESULTS +NULL + TYPES +BIGINT + + QUERY +# Check that ds_kll_n() returns e
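The null-handling contract of `ds_kll_n()` above is easy to state outside C++: NULL in, NULL out; undecodable sketch, NULL out; otherwise the count of values fed into the sketch. A Python sketch of that contract (the `deserialize` callable is a hypothetical stand-in for `DeserializeDsSketch`):

```python
def ds_kll_n(serialized_sketch, deserialize):
    """Mirror ds_kll_n()'s result contract: None (SQL NULL) for a NULL or
    empty input, None on deserialization failure, else the sketch's n."""
    if serialized_sketch is None or len(serialized_sketch) == 0:
        return None
    sketch = deserialize(serialized_sketch)
    if sketch is None:  # deserialization failed (logged in the real code)
        return None
    return sketch.get_n()
```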
[impala] branch master updated: IMPALA-10047: Revert core piece of IMPALA-6984
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new c413f9b IMPALA-10047: Revert core piece of IMPALA-6984 c413f9b is described below commit c413f9b558d51de877f497590baf14139ad5cf99 Author: Joe McDonnell AuthorDate: Tue Aug 4 17:29:19 2020 -0700 IMPALA-10047: Revert core piece of IMPALA-6984 Performance testing on TPC-DS found a performance regression on short queries due to delayed exec status reports. Further testing traced this back to IMPALA-6984's behavior of cancelling backends on EOS. The coordinator log shows the CancelBackends() call intermittently taking 10 seconds due to timing out in the RPC layer. As a temporary workaround, this reverts the core part of IMPALA-6984 that added that CancelBackends() call for EOS. It leaves the rest of IMPALA-6984 intact, as other code has built on top of it. Testing: - Core job - Performance tests Change-Id: Ibf00a56e91f0376eaaa552e3bb4763501bfb49e8 (cherry picked from commit b91f3c0e064d592f3cdf2a2e089ca6546133ba55) Reviewed-on: http://gerrit.cloudera.org:8080/16288 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins --- be/src/runtime/coordinator.cc | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/be/src/runtime/coordinator.cc b/be/src/runtime/coordinator.cc index b57d66f..0ceae83 100644 --- a/be/src/runtime/coordinator.cc +++ b/be/src/runtime/coordinator.cc @@ -714,9 +714,7 @@ void Coordinator::HandleExecStateTransition( // execution and release resources. ReleaseExecResources(); if (new_state == ExecState::RETURNED_RESULTS) { -// Cancel all backends, but wait for the final status reports to be received so that -// we have a complete profile for this successful query. -CancelBackends(/*fire_and_forget=*/ false); +// TODO: IMPALA-6984: cancel all backends in this case too.
WaitForBackends(); } else { CancelBackends(/*fire_and_forget=*/ true);
[impala] 02/02: IMPALA-9923: Load ORC serially to hack around flakiness
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit dc2fdabbd1f2c930348671e17f885c5c54b628e4 Author: Joe McDonnell AuthorDate: Tue Aug 4 22:08:22 2020 -0700 IMPALA-9923: Load ORC serially to hack around flakiness ORC dataload has been intermittently failing with "Fail to get checksum, since file .../_orc_acid_version is under construction." This is due to some Hive/HDFS interaction that seems to get worse with parallelism. This has been hitting a lot of developer tests. As a temporary workaround, this changes dataload to load ORC serially. This is slightly slower, but it should be more reliable. Testing: - Ran precommit tests, manually verified dataload logs Change-Id: I15eff1ec6cab32c1216ed7400e4c4b57bb81e4cd Reviewed-on: http://gerrit.cloudera.org:8080/16292 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- bin/load-data.py | 11 +++ 1 file changed, 11 insertions(+) diff --git a/bin/load-data.py b/bin/load-data.py index b461d7a..a7eb883 100755 --- a/bin/load-data.py +++ b/bin/load-data.py @@ -415,6 +415,7 @@ def main(): impala_create_files = [] hive_load_text_files = [] +hive_load_orc_files = [] hive_load_nontext_files = [] hbase_create_files = [] hbase_postload_files = [] @@ -426,6 +427,8 @@ def main(): elif hive_load_match in filename: if 'text-none-none' in filename: hive_load_text_files.append(filename) +elif 'orc-def-block' in filename: + hive_load_orc_files.append(filename) else: hive_load_nontext_files.append(filename) elif hbase_create_match in filename: @@ -448,6 +451,7 @@ def main(): log_file_list("Impala Create Files:", impala_create_files) log_file_list("Hive Load Text Files:", hive_load_text_files) +log_file_list("Hive Load Orc Files:", hive_load_orc_files) log_file_list("Hive Load Non-Text Files:", hive_load_nontext_files) log_file_list("HBase Create Files:", hbase_create_files) log_file_list("HBase Post-Load 
Files:", hbase_postload_files) @@ -472,6 +476,13 @@ def main(): # need to be loaded first assert(len(hive_load_text_files) <= 1) hive_exec_query_files_parallel(thread_pool, hive_load_text_files) +# IMPALA-9923: Run ORC serially separately from other non-text formats. This hacks +# around flakiness seen when loading this in parallel. This should be removed as +# soon as possible. +assert(len(hive_load_orc_files) <= 1) +hive_exec_query_files_parallel(thread_pool, hive_load_orc_files) + +# Load all non-text formats (goes parallel) hive_exec_query_files_parallel(thread_pool, hive_load_nontext_files) assert(len(hbase_postload_files) <= 1)
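The change to `load-data.py` above is a three-way classification of generated load files, with the ORC bucket run serially as the IMPALA-9923 workaround while the remaining non-text formats stay parallel. A minimal sketch of the classification (substring markers copied from the diff; file names in the test are hypothetical examples):

```python
def classify_load_file(filename):
    """Bucket a generated Hive load file the way load-data.py now does:
    'text' loads first, 'orc' loads serially, everything else in parallel."""
    if "text-none-none" in filename:
        return "text"
    elif "orc-def-block" in filename:
        return "orc"
    else:
        return "nontext"

print(classify_load_file("load-functional-orc-def-block.sql"))  # orc
```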
[impala] 01/02: IMPALA-10037: Remove flaky test_mt_dop_scan_node
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit f38ca7df8cf027fcaab4713a6b186b584cef Author: Bikramjeet Vig AuthorDate: Tue Aug 4 17:14:37 2020 -0700 IMPALA-10037: Remove flaky test_mt_dop_scan_node This test has inherent flakiness due to it relying on instances fetching scan ranges from a shared queue. Therefore, this patch removes the test since it was just a sanity check but its flakiness outweighed its usefulness. Change-Id: I1625872189ea7ac2d4e4d035956f784b6e18eb08 Reviewed-on: http://gerrit.cloudera.org:8080/16286 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- tests/query_test/test_mt_dop.py | 43 + 1 file changed, 1 insertion(+), 42 deletions(-) diff --git a/tests/query_test/test_mt_dop.py b/tests/query_test/test_mt_dop.py index 8af3fa8..4f5b50d 100644 --- a/tests/query_test/test_mt_dop.py +++ b/tests/query_test/test_mt_dop.py @@ -37,6 +37,7 @@ WAIT_TIME_MS = build_flavor_timeout(6, slow_build_timeout=10) # the value 0 to cover the non-MT path as well. MT_DOP_VALUES = [0, 1, 2, 8] + class TestMtDop(ImpalaTestSuite): @classmethod def add_test_dimensions(cls): @@ -97,48 +98,6 @@ class TestMtDop(ImpalaTestSuite): assert expected_results in results.data -class TestMtDopScanNode(ImpalaTestSuite): - @classmethod - def get_workload(self): -return 'functional-query' - - @classmethod - def add_test_dimensions(cls): -super(TestMtDopScanNode, cls).add_test_dimensions() -cls.ImpalaTestMatrix.add_constraint( - lambda v: v.get_value('table_format').file_format == 'text' and v.get_value( -'table_format').compression_codec == 'none') - - def test_mt_dop_scan_node(self, vector, unique_database): -"""Regression test to make sure scan ranges are shared among all scan node instances -when using mt_dop. This runs a selective hash join that will dynamically prune -partitions leaving less than 5% of the data. 
Before IMPALA-9655 this would almost -always result in a failure where at least one instance would have all its statically -assigned scan ranges pruned.""" -fq_table_name = "%s.store_sales_subset" % unique_database -self.execute_query("create table %s as select distinct(ss_sold_date_sk) as " - "sold_date from tpcds.store_sales limit 50" % fq_table_name) -vector.get_value('exec_option')['mt_dop'] = 8 -vector.get_value('exec_option')['runtime_filter_wait_time_ms'] = 10 - -# Since this depends on instances fetching scan ranges from a shared queue, running -# it multiple times ensures any flakiness is removed. On a release build it has a -# 0.05% failure rate. -NUM_TRIES = 100 -failed_count = 0 -for i in xrange(NUM_TRIES): - try: -result = self.execute_query( - "select count(ss_sold_date_sk) from tpcds.store_sales, %s where " - "ss_sold_date_sk = sold_date" % fq_table_name, - vector.get_value('exec_option')) -assert "- BytesRead: 0" not in result.runtime_profile, result.runtime_profile -break - except Exception: -failed_count += 1 -if i == NUM_TRIES - 1: raise -LOG.info("Num of times failed before success {0}".format(failed_count)) - class TestMtDopParquet(ImpalaTestSuite): @classmethod def get_workload(cls):
[impala] branch master updated (cc1eddb -> dc2fdab)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from cc1eddb Add logging when query unregisters new f38ca7d IMPALA-10037: Remove flaky test_mt_dop_scan_node new dc2fdab IMPALA-9923: Load ORC serially to hack around flakiness The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: bin/load-data.py| 11 +++ tests/query_test/test_mt_dop.py | 43 + 2 files changed, 12 insertions(+), 42 deletions(-)
[impala] branch master updated: Add logging when query unregisters
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new cc1eddb Add logging when query unregisters cc1eddb is described below commit cc1eddbe193daf228dee1d53bb1e4ccd064d90a5 Author: Bikramjeet Vig AuthorDate: Tue Aug 4 17:10:02 2020 -0700 Add logging when query unregisters This adds a log line which is printed when a query is successfully unregistered by the async unregister thread pool. Added only for additional observability. Change-Id: I09be63afbee6b338a952a9b12321e028be9d7cb0 Reviewed-on: http://gerrit.cloudera.org:8080/16285 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/service/impala-server.cc | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/be/src/service/impala-server.cc b/be/src/service/impala-server.cc index f1017ea..d4f05c8 100644 --- a/be/src/service/impala-server.cc +++ b/be/src/service/impala-server.cc @@ -1212,7 +1212,12 @@ void ImpalaServer::FinishUnregisterQuery(const QueryHandle& query_handle) { Status status = query_handle.query_driver()->Unregister(&query_driver_map_); string err_msg = "QueryDriver can only be deleted once: " + status.GetDetail(); DCHECK(status.ok()) << err_msg; - if (UNLIKELY(!status.ok())) LOG(ERROR) << status.GetDetail(); + if (UNLIKELY(!status.ok())) { +LOG(ERROR) << status.GetDetail(); + } else { +VLOG_QUERY << "Query successfully unregistered: query_id=" + << PrintId(query_handle->query_id()); + } } void ImpalaServer::UnregisterQueryDiscardResult(
[impala] branch master updated: IMPALA-9633: Implement ds_hll_union()
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 9c542ef IMPALA-9633: Implement ds_hll_union() 9c542ef is described below commit 9c542ef5891f984300f9e5f45406caf145039e75 Author: Gabor Kaszab AuthorDate: Fri Jun 5 10:53:11 2020 +0200 IMPALA-9633: Implement ds_hll_union() This function receives a set of sketches produced by ds_hll_sketch() and merges them into a single sketch. An example usage is to create a sketch for each partition of a table and write these sketches to a separate table; then, depending on which partitions the user is interested in, the relevant sketches can be unioned together to get an estimate. E.g.: SELECT ds_hll_estimate(ds_hll_union(sketch_col)) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Note: there is currently a known limitation when unioning string types where some input sketches come from Impala and some from Hive. In this case, if there is an overlap in the input data used by Impala and by Hive, the overlapping data is still counted twice due to a string representation difference between Impala and Hive. For more details see: https://issues.apache.org/jira/browse/IMPALA-9939 Testing: - Apart from the automated tests I added to this patch I also tested ds_hll_union() on a bigger dataset to check that the serialization, deserialization and merging steps work well. I took TPCH25.lineitem, created a number of sketches grouped by l_shipdate and called ds_hll_union() on those sketches.
Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Reviewed-on: http://gerrit.cloudera.org:8080/16095 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/exprs/CMakeLists.txt| 1 + be/src/exprs/aggregate-functions-ir.cc | 100 + be/src/exprs/aggregate-functions.h | 8 +- ...ches-functions-ir.cc => datasketches-common.cc} | 28 +++--- be/src/exprs/datasketches-common.h | 49 ++ be/src/exprs/datasketches-functions-ir.cc | 17 ++-- .../java/org/apache/impala/catalog/BuiltinsDb.java | 14 +++ testdata/data/README | 4 + testdata/data/hll_sketches_from_impala.parquet | Bin 0 -> 3501 bytes .../queries/QueryTest/datasketches-hll.test| 60 + tests/query_test/test_datasketches.py | 1 + 11 files changed, 241 insertions(+), 41 deletions(-) diff --git a/be/src/exprs/CMakeLists.txt b/be/src/exprs/CMakeLists.txt index e0ed683..7af6145 100644 --- a/be/src/exprs/CMakeLists.txt +++ b/be/src/exprs/CMakeLists.txt @@ -36,6 +36,7 @@ add_library(Exprs compound-predicates-ir.cc conditional-functions.cc conditional-functions-ir.cc + datasketches-common.cc datasketches-functions-ir.cc date-functions-ir.cc decimal-functions-ir.cc diff --git a/be/src/exprs/aggregate-functions-ir.cc b/be/src/exprs/aggregate-functions-ir.cc index 06395f40..5b87d0b 100644 --- a/be/src/exprs/aggregate-functions-ir.cc +++ b/be/src/exprs/aggregate-functions-ir.cc @@ -29,6 +29,7 @@ #include "codegen/impala-ir.h" #include "common/logging.h" #include "exprs/anyval-util.h" +#include "exprs/datasketches-common.h" #include "exprs/hll-bias.h" #include "gutil/strings/substitute.h" #include "runtime/date-value.h" @@ -1611,23 +1612,18 @@ BigIntVal AggregateFunctions::HllFinalize(FunctionContext* ctx, const StringVal& return estimate; } -/// Config for DataSketches HLL algorithm to set the size of each entry within the -/// sketch. +/// Auxiliary function that receives an input type that has a serialize_compact() +/// function (e.g. 
hll_sketch or hll_union) and returns the serialized version of it +/// wrapped into a StringVal. /// Introducing this variable in the .cc to avoid including the whole DataSketches HLL /// functionality into the header. -const datasketches::target_hll_type DS_HLL_TYPE = datasketches::target_hll_type::HLL_4; - -/// Auxiliary function that receives a hll_sketch and returns the serialized version of -/// it wrapped into a StringVal. -/// Introducing this function in the .cc to avoid including the whole DataSketches HLL -/// functionality into the header. -StringVal SerializeDsHllSketch(FunctionContext* ctx, -const datasketches::hll_sketch& sketch) { - std::stringstream serialized_sketch; - sketch.serialize_compact(serialized_sketch); - std::string serialized_sketch_str = serialized_sketch.str(); - StringVal dst(ctx, serialized_sketch_str.size()); - memcpy(dst.ptr, serialized_sketch_str.c_str(), serialized_
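The per-partition union pattern the commit message describes can be illustrated with a toy model: Python sets stand in for HLL sketches, set union plays the role of `ds_hll_union()`, and `len()` plays the role of `ds_hll_estimate()`. This is purely illustrative; a real deployment would store serialized DataSketches HLL sketches:

```python
def union_sketches(sketches):
    """Merge a list of per-partition 'sketches' (sets here) into one."""
    merged = set()
    for s in sketches:
        merged |= s
    return merged

# One toy "sketch" per partition, as in the sketch_tbl example above.
partition_sketches = {1: {"a", "b"}, 3: {"b", "c"}, 5: {"c", "d"}}
# WHERE partition_col=1 OR partition_col=5
selected = [partition_sketches[p] for p in (1, 5)]
print(len(union_sketches(selected)))  # 4
```

Unlike exact sets, HLL sketches give an approximate distinct count, but the merge-then-estimate shape of the query is the same.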
[impala] branch master updated: IMPALA-9887: Add support for sharding end-to-end tests
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 605e301 IMPALA-9887: Add support for sharding end-to-end tests 605e301 is described below commit 605e301739b8ef7619482db9b13444e84145b219 Author: Joe McDonnell AuthorDate: Wed Jun 24 12:27:04 2020 -0700 IMPALA-9887: Add support for sharding end-to-end tests ASAN maintains stacks for each allocation and free of memory. Impala sometimes allocates/frees memory from codegen'd code, so this means that the number of distinct stacks is unbounded. ASAN is storing these stacks in a hash table with a fixed number of buckets (one million). As the stacks accumulate, allocations and frees get slower and slower, because the lookup in this hashtable gets slower. This causes test execution time to degrade over time. Since backend tests and custom cluster tests don't have long running daemons, only the end to end tests are affected. This adds support for breaking end-to-end test execution into shards, restarting Impala between each shard. This uses the preexisting shard_tests pytest functionality introduced for the docker-based tests in IMPALA-6070. The number of shards is configurable via the EE_TEST_SHARDS environment variable. By default, EE_TEST_SHARDS=1 and no sharding is used. Without sharding, an ASAN core job takes about 16-17 hours. With 6 shards, it takes about 9 hours. It is recommended to always use sharding with ASAN. 
Testing: - Ran core job - Ran ASAN with EE_TEST_SHARDS=6 Change-Id: I0bdbd79940df2bc7b951efdf0f044e6b40a3fda9 Reviewed-on: http://gerrit.cloudera.org:8080/16155 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- bin/run-all-tests.sh | 43 +-- tests/run-tests.py | 31 +++ 2 files changed, 64 insertions(+), 10 deletions(-) diff --git a/bin/run-all-tests.sh b/bin/run-all-tests.sh index 3a1f8b8..5287861 100755 --- a/bin/run-all-tests.sh +++ b/bin/run-all-tests.sh @@ -46,6 +46,7 @@ fi # Run End-to-end Tests : ${EE_TEST:=true} : ${EE_TEST_FILES:=} +: ${EE_TEST_SHARDS:=1} # Run JDBC Test : ${JDBC_TEST:=true} # Run Cluster Tests @@ -158,6 +159,8 @@ LOG_DIR="${IMPALA_EE_TEST_LOGS_DIR}" # Enable core dumps ulimit -c unlimited || true +TEST_RET_CODE=0 + # Helper function to start Impala cluster. start_impala_cluster() { # TODO: IMPALA-9812: remove --unlock_mt_dop when it is no longer needed. @@ -167,6 +170,21 @@ start_impala_cluster() { ${TEST_START_CLUSTER_ARGS} --impalad_args=--unlock_mt_dop=true } +run_ee_tests() { + if [[ $# -gt 0 ]]; then +EXTRA_ARGS=${1} + else +EXTRA_ARGS="" + fi + # Run end-to-end tests. + # KERBEROS TODO - this will need to deal with ${KERB_ARGS} + if ! "${IMPALA_HOME}/tests/run-tests.py" ${COMMON_PYTEST_ARGS} \ + ${RUN_TESTS_ARGS} ${EXTRA_ARGS} ${EE_TEST_FILES}; then +#${KERB_ARGS}; +TEST_RET_CODE=1 + fi +} + for i in $(seq 1 $NUM_TEST_ITERATIONS) do TEST_RET_CODE=0 @@ -231,12 +249,25 @@ do fi if [[ "$EE_TEST" == true ]]; then -# Run end-to-end tests. -# KERBEROS TODO - this will need to deal with ${KERB_ARGS} -if ! "${IMPALA_HOME}/tests/run-tests.py" ${COMMON_PYTEST_ARGS} \ -${RUN_TESTS_ARGS} ${EE_TEST_FILES}; then - #${KERB_ARGS}; - TEST_RET_CODE=1 +if [[ ${EE_TEST_SHARDS} -lt 2 ]]; then + # For runs without sharding, avoid adding the "--shard_tests" parameter. + # Some test frameworks (e.g. the docker-based tests) use this. 
+ run_ee_tests +else + # When the EE tests are sharded, it runs 1/Nth of the tests at a time, restarting + # Impala between the shards. There are two benefits: + # 1. It isolates errors so that if Impala crashes, the next shards will still run + #with a fresh Impala. + # 2. For ASAN runs, resources accumulate over test execution, so tests get slower + #over time (see IMPALA-9887). Running shards with regular restarts + #substantially speeds up execution time. + # + # Shards are 1 indexed (i.e. 1/N through N/N). This shards both serial and + # parallel tests. + for (( shard_idx=1 ; shard_idx <= ${EE_TEST_SHARDS} ; shard_idx++ )); do +run_ee_tests "--shard_tests=$shard_idx/${EE_TEST_SHARDS}" +start_impala_cluster + done fi fi diff --git a/tests/run-tests.py b/tests/run-tests.py index 55b002a..8f1e8d3 100755 --- a/tests/run-tests.py +++ b/tests/run-tests.py @@ -282,22 +282,44 @@ if __name__ == "__main__": run(sys.argv[1:]) else: print_metrics('connections') + +# If using sha
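The sharding contract described above is that shards are 1-indexed (1/N through N/N), every test lands in exactly one shard, and the shards together cover the whole suite. A round-robin sketch of that contract (the actual `--shard_tests` assignment in run-tests.py may partition differently, e.g. by hash, but must satisfy the same invariants):

```python
def shard(tests, shard_idx, num_shards):
    """Return the 1-indexed shard_idx/num_shards slice of the test list."""
    assert 1 <= shard_idx <= num_shards
    return [t for i, t in enumerate(tests) if i % num_shards == shard_idx - 1]

tests = ["test_%d" % i for i in range(10)]
all_shards = [shard(tests, i, 3) for i in (1, 2, 3)]
# Union of all shards covers the full suite exactly once.
assert sorted(sum(all_shards, [])) == sorted(tests)
```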
[impala] 01/02: IMPALA-9531: Dropped support for dateless timestamps
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 1bafb7bd29f4ecf1706d35e274c2b701a32281ac Author: Adam Tamas AuthorDate: Tue Apr 7 10:07:47 2020 +0200 IMPALA-9531: Dropped support for dateless timestamps Removed the support for dateless timestamps. During timestamp casts, if the format doesn't contain a date part, we get an error during tokenization of the format. If the input string doesn't contain a date part, we get a NULL result. Examples: select cast('01:02:59' as timestamp); This will come back as a NULL value. select to_timestamp('01:01:01', 'HH:mm:ss'); select cast('01:02:59' as timestamp format 'HH12:MI:SS'); select cast('12 AM' as timestamp FORMAT 'AM.HH12'); These will come back with parsing errors. Casting from a table will generate similar results. Testing: Modified the previous tests related to dateless timestamps. Added tests to read from tables which still contain dateless timestamps, and covered the timestamp-to-string path when no date tokens are requested in the output string.
Change-Id: I48c49bf027cc4b917849b3d58518facba372b322 Reviewed-on: http://gerrit.cloudera.org:8080/15866 Tested-by: Impala Public Jenkins Reviewed-by: Gabor Kaszab --- be/src/benchmarks/convert-timestamp-benchmark.cc | 2 +- be/src/benchmarks/parse-timestamp-benchmark.cc | 4 +- be/src/exec/text-converter.inline.h| 2 +- be/src/exprs/cast-functions-ir.cc | 6 - be/src/exprs/expr-test.cc | 35 +- be/src/exprs/scalar-expr-evaluator.cc | 2 +- be/src/exprs/timestamp-functions-ir.cc | 12 +- be/src/exprs/timestamp-functions.cc| 18 ++- be/src/exprs/timestamp-functions.h | 6 +- be/src/runtime/date-parse-util.cc | 2 +- be/src/runtime/date-test.cc| 6 +- .../runtime/datetime-iso-sql-format-tokenizer.cc | 3 + be/src/runtime/datetime-parser-common.cc | 3 + be/src/runtime/datetime-parser-common.h| 1 + .../runtime/datetime-simple-date-format-parser.cc | 136 - .../runtime/datetime-simple-date-format-parser.h | 10 +- be/src/runtime/timestamp-parse-util.cc | 6 +- be/src/runtime/timestamp-test.cc | 67 +- be/src/runtime/timestamp-value.h | 3 +- bin/rat_exclude_files.txt | 1 + common/function-registry/impala_functions.py | 10 +- .../apache/impala/analysis/AnalyzeKuduDDLTest.java | 3 +- testdata/data/README | 12 ++ testdata/data/dateless_timestamps.parq | Bin 0 -> 435 bytes testdata/data/dateless_timestamps.txt | 7 ++ testdata/data/lazy_timestamp.csv | 7 -- .../functional-query/queries/QueryTest/date.test | 7 -- .../QueryTest/dateless_timestamp_parquet.test | 25 .../queries/QueryTest/dateless_timestamp_text.test | 29 + .../functional-query/queries/QueryTest/exprs.test | 20 ++- .../queries/QueryTest/select-lazy-timestamp.test | 7 -- tests/data_errors/test_data_errors.py | 4 +- tests/query_test/test_cast_with_format.py | 23 +++- tests/query_test/test_scanners.py | 24 34 files changed, 275 insertions(+), 228 deletions(-) diff --git a/be/src/benchmarks/convert-timestamp-benchmark.cc b/be/src/benchmarks/convert-timestamp-benchmark.cc index a1d9331..c263fcf 100644 --- 
a/be/src/benchmarks/convert-timestamp-benchmark.cc +++ b/be/src/benchmarks/convert-timestamp-benchmark.cc @@ -166,7 +166,7 @@ fast path speedup: 10.2951 vector AddTestDataDateTimes(int n, const string& startstr) { DateTimeFormatContext dt_ctx; dt_ctx.Reset("-MMM-dd HH:mm:ss"); - SimpleDateFormatTokenizer::Tokenize(_ctx); + SimpleDateFormatTokenizer::Tokenize(_ctx, PARSE); random_device rd; mt19937 gen(rd()); diff --git a/be/src/benchmarks/parse-timestamp-benchmark.cc b/be/src/benchmarks/parse-timestamp-benchmark.cc index c7ca51a..8d42311 100644 --- a/be/src/benchmarks/parse-timestamp-benchmark.cc +++ b/be/src/benchmarks/parse-timestamp-benchmark.cc @@ -258,9 +258,9 @@ int main(int argc, char **argv) { timestamp_suite.AddBenchmark("Impala", TestImpalaSimpleDateFormat, ); dt_ctx_simple_date_format.Reset("-MM-dd HH:mm:ss", 19); - SimpleDateFormatTokenizer::Tokenize(_ctx_simple_date_format); + SimpleDateFormatTokenizer::Tokenize(_ctx_simple_date_format, PARSE); dt_ctx_tz_simple_date_format.Reset("-MM-dd HH:mm:ss+hh:mm", 25); - SimpleDateFormatTokenizer::Tokenize(_ctx_tz_simple_date_format); + SimpleDateFormatTokeni
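The new behavior described above — erroring out at tokenization time when a format string carries no date part at all — can be sketched as follows. This is a minimal illustration of the rule, not Impala's real tokenizer: the helper name and the character scan are assumptions, since `SimpleDateFormatTokenizer` actually works on parsed tokens rather than raw characters.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the tokenizer's new check: a simple-date-format
// string must contain at least one date token (year 'y', month 'M', day 'd')
// to be accepted for parsing. "HH:mm:ss" has only time tokens, so it is
// rejected, which is what makes the casts in the commit message fail.
bool FormatHasDatePart(const std::string& fmt) {
  for (char c : fmt) {
    if (c == 'y' || c == 'M' || c == 'd') return true;
  }
  return false;
}
```

With this rule, `to_timestamp('01:01:01', 'HH:mm:ss')` fails at format tokenization, while a plain `cast('01:02:59' as timestamp)` (no explicit format, no date part in the input) yields NULL instead.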
[impala] 02/02: IMPALA-7923: DecimalValue should be marked as packed
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 45c105d71d47c4c57e042b9bf8a0d8d8044083bc Author: Daniel Becker AuthorDate: Thu Jul 2 00:06:09 2020 +0200 IMPALA-7923: DecimalValue should be marked as packed IMPALA-7473 and IMPALA-9111 were symptoms of a more general problem that DecimalValue is not guaranteed to be aligned by the Impala runtime but the compiler assumes it is and under some circumstances, it will emit code for aligned loads to value_ when value_ is an int128. This commit marks DecimalValue as packed so that the compiler does not assume any alignment. TODO: Maybe benchmark if this introduces performance regressions, but it shouldn't. Change-Id: I55f936a4f4f4b5faf129a9265222e64fc486b8ed Reviewed-on: http://gerrit.cloudera.org:8080/16134 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/exec/hdfs-avro-scanner-test.cc | 17 +++-- be/src/exec/orc-column-readers.h | 2 +- be/src/runtime/decimal-value.h| 23 +-- be/src/runtime/decimal-value.inline.h | 5 + be/src/util/decimal-util.h| 2 +- be/src/util/dict-test.cc | 7 --- 6 files changed, 35 insertions(+), 21 deletions(-) diff --git a/be/src/exec/hdfs-avro-scanner-test.cc b/be/src/exec/hdfs-avro-scanner-test.cc index 621247b..e7086c8 100644 --- a/be/src/exec/hdfs-avro-scanner-test.cc +++ b/be/src/exec/hdfs-avro-scanner-test.cc @@ -461,7 +461,8 @@ TEST_F(HdfsAvroScannerTest, DecimalTest) { // Unscaled value can be stored in 4 bytes data[0] = 8; // decodes to 4 #if __BYTE_ORDER == __LITTLE_ENDIAN - BitUtil::ByteSwap([1], (), 4); + const Decimal4Value::StorageType d4v_value = d4v.value(); + BitUtil::ByteSwap([1], _value, 4); #else memcpy([1], (), 4); #endif @@ -482,7 +483,8 @@ TEST_F(HdfsAvroScannerTest, DecimalTest) { d8v = Decimal8Value(123456789012345678); data[0] = 16; // decodes to 8 #if __BYTE_ORDER == __LITTLE_ENDIAN - BitUtil::ByteSwap([1], (), 
8); + const Decimal8Value::StorageType d8v_value = d8v.value(); + BitUtil::ByteSwap([1], _value, 8); #else memcpy([1], (), 8); #endif @@ -495,7 +497,8 @@ TEST_F(HdfsAvroScannerTest, DecimalTest) { Decimal16Value d16v(1234567890); data[0] = 10; // decodes to 5 #if __BYTE_ORDER == __LITTLE_ENDIAN - BitUtil::ByteSwap([1], (), 5); + const Decimal16Value::StorageType d16v_value = d16v.value(); + BitUtil::ByteSwap([1], _value, 5); #else memcpy([1], (), 5); #endif @@ -506,12 +509,14 @@ TEST_F(HdfsAvroScannerTest, DecimalTest) { TestReadAvroDecimal(data, 4, d16v, -1, TErrorCode::AVRO_TRUNCATED_BLOCK); /// Produce a very large decimal value. - memset((), 0xFF, sizeof(d16v.value())); + Decimal16Value::StorageType d16v_value2; + memset(_value2, 0xFF, sizeof(d16v_value2)); + d16v.set_value(d16v_value2); data[0] = 32; // decodes to 16 #if __BYTE_ORDER == __LITTLE_ENDIAN - BitUtil::ByteSwap([1], (), 16); + BitUtil::ByteSwap([1], _value2, 16); #else - memcpy([1], (), 16); + memcpy([1], _value2, 16); #endif TestReadAvroDecimal(data, 17, d16v, 17); TestReadAvroDecimal(data, 20, d16v, 17); diff --git a/be/src/exec/orc-column-readers.h b/be/src/exec/orc-column-readers.h index e45b0a2..e50c216 100644 --- a/be/src/exec/orc-column-readers.h +++ b/be/src/exec/orc-column-readers.h @@ -420,7 +420,7 @@ class OrcDecimalColumnReader return Status::OK(); } int64_t val = batch_->values.data()[row_idx]; -reinterpret_cast(OrcColumnReader::GetSlot(tuple))->value() = val; + reinterpret_cast(OrcColumnReader::GetSlot(tuple))->set_value(val); return Status::OK(); } diff --git a/be/src/runtime/decimal-value.h b/be/src/runtime/decimal-value.h index e329476..c2744e0 100644 --- a/be/src/runtime/decimal-value.h +++ b/be/src/runtime/decimal-value.h @@ -40,8 +40,13 @@ namespace impala { /// Overflow is handled by an output return parameter. Functions should set this /// to true if overflow occured and leave it *unchanged* otherwise (e.g. |= rather than =). 
/// This allows the caller to not have to check overflow after every call. +/// +/// Values of this class may be unaligned so we mark it as "packed" so that the compiler +/// does not assume proper alignment. If the compiler assumes that the value is aligned it +/// may generate aligned load instructions (for example 'vmovdqa') which fail in case the +/// value is actually misaligned. template -class DecimalValue { +class __attribute__ ((packed)) DecimalValue { public: typedef T StorageType; @@ -49,8 +54,7 @@ class DecimalValue { DecimalValue(const T& s) : value_(s) { } DecimalValue& operator=(const T& s) { -// 'value_' may be unaligned. Use memcpy to avoid an unaligned store. -memcpy(_, , sizeof(T)
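The effect of the `packed` attribute can be shown in isolation. The sketch below uses a stand-in struct (`MiniDecimal`, with `int64_t` instead of the problematic `int128` for portability); the point is the same: once the struct is marked packed, the compiler assumes no alignment and emits unaligned loads/stores, so placing the value at an odd offset — as can happen inside Impala tuples — is safe.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for DecimalValue<T>: packed so the compiler never assumes the
// natural alignment of T. Without the attribute, a 16-byte T could be
// accessed with an aligned instruction (e.g. vmovdqa) and crash when the
// object is actually misaligned.
template <typename T>
struct __attribute__((packed)) MiniDecimal {
  T value_;
};

// Store and load a value at a deliberately misaligned address (offset 1 into
// a 16-byte-aligned buffer). Safe only because MiniDecimal is packed.
int64_t RoundTripMisaligned(int64_t v) {
  alignas(16) unsigned char buf[32] = {};
  auto* d = reinterpret_cast<MiniDecimal<int64_t>*>(buf + 1);
  d->value_ = v;        // compiler emits an unaligned store
  return d->value_;     // and an unaligned load
}
```

This is also why the pre-existing `operator=` used `memcpy` — a manual workaround that the packed attribute now makes unnecessary at every access site.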
[impala] branch master updated (3b820d7 -> 45c105d)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from 3b820d7 IMPALA-9921: Change error messages in checking needsQuotes to TRACE level logs new 1bafb7b IMPALA-9531: Dropped support for dateless timestamps new 45c105d IMPALA-7923: DecimalValue should be marked as packed The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/benchmarks/convert-timestamp-benchmark.cc | 2 +- be/src/benchmarks/parse-timestamp-benchmark.cc | 4 +- be/src/exec/hdfs-avro-scanner-test.cc | 17 ++- be/src/exec/orc-column-readers.h | 2 +- be/src/exec/text-converter.inline.h| 2 +- be/src/exprs/cast-functions-ir.cc | 6 - be/src/exprs/expr-test.cc | 35 +- be/src/exprs/scalar-expr-evaluator.cc | 2 +- be/src/exprs/timestamp-functions-ir.cc | 12 +- be/src/exprs/timestamp-functions.cc| 18 ++- be/src/exprs/timestamp-functions.h | 6 +- be/src/runtime/date-parse-util.cc | 2 +- be/src/runtime/date-test.cc| 6 +- .../runtime/datetime-iso-sql-format-tokenizer.cc | 3 + be/src/runtime/datetime-parser-common.cc | 3 + be/src/runtime/datetime-parser-common.h| 1 + .../runtime/datetime-simple-date-format-parser.cc | 136 - .../runtime/datetime-simple-date-format-parser.h | 10 +- be/src/runtime/decimal-value.h | 23 +++- be/src/runtime/decimal-value.inline.h | 5 +- be/src/runtime/timestamp-parse-util.cc | 6 +- be/src/runtime/timestamp-test.cc | 67 +- be/src/runtime/timestamp-value.h | 3 +- be/src/util/decimal-util.h | 2 +- be/src/util/dict-test.cc | 7 +- bin/rat_exclude_files.txt | 1 + common/function-registry/impala_functions.py | 10 +- .../apache/impala/analysis/AnalyzeKuduDDLTest.java | 3 +- testdata/data/README | 12 ++ testdata/data/dateless_timestamps.parq | Bin 0 -> 435 bytes 
testdata/data/dateless_timestamps.txt | 7 ++ testdata/data/lazy_timestamp.csv | 7 -- .../functional-query/queries/QueryTest/date.test | 7 -- .../QueryTest/dateless_timestamp_parquet.test | 25 .../queries/QueryTest/dateless_timestamp_text.test | 29 + .../functional-query/queries/QueryTest/exprs.test | 20 ++- .../queries/QueryTest/select-lazy-timestamp.test | 7 -- tests/data_errors/test_data_errors.py | 4 +- tests/query_test/test_cast_with_format.py | 23 +++- tests/query_test/test_scanners.py | 24 40 files changed, 310 insertions(+), 249 deletions(-) create mode 100644 testdata/data/dateless_timestamps.parq create mode 100644 testdata/data/dateless_timestamps.txt create mode 100644 testdata/workloads/functional-query/queries/QueryTest/dateless_timestamp_parquet.test create mode 100644 testdata/workloads/functional-query/queries/QueryTest/dateless_timestamp_text.test
[impala] branch master updated: IMPALA-9515: Full ACID Milestone 3: Read support for "original files"
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 930264a IMPALA-9515: Full ACID Milestone 3: Read support for "original files" 930264a is described below commit 930264afbdc6d309a30e2c7e1eef9fd7129ef29b Author: Zoltan Borok-Nagy AuthorDate: Tue May 19 11:47:08 2020 +0200 IMPALA-9515: Full ACID Milestone 3: Read support for "original files" "Original files" are files that don't have the full ACID schema. We can see such files if we upgrade a non-ACID table to full ACID. Also, the LOAD DATA statement can load non-ACID files into full ACID tables. Such files don't store the special ACID columns, which means we need to auto-generate their values. These are (operation, originalTransaction, bucket, rowid, and currentTransaction). With the exception of 'rowid', all of them can be calculated based on the file path, so I add their values to the scanner's template tuple. 'rowid' is the ordinal number of the row inside a bucket inside a directory. For now Impala only allows one file per bucket per directory. Therefore we can generate row ids for each file independently. Multiple files in a single bucket in a directory can only be present if the table was non-transactional earlier and we upgraded it to a full ACID table. After the first compaction we should only see one original file per bucket per directory. In HdfsOrcScanner we calculate the first row id for our split, then the OrcStructReader fills the rowid slot with the proper values.
Testing: * added e2e tests to check if the generated values are correct * added e2e test to reject tables that have multiple files per bucket * added unit tests to the new auxiliary functions Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953 Reviewed-on: http://gerrit.cloudera.org:8080/16001 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/exec/acid-metadata-utils-test.cc| 29 +++ be/src/exec/acid-metadata-utils.cc | 95 ++-- be/src/exec/acid-metadata-utils.h | 7 +- be/src/exec/hdfs-orc-scanner.cc| 75 ++- be/src/exec/hdfs-orc-scanner.h | 12 + be/src/exec/orc-column-readers.cc | 37 ++- be/src/exec/orc-column-readers.h | 9 + be/src/exec/orc-metadata-utils.cc | 161 ++ be/src/exec/orc-metadata-utils.h | 41 testdata/data/README | 5 + testdata/data/alltypes_non_acid.orc| Bin 0 -> 34176 bytes .../functional/functional_schema_template.sql | 25 +++ .../datasets/functional/schema_constraints.csv | 1 + .../queries/QueryTest/acid-negative.test | 20 ++ .../queries/QueryTest/full-acid-original-file.test | 247 + tests/query_test/test_acid.py | 23 ++ 16 files changed, 711 insertions(+), 76 deletions(-) diff --git a/be/src/exec/acid-metadata-utils-test.cc b/be/src/exec/acid-metadata-utils-test.cc index 7db3e57..e2c4266 100644 --- a/be/src/exec/acid-metadata-utils-test.cc +++ b/be/src/exec/acid-metadata-utils-test.cc @@ -207,3 +207,32 @@ TEST(ValidWriteIdListTest, IsCompacted) { EXPECT_FALSE(ValidWriteIdList::IsCompacted("/foo/000")); EXPECT_FALSE(ValidWriteIdList::IsCompacted("/foo/p=1/000")); } + +TEST(ValidWriteIdListTest, GetWriteIdRange) { + EXPECT_EQ((make_pair(0, 0)), + ValidWriteIdList::GetWriteIdRange("/foo/0_0")); + EXPECT_EQ((make_pair(5, 5)), + ValidWriteIdList::GetWriteIdRange("/foo/base_5/000")); + EXPECT_EQ((make_pair(5, 5)), + ValidWriteIdList::GetWriteIdRange("/foo/base_5_v123/000")); + EXPECT_EQ((make_pair(5 ,10)), + ValidWriteIdList::GetWriteIdRange("/foo/delta_5_00010/000")); + EXPECT_EQ((make_pair(5 ,10)), + 
ValidWriteIdList::GetWriteIdRange("/foo/delta_5_00010_0006/000")); + EXPECT_EQ((make_pair(5 ,10)), + ValidWriteIdList::GetWriteIdRange("/foo/delta_5_00010_v123/000")); +} + +TEST(ValidWriteIdListTest, GetBucketProperty) { + EXPECT_EQ(536870912, ValidWriteIdList::GetBucketProperty("/foo/000_0")); + EXPECT_EQ(536936448, ValidWriteIdList::GetBucketProperty("/foo/001_1")); + EXPECT_EQ(537001984, ValidWriteIdList::GetBucketProperty("/foo/bucket_2")); + EXPECT_EQ(537067520, ValidWriteIdList::GetBucketProperty( + "/foo/base_0001_v1/bucket_03_0")); + EXPECT_EQ(537133056, ValidWriteIdList::GetBucketProperty( + "/foo/delta_1_5/bucket_000
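The `GetBucketProperty` expectations above follow a regular pattern that can be reverse-engineered: 536870912 is `1 << 29`, and each successive bucket id adds `1 << 16`. The sketch below reproduces that encoding; it mirrors Hive's ACID bucket codec (version marker in bit 29, bucket id in bits 16..28), but treat the exact bit layout as an inference from the test values rather than a statement of Impala's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Inferred encoding of the auto-generated 'bucket' ACID column: a codec
// version marker (bit 29) OR'd with the bucket id extracted from the file
// name, shifted into bits 16..28. Low bits (statement id) are left zero here.
int32_t EncodeBucketProperty(int32_t bucket_id) {
  constexpr int32_t kVersionMarker = 1 << 29;  // 536870912
  return kVersionMarker | (bucket_id << 16);
}
```

Since the bucket id is parsed from the path (e.g. `bucket_00003`), this is one of the columns the commit fills in from the scanner's template tuple without reading any file data.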
[impala] branch master updated: IMPALA-9878: Fix use-after-free in TmpFileMgrTest's TestAllocation
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 7b1cfac IMPALA-9878: Fix use-after-free in TmpFileMgrTest's TestAllocation 7b1cfac is described below commit 7b1cfacbc6c4c709947cb91517baa9ec364afee1 Author: Joe McDonnell AuthorDate: Mon Jun 22 09:36:33 2020 -0700 IMPALA-9878: Fix use-after-free in TmpFileMgrTest's TestAllocation ASAN found a use-after-free in this code: file_group.Close(); <--- free underlying storage for 'file' EXPECT_FALSE(boost::filesystem::exists(file->path())); <-- use 'file' This switches it to a copy of file->path(). Testing: - Ran tmp-file-mgr-test under ASAN Change-Id: Idd5cbae70c287c78db8d1c560d8c777d6bed5b56 Reviewed-on: http://gerrit.cloudera.org:8080/16099 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- be/src/runtime/tmp-file-mgr-test.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/be/src/runtime/tmp-file-mgr-test.cc b/be/src/runtime/tmp-file-mgr-test.cc index 57a409b..fcf581c 100644 --- a/be/src/runtime/tmp-file-mgr-test.cc +++ b/be/src/runtime/tmp-file-mgr-test.cc @@ -278,7 +278,7 @@ TEST_F(TmpFileMgrTest, TestFileAllocation) { // tmp file is only allocated on writes. EXPECT_OK(FileSystemUtil::CreateFile(file->path())); file_group.Close(); - EXPECT_FALSE(boost::filesystem::exists(file->path())); + EXPECT_FALSE(boost::filesystem::exists(file_path)); CheckMetrics(_file_mgr); }
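The bug and fix above distill to a general pattern: once the owning object is closed, any reference into its storage is dangling, so take a copy of the value you still need before closing. A minimal stand-alone version (toy `File`/`FileGroup` types, not Impala's TmpFileMgr classes):

```cpp
#include <cassert>
#include <memory>
#include <string>

struct File { std::string path; };

struct FileGroup {
  std::unique_ptr<File> file = std::make_unique<File>();
  void Close() { file.reset(); }  // frees the File and its path storage
};

// Equivalent of the fixed test: copy the path BEFORE Close(), then use the
// copy. Reading fg.file->path after Close() would be the use-after-free that
// ASAN reported.
std::string SafePathAfterClose(FileGroup& fg) {
  const std::string path_copy = fg.file->path;
  fg.Close();
  return path_copy;
}
```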
[impala] branch master updated: IMPALA-3695: Remove KUDU_IS_SUPPORTED
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 6ec6aaa IMPALA-3695: Remove KUDU_IS_SUPPORTED 6ec6aaa is described below commit 6ec6aaae8edc552feb3416bebba0ed355c36e46e Author: Tim Armstrong AuthorDate: Mon Jun 15 21:33:34 2020 -0700 IMPALA-3695: Remove KUDU_IS_SUPPORTED Testing: Ran exhaustive tests. Change-Id: I059d7a42798c38b570f25283663c284f2fcee517 Reviewed-on: http://gerrit.cloudera.org:8080/16085 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- CMakeLists.txt | 2 - bin/bootstrap_toolchain.py | 137 + bin/impala-config.sh | 13 -- common/thrift/generate_error_codes.py | 2 +- docker/entrypoint.sh | 22 ++-- .../java/org/apache/impala/common/RuntimeEnv.java | 3 - .../org/apache/impala/analysis/AnalyzeDDLTest.java | 6 +- .../apache/impala/analysis/AnalyzeKuduDDLTest.java | 3 - .../impala/analysis/AnalyzeModifyStmtsTest.java| 7 -- .../apache/impala/analysis/AnalyzeStmtsTest.java | 14 +-- .../impala/analysis/AnalyzeUpsertStmtTest.java | 1 - .../apache/impala/analysis/AuditingKuduTest.java | 1 - .../apache/impala/analysis/ExprRewriterTest.java | 62 +- .../org/apache/impala/analysis/ParserTest.java | 1 - .../java/org/apache/impala/analysis/ToSqlTest.java | 2 - .../org/apache/impala/planner/PlannerTest.java | 6 - .../org/apache/impala/planner/PlannerTestBase.java | 4 +- .../java/org/apache/impala/testutil/TestUtils.java | 4 - infra/python/bootstrap_virtualenv.py | 6 +- testdata/bin/compute-table-stats.sh| 5 +- testdata/bin/create-load-data.sh | 2 +- testdata/cluster/admin | 4 +- tests/common/kudu_test_suite.py| 3 - tests/common/skip.py | 4 - tests/common/test_dimensions.py| 8 +- tests/comparison/leopard/impala_docker_env.py | 43 --- tests/metadata/test_ddl.py | 1 - tests/metadata/test_show_create_table.py | 2 - 
tests/query_test/test_resource_limits.py | 3 +- tests/shell/test_shell_commandline.py | 1 - 30 files changed, 80 insertions(+), 292 deletions(-) diff --git a/CMakeLists.txt b/CMakeLists.txt index bc8c983..0f273bd 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -402,8 +402,6 @@ else() set(kuduClient_DIR "$ENV{IMPALA_KUDU_HOME}/release/share/kuduClient/cmake") endif() endif() -# When KUDU_IS_SUPPORTED is false, the Kudu client is expected to be a non-functional -# stub. It's still needed to link though. find_package(kuduClient REQUIRED NO_DEFAULT_PATH) include_directories(SYSTEM ${KUDU_CLIENT_INCLUDE_DIR}) diff --git a/bin/bootstrap_toolchain.py b/bin/bootstrap_toolchain.py index bb71577..53d0f8b 100755 --- a/bin/bootstrap_toolchain.py +++ b/bin/bootstrap_toolchain.py @@ -41,9 +41,6 @@ # DOWNLOAD_CDH_COMPONENTS - When set to true, this script will also download and extract # the CDP Hadoop components (i.e. Hadoop, Hive, HBase, Ranger, etc) into # CDP_COMPONENTS_HOME as appropriate. -# KUDU_IS_SUPPORTED - If KUDU_IS_SUPPORTED is false, Kudu is disabled and we download -# the toolchain Kudu and use the symbols to compile a non-functional stub library so -# that Impala has something to link against. # IMPALA__VERSION - The version expected for . This is typically # configured in bin/impala-config.sh and must exist for every package. This is used # to construct an appropriate URL and expected archive name. @@ -405,115 +402,6 @@ def check_custom_toolchain(toolchain_packages_home, packages): raise Exception("Toolchain bootstrap failed: required packages were missing") -def build_kudu_stub(kudu_dir, gcc_dir): - """When Kudu isn't supported, the CentOS 7 Kudu package is downloaded from the - toolchain. This replaces the client lib with a stubbed client. The - 'kudu_dir' specifies the location of the unpacked CentOS 7 Kudu package. 
- The 'gcc_dir' specifies the location of the unpacked GCC/G++.""" - - print "Building kudu stub" - # Find the client lib files in the Kudu dir. There may be several files with - # various extensions. Also there will be a debug version. - client_lib_paths = [] - for path, _, files in os.walk(kudu_dir): -for file in files: - if not file.startswith("libkudu_client.so"): -continue - file_path = os.path.join(path, file) - if os.path.islink(file_path): -continue - cli
[impala] branch master updated: IMPALA-9862: Don't exclude Solr dependencies in frontend build
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new fd810e6 IMPALA-9862: Don't exclude Solr dependencies in frontend build fd810e6 is described below commit fd810e6f0926e0651c0ae30dd7ae654962701b95 Author: Joe McDonnell AuthorDate: Tue Jun 16 19:39:19 2020 -0700 IMPALA-9862: Don't exclude Solr dependencies in frontend build Ranger can be configured in a variety of ways and some have a runtime dependency on Solr. If Solr is excluded, then Impala can fail to startup due to ClassNotFoundException for org.apache.solr.SolrException. This removes the exclusion for Solr from fe/pom.xml. Testing: - Tests on a cluster that previously failed with ClassNotFoundException now pass. Change-Id: Ifb74c20a56e5795cba2efbe887d32392af4017f3 Reviewed-on: http://gerrit.cloudera.org:8080/16089 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins --- fe/pom.xml | 6 -- 1 file changed, 6 deletions(-) diff --git a/fe/pom.xml b/fe/pom.xml index ec40300..787cafa 100644 --- a/fe/pom.xml +++ b/fe/pom.xml @@ -212,10 +212,6 @@ under the License. org.apache.kafka kafka_2.11 - - org.apache.solr - * - @@ -692,8 +688,6 @@ under the License. io.netty:* org.rocksdb:* - -org.apache.solr:* com.sun.jersey:jersey-server com.sun.jersey:jersey-server
[impala] branch master updated (aa6d788 -> d38e4d1)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from aa6d788 IMPALA-9842: Fix hang when cancelling BufferedPlanRootSink new bf94582 IMPALA-9831: Fix off by one error in condition for ValidateColumnOffsets() new a89489c IMPALA-9604: Add TPCH-nested tests for column masking new d38e4d1 IMPALA-9435: Usability enhancements for data cache access trace The 3 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/exec/parquet/parquet-metadata-utils.cc | 2 +- be/src/runtime/io/CMakeLists.txt | 7 + be/src/runtime/io/data-cache-test.cc | 156 +++-- be/src/runtime/io/data-cache-trace-replayer.cc | 203 +++ be/src/runtime/io/data-cache-trace-test.cc | 338 ++ be/src/runtime/io/data-cache-trace.cc | 374 be/src/runtime/io/data-cache-trace.h | 247 ++ be/src/runtime/io/data-cache.cc| 378 + be/src/runtime/io/data-cache.h | 50 ++- bin/start-impala-cluster.py| 26 ++ .../queries/masked-tpch_nested-q10.test| 58 ...nested-q15.test => masked-tpch_nested-q15.test} | 2 +- .../queries/masked-tpch_nested-q18.test| 81 + .../tpch_nested/queries/masked-tpch_nested-q2.test | 147 .../queries/masked-tpch_nested-q20.test| 42 +++ .../queries/masked-tpch_nested-q21.test| 47 +++ .../tpch_nested/queries/masked-tpch_nested-q9.test | 37 ++ tests/authorization/test_ranger.py | 55 +++ 18 files changed, 1976 insertions(+), 274 deletions(-) create mode 100644 be/src/runtime/io/data-cache-trace-replayer.cc create mode 100644 be/src/runtime/io/data-cache-trace-test.cc create mode 100644 be/src/runtime/io/data-cache-trace.cc create mode 100644 be/src/runtime/io/data-cache-trace.h create mode 100644 testdata/workloads/tpch_nested/queries/masked-tpch_nested-q10.test copy 
testdata/workloads/tpch_nested/queries/{tpch_nested-q15.test => masked-tpch_nested-q15.test} (91%) create mode 100644 testdata/workloads/tpch_nested/queries/masked-tpch_nested-q18.test create mode 100644 testdata/workloads/tpch_nested/queries/masked-tpch_nested-q2.test create mode 100644 testdata/workloads/tpch_nested/queries/masked-tpch_nested-q20.test create mode 100644 testdata/workloads/tpch_nested/queries/masked-tpch_nested-q21.test create mode 100644 testdata/workloads/tpch_nested/queries/masked-tpch_nested-q9.test
[impala] 02/03: IMPALA-9604: Add TPCH-nested tests for column masking
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit a89489cbc8c8a4b4a9222e0de318ae0d0d8ad26e Author: stiga-huang AuthorDate: Mon Apr 6 11:26:27 2020 +0800 IMPALA-9604: Add TPCH-nested tests for column masking Add tests for TPCH-nested queries with column masking policies on the PII columns (phone, name, address). Some queries have the same results as without the column masking policies so we reuse their test files. Change-Id: I4a6c9fc480923369952e8e215f4a90b2f6448028 Reviewed-on: http://gerrit.cloudera.org:8080/15655 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- .../queries/masked-tpch_nested-q10.test| 58 .../queries/masked-tpch_nested-q15.test| 38 ++ .../queries/masked-tpch_nested-q18.test| 81 .../tpch_nested/queries/masked-tpch_nested-q2.test | 147 + .../queries/masked-tpch_nested-q20.test| 42 ++ .../queries/masked-tpch_nested-q21.test| 47 +++ .../tpch_nested/queries/masked-tpch_nested-q9.test | 37 ++ tests/authorization/test_ranger.py | 55 8 files changed, 505 insertions(+) diff --git a/testdata/workloads/tpch_nested/queries/masked-tpch_nested-q10.test b/testdata/workloads/tpch_nested/queries/masked-tpch_nested-q10.test new file mode 100644 index 000..2da39ee --- /dev/null +++ b/testdata/workloads/tpch_nested/queries/masked-tpch_nested-q10.test @@ -0,0 +1,58 @@ + + QUERY: TPCH-Q10 +# Q10 - Returned Item Reporting Query +# Converted select from multiple tables to joins +select + c_custkey, + c_name, + sum(l_extendedprice * (1 - l_discount)) as revenue, + c_acctbal, + n_name, + c_address, + c_phone, + c_comment +from + customer c, + c.c_orders o, + o.o_lineitems l, + region.r_nations n +where + o_orderdate >= '1993-10-01' + and o_orderdate < '1994-01-01' + and l_returnflag = 'R' + and c_nationkey = n_nationkey +group by + c_custkey, + c_name, + c_acctbal, + c_phone, + n_name, + c_address, + c_comment +order by 
+ revenue desc +limit 20 + RESULTS +57040,'Xxxx#n',734235.2455,632.87,'JAPAN','Xxxnxx','22-8xx-xxx-','sits. slyly regular requests sleep alongside of the regular inst' +143347,'Xxxx#n',721002.6948,2557.47,'EGYPT','nxXxXXx,Xxn','14-7xx-xxx-','ggle carefully enticing requests. final deposits use bold, bold pinto beans. ironic, idle re' +60838,'Xxxx#n',679127.3077,2454.77,'BRAZIL','nnXxXnxXxXXxXxxxXxnXXxXX','12-9xx-xxx-',' need to boost against the slyly regular account' +101998,'Xxxx#n',637029.5667,3790.89,'UNITED KINGDOM','nnxnXXXxXxxXXXxXx','33-5xx-xxx-','ress foxes wake slyly after the bold excuses. ironic platelets are furiously carefully bold theodolites' +125341,'Xxxx#n',633508.0860,4983.51,'GERMANY','XnnXXXnxxxXnXXxxXXxxxXxX','17-5xx-xxx-','arefully even depths. blithely even excuses sleep furiously. foxes use except the dependencies. ca' +25501,'Xxxx#n',620269.7849,7725.04,'ETHIOPIA',' XnnnXXxxXX,XxnXnxxXnXX','15-8xx-xxx-','he pending instructions wake carefully at the pinto beans. regular, final instructions along the slyly fina' +115831,'Xxxx#n',596423.8672,5098.10,'FRANCE','xXxXxXXxx xx xxnxXnxXnxXnnxXnxxxXxXx','16-7xx-xxx-','l somas sleep. furiously final deposits wake blithely regular pinto b' +84223,'Xxxx#n',594998.0239,528.65,'UNITED KINGDOM','xxnXxXxx xxXnnX nxXxxxnXXX','33-4xx-xxx-',' slyly final deposits haggle regular, pending dependencies. pending escapades wake ' +54289,'Xxxx#n',585603.3918,5583.02,'IRAN','xXXxxXxXnXxxnXXX ,X','20-8xx-xxx-','ely special foxes are quickly finally ironic p' +39922,'Xxxx#n',584878.1134,7321.11,'GERMANY','XxxnxnnxnXXXnxXnxnnnxXxnX','17-1xx-xxx-','y final requests. furiously final foxes cajole blithely special platelets. f' +6226,'Xxxx#n',576783.7606,2230.09,'UNITED KINGDOM','nxXxn,XXXxxxXXnxxx,xxXnx,','33-6xx-xxx-','ending platelets along the express deposits cajole carefully final ' +922,'Xxxx#n',576767.5333,3869.25,'GERMANY','XxnXXxxxnXxXxxnxXXnXxXxXxxnxXxx','17-9xx-xxx-','luffily fluffy deposits. 
packages c' +147946,'Xxxx#n',576455.1320,2030.13,'ALGERIA','xXXxXXxnXxxxnxXxXxxX','10-8xx-xxx-','ithely ironic deposits haggle blithely ironic requests. quickly regu' +115640,'Xxxx#n',569341.1933,6436.10,'ARGENTINA','XxnxX nXxXxxxXnX','11-4xx-xxx-','ost slyly along the patterns; pinto be' +73606,'Xxxx#n',568656.8578,1785.67,'JAPAN','xxXnXxxnxXxXxXXnxx','22-4xx-xxx-','he furiously regular ideas. slowly' +110246,'Xxxx#nnn
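The masked expected results above (names like 'Xxxx#n', comments left readable where they carry no PII structure) are consistent with Ranger's default MASK transform: uppercase letters become 'X', lowercase become 'x', digits become 'n', and everything else is preserved; phone numbers additionally appear to keep their first few characters (a show-first variant). The sketch below reproduces this reading of the test data — it is an inference, not Impala or Ranger code.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Apply a Ranger-style character-class mask: 'X' for uppercase, 'x' for
// lowercase, 'n' for digits; punctuation and spaces pass through unchanged.
std::string RangerStyleMask(const std::string& s) {
  std::string out = s;
  for (char& ch : out) {
    unsigned char c = static_cast<unsigned char>(ch);
    if (std::isupper(c)) ch = 'X';
    else if (std::islower(c)) ch = 'x';
    else if (std::isdigit(c)) ch = 'n';
  }
  return out;
}
```

This is why queries whose results are insensitive to these substitutions can reuse the unmasked `.test` files, while queries that project or order by masked columns need the new `masked-*` variants.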
[impala] 01/03: IMPALA-9831: Fix off by one error in condition for ValidateColumnOffsets()
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit bf945824e62a48822414c9375d6641dc499846a9 Author: Joe McDonnell AuthorDate: Tue Jun 16 08:18:36 2020 -0700 IMPALA-9831: Fix off by one error in condition for ValidateColumnOffsets() ParquetMetadataUtils::ValidateColumnOffsets() returns an error if the end of the column is beyond the end of the file (i.e. offset > end_of_file). Instead, because there is a footer, the end of column must not be the end of the file either, so it should use offset >= end_of_file. Otherwise, a subsequent DCHECK in ParquetPageReader using the stricter condition will fire. Testing: - Core job Change-Id: I16bd6dfbb8eeacc1cb854ed4a3c2ed9f1c3aa11f Reviewed-on: http://gerrit.cloudera.org:8080/16086 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/exec/parquet/parquet-metadata-utils.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/be/src/exec/parquet/parquet-metadata-utils.cc b/be/src/exec/parquet/parquet-metadata-utils.cc index aece0e1..7ec32b6 100644 --- a/be/src/exec/parquet/parquet-metadata-utils.cc +++ b/be/src/exec/parquet/parquet-metadata-utils.cc @@ -245,7 +245,7 @@ Status ParquetMetadataUtils::ValidateColumnOffsets(const string& filename, } int64_t col_len = col_chunk.meta_data.total_compressed_size; int64_t col_end = col_start + col_len; -if (col_end <= 0 || col_end > file_length) { +if (col_end <= 0 || col_end >= file_length) { return Status(Substitute("Parquet file '$0': metadata is corrupt. Column $1 has " "invalid column offsets (offset=$2, size=$3, file_size=$4).", filename, i, col_start, col_len, file_length));
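The corrected bound can be stated as a small predicate: because a Parquet file always ends with a footer, a column chunk must end strictly before end-of-file, so the validity check is `col_end < file_length`, not `col_end <= file_length`. A minimal extraction of the fixed condition (helper name is illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the fixed check in ValidateColumnOffsets(): the chunk's end offset
// must be positive and strictly less than the file length. A chunk ending
// exactly at end-of-file is corrupt, since the footer must follow it; the old
// '>' comparison wrongly accepted that case and tripped a DCHECK later.
bool ColumnOffsetsValid(int64_t col_start, int64_t col_len, int64_t file_length) {
  const int64_t col_end = col_start + col_len;
  return col_end > 0 && col_end < file_length;
}
```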
[impala] branch master updated: IMPALA-9842: Fix hang when cancelling BufferedPlanRootSink
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new aa6d788 IMPALA-9842: Fix hang when cancelling BufferedPlanRootSink aa6d788 is described below commit aa6d7887eec1efd33c73f77e5346a499569e5b6b Author: Joe McDonnell AuthorDate: Tue Jun 16 11:57:05 2020 -0700 IMPALA-9842: Fix hang when cancelling BufferedPlanRootSink In BufferedPlanRootSink::FlushFinal(), if Cancel() runs before FlushFinal() waits on the consumer_eos_ condition variable, the thread in FlushFinal() will wait forever. This is because it is not checking for cancellation or synchronizing with the Cancel() thread. Specifically: Thread A: Calls BufferedPlanRootSink::Cancel(), signalling any thread currently waiting on the consumer_eos_ condition variable. Thread B: Enters FlushFinal(). Never tests RuntimeState::is_cancelled() and calls Wait() on the consumer_eos_ condition variable. This waits forever. This changes BufferedPlanRootSink::Cancel() to get the lock_ before signalling the consumer_eos_ condition variable. It also changes FlushFinal() to call Wait() in a loop. It breaks out of the loop if it is cancelled or the batch_queue_ is empty. There are two cases: 1. FlushFinal() gets the lock_ first and only releases it when waiting on the consumer_eos_ condition variable. It will get signalled by Cancel(). 2. Cancel() gets the lock_ first and FlushFinal() will not wait, because is_cancelled() is true. 
Testing: - Run core tests Change-Id: Id6f3fbc05420ca95313fa79ea106547feb92b16b Reviewed-on: http://gerrit.cloudera.org:8080/16088 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- be/src/exec/buffered-plan-root-sink.cc | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/be/src/exec/buffered-plan-root-sink.cc b/be/src/exec/buffered-plan-root-sink.cc index 277eac6..6dd69dd 100644 --- a/be/src/exec/buffered-plan-root-sink.cc +++ b/be/src/exec/buffered-plan-root-sink.cc @@ -105,8 +105,9 @@ Status BufferedPlanRootSink::FlushFinal(RuntimeState* state) { // If no batches are ever added, wake up the consumer thread so it can check the // SenderState and return appropriately. rows_available_.NotifyAll(); - // Wait until the consumer has read all rows from the batch_queue_. - { + // Wait until the consumer has read all rows from the batch_queue_ or this has + // been cancelled. + while (!IsCancelledOrClosed(state) && !IsQueueEmpty(state)) { SCOPED_TIMER(profile()->inactive_timer()); consumer_eos_.Wait(l); } @@ -136,6 +137,14 @@ void BufferedPlanRootSink::Close(RuntimeState* state) { void BufferedPlanRootSink::Cancel(RuntimeState* state) { DCHECK(state->is_cancelled()); + // Get the lock_ to synchronize with FlushFinal(). Either FlushFinal() will be waiting + // on the consumer_eos_ condition variable and get signalled below, or it will see + // that is_cancelled() is true after it gets the lock. Drop the the lock before + // signalling the CV so that a blocked thread can immediately acquire the mutex when + // it wakes up. + { +unique_lock l(lock_); + } // Wake up all sleeping threads so they can check the cancellation state. // While it should be safe to call NotifyOne() here, prefer to use NotifyAll() to // ensure that all sleeping threads are awoken. The calls to NotifyAll() are not on the
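The fix above is an instance of the standard condition-variable discipline: never do a bare wait; re-check the wake-up predicate (queue drained OR cancelled) in a loop under the lock, and have the cancelling thread touch shared state under that same lock before notifying. A toy model of the FlushFinal()/Cancel() interaction (simplified types, not Impala's classes):

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

struct ToySink {
  std::mutex lock;
  std::condition_variable consumer_eos;
  int queued_rows = 3;
  bool cancelled = false;

  // Case 2 from the commit message: if Cancel() already ran, the predicate is
  // true and we never wait; otherwise we wait and Cancel()'s notify wakes us.
  void FlushFinal() {
    std::unique_lock<std::mutex> l(lock);
    while (!cancelled && queued_rows > 0) consumer_eos.wait(l);
  }

  void Cancel() {
    {  // acquire the lock so FlushFinal() cannot slip between its predicate
       // check and its wait; drop it before notifying so the woken thread can
       // grab the mutex immediately.
      std::lock_guard<std::mutex> l(lock);
      cancelled = true;
    }
    consumer_eos.notify_all();
  }
};
```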
[impala] 02/02: IMPALA-9858: Fix wrong partition metrics in LocalCatalog profile
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit ee70df2e9006d3592175af5f1fa7ec128f5f1b8d Author: stiga-huang AuthorDate: Mon Jun 15 15:07:19 2020 +0800 IMPALA-9858: Fix wrong partition metrics in LocalCatalog profile The hits and requests metrics of partitions were overcounted because they were computed from a map that had already been updated with the fetched misses. This patch fixes it and adds test coverage on partition metrics. Tests - Run CatalogdMetaProviderTest Change-Id: I10cabce2908f1d252b90390978e679d31003e89d Reviewed-on: http://gerrit.cloudera.org:8080/16080 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- .../impala/catalog/local/CatalogdMetaProvider.java | 4 +- .../catalog/local/CatalogdMetaProviderTest.java| 61 +++--- 2 files changed, 43 insertions(+), 22 deletions(-) diff --git a/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java b/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java index 03aba1e..02195ad 100644 --- a/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java +++ b/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java @@ -895,8 +895,8 @@ public class CatalogdMetaProvider implements MetaProvider { storePartitionsInCache(refImpl, hostIndex, fromCatalogd); } sw.stop(); -addStatsToProfile(PARTITIONS_STATS_CATEGORY, refToMeta.size(), numMisses, sw); -LOG.trace("Request for partitions of {}: hit {}/{}", table, refToMeta.size(), +addStatsToProfile(PARTITIONS_STATS_CATEGORY, numHits, numMisses, sw); +LOG.trace("Request for partitions of {}: hit {}/{}", table, numHits, partitionRefs.size()); // Convert the returned map to be by-name instead of by-ref. 
diff --git a/fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java b/fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java index c4dcf2d..b378ce0 100644 --- a/fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java +++ b/fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java @@ -20,7 +20,6 @@ package org.apache.impala.catalog.local; import static org.junit.Assert.*; import java.util.ArrayList; -import java.util.Collections; import java.util.List; import java.util.Map; import java.util.concurrent.ExecutorService; @@ -44,6 +43,7 @@ import org.apache.impala.thrift.TBackendGflags; import org.apache.impala.thrift.TBriefTableMeta; import org.apache.impala.thrift.TCatalogObject; import org.apache.impala.thrift.TCatalogObjectType; +import org.apache.impala.thrift.TCounter; import org.apache.impala.thrift.TDatabase; import org.apache.impala.thrift.TNetworkAddress; import org.apache.impala.thrift.TRuntimeProfileNode; @@ -58,11 +58,13 @@ import com.google.common.base.Stopwatch; import com.google.common.cache.CacheStats; import com.google.common.collect.ImmutableCollection; import com.google.common.collect.ImmutableList; +import com.google.common.collect.Maps; public class CatalogdMetaProviderTest { private final static Logger LOG = LoggerFactory.getLogger( CatalogdMetaProviderTest.class); + private final static ListMap HOST_INDEX = new ListMap<>(); private final CatalogdMetaProvider provider_; private final TableMetaRef tableRef_; @@ -113,33 +115,36 @@ public class CatalogdMetaProviderTest { public void testCachePartitionsByRef() throws Exception { List allRefs = provider_.loadPartitionList(tableRef_); List partialRefs = allRefs.subList(3, 8); -ListMap hostIndex = new ListMap<>(); CacheStats stats = diffStats(); // Should get no hits on the initial load of partitions. 
-Map partMap = provider_.loadPartitionsByRefs( -tableRef_, /* partitionColumnNames unused by this impl */null, hostIndex, -partialRefs); +Map partMap = loadPartitions(tableRef_, partialRefs); assertEquals(partialRefs.size(), partMap.size()); stats = diffStats(); assertEquals(0, stats.hitCount()); // Load the same partitions again and we should get a hit for each partition. -Map partMapHit = provider_.loadPartitionsByRefs( -tableRef_, /* partitionColumnNames unused by this impl */null, hostIndex, -partialRefs); +Map partMapHit = loadPartitions(tableRef_, partialRefs); stats = diffStats(); assertEquals(stats.hitCount(), partMapHit.size()); // Load all of the partitions: we should get some hits and some misses. -Map allParts = provider_.loadPartitionsByRefs( -tableRef_, /* partitionColumnNames unused by this impl */null, hostIndex, -allRefs); +Map allParts = loadPartitions(tableRef_, allRefs); assertEquals(allRefs.size(), allParts.size()); stats
[impala] branch master updated (13fbe51 -> ee70df2)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from 13fbe51 IMPALA-9838: Switch to GCC 7.5.0 new 419aa2e IMPALA-9778: Refactor partition modifications in DDL/DMLs new ee70df2 IMPALA-9858: Fix wrong partition metrics in LocalCatalog profile The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../impala/catalog/CatalogServiceCatalog.java | 6 +- .../org/apache/impala/catalog/FeCatalogUtils.java | 28 +- .../org/apache/impala/catalog/HdfsPartition.java | 615 + .../java/org/apache/impala/catalog/HdfsTable.java | 384 - .../impala/catalog/ParallelFileMetadataLoader.java | 23 +- .../apache/impala/catalog/PartitionStatsUtil.java | 2 +- .../main/java/org/apache/impala/catalog/Table.java | 15 + .../impala/catalog/local/CatalogdMetaProvider.java | 4 +- .../apache/impala/service/CatalogOpExecutor.java | 202 +++ .../org/apache/impala/util/HdfsCachingUtil.java| 21 +- .../catalog/CatalogObjectToFromThriftTest.java | 10 +- .../org/apache/impala/catalog/CatalogTest.java | 7 +- .../catalog/local/CatalogdMetaProviderTest.java| 61 +- 13 files changed, 853 insertions(+), 525 deletions(-)
[impala] 01/02: IMPALA-9778: Refactor partition modifications in DDL/DMLs
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 419aa2e30db326f02e9b4ec563ef7864e82df86e Author: stiga-huang AuthorDate: Mon May 25 18:01:38 2020 +0800 IMPALA-9778: Refactor partition modifications in DDL/DMLs After this patch, in DDL/DMLs that update metadata of partitions, instead of updating partitions in place, we always create new ones and use them to replace the existing instances. This is enforced by making HdfsPartition immutable. There are several benefits to this: - HdfsPartition can be shared across table versions. In full catalog update mode, catalog update can ignore unchanged partitions (IMPALA-3234) and send the update at partition granularity. - Aborted DDL/DMLs won't leave partition metadata in a bad shape (e.g. IMPALA-8406), which usually requires invalidation to recover. - Fetch-on-demand coordinators can cache partition meta using the partition id as the key. When the table version updates, only metadata of changed partitions needs to be reloaded (IMPALA-7533). - In the work of decoupling partitions from tables (IMPALA-3127), we don't need to assign a catalog version to partitions since the partition ids already identify the partitions. However, HdfsPartition is not strictly immutable. Although all its fields are final, some fields are still referencing mutable objects. We need more refactoring to achieve this. This patch focuses on refactoring the DDL/DML code paths. Changes: - Make all fields of HdfsPartition final. Move HdfsPartition constructor logic and all its update methods into HdfsPartition.Builder. - Refactor in-place updates on HdfsPartition to create a new one and drop the old one. HdfsPartition.Builder represents the in-progress modifications. Once all modifications are done, call its build() method to create the new HdfsPartition instance. 
The old HdfsPartition instance is only replaced at the end of the modifications. - Move the "dirty" marker of HdfsPartition into a map of HdfsTable. It maps from the old partition id to the in-progress partition builder. For "dirty" partitions, we’ll reload its HMS meta and file meta. Tests: - No new tests are added since the existing tests already provide sufficient coverage - Run CORE tests Change-Id: Ib52e5810d01d5e0c910daacb9c98977426d3914c Reviewed-on: http://gerrit.cloudera.org:8080/15985 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- .../impala/catalog/CatalogServiceCatalog.java | 6 +- .../org/apache/impala/catalog/FeCatalogUtils.java | 28 +- .../org/apache/impala/catalog/HdfsPartition.java | 615 + .../java/org/apache/impala/catalog/HdfsTable.java | 384 - .../impala/catalog/ParallelFileMetadataLoader.java | 23 +- .../apache/impala/catalog/PartitionStatsUtil.java | 2 +- .../main/java/org/apache/impala/catalog/Table.java | 15 + .../apache/impala/service/CatalogOpExecutor.java | 202 +++ .../org/apache/impala/util/HdfsCachingUtil.java| 21 +- .../catalog/CatalogObjectToFromThriftTest.java | 10 +- .../org/apache/impala/catalog/CatalogTest.java | 7 +- 11 files changed, 810 insertions(+), 503 deletions(-) diff --git a/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java b/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java index 994abff..fafad2c 100644 --- a/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java +++ b/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java @@ -35,6 +35,7 @@ import java.util.concurrent.Semaphore; import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicLong; import java.util.concurrent.locks.ReentrantReadWriteLock; +import java.util.stream.Collectors; import org.apache.commons.collections.MapUtils; import org.apache.hadoop.fs.RemoteIterator; @@ -3125,8 +3126,11 @@ public class CatalogServiceCatalog extends Catalog { "Unable 
to fetch valid transaction ids while loading file metadata for table " + table.getFullName(), ex); } + List partBuilders = partToPartialInfoMap.keySet().stream() + .map(HdfsPartition.Builder::new) + .collect(Collectors.toList()); Map> fdsByPart = new ParallelFileMetadataLoader( - table, partToPartialInfoMap.keySet(), reqWriteIdList, validTxnList, logPrefix) + table, partBuilders, reqWriteIdList, validTxnList, logPrefix) .loadAndGet(); for (HdfsPartition partition : fdsByPart.keySet()) { TPartialPartitionInfo partitionInfo = partToPartialInfoMap.get(par
[impala] 01/03: IMPALA-9849: Set halt_on_error=1 for TSAN builds
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 38b96174621c7d2f58b580d1e2bba4b95b261d1c Author: Sahil Takiar AuthorDate: Mon Jun 8 11:35:05 2020 -0700 IMPALA-9849: Set halt_on_error=1 for TSAN builds Set halt_on_error to true by default for TSAN builds (we already do this for ASAN builds). This ensures that Impala crashes whenever a TSAN error is detected. IMPALA-9568 accidentally broke this. Testing: * Ran dataload + be tests in a TSAN build Change-Id: I268c338d9194a66b37c3ccd97027e3543d27bea7 Reviewed-on: http://gerrit.cloudera.org:8080/16069 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/common/init.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/be/src/common/init.cc b/be/src/common/init.cc index db47282..4524f44 100644 --- a/be/src/common/init.cc +++ b/be/src/common/init.cc @@ -426,7 +426,7 @@ extern "C" const char* __tsan_default_options() { #else "1 " #endif - "history_size=7 allocator_may_return_null=1 " + "halt_on_error=1 history_size=7 allocator_may_return_null=1 " "suppressions=" THREAD_SANITIZER_SUPPRESSIONS; } #endif
[impala] 02/03: IMPALA-9709: Remove Impala-lzo from the development environment
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit f15a311065f2d30b727d53d96fae87f07132e4d9 Author: Joe McDonnell AuthorDate: Sun Apr 26 18:38:26 2020 -0700 IMPALA-9709: Remove Impala-lzo from the development environment This removes Impala-lzo from the Impala development environment. Impala-lzo is not built as part of the Impala build. The LZO plugin is no longer loaded. LZO tables are not loaded during dataload, and LZO is no longer tested. This removes some obsolete scan APIs that were only used by Impala-lzo. With this commit, Impala-lzo would require code changes to build against Impala. The plugin infrastructure is not removed, and this leaves some LZO support code in place. If someone were to decide to revive Impala-lzo, they would still be able to load it as a plugin and get the same functionality as before. This plugin support may be removed later. Testing: - Dryrun of GVO - Modified TestPartitionMetadataUncompressedTextOnly's test_unsupported_text_compression() to add LZO case Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e Reviewed-on: http://gerrit.cloudera.org:8080/15814 Reviewed-by: Bikramjeet Vig Tested-by: Joe McDonnell --- CMakeLists.txt | 11 --- be/src/exec/hdfs-plugin-text-scanner.cc| 6 ++-- be/src/exec/hdfs-scan-node-base.cc | 10 +- be/src/exec/hdfs-scan-node-base.h | 12 ++-- be/src/util/codec.cc | 2 +- bin/bootstrap_system.sh| 23 ++ bin/clean.sh | 7 - bin/impala-config.sh | 18 +++ bin/set-ld-library-path.sh | 3 -- bin/start-impala-cluster.py| 7 - buildall.sh| 10 -- docker/entrypoint.sh | 8 - docker/impala_base/Dockerfile | 4 +-- docker/test-with-docker.py | 13 +--- .../org/apache/impala/analysis/ToSqlUtils.java | 3 +- .../org/apache/impala/catalog/HdfsCompression.java | 4 ++- .../org/apache/impala/catalog/HdfsFileFormat.java | 6 ++-- .../org/apache/impala/planner/HdfsScanNode.java| 1 - 
.../org/apache/impala/planner/HdfsTableSink.java | 6 ++-- .../apache/impala/analysis/AnalyzeStmtsTest.java | 10 +++--- .../org/apache/impala/analysis/AnalyzerTest.java | 10 +++--- testdata/bad_text_lzo/bad_text.lzo | Bin 736999 -> 0 bytes testdata/bad_text_lzo/bad_text.lzo.index | Bin 5192 -> 0 bytes testdata/bin/create-load-data.sh | 22 - testdata/bin/generate-schema-statements.py | 31 +++ testdata/bin/generate-test-vectors.py | 1 - testdata/bin/load_nested.py| 5 ++- testdata/bin/lzo_indexer.sh| 20 .../common/etc/hadoop/conf/core-site.xml.py| 3 -- .../common/etc/hadoop/conf/yarn-site.xml.py| 4 +-- .../functional/functional_schema_template.sql | 11 --- .../datasets/functional/schema_constraints.csv | 4 --- .../joins-hdfs-num-rows-est-enabled.test | 8 ++--- .../queries/PlannerTest/joins.test | 8 ++--- .../functional-query_dimensions.csv| 2 +- .../functional-query_exhaustive.csv| 1 - .../DataErrorsTest/hdfs-scan-node-errors.test | 18 --- .../queries/QueryTest/disable-lzo-plugin.test | 7 - .../queries/QueryTest/show-create-table.test | 12 .../unsupported-compression-partitions.test| 9 +- .../perf-regression/perf-regression_dimensions.csv | 2 +- .../perf-regression/perf-regression_exhaustive.csv | 1 - .../perf-regression/perf-regression_pairwise.csv | 1 - .../targeted-perf/targeted-perf_dimensions.csv | 2 +- .../targeted-perf/targeted-perf_exhaustive.csv | 1 - .../targeted-perf/targeted-perf_pairwise.csv | 1 - .../targeted-stress/targeted-stress_dimensions.csv | 2 +- .../targeted-stress/targeted-stress_exhaustive.csv | 1 - .../targeted-stress/targeted-stress_pairwise.csv | 1 - .../tpcds-unmodified_dimensions.csv| 2 +- .../tpcds-unmodified_exhaustive.csv| 1 - .../tpcds-unmodified/tpcds-unmodified_pairwise.csv | 1 - testdata/workloads/tpcds/tpcds_dimensions.csv | 2 +- testdata/workloads/tpcds/tpcds_exhaustive.csv | 1 - testdata/workloads/tpcds/tpcds_pairwise.csv| 1 - testdata/workloads/tpch/tpch_dimensions.csv
[impala] 03/03: IMPALA-9838: Switch to GCC 7.5.0
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 13fbe510c0d70a8cbe82f0ca83f59b3faf5353c8 Author: Joe McDonnell AuthorDate: Thu May 28 22:12:21 2020 -0700 IMPALA-9838: Switch to GCC 7.5.0 This upgrades GCC and libstdc++ to version 7.5.0. There have been ABI changes since 4.9.2, so this means that the native-toolchain produced with the new compiler is not interoperable with one produced by the old compiler. To allow that transition, IMPALA_TOOLCHAIN_PACKAGES_HOME is now a subdirectory of IMPALA_TOOLCHAIN (toolchain-packages-gcc${IMPALA_GCC_VERSION}) to distinguish it from the old packages. Some Python packages in the impala-python virtualenv are compiled using the toolchain GCC and now use the new ABI. This leads to two changes: 1. When constructing the LD_LIBRARY_PATH for impala-python, we include the GCC libstdc++ libraries. Otherwise, certain Python packages that use C++ fail on older OSes like Centos 7. This fixes IMPALA-9804. 2. Since developers work on various branches, this changes the virtualenv's directory location to a directory with the GCC version in the name. This allows the virtualenv built with GCC 7 to coexist with the current virtualenv built with GCC 4.9.2. The location for the old virtualenv is ${IMPALA_HOME}/infra/python/env. The new location is ${IMPALA_HOME}/infra/python/env-gcc${IMPALA_GCC_VERSION}. This required updating several impala-python scripts. There are various odds-and-ends related to the transition: 1. Due to the small string optimization, the size of std::string changed, which means that various data structures also changed in size. This required updating some static asserts. 2. There is a bug in clang-tidy that reports a use-after-free for some code using std::shared_ptr. Clang is not modeling the shared_ptr correctly, so it is a false-positive. 
As a workaround, this disables the clang-analyzer-cplusplus.NewDelete diagnostic. 3. Various small compilation fixes (includes, etc). Performance testing: - Ran single-node performance tests on TPC-H for the following configurations: - TPC-H Parquet scale 30 with normal configurations - TPC-H Parquet scale 30 with codegen disabled - TPC-H Kudu scale 10 None found any significant regressions. Full results are posted on the JIRA. - Ran single-node performance tests on targeted-perf scale 10. No significant regressions. - The size of binaries (impalad, etc) is slightly smaller with the new GCC: GCC 4.9.2 release impalad binary: 545664 GCC 7.5.0 release impalad binary: 539900 - Compilation in DEBUG mode is roughly 15-25% faster Functional testing: - Ran core jobs, exhaustive release jobs, UBSAN Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4 Reviewed-on: http://gerrit.cloudera.org:8080/16045 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins --- .clang-tidy | 1 + be/src/runtime/sorter-internal.h | 2 ++ be/src/runtime/sorter.cc | 4 be/src/runtime/thread-resource-mgr.cc | 1 + be/src/util/container-util.h | 4 ++-- bin/impala-config.sh | 14 +++--- bin/impala-flake8 | 2 +- bin/impala-gcovr | 2 +- bin/impala-ipython| 2 +- bin/impala-pip| 2 +- bin/impala-py.test| 2 +- bin/impala-python | 2 +- bin/impala-python-common.sh | 1 + bin/impala-shell.sh | 5 +++-- bin/set-pythonpath.sh | 4 ++-- infra/python/bootstrap_virtualenv.py | 13 ++--- tests/comparison/ORACLE.txt | 2 +- 17 files changed, 36 insertions(+), 27 deletions(-) diff --git a/.clang-tidy b/.clang-tidy index faf4b7b..cc70284 100644 --- a/.clang-tidy +++ b/.clang-tidy @@ -24,6 +24,7 @@ Checks: "-*,clang*,\ -clang-analyzer-core.uninitialized.ArraySubscript,\ -clang-analyzer-core.uninitialized.Assign,\ -clang-analyzer-core.uninitialized.Branch,\ +-clang-analyzer-cplusplus.NewDelete,\ -clang-analyzer-cplusplus.NewDeleteLeaks,\ -clang-analyzer-deadcode.DeadStores,\ -clang-analyzer-optin.performance.Padding,\ diff 
--git a/be/src/runtime/sorter-internal.h b/be/src/runtime/sorter-internal.h index ea8275a..492fc95 100644 --- a/be/src/runtime/sorter-internal.h +++ b/be/src/runtime/sorter-internal.h @@ -21,6 +21,8 @@ #include "sorter.h" +#include + namespace impala { /// Wrapper around BufferPool::PageHandle that tracks additional info about the page. diff --git a/be/src/runtime/sorter.cc b/be/src/runtime/sorter.cc index 339e0b9..f30ecc4 1006
[impala] branch master updated (f8c28f8 -> 13fbe51)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from f8c28f8 IMPALA-9843: Add support for metastore db schema upgrade new 38b9617 IMPALA-9849: Set halt_on_error=1 for TSAN builds new f15a311 IMPALA-9709: Remove Impala-lzo from the development environment new 13fbe51 IMPALA-9838: Switch to GCC 7.5.0 The 3 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .clang-tidy| 1 + CMakeLists.txt | 11 --- be/src/common/init.cc | 2 +- be/src/exec/hdfs-plugin-text-scanner.cc| 6 ++-- be/src/exec/hdfs-scan-node-base.cc | 10 +- be/src/exec/hdfs-scan-node-base.h | 12 ++-- be/src/runtime/sorter-internal.h | 2 ++ be/src/runtime/sorter.cc | 4 --- be/src/runtime/thread-resource-mgr.cc | 1 + be/src/util/codec.cc | 2 +- be/src/util/container-util.h | 4 +-- bin/bootstrap_system.sh| 23 ++ bin/clean.sh | 7 - bin/impala-config.sh | 32 +++ bin/impala-flake8 | 2 +- bin/impala-gcovr | 2 +- bin/impala-ipython | 2 +- bin/impala-pip | 2 +- bin/impala-py.test | 2 +- bin/impala-python | 2 +- bin/impala-python-common.sh| 1 + bin/impala-shell.sh| 5 +-- bin/set-ld-library-path.sh | 3 -- bin/set-pythonpath.sh | 4 +-- bin/start-impala-cluster.py| 7 - buildall.sh| 10 -- docker/entrypoint.sh | 8 - docker/impala_base/Dockerfile | 4 +-- docker/test-with-docker.py | 13 +--- .../org/apache/impala/analysis/ToSqlUtils.java | 3 +- .../org/apache/impala/catalog/HdfsCompression.java | 4 ++- .../org/apache/impala/catalog/HdfsFileFormat.java | 6 ++-- .../org/apache/impala/planner/HdfsScanNode.java| 1 - .../org/apache/impala/planner/HdfsTableSink.java | 6 ++-- .../apache/impala/analysis/AnalyzeStmtsTest.java | 10 +++--- .../org/apache/impala/analysis/AnalyzerTest.java | 10 +++--- 
infra/python/bootstrap_virtualenv.py | 13 ++-- testdata/bad_text_lzo/bad_text.lzo | Bin 736999 -> 0 bytes testdata/bad_text_lzo/bad_text.lzo.index | Bin 5192 -> 0 bytes testdata/bin/create-load-data.sh | 22 - testdata/bin/generate-schema-statements.py | 31 +++ testdata/bin/generate-test-vectors.py | 1 - testdata/bin/load_nested.py| 5 ++- testdata/bin/lzo_indexer.sh| 20 .../common/etc/hadoop/conf/core-site.xml.py| 3 -- .../common/etc/hadoop/conf/yarn-site.xml.py| 4 +-- .../functional/functional_schema_template.sql | 11 --- .../datasets/functional/schema_constraints.csv | 4 --- .../joins-hdfs-num-rows-est-enabled.test | 8 ++--- .../queries/PlannerTest/joins.test | 8 ++--- .../functional-query_dimensions.csv| 2 +- .../functional-query_exhaustive.csv| 1 - .../DataErrorsTest/hdfs-scan-node-errors.test | 18 --- .../queries/QueryTest/disable-lzo-plugin.test | 7 - .../queries/QueryTest/show-create-table.test | 12 .../unsupported-compression-partitions.test| 9 +- .../perf-regression/perf-regression_dimensions.csv | 2 +- .../perf-regression/perf-regression_exhaustive.csv | 1 - .../perf-regression/perf-regression_pairwise.csv | 1 - .../targeted-perf/targeted-perf_dimensions.csv | 2 +- .../targeted-perf/targeted-perf_exhaustive.csv | 1 - .../targeted-perf/targeted-perf_pairwise.csv | 1 - .../targeted-stress/targeted-stress_dimensions.csv | 2 +- .../targeted-stress/targeted-stress_exhaustive.csv | 1 - .../targeted-stress/targeted-stress_pairwise.csv | 1 - .../tpcds-unmodified_dimensions.csv| 2 +- .../tpcds-unmodified_exhaustive.csv|
[impala] 01/03: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 0cb44242d20532945e5fb09f5bbef6c65415a753 Author: Vihang Karajgaonkar AuthorDate: Fri May 22 14:56:43 2020 -0700 IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API This change enhances the Catalog-v2 API getPartialCatalogObject to support ValidWriteIdList as an optional field in the TableInfoSelector. When such a field is provided by the clients, the catalog compares the provided ValidWriteIdList with the cached ValidWriteIdList of the table. The catalog reloads the table if it determines that the cached table is stale with respect to the ValidWriteIdList provided. If the table is already at or above the requested ValidWriteIdList, the catalog uses the cached table metadata information to filter out file descriptors pertaining to the provided ValidWriteIdList. Note that in case of compactions it is possible that the requested ValidWriteIdList cannot be satisfied using the cached file-metadata for some partitions. For such partitions, the catalog re-fetches the file-metadata from the FileSystem. In order to implement the fall-back to getting the file-metadata from the filesystem, the patch refactors some of the file-metadata loading logic into ParallelFileMetadataLoader, which also helps simplify some methods in HdfsTable.java. Additionally, it modifies the WriteIdBasedPredicate to optionally do a strict check which throws an exception in some scenarios. This is helpful to provide a snapshot view of the table metadata during query compilation with respect to other changes happening to the table concurrently. Note that this change does not implement the coordinator-side changes needed for catalog clients to use such a field. That would be taken up in a separate change to keep this patch smaller. Testing: 1. Ran existing file metadata loader tests. 2. 
Added a new test which exercises the various cases for ValidWriteIdList comparison. 3. Ran core tests along with the dependent MetastoreClientPool patch (IMPALA-9824). Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a Reviewed-on: http://gerrit.cloudera.org:8080/16008 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- common/thrift/CatalogService.thrift| 7 + .../impala/catalog/CatalogServiceCatalog.java | 110 +++- .../apache/impala/catalog/FileMetadataLoader.java | 15 +- .../java/org/apache/impala/catalog/HdfsTable.java | 122 +++-- .../impala/catalog/ParallelFileMetadataLoader.java | 101 +++- .../main/java/org/apache/impala/catalog/Table.java | 2 +- .../org/apache/impala/catalog/TableLoadingMgr.java | 2 +- .../impala/catalog/local/DirectMetaProvider.java | 7 +- .../apache/impala/catalog/local/LocalFsTable.java | 2 +- .../apache/impala/catalog/local/MetaProvider.java | 3 +- .../apache/impala/service/CatalogOpExecutor.java | 8 +- .../java/org/apache/impala/util/AcidUtils.java | 224 +++- .../catalog/CatalogObjectToFromThriftTest.java | 14 +- .../org/apache/impala/catalog/CatalogTest.java | 107 ++-- .../catalog/CatalogdTableInvalidatorTest.java | 2 +- .../impala/catalog/FileMetadataLoaderTest.java | 20 +- .../catalog/PartialCatalogInfoWriteIdTest.java | 587 + .../events/MetastoreEventsProcessorTest.java | 5 +- .../apache/impala/testutil/ImpalaJdbcClient.java | 6 + .../apache/impala/testutil/ImpaladTestCatalog.java | 2 +- .../java/org/apache/impala/util/AcidUtilsTest.java | 3 +- shaded-deps/pom.xml| 1 + 22 files changed, 1168 insertions(+), 182 deletions(-) diff --git a/common/thrift/CatalogService.thrift b/common/thrift/CatalogService.thrift index 0ab972d..8c42471 100644 --- a/common/thrift/CatalogService.thrift +++ b/common/thrift/CatalogService.thrift @@ -329,6 +329,10 @@ struct TTableInfoSelector { // The response should contain table constraints like primary keys // and foreign keys 8: bool want_table_constraints + + // If this is for a ACID 
table and this is set, this table info returned + // will be consistent with the provided valid_write_ids + 9: optional CatalogObjects.TValidWriteIdList valid_write_ids } // Returned information about a particular partition. @@ -488,6 +492,9 @@ struct TGetCatalogObjectResponse { struct TGetPartitionStatsRequest { 1: required CatalogServiceVersion protocol_version = CatalogServiceVersion.V1 2: required CatalogObjects.TTableName table_name + // if the table is transactional then this field represents the client's view + // of the table snapshot view in terms of ValidWriteIdList. + 3: optional CatalogObjects.TValidWriteIdList valid_write_ids } // Response for requesting
[impala] 02/03: IMPALA-9847: reduce web UI serialized JSON size
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 6ca6e403580dc592c026b4f684d31f8a4dcfae11 Author: Tim Armstrong AuthorDate: Wed Jun 10 16:52:08 2020 -0700 IMPALA-9847: reduce web UI serialized JSON size Switch to using the plain writer in some places, and tweak PrettyWriter to produce denser output for the debug UI JSON (so that it's still human readable but denser). Testing: Manually tested. The profile for the below query went from 338kB to 134kB. select min(l_orderkey) from tpch_parquet.lineitem; Change-Id: I66af9d00f0f0fc70e324033b6464b75a6adadd6f Reviewed-on: http://gerrit.cloudera.org:8080/16068 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/service/impala-hs2-server.cc | 3 ++- be/src/service/impala-http-handler.cc | 6 -- be/src/util/webserver.cc | 4 3 files changed, 10 insertions(+), 3 deletions(-) diff --git a/be/src/service/impala-hs2-server.cc b/be/src/service/impala-hs2-server.cc index 3cca3e5..757c4e9 100644 --- a/be/src/service/impala-hs2-server.cc +++ b/be/src/service/impala-hs2-server.cc @@ -1042,8 +1042,9 @@ void ImpalaServer::GetRuntimeProfile( if (request.format == TRuntimeProfileFormat::THRIFT) { return_val.__set_thrift_profile(thrift_profile); } else if (request.format == TRuntimeProfileFormat::JSON) { +// Serialize to JSON without extra whitespace/formatting. 
rapidjson::StringBuffer sb; -rapidjson::PrettyWriter writer(sb); +rapidjson::Writer writer(sb); json_profile.Accept(writer); ss << sb.GetString(); return_val.__set_profile(ss.str()); diff --git a/be/src/service/impala-http-handler.cc b/be/src/service/impala-http-handler.cc index b2ece97..197100d 100644 --- a/be/src/service/impala-http-handler.cc +++ b/be/src/service/impala-http-handler.cc @@ -,9 +,11 @@ void ImpalaHttpHandler::AdmissionStateHandler( string staleness_detail = ac->GetStalenessDetail("", _since_last_statestore_update); // In order to embed a plain json inside the webpage generated by mustache, we need - // to stringify it and write it out as a json element. + // to stringify it and write it out as a json element. We do not need to pretty-print + // it, so use the basic writer. rapidjson::StringBuffer strbuf; - PrettyWriter writer(strbuf); + Writer writer(strbuf); + resource_pools.Accept(writer); Value raw_json(strbuf.GetString(), document->GetAllocator()); document->AddMember("resource_pools_plain_json", raw_json, document->GetAllocator()); diff --git a/be/src/util/webserver.cc b/be/src/util/webserver.cc index cbf2874..fa6a317 100644 --- a/be/src/util/webserver.cc +++ b/be/src/util/webserver.cc @@ -805,7 +805,11 @@ void Webserver::RenderUrlWithTemplate(const struct sq_connection* connection, // Callbacks may optionally be rendered as a text-only, pretty-printed Json document // (mostly for debugging or integration with third-party tools). StringBuffer strbuf; +// Write the JSON out with human-readable formatting. The settings are tweaked to +// reduce extraneous whitespace characters, compared to the default formatting. PrettyWriter writer(strbuf); +writer.SetIndent('\t', 1); +writer.SetFormatOptions(kFormatSingleLineArray); document.Accept(writer); (*output) << strbuf.GetString(); *content_type = JSON;
[impala] 03/03: IMPALA-9843: Add support for metastore db schema upgrade
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit f8c28f8adfd781727c311b15546a532ce65881e0 Author: Vihang Karajgaonkar AuthorDate: Tue Jun 9 12:44:21 2020 -0700 IMPALA-9843: Add support for metastore db schema upgrade This change adds support to upgrade the HMS database schema using the hive schema tool. It adds a new option to the buildall.sh script which can be provided to upgrade the HMS db schema. Alternatively, users can directly upgrade the schema using the create-test-configuration.sh script. The logs for the schema upgrade are available in logs/cluster/schematool.log. Following invocations will upgrade the HMS database schema. 1. buildall.sh -upgrade_metastore_db 2. bin/create-test-configuration.sh -upgrade_metastore_db This upgrade option is idempotent. It is a no-op if the metastore schema is already at its latest version. In case of any errors, the only fallback currently is to format the metastore schema and load the test data again. Testing: Upgraded the HMS schema on my local dev environment and made sure that the HMS service starts without any errors. 
Change-Id: I85af8d57e110ff284832056a1661f94b85ed3b09 Reviewed-on: http://gerrit.cloudera.org:8080/16054 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- bin/create-test-configuration.sh | 13 + buildall.sh | 20 +--- 2 files changed, 30 insertions(+), 3 deletions(-) diff --git a/bin/create-test-configuration.sh b/bin/create-test-configuration.sh index 8ab2e48..83d500b 100755 --- a/bin/create-test-configuration.sh +++ b/bin/create-test-configuration.sh @@ -66,6 +66,7 @@ function generate_config { CREATE_METASTORE=0 CREATE_RANGER_POLICY_DB=0 +UPGRADE_METASTORE_DB=0 # parse command line options for ARG in $* @@ -77,9 +78,13 @@ do -create_ranger_policy_db) CREATE_RANGER_POLICY_DB=1 ;; +-upgrade_metastore_db) + UPGRADE_METASTORE_DB=1 + ;; -help|*) echo "[-create_metastore] : If true, creates a new metastore." echo "[-create_ranger_policy_db] : If true, creates a new Ranger policy db." + echo "[-upgrade_metastore_db] : If true, upgrades the schema of HMS db." exit 1 ;; esac @@ -163,12 +168,20 @@ if [ $CREATE_METASTORE -eq 1 ]; then # version and invokes the appropriate scripts CLASSPATH={$CLASSPATH}:${CONFIG_DIR} ${HIVE_HOME}/bin/schematool -initSchema -dbType \ postgres 1>${IMPALA_CLUSTER_LOGS_DIR}/schematool.log 2>&1 + # TODO: We probably don't need to do this anymore # Increase the size limit of PARAM_VALUE from SERDE_PARAMS table to be able to create # HBase tables with large number of columns. echo "alter table \"SERDE_PARAMS\" alter column \"PARAM_VALUE\" type character varying" \ | psql -q -U hiveuser -d ${METASTORE_DB} fi +if [ $UPGRADE_METASTORE_DB -eq 1 ]; then + echo "Upgrading the schema of metastore db ${METASTORE_DB}. Check \ +${IMPALA_CLUSTER_LOGS_DIR}/schematool.log for details." 
+ CLASSPATH={$CLASSPATH}:${CONFIG_DIR} ${HIVE_HOME}/bin/schematool -upgradeSchema \ +-dbType postgres 1>${IMPALA_CLUSTER_LOGS_DIR}/schematool.log 2>&1 +fi + if [ $CREATE_RANGER_POLICY_DB -eq 1 ]; then echo "Creating Ranger Policy Server DB" dropdb -U hiveuser "${RANGER_POLICY_DB}" 2> /dev/null || true diff --git a/buildall.sh b/buildall.sh index 158de01..dbe4030 100755 --- a/buildall.sh +++ b/buildall.sh @@ -58,6 +58,7 @@ TESTDATA_ACTION=0 TESTS_ACTION=1 FORMAT_CLUSTER=0 FORMAT_METASTORE=0 +UPGRADE_METASTORE_SCHEMA=0 FORMAT_RANGER_POLICY_DB=0 NEED_MINICLUSTER=0 START_IMPALA_CLUSTER=0 @@ -114,6 +115,9 @@ do -format_metastore) FORMAT_METASTORE=1 ;; +-upgrade_metastore_db) + UPGRADE_METASTORE_SCHEMA=1 + ;; -format_ranger_policy_db) FORMAT_RANGER_POLICY_DB=1 ;; @@ -201,6 +205,8 @@ do "[Default: False]" echo "[-format_cluster] : Format the minicluster [Default: False]" echo "[-format_metastore] : Format the metastore db [Default: False]" + echo "[-upgrade_metastore_db] : Upgrades the schema of metastore db"\ + "[Default: False]" echo "[-format_ranger_policy_db] : Format the Ranger policy db [Default: False]" echo "[-release_and_debug] : Build both release and debug binaries. Overrides "\ "other build types [Default: false]" @@ -269,7 +275,10 @@ Examples of common tasks: ./buildall.sh -testdata # Build, format mini-cluster and metastore, load all test data, run tests - ./buildall.sh -testdata
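The upgrade path described above is idempotent: schematool compares the schema version recorded in the metastore database against the latest version shipped with Hive and does nothing when they already match. A minimal sketch of that guard, with a hypothetical in-memory version store (the version strings and step mechanism are illustrative, not Hive's actual implementation):

```python
def upgrade_schema(db, latest_version="3.1.3000", upgrade_steps=None):
    """Idempotently upgrade a metastore-like schema dict.

    `db` records its current version under "schema_version"; the call is a
    no-op when the schema is already at `latest_version`.
    """
    current = db["schema_version"]
    if current == latest_version:
        return "already at latest version {}".format(current)
    for step in upgrade_steps or []:
        step(db)  # each step migrates one version increment
    db["schema_version"] = latest_version
    return "upgraded {} -> {}".format(current, latest_version)

db = {"schema_version": "2.3.0"}
print(upgrade_schema(db))  # performs the upgrade
print(upgrade_schema(db))  # second run is a no-op
```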
[impala] branch master updated (67b4764 -> f8c28f8)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from 67b4764 IMPALA-9752: aggregate profile stats on executor new 0cb4424 IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API new 6ca6e40 IMPALA-9847: reduce web UI serialized JSON size new f8c28f8 IMPALA-9843: Add support for metastore db schema upgrade The 3 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/service/impala-hs2-server.cc| 3 +- be/src/service/impala-http-handler.cc | 6 +- be/src/util/webserver.cc | 4 + bin/create-test-configuration.sh | 13 + buildall.sh| 20 +- common/thrift/CatalogService.thrift| 7 + .../impala/catalog/CatalogServiceCatalog.java | 110 +++- .../apache/impala/catalog/FileMetadataLoader.java | 15 +- .../java/org/apache/impala/catalog/HdfsTable.java | 122 +++-- .../impala/catalog/ParallelFileMetadataLoader.java | 101 +++- .../main/java/org/apache/impala/catalog/Table.java | 2 +- .../org/apache/impala/catalog/TableLoadingMgr.java | 2 +- .../impala/catalog/local/DirectMetaProvider.java | 7 +- .../apache/impala/catalog/local/LocalFsTable.java | 2 +- .../apache/impala/catalog/local/MetaProvider.java | 3 +- .../apache/impala/service/CatalogOpExecutor.java | 8 +- .../java/org/apache/impala/util/AcidUtils.java | 224 +++- .../catalog/CatalogObjectToFromThriftTest.java | 14 +- .../org/apache/impala/catalog/CatalogTest.java | 107 ++-- .../catalog/CatalogdTableInvalidatorTest.java | 2 +- .../impala/catalog/FileMetadataLoaderTest.java | 20 +- .../catalog/PartialCatalogInfoWriteIdTest.java | 587 + .../events/MetastoreEventsProcessorTest.java | 5 +- .../apache/impala/testutil/ImpalaJdbcClient.java | 6 + .../apache/impala/testutil/ImpaladTestCatalog.java | 2 +- 
.../java/org/apache/impala/util/AcidUtilsTest.java | 3 +- shaded-deps/pom.xml| 1 + 27 files changed, 1208 insertions(+), 188 deletions(-) create mode 100644 fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
[impala] branch master updated: IMPALA-9107 (part 2): Add script to use the m2 archive tarball
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new fb28285 IMPALA-9107 (part 2): Add script to use the m2 archive tarball fb28285 is described below commit fb282852ef52d72079a86c55a90982ffac567cc7 Author: Joe McDonnell AuthorDate: Thu Apr 2 17:28:45 2020 -0700 IMPALA-9107 (part 2): Add script to use the m2 archive tarball This adds a script to find an appropriate m2 archive tarball, download it, and use it to prepopulate the ~/.m2 directory. The script uses the JSON interface for Jenkins to search through the all-build-options-ub1604 builds on jenkins.impala.io to find one that: 1. Is building the "master" branch 2. Has the m2_archive.tar.gz Then, it downloads the m2 archive and uses it to populate ~/.m2. It does not overwrite or remove any files already in ~/.m2. The build scripts that call populate_m2_directory.py do not rely on the script succeeding. They will continue even if the script fails. This also modifies the build-all-flag-combinations.sh script to only build the m2 archive if the GENERATE_M2_ARCHIVE environment variable is true. GENERATE_M2_ARCHIVE=true will clear out the ~/.m2 directory to build an accurate m2 archive. Precommit jobs will use GENERATE_M2_ARCHIVE=false, which will allow them to use the m2 archive to speed up the build. 
Testing: - Ran gerrify-verify-dryrun - Tested locally Change-Id: I5065658d8c0514550927161855b0943fa7b3a402 Reviewed-on: http://gerrit.cloudera.org:8080/15735 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins --- bin/bootstrap_build.sh | 5 + bin/bootstrap_system.sh| 5 + bin/jenkins/build-all-flag-combinations.sh | 17 ++- bin/jenkins/populate_m2_directory.py | 172 + 4 files changed, 195 insertions(+), 4 deletions(-) diff --git a/bin/bootstrap_build.sh b/bin/bootstrap_build.sh index 1168bb0..a450ef7 100755 --- a/bin/bootstrap_build.sh +++ b/bin/bootstrap_build.sh @@ -54,4 +54,9 @@ if [ ! -d /usr/local/apache-maven-3.5.4 ]; then sudo ln -s /usr/local/apache-maven-3.5.4/bin/mvn /usr/local/bin fi +# Try to prepopulate the m2 directory to save time +if ! bin/jenkins/populate_m2_directory.py ; then + echo "Failed to prepopulate the m2 directory. Continuing..." +fi + ./buildall.sh -notests -so diff --git a/bin/bootstrap_system.sh b/bin/bootstrap_system.sh index a52083d..18cce2b 100755 --- a/bin/bootstrap_system.sh +++ b/bin/bootstrap_system.sh @@ -471,3 +471,8 @@ fi cd "$HADOOP_LZO_HOME" time -p ant package cd "$IMPALA_HOME" + +# Try to prepopulate the m2 directory to save time +if ! bin/jenkins/populate_m2_directory.py ; then + echo "Failed to prepopulate the m2 directory. Continuing..." +fi diff --git a/bin/jenkins/build-all-flag-combinations.sh b/bin/jenkins/build-all-flag-combinations.sh index a6a0d2c..9209e48 100755 --- a/bin/jenkins/build-all-flag-combinations.sh +++ b/bin/jenkins/build-all-flag-combinations.sh @@ -32,6 +32,8 @@ export IMPALA_MAVEN_OPTIONS="-U" . bin/impala-config.sh +: ${GENERATE_M2_ARCHIVE:=false} + # These are configurations for buildall. CONFIGS=( # Test gcc builds with and without -so: @@ -46,6 +48,13 @@ CONFIGS=( FAILED="" +if [[ "$GENERATE_M2_ARCHIVE" == true ]]; then + # The m2 archive relies on parsing the maven log to get a list of jars downloaded + # from particular repositories. 
To accurately produce the archive every time, we + # need to clear out the ~/.m2 directory before producing the archive. + rm -rf ~/.m2 +fi + TMP_DIR=$(mktemp -d) function onexit { echo "$0: Cleaning up temporary directory" @@ -53,8 +62,6 @@ function onexit { } trap onexit EXIT -mkdir -p ${TMP_DIR} - for CONFIG in "${CONFIGS[@]}"; do DESCRIPTION="Options $CONFIG" @@ -91,7 +98,9 @@ then exit 1 fi -# Make a tarball of the .m2 directory -bin/jenkins/archive_m2_directory.sh logs/mvn/mvn_accumulated.log logs/m2_archive.tar.gz +if [[ "$GENERATE_M2_ARCHIVE" == true ]]; then + # Make a tarball of the .m2 directory + bin/jenkins/archive_m2_directory.sh logs/mvn/mvn_accumulated.log logs/m2_archive.tar.gz +fi # Note: The exit callback handles cleanup of the temp directory. diff --git a/bin/jenkins/populate_m2_directory.py b/bin/jenkins/populate_m2_directory.py new file mode 100755 index 000..1570189 --- /dev/null +++ b/bin/jenkins/populate_m2_directory.py @@ -0,0 +1,172 @@ +#!/usr/bin/python +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for
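The selection logic populate_m2_directory.py performs — scan recent builds, keep only those on the master branch that archived m2_archive.tar.gz, and take the newest — reduces to a pure function over the build metadata the Jenkins JSON API returns. The field names below are illustrative, not the exact Jenkins schema:

```python
def pick_m2_build(builds):
    """Return the newest build dict that is on the master branch and has an
    m2_archive.tar.gz artifact, or None if no build qualifies."""
    candidates = [
        b for b in builds
        if b.get("branch") == "master"
        and any(a.endswith("m2_archive.tar.gz") for a in b.get("artifacts", []))
    ]
    # Jenkins build numbers increase monotonically, so "newest" == max number.
    return max(candidates, key=lambda b: b["number"], default=None)

builds = [
    {"number": 101, "branch": "master", "artifacts": ["logs/m2_archive.tar.gz"]},
    {"number": 102, "branch": "asf-site", "artifacts": ["logs/m2_archive.tar.gz"]},
    {"number": 103, "branch": "master", "artifacts": []},  # archive skipped
]
print(pick_m2_build(builds)["number"])  # -> 101
```

Returning None on no match mirrors the script's contract with its callers: bootstrap_build.sh and bootstrap_system.sh treat failure as non-fatal and continue.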
[impala] branch master updated: IMPALA-8860: Improve /log_level usability on WebUI
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new ad8f468 IMPALA-8860: Improve /log_level usability on WebUI ad8f468 is described below commit ad8f468871d3a893bf4c7b702025ec70765ce8e1 Author: Zoltan Garaguly AuthorDate: Tue May 12 15:20:08 2020 +0200 IMPALA-8860: Improve /log_level usability on WebUI Add glog level fetching logic and fetch glog level on every request which allows: - showing glog level on page load/reload - showing current glog level in "Log level" combo box - showing current glog level in text format Add log4j log levels fetching logic and fetch all java class log levels on every request: - log4j levels for all java classes previously set are shown on page as a list, fetching of individual class log levels not needed anymore Page layout standardization: - glog/log4j part has similar layout - using terms of frontend/backend logs Change-Id: I2fbf2ef21f4af297913a4e9b16a391768624da33 Reviewed-on: http://gerrit.cloudera.org:8080/15903 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/util/logging-support.cc | 80 +++--- common/thrift/Logging.thrift | 18 ++--- .../java/org/apache/impala/util/GlogAppender.java | 36 ++ tests/webserver/test_web_pages.py | 41 --- www/log_level.tmpl | 53 +++--- 5 files changed, 113 insertions(+), 115 deletions(-) diff --git a/be/src/util/logging-support.cc b/be/src/util/logging-support.cc index 9d0232d..d05d8dc 100644 --- a/be/src/util/logging-support.cc +++ b/be/src/util/logging-support.cc @@ -86,7 +86,7 @@ int FLAGS_v_original_value; static jclass log4j_logger_class_; // Jni method descriptors corresponding to getLogLevel() and setLogLevel() operations. 
-static jmethodID get_log_level_method; // GlogAppender.getLogLevel() +static jmethodID get_log_levels_method; // GlogAppender.getLogLevels() static jmethodID set_log_level_method; // GlogAppender.setLogLevel() static jmethodID reset_log_levels_method; // GlogAppender.resetLogLevels() @@ -98,27 +98,18 @@ void AddDocumentMember(const string& message, const char* member, document->AddMember(key, output, document->GetAllocator()); } -template -Webserver::UrlCallback MakeCallback(const F& fnc, bool display_log4j_handlers) { - return [fnc, display_log4j_handlers](const auto& req, auto* doc) { -// Display log4j log level handlers only when display_log4j_handlers is true. -if (display_log4j_handlers) AddDocumentMember("true", "include_log4j_handlers", doc); -(*fnc)(req, doc); - }; -} - void InitDynamicLoggingSupport() { JNIEnv* env = JniUtil::GetJNIEnv(); ABORT_IF_ERROR(JniUtil::GetGlobalClassRef(env, "org/apache/impala/util/GlogAppender", _logger_class_)); - JniMethodDescriptor get_log_level_method_desc = - {"getLogLevel", "([B)Ljava/lang/String;", _log_level_method}; + JniMethodDescriptor get_log_levels_method_desc = + {"getLogLevels", "()[B", _log_levels_method}; JniMethodDescriptor set_log_level_method_desc = {"setLogLevel", "([B)Ljava/lang/String;", _log_level_method}; JniMethodDescriptor reset_log_level_method_desc = {"resetLogLevels", "()V", _log_levels_method}; ABORT_IF_ERROR(JniUtil::LoadStaticJniMethod( - env, log4j_logger_class_, _log_level_method_desc)); + env, log4j_logger_class_, _log_levels_method_desc)); ABORT_IF_ERROR(JniUtil::LoadStaticJniMethod( env, log4j_logger_class_, _log_level_method_desc)); ABORT_IF_ERROR(JniUtil::LoadStaticJniMethod( @@ -132,38 +123,28 @@ void InitDynamicLoggingSupport() { [](const char* flagname, int value) { return value >= 0 && value <= 3; }); } -// Helper method to get the log level of given Java class. It is a JNI wrapper around -// GlogAppender.getLogLevel(). 
-Status GetJavaLogLevel(const TGetJavaLogLevelParams& params, string* result) { - return JniCall::static_method(log4j_logger_class_, get_log_level_method) - .with_thrift_arg(params).Call(result); -} - Status ResetJavaLogLevels() { return JniCall::static_method(log4j_logger_class_, reset_log_levels_method).Call(); } -// Callback handler for /get_java_loglevel. -void GetJavaLogLevelCallback(const Webserver::WebRequest& req, Document* document) { - const auto& args = req.parsed_args; - Webserver::ArgumentMap::const_iterator log_getclass = args.find("class"); - if (log_getclass == args.end() || log_getclass->second.empty()) { -
[impala] branch master updated: IMPALA-9318: Add admission control setting to cap MT_DOP

This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 9125de7 IMPALA-9318: Add admission control setting to cap MT_DOP 9125de7 is described below commit 9125de7ae3d2ba0eca59097fd9732a6fbb609107 Author: Joe McDonnell AuthorDate: Sat May 16 20:33:49 2020 -0700 IMPALA-9318: Add admission control setting to cap MT_DOP This introduces the max-mt-dop setting for admission control. If a statement runs with an MT_DOP setting that exceeds the max-mt-dop, then the MT_DOP setting is downgraded to the max-mt-dop value. If max-mt-dop is set to a negative value, no limit is applied. max-mt-dop is set via the llama-site.xml and can be set at the daemon level or at the resource pool level. When there is no max-mt-dop setting, it defaults to -1, so no limit is applied. The max-mt-dop is evaluated once prior to query planning. The MT_DOP settings for queries past planning are not reevaluated if the policy changes. If a statement is downgraded, its runtime profile contains a message explaining the downgrade: MT_DOP limited by admission control: Requested MT_DOP=9 reduced to MT_DOP=4.
Testing: - Added custom cluster test with various max-mt-dop settings - Ran core tests Change-Id: I3affb127a5dca517591323f2b1c880aa4b38badd Reviewed-on: http://gerrit.cloudera.org:8080/16020 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/service/client-request-state.cc | 7 be/src/service/impala-server.cc| 16 be/src/service/impala-server.h | 5 +++ common/thrift/ImpalaInternalService.thrift | 9 + .../org/apache/impala/util/RequestPoolService.java | 5 +++ fe/src/test/resources/fair-scheduler-maxmtdop.xml | 21 ++ fe/src/test/resources/llama-site-maxmtdop.xml | 30 ++ .../queries/QueryTest/max-mt-dop.test | 47 ++ tests/custom_cluster/test_mt_dop.py| 31 +- 9 files changed, 170 insertions(+), 1 deletion(-) diff --git a/be/src/service/client-request-state.cc b/be/src/service/client-request-state.cc index 5a54dbc..c919123 100644 --- a/be/src/service/client-request-state.cc +++ b/be/src/service/client-request-state.cc @@ -196,6 +196,13 @@ Status ClientRequestState::Exec() { DebugQueryOptions(query_ctx_.client_request.query_options)); summary_profile_->AddInfoString("Query Options (set by configuration and planner)", DebugQueryOptions(exec_request_->query_options)); + if (query_ctx_.__isset.overridden_mt_dop_value) { +DCHECK(query_ctx_.client_request.query_options.__isset.mt_dop); +summary_profile_->AddInfoString("MT_DOP limited by admission control", +Substitute("Requested MT_DOP=$0 reduced to MT_DOP=$1", +query_ctx_.overridden_mt_dop_value, +query_ctx_.client_request.query_options.mt_dop)); + } switch (exec_request_->stmt_type) { case TStmtType::QUERY: diff --git a/be/src/service/impala-server.cc b/be/src/service/impala-server.cc index 3251456..6d0a4dd 100644 --- a/be/src/service/impala-server.cc +++ b/be/src/service/impala-server.cc @@ -910,6 +910,9 @@ void ImpalaServer::AddPoolConfiguration(TQueryCtx* ctx, << " overlay_mask=" << overlay_mask.to_string(); OverlayQueryOptions(pool_options, overlay_mask, >client_request.query_options); + // 
Enforce the max mt_dop after the defaults and overlays have already been done. + EnforceMaxMtDop(ctx, config.max_mt_dop); + status = ValidateQueryOptions(_options); if (!status.ok()) { VLOG_QUERY << "Ignoring errors while validating default query options for pool=" @@ -917,6 +920,19 @@ void ImpalaServer::AddPoolConfiguration(TQueryCtx* ctx, } } +void ImpalaServer::EnforceMaxMtDop(TQueryCtx* query_ctx, int64_t max_mt_dop) { + TQueryOptions& query_options = query_ctx->client_request.query_options; + // The mt_dop is overridden if all three conditions are met: + // 1. There is a nonnegative max mt_dop setting + // 2. The mt_dop query option is set + // 3. The specified mt_dop is larger than the max mt_dop setting + if (max_mt_dop >= 0 && query_options.__isset.mt_dop && + max_mt_dop < query_options.mt_dop) { +query_ctx->__set_overridden_mt_dop_value(query_options.mt_dop); +query_options.__set_mt_dop(max_mt_dop); + } +} + Status ImpalaServer::Execute(TQueryCtx* query_ctx, shared_ptr session_state, QueryHandle* query_handle) { PrepareQueryContext(query_ctx); diff --git a/be/src/service/impala-server.h b/be/src/service/impala-server.h index cfe3fc8..91
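The three-condition override in EnforceMaxMtDop is easy to get subtly wrong: a negative max must mean "no limit", and an unset MT_DOP query option must stay unset rather than being defaulted. A standalone sketch of the same decision table (modeling the unset Thrift option as None):

```python
def enforce_max_mt_dop(mt_dop, max_mt_dop):
    """Clamp a query's MT_DOP to the pool's max-mt-dop setting.

    Returns (effective_mt_dop, overridden_value). overridden_value is the
    original request when a downgrade happened, else None. mt_dop=None
    models the query option being unset.
    """
    if max_mt_dop >= 0 and mt_dop is not None and mt_dop > max_mt_dop:
        return max_mt_dop, mt_dop  # downgraded; remembered for the profile
    return mt_dop, None

# Requested MT_DOP=9 with max-mt-dop=4 -> downgraded to 4, and the profile
# message from ClientRequestState::Exec() can be reconstructed:
effective, overridden = enforce_max_mt_dop(9, 4)
print("MT_DOP limited by admission control: "
      "Requested MT_DOP={} reduced to MT_DOP={}".format(overridden, effective))
```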
[impala] branch master updated: IMPALA-9673: Add external warehouse dir variable in E2E test
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new d45e3a5 IMPALA-9673: Add external warehouse dir variable in E2E test d45e3a5 is described below commit d45e3a50b003259e4ef102b47781a028eb19 Author: xiaomeng AuthorDate: Fri May 22 11:13:55 2020 -0700 IMPALA-9673: Add external warehouse dir variable in E2E test Updated CDP build to 7.2.1.0-57 to include new Hive features such as HIVE-22995. In the minicluster, hive.create.as.acid and hive.create.as.insert.only have their default values, which are false. So by default Hive creates external-type tables located in the external warehouse directory. Due to HIVE-22995, desc db returns the external warehouse directory. For the above reasons, we need to use the external warehouse dir in some tests. Also add a new test for "CREATE DATABASE ... LOCATION". Tested: Re-ran the failed tests in the minicluster. Ran exhaustive tests.
Change-Id: I57926babf4caebfd365e6be65a399f12ea68687f Reviewed-on: http://gerrit.cloudera.org:8080/15990 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- bin/impala-config.sh | 16 +- .../functional/functional_schema_template.sql | 23 -- .../queries/QueryTest/create-database.test | 36 -- .../queries/QueryTest/describe-db.test | 6 ++-- .../queries/QueryTest/describe-hive-db.test| 6 ++-- tests/common/environ.py| 1 + tests/common/impala_test_suite.py | 12 ++-- tests/query_test/test_compressed_formats.py| 12 +--- 8 files changed, 78 insertions(+), 34 deletions(-) diff --git a/bin/impala-config.sh b/bin/impala-config.sh index 481ea4e..c6387b7 100755 --- a/bin/impala-config.sh +++ b/bin/impala-config.sh @@ -172,16 +172,16 @@ export CDH_BUILD_NUMBER=1814051 export CDH_MAVEN_REPOSITORY=\ "https://${IMPALA_TOOLCHAIN_HOST}/build/cdh_components/${CDH_BUILD_NUMBER}/maven; -export CDP_BUILD_NUMBER=2523282 +export CDP_BUILD_NUMBER=3192304 export CDP_MAVEN_REPOSITORY=\ "https://${IMPALA_TOOLCHAIN_HOST}/build/cdp_components/${CDP_BUILD_NUMBER}/maven; -export CDP_HADOOP_VERSION=3.1.1.7.1.1.0-380 -export CDP_HBASE_VERSION=2.2.3.7.1.1.0-380 -export CDP_HIVE_VERSION=3.1.3000.7.1.1.0-380 -export CDP_KNOX_VERSION=1.3.0.7.1.1.0-380 -export CDP_OZONE_VERSION=0.4.0.7.1.1.0-380 -export CDP_RANGER_VERSION=2.0.0.7.1.1.0-380 -export CDP_TEZ_VERSION=0.9.1.7.1.1.0-380 +export CDP_HADOOP_VERSION=3.1.1.7.2.1.0-57 +export CDP_HBASE_VERSION=2.2.3.7.2.1.0-57 +export CDP_HIVE_VERSION=3.1.3000.7.2.1.0-57 +export CDP_KNOX_VERSION=1.3.0.7.2.1.0-57 +export CDP_OZONE_VERSION=0.6.0.7.2.1.0-57 +export CDP_RANGER_VERSION=2.0.0.7.2.1.0-57 +export CDP_TEZ_VERSION=0.9.1.7.2.1.0-57 export IMPALA_PARQUET_VERSION=1.10.99-cdh6.x-SNAPSHOT export IMPALA_AVRO_JAVA_VERSION=1.8.2-cdh6.x-SNAPSHOT diff --git a/testdata/datasets/functional/functional_schema_template.sql b/testdata/datasets/functional/functional_schema_template.sql index dc53371..c323666 100644 --- 
a/testdata/datasets/functional/functional_schema_template.sql +++ b/testdata/datasets/functional/functional_schema_template.sql @@ -2273,16 +2273,6 @@ TBLPROPERTIES('transactional'='true'); DATASET functional BASE_TABLE_NAME -materialized_view - HIVE_MAJOR_VERSION -3 - CREATE_HIVE -CREATE MATERIALIZED VIEW IF NOT EXISTS {db_name}{db_suffix}.{table_name} - AS SELECT * FROM {db_name}{db_suffix}.insert_only_transactional_table; - - DATASET -functional - BASE_TABLE_NAME insert_only_transactional_bucketed_table HIVE_MAJOR_VERSION 3 @@ -2323,6 +2313,19 @@ SELECT * from functional.{table_name}; DATASET functional BASE_TABLE_NAME +materialized_view + HIVE_MAJOR_VERSION +3 + CREATE_HIVE +-- The create materialized view command is moved down so that the database's +-- managed directory has been created. Otherwise the command would fail. This +-- is a bug in Hive. +CREATE MATERIALIZED VIEW IF NOT EXISTS {db_name}{db_suffix}.{table_name} + AS SELECT * FROM {db_name}{db_suffix}.insert_only_transactional_table; += + DATASET +functional + BASE_TABLE_NAME uncomp_src_alltypes CREATE_HIVE CREATE TABLE {db_name}{db_suffix}.{table_name} LIKE functional.alltypes STORED AS ORC; diff --git a/testdata/workloads/functional-query/queries/QueryTest/create-database.test b/testdata/workloads/functional-query/queries/QueryTest/create-database.test index 1b698b5..5cdaed3 100644 --- a/testdata/workloads/functional-query/queries/QueryTest/create-database.test +++ b/testdata/workloads/functional-query/queries/QueryTest/create-database.test @@ -16,7 +16,7 @@ STRING, STRING # for a newly created datab
[impala] branch master updated (6a1c448 -> 03f2b55)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from 6a1c448 IMPALA-9782: fix Kudu DML with mt_dop new c62a680 IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu new 03f2b55 Filter out "Checksum validation failed" messages during the maven build The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/CMakeLists.txt | 3 +- be/src/benchmarks/bloom-filter-benchmark.cc| 26 +- be/src/codegen/gen_ir_descriptions.py | 7 +- be/src/exec/filter-context.cc | 40 +- be/src/exec/kudu-scanner.cc| 134 --- be/src/runtime/raw-value-ir.cc | 77 +++- be/src/runtime/raw-value.h | 17 + be/src/runtime/raw-value.inline.h | 125 +++ be/src/runtime/runtime-filter-bank.cc | 6 +- be/src/runtime/runtime-filter-ir.cc| 4 +- be/src/runtime/runtime-filter.h| 1 + be/src/service/query-options-test.cc | 4 + be/src/service/query-options.cc| 8 + be/src/service/query-options.h | 6 +- be/src/util/bloom-filter-ir.cc | 13 +- be/src/util/bloom-filter-test.cc | 65 ++-- be/src/util/bloom-filter.cc| 248 - be/src/util/bloom-filter.h | 201 -- be/src/util/debug-util.cc | 1 + be/src/util/debug-util.h | 1 + bin/impala-config.sh | 6 +- bin/mvn-quiet.sh | 8 +- common/thrift/ImpalaInternalService.thrift | 4 + common/thrift/ImpalaService.thrift | 8 + common/thrift/PlanNodes.thrift | 7 + .../impala/planner/RuntimeFilterGenerator.java | 63 +++- .../org/apache/impala/planner/PlannerTest.java | 24 +- .../PlannerTest/bloom-filter-assignment.test | 408 + .../queries/PlannerTest/kudu-update.test | 20 +- .../queries/PlannerTest/kudu.test | 4 +- .../PlannerTest/runtime-filter-query-options.test | 117 ++ .../queries/PlannerTest/tpch-kudu.test | 381 ++- ...n_max_filters.test => all_runtime_filters.test} | 188 
++ .../QueryTest/diff_runtime_filter_types.test | 151 .../queries/QueryTest/runtime_filters.test | 5 + tests/query_test/test_runtime_filters.py | 33 +- tests/query_test/test_spilling.py | 6 +- 37 files changed, 1696 insertions(+), 724 deletions(-) create mode 100644 testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test copy testdata/workloads/functional-query/queries/QueryTest/{min_max_filters.test => all_runtime_filters.test} (67%) create mode 100644 testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
[impala] 02/02: Filter out "Checksum validation failed" messages during the maven build
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 03f2b559c31af7fc11165cf3b00876900e234663 Author: Joe McDonnell AuthorDate: Fri Apr 17 19:20:53 2020 -0700 Filter out "Checksum validation failed" messages during the maven build Some Impala dependencies come from repositories that don't have checksums available. During the build, this produces a large number of messages like: [WARNING] Checksum validation failed, no checksums available from the repository for ... or: [WARNING] Checksum validation failed, could not read expected checksum ... These messages are not very useful, and they make it harder to search the console output for failed tests. This filters them out of the maven output. Different versions of maven structure the messages differently, so this filters all the "Checksum validation failed" messages that happen at WARNING level. Testing: - Ran core tests, verified the messages are gone Change-Id: I19afbd157533e52ef3157730c7ec5159241749bc Reviewed-on: http://gerrit.cloudera.org:8080/15775 Tested-by: Impala Public Jenkins Reviewed-by: Anurag Mantripragada --- bin/mvn-quiet.sh | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/bin/mvn-quiet.sh b/bin/mvn-quiet.sh index f782ff4..c7c557e 100755 --- a/bin/mvn-quiet.sh +++ b/bin/mvn-quiet.sh @@ -34,10 +34,16 @@ EOF LOGGING_OPTIONS="-Dorg.slf4j.simpleLogger.showDateTime \ -Dorg.slf4j.simpleLogger.dateTimeFormat=HH:mm:ss" +# Filter out "Checksum validation failed" messages, as they are mostly harmless and +# make it harder to search for failed tests in the console output. Limit the filtering +# to WARNING messages. +CHECKSUM_VALIDATION_FAILED_REGEX="[WARNING].*Checksum validation failed" + # Always use maven's batch mode (-B), as it produces output that is easier to parse. if !
mvn -B $IMPALA_MAVEN_OPTIONS $LOGGING_OPTIONS "$@" | \ tee -a "$LOG_FILE" | \ - grep -E -e WARNING -e ERROR -e SUCCESS -e FAILURE -e Test -e "Found Banned"; then + grep -E -e WARNING -e ERROR -e SUCCESS -e FAILURE -e Test -e "Found Banned" | \ + grep -v -i "${CHECKSUM_VALIDATION_FAILED_REGEX}"; then echo "mvn $IMPALA_MAVEN_OPTIONS $@ exited with code $?" exit 1 fi
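The pipeline keeps only interesting lines (the first grep) and then drops the checksum noise (the second, case-insensitive grep -v). The same two-stage filter expressed with Python regexes, with made-up sample log lines:

```python
import re

# Lines worth surfacing from the maven output.
KEEP = re.compile(r"WARNING|ERROR|SUCCESS|FAILURE|Test|Found Banned")
# WARNING-level checksum messages, matched loosely since different maven
# versions word the rest of the line differently.
DROP = re.compile(r"\[WARNING\].*Checksum validation failed", re.IGNORECASE)

def filter_mvn_output(lines):
    return [line for line in lines if KEEP.search(line) and not DROP.search(line)]

log = [
    "[INFO] Downloading artifact...",
    "[WARNING] Checksum validation failed, no checksums available from the repository",
    "[ERROR] Test run failed: TestFoo",
    "[INFO] BUILD SUCCESS",
]
for line in filter_mvn_output(log):
    print(line)
```

One nuance the sketch sidesteps: in the shell version the brackets in "[WARNING]" are unescaped, so grep treats them as a character class — harmless here, since the literal match is a superset of what gets dropped.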
[impala] 02/02: IMPALA-9820: Pull Datasketches-5 HLL MurmurHash fix
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 3713d5db8dcac540ce0b5cb45974054ca87792db Author: Gabor Kaszab AuthorDate: Tue Jun 2 22:08:38 2020 +0200 IMPALA-9820: Pull Datasketches-5 HLL MurmurHash fix There is a bug in DataSketches HLL MurmurHash where long strings are over-read, resulting in a cardinality estimate that is more than 15% off from the correct cardinality number. A recent upstream fix in Apache DataSketches addresses this issue and this patch pulls it to Impala. https://issues.apache.org/jira/browse/DATASKETCHES-5 Testing: - I used ds_hll_sketch() and ds_hll_estimate() functions from IMPALA-9632 to trigger DataSketches HLL functionality. - Ran DataSketches HLL on lineitem.l_comment in TPCH25_parquet to reproduce the issue. The symptom was that the actual result was around 15% off from the correct cardinality result (~69M vs 79M). - After applying this fix re-running the query gives much closer results, usually under a 3% error range. Change-Id: I84d73fce1e7a197c1f8fb49404b58ed9bb0b843d Reviewed-on: http://gerrit.cloudera.org:8080/16026 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/thirdparty/datasketches/MurmurHash3.h | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/be/src/thirdparty/datasketches/MurmurHash3.h b/be/src/thirdparty/datasketches/MurmurHash3.h index 45a64c6..f68e989 100644 --- a/be/src/thirdparty/datasketches/MurmurHash3.h +++ b/be/src/thirdparty/datasketches/MurmurHash3.h @@ -104,14 +104,12 @@ FORCE_INLINE void MurmurHash3_x64_128(const void* key, int lenBytes, uint64_t se out.h2 = seed; // Number of full 128-bit blocks of 16 bytes. - // Possible exclusion fo a remainder of up to 15 bytes. + // Possible exclusion of a remainder of up to 15 bytes.
const int nblocks = lenBytes >> 4; // bytes / 16 - // Process the 128-bit blocks (the body) into teh hash + // Process the 128-bit blocks (the body) into the hash const uint64_t* blocks = (const uint64_t*)(data); for (int i = 0; i < nblocks; ++i) { // 16 bytes per block -//uint64_t k1 = getblock64(blocks, 0); -//uint64_t k2 = getblock64(blocks, 1); uint64_t k1 = getblock64(blocks,i*2+0); uint64_t k2 = getblock64(blocks,i*2+1); @@ -124,12 +122,9 @@ FORCE_INLINE void MurmurHash3_x64_128(const void* key, int lenBytes, uint64_t se out.h2 = ROTL64(out.h2,31); out.h2 += out.h1; out.h2 = out.h2*5+0x38495ab5; - -blocks += 2; } // tail - //const uint8_t * tail = (const uint8_t*)blocks; const uint8_t * tail = (const uint8_t*)(data + (nblocks << 4)); uint64_t k1 = 0; @@ -175,4 +170,4 @@ FORCE_INLINE void MurmurHash3_x64_128(const void* key, int lenBytes, uint64_t se //- -#endif // _MURMURHASH3_H_ \ No newline at end of file +#endif // _MURMURHASH3_H_
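The over-read is easiest to see in isolation. The sketch below is a hypothetical distillation of the loop structure only (the names and the trivial "checksum" are illustrative, not the real MurmurHash mixing): the pre-fix code both indexed the block array by `i*2` and advanced the pointer every iteration, so every block after the first was read from a doubled offset, past the end of long inputs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// BUGGY shape (as before DATASKETCHES-5): offsets computed from i are
// combined with a per-iteration pointer advance, so block i is actually
// read from byte offset i*16 + i*16 = i*32.
inline uint64_t SumBlocksBuggy(const uint8_t* data, int nblocks) {
  const uint8_t* blocks = data;
  uint64_t acc = 0;
  for (int i = 0; i < nblocks; ++i) {
    uint64_t k1, k2;
    memcpy(&k1, blocks + i * 16, sizeof(k1));      // index-based offset...
    memcpy(&k2, blocks + i * 16 + 8, sizeof(k2));
    acc += k1 + k2;
    blocks += 16;  // BUG: ...plus a pointer advance doubles the stride.
  }
  return acc;
}

// FIXED shape: the pointer advance is dropped; the i-based offset alone
// visits each 16-byte block exactly once.
inline uint64_t SumBlocksFixed(const uint8_t* data, int nblocks) {
  const uint8_t* blocks = data;
  uint64_t acc = 0;
  for (int i = 0; i < nblocks; ++i) {
    uint64_t k1, k2;
    memcpy(&k1, blocks + i * 16, sizeof(k1));
    memcpy(&k2, blocks + i * 16 + 8, sizeof(k2));
    acc += k1 + k2;
  }
  return acc;
}
```

For a one-block input the two versions agree, which is why the bug only surfaced on long strings.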
[impala] branch master updated (37b5599 -> 3713d5d)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from 37b5599 IMPALA-9809: Multi-aggregation query on particular dataset crashes impalad new 3c71586 IMPALA-9723: Raise error when when Hive Streaming side-file is found new 3713d5d IMPALA-9820: Pull Datasketches-5 HLL MurmurHash fix The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/thirdparty/datasketches/MurmurHash3.h | 11 +++-- .../java/org/apache/impala/util/AcidUtils.java | 9 +++- .../java/org/apache/impala/util/AcidUtilsTest.java | 27 ++ 3 files changed, 38 insertions(+), 9 deletions(-)
[impala] branch master updated: IMPALA-9702: Cleanup unique_database directories
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new bfdc5bf IMPALA-9702: Cleanup unique_database directories bfdc5bf is described below commit bfdc5bf6af2703127d4ff5611ed049f11b2cb004 Author: Joe McDonnell AuthorDate: Sun May 31 14:56:01 2020 -0700 IMPALA-9702: Cleanup unique_database directories If there are external tables in a database, drop database cascade won't remove the external table locations. If those locations are inside the database, then the database directory does not get removed. Some tests that use unique_database fail when running for the second time (or with a data snapshot) due to the preexisting files. This adds code to remove the database directory for unique_database. It also adds some debugging statements that list the files at the beginning of bin/run-all-tests.sh and again at the end. Testing: - Ran a core job and verified that the unique database directories are being removed - Ran TestMixedPartitions::test_incompatible_avro_partition_in_non_avro_table() multiple times and it passes when it previously failed. Change-Id: I0530c028e5e7c241dfc054f04c78e2a045c2d035 Reviewed-on: http://gerrit.cloudera.org:8080/16015 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- bin/run-all-tests.sh | 9 + tests/conftest.py| 23 --- 2 files changed, 25 insertions(+), 7 deletions(-) diff --git a/bin/run-all-tests.sh b/bin/run-all-tests.sh index 7e61068..77ce7c4 100755 --- a/bin/run-all-tests.sh +++ b/bin/run-all-tests.sh @@ -171,6 +171,9 @@ for i in $(seq 1 $NUM_TEST_ITERATIONS) do TEST_RET_CODE=0 + # Store a list of the files at the beginning of each iteration. + hdfs dfs -ls -R /test-warehouse > ${IMPALA_LOGS_DIR}/file-list-begin-${i}.log 2>&1 + start_impala_cluster if [[ "$BE_TEST" == true ]]; then @@ -276,6 +279,12 @@ do # succeed. 
# ${IMPALA_HOME}/tests/run-process-failure-tests.sh + # Store a list of the files at the end of each iteration. This can be compared + # to the file-list-begin*.log from the beginning of the iteration to see if files + # are not being cleaned up. This is most useful on the first iteration, when + # the list of files is from dataload. + hdfs dfs -ls -R /test-warehouse > ${IMPALA_LOGS_DIR}/file-list-end-${i}.log 2>&1 + # Finally, kill the spawned timeout process and its child sleep process. # There may not be a sleep process, so ignore failure. pkill -P $TIMEOUT_PID || true diff --git a/tests/conftest.py b/tests/conftest.py index b544c5e..f8c98f6 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -34,7 +34,7 @@ from tests.common.environ import build_flavor_timeout from common.test_result_verifier import QueryTestResult from tests.common.patterns import is_valid_impala_identifier from tests.comparison.db_connection import ImpalaConnection -from tests.util.filesystem_utils import FILESYSTEM, ISILON_WEBHDFS_PORT +from tests.util.filesystem_utils import FILESYSTEM, ISILON_WEBHDFS_PORT, WAREHOUSE LOG = logging.getLogger('test_configuration') LOG_FORMAT = "-- %(asctime)s %(levelname)-8s %(threadName)s: %(message)s" @@ -343,22 +343,31 @@ def unique_database(request, testid_checksum): ' test function name or any prefixes for long length or invalid ' 'characters.'.format(db_name)) + def cleanup_database(db_name, must_exist): +request.instance.execute_query_expect_success(request.instance.client, +'DROP DATABASE {0} `{1}` CASCADE'.format( +"" if must_exist else "IF EXISTS", db_name), +{'sync_ddl': sync_ddl}) +# The database directory may not be removed if there are external tables in the +# database when it is dropped. The external locations are not removed by cascade. +# These preexisting files/directories can cause errors when tests run repeatedly or +# use a data snapshot (see IMPALA-9702), so this forces cleanup of the database +# directory. 
+db_location = "{0}/{1}.db".format(WAREHOUSE, db_name).lstrip('/') +request.instance.filesystem_client.delete_file_dir(db_location, recursive=True) + def cleanup(): # Make sure we don't try to drop the current session database request.instance.execute_query_expect_success(request.instance.client, "use default") for db_name in db_names: - request.instance.execute_query_expect_success( - request.instance.client, 'DROP DATABASE `{0}` CASCADE'.format(db_name), - {'sync_ddl': sync_ddl}) + cleanup_database(db_name, True) LOG.info('Dropped datab
[impala] branch master updated: Reapply IMPALA-9781: Fix GCC 7 unaligned 128-bit loads / stores
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new b188340 Reapply IMPALA-9781: Fix GCC 7 unaligned 128-bit loads / stores b188340 is described below commit b1883405cd59988b92df279965ff2f733c0e Author: Joe McDonnell AuthorDate: Wed May 27 16:21:36 2020 -0700 Reapply IMPALA-9781: Fix GCC 7 unaligned 128-bit loads / stores When running a release binary built with GCC 7.5.0, it crashes with an unaligned memory error in multiple pieces of code. In these locations, we are doing stores to 128-bit values, but we cannot guarantee alignment. GCC 7 must be optimizing the code to use instructions that require a higher level of alignment than we can provide. This switches the code locations to use memcpy to avoid the unaligned stores (with local variables as necessary). Testing: - Ran exhaustive tests with a release binary built by GCC 7.5.0 - Ran UBSAN core tests - Add unaligned test case in decimal-test Change-Id: I7edd8beeb15e4fbb69126a9f97a1476a4b8f12a9 Reviewed-on: http://gerrit.cloudera.org:8080/16009 Tested-by: Impala Public Jenkins Reviewed-by: Tim Armstrong --- be/src/exprs/slot-ref.cc | 5 - be/src/runtime/decimal-test.cc | 4 be/src/runtime/decimal-value.h | 3 ++- be/src/util/dict-encoding.h| 5 +++-- 4 files changed, 13 insertions(+), 4 deletions(-) diff --git a/be/src/exprs/slot-ref.cc b/be/src/exprs/slot-ref.cc index 634a989..661c7ef 100644 --- a/be/src/exprs/slot-ref.cc +++ b/be/src/exprs/slot-ref.cc @@ -422,7 +422,10 @@ DecimalVal SlotRef::GetDecimalValInterpreted( case 8: return DecimalVal(*reinterpret_cast<int64_t*>(t->GetSlot(slot_offset_))); case 16: - return DecimalVal(*reinterpret_cast<__int128_t*>(t->GetSlot(slot_offset_))); + // Avoid an unaligned load by using memcpy + __int128_t val; + memcpy(&val, t->GetSlot(slot_offset_), sizeof(val)); + return DecimalVal(val); default: DCHECK(false);
return DecimalVal::null(); diff --git a/be/src/runtime/decimal-test.cc b/be/src/runtime/decimal-test.cc index 9a2f7a8..c5aed53 100644 --- a/be/src/runtime/decimal-test.cc +++ b/be/src/runtime/decimal-test.cc @@ -726,6 +726,10 @@ TEST(DecimalTest, UnalignedValues) { stringstream ss; RawValue::PrintValue(unaligned, ColumnType::CreateDecimalType(28, 2), 0, &ss); EXPECT_EQ("123.45", ss.str()); + // Regression test for IMPALA-9781: Verify that operator=() works + *unaligned = 0; + __int128_t val = unaligned->value(); + EXPECT_EQ(val, 0); free(unaligned_mem); } diff --git a/be/src/runtime/decimal-value.h b/be/src/runtime/decimal-value.h index 761d474..e329476 100644 --- a/be/src/runtime/decimal-value.h +++ b/be/src/runtime/decimal-value.h @@ -49,7 +49,8 @@ class DecimalValue { DecimalValue(const T& s) : value_(s) { } DecimalValue& operator=(const T& s) { -value_ = s; +// 'value_' may be unaligned. Use memcpy to avoid an unaligned store. +memcpy(&value_, &s, sizeof(T)); return *this; } diff --git a/be/src/util/dict-encoding.h b/be/src/util/dict-encoding.h index f440332..e6e01bc 100644 --- a/be/src/util/dict-encoding.h +++ b/be/src/util/dict-encoding.h @@ -346,10 +346,11 @@ class DictDecoder : public DictDecoderBase { virtual int num_entries() const { return dict_.size(); } virtual void GetValue(int index, void* buffer) { -T* val_ptr = reinterpret_cast<T*>(buffer); DCHECK_GE(index, 0); DCHECK_LT(index, dict_.size()); -*val_ptr = dict_[index]; +// Avoid an unaligned store by using memcpy +T val = dict_[index]; +memcpy(buffer, reinterpret_cast<uint8_t*>(&val), sizeof(T)); } /// Returns the next value. Returns false if the data is invalid.
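The idiom the patch applies can be shown in a self-contained sketch. The helper names below are illustrative (they are not Impala's API); `__int128` is a GCC/Clang extension matching the `__int128_t` used in slot-ref.cc. The point is that dereferencing a cast pointer of unknown alignment is undefined behavior, while routing the access through `memcpy` is safe for any alignment and is typically lowered by the compiler to the same single unaligned-capable instruction on x86-64.

```cpp
#include <cassert>
#include <cstring>

// Load a 128-bit value from an arbitrarily aligned address. Writing
// *reinterpret_cast<__int128*>(p) here would be UB when p is not
// 16-byte aligned, which is what crashed GCC 7 release builds.
inline __int128 LoadInt128(const void* p) {
  __int128 v;
  memcpy(&v, p, sizeof(v));  // safe for any alignment of p
  return v;
}

// Store a 128-bit value to an arbitrarily aligned address.
inline void StoreInt128(void* p, __int128 v) {
  memcpy(p, &v, sizeof(v));  // safe for any alignment of p
}
```

The "local variables as necessary" note in the commit message corresponds to first copying the value into a properly aligned local, then `memcpy`-ing it to the destination buffer.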
[impala] branch master updated: IMPALA-9749: ASAN builds should not run FE tests.
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 30f68db IMPALA-9749: ASAN builds should not run FE tests. 30f68db is described below commit 30f68dbe111c6c23394e805dfdf8ae63cedce57c Author: Anurag Mantripragada AuthorDate: Wed May 6 18:41:10 2020 -0700 IMPALA-9749: ASAN builds should not run FE tests. https://gerrit.cloudera.org/#/c/15778/ inadvertently changed the behaviour of ASAN builds to run FE tests. After this change, FE custom cluster tests run immediately after other FE tests when FE_TEST is true. Testing: Ran private parametrized job with ASAN. Change-Id: I26c469a20032bdc1f4f0bb3938d9f1c50163c99a Reviewed-on: http://gerrit.cloudera.org:8080/15921 Tested-by: Impala Public Jenkins Reviewed-by: Thomas Tauber-Marshall --- bin/run-all-tests.sh | 41 + 1 file changed, 25 insertions(+), 16 deletions(-) diff --git a/bin/run-all-tests.sh b/bin/run-all-tests.sh index 46e35ae..7e61068 100755 --- a/bin/run-all-tests.sh +++ b/bin/run-all-tests.sh @@ -158,14 +158,20 @@ LOG_DIR="${IMPALA_EE_TEST_LOGS_DIR}" # Enable core dumps ulimit -c unlimited || true +# Helper function to start Impala cluster. +start_impala_cluster() { + # TODO-MT: remove --unlock_mt_dop when it is no longer needed. + run-step "Starting Impala cluster" start-impala-cluster.log \ + "${IMPALA_HOME}/bin/start-impala-cluster.py" \ + --log_dir="${IMPALA_EE_TEST_LOGS_DIR}" \ + ${TEST_START_CLUSTER_ARGS} --impalad_args=--unlock_mt_dop=true +} + for i in $(seq 1 $NUM_TEST_ITERATIONS) do TEST_RET_CODE=0 - # TODO-MT: remove --unlock_mt_dop when it is no longer needed.
- run-step "Starting Impala cluster" start-impala-cluster.log \ - "${IMPALA_HOME}/bin/start-impala-cluster.py" --log_dir="${IMPALA_EE_TEST_LOGS_DIR}" \ - ${TEST_START_CLUSTER_ARGS} --impalad_args=--unlock_mt_dop=true + start_impala_cluster if [[ "$BE_TEST" == true ]]; then if [[ "$TARGET_FILESYSTEM" == "local" ]]; then @@ -200,12 +206,25 @@ do if [[ "$CODE_COVERAGE" == true ]]; then MVN_ARGS+="-DcodeCoverage" fi -# Don't run the FE custom cluster/service tests here since they restart Impala. We'll -# run them with the other custom cluster/service tests below. +# Run the FE tests first. We run the FE custom cluster tests below since they +# restart Impala. +MVN_ARGS_TEMP=$MVN_ARGS MVN_ARGS+=" -Dtest=!org.apache.impala.custom*.*Test" if ! "${IMPALA_HOME}/bin/mvn-quiet.sh" -fae test ${MVN_ARGS}; then TEST_RET_CODE=1 fi + +# Run the FE custom cluster tests only if not running against S3 +if [[ "${TARGET_FILESYSTEM}" != "s3" ]]; then + MVN_ARGS=$MVN_ARGS_TEMP + MVN_ARGS+=" -Dtest=org.apache.impala.custom*.*Test" + if ! "${IMPALA_HOME}/bin/mvn-quiet.sh" -fae test ${MVN_ARGS}; then +TEST_RET_CODE=1 + fi + # Restart the minicluster after running the FE custom cluster tests. + # TODO-MT: remove --unlock_mt_dop when it is no longer needed. + start_impala_cluster +fi popd fi @@ -250,16 +269,6 @@ do TEST_RET_CODE=1 fi export IMPALA_MAX_LOG_FILES="${IMPALA_MAX_LOG_FILES_SAVE}" - -# Run the FE custom cluster tests only if not running against S3. -if [[ "${TARGET_FILESYSTEM}" != "s3" ]]; then - pushd "${IMPALA_FE_DIR}" - MVN_ARGS=" -Dtest=org.apache.impala.custom*.*Test " - if ! "${IMPALA_HOME}/bin/mvn-quiet.sh" -fae test ${MVN_ARGS}; then -TEST_RET_CODE=1 - fi - popd -fi fi # Run the process failure tests.
[impala] branch master updated: IMPALA-9760: Add IMPALA_TOOLCHAIN_PACKAGES_HOME to prepare for GCC7
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 56ee90c IMPALA-9760: Add IMPALA_TOOLCHAIN_PACKAGES_HOME to prepare for GCC7 56ee90c is described below commit 56ee90c598dcc637f10647ffc3e03cc0a70b92ce Author: Joe McDonnell AuthorDate: Wed May 27 13:32:43 2020 -0700 IMPALA-9760: Add IMPALA_TOOLCHAIN_PACKAGES_HOME to prepare for GCC7 The locations for native-toolchain packages in IMPALA_TOOLCHAIN currently do not include the compiler version. This means that the toolchain can't distinguish between native-toolchain packages built with gcc 4.9.2 versus gcc 7.5.0. The collisions can cause issues when switching back and forth between branches. This introduces the IMPALA_TOOLCHAIN_PACKAGES_HOME environment variable, which is a location inside IMPALA_TOOLCHAIN that would hold native-toolchain packages. Currently, it is set to the same as IMPALA_TOOLCHAIN, so there is no difference in behavior. This lays the groundwork to add the compiler version to this path when switching to GCC7. Testing: - The only impediment to building with IMPALA_TOOLCHAIN_PACKAGES_HOME=$IMPALA_TOOLCHAIN/test is Impala-lzo. With a custom Impala-lzo, compilation succeeds. Either Impala-lzo will be fixed or it will be removed. 
- Core tests Change-Id: I1ff641e503b2161baf415355452f86b6c8bfb15b Reviewed-on: http://gerrit.cloudera.org:8080/15991 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins --- CMakeLists.txt | 17 +++--- be/CMakeLists.txt | 2 +- be/src/service/CMakeLists.txt | 4 +-- bin/bootstrap_toolchain.py | 58 +++--- bin/distcc/distcc.sh | 5 +++ bin/dump_breakpad_symbols.py | 12 +++ bin/impala-config.sh | 21 bin/impala-shell.sh| 2 +- bin/jenkins/finalize.sh| 2 +- bin/run-backend-tests.sh | 2 +- bin/run-binary.sh | 2 +- bin/run-jvm-binary.sh | 2 +- bin/run_clang_tidy.sh | 4 +-- bin/set-ld-library-path.sh | 3 +- bin/set-pythonpath.sh | 2 +- cmake_modules/clang_toolchain.cmake| 10 +++--- cmake_modules/toolchain.cmake | 5 +-- docker/setup_build_context.py | 7 ++-- fe/pom.xml | 2 +- infra/python/bootstrap_virtualenv.py | 13 shell/make_shell_tarball.sh| 2 +- shell/packaging/make_python_package.sh | 2 +- testdata/datasets/tpcds/preload| 2 +- testdata/datasets/tpch/preload | 2 +- 24 files changed, 105 insertions(+), 78 deletions(-) diff --git a/CMakeLists.txt b/CMakeLists.txt index 484f741..5719249 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -83,12 +83,13 @@ function(set_dep_root NAME) string(TOLOWER ${NAME} NAME_LOWER) string(REPLACE "_" "-" NAME_LOWER ${NAME_LOWER}) set(VAL_NAME "IMPALA_${NAME}_VERSION") - set(${NAME}_ROOT $ENV{IMPALA_TOOLCHAIN}/${NAME_LOWER}-$ENV{${VAL_NAME}} PARENT_SCOPE) + set(${NAME}_ROOT $ENV{IMPALA_TOOLCHAIN_PACKAGES_HOME}/${NAME_LOWER}-$ENV{${VAL_NAME}} + PARENT_SCOPE) endfunction() # Define root path for all dependencies, this is in the form of # set_dep_root(PACKAGE) -> -# PACKAGE_ROOT set to $ENV{IMPALA_TOOLCHAIN}/PACKAGE-$ENV{IMPALA_PACKAGE_VERSION} +# PACKAGE_ROOT set to $ENV{IMPALA_TOOLCHAIN_PACKAGES_HOME}/PACKAGE-$ENV{IMPALA_PACKAGE_VERSION} set_dep_root(AVRO) set_dep_root(ORC) set_dep_root(BOOST) @@ -104,7 +105,8 @@ set_dep_root(GTEST) set_dep_root(LIBEV) set_dep_root(LIBUNWIND) set_dep_root(LLVM) -set(LLVM_DEBUG_ROOT 
$ENV{IMPALA_TOOLCHAIN}/llvm-$ENV{IMPALA_LLVM_DEBUG_VERSION}) +set(LLVM_DEBUG_ROOT +$ENV{IMPALA_TOOLCHAIN_PACKAGES_HOME}/llvm-$ENV{IMPALA_LLVM_DEBUG_VERSION}) set_dep_root(LZ4) set_dep_root(ZSTD) set_dep_root(OPENLDAP) @@ -113,7 +115,8 @@ set_dep_root(RE2) set_dep_root(RAPIDJSON) set_dep_root(SNAPPY) set_dep_root(THRIFT) -set(THRIFT11_ROOT $ENV{IMPALA_TOOLCHAIN}/thrift-$ENV{IMPALA_THRIFT11_VERSION}) +set(THRIFT11_ROOT +$ENV{IMPALA_TOOLCHAIN_PACKAGES_HOME}/thrift-$ENV{IMPALA_THRIFT11_VERSION}) set_dep_root(ZLIB) set_dep_root(CCTZ) @@ -435,10 +438,14 @@ add_custom_target(cscope ALL DEPENDS gen-deps COMMAND "${CMAKE_SOURCE_DIR}/bin/gen-cscope.sh" ) +# This call is passing IMPALA_TOOLCHAIN_PACKAGES_HOME into Impala-lzo's build.sh, +# but this is known not to work with the current version of Impala-lzo when +# IMPALA_TOOLCHAIN_PACKAGES_HOME is a subdirectory of IMPALA_TOOLCHAIN. Either +# Impala-lzo will need to be fixed or it will need
[impala] branch master updated: IMPALA-9762: Fix GCC7 shift-count-overflow in tuple-row-compare.cc
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new f474b03 IMPALA-9762: Fix GCC7 shift-count-overflow in tuple-row-compare.cc f474b03 is described below commit f474b03dce35956d0762159eed16516b793903eb Author: Joe McDonnell AuthorDate: Tue May 19 18:36:48 2020 -0700 IMPALA-9762: Fix GCC7 shift-count-overflow in tuple-row-compare.cc This fixes a GCC 7 compilation error for this code in TupleRowZOrderComparator's GetSharedIntRepresentation() and GetSharedFloatRepresentation(): return (static_cast<U>(val) << std::max((sizeof(U) - sizeof(T)) * 8, (uint64_t)0)) ^ mask; In this case, the std::max is running with uint64_t arguments. For template instantiations with sizeof(T) > sizeof(U), this results in integer overflow and a very large positive integer, causing the shift-count-overflow. These instantiations are not used by Impala, but the compiler still needs to generate them. This changes the logic to use signed integers for the std::max, avoiding the shift-count-overflow.
Testing: - Build on GCC 4.9.2 and GCC 7 - Core tests Change-Id: I518e8bed1bb8d49d9cb76a33b07b665e15dfef87 Reviewed-on: http://gerrit.cloudera.org:8080/15962 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/util/tuple-row-compare.cc | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/be/src/util/tuple-row-compare.cc b/be/src/util/tuple-row-compare.cc index d960bbe..099fa46 100644 --- a/be/src/util/tuple-row-compare.cc +++ b/be/src/util/tuple-row-compare.cc @@ -439,8 +439,9 @@ U TupleRowZOrderComparator::GetSharedRepresentation(void* val, ColumnType type) template <typename T, typename U> U inline TupleRowZOrderComparator::GetSharedIntRepresentation(const T val, U mask) const { - return (static_cast<U>(val) << - std::max((sizeof(U) - sizeof(T)) * 8, (uint64_t)0)) ^ mask; + uint64_t shift_size = static_cast<uint64_t>( + std::max(static_cast<int64_t>((sizeof(U) - sizeof(T)) * 8), (int64_t) 0)); + return (static_cast<U>(val) << shift_size) ^ mask; } template <typename T, typename U> @@ -449,13 +450,14 @@ U inline TupleRowZOrderComparator::GetSharedFloatRepresentation(void* val, U mas T floating_value = *reinterpret_cast<T*>(val); memcpy(&tmp, &floating_value, sizeof(T)); if (UNLIKELY(std::isnan(floating_value))) return 0; + uint64_t shift_size = static_cast<uint64_t>( + std::max(static_cast<int64_t>((sizeof(U) - sizeof(T)) * 8), (int64_t) 0)); if (floating_value < 0.0) { // Flipping all bits for negative values. -return static_cast<U>(~tmp) << std::max((sizeof(U) - sizeof(T)) * 8, (uint64_t)0); +return static_cast<U>(~tmp) << shift_size; } else { // Flipping only first bit. -return (static_cast<U>(tmp) << std::max((sizeof(U) - sizeof(T)) * 8, (uint64_t)0)) ^ -mask; +return (static_cast<U>(tmp) << shift_size) ^ mask; } }
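The clamping trick in the patch can be isolated into a standalone sketch (the helper name is illustrative). `sizeof()` arithmetic is done in the unsigned `size_t`, so when `sizeof(T) > sizeof(U)` the subtraction wraps around to a huge value and `std::max` against `(uint64_t)0` never clamps it, producing a shift count far larger than the type width. Converting to a signed 64-bit type first turns the wrapped value back into a negative number, which the `max` then clamps to zero:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Shift needed to left-align a T-sized value inside a U-sized word,
// clamped to 0 for instantiations where T is wider than U (which are
// never executed, but must still compile without shift-count-overflow).
template <typename T, typename U>
uint64_t ShiftSize() {
  return static_cast<uint64_t>(
      std::max(static_cast<int64_t>((sizeof(U) - sizeof(T)) * 8), int64_t{0}));
}
```

With the original all-unsigned `std::max`, the `<int64_t, int32_t>` instantiation below would have produced a shift count near 2^64 instead of 0.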
[impala] branch master updated: Revert "IMPALA-9781: Fix GCC 7 unaligned 128-bit loads / stores"
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 7a2e80c Revert "IMPALA-9781: Fix GCC 7 unaligned 128-bit loads / stores" 7a2e80c is described below commit 7a2e80cf602b8c13d935cfc06a2a55a3c48f8d0b Author: Joe McDonnell AuthorDate: Fri May 29 12:32:01 2020 -0700 Revert "IMPALA-9781: Fix GCC 7 unaligned 128-bit loads / stores" The change in decimal-util.h introduced undefined behavior. See IMPALA-9800. This reverts commit 227da84c3757eb857008e7b82aad622ed959eb84. Change-Id: Id2b2e43c478a220ff545fdbca712e47905c8d22b Reviewed-on: http://gerrit.cloudera.org:8080/16006 Reviewed-by: Joe McDonnell Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- be/src/exprs/slot-ref.cc| 5 + be/src/util/decimal-util.h | 3 +-- be/src/util/dict-encoding.h | 5 ++--- 3 files changed, 4 insertions(+), 9 deletions(-) diff --git a/be/src/exprs/slot-ref.cc b/be/src/exprs/slot-ref.cc index 661c7ef..634a989 100644 --- a/be/src/exprs/slot-ref.cc +++ b/be/src/exprs/slot-ref.cc @@ -422,10 +422,7 @@ DecimalVal SlotRef::GetDecimalValInterpreted( case 8: return DecimalVal(*reinterpret_cast<int64_t*>(t->GetSlot(slot_offset_))); case 16: - // Avoid an unaligned load by using memcpy - __int128_t val; - memcpy(&val, t->GetSlot(slot_offset_), sizeof(val)); - return DecimalVal(val); + return DecimalVal(*reinterpret_cast<__int128_t*>(t->GetSlot(slot_offset_))); default: DCHECK(false); return DecimalVal::null(); diff --git a/be/src/util/decimal-util.h b/be/src/util/decimal-util.h index 4ddfe23..f505ecc 100644 --- a/be/src/util/decimal-util.h +++ b/be/src/util/decimal-util.h @@ -128,8 +128,7 @@ class DecimalUtil { const uint8_t* buffer, int fixed_len_size, T* v) { DCHECK_GT(fixed_len_size, 0); DCHECK_LE(fixed_len_size, sizeof(T)); -// Avoid an unaligned store by using memset -memset(v, 0, sizeof(T)); +*v = 0; // We need to sign
extend val. For example, if the original value was // -1, the original bytes were -1,-1,-1,-1. If we only wrote out 1 byte, after // the encode step above, val would contain (-1, 0, 0, 0). We need to sign diff --git a/be/src/util/dict-encoding.h b/be/src/util/dict-encoding.h index e6e01bc..f440332 100644 --- a/be/src/util/dict-encoding.h +++ b/be/src/util/dict-encoding.h @@ -346,11 +346,10 @@ class DictDecoder : public DictDecoderBase { virtual int num_entries() const { return dict_.size(); } virtual void GetValue(int index, void* buffer) { +T* val_ptr = reinterpret_cast<T*>(buffer); DCHECK_GE(index, 0); DCHECK_LT(index, dict_.size()); -// Avoid an unaligned store by using memcpy -T val = dict_[index]; -memcpy(buffer, reinterpret_cast<uint8_t*>(&val), sizeof(T)); +*val_ptr = dict_[index]; } /// Returns the next value. Returns false if the data is invalid.
[impala] 03/04: IMPALA-9415: Switch result set size calculations from capacity() to size()
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 4cc1b4ad04cd5770a41961269d69a65cdfac1dcf Author: Joe McDonnell AuthorDate: Wed May 27 15:05:04 2020 -0700 IMPALA-9415: Switch result set size calculations from capacity() to size() The behavior of string's capacity() is implementation specific. In GCC 7.5.0, the implementation has different behavior compared to GCC 4.9.2. This is causing a DCHECK to fire in ClientRequestState::FetchRowsInternal(): // Confirm that this was not an underestimate of the memory required. DCHECK_GE(before + delta_bytes, after) What happens on GCC 7.5.0 is that the capacity of the string before the copy is 29, but after the copy to the result set, the capacity is 30. The size remains unchanged. This switches the code to use size(), which is guaranteed to be consistent across copies. This loses some accuracy, because there is some string object overhead and excess capacity that no longer counts. However, this is not code that requires perfect accuracy. 
Testing: - Ran core tests with GCC 4.9.2 and GCC 7.5.0 Change-Id: I3f9ab260927e14d8951b7c7661f2b5b18a1da39a Reviewed-on: http://gerrit.cloudera.org:8080/15992 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/service/query-result-set.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/be/src/service/query-result-set.cc b/be/src/service/query-result-set.cc index 5445ec7..f2d5b8e 100644 --- a/be/src/service/query-result-set.cc +++ b/be/src/service/query-result-set.cc @@ -226,7 +226,7 @@ int64_t AsciiQueryResultSet::ByteSize(int start_idx, int num_rows) { int64_t bytes = 0; const int end = min(static_cast(num_rows), result_set_->size() - start_idx); for (int i = start_idx; i < start_idx + end; ++i) { -bytes += sizeof(result_set_[i]) + result_set_[i].capacity(); +bytes += sizeof(result_set_[i]) + result_set_[i].size(); } return bytes; } @@ -237,7 +237,7 @@ namespace { // Utility functions for computing the size of HS2 Thrift structs in bytes. inline int64_t ByteSize(const ThriftTColumnValue& val) { - return sizeof(val) + val.stringVal.value.capacity(); + return sizeof(val) + val.stringVal.value.size(); } int64_t ByteSize(const TRow& row) {
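The distinction the patch relies on can be demonstrated directly (the helper name below is illustrative, not Impala's API): `size()` is part of a string's value and is guaranteed to be preserved by copies, while `capacity()` is implementation-defined and may legally differ between a string and its copy — which is exactly what made the before/after DCHECK fire under GCC 7. An estimate based on `size()` is therefore stable across copies, at the acknowledged cost of ignoring allocator overhead and slack capacity.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Copy-stable memory estimate for a result string: object header plus the
// number of characters actually stored. Deliberately ignores capacity(),
// whose value may change when the string is copied into the result set.
inline int64_t ApproxByteSize(const std::string& s) {
  return static_cast<int64_t>(sizeof(s) + s.size());
}
```

A `capacity()`-based version of this function could return different values for `s` and for a copy of `s`, violating the "estimate before == usage after" invariant the DCHECK asserts.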
[impala] 01/04: IMPALA-9692 (part 2): Refactor parts of TExecPlanFragmentInfo to protobuf
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit b67c0906f596ca336d0ea0e8cbc618a20ac0e563 Author: Thomas Tauber-Marshall AuthorDate: Tue Mar 24 14:04:53 2020 -0700 IMPALA-9692 (part 2): Refactor parts of TExecPlanFragmentInfo to protobuf The new admission control service will be written in protobuf, so there are various admission control related structures currently stored in Thrift that it would be convenient to convert to protobuf, to minimize the amount of converting back and forth that needs to be done. This patch converts some portions of TExecPlanFragmentInfo to protobuf. TExecPlanFragmentInfo is sent as a sidecar with the Exec() RPC, so the refactored parts are now just directly included in the ExecQueryFInstancesRequestPB. The portions that are converted are those that are part of the QuerySchedule, in particular the TPlanFragmentDestination, TScanRangeParams, and TJoinBuildInput. This patch is just a refactor and doesn't contain any functional changes. One notable related change is that DataSink::CreateSink() has two parameters removed - TPlanFragmentCtx (which no longer exists) and TPlanFragmentInstanceCtx. These variables and the new PB equivalents are available via the RuntimeState that was already being passed in as another parameter and don't need to be individually passed in. Testing: - Passed a full run of existing tests. - Ran the single node perf test and didn't detect any regressions.
Change-Id: I3a8e46767b257bbf677171ac2f4efb1b623ba41b Reviewed-on: http://gerrit.cloudera.org:8080/15844 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/benchmarks/expr-benchmark.cc| 7 +- be/src/benchmarks/hash-benchmark.cc| 5 +- be/src/codegen/llvm-codegen-test.cc| 6 +- be/src/exec/blocking-join-node.cc | 16 +++-- be/src/exec/data-sink.h| 3 +- be/src/exec/hbase-scan-node.cc | 16 ++--- be/src/exec/hbase-table-sink.cc| 5 +- be/src/exec/hbase-table-sink.h | 4 +- be/src/exec/hdfs-scan-node-base.cc | 66 +- be/src/exec/hdfs-scan-node-base.h | 2 +- be/src/exec/hdfs-table-sink.cc | 5 +- be/src/exec/hdfs-table-sink.h | 4 +- be/src/exec/kudu-scan-node-base.cc | 9 +-- be/src/exec/kudu-table-sink.cc | 5 +- be/src/exec/kudu-table-sink.h | 4 +- be/src/exec/nested-loop-join-builder.cc| 6 +- be/src/exec/nested-loop-join-builder.h | 4 +- be/src/exec/partitioned-hash-join-builder.cc | 5 +- be/src/exec/partitioned-hash-join-builder.h| 4 +- be/src/exec/plan-root-sink.cc | 7 +- be/src/exec/plan-root-sink.h | 4 +- be/src/exec/scan-node.h| 7 +- be/src/exprs/expr-codegen-test.cc | 6 +- be/src/rpc/CMakeLists.txt | 2 + be/src/runtime/coordinator-backend-state.cc| 35 ++ be/src/runtime/data-stream-test.cc | 40 +-- be/src/runtime/fragment-instance-state.cc | 34 + be/src/runtime/fragment-instance-state.h | 13 ++-- be/src/runtime/fragment-state.cc | 52 -- be/src/runtime/fragment-state.h| 30 +--- be/src/runtime/krpc-data-stream-sender.cc | 60 be/src/runtime/krpc-data-stream-sender.h | 8 +-- be/src/runtime/query-state.cc | 37 ++ be/src/runtime/row-batch.cc| 10 +-- be/src/runtime/runtime-state.cc| 24 --- be/src/runtime/runtime-state.h | 18 +++-- be/src/runtime/test-env.cc | 16 +++-- be/src/scheduling/query-schedule.h | 7 +- be/src/scheduling/scheduler-test-util.cc | 20 +++--- be/src/scheduling/scheduler-test-util.h| 3 +- be/src/scheduling/scheduler-test.cc| 34 - be/src/scheduling/scheduler.cc | 80 +++--- be/src/scheduling/scheduler.h | 6 +- 
be/src/service/fe-support.cc | 5 +- be/src/util/CMakeLists.txt | 1 + be/src/util/compression-util.cc| 64 + .../src/util/compression-util.h| 28 +++- be/src/util/container-util.h | 8 +++ be/src/util/uid-util.h | 6 ++ common/protobuf
[impala] branch master updated (a148517 -> 227da84)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from a148517 IMPALA-9787: fix spinning thread with memory-based table invalidation new b67c090 IMPALA-9692 (part 2): Refactor parts of TExecPlanFragmentInfo to protobuf new f39ddb1 IMPALA-9761: Fix GCC7 ambiguous else warning for gtest macros new 4cc1b4a IMPALA-9415: Switch result set size calculations from capacity() to size() new 227da84 IMPALA-9781: Fix GCC 7 unaligned 128-bit loads / stores The 4 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/benchmarks/expr-benchmark.cc | 7 +- be/src/benchmarks/hash-benchmark.cc | 5 +- be/src/codegen/llvm-codegen-test.cc | 6 +- be/src/exec/blocking-join-node.cc | 16 +++-- be/src/exec/data-sink.h | 3 +- be/src/exec/hbase-scan-node.cc| 16 ++--- be/src/exec/hbase-table-sink.cc | 5 +- be/src/exec/hbase-table-sink.h| 4 +- be/src/exec/hdfs-scan-node-base.cc| 66 ++- be/src/exec/hdfs-scan-node-base.h | 2 +- be/src/exec/hdfs-table-sink.cc| 5 +- be/src/exec/hdfs-table-sink.h | 4 +- be/src/exec/kudu-scan-node-base.cc| 9 +-- be/src/exec/kudu-table-sink.cc| 5 +- be/src/exec/kudu-table-sink.h | 4 +- be/src/exec/nested-loop-join-builder.cc | 6 +- be/src/exec/nested-loop-join-builder.h| 4 +- be/src/exec/parquet/parquet-common-test.cc| 8 ++- be/src/exec/partitioned-hash-join-builder.cc | 5 +- be/src/exec/partitioned-hash-join-builder.h | 4 +- be/src/exec/plan-root-sink.cc | 7 +- be/src/exec/plan-root-sink.h | 4 +- be/src/exec/scan-node.h | 7 +- be/src/exprs/expr-codegen-test.cc | 6 +- be/src/exprs/slot-ref.cc | 5 +- be/src/rpc/CMakeLists.txt | 2 + be/src/runtime/buffered-tuple-stream-test.cc | 4 +- be/src/runtime/bufferpool/buffer-pool-test.cc | 8 ++- 
be/src/runtime/coordinator-backend-state.cc | 35 ++ be/src/runtime/data-stream-test.cc| 40 ++-- be/src/runtime/fragment-instance-state.cc | 34 +- be/src/runtime/fragment-instance-state.h | 13 ++-- be/src/runtime/fragment-state.cc | 52 --- be/src/runtime/fragment-state.h | 30 ++--- be/src/runtime/io/disk-io-mgr-test.cc | 4 +- be/src/runtime/krpc-data-stream-sender.cc | 60 - be/src/runtime/krpc-data-stream-sender.h | 8 +-- be/src/runtime/query-state.cc | 37 +++ be/src/runtime/row-batch.cc | 10 +-- be/src/runtime/runtime-state.cc | 24 --- be/src/runtime/runtime-state.h| 18 +++-- be/src/runtime/test-env.cc| 16 +++-- be/src/runtime/timestamp-test.cc | 12 ++-- be/src/scheduling/query-schedule.h| 7 +- be/src/scheduling/scheduler-test-util.cc | 20 +++--- be/src/scheduling/scheduler-test-util.h | 3 +- be/src/scheduling/scheduler-test.cc | 34 +- be/src/scheduling/scheduler.cc| 80 --- be/src/scheduling/scheduler.h | 6 +- be/src/service/fe-support.cc | 5 +- be/src/service/query-result-set.cc| 4 +- be/src/testutil/cpu-util.h| 4 +- be/src/util/CMakeLists.txt| 1 + be/src/util/compression-util.cc | 64 ++ be/src/util/{flat_buffer.h => compression-util.h} | 14 ++-- be/src/util/container-util.h | 8 +++ be/src/util/decimal-util.h| 3 +- be/src/util/dict-encoding.h | 5 +- be/src/util/uid-util.h| 6 ++ common/protobuf/CMakeLists.txt| 2 +- common/protobuf/common.proto | 21 -- common/protobuf/control_service.proto | 73 + common/protobuf/planner.proto | 76 + common/protobuf/row_batch.proto | 2 +- common/thrift/ImpalaInternalService.thrift| 58 ++-- common/thrift/PlanNodes.thrift| 8 ++-
[impala] 02/04: IMPALA-9761: Fix GCC7 ambiguous else warning for gtest macros
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit f39ddb1443eaa588ec85254d0a4aefe95f105b7a Author: Joe McDonnell AuthorDate: Tue May 19 19:39:43 2020 -0700 IMPALA-9761: Fix GCC7 ambiguous else warning for gtest macros On GCC7, a dangling-else warning is firing for code like: if (cond1) ASSERT_TRUE(cond2) This is true for several ASSERT_* and EXPECT_* gtest macros. gtest had some code to avoid warnings for code of this form, but that code is no longer effective. gtest now disables the dangling-else warning. Since this is just a matter of adding braces, this adds braces for all those locations. For consistency, this may include locations that were not failing. I found locations by doing: git grep EXPECT_ | grep if git grep ASSERT_ | grep if and manually looking through the output. Testing: - Builds successfully Change-Id: Ieb664afe83736a71508302575e8e66a1b506c985 Reviewed-on: http://gerrit.cloudera.org:8080/15964 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/exec/parquet/parquet-common-test.cc| 8 ++-- be/src/runtime/buffered-tuple-stream-test.cc | 4 +++- be/src/runtime/bufferpool/buffer-pool-test.cc | 8 ++-- be/src/runtime/io/disk-io-mgr-test.cc | 4 +++- be/src/runtime/timestamp-test.cc | 12 be/src/testutil/cpu-util.h| 4 +++- 6 files changed, 29 insertions(+), 11 deletions(-) diff --git a/be/src/exec/parquet/parquet-common-test.cc b/be/src/exec/parquet/parquet-common-test.cc index b67603a..0dc519f 100644 --- a/be/src/exec/parquet/parquet-common-test.cc +++ b/be/src/exec/parquet/parquet-common-test.cc @@ -30,7 +30,9 @@ void ValidateRanges(RangeVec skip_ranges, int num_rows, const RangeVec& expected RangeVec result; bool success = ComputeCandidateRanges(num_rows, &skip_ranges, &result); EXPECT_EQ(should_succeed, success); - if (success) EXPECT_EQ(expected, result); + if (success) { +EXPECT_EQ(expected, result); + } }
void ValidateRangesError(RangeVec skip_ranges, int num_rows, const RangeVec& expected) { @@ -76,7 +78,9 @@ void ValidatePages(const vector& first_row_indexes, const RangeVec& ran bool success = ComputeCandidatePages(page_locations, ranges, num_rows, &candidate_pages); EXPECT_EQ(should_succeed, success); - if (success) EXPECT_EQ(expected_page_indexes, candidate_pages); + if (success) { +EXPECT_EQ(expected_page_indexes, candidate_pages); + } } void ValidatePagesError(const vector& first_row_indexes, const RangeVec& ranges, diff --git a/be/src/runtime/buffered-tuple-stream-test.cc b/be/src/runtime/buffered-tuple-stream-test.cc index 37d0323..1c42f07 100644 --- a/be/src/runtime/buffered-tuple-stream-test.cc +++ b/be/src/runtime/buffered-tuple-stream-test.cc @@ -854,7 +854,9 @@ void SimpleTupleStreamTest::TestAttachMemory(bool pin_stream, bool attach_on_rea } else { EXPECT_EQ(0, num_buffers_attached) << "No buffers attached during iteration."; } - if (attach_on_read || !pin_stream) EXPECT_EQ(4, num_flushes); + if (attach_on_read || !pin_stream) { +EXPECT_EQ(4, num_flushes); + } out_batch->Reset(); stream.Close(out_batch, RowBatch::FlushMode::FLUSH_RESOURCES); if (attach_on_read) { diff --git a/be/src/runtime/bufferpool/buffer-pool-test.cc b/be/src/runtime/bufferpool/buffer-pool-test.cc index ea788c4..2c9add7 100644 --- a/be/src/runtime/bufferpool/buffer-pool-test.cc +++ b/be/src/runtime/bufferpool/buffer-pool-test.cc @@ -584,7 +584,9 @@ void BufferPoolTest::TestBufferAllocation(bool reserved) { BufferPool::ClientHandle client; ASSERT_OK(pool.RegisterClient("test client", NULL, &global_reservations_, NULL, TOTAL_MEM, NewProfile(), &client)); - if (reserved) ASSERT_TRUE(client.IncreaseReservationToFit(TOTAL_MEM)); + if (reserved) { +ASSERT_TRUE(client.IncreaseReservationToFit(TOTAL_MEM)); + } vector handles(NUM_BUFFERS); @@ -2095,7 +2097,9 @@ void BufferPoolTest::TestRandomInternalImpl(BufferPool* pool, TmpFileGroup* file int rand_pick = uniform_int_distribution(0, pages.size() - 1)(*rng);
PageHandle* page = &pages[rand_pick].first; if (!client.IncreaseReservationToFit(page->len())) continue; - if (!page->is_pinned() || multiple_pins) ASSERT_OK(pool->Pin(&client, page)); + if (!page->is_pinned() || multiple_pins) { +ASSERT_OK(pool->Pin(&client, page)); + } // Block on the pin and verify data for sync pins. if (p < 0.35) VerifyData(*page, pages[rand_pick].second); } else if (p < 0.70) { diff --git a/be/src/runtime/io/disk-io-mgr-test.cc b/be/src/runtime/io/disk-io-mgr-test.cc index 2cf4642..f549e19 100644 --- a/be/src/runtime/io/disk-io-mgr-test.cc +++ b/be/s
[impala] 04/04: IMPALA-9781: Fix GCC 7 unaligned 128-bit loads / stores
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 227da84c3757eb857008e7b82aad622ed959eb84 Author: Joe McDonnell AuthorDate: Wed May 27 16:21:36 2020 -0700 IMPALA-9781: Fix GCC 7 unaligned 128-bit loads / stores When running a release binary built with GCC 7.5.0, it crashes with an unaligned memory error in multiple pieces of code. In these locations, we are doing stores to 128-bit values, but we cannot guarantee alignment. GCC 7 must be optimizing the code to use instructions that require a higher level of alignment than we can provide. This switches the code locations to use memset / memcpy with local variables to avoid the unaligned stores. Testing: - Ran exhaustive tests with a release binary built by GCC 7.5.0 - Ran exhaustive tests Change-Id: I67320790789d5b57aeaf2dff0eae7352a1cbf81e Reviewed-on: http://gerrit.cloudera.org:8080/15993 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/exprs/slot-ref.cc| 5 - be/src/util/decimal-util.h | 3 ++- be/src/util/dict-encoding.h | 5 +++-- 3 files changed, 9 insertions(+), 4 deletions(-) diff --git a/be/src/exprs/slot-ref.cc b/be/src/exprs/slot-ref.cc index 634a989..661c7ef 100644 --- a/be/src/exprs/slot-ref.cc +++ b/be/src/exprs/slot-ref.cc @@ -422,7 +422,10 @@ DecimalVal SlotRef::GetDecimalValInterpreted( case 8: return DecimalVal(*reinterpret_cast<int64_t*>(t->GetSlot(slot_offset_))); case 16: - return DecimalVal(*reinterpret_cast<__int128_t*>(t->GetSlot(slot_offset_))); + // Avoid an unaligned load by using memcpy + __int128_t val; + memcpy(&val, t->GetSlot(slot_offset_), sizeof(val)); + return DecimalVal(val); default: DCHECK(false); return DecimalVal::null(); diff --git a/be/src/util/decimal-util.h b/be/src/util/decimal-util.h index f505ecc..4ddfe23 100644 --- a/be/src/util/decimal-util.h +++ b/be/src/util/decimal-util.h @@ -128,7 +128,8 @@ class DecimalUtil { const uint8_t* buffer,
int fixed_len_size, T* v) { DCHECK_GT(fixed_len_size, 0); DCHECK_LE(fixed_len_size, sizeof(T)); -*v = 0; +// Avoid an unaligned store by using memset +memset(v, 0, sizeof(T)); // We need to sign extend val. For example, if the original value was // -1, the original bytes were -1,-1,-1,-1. If we only wrote out 1 byte, after // the encode step above, val would contain (-1, 0, 0, 0). We need to sign diff --git a/be/src/util/dict-encoding.h b/be/src/util/dict-encoding.h index f440332..e6e01bc 100644 --- a/be/src/util/dict-encoding.h +++ b/be/src/util/dict-encoding.h @@ -346,10 +346,11 @@ class DictDecoder : public DictDecoderBase { virtual int num_entries() const { return dict_.size(); } virtual void GetValue(int index, void* buffer) { -T* val_ptr = reinterpret_cast<T*>(buffer); DCHECK_GE(index, 0); DCHECK_LT(index, dict_.size()); -*val_ptr = dict_[index]; +// Avoid an unaligned store by using memcpy +T val = dict_[index]; +memcpy(buffer, &val, sizeof(T)); } /// Returns the next value. Returns false if the data is invalid.
[impala] 01/02: IMPALA-9775: Fix test failure in TestAcid.test_acid_heartbeats
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit febce6519f5791e5578608eac4d68f5f9ccb0457 Author: wzhou-code AuthorDate: Sun May 24 20:59:20 2020 -0700 IMPALA-9775: Fix test failure in TestAcid.test_acid_heartbeats The failure was caused by IMPALA-9764, which changed the sleep interval between heartbeats. To fix it, add an upper limit of the sleep interval as 100 seconds, and increase the execution time for the query in test case TestAcid::test_acid_heartbeats. Skip the test for table formats with compression to reduce the total execution time. Testing: - Ran the following command to verify that the bug was fixed: ./bin/impala-py.test tests/query_test/test_acid.py\ ::TestAcid::test_acid_heartbeats \ --workload_exploration_strategy=functional-query:exhaustive - Passed all exhaustive tests. Change-Id: I7922797d7e3ce94a2c8948211245f4e77fdb08b7 Reviewed-on: http://gerrit.cloudera.org:8080/15984 Reviewed-by: Zoltan Borok-Nagy Tested-by: Impala Public Jenkins --- .../main/java/org/apache/impala/common/TransactionKeepalive.java | 8 ++-- tests/query_test/test_acid.py | 8 +--- 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/fe/src/main/java/org/apache/impala/common/TransactionKeepalive.java b/fe/src/main/java/org/apache/impala/common/TransactionKeepalive.java index 50e35a2..52dff28 100644 --- a/fe/src/main/java/org/apache/impala/common/TransactionKeepalive.java +++ b/fe/src/main/java/org/apache/impala/common/TransactionKeepalive.java @@ -48,6 +48,10 @@ import com.sun.tools.javac.code.Attribute.Array; public class TransactionKeepalive { public static final Logger LOG = Logger.getLogger(TransactionKeepalive.class); + // (IMPALA-9775) The sleep interval is deduced from Hive configuration parameter + // hive.txn.timeout.
To be safe, set an upper limit for sleep interval as 100 + seconds for carrying through the test case TestAcid.test_acid_heartbeats. + private static final long MAX_SLEEP_INTERVAL_MILLISECONDS = 100000; private static final long MILLION = 1000000L; private final long sleepIntervalMs_; @@ -209,8 +213,8 @@ public class TransactionKeepalive { */ public TransactionKeepalive(MetaStoreClientPool metaStoreClientPool) { HiveConf hiveConf = new HiveConf(TransactionKeepalive.class); -sleepIntervalMs_ = hiveConf.getTimeVar( -HiveConf.ConfVars.HIVE_TXN_TIMEOUT, TimeUnit.MILLISECONDS) / 3; +sleepIntervalMs_ = Math.min(MAX_SLEEP_INTERVAL_MILLISECONDS, hiveConf.getTimeVar( +HiveConf.ConfVars.HIVE_TXN_TIMEOUT, TimeUnit.MILLISECONDS) / 3); Preconditions.checkState(sleepIntervalMs_ > 0); Preconditions.checkNotNull(metaStoreClientPool); metaStoreClientPool_ = metaStoreClientPool; diff --git a/tests/query_test/test_acid.py b/tests/query_test/test_acid.py index ed564a4..f9e3f02 100644 --- a/tests/query_test/test_acid.py +++ b/tests/query_test/test_acid.py @@ -174,13 +174,15 @@ class TestAcid(ImpalaTestSuite): @SkipIfADLS.hive @SkipIfIsilon.hive @SkipIfLocal.hive - @pytest.mark.execute_serially def test_acid_heartbeats(self, vector, unique_database): """Tests heartbeating of transactions. Creates a long-running query via some jitting and in the meanwhile it periodically checks whether there is a transaction that has sent a heartbeat since its start.
""" if self.exploration_strategy() != 'exhaustive': pytest.skip() +table_format = vector.get_value('table_format') +if table_format.compression_codec != 'none': pytest.skip() + last_open_txn_start_time = self._latest_open_transaction() dummy_tbl = "{}.{}".format(unique_database, "dummy") self.execute_query("create table {} (i int) tblproperties" @@ -188,8 +190,8 @@ class TestAcid(ImpalaTestSuite): "'transactional_properties'='insert_only')".format(dummy_tbl)) try: handle = self.execute_query_async( - "insert into {} values (sleep(20))".format(dummy_tbl)) - MAX_ATTEMPTS = 10 + "insert into {} values (sleep(32))".format(dummy_tbl)) + MAX_ATTEMPTS = 16 attempt = 0 success = False while attempt < MAX_ATTEMPTS:
[impala] 02/02: Run test_row_validation only on HDFS
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit b8496d56e49814c36edadbff17b80b10bf40a3a2 Author: Zoltan Borok-Nagy AuthorDate: Wed May 27 11:10:59 2020 +0200 Run test_row_validation only on HDFS Added pytest.skip() when the test is being run on a filesystem other than HDFS. The test only makes sense on filesystems that support APPEND because it simulates Hive Streaming V2. And currently Hive Streaming only works on HDFS. Change-Id: Id2a647ba5c75a600f177f82290241a93afc71ea7 Reviewed-on: http://gerrit.cloudera.org:8080/15988 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- tests/query_test/test_acid_row_validation.py | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/tests/query_test/test_acid_row_validation.py b/tests/query_test/test_acid_row_validation.py index c6438c0..041e754 100644 --- a/tests/query_test/test_acid_row_validation.py +++ b/tests/query_test/test_acid_row_validation.py @@ -18,11 +18,12 @@ # Functional tests for ACID integration with Hive. import os +import pytest from tests.common.impala_test_suite import ImpalaTestSuite from tests.common.skip import SkipIfLocal from tests.util.acid_txn import AcidTxn - +from tests.util.filesystem_utils import IS_HDFS # Tests that Impala validates rows against a validWriteIdList correctly. class TestAcidRowValidation(ImpalaTestSuite): @@ -63,6 +64,10 @@ class TestAcidRowValidation(ImpalaTestSuite): """Tests reading from a file written by Hive Streaming Ingestion. In the first no rows are valid. Then we commit the first transaction and read the table. Then we commit the last transaction and read the table.""" +# This test only makes sense on a filesystem that supports the file append operation +# (e.g. S3 doesn't) because it simulates Hive Streaming V2. So let's run it only on +# HDFS. 
+if not IS_HDFS: pytest.skip() tbl_name = "streaming" self._create_test_table(vector, unique_database, tbl_name) self.run_test_case('QueryTest/acid-row-validation-0', vector, use_db=unique_database)
[impala] branch master updated (5c69e7b -> b8496d5)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from 5c69e7b IMPALA-9597: Eliminate redundant Ranger audits for column masking new febce65 IMPALA-9775: Fix test failure in TestAcid.test_acid_heartbeats new b8496d5 Run test_row_validation only on HDFS The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../main/java/org/apache/impala/common/TransactionKeepalive.java | 8 ++-- tests/query_test/test_acid.py | 8 +--- tests/query_test/test_acid_row_validation.py | 7 ++- 3 files changed, 17 insertions(+), 6 deletions(-)
[impala] branch master updated: IMPALA-9597: Eliminate redundant Ranger audits for column masking
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 5c69e7b IMPALA-9597: Eliminate redundant Ranger audits for column masking 5c69e7b is described below commit 5c69e7ba583297dc886652ac5952816882b928af Author: Fang-Yu Rao AuthorDate: Mon Apr 27 08:41:53 2020 -0700 IMPALA-9597: Eliminate redundant Ranger audits for column masking After IMPALA-9350, Impala is able to produce the corresponding Ranger audits when a query involves policies of column masking. However, redundant audit events could be produced due to the fact that the analysis of the TableRef containing a column involved in a column masking policy could be performed more than once for a query that has to be analyzed more than once. For example, a query consisting of a WithClause or a query that requires a rewrite operation followed by a re-analysis phase would result in RangerImpalaPlugin#evalDataMaskPolicies() being invoked multiple times, each producing an audit log entry for the same column. Moreover, for a query involving column masking policies, the corresponding audit log entries will still be generated even though there is an AuthorizationException thrown in the authorization phase. This patch fixes those two issues described above by adding some post-processing steps after the analysis of a query to deduplicate the List of AuthzAuditEvent's for column masking policies. Specifically, we stash the deduplicated audit events after the analysis of the query and will add back those deduplicated events only if the authorization of the query is successful. On the other hand, this patch also resolves an inconsistency when an "Unmasked" policy is involved in a query that retains the original column value. 
Specifically, when an "Unmasked" policy is the only column masking policy involved in this query, RangerAuthorizationChecker#createColumnMask() will not be called to produce the corresponding AuthzAuditEvent, whereas createColumnMask() will be invoked to produce the respective AuthzAuditEvent if there are policies of other types. Since an "Unmasked" policy essentially does not change the original column value, we filter out the respective events with mask type equal to "MASK_NONE" which corresponds to an "Unmasked" policy. Testing: - Added three test cases in RangerAuditLogTest#testAuditsForColumnMasking() to make sure the issues above are resolved. - Verified that this patch passes the FE tests in the DEBUG build. Change-Id: I42d60130fba93d63fbc36949f2bf746b7ae2497d Reviewed-on: http://gerrit.cloudera.org:8080/15854 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- .../apache/impala/analysis/AnalysisContext.java| 29 ++-- .../impala/authorization/AuthorizationChecker.java | 10 +- .../authorization/BaseAuthorizationChecker.java| 2 +- .../authorization/NoopAuthorizationFactory.java| 4 + .../ranger/RangerAuthorizationChecker.java | 48 +-- .../ranger/RangerAuthorizationContext.java | 42 ++ .../authorization/ranger/RangerImpalaPlugin.java | 22 +++ .../authorization/ranger/RangerAuditLogTest.java | 153 ++--- .../org/apache/impala/common/FrontendTestBase.java | 4 + 9 files changed, 259 insertions(+), 55 deletions(-) diff --git a/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java b/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java index feddb4c..df68afb 100644 --- a/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java +++ b/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java @@ -416,27 +416,18 @@ public class AnalysisContext { // Analyze statement and record exception. 
AnalysisException analysisException = null; -TClientRequest clientRequest; -AuthorizationContext authzCtx = null; - +TClientRequest clientRequest = queryCtx_.getClient_request(); +AuthorizationContext authzCtx = authzChecker.createAuthorizationContext(true, +clientRequest.isSetRedacted_stmt() ? +clientRequest.getRedacted_stmt() : clientRequest.getStmt(), +queryCtx_.getSession(), Optional.of(timeline_)); +Preconditions.checkState(authzCtx != null); try { - clientRequest = queryCtx_.getClient_request(); - authzCtx = authzChecker.createAuthorizationContext(true, - clientRequest.isSetRedacted_stmt() ? - clientRequest.getRedacted_stmt() : clientRequest.getStmt(), - queryCtx_.getSession(), Optional.of(timeline_)); - // TODO (IMPALA-9597): Generating column masking audit events in the
[impala] 01/02: IMPALA-9755: Flaky test: test_global_exchange_counters
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 95ee26354dc0ce61e5844430d1eaf553fd13d154 Author: Sahil Takiar AuthorDate: Tue May 19 14:58:23 2020 -0700 IMPALA-9755: Flaky test: test_global_exchange_counters De-flake TestObservability.test_global_exchange_counters in test_observability.py. IMPALA-6984 added a feature to send a Cancel RPC to running fragments when the coordinator fragment fetches all rows defined by a limit. This causes fragments to terminate early (which is a good thing). However, test_global_exchange_counters expects each fragment to produce some rows, which is why it recently became flaky. This patch modifies test_global_exchange_counters so that it allows for some fragments to produce 0 rows. Testing: * Ran test_observability.py locally * Looped 8 concurrent streams of test_global_exchange_counters for an hour, no failures (previously I was able to reproduce the test issue within 5 minutes) Change-Id: Icb3a1b5ccb5695eb71343e96cc830f12d5c72f1e Reviewed-on: http://gerrit.cloudera.org:8080/15960 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- tests/query_test/test_observability.py | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/tests/query_test/test_observability.py b/tests/query_test/test_observability.py index 101f086..8f27c72 100644 --- a/tests/query_test/test_observability.py +++ b/tests/query_test/test_observability.py @@ -404,7 +404,6 @@ class TestObservability(ImpalaTestSuite): def __verify_profile_event_sequence(self, event_regexes, runtime_profile): """Check that 'event_regexes' appear in a consecutive series of lines in 'runtime_profile'""" -lines = runtime_profile.splitlines() event_regex_index = 0 # Check that the strings appear in the above order with no gaps in the profile. 
@@ -501,9 +500,13 @@ class TestObservability(ImpalaTestSuite): if key in line: # Match byte count within parentheses m = re.search("\(([0-9]+)\)", line) - assert m, "Cannot match pattern for key %s in line '%s'" % (key, line) - # Only keep first (query-level) counter - if counters[key] == 0: + + # If a match was not found, then the value of the key should be 0 + if not m: +assert key + ": 0" in line, "Invalid format for key %s" % key +assert counters[key] != 0, "Query level counter for key %s cannot be 0" % key + elif counters[key] == 0: +# Only keep first (query-level) counter counters[key] = int(m.group(1)) # All counters have values
[impala] branch master updated (a11106e -> 3e76da9)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from a11106e IMPALA-9764: TransactionKeepalive should set sleep interval based on Hive Configuration new 95ee263 IMPALA-9755: Flaky test: test_global_exchange_counters new 3e76da9 IMPALA-9708: Remove Sentry support The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: README-build.md| 7 +- be/src/catalog/catalog-server.cc | 11 - be/src/catalog/catalog-service-client-wrapper.h| 8 - be/src/catalog/catalog.cc | 11 - be/src/catalog/catalog.h | 7 - be/src/common/global-flags.cc | 3 + be/src/exec/catalog-op-executor.cc | 15 - be/src/exec/catalog-op-executor.h | 6 - be/src/service/fe-support.cc | 29 - be/src/service/frontend.cc | 8 +- be/src/transport/TSasl.cpp | 2 +- be/src/util/backend-gflag-util.cc | 8 - bin/bootstrap_toolchain.py | 49 +- bin/create-test-configuration.sh | 19 - bin/impala-config.sh | 52 +- buildall.sh| 18 +- common/thrift/BackendGflags.thrift | 6 +- common/thrift/CatalogService.thrift| 24 - fe/pom.xml | 173 +--- .../impala/analysis/AlterDbSetOwnerStmt.java | 6 +- .../analysis/AlterTableOrViewSetOwnerStmt.java | 6 +- .../apache/impala/analysis/AuthorizationStmt.java | 3 +- .../apache/impala/analysis/CreateTableStmt.java| 2 +- .../authorization/AuthorizationProvider.java | 2 - .../authorization/PrivilegeRequestBuilder.java | 2 +- .../impala/authorization/sentry/ImpalaAction.java | 87 --- .../authorization/sentry/ImpalaActionFactory.java | 57 -- .../authorization/sentry/ImpalaPrivilegeModel.java | 44 -- .../authorization/sentry/SentryAuthProvider.java | 70 -- .../authorization/sentry/SentryAuthorizable.java | 59 -- .../sentry/SentryAuthorizableColumn.java | 83 -- 
.../authorization/sentry/SentryAuthorizableDb.java | 54 -- .../sentry/SentryAuthorizableFactory.java | 86 -- .../authorization/sentry/SentryAuthorizableFn.java | 61 -- .../sentry/SentryAuthorizableServer.java | 53 -- .../sentry/SentryAuthorizableTable.java| 71 -- .../sentry/SentryAuthorizableUri.java | 53 -- .../sentry/SentryAuthorizationChecker.java | 155 .../sentry/SentryAuthorizationConfig.java | 147 .../sentry/SentryAuthorizationFactory.java | 109 --- .../sentry/SentryAuthorizationPolicy.java | 167 .../sentry/SentryCatalogdAuthorizationManager.java | 541 - .../impala/authorization/sentry/SentryConfig.java | 74 -- .../sentry/SentryImpaladAuthorizationManager.java | 306 .../sentry/SentryPolicyReaderException.java| 35 - .../authorization/sentry/SentryPolicyService.java | 544 - .../impala/authorization/sentry/SentryProxy.java | 651 .../sentry/SentryUnavailableException.java | 35 - .../impala/authorization/sentry/SentryUtil.java| 48 -- .../org/apache/impala/service/BackendConfig.java | 3 - .../java/org/apache/impala/service/FeSupport.java | 12 - .../java/org/apache/impala/service/Frontend.java | 3 +- .../java/org/apache/impala/service/JniCatalog.java | 21 - .../org/apache/impala/util/AuthorizationUtil.java | 5 +- .../impala/analysis/AnalyzeAuthStmtsTest.java | 17 +- .../authorization/AuthorizationStmtTest.java | 85 +- .../impala/authorization/AuthorizationTest.java| 673 .../authorization/AuthorizationTestBase.java | 89 +-- .../sentry/ImpalaActionFactoryTest.java| 135 .../authorization/sentry/SentryProxyTest.java | 614 --- .../org/apache/impala/catalog/CatalogTest.java | 2 +- .../impala/testutil/SentryServicePinger.java | 99 --- .../impala/testutil/TestSentryGroupMapper.java | 80 -- .../apache/impala/util/AuthorizationUtilTest.java | 25 +- fe/src/test/resources/hive-site.xml.py | 10 +- fe/src/test/resources/sentry-site.xml.py | 65 -- impala-parent/pom.xml | 4 +- infra/deploy/deploy.py | 1 - testdata/bin/run-all.sh| 14 +- testdata/bin/run-hive-server.sh
[impala] 02/04: IMPALA-9585: [DOCS] update mt_dop docs
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 2dd20dc1e409fb6c460f9dbb1b995044ba8b858e Author: Tim Armstrong AuthorDate: Thu May 7 16:12:23 2020 -0700 IMPALA-9585: [DOCS] update mt_dop docs Updated to reflect changes in IMPALA-9099 and IMPALA-9736. Change-Id: Ifc7511fede5f9b36ae8250d3acf8d0061b48106f Reviewed-on: http://gerrit.cloudera.org:8080/15883 Reviewed-by: Tamas Mate Tested-by: Impala Public Jenkins Reviewed-by: Bikramjeet Vig --- docs/topics/impala_mt_dop.xml | 39 +++ 1 file changed, 27 insertions(+), 12 deletions(-) diff --git a/docs/topics/impala_mt_dop.xml b/docs/topics/impala_mt_dop.xml index 04fb1c0..5ca2be6 100644 --- a/docs/topics/impala_mt_dop.xml +++ b/docs/topics/impala_mt_dop.xml @@ -51,8 +51,7 @@ under the License. -Currently, the operations affected by the MT_DOP -query option are: +Currently, MT_DOP support varies by statement type: @@ -64,11 +63,28 @@ under the License. -Queries with execution plans containing only scan and aggregation operators, -or local joins that do not need data exchanges (such as for nested types). -Other queries produce an error if MT_DOP is set to a non-zero -value. Therefore, this query option is typically only set for the duration of -specific long-running, CPU-intensive queries. +SELECT statements. MT_DOP is 0 by default +for SELECT statements but can be set to a value greater +than 0 to control intra-node parallelism. This may be useful to tune +query performance and in particular to reduce execution time of +long-running, CPU-intensive queries. + + + + +DML statements. MT_DOP values greater +than zero are not currently supported for DML statements. DML statements +will produce an error if MT_DOP is set to a non-zero value. + + + + +In and earlier, not all SELECT +statements support setting MT_DOP. 
Specifically, only +scan and aggregation operators, and +local joins that do not need data exchanges (such as for nested types) are +supported. Other SELECT statements produce an error if +MT_DOP is set to a non-zero value. @@ -149,7 +165,7 @@ compute stats billion_rows_parquet; The following example shows the effects of setting MT_DOP - for a query involving only scan and aggregation operations for a Parquet table: + for a query on a Parquet table:
[impala] 04/04: IMPALA-9714: Fix edge cases in SimpleLogger and add test
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 0815a184fdfeb3293849a8441ba003d63a588dab Author: Joe McDonnell AuthorDate: Mon May 4 14:03:30 2020 -0700 IMPALA-9714: Fix edge cases in SimpleLogger and add test SimpleLogger is used for several existing log types and a change to use it for the data cache access trace is underway. Since this is commonly used, it is useful to nail down specific semantics and test them. This fixes the following edge cases: 1. LoggingSupport::DeleteOldLogs() currently maintains a map from mtime to the filename in order to decide which files need to be deleted. This stops working when there are fast updates to the log, because mtime has seconds resolution and DeleteOldLogs() is only able to recognize a single file per mtime with the current map. This changes the map to a set of pairs of mtime + filename. The behavior is identical except that if there are multiple files with the same mtime, they each get their own entry in the set. This allows DeleteOldLogs() to more accurately maintain the maximum log files. 2. SimpleLogger::Init() now enforces the limit on the maximum number of log files. This provides a clear semantic when dealing with preexisting files from a previous incarnation of the same logger. 3. SimpleLogger will now create any intermediate directories when creating the logging directory (i.e. existingdir/a/b/c works). 4. This moves enforcement of max_audit_event_log_files to use the limits provided by SimpleLogger rather than a background thread calling DeleteOldLogs() periodically. This also introduces SimpleLogger::GetLogFiles(), which is a static function to get the log files given a directory and prefix. This is necessary for testing, but it also will be useful for code that wants to process logs from SimpleLogger.
Testing: - Added a new simple-logger-test that codifies the expected behavior - Ran core tests Change-Id: Idd092a65b31d34f40a660cab7b5e0695a3627c78 Reviewed-on: http://gerrit.cloudera.org:8080/15861 Reviewed-by: Thomas Tauber-Marshall Tested-by: Impala Public Jenkins --- be/src/common/init.cc | 6 - be/src/common/logging.cc| 16 +- be/src/common/logging.h | 5 - be/src/service/impala-server.cc | 8 +- be/src/service/impala-server.h | 3 - be/src/util/CMakeLists.txt | 2 + be/src/util/filesystem-util-test.cc | 37 + be/src/util/filesystem-util.cc | 9 +- be/src/util/filesystem-util.h | 7 +- be/src/util/logging-support.cc | 15 +- be/src/util/simple-logger-test.cc | 290 be/src/util/simple-logger.cc| 56 +-- be/src/util/simple-logger.h | 14 +- 13 files changed, 419 insertions(+), 49 deletions(-) diff --git a/be/src/common/init.cc b/be/src/common/init.cc index 278a563..db47282 100644 --- a/be/src/common/init.cc +++ b/be/src/common/init.cc @@ -74,10 +74,6 @@ DECLARE_string(redaction_rules_file); DECLARE_string(reserved_words_version); DECLARE_bool(symbolize_stacktrace); -DEFINE_int32(max_audit_event_log_files, 0, "Maximum number of audit event log files " -"to retain. The most recent audit event log files are retained. If set to 0, " -"all audit event log files are retained."); - DEFINE_int32(memory_maintenance_sleep_time_ms, 1, "Sleep time in milliseconds " "between memory maintenance iterations"); @@ -146,8 +142,6 @@ extern "C" { void __gcov_flush(); } if (impala::TestInfo::is_test()) continue; // Check for log rotation in every interval of the maintenance thread impala::CheckAndRotateLogFiles(FLAGS_max_log_files); -// Check for audit event log rotation in every interval of the maintenance thread -impala::CheckAndRotateAuditEventLogFiles(FLAGS_max_audit_event_log_files); // Check for minidump rotation in every interval of the maintenance thread. This is // necessary since an arbitrary number of minidumps can be written by sending SIGUSR1 // to the process. 
diff --git a/be/src/common/logging.cc b/be/src/common/logging.cc index 7e9e4f7..297bedb 100644 --- a/be/src/common/logging.cc +++ b/be/src/common/logging.cc @@ -32,7 +32,7 @@ #include #include "common/logging.h" -#include "service/impala-server.h" +#include "util/container-util.h" #include "util/debug-util.h" #include "util/error-util.h" #include "util/logging-support.h" @@ -45,7 +45,6 @@ DECLARE_string(redaction_rules_file); DECLARE_string(log_filename); DECLARE_bool(redirect_stdout_stderr); -D
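The retention fix described in point 1 above can be sketched in isolation. The following is a minimal, hypothetical stand-in (FilesToDelete is not Impala's actual function, and the real code works on filesystem mtimes rather than plain integers): it shows why ordering a std::set on (mtime, filename) pairs keeps one entry per file even when several files share a seconds-resolution mtime, whereas a std::map keyed only on mtime would collapse them.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Given (mtime, filename) pairs and a retention limit, return the filenames
// that should be deleted, oldest first. std::pair orders by mtime first and
// filename second, so files with identical mtimes each keep their own entry
// and iteration from begin() visits the oldest files first.
std::vector<std::string> FilesToDelete(
    const std::vector<std::pair<int64_t, std::string>>& log_files,
    size_t max_log_files) {
  std::set<std::pair<int64_t, std::string>> by_age(
      log_files.begin(), log_files.end());
  std::vector<std::string> to_delete;
  while (by_age.size() > max_log_files) {
    // Erase the oldest entry; ties on mtime are broken by filename, so no
    // file is ever hidden behind another with the same timestamp.
    to_delete.push_back(by_age.begin()->second);
    by_age.erase(by_age.begin());
  }
  return to_delete;
}
```

With a map keyed only on the mtime, the two files stamped 100 below would occupy a single entry and the retention count would be wrong; with the set of pairs, both are tracked.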
[impala] branch master updated (7295edc -> 0815a18)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from 7295edc IMPALA-9680: Fixed compressed inserts failing new 7d260b6 IMPALA-9727: Fix HBaseScanNode explain formatting new 2dd20dc IMPALA-9585: [DOCS] update mt_dop docs new a93f2c2 IMPALA-8205: Support number of true and false statistics for boolean column new 0815a18 IMPALA-9714: Fix edge cases in SimpleLogger and add test The 4 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/common/init.cc |6 - be/src/common/logging.cc | 16 +- be/src/common/logging.h|5 - be/src/exec/catalog-op-executor.cc |7 +- be/src/exec/incr-stats-util-test.cc| 20 +- be/src/exec/incr-stats-util.cc | 46 +- be/src/exec/incr-stats-util.h | 18 +- be/src/service/impala-server.cc|8 +- be/src/service/impala-server.h |3 - be/src/util/CMakeLists.txt |2 + be/src/util/filesystem-util-test.cc| 37 + be/src/util/filesystem-util.cc |9 +- be/src/util/filesystem-util.h |7 +- be/src/util/logging-support.cc | 15 +- be/src/util/simple-logger-test.cc | 290 +++ be/src/util/simple-logger.cc | 56 +- be/src/util/simple-logger.h| 14 +- common/thrift/CatalogObjects.thrift|8 + docs/topics/impala_mt_dop.xml | 39 +- .../impala/analysis/AlterTableSetColumnStats.java |8 +- .../apache/impala/analysis/ComputeStatsStmt.java | 11 + .../org/apache/impala/catalog/ColumnStats.java | 54 +- .../org/apache/impala/planner/HBaseScanNode.java |6 +- .../java/org/apache/impala/service/Frontend.java | 13 +- .../org/apache/impala/analysis/ParserTest.java |3 + .../org/apache/impala/catalog/CatalogTest.java |2 + .../queries/PlannerTest/hbase.test | 24 +- .../queries/QueryTest/acid-compute-stats.test | 22 +- .../QueryTest/alter-table-set-column-stats.test| 120 +- 
.../queries/QueryTest/alter-table.test | 12 +- .../QueryTest/compute-stats-avro-catalog-v2.test | 264 +-- .../queries/QueryTest/compute-stats-avro.test | 262 +-- .../queries/QueryTest/compute-stats-date.test | 32 +- .../queries/QueryTest/compute-stats-decimal.test | 24 +- .../QueryTest/compute-stats-incremental.test | 172 +- .../queries/QueryTest/compute-stats.test | 2446 ++-- .../QueryTest/hbase-compute-stats-incremental.test | 40 +- .../queries/QueryTest/hbase-compute-stats.test | 104 +- .../queries/QueryTest/hbase-show-stats.test| 30 +- .../queries/QueryTest/show-stats.test | 64 +- .../queries/QueryTest/truncate-table.test | 76 +- 41 files changed, 2449 insertions(+), 1946 deletions(-) create mode 100644 be/src/util/simple-logger-test.cc
[impala] 01/04: IMPALA-9727: Fix HBaseScanNode explain formatting
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 7d260b602895280fab1a1a543a3e9700493febbd Author: Shant Hovsepian AuthorDate: Fri Apr 10 16:01:51 2020 -0400 IMPALA-9727: Fix HBaseScanNode explain formatting In the case with more than one hbase predicate the indentation level wasn't correctly formatted in the explain string. Instead of: | | 13:SCAN HBASE [default.dimension d] | | hbase filters: | | d:foo EQUAL '1' | | d:bar EQUAL '2' | | d:baz EQUAL '3' | | predicate: This was produced: | | 13:SCAN HBASE [default.dimension d] | | hbase filters: d:foo EQUAL '1' d:bar EQUAL '2' d:baz EQUAL '3' | | predicate: Change-Id: I30fad791408a1f7e35e9b3f2e6cb4958952dd567 Reviewed-on: http://gerrit.cloudera.org:8080/15749 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- .../org/apache/impala/planner/HBaseScanNode.java | 6 +++--- .../queries/PlannerTest/hbase.test | 24 +- 2 files changed, 17 insertions(+), 13 deletions(-) diff --git a/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java b/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java index bcfbc00..999c5bf 100644 --- a/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java +++ b/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java @@ -585,9 +585,9 @@ public class HBaseScanNode extends ScanNode { } else { for (int i = 0; i < filters_.size(); ++i) { THBaseFilter filter = filters_.get(i); -output.append("\n " + filter.family + ":" + filter.qualifier + " " + -CompareFilter.CompareOp.values()[filter.op_ordinal].toString() + " " + -"'" + filter.filter_constant + "'"); +output.append("\n" + detailPrefix + filter.family + ":" + filter.qualifier ++ " " + CompareFilter.CompareOp.values()[filter.op_ordinal].toString() ++ " " + "'" + filter.filter_constant + "'"); } } output.append('\n'); diff --git 
a/testdata/workloads/functional-planner/queries/PlannerTest/hbase.test b/testdata/workloads/functional-planner/queries/PlannerTest/hbase.test index 886fb05..5a26b0f 100644 --- a/testdata/workloads/functional-planner/queries/PlannerTest/hbase.test +++ b/testdata/workloads/functional-planner/queries/PlannerTest/hbase.test @@ -690,6 +690,7 @@ from functional_hbase.alltypessmall a, functional_hbase.alltypessmall c where + b.string_col > '1' and b.string_col < '3000' and b.bool_col = false and c.month = 4 and a.int_col = b.int_col and @@ -698,20 +699,23 @@ where PLAN-ROOT SINK | 04:HASH JOIN [INNER JOIN] -| hash predicates: a.int_col = b.int_col -| row-size=29B cardinality=300 +| hash predicates: b.int_col = c.int_col +| row-size=42B cardinality=130 | -|--00:SCAN HBASE [functional_hbase.alltypessmall b] -| predicates: b.bool_col = FALSE -| row-size=9B cardinality=25 +|--02:SCAN HBASE [functional_hbase.alltypessmall c] +| predicates: c.`month` = 4 +| row-size=12B cardinality=13 | 03:HASH JOIN [INNER JOIN] -| hash predicates: a.int_col = c.int_col -| row-size=20B cardinality=120 +| hash predicates: a.int_col = b.int_col +| row-size=30B cardinality=40 | -|--02:SCAN HBASE [functional_hbase.alltypessmall c] -| predicates: c.`month` = 4 -| row-size=12B cardinality=12 +|--00:SCAN HBASE [functional_hbase.alltypessmall b] +| hbase filters: +| d:string_col GREATER '1' +| d:string_col LESS '3000' +| predicates: b.string_col > '1', b.bool_col = FALSE, b.string_col < '3000' +| row-size=22B cardinality=4 | 01:SCAN HBASE [functional_hbase.alltypessmall a] row-size=8B cardinality=50
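The HBaseScanNode fix above replaces a hard-coded indent with the caller-supplied detail prefix. A minimal sketch of the idea, rendered in C++ rather than the Java of the actual planner (FormatFilters and HBaseFilter are hypothetical names, not Impala's API): re-emitting the prefix at the start of every filter line keeps each filter aligned with the surrounding plan tree, which is what the corrected explain output shows.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one THBaseFilter entry.
struct HBaseFilter {
  std::string family;
  std::string qualifier;
  std::string op;
  std::string constant;
};

// Build the "hbase filters:" section of an explain string. The key detail is
// that each filter line begins with detail_prefix (e.g. "| | "), instead of a
// fixed two-space indent, so nested plan fragments stay correctly indented.
std::string FormatFilters(
    const std::string& detail_prefix, const std::vector<HBaseFilter>& filters) {
  std::string output = detail_prefix + "hbase filters:";
  for (const HBaseFilter& f : filters) {
    output += "\n" + detail_prefix + f.family + ":" + f.qualifier + " " + f.op
        + " '" + f.constant + "'";
  }
  return output;
}
```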
[impala] 01/03: IMPALA-9570: [DOCS] add memory management
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 4846751c84b70673ea25cd88e8c9d2085f7ae55e Author: Shajini Thayasingh AuthorDate: Wed Apr 29 15:02:53 2020 -0700 IMPALA-9570: [DOCS] add memory management add memory management and fix broken links. Incorporated review changes. Change-Id: I6e8b6d0c3fe2e1746831665b3d3ae98a0beaa1e7 Reviewed-on: http://gerrit.cloudera.org:8080/15836 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- docs/impala_keydefs.ditamap | 4 ++-- docs/topics/impala_udf.xml | 8 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/docs/impala_keydefs.ditamap b/docs/impala_keydefs.ditamap index d133c0d..594fa4d 100644 --- a/docs/impala_keydefs.ditamap +++ b/docs/impala_keydefs.ditamap @@ -69,13 +69,13 @@ under the License. impala-udf-samples - + https://github.com/cloudera/impala-udf-samples/blob/master/uda-sample.cc; scope="external" format="html" keys="uda-sample.cc"> uda-sample.cc udf-sample.h - + https://github.com/cloudera/impala-udf-samples/blob/master/uda-sample.h; scope="external" format="html" keys="uda-sample.h"> uda-sample.h https://github.com/apache/impala/blob/master/be/src/testutil/test-udas.cc; scope="external" format="html" keys="test-udas.cc"> diff --git a/docs/topics/impala_udf.xml b/docs/topics/impala_udf.xml index 8d6c382..60735a4 100644 --- a/docs/topics/impala_udf.xml +++ b/docs/topics/impala_udf.xml @@ -920,6 +920,14 @@ within UDAs, you can return without specifying a value. + Intermediate values returned by the init, update and merge functions that referred to allocations + must be allocated using FunctionContext::Allocate() and freed using FunctionContext::Free(). + Both serialize and finalize functions are responsible for cleaning up the intermediate value and freeing such allocations. 
+ StringVals returned to Impala directly by Serialize(), Finalize() or GetValue() functions should be backed by + temporary results memory allocated using the StringVal(FunctionContext*, int) constructor, + StringVal::CopyFrom(FunctionContext*, const uint8_t*, size_t), or StringVal::Resize(). + + In the SQL syntax, you create a UDAF by using the statement CREATE AGGREGATE FUNCTION. You specify the entry points of the underlying C++ functions using the clauses INIT_FN, UPDATE_FN, MERGE_FN, SERIALIZE_FN, and
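The documented allocation contract for UDAs can be illustrated with a toy aggregate. This is a sketch only: MockFunctionContext below is a hypothetical stand-in for Impala's FunctionContext (the real API is declared in udf/udf.h), and SumInit/SumUpdate/SumFinalize are invented names. It demonstrates the invariant the doc change describes: intermediate state lives in memory obtained from FunctionContext::Allocate(), and the finalizing function is responsible for freeing it with FunctionContext::Free().

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical mock of the FunctionContext allocation interface. It only
// tracks that every Allocate() is balanced by a Free().
struct MockFunctionContext {
  int live_allocations = 0;
  uint8_t* Allocate(int bytes) {
    ++live_allocations;
    return new uint8_t[bytes];
  }
  void Free(uint8_t* ptr) {
    --live_allocations;
    delete[] ptr;
  }
};

// Init: the intermediate value (a running int64_t sum) is backed by
// context-managed memory, as the documentation above requires.
uint8_t* SumInit(MockFunctionContext* ctx) {
  uint8_t* state = ctx->Allocate(sizeof(int64_t));
  std::memset(state, 0, sizeof(int64_t));
  return state;
}

// Update: mutate the intermediate value in place.
void SumUpdate(uint8_t* state, int64_t value) {
  int64_t current;
  std::memcpy(&current, state, sizeof(current));
  current += value;
  std::memcpy(state, &current, sizeof(current));
}

// Finalize: produce the result and free the intermediate allocation,
// mirroring the cleanup responsibility assigned to serialize/finalize.
int64_t SumFinalize(MockFunctionContext* ctx, uint8_t* state) {
  int64_t result;
  std::memcpy(&result, state, sizeof(result));
  ctx->Free(state);
  return result;
}
```

After finalize runs, no context allocations remain live; forgetting the Free() here is exactly the leak the documentation warns about.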
[impala] branch master updated (f4f7fb5 -> dcf4979)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from f4f7fb5 IMPALA-9729: consistent GetExecSummary() behaviour new 4846751 IMPALA-9570: [DOCS] add memory management new e8d1794 IMPALA-9716: Add jitter to the exponential backoff in status reporting new dcf4979 IMPALA-9736: fix mt_dop not supported error The 3 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/runtime/query-state.cc | 10 +- docs/impala_keydefs.ditamap| 4 ++-- docs/topics/impala_udf.xml | 8 fe/src/main/java/org/apache/impala/planner/Planner.java| 3 +-- .../queries/PlannerTest/mt-dop-validation.test | 4 ++-- 5 files changed, 22 insertions(+), 7 deletions(-)
[impala] branch master updated: Revert "IMPALA-9718: Delete pkg_resources from IMPALA_HOME/shell/"
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 0a0001e Revert "IMPALA-9718: Delete pkg_resources from IMPALA_HOME/shell/" 0a0001e is described below commit 0a0001e1a85462c81c9c4617a2e864c98913f229 Author: Joe McDonnell AuthorDate: Thu May 7 10:03:45 2020 -0700 Revert "IMPALA-9718: Delete pkg_resources from IMPALA_HOME/shell/" The fix for IMPALA-9718 introduced test failures on Centos 7. See IMPALA-9735. This reverts commit 75d98b4b081df95b58d7388da39bb1ec7c2f4f67. Change-Id: Id09c55435f432a8626a45079f58860d6e27ac55e Reviewed-on: http://gerrit.cloudera.org:8080/15881 Reviewed-by: Tim Armstrong Tested-by: Joe McDonnell --- LICENSE.txt |1 + shell/make_shell_tarball.sh |1 + shell/pkg_resources.py | 2700 +++ 3 files changed, 2702 insertions(+) diff --git a/LICENSE.txt b/LICENSE.txt index b4cedd9..c76c157 100644 --- a/LICENSE.txt +++ b/LICENSE.txt @@ -396,6 +396,7 @@ www/DataTables* and www/datatables*: MIT license +shell/pkg_resources.py: Python Software License V2 Parts of be/src/runtime/string-search.h: Python Software License V2 Parts of shell/impala_shell.py: Python Software License V2 shell/ext-py/bitarray*: Python Software License V2 diff --git a/shell/make_shell_tarball.sh b/shell/make_shell_tarball.sh index 6626e15..d5c0c2c 100755 --- a/shell/make_shell_tarball.sh +++ b/shell/make_shell_tarball.sh @@ -128,6 +128,7 @@ cp ${SHELL_HOME}/TSSLSocketWithWildcardSAN.py ${TARBALL_ROOT}/lib cp ${SHELL_HOME}/ImpalaHttpClient.py ${TARBALL_ROOT}/lib cp ${SHELL_HOME}/shell_exceptions.py ${TARBALL_ROOT}/lib cp ${SHELL_HOME}/shell_output.py ${TARBALL_ROOT}/lib +cp ${SHELL_HOME}/pkg_resources.py ${TARBALL_ROOT}/lib cp ${SHELL_HOME}/impala-shell ${TARBALL_ROOT} cp ${SHELL_HOME}/impala_shell.py ${TARBALL_ROOT} cp ${SHELL_HOME}/compatibility.py ${TARBALL_ROOT} diff --git 
a/shell/pkg_resources.py b/shell/pkg_resources.py new file mode 100644 index 000..70ecc44 --- /dev/null +++ b/shell/pkg_resources.py @@ -0,0 +1,2700 @@ +from __future__ import print_function, unicode_literals + +""" + This file is redistributed under the Python Software Foundation License: + http://docs.python.org/2/license.html +""" + +"""Package resource API + + +A resource is a logical file contained within a package, or a logical +subdirectory thereof. The package resource API expects resource names +to have their path parts separated with ``/``, *not* whatever the local +path separator is. Do not use os.path operations to manipulate resource +names being passed into the API. + +The package resource API is designed to work with normal filesystem packages, +.egg files, and unpacked .egg files. It can also work in a limited way with +.zip files and with custom PEP 302 loaders that support the ``get_data()`` +method. +""" + +import sys, os, zipimport, time, re, imp, types +from urlparse import urlparse, urlunparse + +try: +frozenset +except NameError: +from sets import ImmutableSet as frozenset + +# capture these to bypass sandboxing +from os import utime +try: +from os import mkdir, rename, unlink +WRITE_SUPPORT = True +except ImportError: +# no write support, probably under GAE +WRITE_SUPPORT = False + +from os import open as os_open +from os.path import isdir, split + +# This marker is used to simplify the process that checks is the +# setuptools package was installed by the Setuptools project +# or by the Distribute project, in case Setuptools creates +# a distribution with the same version. 
+# +# The bootstrapping script for instance, will check if this +# attribute is present to decide wether to reinstall the package +_distribute = True + +def _bypass_ensure_directory(name, mode=0777): +# Sandbox-bypassing version of ensure_directory() +if not WRITE_SUPPORT: +raise IOError('"os.mkdir" not supported on this platform.') +dirname, filename = split(name) +if dirname and filename and not isdir(dirname): +_bypass_ensure_directory(dirname) +mkdir(dirname, mode) + + + + + + + + +def get_supported_platform(): +"""Return this platform's maximum compatible version. + +distutils.util.get_platform() normally reports the minimum version +of Mac OS X that would be required to *use* extensions produced by +distutils. But what we want when checking compatibility is to know the +version of Mac OS X that we are *running*. To allow usage of packages that +explicitly require a newer version of Mac OS X, we must also know
[impala] branch master updated: IMPALA-9731: Remove USE_CDP_HIVE=false and Hive 2 support
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new f241fd0 IMPALA-9731: Remove USE_CDP_HIVE=false and Hive 2 support f241fd0 is described below commit f241fd08ac97a9c20a3c97a86f45b9ba5e7ec2fb Author: Joe McDonnell AuthorDate: Wed May 6 18:25:13 2020 -0700 IMPALA-9731: Remove USE_CDP_HIVE=false and Hive 2 support Impala 4 moved to using CDP versions for components, which involves adopting Hive 3. This removes the old code supporting CDH components and Hive 2. Specifically, it does the following: 1. Remove USE_CDP_HIVE and default to the values from USE_CDP_HIVE=true. USE_CDP_HIVE now has no effect on the Impala environment. This also means that bin/jenkins/build-all-flag-combinations.sh no longer includes USE_CDP_HIVE=false as a configuration. 2. Remove USE_CDH_KUDU and default to getting Kudu from the native toolchain. 3. Ban IMPALA_HIVE_MAJOR_VERSION<3 and remove related code, including the IMPALA_HIVE_MAJOR_VERSION=2 maven profile in fe/pom.xml. There is a fair amount of code that still references the Hive major version. Upstream Hive is now working on Hive 4, so there is a high likelihood that we'll need some code to deal with that transition. This leaves some code (such as maven profiles) and test logic in place.
Change-Id: Id85e849beaf4e19dda4092874185462abd2ec608 Reviewed-on: http://gerrit.cloudera.org:8080/15869 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- README-build.md| 10 +- bin/bootstrap_toolchain.py | 100 +--- bin/impala-config.sh | 135 ++--- bin/jenkins/build-all-flag-combinations.sh | 11 +- fe/pom.xml | 239 - .../hadoop/hive/common/ValidWriteIdList.java | 74 --- .../org/apache/impala/compat/MetastoreShim.java| 556 - testdata/bin/create-load-data.sh | 5 - testdata/bin/run-hive-server.sh| 30 +- testdata/cluster/admin | 15 +- .../common/etc/hadoop/conf/core-site.xml.py| 14 +- 11 files changed, 81 insertions(+), 1108 deletions(-) diff --git a/README-build.md b/README-build.md index 1297b86..c604716 100644 --- a/README-build.md +++ b/README-build.md @@ -29,7 +29,7 @@ can do so through the environment variables and scripts listed below. | SKIP_TOOLCHAIN_BOOTSTRAP | "false" | Skips downloading the toolchain any python dependencies if "true" | | CDH_BUILD_NUMBER | | Identifier to indicate the CDH build number | CDH_COMPONENTS_HOME | "${IMPALA_HOME}/toolchain/cdh_components-${CDH_BUILD_NUMBER}" | Location of the CDH components within the toolchain. | -| CDH_MAJOR_VERSION | "5" | Identifier used to uniqueify paths for potentially incompatible component builds. | +| CDH_MAJOR_VERSION | "7" | Identifier used to uniqueify paths for potentially incompatible component builds. | | IMPALA_CONFIG_SOURCED | "1" | Set by ${IMPALA_HOME}/bin/impala-config.sh (internal use) | | JAVA_HOME | "/usr/lib/jvm/${JAVA_VERSION}" | Used to locate Java | | JAVA_VERSION | "java-7-oracle-amd64" | Can override to set a local Java version. | @@ -59,11 +59,11 @@ can do so through the environment variables and scripts listed below. 
## Dependencies | Environment variable | Default value | Description | |--|---|-| -| HADOOP_HOME | "${CDH_COMPONENTS_HOME}/hadoop-${IMPALA_HADOOP_VERSION}/" | Used to locate Hadoop | +| HADOOP_HOME | "${CDP_COMPONENTS_HOME}/hadoop-${IMPALA_HADOOP_VERSION}/" | Used to locate Hadoop | | HADOOP_INCLUDE_DIR | "${HADOOP_HOME}/include" | For 'hdfs.h' | | HADOOP_LIB_DIR | "${HADOOP_HOME}/lib" | For 'libhdfs.a' or 'libhdfs.so' | -| HIVE_HOME| "${CDH_COMPONENTS_HOME}/{hive-${IMPALA_HIVE_VERSION}/" | | -| HBASE_HOME | "${CDH_COMPONENTS_HOME}/hbase-${IMPALA_HBASE_VERSION}/" | | -| SENTRY_HOME | "${CDH_COMPONENTS_HOME}/sentry-${IMPALA_SENTRY_VERSION}/" | Used to setup test data | +| HIVE_HOME| "${CDP_COMPONENTS_HOME}/{hive-${IMPALA_HIVE_VERSION}/" | | +| HBASE_HOME | "${CDP_COMPONENTS_HOME}/hbase-${IMPALA_HBASE_VERSION}/" | | +| SENTRY_HOME | "${CDP_COMPONENTS_HOME}/sentry-${IMPALA_SENTRY_VERSION}/" | Used to setup test data | | THRIFT_HOME | "${IMPALA_TOOLCHAIN}/thrift-${IMPALA_THRIFT_VERSION}" | | diff --git a/bin/bootstrap_too
[impala] branch master updated: IMPALA-9639: [DOCS] Document Impala support for Kudu DATE type
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 39c5c4d IMPALA-9639: [DOCS] Document Impala support for Kudu DATE type 39c5c4d is described below commit 39c5c4d01db7ad60d21d8df6b681738a3f3b09b1 Author: Kris Hahn AuthorDate: Thu Apr 9 16:51:15 2020 -0700 IMPALA-9639: [DOCS] Document Impala support for Kudu DATE type Documented read/write support for DATE type in 3.4. Made review changes. Change-Id: I865599587817358b0c94debfcb0e9644fab4ae00 Reviewed-on: http://gerrit.cloudera.org:8080/15702 Tested-by: Impala Public Jenkins Reviewed-by: Tamas Mate Reviewed-by: Thomas Tauber-Marshall --- docs/topics/impala_date.xml | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/topics/impala_date.xml b/docs/topics/impala_date.xml index 556936c..8fa4561 100644 --- a/docs/topics/impala_date.xml +++ b/docs/topics/impala_date.xml @@ -42,8 +42,8 @@ under the License. Use the DATE data type to store date values. The -DATE type is supported for HBase, Text, Avro, and - Parquet. +DATE type is supported for Avro, HBase, Kudu, Parquet, + and Text. Range: @@ -199,6 +199,8 @@ under the License. The DATE type is available in Impala 3.3 and higher. + +In Impala 3.4, you can read and write DATE values to Kudu tables.
[impala] 01/02: IMPALA-9539: Enable CNF rewrites by default
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit d0325b2ac176d536fc9c8959b1f2d01335bc32a2 Author: Aman Sinha AuthorDate: Fri Apr 24 17:32:38 2020 -0700 IMPALA-9539: Enable CNF rewrites by default This patch enables the conjunctive normal form rewrites by default by setting enable_cnf_rewrites to true. Because the CNF rule explicitly analyzes the predicate if it was not previously analyzed, we were previously returning the analyzed predicate even when no rewrite was done. This causes some side effects, so I have fixed it by returning the original un-analyzed predicate when no rewrite is done. Other functional and performance testing with this flag set to true did not uncover major regressions and showed significant performance gains for queries with disjunctions in the tpch and tpcds suites. Testing: - Updated the PlannerTest tests with plan changes in various test suites. Removed previously added tpch tests which were explicitly setting this flag to true. - I had previously added a test in convert-to-cnf.test with enable_cnf_rewrites=false, so I did not add any new tests with this flag disabled.
Change-Id: I4dde86e092c61d71ddf9081f768072ced470b589 Reviewed-on: http://gerrit.cloudera.org:8080/15807 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- common/thrift/ImpalaInternalService.thrift | 2 +- .../apache/impala/rewrite/ConvertToCNFRule.java| 6 +- .../org/apache/impala/planner/PlannerTest.java | 6 +- .../queries/PlannerTest/constant-folding.test | 23 +- .../queries/PlannerTest/tpcds-all.test | 4 +- .../queries/PlannerTest/tpch-all.test | 376 ++--- .../queries/PlannerTest/tpch-kudu.test | 70 ++-- .../queries/PlannerTest/tpch-nested.test | 220 ++-- .../queries/PlannerTest/tpch-views.test| 222 ++-- 9 files changed, 386 insertions(+), 543 deletions(-) diff --git a/common/thrift/ImpalaInternalService.thrift b/common/thrift/ImpalaInternalService.thrift index 1ca85b4..d447d69 100644 --- a/common/thrift/ImpalaInternalService.thrift +++ b/common/thrift/ImpalaInternalService.thrift @@ -412,7 +412,7 @@ struct TQueryOptions { 99: optional i64 preagg_bytes_limit = -1; // See comment in ImpalaService.thrift - 100: optional bool enable_cnf_rewrites = false; + 100: optional bool enable_cnf_rewrites = true; // See comment in ImpalaService.thrift 101: optional i32 max_cnf_exprs = 0; diff --git a/fe/src/main/java/org/apache/impala/rewrite/ConvertToCNFRule.java b/fe/src/main/java/org/apache/impala/rewrite/ConvertToCNFRule.java index 9b95f1a..0925fd5 100644 --- a/fe/src/main/java/org/apache/impala/rewrite/ConvertToCNFRule.java +++ b/fe/src/main/java/org/apache/impala/rewrite/ConvertToCNFRule.java @@ -110,11 +110,15 @@ public class ConvertToCNFRule implements ExprRewriteRule { // we can skip the rewrite since the disjunct can be pushed down as-is List tids = new ArrayList<>(); if (!cpred.isAnalyzed()) { + // clone before analyzing to avoid side effects of analysis + cpred = (CompoundPredicate) (cpred.clone()); cpred.analyzeNoThrow(analyzer); } cpred.getIds(tids, null); if (tids.size() <= 1) { - return cpred; + // if no transform is done, return the 
original predicate, + // not the one that that may have been analyzed above + return pred; } } if (cpred.getOp() == CompoundPredicate.Operator.OR) { diff --git a/fe/src/test/java/org/apache/impala/planner/PlannerTest.java b/fe/src/test/java/org/apache/impala/planner/PlannerTest.java index c452332..d0cbb3f 100644 --- a/fe/src/test/java/org/apache/impala/planner/PlannerTest.java +++ b/fe/src/test/java/org/apache/impala/planner/PlannerTest.java @@ -129,7 +129,7 @@ public class PlannerTest extends PlannerTestBase { } @Test - public void testConstantPropagataion() { + public void testConstantPropagation() { runPlannerTestFile("constant-propagation"); } @@ -1016,8 +1016,6 @@ public class PlannerTest extends PlannerTestBase { */ @Test public void testConvertToCNF() { -TQueryOptions options = new TQueryOptions(); -options.setEnable_cnf_rewrites(true); -runPlannerTestFile("convert-to-cnf", "tpch_parquet", options); +runPlannerTestFile("convert-to-cnf", "tpch_parquet"); } } diff --git a/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test b/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test index 0c9..6eeb8c0 100644 --- a/testdat
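The clone-before-analyze pattern in the ConvertToCNFRule fix above generalizes: analysis mutates the expression, so a rule that must analyze a predicate just to decide whether to rewrite should analyze a copy and hand back the caller's untouched original when no transform applies. A minimal sketch, in C++ rather than the planner's Java, with hypothetical names (Expr, Analyze, MaybeRewrite are not Impala's API); analysis is reduced to setting a flag:

```cpp
#include <cassert>

// Hypothetical expression node. Analysis is a mutation: it sets analyzed.
struct Expr {
  bool analyzed = false;
  int table_refs = 1;  // stand-in for the number of referenced table ids
};

void Analyze(Expr* e) { e->analyzed = true; }

// Rewrite rule skeleton. Returns `pred` itself when no rewrite applies;
// crucially, `pred` is never mutated, only the local clone is analyzed.
const Expr* MaybeRewrite(const Expr* pred) {
  Expr copy = *pred;  // clone before analyzing to avoid side effects
  if (!copy.analyzed) Analyze(&copy);
  if (copy.table_refs <= 1) {
    // No transform needed: return the original, still-unanalyzed predicate,
    // not the clone that was analyzed above.
    return pred;
  }
  // (A real rule would build and return the rewritten expression here; this
  // sketch only demonstrates the no-rewrite path.)
  return pred;
}
```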
[impala] branch master updated (1a36a03 -> 53ff6f9)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from 1a36a03 IMPALA-9398: Fix shell history duplication when cmdloop breaks new d0325b2 IMPALA-9539: Enable CNF rewrites by default new 53ff6f9 IMPALA-9649: Exclude shiro* and add to banned dependency maven plugin The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: common/thrift/ImpalaInternalService.thrift | 2 +- fe/pom.xml | 50 +- .../apache/impala/rewrite/ConvertToCNFRule.java| 6 +- .../impala/authorization/AuthorizationTest.java| 111 .../org/apache/impala/planner/PlannerTest.java | 6 +- .../TestSentryResourceAuthorizationProvider.java | 36 -- .../queries/PlannerTest/constant-folding.test | 23 +- .../queries/PlannerTest/tpcds-all.test | 4 +- .../queries/PlannerTest/tpch-all.test | 376 -- .../queries/PlannerTest/tpch-kudu.test | 70 +-- .../queries/PlannerTest/tpch-nested.test | 220 .../queries/PlannerTest/tpch-views.test| 222 tests/authorization/test_authorization.py | 29 -- tests/authorization/test_grant_revoke.py | 47 -- tests/authorization/test_owner_privileges.py | 571 - tests/authorization/test_sentry.py | 53 -- tests/authorization/test_show_grant.py | 150 -- 17 files changed, 429 insertions(+), 1547 deletions(-) delete mode 100644 fe/src/test/java/org/apache/impala/testutil/TestSentryResourceAuthorizationProvider.java delete mode 100644 tests/authorization/test_owner_privileges.py delete mode 100644 tests/authorization/test_show_grant.py
[impala] 02/02: IMPALA-9649: Exclude shiro* and add to banned dependency maven plugin
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 53ff6f9bf5ca907f15ec1187eb5d4007d46eb61e Author: David Knupp AuthorDate: Thu Apr 23 17:06:42 2020 -0700 IMPALA-9649: Exclude shiro* and add to banned dependency maven plugin The earlier attempt to exclude the shiro-core and shiro-crypto-cipher jars from fe/pom.xml failed to find all instances, and security scans picked them up again. This patch also excludes the jar from the following: - sentry-core-common - sentry-provider-cache - sentry-provider-db - sentry-provider-file Furthermore, to avoid compilation errors related to the absence of shiro, it was necessary to remove the TestSentryResourceAuthorizationProvider class, and any tests that referenced it. Since Sentry is not being used any longer, this shouldn't be an issue. Tested by running build, which didn't fail from banned dependency plugin, as well as running the standard set of tests on jenkins.impala.io. Change-Id: I9f9994bf81c1d2e025a03925e8eccb147c34d66e Reviewed-on: http://gerrit.cloudera.org:8080/15796 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins --- fe/pom.xml | 50 +- .../impala/authorization/AuthorizationTest.java| 111 .../TestSentryResourceAuthorizationProvider.java | 36 -- tests/authorization/test_authorization.py | 29 -- tests/authorization/test_grant_revoke.py | 47 -- tests/authorization/test_owner_privileges.py | 571 - tests/authorization/test_sentry.py | 53 -- tests/authorization/test_show_grant.py | 150 -- 8 files changed, 43 insertions(+), 1004 deletions(-) diff --git a/fe/pom.xml b/fe/pom.xml index 61e26a2..94c0cf8 100644 --- a/fe/pom.xml +++ b/fe/pom.xml @@ -192,10 +192,17 @@ under the License. + org.apache.impala + yarn-extras + ${yarn-extras.version} + + + org.apache.sentry sentry-core-common ${sentry.version} + org.apache.shiro shiro-crypto-cipher @@ -208,12 +215,6 @@ under the License.
- org.apache.impala - yarn-extras - ${yarn-extras.version} - - - org.apache.sentry sentry-core-model-db ${sentry.version} @@ -237,6 +238,16 @@ under the License. sentry-provider-db ${sentry.version} + + + org.apache.shiro + shiro-crypto-cipher + + + org.apache.shiro + shiro-core + + net.minidev @@ -269,6 +280,17 @@ under the License. org.apache.sentry sentry-provider-file ${sentry.version} + + + + org.apache.shiro + shiro-crypto-cipher + + + org.apache.shiro + shiro-core + + @@ -276,6 +298,16 @@ under the License. sentry-provider-cache ${sentry.version} + + + org.apache.shiro + shiro-crypto-cipher + + + org.apache.shiro + shiro-core + + net.minidev @@ -349,7 +381,7 @@ under the License. org.apache.hadoop hadoop-common - + org.apache.hive * @@ -758,6 +790,9 @@ under the License. org.fusesource.leveldbjni:* org.apache.httpcomponents:fluent-hc + +org.apache.shiro:shiro-core:* +org.apache.shiro:shiro-crypto-cipher:* org.apache.hadoop:* @@ -1380,6 +1415,7 @@ under the License. javax.el 3.0.1-b08 + diff --git a/fe/src/test/java/org/apache/impala/authorization/AuthorizationTest.java b/fe/src/test/java/org/apache/impala/authorization/AuthorizationTest.java index 620246f..fda304d 100644 --- a/fe/src/test/java/org/apache/impala/authorization/AuthorizationTest.java +++ b/fe/src/test/java/org/apache/impala/authorization/AuthorizationTest.java @@ -46,7 +46,6 @@ import org.apache.impala.common.ImpalaException; import org.apache.impala.common.InternalException; import org.apache.impala.common.RuntimeEnv; import org.apache.impala.testutil.TestSentryGroupMapper; -import org.apache.impala.testutil.TestSentryResourceAuthorizationProvider; import org.apache.impala.service.Frontend; import org.apache.impala.testutil.ImpaladTestCatalog; import org.apache.impala.thrift.TMetadataOpRequest; @@ -519,79 +518,6 @@ public class AuthorizationTest extends FrontendTestBase { } @Test - public void
[impala] 02/02: IMPALA-9701: fix data race in BTS
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit f4258b5f971f90390b93aa7a2e76dd0b8a1d8825
Author: Tim Armstrong
AuthorDate: Mon Apr 27 16:52:40 2020 -0700

    IMPALA-9701: fix data race in BTS

    A benign data race in BufferedTupleStream was flagged by TSAN.

    Testing:
    Reran the unit test under TSAN; it succeeded.

    Change-Id: Ie2c4464adbc51bb8b0214ba0adbfa71217b87c86
    Reviewed-on: http://gerrit.cloudera.org:8080/15826
    Reviewed-by: Impala Public Jenkins
    Tested-by: Impala Public Jenkins
---
 be/src/runtime/buffered-tuple-stream.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/be/src/runtime/buffered-tuple-stream.cc b/be/src/runtime/buffered-tuple-stream.cc
index a35a89e..1ca8a92 100644
--- a/be/src/runtime/buffered-tuple-stream.cc
+++ b/be/src/runtime/buffered-tuple-stream.cc
@@ -1079,7 +1079,9 @@ void BufferedTupleStream::ReadIterator::Init(bool attach_on_read) {
   valid_ = true;
   rows_returned_ = 0;
   DCHECK(!attach_on_read_) << "attach_on_read can only be set once";
-  attach_on_read_ = attach_on_read;
+  // Only set 'attach_on_read' if needed. Otherwise, if this is the builtin
+  // iterator, a benign data race may be flagged by TSAN (see IMPALA-9701).
+  if (attach_on_read) attach_on_read_ = attach_on_read;
 }

 void BufferedTupleStream::ReadIterator::SetReadPage(list<Page>::iterator read_page) {
[impala] branch master updated (afe765e -> f4258b5)
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.

    from afe765e  Don't filter maven messages about banned dependencies
     new 75a6d7b  IMPALA-9097: Don't require minicluster for backend tests
     new f4258b5  IMPALA-9701: fix data race in BTS

The 2 revisions listed above as "new" are entirely new to this repository and
will be described in separate emails. The revisions listed as "add" were
already present in the repository and have only been added to this reference.

Summary of changes:
 be/src/runtime/buffered-tuple-stream.cc            |  4 +++-
 be/src/service/frontend.cc                         |  8 ++--
 .../java/org/apache/impala/service/Frontend.java   | 23 ++
 .../org/apache/impala/service/JniFrontend.java     |  5 +++--
 4 files changed, 27 insertions(+), 13 deletions(-)
[impala] 01/02: IMPALA-9097: Don't require minicluster for backend tests
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 75a6d7b2bba66825efb3a37f14c9447e64ea584f
Author: Joe McDonnell
AuthorDate: Fri Nov 22 17:58:16 2019 -0800

    IMPALA-9097: Don't require minicluster for backend tests

    Currently, many backend tests require a running minicluster, because
    they initialize a Frontend object that requires a connection to the
    Hive Metastore. If the minicluster is not running or if cluster
    configurations are missing (i.e. bin/create-test-configurations.sh
    needs to run), the backend tests will fail. The docker-based tests
    always hit this, because they run the backend tests without a
    minicluster.

    The HMS dependency comes from the Frontend's MetaStoreClientPool,
    which is unnecessary for backend tests. This modifies the code so
    that it does not initialize this for backend tests, and thus backend
    tests pass without a running minicluster.

    Testing:
    - Ran backend tests without a running minicluster

    Change-Id: I8f1b1385853fb23df28d24d38761237e6e5c97a7
    Reviewed-on: http://gerrit.cloudera.org:8080/15641
    Reviewed-by: Impala Public Jenkins
    Tested-by: Impala Public Jenkins
---
 be/src/service/frontend.cc                         |  8 ++--
 .../java/org/apache/impala/service/Frontend.java   | 23 ++
 .../org/apache/impala/service/JniFrontend.java     |  5 +++--
 3 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/be/src/service/frontend.cc b/be/src/service/frontend.cc
index baf1089..346ff0c 100644
--- a/be/src/service/frontend.cc
+++ b/be/src/service/frontend.cc
@@ -25,6 +25,7 @@
 #include "rpc/jni-thrift-util.h"
 #include "util/backend-gflag-util.h"
 #include "util/jni-util.h"
+#include "util/test-info.h"
 #include "util/time.h"
 #include "common/names.h"
@@ -81,7 +82,7 @@ DEFINE_string(kudu_master_hosts, "", "Specifies the default Kudu master(s). The
 Frontend::Frontend() {
   JniMethodDescriptor methods[] = {
-    {"<init>", "([B)V", &fe_ctor_},
+    {"<init>", "([BZ)V", &fe_ctor_},
     {"createExecRequest", "([B)[B", &create_exec_request_id_},
     {"getExplainPlan", "([B)Ljava/lang/String;", &get_explain_plan_id_},
     {"getHadoopConfig", "([B)[B", &get_hadoop_config_id_},
@@ -130,7 +131,10 @@ Frontend::Frontend() {
   jbyteArray cfg_bytes;
   ABORT_IF_ERROR(GetThriftBackendGflags(jni_env, &cfg_bytes));
-  jobject fe = jni_env->NewObject(fe_class, fe_ctor_, cfg_bytes);
+  // Pass in whether this is a backend test, so that the Frontend can avoid certain
+  // unnecessary initialization that introduces dependencies on a running minicluster.
+  jboolean is_be_test = TestInfo::is_be_test();
+  jobject fe = jni_env->NewObject(fe_class, fe_ctor_, cfg_bytes, is_be_test);
   ABORT_IF_EXC(jni_env);
   ABORT_IF_ERROR(JniUtil::LocalToGlobalRef(jni_env, fe, &fe_));
 }
diff --git a/fe/src/main/java/org/apache/impala/service/Frontend.java b/fe/src/main/java/org/apache/impala/service/Frontend.java
index 1715233..d4ed406 100644
--- a/fe/src/main/java/org/apache/impala/service/Frontend.java
+++ b/fe/src/main/java/org/apache/impala/service/Frontend.java
@@ -281,8 +281,9 @@ public class Frontend {
   private static ExecutorService checkAuthorizationPool_;

-  public Frontend(AuthorizationFactory authzFactory) throws ImpalaException {
-    this(authzFactory, FeCatalogManager.createFromBackendConfig());
+  public Frontend(AuthorizationFactory authzFactory, boolean isBackendTest)
+      throws ImpalaException {
+    this(authzFactory, FeCatalogManager.createFromBackendConfig(), isBackendTest);
   }

   /**
@@ -292,11 +293,12 @@ public class Frontend {
   @VisibleForTesting
   public Frontend(AuthorizationFactory authzFactory, FeCatalog testCatalog)
       throws ImpalaException {
-    this(authzFactory, FeCatalogManager.createForTests(testCatalog));
+    // This signature is only used for frontend tests, so pass false for isBackendTest
+    this(authzFactory, FeCatalogManager.createForTests(testCatalog), false);
   }

-  private Frontend(AuthorizationFactory authzFactory, FeCatalogManager catalogManager)
-      throws ImpalaException {
+  private Frontend(AuthorizationFactory authzFactory, FeCatalogManager catalogManager,
+      boolean isBackendTest) throws ImpalaException {
     catalogManager_ = catalogManager;
     authzFactory_ = authzFactory;
@@ -323,10 +325,15 @@
     impaladTableUsageTracker_ = ImpaladTableUsageTracker.createFromConfig(
         BackendConfig.INSTANCE);
     queryHookManager_ = QueryEventHookManager.createFromConfig(BackendConfig.INSTANCE);
-    metaStoreClientPool_ = new MetaStoreClientPool(1, 0);
-    if (MetastoreShim.getMajor
[impala] 02/02: Don't filter maven messages about banned dependencies
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit afe765e3bdf8facb1940f4c7620eb7f9084bcb1f
Author: Joe McDonnell
AuthorDate: Mon Apr 27 12:01:41 2020 -0700

    Don't filter maven messages about banned dependencies

    The frontend build uses the maven-enforcer-plugin to ban some
    dependencies or require specific versions of dependencies. The
    messages look like:

    Found Banned Dependency: foo.bar.baz:1.2.3

    These are currently filtered by bin/mvn-quiet.sh. This adds an
    exception for "Found Banned" so they are not filtered.

    Testing:
    - Ran on a branch with a known banned dependency and verified the
      output

    Change-Id: I24abe59ad6bffb28ac63d014aa0ec7388ef5478f
    Reviewed-on: http://gerrit.cloudera.org:8080/15820
    Tested-by: Impala Public Jenkins
    Reviewed-by: David Knupp
---
 bin/mvn-quiet.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/bin/mvn-quiet.sh b/bin/mvn-quiet.sh
index cc673da..f782ff4 100755
--- a/bin/mvn-quiet.sh
+++ b/bin/mvn-quiet.sh
@@ -36,7 +36,8 @@ LOGGING_OPTIONS="-Dorg.slf4j.simpleLogger.showDateTime \

 # Always use maven's batch mode (-B), as it produces output that is easier to parse.
 if ! mvn -B $IMPALA_MAVEN_OPTIONS $LOGGING_OPTIONS "$@" | \
-  tee -a "$LOG_FILE" | grep -E -e WARNING -e ERROR -e SUCCESS -e FAILURE -e Test; then
+  tee -a "$LOG_FILE" | \
+  grep -E -e WARNING -e ERROR -e SUCCESS -e FAILURE -e Test -e "Found Banned"; then
   echo "mvn $IMPALA_MAVEN_OPTIONS $@ exited with code $?"
   exit 1
 fi
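The effect of the extra `-e "Found Banned"` pattern can be seen with a self-contained pipe; the sample log lines below are made up for illustration, but the grep invocation is the one used by bin/mvn-quiet.sh after this change:

```shell
# Simulate a few lines of mvn output and run them through the same filter
# that bin/mvn-quiet.sh applies. Only lines matching one of the -e patterns
# survive; the enforcer's "Found Banned" line now makes it through.
printf '%s\n' \
  '[INFO] Building Impala frontend' \
  'Found Banned Dependency: org.apache.shiro:shiro-core:jar:1.4.0' \
  '[ERROR] Rule 0: BannedDependencies failed' |
  grep -E -e WARNING -e ERROR -e SUCCESS -e FAILURE -e Test -e "Found Banned"
# keeps only the "Found Banned" and [ERROR] lines; the [INFO] line is dropped
```

Before the change, the "Found Banned" line matched none of the patterns, so a build failing on a banned dependency produced no hint of the cause in the filtered output.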
[impala] 01/02: IMPALA-9613: [DOCS] Document the data_cache_eviction_policy
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 961519747bf75bc2ba8519d00757b8e176d14538
Author: Kris Hahn
AuthorDate: Wed Apr 8 20:39:00 2020 -0700

    IMPALA-9613: [DOCS] Document the data_cache_eviction_policy

    Describe the start-up flag to set the LRU or LIRS policy. Tweak the
    LIRS description.

    Change-Id: Ic46ae00549157535c12f761aff7747fc90249d98
    Reviewed-on: http://gerrit.cloudera.org:8080/15694
    Tested-by: Impala Public Jenkins
    Reviewed-by: Joe McDonnell
---
 docs/topics/impala_data_cache.xml | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/docs/topics/impala_data_cache.xml b/docs/topics/impala_data_cache.xml
index fed4181..210b181 100644
--- a/docs/topics/impala_data_cache.xml
+++ b/docs/topics/impala_data_cache.xml
@@ -85,6 +85,17 @@ under the License.
 --data_cache=/data/0,/data/1:500GB
+ In Impala 3.4 and higher, you can configure one of the following cache
+ eviction policies for the data cache:
+LRU (Least Recently Used--the default)
+LIRS (Low Inter-reference Recency Set)
+ LIRS is a scan-resistant, low performance-overhead policy. You configure a cache
+ eviction policy using the --data_cache_eviction_policy Impala Daemon start-up
+ flag:
+
+--data_cache_eviction_policy=policy
+
+
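Putting the two flags from this doc change together, an impalad start-up line might look like the following; the cache paths and quota are illustrative, only the flag names and the LIRS policy value come from the documented change:

```
impalad --data_cache=/data/0,/data/1:500GB \
        --data_cache_eviction_policy=LIRS
```

With `--data_cache_eviction_policy` omitted, the cache uses the default LRU policy, so only deployments that want scan resistance need to set it.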
[impala] branch master updated (c4ac9d2 -> afe765e)
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.

    from c4ac9d2  Revert "IMPALA-9648: Exclude netty and netty-all from hadoop-hdfs mvn download"
     new 9615197  IMPALA-9613: [DOCS] Document the data_cache_eviction_policy
     new afe765e  Don't filter maven messages about banned dependencies

The 2 revisions listed above as "new" are entirely new to this repository and
will be described in separate emails. The revisions listed as "add" were
already present in the repository and have only been added to this reference.

Summary of changes:
 bin/mvn-quiet.sh                  |  3 ++-
 docs/topics/impala_data_cache.xml | 11 +++
 2 files changed, 13 insertions(+), 1 deletion(-)
[impala] 02/03: Revert "IMPALA-9648: Don't ban netty 3* from fe/pom.xml"
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 326ef554caef75c50a1394240766b501e5a699c3
Author: David Knupp
AuthorDate: Mon Apr 27 15:38:49 2020 -0700

    Revert "IMPALA-9648: Don't ban netty 3* from fe/pom.xml"

    This patch was leading to CI builds failing in some environments.

    This reverts commit f129a179a2c1b304e4d15fe4950449c5786abda1.

    Change-Id: I4f38cab4deb0d9457d50d1e1a899af4cb90d3c24
    Reviewed-on: http://gerrit.cloudera.org:8080/15824
    Reviewed-by: David Knupp
    Tested-by: David Knupp
---
 fe/pom.xml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fe/pom.xml b/fe/pom.xml
index 59d6eeb..70d17d6 100644
--- a/fe/pom.xml
+++ b/fe/pom.xml
@@ -758,6 +758,8 @@ under the License.
 org.fusesource.leveldbjni:*
 org.apache.httpcomponents:fluent-hc
+
+io.netty:netty:[3.10.6,)
 io.netty:netty-all:[4.1.46,)
[impala] 01/03: IMPALA-9640: [DOCS] Document Impala support for Kudu VARCHAR type
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit b9e84738b68710e00afb7efb3cf0c25c0582f7f8
Author: Kris Hahn
AuthorDate: Thu Apr 9 17:37:42 2020 -0700

    IMPALA-9640: [DOCS] Document Impala support for Kudu VARCHAR type

    Removed VARCHAR from unsupported types in "Kudu considerations".

    Change-Id: I61ad6982c35a009b15a2a082692f118a0fbcee65
    Reviewed-on: http://gerrit.cloudera.org:8080/15703
    Tested-by: Impala Public Jenkins
    Reviewed-by: Tamas Mate
    Reviewed-by: Joe McDonnell
---
 docs/shared/impala_common.xml | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml
index cdf25d8..6b0e812 100644
--- a/docs/shared/impala_common.xml
+++ b/docs/shared/impala_common.xml
@@ -4509,10 +4509,9 @@ sudo pip-python install ssl
 Currently, the INSERT OVERWRITE syntax cannot be used with Kudu tables.
-
-Currently, the data types CHAR, VARCHAR,
-ARRAY, MAP, and STRUCT cannot be used
-with Kudu tables.
+ Currently, the data types
+CHAR, ARRAY, MAP, and
+ STRUCT cannot be used with Kudu tables.