[impala] 01/02: IMPALA-9711: incrementally update aggregate profile
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git

commit e60292fb3bd71f25b90119d0d48292f4c49e158f
Author: Tim Armstrong
AuthorDate: Fri May 15 10:02:15 2020 -0700

IMPALA-9711: incrementally update aggregate profile

To avoid additional work in the default mode, we still compute the average only once per instance, when the instance completes or when the query finishes. When --gen_experimental_profile=true, we update the aggregated profile for each status report, so that the live profile can be viewed while the query executes. The implications of this are as follows:

* More work is done on the KRPC control service RPC thread (although this is largely moot after part 2 of IMPALA-9382, where we merge into the aggregated profile directly and so avoid the extra update).
* For complex multi-stage queries, the profile merging work is done earlier, as each stage completes, so the critical path of the query is shortened.
* Multiple RPC threads may be merging profiles concurrently.
* Multiple threads may be calling AggregatedRuntimeProfile::Update() on the same profile, whereas previously all merging was done by a single thread. I looked through the locking in that function to check correctness.

Testing:
Ran core tests. Ran a subset of the Python tests under TSAN and confirmed no races were introduced in this code.
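The concurrency implication in the last bullet — several report-handling threads updating one aggregate — can be pictured as a lock-guarded accumulator. This is a conceptual Python sketch only; the class and counter names are illustrative, not Impala's C++ internals:

```python
import threading

class AggregatedProfile:
    """Toy stand-in for an aggregated runtime profile that several
    status-report threads may update concurrently."""

    def __init__(self):
        self._lock = threading.Lock()
        self._reports = 0
        self._total_time_ns = 0

    def update(self, instance_time_ns):
        # With live aggregation enabled, every status report merges into the
        # aggregate, so updates from different RPC threads are serialized
        # by a lock.
        with self._lock:
            self._reports += 1
            self._total_time_ns += instance_time_ns

    def avg_time_ns(self):
        with self._lock:
            return self._total_time_ns // self._reports if self._reports else 0

prof = AggregatedProfile()
threads = [threading.Thread(target=prof.update, args=(1000,)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
```

The same lock protects both the writers and the reader, which mirrors why the commit author re-audited the locking in AggregatedRuntimeProfile::Update().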
Change-Id: Ib03e79a40a33d8e74464640ae5f95a1467a6713a
Reviewed-on: http://gerrit.cloudera.org:8080/15931
Reviewed-by: Tim Armstrong
Tested-by: Impala Public Jenkins
---
 be/src/runtime/coordinator-backend-state.cc | 71 -
 be/src/runtime/coordinator-backend-state.h  | 36 +--
 be/src/runtime/coordinator.cc               |  4 +-
 be/src/util/runtime-profile.cc              |  1 +
 4 files changed, 85 insertions(+), 27 deletions(-)

diff --git a/be/src/runtime/coordinator-backend-state.cc b/be/src/runtime/coordinator-backend-state.cc
index 440a5f0..602bfca 100644
--- a/be/src/runtime/coordinator-backend-state.cc
+++ b/be/src/runtime/coordinator-backend-state.cc
@@ -381,7 +381,8 @@ bool Coordinator::BackendState::ApplyExecStatusReport(
     const ReportExecStatusRequestPB& backend_exec_status,
     const TRuntimeProfileForest& thrift_profiles, ExecSummary* exec_summary,
     ProgressUpdater* scan_range_progress, DmlExecState* dml_exec_state,
-    vector* aux_error_info) {
+    vector* aux_error_info,
+    const vector& fragment_stats) {
   DCHECK(!IsEmptyBackend());
   // Hold the exec_summary's lock to avoid exposing it half-way through
   // the update loop below.
@@ -478,6 +479,10 @@ bool Coordinator::BackendState::ApplyExecStatusReport(
   backend_utilization_.exchange_bytes_sent = backend_exec_status.exchange_bytes_sent();
   backend_utilization_.scan_bytes_sent = backend_exec_status.scan_bytes_sent();

+  // Update state that depends on the instance profile updates we just received.
+  // Skip this in the edge case where the exec RPC didn't complete.
+  if (exec_done_) UpdateExecStatsLocked(lock, fragment_stats, /*finalize=*/false);
+
   // status_ has incorporated the status from all fragment instances. If the overall
   // backend status is not OK, but no specific fragment instance reported an error, then
   // this is a general backend error. Incorporate the general error into status_.
@@ -502,29 +507,23 @@ void Coordinator::BackendState::UpdateHostProfile(
 }

 void Coordinator::BackendState::UpdateExecStats(
-    const vector& fragment_stats) {
-  lock_guard l(lock_);
+    const vector& fragment_stats, bool finalize) {
+  unique_lock l(lock_);
+  UpdateExecStatsLocked(l, fragment_stats, finalize);
+}
+
+void Coordinator::BackendState::UpdateExecStatsLocked(const unique_lock& lock,
+    const vector& fragment_stats, bool finalize) {
+  DCHECK(lock.owns_lock() && lock.mutex() == _);
   DCHECK(exec_done_) << "May only be called after WaitOnExecRpc() completes.";
-  for (const auto& entry: instance_stats_map_) {
-    const InstanceStats& instance_stats = *entry.second;
-    int fragment_idx = instance_stats.exec_params_.fragment_idx();
-    DCHECK_LT(fragment_idx, fragment_stats.size());
-    FragmentStats* f = fragment_stats[fragment_idx];
-    int64_t completion_time = instance_stats.stopwatch_.ElapsedTime();
-    RuntimeProfile::Counter* completion_timer =
-        PROFILE_CompletionTime.Instantiate(instance_stats.profile_);
-    completion_timer->Set(completion_time);
-    if (!FLAGS_gen_experimental_profile) f->completion_times_(completion_time);
-    if (completion_time > 0) {
-      RuntimeProfile::Counter* execution_rate_counter =
-          PROFILE_ExecutionRate.Instantiate(instance_stats.profil
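The refactor above splits UpdateExecStats() into a public locking wrapper and an UpdateExecStatsLocked() body whose caller must already hold the lock, asserting that precondition up front. A minimal Python analogue of that idiom (names here are illustrative, not the C++ class's members):

```python
import threading

class BackendState:
    def __init__(self):
        self._lock = threading.Lock()
        self.completion_times = []

    def update_exec_stats(self, completion_time, finalize=False):
        # Public entry point: acquire the lock, then delegate to the
        # already-locked variant.
        with self._lock:
            self._update_exec_stats_locked(completion_time, finalize)

    def _update_exec_stats_locked(self, completion_time, finalize):
        # Precondition check mirroring DCHECK(lock.owns_lock()): the caller
        # must already hold self._lock when invoking this method directly
        # (e.g. from a report handler that already took the lock).
        assert self._lock.locked(), "caller must hold the lock"
        self.completion_times.append(completion_time)

state = BackendState()
state.update_exec_stats(42)
```

The split lets ApplyExecStatusReport(), which already holds the lock, call the locked variant directly without re-acquiring it.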
[impala] 02/02: Pin the json-smart version to 2.3
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git

commit d453d52aadcbd158147b906813b22eb2944ac90b
Author: Joe McDonnell
AuthorDate: Thu Oct 1 17:38:25 2020 -0700

Pin the json-smart version to 2.3

With some maven repositories, Impala builds have been picking up json-smart version 2.3-SNAPSHOT. This is not intentional (and it does not reproduce with public repositories). To improve the consistency of the build, pin the json-smart version to 2.3, with appropriate exclusions to prevent alternate versions.

This also fixes bin/jenkins/get_maven_statistics.sh to handle cases where maven didn't download anything.

Testing:
- Ran core job

Change-Id: Iff92a61c9c3164e7e0c63c7569178415dcba9fb4
Reviewed-on: http://gerrit.cloudera.org:8080/16536
Tested-by: Impala Public Jenkins
Reviewed-by: Joe McDonnell
---
 bin/jenkins/get_maven_statistics.sh | 16 +++---
 fe/pom.xml                          | 62 +
 shaded-deps/hive-exec/pom.xml       | 13
 3 files changed, 87 insertions(+), 4 deletions(-)

diff --git a/bin/jenkins/get_maven_statistics.sh b/bin/jenkins/get_maven_statistics.sh
index aff473f..b7a14a9 100755
--- a/bin/jenkins/get_maven_statistics.sh
+++ b/bin/jenkins/get_maven_statistics.sh
@@ -32,11 +32,19 @@ MVN_LOG=$1

 # Dump how many artifacts were downloaded from each repo
 echo "Number of artifacts downloaded from each repo:"
-cat "${MVN_LOG}" | grep "Downloaded from" | sed 's|.* Downloaded from ||' \
-  | cut -d: -f1 | sort | uniq -c
+if grep -q "Downloaded from" "${MVN_LOG}"; then
+  cat "${MVN_LOG}" | grep "Downloaded from" | sed 's|.* Downloaded from ||' \
+    | cut -d: -f1 | sort | uniq -c
+else
+  echo "No artifacts downloaded"
+fi

 # Dump how many artifacts we tried to download from each repo
 echo
 echo "Number of download attempts (successful or unsuccessful) per repo:"
-cat "${MVN_LOG}" | grep "Downloading from" | sed 's|.* Downloading from ||' \
-  | cut -d: -f1 | sort | uniq -c
+if grep -q "Downloading from" "${MVN_LOG}"; then
+  cat "${MVN_LOG}" | grep "Downloading from" | sed 's|.* Downloading from ||' \
+    | cut -d: -f1 | sort | uniq -c
+else
+  echo "No downloads attempted"
+fi

diff --git a/fe/pom.xml b/fe/pom.xml
index 7729651..00b9c61 100644
--- a/fe/pom.xml
+++ b/fe/pom.xml
@@ -35,6 +35,13 @@ under the License. Apache Impala Query Engine Frontend + + + net.minidev + json-smart + 2.3 + + org.apache.impala query-event-hook-api
@@ -58,6 +65,11 @@ under the License. hadoop-common ${hadoop.version} + + + net.minidev + json-smart + org.eclipse.jetty *
@@ -82,6 +94,13 @@ under the License. org.apache.hadoop hadoop-auth ${hadoop.version} + + + + net.minidev + json-smart + +
@@ -120,6 +139,13 @@ under the License. org.apache.hadoop hadoop-azure-datalake ${hadoop.version} + + + + net.minidev + json-smart + +
@@ -160,6 +186,13 @@ under the License. org.apache.ranger ranger-plugins-common ${ranger.version} + + + + net.minidev + json-smart + +
@@ -179,6 +212,11 @@ under the License. org.eclipse.jetty * + + + net.minidev + json-smart +
@@ -243,12 +281,26 @@ under the License. org.apache.hbase hbase-client ${hbase.version} + + + + net.minidev + json-smart + + org.apache.hbase hbase-common ${hbase.version} + + + + net.minidev + json-smart + +
@@ -895,6 +947,11 @@ under the License. io.netty * + + + net.minidev + json-smart +
@@ -911,6 +968,11 @@ under the License. org.apache.logging.log4j log4j-1.2-api + + + net.minidev + json-smart + org.apache.hive hive-serde
diff --git a/shaded-deps/hive-exec/pom.xml b/shaded-deps/hive-exec/pom.xml
index 43be1a0..33cb897 100644
--- a/shaded-deps/hive-ex
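The guarded pipeline above avoids emitting nothing at all when the maven log contains no download lines. The same per-repo tally can be sketched in Python, again handling the empty-log case explicitly (function and variable names are illustrative):

```python
import collections
import re

def count_downloads(log_text, verb="Downloaded"):
    # Tally artifacts per repository name, mirroring the
    # grep | sed | cut | sort | uniq -c pipeline in the script above.
    # An empty log simply yields an empty tally instead of no output.
    counts = collections.Counter()
    pattern = re.compile(r"%s from ([^:]+):" % verb)
    for line in log_text.splitlines():
        m = pattern.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

sample = ("Downloaded from central: https://repo1.maven.org/ foo.jar\n"
          "Downloaded from central: https://repo1.maven.org/ bar.jar\n")
repo_counts = count_downloads(sample)
```

Passing `verb="Downloading"` gives the attempt counts, matching the second half of the script.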
[impala] branch master updated (d09294a -> d453d52)
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git.

 from d09294a  IMPALA-10202: Enable file handle cache for ABFS files
  new e60292f  IMPALA-9711: incrementally update aggregate profile
  new d453d52  Pin the json-smart version to 2.3

The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.

Summary of changes:
 be/src/runtime/coordinator-backend-state.cc | 71 -
 be/src/runtime/coordinator-backend-state.h  | 36 +--
 be/src/runtime/coordinator.cc               |  4 +-
 be/src/util/runtime-profile.cc              |  1 +
 bin/jenkins/get_maven_statistics.sh         | 16 +--
 fe/pom.xml                                  | 62 +
 shaded-deps/hive-exec/pom.xml               | 13 ++
 7 files changed, 172 insertions(+), 31 deletions(-)
[impala] 01/02: IMPALA-8291: Show constraints in DESCRIBE FORMATTED
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git

commit 7b55168894d43b8696ac72f50515f6b842556caa
Author: Shant Hovsepian
AuthorDate: Tue Sep 8 15:27:53 2020 -0400

IMPALA-8291: Show constraints in DESCRIBE FORMATTED

Support for displaying primary and foreign key constraints in DESCRIBE FORMATTED output. The output attempts to stay as close to Hive's implementation as possible. Also includes constraint definitions for the TPC-DS test workload.

Testing:
* Fresh load of testdata
* Metadata query tests comparing the output between Impala and Hive

Change-Id: I676b69c465c46491f870d7fdc894e7474c030356
Reviewed-on: http://gerrit.cloudera.org:8080/16428
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
 .../impala/compat/HiveMetadataFormatUtils.java    | 128 ++-
 .../org/apache/impala/compat/MetastoreShim.java   |  11 +-
 .../impala/service/DescribeResultFactory.java     |  11 +
 testdata/datasets/tpcds/tpcds_schema_template.sql | 880 -
 tests/metadata/test_metadata_query_statements.py  |  13 +
 5 files changed, 649 insertions(+), 394 deletions(-)

diff --git a/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java b/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
index 073031e..a2b1a5e 100644
--- a/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
+++ b/fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
@@ -45,6 +45,8 @@ import org.apache.hadoop.hive.metastore.api.LongColumnStatsData;
 import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
 import org.apache.hadoop.hive.metastore.api.StringColumnStatsData;
 import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.ql.metadata.PrimaryKeyInfo;
+import org.apache.hadoop.hive.ql.metadata.ForeignKeyInfo;
 import org.apache.hadoop.hive.serde2.io.DateWritable;

 /**
@@ -98,7 +100,7 @@ public class HiveMetadataFormatUtils {
   private static void formatColumnsHeader(StringBuilder columnInformation,
       List colStats) {
     columnInformation.append("# "); // Easy for shell scripts to ignore
-    formatOutput(getColumnsHeader(colStats), columnInformation, false);
+    formatOutput(getColumnsHeader(colStats), columnInformation, false, true);
     columnInformation.append(LINE_DELIM);
   }

@@ -112,26 +114,36 @@ public class HiveMetadataFormatUtils {
    * contains newlines?
    */
   private static void formatOutput(String[] fields, StringBuilder tableInfo,
-      boolean isLastLinePadded) {
-    int[] paddings = new int[fields.length - 1];
-    if (fields.length > 1) {
-      for (int i = 0; i < fields.length - 1; i++) {
-        if (fields[i] == null) {
-          tableInfo.append(FIELD_DELIM);
-          continue;
+      boolean isLastLinePadded, boolean isFormatted) {
+    if (!isFormatted) {
+      for (int i = 0; i < fields.length; i++) {
+        Object value = StringEscapeUtils.escapeJava(fields[i]);
+        if (value != null) {
+          tableInfo.append(value);
         }
-        tableInfo.append(String.format("%-" + ALIGNMENT + "s", fields[i]))
-            .append(FIELD_DELIM);
-        paddings[i] = ALIGNMENT > fields[i].length() ? ALIGNMENT : fields[i].length();
+        tableInfo.append((i == fields.length - 1) ? LINE_DELIM : FIELD_DELIM);
       }
-    }
-    if (fields.length > 0) {
-      String value = fields[fields.length - 1];
-      String unescapedValue = (isLastLinePadded && value != null) ? value
-          .replaceAll("n|r|rn", "\n") : value;
-      indentMultilineValue(unescapedValue, tableInfo, paddings, false);
     } else {
-      tableInfo.append(LINE_DELIM);
+      int[] paddings = new int[fields.length - 1];
+      if (fields.length > 1) {
+        for (int i = 0; i < fields.length - 1; i++) {
+          if (fields[i] == null) {
+            tableInfo.append(FIELD_DELIM);
+            continue;
+          }
+          tableInfo.append(String.format("%-" + ALIGNMENT + "s", fields[i]))
+              .append(FIELD_DELIM);
+          paddings[i] = ALIGNMENT > fields[i].length() ? ALIGNMENT : fields[i].length();
+        }
+      }
+      if (fields.length > 0) {
+        String value = fields[fields.length - 1];
+        String unescapedValue = (isLastLinePadded && value != null) ? value
+            .replaceAll("n|r|rn", "\n") : value;
+        indentMultilineValue(unescapedValue, tableInfo, paddings, false);
+      } else {
+        tableInfo.append(LINE_DELIM);
+      }
     }
   }

@@ -384,6 +396,75 @@ public class HiveMetadataFormatUtils {
     return null;
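The new isFormatted flag above switches between column-padded, human-readable output and plain delimiter-separated output. A rough Python rendering of the two modes (constants and behavior simplified from the Java shown above):

```python
FIELD_DELIM = "\t"
LINE_DELIM = "\n"
ALIGNMENT = 20

def format_output(fields, formatted=True):
    # formatted=False: emit the fields delimiter-separated on one line,
    # easy for scripts to parse. formatted=True: left-pad every field but
    # the last to a fixed column width, as DESCRIBE FORMATTED output does.
    if not formatted:
        return FIELD_DELIM.join(f or "" for f in fields) + LINE_DELIM
    out = []
    for f in fields[:-1]:
        out.append(FIELD_DELIM if f is None else f.ljust(ALIGNMENT) + FIELD_DELIM)
    out.append((fields[-1] or "") + LINE_DELIM)
    return "".join(out)

row = format_output(["col_name", "data_type", "comment"], formatted=False)
```

The padded mode keeps narrow fields aligned into columns while still delimiting them, which is why the Java version records per-field paddings for multi-line values.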
[impala] branch master updated (40777b7 -> 13f50ea)
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git.

 from 40777b7  IMPALA-9636: Don't run retried query on the blacklisted nodes
  new 7b55168  IMPALA-8291: Show constraints in DESCRIBE FORMATTED
  new 13f50ea  IMPALA-9229: impala-shell 'profile' to show original and retried queries

The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.

Summary of changes:
 be/src/service/client-request-state.cc            |   1 +
 be/src/service/client-request-state.h             |  12 +
 be/src/service/impala-beeswax-server.cc           |   4 +-
 be/src/service/impala-hs2-server.cc               | 112 ++-
 be/src/service/impala-http-handler.cc             |  11 +-
 be/src/service/impala-server.cc                   | 201 +++--
 be/src/service/impala-server.h                    |  67 +-
 common/thrift/ImpalaService.thrift                |  12 +
 .../impala/compat/HiveMetadataFormatUtils.java    | 128 ++-
 .../org/apache/impala/compat/MetastoreShim.java   |  11 +-
 .../impala/service/DescribeResultFactory.java     |  11 +
 shell/impala_client.py                            |  18 +-
 shell/impala_shell.py                             |  60 +-
 testdata/datasets/tpcds/tpcds_schema_template.sql | 880 -
 tests/custom_cluster/test_shell_interactive.py    |  81 +-
 tests/metadata/test_metadata_query_statements.py  |  13 +
 tests/shell/test_shell_commandline.py             |  26 +-
 tests/shell/util.py                               |  25 +
 18 files changed, 1154 insertions(+), 519 deletions(-)
[impala] 02/02: IMPALA-9229: impala-shell 'profile' to show original and retried queries
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git

commit 13f50eaec59d2690dd54acda1bba83eb0aacb972
Author: Sahil Takiar
AuthorDate: Tue Jul 14 09:07:12 2020 -0700

IMPALA-9229: impala-shell 'profile' to show original and retried queries

Currently, the impala-shell 'profile' command only returns the profile for the most recent query attempt. There is no way to get the original query profile (the profile of the first query attempt, the one that failed) from the impala-shell.

This patch modifies TGetRuntimeProfileReq and TGetRuntimeProfileResp to add support for returning both the original and retried profiles for a retried query. When a query is retried, TGetRuntimeProfileResp currently contains the profile for the most recent query attempt. TGetRuntimeProfileReq has a new field called 'include_query_attempts'; when it is set to true, TGetRuntimeProfileResp will include all failed profiles in a new field called failed_profiles / failed_thrift_profiles.

impala-shell has been modified so the 'profile' command has a new set of options. The syntax is now:

PROFILE [ALL | LATEST | ORIGINAL]

If 'ALL' is specified, both the latest and original profiles are printed. If 'LATEST' is specified, only the latest profile is printed. If 'ORIGINAL' is specified, only the original profile is printed. The default behavior is equivalent to specifying 'LATEST' (which is also the behavior before this patch).

Support for this has only been added to HS2, given that Beeswax is being deprecated soon. The new 'profile' options have no effect when the Beeswax protocol is used.

Most of the code change is in impala-hs2-server and impala-server; a lot of the GetRuntimeProfile code has been re-factored.

Testing:
* Added new impala-shell tests
* Ran core tests

Change-Id: I89cee02947b311e7bf9c7274f47dfc7214c1bb65
Reviewed-on: http://gerrit.cloudera.org:8080/16406
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
 be/src/service/client-request-state.cc         |   1 +
 be/src/service/client-request-state.h          |  12 ++
 be/src/service/impala-beeswax-server.cc        |   4 +-
 be/src/service/impala-hs2-server.cc            | 112 +++---
 be/src/service/impala-http-handler.cc          |  11 +-
 be/src/service/impala-server.cc                | 201 +++--
 be/src/service/impala-server.h                 |  67 -
 common/thrift/ImpalaService.thrift             |  12 ++
 shell/impala_client.py                         |  18 ++-
 shell/impala_shell.py                          |  60 ++--
 tests/custom_cluster/test_shell_interactive.py |  81 +-
 tests/shell/test_shell_commandline.py          |  26 +--
 tests/shell/util.py                            |  25 +++
 13 files changed, 505 insertions(+), 125 deletions(-)

diff --git a/be/src/service/client-request-state.cc b/be/src/service/client-request-state.cc
index a55cb37..60acc3c 100644
--- a/be/src/service/client-request-state.cc
+++ b/be/src/service/client-request-state.cc
@@ -1511,6 +1511,7 @@ void ClientRequestState::MarkAsRetried(const TUniqueId& retried_id) {
   summary_profile_->AddInfoString("Retried Query Id", PrintId(retried_id));
   UpdateExecState(ExecState::ERROR);
   block_until_retried_cv_.NotifyOne();
+  retried_id_ = make_unique(retried_id);
 }

 const string& ClientRequestState::effective_user() const {

diff --git a/be/src/service/client-request-state.h b/be/src/service/client-request-state.h
index c5a4004..9cbbd2b 100644
--- a/be/src/service/client-request-state.h
+++ b/be/src/service/client-request-state.h
@@ -380,6 +380,13 @@ class ClientRequestState {
     return *original_id_;
   }

+  /// Can only be called if this query was retried. Returns the query id of the retried
+  /// query.
+  const TUniqueId& retried_id() const {
+    DCHECK(retried_id_ != nullptr);
+    return *retried_id_;
+  }
+
   /// Returns the QueryDriver that owns this ClientRequestState.
   QueryDriver* parent_driver() const { return parent_driver_; }

@@ -630,6 +637,11 @@ class ClientRequestState {
   /// be retried.
   std::unique_ptr original_id_ = nullptr;

+  /// Query id of the retried query. The retried query is the new query that is run
+  /// whenever the original query fails with a retryable error. See 'original_id_' for
+  /// an explanation of what the "original" query is.
+  std::unique_ptr retried_id_ = nullptr;
+
   /// Condition variable used to signal any threads that are waiting until the query has
   /// been retried.
   ConditionVariable block_until_retried_cv_;

diff --git a/be/src/service/impala-beeswax-server.cc b/be/src/service/impala-beeswax-server.cc
index f
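The PROFILE [ALL | LATEST | ORIGINAL] syntax described in the commit message can be pictured as a small dispatcher. This sketch only mirrors the documented semantics; it is not impala-shell's actual implementation, and the function name is illustrative:

```python
def parse_profile_option(arg):
    # An empty argument behaves like LATEST, matching the shell's
    # pre-patch behavior.
    opt = (arg or "LATEST").strip().upper() or "LATEST"
    profiles = {
        "ALL": ("latest", "original"),
        "LATEST": ("latest",),
        "ORIGINAL": ("original",),
    }
    if opt not in profiles:
        raise ValueError("usage: PROFILE [ALL | LATEST | ORIGINAL]")
    return profiles[opt]
```

Only under HS2 would "original" map to the new failed_profiles / failed_thrift_profiles response fields; under Beeswax the option is accepted but has no effect, per the commit message.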
[impala] branch master updated (5e9f10d -> 2359a1b)
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git.

 from 5e9f10d  IMPALA-10064: Support constant propagation for eligible range predicates
  new 99e5f5a  IMPALA-10133: Implement ds_hll_stringify function.
  new 2359a1b  IMPALA-10119: Fix impala-shell history duplication test

The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.

Summary of changes:
 be/src/exprs/datasketches-functions-ir.cc    | 14
 be/src/exprs/datasketches-functions.h        |  6
 common/function-registry/impala_functions.py |  2 ++
 .../queries/QueryTest/datasketches-hll.test  | 37 ++
 tests/shell/test_shell_interactive.py        | 11 ---
 5 files changed, 66 insertions(+), 4 deletions(-)
[impala] 02/02: IMPALA-10119: Fix impala-shell history duplication test
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git

commit 2359a1be9dc491f6c35fe3415265d4a29d6bc939
Author: Tamas Mate
AuthorDate: Tue Sep 1 09:50:44 2020 +0200

IMPALA-10119: Fix impala-shell history duplication test

The flaky test was TestImpalaShellInteractive.test_history_does_not_duplicate_on_interrupt. The test failed with a timeout error when the interrupt signal arrived late, after the next test query had already started: the impala-shell output was ^C instead of the expected query result. This change adds an additional blocking expect call to wait for the interrupt signal to arrive before sending in the next query.

Change-Id: I242eb47cc8093c4566de206f46b75b3feab1183c
Reviewed-on: http://gerrit.cloudera.org:8080/16391
Tested-by: Impala Public Jenkins
Reviewed-by: Tim Armstrong
---
 tests/shell/test_shell_interactive.py | 11 +++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tests/shell/test_shell_interactive.py b/tests/shell/test_shell_interactive.py
index f9668d6..c6fe7e0 100755
--- a/tests/shell/test_shell_interactive.py
+++ b/tests/shell/test_shell_interactive.py
@@ -516,24 +516,27 @@ class TestImpalaShellInteractive(ImpalaTestSuite):
     # readline gets its input from tty, so using stdin does not work.
     shell_cmd = get_shell_cmd(vector)
     child_proc = spawn_shell(shell_cmd)
-    # set up history
+
+    # initialize history
     child_proc.expect(PROMPT_REGEX)
     child_proc.sendline("select 1;")
     child_proc.expect("Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
     child_proc.expect(PROMPT_REGEX)
     child_proc.sendline("quit;")
     child_proc.wait()
+
+    # create a new shell and send SIGINT
     child_proc = spawn_shell(shell_cmd)
     child_proc.expect(PROMPT_REGEX)
-
-    # send SIGINT then quit to save history
     child_proc.sendintr()
+    child_proc.expect("\^C")
     child_proc.sendline("select 2;")
     child_proc.expect("Fetched 1 row\(s\) in [0-9]+\.?[0-9]*s")
+    child_proc.expect(PROMPT_REGEX)
     child_proc.sendline("quit;")
     child_proc.wait()

-    # check history in a new instance
+    # check history in a new shell instance
     p = ImpalaShell(vector)
     p.send_cmd('history')
     result = p.get_result().stderr.splitlines()
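The fix above is a general synchronization pattern: after sending an interrupt, block until the child acknowledges it (the "^C" echo) before sending the next command. A stdlib-only Python sketch of the same idea, with a toy child process standing in for impala-shell and a minimal expect() standing in for pexpect's:

```python
import subprocess
import sys

# Toy child: echoes "^C" for a simulated interrupt, answers other commands.
child_src = r'''
import sys
for line in sys.stdin:
    cmd = line.strip()
    if cmd == "quit":
        break
    elif cmd == "INTR":
        print("^C")
    else:
        print("Fetched 1 row(s)")
    sys.stdout.flush()
'''

proc = subprocess.Popen([sys.executable, "-c", child_src],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

def expect(substr):
    # Block until the child emits a line containing substr, like pexpect's
    # expect(). This is the synchronization step the test fix adds.
    for line in proc.stdout:
        if substr in line:
            return line
    raise EOFError("child exited before %r appeared" % substr)

proc.stdin.write("INTR\n"); proc.stdin.flush()
expect("^C")                      # wait for the interrupt acknowledgement
proc.stdin.write("select 2;\n"); proc.stdin.flush()
out = expect("Fetched 1 row(s)")  # the next command now runs cleanly
proc.stdin.write("quit\n"); proc.stdin.flush()
proc.wait()
```

Without the intermediate expect("^C"), the "select 2;" could race with the interrupt, which is exactly the flake the commit fixes.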
[impala] 01/02: IMPALA-10133: Implement ds_hll_stringify function.
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git

commit 99e5f5a8859c58641973bc84058eeb15502da96c
Author: Adam Tamas
AuthorDate: Fri Aug 28 15:50:07 2020 +0200

IMPALA-10133: Implement ds_hll_stringify function.

This function receives a string that is a serialized Apache DataSketches HLL sketch and returns it in stringified form. The stringified form looks like the following and contains this data:

select ds_hll_stringify(ds_hll_sketch(float_col))
from functional_parquet.alltypestiny;
+--------------------------------------------+
| ds_hll_stringify(ds_hll_sketch(float_col)) |
+--------------------------------------------+
| ### HLL sketch summary:                    |
| Log Config K : 12                          |
| Hll Target : HLL_4                         |
| Current Mode : LIST                        |
| LB : 2                                     |
| Estimate : 2                               |
| UB : 2.0001                                |
| OutOfOrder flag: false                     |
| Coupon count : 2                           |
| ### End HLL sketch summary                 |
|                                            |
+--------------------------------------------+

Change-Id: I85dbf20b5114dd75c300eef0accabe90eac240a0
Reviewed-on: http://gerrit.cloudera.org:8080/16382
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
 be/src/exprs/datasketches-functions-ir.cc    | 14
 be/src/exprs/datasketches-functions.h        |  6
 common/function-registry/impala_functions.py |  2 ++
 .../queries/QueryTest/datasketches-hll.test  | 37 ++
 4 files changed, 59 insertions(+)

diff --git a/be/src/exprs/datasketches-functions-ir.cc b/be/src/exprs/datasketches-functions-ir.cc
index 4edb83f..1cef6c9 100644
--- a/be/src/exprs/datasketches-functions-ir.cc
+++ b/be/src/exprs/datasketches-functions-ir.cc
@@ -38,6 +38,20 @@ BigIntVal DataSketchesFunctions::DsHllEstimate(FunctionContext* ctx,
   return sketch.get_estimate();
 }

+StringVal DataSketchesFunctions::DsHllStringify(FunctionContext* ctx,
+    const StringVal& serialized_sketch) {
+  if (serialized_sketch.is_null || serialized_sketch.len == 0) return StringVal::null();
+  datasketches::hll_sketch sketch(DS_SKETCH_CONFIG, DS_HLL_TYPE);
+  if (!DeserializeDsSketch(serialized_sketch, )) {
+    LogSketchDeserializationError(ctx);
+    return StringVal::null();
+  }
+  string str = sketch.to_string(true, false, false, false);
+  StringVal dst(ctx, str.size());
+  memcpy(dst.ptr, str.c_str(), str.size());
+  return dst;
+}
+
 FloatVal DataSketchesFunctions::DsKllQuantile(FunctionContext* ctx,
     const StringVal& serialized_sketch, const DoubleVal& rank) {
   if (serialized_sketch.is_null || serialized_sketch.len == 0) return FloatVal::null();

diff --git a/be/src/exprs/datasketches-functions.h b/be/src/exprs/datasketches-functions.h
index c35c3f4..91d9313 100644
--- a/be/src/exprs/datasketches-functions.h
+++ b/be/src/exprs/datasketches-functions.h
@@ -35,6 +35,12 @@ public:
   static BigIntVal DsHllEstimate(FunctionContext* ctx,
       const StringVal& serialized_sketch);

+  /// 'serialized_sketch' is expected as a serialized Apache DataSketches HLL sketch. If
+  /// it is not, then the query fails. This function returns the stringified format of
+  /// an Apache DataSketches HLL sketch.
+  static StringVal DsHllStringify(FunctionContext* ctx,
+      const StringVal& serialized_sketch);
+
   /// 'serialized_sketch' is expected as a serialized Apache DataSketches KLL sketch. If
   /// it is not, then the query fails. 'rank' is used to identify which item (estimate)
   /// to return from the sketched dataset. E.g. 0.1 means the item where 10% of the

diff --git a/common/function-registry/impala_functions.py b/common/function-registry/impala_functions.py
index 93a2926..6a644fe 100644
--- a/common/function-registry/impala_functions.py
+++ b/common/function-registry/impala_functions.py
@@ -933,6 +933,8 @@ visible_functions = [
   # Functions to use Apache DataSketches functionality
   [['ds_hll_estimate'], 'BIGINT', ['STRING'],
    '_ZN6impala21DataSketchesFunctions13DsHllEstimateEPN10impala_udf15FunctionContextERKNS1_9StringValE'],
+  [['ds_hll_stringify'], 'STRING', ['STRING'],
+   '_ZN6impala21DataSketchesFunctions14DsHllStringifyEPN10impala_udf15FunctionContextERKNS1_9StringValE'],
   [['ds_kll_quantile'], 'FLOAT', ['STRING', 'DOUBLE'],
    '_ZN6impala21DataSketchesFunctions13DsKllQuantileEPN10impala_udf15FunctionContextERKNS1_9StringValERKNS1_9DoubleValE'],
   [['ds_kll_n'], 'BIGINT', ['STRING'],

diff --git a/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test b/testdata/
[impala] branch master updated: IMPALA-10106: Upgrade DataSketches to version 2.1.0
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git

The following commit(s) were added to refs/heads/master by this push:
     new f993654  IMPALA-10106: Upgrade DataSketches to version 2.1.0
f993654 is described below

commit f9936549dcab58390c5662ebdedb9c60838185a4
Author: Adam Tamas
AuthorDate: Tue Aug 25 11:46:07 2020 +0200

IMPALA-10106: Upgrade DataSketches to version 2.1.0

Upgrade the external DataSketches files for HLL/KLL to version 2.1.0.

Tests:
- Ran the tests from tests/query_test/test_datasketches.py

Change-Id: I4faa31c0b628a62c7e56a6c4b9549d0aaa8a02ff
Reviewed-on: http://gerrit.cloudera.org:8080/16360
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
 be/src/thirdparty/datasketches/README.md          |   7 +-
 .../datasketches/kll_quantile_calculator.hpp      |  37 +++--
 .../datasketches/kll_quantile_calculator_impl.hpp | 159 +
 be/src/thirdparty/datasketches/kll_sketch_impl.hpp|  23 +--
 4 files changed, 106 insertions(+), 120 deletions(-)

diff --git a/be/src/thirdparty/datasketches/README.md b/be/src/thirdparty/datasketches/README.md
index d5c56ce..a838c0b 100644
--- a/be/src/thirdparty/datasketches/README.md
+++ b/be/src/thirdparty/datasketches/README.md
@@ -8,9 +8,8 @@ changed during this process as originally the following folders were affected:
 I copied the content of these folders into the same directory so that Impala
 can compile them without rewriting the include paths in the files themselves.

-The git hash of the snapshot I used as a source for the files:
-c67d92faad3827932ca3b5d864222e64977f2c20
+The git branch of the snapshot I used as a source for the files: 2.1.0-incubating
+The hash: c1a6f8edb49699520f248d3d02019b87429b4241

 Browse the source files here:
-https://github.com/apache/incubator-datasketches-cpp
-
+https://github.com/apache/incubator-datasketches-cpp/tree/2.1.0-incubating-rc1

diff --git a/be/src/thirdparty/datasketches/kll_quantile_calculator.hpp b/be/src/thirdparty/datasketches/kll_quantile_calculator.hpp
index f77071e..bc60f26 100644
--- a/be/src/thirdparty/datasketches/kll_quantile_calculator.hpp
+++ b/be/src/thirdparty/datasketches/kll_quantile_calculator.hpp
@@ -26,31 +26,38 @@ namespace datasketches {

 template
 class kll_quantile_calculator {
-  typedef typename std::allocator_traits::template rebind_alloc AllocU32;
-  typedef typename std::allocator_traits::template rebind_alloc AllocU64;
   public:
     // assumes that all levels are sorted including level 0
     kll_quantile_calculator(const T* items, const uint32_t* levels, uint8_t num_levels, uint64_t n);
-    ~kll_quantile_calculator();
     T get_quantile(double fraction) const;

   private:
+    using AllocU32 = typename std::allocator_traits::template rebind_alloc;
+    using vector_u32 = std::vector;
+    using Entry = std::pair;
+    using AllocEntry = typename std::allocator_traits::template rebind_alloc;
+    using Container = std::vector;
     uint64_t n_;
-    T* items_;
-    uint64_t* weights_;
-    uint32_t* levels_;
-    uint8_t levels_size_;
-    uint8_t num_levels_;
+    vector_u32 levels_;
+    Container entries_;

-    void populate_from_sketch(const T* items, uint32_t num_items, const uint32_t* levels, uint8_t num_levels);
+    void populate_from_sketch(const T* items, const uint32_t* levels, uint8_t num_levels);
     T approximately_answer_positional_query(uint64_t pos) const;
-    static void convert_to_preceding_cummulative(uint64_t* weights, uint32_t weights_size);
+    void convert_to_preceding_cummulative();
+    uint32_t chunk_containing_pos(uint64_t pos) const;
+    uint32_t search_for_chunk_containing_pos(uint64_t pos, uint32_t l, uint32_t r) const;
+    static void merge_sorted_blocks(Container& entries, const uint32_t* levels, uint8_t num_levels, uint32_t num_items);
+    static void merge_sorted_blocks_direct(Container& orig, Container& temp, const uint32_t* levels, uint8_t starting_level, uint8_t num_levels);
+    static void merge_sorted_blocks_reversed(Container& orig, Container& temp, const uint32_t* levels, uint8_t starting_level, uint8_t num_levels);
     static uint64_t pos_of_phi(double phi, uint64_t n);
-    static uint32_t chunk_containing_pos(uint64_t* weights, uint32_t weights_size, uint64_t pos);
-    static uint32_t search_for_chunk_containing_pos(const uint64_t* arr, uint64_t pos, uint32_t l, uint32_t r);
-    static void blocky_tandem_merge_sort(T* items, uint64_t* weights, uint32_t num_items, const uint32_t* levels, uint8_t num_levels);
-    static void blocky_tandem_merge_sort_recursion(T* items_src, uint64_t* weights_src, T* items_dst, uint64_t* weights_dst, const uint32_t* levels, uint8_t starting_level, uint8_t num_levels);
-    static void tandem_merge(const T* items_src, const uint6
[impala] 02/02: IMPALA-10121: Generate JUnitXML for TSAN messages
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 106dea63ba2f21ea43a580363445d4ad79a9c87c Author: Joe McDonnell AuthorDate: Tue Sep 1 09:40:16 2020 -0700 IMPALA-10121: Generate JUnitXML for TSAN messages This adds logic in bin/jenkins/finalize.sh to check the ERROR log for TSAN messages (i.e. WARNING: ThreadSanitizer: ...) and generate a JUnitXML with the message. This happens when TSAN aborts Impala. Testing: - Ran TSAN build (which is currently failing) Change-Id: I44ea33a78482499decae0ec4c7c44513094b2f44 Reviewed-on: http://gerrit.cloudera.org:8080/16397 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- bin/jenkins/finalize.sh | 25 +++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/bin/jenkins/finalize.sh b/bin/jenkins/finalize.sh index 2c216e7..c8ca198 100755 --- a/bin/jenkins/finalize.sh +++ b/bin/jenkins/finalize.sh @@ -72,14 +72,35 @@ function check_for_asan_error { fi } -# Check for AddressSanitizer messages. ASAN errors can show up in ERROR logs -# (particularly for impalad). Some backend tests generate ERROR logs. +function check_for_tsan_error { + ERROR_LOG=${1} + if grep -q "WARNING: ThreadSanitizer:" ${ERROR_LOG} ; then +# Extract out the TSAN message from the log file into a temp file. +# Starts with WARNING: ThreadSanitizer and then ends with a line with several '=' +# characters (currently 18, we match 10). +tmp_tsan_output=$(mktemp) +sed -n '/ThreadSanitizer:/,/==/p' ${ERROR_LOG} > "${tmp_tsan_output}" +# Make each TSAN issue use its own JUnitXML file by including the log filename +# in the step. 
+base=$(basename ${ERROR_LOG}) +"${IMPALA_HOME}"/bin/generate_junitxml.py --phase finalize \ + --step "tsan_error_${base}" \ + --error "Thread Sanitizer message detected in ${ERROR_LOG}" \ + --stderr "$(cat ${tmp_tsan_output})" +rm "${tmp_tsan_output}" + fi +} + +# Check for AddressSanitizer/ThreadSanitizer messages. ASAN/TSAN errors can show up +# in ERROR logs (particularly for impalad). Some backend tests generate ERROR logs. for error_log in $(find $LOGS_DIR -name "*ERROR*"); do check_for_asan_error ${error_log} + check_for_tsan_error ${error_log} done # Backend tests can also generate output in logs/be_tests/LastTest.log if [[ -f ${LOGS_DIR}/be_tests/LastTest.log ]]; then check_for_asan_error ${LOGS_DIR}/be_tests/LastTest.log + check_for_tsan_error ${LOGS_DIR}/be_tests/LastTest.log fi # Check for DCHECK messages. DCHECKs translate into CHECKs, which log at FATAL level
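The `sed -n '/ThreadSanitizer:/,/==/p'` range in `check_for_tsan_error` above can be mimicked in Python to sanity-check what the range match captures. This is an illustrative sketch only — the function name and the sample log are hypothetical, and it uses the 10-'=' delimiter heuristic mentioned in the shell comment, not Impala's actual code:

```python
def extract_tsan_message(log_text):
    """Collect lines from the first 'WARNING: ThreadSanitizer:' line through
    the next '=' delimiter line, mimicking sed's /start/,/end/p range."""
    out = []
    capturing = False
    for line in log_text.splitlines():
        if not capturing:
            if "WARNING: ThreadSanitizer:" in line:
                capturing = True
                out.append(line)
        else:
            out.append(line)
            # End-of-report delimiter: TSAN prints ~18 '='s; match 10.
            if "==========" in line:
                break
    return "\n".join(out)

# Hypothetical ERROR-log excerpt to exercise the range logic:
sample_log = """I0901 10:00:00 normal log line
WARNING: ThreadSanitizer: data race (pid=12345)
  Write of size 8 at 0x7b04 by thread T1:
==================
I0901 10:00:01 unrelated trailing line"""
tsan_msg = extract_tsan_message(sample_log)
```

The captured text is what would be passed to `generate_junitxml.py` via `--stderr`.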
[impala] 01/02: IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 329bb41294a57bfd63dc0d90d57966e8562686b1 Author: Zoltan Borok-Nagy AuthorDate: Fri Aug 28 18:01:36 2020 +0200 IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files Currently Impala checks the file metadata field 'hive.acid.version' to decide whether a file has a full ACID schema. There are cases when Hive forgets to set this value for full ACID files, e.g. query-based compactions, so it is more robust to check the schema elements instead of the metadata field. Also, Hive sometimes writes the schema with different character cases, e.g. originalTransaction vs originaltransaction, so the column names should be compared in a case-insensitive way. Testing: * added test for full ACID compaction * added test_full_acid_schema_without_file_metadata_tag to test full ACID file without metadata 'hive.acid.version' Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14 Reviewed-on: http://gerrit.cloudera.org:8080/16383 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/exec/hdfs-orc-scanner.cc| 8 ++--- be/src/exec/orc-metadata-utils.cc | 32 +- be/src/exec/orc-metadata-utils.h | 17 ++ testdata/data/README | 5 +++ .../data/full_acid_schema_but_no_acid_version.orc | Bin 0 -> 545 bytes .../queries/QueryTest/acid-compaction.test | 37 + tests/query_test/test_acid.py | 16 + 7 files changed, 88 insertions(+), 27 deletions(-) diff --git a/be/src/exec/hdfs-orc-scanner.cc b/be/src/exec/hdfs-orc-scanner.cc index c3e0baa..fe86013 100644 --- a/be/src/exec/hdfs-orc-scanner.cc +++ b/be/src/exec/hdfs-orc-scanner.cc @@ -190,8 +190,9 @@ Status HdfsOrcScanner::Open(ScannerContext* context) { RETURN_IF_ERROR(footer_status); bool is_table_full_acid = scan_node_->hdfs_table()->IsTableFullAcid(); - bool is_file_full_acid = reader_->hasMetadataValue(HIVE_ACID_VERSION_KEY) && - 
reader_->getMetadataValue(HIVE_ACID_VERSION_KEY) == "2"; + schema_resolver_.reset(new OrcSchemaResolver(*scan_node_->hdfs_table(), + _->getType(), filename(), is_table_full_acid)); + bool is_file_full_acid = schema_resolver_->HasFullAcidV2Schema(); acid_original_file_ = is_table_full_acid && !is_file_full_acid; if (is_table_full_acid) { acid_write_id_range_ = valid_write_ids_.GetWriteIdRange(filename()); @@ -218,9 +219,6 @@ Status HdfsOrcScanner::Open(ScannerContext* context) { filename())); } } - schema_resolver_.reset(new OrcSchemaResolver(*scan_node_->hdfs_table(), - _->getType(), filename(), is_table_full_acid, is_file_full_acid)); - RETURN_IF_ERROR(schema_resolver_->ValidateFullAcidFileSchema()); // Hive Streaming Ingestion allocates multiple write ids, hence create delta directories // like delta_5_10. Then it continuously appends new stripes (and footers) to the diff --git a/be/src/exec/orc-metadata-utils.cc b/be/src/exec/orc-metadata-utils.cc index aa81d7d..400bba0 100644 --- a/be/src/exec/orc-metadata-utils.cc +++ b/be/src/exec/orc-metadata-utils.cc @@ -17,9 +17,13 @@ #include "exec/orc-metadata-utils.h" +#include + #include "util/debug-util.h" #include "common/names.h" +using boost::algorithm::iequals; + namespace impala { Status OrcSchemaResolver::BuildSchemaPaths(int num_partition_keys, @@ -90,7 +94,6 @@ Status OrcSchemaResolver::ResolveColumn(const SchemaPath& col_path, *node = root_; *pos_field = false; *missing_field = false; - DCHECK_OK(ValidateFullAcidFileSchema()); // Should have already been validated. 
if (col_path.empty()) return Status::OK(); SchemaPath table_path, file_path; TranslateColPaths(col_path, _path, _path); @@ -318,28 +321,27 @@ bool OrcSchemaResolver::IsAcidColumn(const SchemaPath& col_path) const { col_path.front() >= num_part_cols && col_path.front() < num_part_cols + 5; } -Status OrcSchemaResolver::ValidateFullAcidFileSchema() const { - if (!is_file_full_acid_) return Status::OK(); - string error_msg = Substitute("File %0 should have full ACID schema.", filename_); - if (root_->getKind() != orc::TypeKind::STRUCT) return Status(error_msg); - if (root_->getSubtypeCount() != 6) return Status(error_msg); +void OrcSchemaResolver::DetermineFullAcidSchema() { + is_file_full_acid_ = false; + if (root_->getKind() != orc::TypeKind::STRUCT) return; + if (root_->getSubtypeCount() != 6) return; if (root_->getSubtype(0)->getKind() != orc::TypeKind::INT || root_->getSubtyp
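The case-insensitive schema check described in the commit message can be sketched outside of C++. The column list below is the standard full ACID v2 ORC layout (operation, originalTransaction, bucket, rowId, currentTransaction, plus the nested `row` struct); the Python helper is a hypothetical stand-in for `DetermineFullAcidSchema()` and checks names only, whereas the real resolver also verifies the field types:

```python
# Top-level columns of a full ACID v2 ORC file, lower-cased, in order.
# The sixth field, 'row', is the nested struct holding the user columns.
FULL_ACID_V2_COLUMNS = [
    "operation", "originaltransaction", "bucket",
    "rowid", "currenttransaction", "row",
]

def has_full_acid_v2_schema(field_names):
    """Compare top-level field names case-insensitively, so that e.g.
    'originalTransaction' and 'originaltransaction' both match."""
    if len(field_names) != len(FULL_ACID_V2_COLUMNS):
        return False
    return [n.lower() for n in field_names] == FULL_ACID_V2_COLUMNS

# A file written by Hive (e.g. by query-based compaction) may use camelCase:
camel_case = ["operation", "originalTransaction", "bucket",
              "rowId", "currentTransaction", "row"]
```

This is why checking schema elements is more robust than trusting the `hive.acid.version` metadata tag, which Hive sometimes omits.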
[impala] branch master updated (69d0d0a -> 106dea6)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from 69d0d0a IMPALA-10087: IMPALA-6050 causes alluxio not to be supported new 329bb41 IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files new 106dea6 IMPALA-10121: Generate JUnitXML for TSAN messages The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/exec/hdfs-orc-scanner.cc| 8 ++--- be/src/exec/orc-metadata-utils.cc | 32 +- be/src/exec/orc-metadata-utils.h | 17 ++ bin/jenkins/finalize.sh| 25 -- testdata/data/README | 5 +++ .../data/full_acid_schema_but_no_acid_version.orc | Bin 0 -> 545 bytes .../queries/QueryTest/acid-compaction.test | 37 + tests/query_test/test_acid.py | 16 + 8 files changed, 111 insertions(+), 29 deletions(-) create mode 100644 testdata/data/full_acid_schema_but_no_acid_version.orc
[impala] branch master updated (f85dbff -> 69d0d0a)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from f85dbff IMPALA-10030: Remove unnecessary jar dependencies new f4273a4 IMPALA-7310: Partial fix for NDV cardinality with NULLs. new 69d0d0a IMPALA-10087: IMPALA-6050 causes alluxio not to be supported The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../java/org/apache/impala/analysis/SlotRef.java | 24 +- .../org/apache/impala/common/FileSystemUtil.java | 4 +- .../impala/analysis/ExprCardinalityTest.java | 22 +- .../org/apache/impala/analysis/ExprNdvTest.java| 16 +- .../apache/impala/common/FileSystemUtilTest.java | 8 + .../org/apache/impala/planner/CardinalityTest.java | 41 +- .../queries/PlannerTest/tpcds/tpcds-q04.test | 840 +++-- .../queries/PlannerTest/tpcds/tpcds-q11.test | 636 8 files changed, 806 insertions(+), 785 deletions(-)
[impala] 02/02: IMPALA-10087: IMPALA-6050 causes alluxio not to be supported
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 69d0d0af471f7013627ead3dff86a402ebc263a6 Author: abeltian AuthorDate: Fri Aug 28 15:07:20 2020 +0800 IMPALA-10087: IMPALA-6050 causes alluxio not to be supported This change adds file type support for alluxio. Alluxio URLs have a different prefix such as:alluxio://zk@zk-1:2181,zk-2:2181,zk-3:2181/path/ Testing: Add unit test for alluxio file system type checks. Change-Id: Id92ec9cb0ee241a039fe4a96e1bc2ab3eaaf8f77 Reviewed-on: http://gerrit.cloudera.org:8080/16379 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- fe/src/main/java/org/apache/impala/common/FileSystemUtil.java | 4 +++- fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java | 8 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java b/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java index 38b1ddd..b4a41b2 100644 --- a/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java +++ b/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java @@ -422,7 +422,8 @@ public class FileSystemUtil { HDFS, LOCAL, S3, -OZONE; +OZONE, +ALLUXIO; private static final Map SCHEME_TO_FS_MAPPING = ImmutableMap.builder() @@ -433,6 +434,7 @@ public class FileSystemUtil { .put("hdfs", HDFS) .put("s3a", S3) .put("o3fs", OZONE) +.put("alluxio", ALLUXIO) .build(); /** diff --git a/fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java b/fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java index 030961a..da5e11e 100644 --- a/fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java +++ b/fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java @@ -21,6 +21,7 @@ import static org.apache.impala.common.FileSystemUtil.HIVE_TEMP_FILE_PREFIX; import static 
org.apache.impala.common.FileSystemUtil.isIgnoredDir; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertTrue; +import static org.junit.Assert.assertEquals; import org.apache.hadoop.fs.FileStatus; import org.apache.hadoop.fs.Path; @@ -84,6 +85,13 @@ public class FileSystemUtilTest { assertFalse(isIgnoredDir(new Path(TEST_TABLE_PATH + "/part=100/datafile"))); } + @Test + public void testAlluxioFsType() { +Path path = new Path("alluxio://zk@zk-1:2181,zk-2:2181,zk-3:2181/path/"); +assertEquals(FileSystemUtil.FsType.ALLUXIO, +FileSystemUtil.FsType.getFsType(path.toUri().getScheme())); + } + private boolean testIsInIgnoredDirectory(Path input) { return testIsInIgnoredDirectory(input, true); }
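The fix above is a plain scheme-to-filesystem lookup table. A minimal Python sketch of the same idea (the enum names mirror `FileSystemUtil.FsType` from the diff; the helper function itself is hypothetical):

```python
from urllib.parse import urlparse

# Mirrors the SCHEME_TO_FS_MAPPING entries visible in the diff.
SCHEME_TO_FS = {
    "hdfs": "HDFS",
    "s3a": "S3",
    "o3fs": "OZONE",
    "alluxio": "ALLUXIO",  # the entry added by this commit
}

def fs_type_for_path(path):
    """Resolve a filesystem type from the URI scheme; None if unrecognized."""
    return SCHEME_TO_FS.get(urlparse(path).scheme)

# Alluxio URLs may embed a ZooKeeper ensemble in the authority part:
fs = fs_type_for_path("alluxio://zk@zk-1:2181,zk-2:2181,zk-3:2181/path/")
```

Note that only the scheme matters for the lookup; the unusual `zk@host:port,host:port` authority is irrelevant to type detection.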
[impala] 01/02: Fix concurrency for docker-based tests on 140+GB memory machines
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 19f16a0f4889a59f7785bb88d059d2d8c335988d Author: Joe McDonnell AuthorDate: Sun Aug 9 19:38:42 2020 -0700 Fix concurrency for docker-based tests on 140+GB memory machines A prior change increased the suite concurrency for the docker-based tests on machines with 140+GB of memory. This new rung should also bump the parallel test concurrency (i.e. for parallel EE tests). This sets the parallel test concurrency to 12 for this rung (which is what we use for the 95GB-140GB rung). Testing: - Ran test-with-docker.py on a m5.12xlarge Change-Id: Ib7299abd585da9ba1a838640dadc0bef9c72a39b Reviewed-on: http://gerrit.cloudera.org:8080/16326 Reviewed-by: Laszlo Gaal Tested-by: Joe McDonnell --- docker/test-with-docker.py | 1 + 1 file changed, 1 insertion(+) diff --git a/docker/test-with-docker.py b/docker/test-with-docker.py index b348d3b..35f64aa 100755 --- a/docker/test-with-docker.py +++ b/docker/test-with-docker.py @@ -253,6 +253,7 @@ def _compute_defaults(): if total_memory_gb >= 140: suite_concurrency = 6 memlimit_gb = 11 +parallel_test_concurrency = min(cpus, 12) elif total_memory_gb >= 95: suite_concurrency = 4 memlimit_gb = 11
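The memory-based rungs in `_compute_defaults()` can be summarized in a small sketch. Only the 140+GB and 95-140GB rungs come from the commit; the fallback rung's numbers are placeholders for illustration, not values from `test-with-docker.py`:

```python
def compute_defaults(total_memory_gb, cpus):
    """Pick docker-test concurrency settings by host memory size.  The fix:
    the 140+GB rung previously forgot to set parallel_test_concurrency, so
    parallel EE tests did not get the same bump as the 95-140GB rung."""
    if total_memory_gb >= 140:
        suite_concurrency = 6
        memlimit_gb = 11
        parallel_test_concurrency = min(cpus, 12)  # the added line
    elif total_memory_gb >= 95:
        suite_concurrency = 4
        memlimit_gb = 11
        parallel_test_concurrency = min(cpus, 12)
    else:
        # Placeholder fallback rung (illustrative values only).
        suite_concurrency = 3
        memlimit_gb = 7
        parallel_test_concurrency = min(cpus, 8)
    return suite_concurrency, memlimit_gb, parallel_test_concurrency
```

On an m5.12xlarge (192GB, 48 vCPUs) this yields suite concurrency 6 and parallel test concurrency 12.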
[impala] branch master updated (f95f794 -> ac63e19)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from f95f794 IMPALA-10017: Implement ds_kll_union() function new 19f16a0 Fix concurrency for docker-based tests on 140+GB memory machines new ac63e19 IMPALA-10043: Keep more logs when using EE_TEST_SHARDS The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: bin/run-all-tests.sh | 5 + docker/test-with-docker.py | 1 + 2 files changed, 6 insertions(+)
[impala] 02/02: IMPALA-10043: Keep more logs when using EE_TEST_SHARDS
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit ac63e19e0d3c797b08dcf80053fc8e3259d8472d Author: Joe McDonnell AuthorDate: Wed Aug 5 14:17:54 2020 -0700 IMPALA-10043: Keep more logs when using EE_TEST_SHARDS IMPALA-9887 introduced the EE_TEST_SHARDS setting, which splits the end-to-end test into shards and restarts Impala in between. In order to keep the logs from all the shards, the value for max_log_files needs to be increased so that none get aged out. This multiplies IMPALA_MAX_LOG_FILES by the number of shards using EE_TEST_SHARDS. Testing: - Ran a test with EE_TEST_SHARDS=6 and verified that the logs are preserved. Change-Id: Ie011b892cd2eb1a528012ec5600e72e44f281a88 Reviewed-on: http://gerrit.cloudera.org:8080/16297 Tested-by: Impala Public Jenkins Reviewed-by: Laszlo Gaal --- bin/run-all-tests.sh | 5 + 1 file changed, 5 insertions(+) diff --git a/bin/run-all-tests.sh b/bin/run-all-tests.sh index 5287861..74f65a9 100755 --- a/bin/run-all-tests.sh +++ b/bin/run-all-tests.sh @@ -254,6 +254,10 @@ do # Some test frameworks (e.g. the docker-based tests) use this. run_ee_tests else + # Increase the maximum number of log files so that the logs from the shards + # don't get aged out. Multiply the default number by the number of shards. + IMPALA_MAX_LOG_FILES_SAVE="${IMPALA_MAX_LOG_FILES:-10}" + export IMPALA_MAX_LOG_FILES="$((${EE_TEST_SHARDS} * ${IMPALA_MAX_LOG_FILES_SAVE}))" # When the EE tests are sharded, it runs 1/Nth of the tests at a time, restarting # Impala between the shards. There are two benefits: # 1. It isolates errors so that if Impala crashes, the next shards will still run @@ -268,6 +272,7 @@ do run_ee_tests "--shard_tests=$shard_idx/${EE_TEST_SHARDS}" start_impala_cluster done + export IMPALA_MAX_LOG_FILES="${IMPALA_MAX_LOG_FILES_SAVE}" fi fi
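The arithmetic in the shell snippet above — take the current `IMPALA_MAX_LOG_FILES` (default 10) and multiply by the shard count, restoring the original afterwards — can be checked directly. A hypothetical sketch:

```python
import os

def scaled_max_log_files(ee_test_shards, env=None):
    """Compute the bumped IMPALA_MAX_LOG_FILES the same way the shell does:
    shard count times the current value, defaulting to 10 when unset."""
    env = env if env is not None else os.environ
    current = int(env.get("IMPALA_MAX_LOG_FILES", "10"))
    return ee_test_shards * current

# With 6 shards and the default of 10, logs from all shards fit in 60 files.
six_shard_limit = scaled_max_log_files(6, env={})
```

The point of the multiplication is that glog-style log rotation ages out the oldest files, so without the bump each Impala restart between shards would evict the previous shard's logs.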
[impala] branch master updated: IMPALA-9645 Port LLVM codegen to adapt aarch64
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new fab251e IMPALA-9645 Port LLVM codegen to adapt aarch64 fab251e is described below commit fab251efe3de449d22439dd17798cd414168748c Author: zhaorenhai AuthorDate: Sun Apr 12 12:05:52 2020 + IMPALA-9645 Port LLVM codegen to adapt aarch64 On aarch64, the lowered type of struct {bool, int128} is of the form { {i8}, {i128} }; no padding is added. This differs from x86-64, where it is { {i8}, {15*i8}, {i128} } with padding added automatically. This patch also adds type conversions between x86-64 and aarch64 data types, and adds some aarch64 CPU features. Change-Id: I3f30ee84ea9bf5245da88154632bb69079103d11 Reviewed-on: http://gerrit.cloudera.org:8080/15718 Tested-by: Impala Public Jenkins Reviewed-by: Tim Armstrong --- be/src/codegen/codegen-anyval.cc | 121 +++ be/src/codegen/llvm-codegen.cc | 7 +++ be/src/exec/text-converter.cc| 19 ++ be/src/exprs/scalar-fn-call.cc | 39 + 4 files changed, 175 insertions(+), 11 deletions(-) diff --git a/be/src/codegen/codegen-anyval.cc b/be/src/codegen/codegen-anyval.cc index 66d79e7..1346f95 100644 --- a/be/src/codegen/codegen-anyval.cc +++ b/be/src/codegen/codegen-anyval.cc @@ -41,28 +41,56 @@ const char* CodegenAnyVal::LLVM_COLLECTIONVAL_NAME = "struct.impala_udf::Collect llvm::Type* CodegenAnyVal::GetLoweredType(LlvmCodeGen* cg, const ColumnType& type) { switch (type.type) { case TYPE_BOOLEAN: // i16 +#ifndef __aarch64__ return cg->i16_type(); +#else + return cg->i64_type(); +#endif case TYPE_TINYINT: // i16 +#ifndef __aarch64__ return cg->i16_type(); +#else + return cg->i64_type(); +#endif case TYPE_SMALLINT: // i32 +#ifndef __aarch64__ return cg->i32_type(); +#else + return cg->i64_type(); +#endif case TYPE_INT: // i64 return cg->i64_type(); case TYPE_BIGINT: // { i8, i64 } +#ifndef __aarch64__
return llvm::StructType::get(cg->i8_type(), cg->i64_type()); +#else + return llvm::ArrayType::get(cg->i64_type(), 2); +#endif case TYPE_FLOAT: // i64 return cg->i64_type(); case TYPE_DOUBLE: // { i8, double } +#ifndef __aarch64__ return llvm::StructType::get(cg->i8_type(), cg->double_type()); +#else + return llvm::ArrayType::get(cg->i64_type(), 2); +#endif case TYPE_STRING: // { i64, i8* } case TYPE_VARCHAR: // { i64, i8* } case TYPE_CHAR: // Uses StringVal, so same as STRING/VARCHAR. case TYPE_FIXED_UDA_INTERMEDIATE: // { i64, i8* } case TYPE_ARRAY: // CollectionVal has same memory layout as StringVal. case TYPE_MAP: // CollectionVal has same memory layout as StringVal. +#ifndef __aarch64__ return llvm::StructType::get(cg->i64_type(), cg->ptr_type()); +#else + return llvm::ArrayType::get(cg->i64_type(), 2); +#endif case TYPE_TIMESTAMP: // { i64, i64 } +#ifndef __aarch64__ return llvm::StructType::get(cg->i64_type(), cg->i64_type()); +#else + return llvm::ArrayType::get(cg->i64_type(), 2); +#endif case TYPE_DECIMAL: // %"struct.impala_udf::DecimalVal" (isn't lowered) // = { {i8}, [15 x i8], {i128} } return cg->GetNamedType(LLVM_DECIMALVAL_NAME); @@ -198,9 +226,14 @@ llvm::Value* CodegenAnyVal::GetIsNull(const char* name) const { case TYPE_BIGINT: case TYPE_DOUBLE: { // Lowered type is of form { i8, * }. Get the i8 value. - llvm::Value* is_null_i8 = builder_->CreateExtractValue(value_, 0); - DCHECK(is_null_i8->getType() == codegen_->i8_type()); - return builder_->CreateTrunc(is_null_i8, codegen_->bool_type(), name); + // On aarch64, Lowered type is of form { i64, * } + llvm::Value* is_null = builder_->CreateExtractValue(value_, 0); +#ifndef __aarch64__ + DCHECK(is_null->getType() == codegen_->i8_type()); +#else + DCHECK(is_null->getType() == codegen_->i64_type()); +#endif + return builder_->CreateTrunc(is_null, codegen_->bool_type(), name); } case TYPE_DECIMAL: { // Lowered type is of the form { {i8}, ... 
} @@ -240,8 +273,14 @@ void CodegenAnyVal::SetIsNull(llvm::Value* is_null) { case TYPE_BIGINT: case TYPE_DOUBLE: { // Lowered type is of form { i8, * }. Set the i8 value to 'is_null'. + // On aarch64, lowered type is of form { i64, * } +#ifndef __aarch64__ llvm::Value* is_null_ext = builder_->CreateZExt(is_null, codegen_->i8_type(), "is_null_ext"); +#else + llvm::Value* is_null_ext = + builder_->CreateZExt(is_null, codegen_->i64
[impala] branch master updated (dbbd403 -> bbec044)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from dbbd403 IMPALA-10005: Fix Snappy decompression for non-block filesystems new 86b70e9 IMPALA-9851: Truncate long error message. new 7a6469e IMPALA-10053: Remove uses of MonoTime::GetDeltaSince() new bbec044 IMPALA-10044: Fix cleanup for bootstrap_toolchain.py failure case The 3 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/runtime/bufferpool/buffer-pool-internal.h | 3 + be/src/runtime/bufferpool/buffer-pool-test.cc| 54 +++ be/src/runtime/bufferpool/buffer-pool.cc | 24 +++-- be/src/runtime/bufferpool/buffer-pool.h | 1 + be/src/runtime/krpc-data-stream-recvr.cc | 3 +- be/src/runtime/krpc-data-stream-sender.cc| 4 + be/src/service/data-stream-service.cc| 2 +- be/src/util/error-util-test.cc | 7 ++ be/src/util/error-util.cc| 112 --- be/src/util/error-util.h | 11 ++- be/src/util/internal-queue.h | 13 +++ bin/bootstrap_toolchain.py | 3 +- 12 files changed, 172 insertions(+), 65 deletions(-)
[impala] 01/03: IMPALA-9851: Truncate long error message.
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 86b70e9850cce0b45194a64cd89ae21df0e82029 Author: Riza Suminto AuthorDate: Wed Aug 5 17:03:08 2020 -0700 IMPALA-9851: Truncate long error message. Error message length was unbounded and could grow to a couple of MB in size. This patch truncates error messages to a maximum of 128KB. It also fixes a potentially long error message related to BufferPool::Client::DebugString(). Before this patch, DebugString() would print all pages in 'pinned_pages_', 'dirty_unpinned_pages_', and 'in_flight_write_pages_' PageList. With this patch, DebugString() only includes at most the first 100 pages in each PageList. Testing: - Add be test BufferPoolTest.ShortDebugString - Add test within ErrorMsg.GenericFormatting to test for truncation. - Run and pass core tests. Change-Id: Ic9fa4d024fb3dc9de03c7484f41b5e420a710e5a Reviewed-on: http://gerrit.cloudera.org:8080/16300 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/runtime/bufferpool/buffer-pool-internal.h | 3 + be/src/runtime/bufferpool/buffer-pool-test.cc| 54 +++ be/src/runtime/bufferpool/buffer-pool.cc | 24 +++-- be/src/runtime/bufferpool/buffer-pool.h | 1 + be/src/util/error-util-test.cc | 7 ++ be/src/util/error-util.cc| 112 --- be/src/util/error-util.h | 11 ++- be/src/util/internal-queue.h | 13 +++ 8 files changed, 163 insertions(+), 62 deletions(-) diff --git a/be/src/runtime/bufferpool/buffer-pool-internal.h b/be/src/runtime/bufferpool/buffer-pool-internal.h index c2caf7b..20d7767 100644 --- a/be/src/runtime/bufferpool/buffer-pool-internal.h +++ b/be/src/runtime/bufferpool/buffer-pool-internal.h @@ -182,6 +182,9 @@ class BufferPool::PageList { } void Iterate(boost::function fn) { list_.Iterate(fn); } + void IterateFirstN(boost::function fn, int n) { +list_.IterateFirstN(fn, n); + } bool Contains(Page* page) { 
return list_.Contains(page); } Page* tail() { return list_.tail(); } bool empty() const { return list_.empty(); } diff --git a/be/src/runtime/bufferpool/buffer-pool-test.cc b/be/src/runtime/bufferpool/buffer-pool-test.cc index 2c9add7..611963c 100644 --- a/be/src/runtime/bufferpool/buffer-pool-test.cc +++ b/be/src/runtime/bufferpool/buffer-pool-test.cc @@ -2353,6 +2353,60 @@ TEST_F(BufferPoolTest, BufferPoolGc) { buffer_pool->FreeBuffer(, ); buffer_pool->DeregisterClient(); } + +/// IMPALA-9851: Cap the number of pages that can be printed at +/// BufferPool::Client::DebugString(). +TEST_F(BufferPoolTest, ShortDebugString) { + // Allocate pages more than BufferPool::MAX_PAGE_ITER_DEBUG. + int num_pages = 105; + int64_t max_page_len = TEST_BUFFER_LEN; + int64_t total_mem = num_pages * max_page_len; + global_reservations_.InitRootTracker(NULL, total_mem); + BufferPool pool(test_env_->metrics(), TEST_BUFFER_LEN, total_mem, total_mem); + BufferPool::ClientHandle client; + ASSERT_OK(pool.RegisterClient("test client", NULL, _reservations_, NULL, + total_mem, NewProfile(), )); + ASSERT_TRUE(client.IncreaseReservation(total_mem)); + + vector handles(num_pages); + + // Create pages of various valid sizes. + for (int i = 0; i < num_pages; ++i) { +int64_t page_len = TEST_BUFFER_LEN; +int64_t used_before = client.GetUsedReservation(); +ASSERT_OK(pool.CreatePage(, page_len, [i])); +ASSERT_TRUE(handles[i].is_open()); +ASSERT_TRUE(handles[i].is_pinned()); +const BufferHandle* buffer; +ASSERT_OK(handles[i].GetBuffer()); +ASSERT_TRUE(buffer->data() != NULL); +ASSERT_EQ(handles[i].len(), page_len); +ASSERT_EQ(buffer->len(), page_len); +ASSERT_EQ(client.GetUsedReservation(), used_before + page_len); + } + + // Verify that only subset of pages are included in DebugString(). 
+ string page_count_substr = Substitute( + "$0 out of $1 pinned pages:", BufferPool::MAX_PAGE_ITER_DEBUG, num_pages); + string debug_string = client.DebugString(); + ASSERT_NE(debug_string.find(page_count_substr), string::npos) + << page_count_substr << " not found at BufferPool::Client::DebugString(). " + << debug_string; + + // Close the handles and check memory consumption. + for (int i = 0; i < num_pages; ++i) { +int64_t used_before = client.GetUsedReservation(); +int page_len = handles[i].len(); +pool.DestroyPage(, [i]); +ASSERT_EQ(client.GetUsedReservation(), used_before - page_len); + } + + pool.DeregisterClient(); + + // All the reservations should be released at this point. + ASSERT_EQ(global_reservations_.GetReservation(), 0); + global_
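The truncation itself — cap a message at a fixed byte budget, as IMPALA-9851 does at 128KB — is easy to sketch. The helper and the marker text below are hypothetical illustrations, not Impala's actual `error-util.cc` implementation:

```python
MAX_ERROR_MSG_BYTES = 128 * 1024  # the 128KB cap from the commit message

def truncate_error_msg(msg, limit=MAX_ERROR_MSG_BYTES):
    """Keep at most 'limit' bytes of the message, appending a marker when
    anything was dropped so readers know the text is incomplete."""
    raw = msg.encode("utf-8")
    if len(raw) <= limit:
        return msg
    marker = "... (truncated)"
    kept = raw[:limit - len(marker)].decode("utf-8", errors="ignore")
    return kept + marker

short_msg = truncate_error_msg("boom")             # unchanged
long_msg = truncate_error_msg("x" * (2 * 1024 * 1024))  # a 2MB message
```

Capping the page lists in `DebugString()` (via `IterateFirstN`) attacks the same problem from the other end: it keeps huge messages from being built in the first place.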
[impala] 02/03: IMPALA-10053: Remove uses of MonoTime::GetDeltaSince()
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 7a6469e44486191cd344e9f7dcf681763d6091db Author: Thomas Tauber-Marshall AuthorDate: Wed Aug 5 16:57:56 2020 -0700 IMPALA-10053: Remove uses of MonoTime::GetDeltaSince() MonoTime is a utility Impala imports from Kudu. The behavior of MonoTime::GetDeltaSince() was accidentally flipped in https://gerrit.cloudera.org/#/c/14932/ so we're getting negative durations where we expect positive durations. The function is deprecated anyway, so this patch removes all uses of it and replaces them with the MonoTime '-' operator. Testing: - Manually ran with and without patch and inspected calculated values. - Added DCHECKs to prevent such an issue from occurring again. Change-Id: If8cd3eb51a4fd101bbe4b9c44ea9be6ea2ea0d06 Reviewed-on: http://gerrit.cloudera.org:8080/16296 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/runtime/krpc-data-stream-recvr.cc | 3 ++- be/src/runtime/krpc-data-stream-sender.cc | 4 be/src/service/data-stream-service.cc | 2 +- 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/be/src/runtime/krpc-data-stream-recvr.cc b/be/src/runtime/krpc-data-stream-recvr.cc index 43a13e4..97aa406 100644 --- a/be/src/runtime/krpc-data-stream-recvr.cc +++ b/be/src/runtime/krpc-data-stream-recvr.cc @@ -749,7 +749,8 @@ Status KrpcDataStreamRecvr::GetNext(RowBatch* output_batch, bool* eos) { void KrpcDataStreamRecvr::AddBatch(const TransmitDataRequestPB* request, TransmitDataResponsePB* response, RpcContext* rpc_context) { - MonoDelta duration(MonoTime::Now().GetDeltaSince(rpc_context->GetTimeReceived())); + MonoDelta duration(MonoTime::Now() - rpc_context->GetTimeReceived()); + DCHECK_GE(duration.ToNanoseconds(), 0); dispatch_timer_->UpdateCounter(duration.ToNanoseconds()); int use_sender_id = is_merging_ ? 
request->sender_id() : 0; // Add all batches to the same queue if is_merging_ is false. diff --git a/be/src/runtime/krpc-data-stream-sender.cc b/be/src/runtime/krpc-data-stream-sender.cc index b795310..9a0f28e 100644 --- a/be/src/runtime/krpc-data-stream-sender.cc +++ b/be/src/runtime/krpc-data-stream-sender.cc @@ -496,6 +496,8 @@ void KrpcDataStreamSender::Channel::TransmitDataCompleteCb() { const kudu::Status controller_status = rpc_controller_.status(); if (LIKELY(controller_status.ok())) { DCHECK(rpc_in_flight_batch_ != nullptr); +// 'receiver_latency_ns' is calculated with MonoTime, so it must be non-negative. +DCHECK_GE(resp_.receiver_latency_ns(), 0); int64_t row_batch_size = RowBatch::GetSerializedSize(*rpc_in_flight_batch_); int64_t network_time = total_time - resp_.receiver_latency_ns(); COUNTER_ADD(parent_->bytes_sent_counter_, row_batch_size); @@ -628,6 +630,8 @@ void KrpcDataStreamSender::Channel::EndDataStreamCompleteCb() { int64_t total_time_ns = MonotonicNanos() - rpc_start_time_ns_; const kudu::Status controller_status = rpc_controller_.status(); if (LIKELY(controller_status.ok())) { +// 'receiver_latency_ns' is calculated with MonoTime, so it must be non-negative. 
+DCHECK_GE(resp_.receiver_latency_ns(), 0); int64_t network_time_ns = total_time_ns - resp_.receiver_latency_ns(); parent_->network_time_stats_->UpdateCounter(network_time_ns); parent_->recvr_time_stats_->UpdateCounter(eos_resp_.receiver_latency_ns()); diff --git a/be/src/service/data-stream-service.cc b/be/src/service/data-stream-service.cc index 76ef7ba..ceea1fa 100644 --- a/be/src/service/data-stream-service.cc +++ b/be/src/service/data-stream-service.cc @@ -143,7 +143,7 @@ void DataStreamService::PublishFilter( template void DataStreamService::RespondRpc(const Status& status, ResponsePBType* response, kudu::rpc::RpcContext* ctx) { - MonoDelta duration(MonoTime::Now().GetDeltaSince(ctx->GetTimeReceived())); + MonoDelta duration(MonoTime::Now() - ctx->GetTimeReceived()); status.ToProto(response->mutable_status()); response->set_receiver_latency_ns(duration.ToNanoseconds()); ctx->RespondSuccess();
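The invariant the new DCHECKs assert — a duration formed as the difference of two readings of the same monotonic clock can never be negative — can be demonstrated with a small hypothetical sketch (the bug fixed here was that `GetDeltaSince()` computed the delta in the flipped direction, violating exactly this property):

```python
import time

def receiver_latency_ns(time_received_ns):
    """Latency since an RPC was received, analogous to
    MonoTime::Now() - ctx->GetTimeReceived().  Both readings come from the
    same monotonic clock, so the difference is always non-negative."""
    return time.monotonic_ns() - time_received_ns

received = time.monotonic_ns()  # stand-in for RpcContext::GetTimeReceived()
latency = receiver_latency_ns(received)
```

With the flipped operands (`received - now`) the result would be negative, which is what made downstream counters like `network_time = total_time - receiver_latency_ns` silently wrong.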
[impala] 03/03: IMPALA-10044: Fix cleanup for bootstrap_toolchain.py failure case
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit bbec0443fcdabf5de6f7ae0e47595414503f30f0 Author: Joe McDonnell AuthorDate: Wed Aug 5 14:02:30 2020 -0700 IMPALA-10044: Fix cleanup for bootstrap_toolchain.py failure case If DownloadUnpackTarball::download()'s wget_and_unpack_package call hits an exception, the exception handler cleans up any created directories. Currently, it erroneously cleans up the directory where the tarballs are downloaded even when it is not a temporary directory. This would delete the entire toolchain. This fixes the cleanup to only delete that directory if it is a temporary directory. Testing: - Simulated exception from wget_and_unpack_package and verified behavior. Change-Id: Ia57f56b6717635af94247fce50b955c07a57d113 Reviewed-on: http://gerrit.cloudera.org:8080/16294 Reviewed-by: Laszlo Gaal Tested-by: Impala Public Jenkins --- bin/bootstrap_toolchain.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/bin/bootstrap_toolchain.py b/bin/bootstrap_toolchain.py index 647fc00..5d59da1 100755 --- a/bin/bootstrap_toolchain.py +++ b/bin/bootstrap_toolchain.py @@ -182,7 +182,8 @@ class DownloadUnpackTarball(object): # Clean up any partially-unpacked result. if os.path.isdir(unpack_dir): shutil.rmtree(unpack_dir) - if os.path.isdir(download_dir): + # Only delete the download directory if it is a temporary directory + if download_dir != self.destination_basedir and os.path.isdir(download_dir): shutil.rmtree(download_dir) raise if self.makedir:
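The fixed cleanup logic can be isolated into a short sketch: the download directory is removed only when it is a temporary directory, i.e. distinct from the permanent destination. The function name below is illustrative, not the actual `bootstrap_toolchain.py` method:

```python
import os
import shutil
import tempfile

def cleanup_after_failure(unpack_dir, download_dir, destination_basedir):
    """Remove partial results after a failed download/unpack.  The download
    dir is deleted only when it differs from the permanent destination --
    deleting the destination would wipe the entire toolchain (the bug)."""
    if os.path.isdir(unpack_dir):
        shutil.rmtree(unpack_dir)
    if download_dir != destination_basedir and os.path.isdir(download_dir):
        shutil.rmtree(download_dir)

# Demonstrate: the destination survives, the temp download dir does not.
dest = tempfile.mkdtemp()
tmp_download = tempfile.mkdtemp()
cleanup_after_failure(os.path.join(dest, "partial-unpack"), tmp_download, dest)
```

Before the fix, the equivalent of the second branch unconditionally removed `download_dir`, which in the non-temporary case was the toolchain directory itself.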
[impala] 02/02: IMPALA-10005: Fix Snappy decompression for non-block filesystems
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit dbbd40308a6d1cef77bfe45e016e775c918e0539 Author: Joe McDonnell AuthorDate: Thu Jul 23 20:44:30 2020 -0700 IMPALA-10005: Fix Snappy decompression for non-block filesystems Snappy-compressed text always uses THdfsCompression::SNAPPY_BLOCKED type compression in the backend. However, for non-block filesystems, the frontend is incorrectly passing THdfsCompression::SNAPPY instead. On debug builds, this leads to a DCHECK when trying to read Snappy-compressed text. On release builds, it fails to decompress the data. This fixes the frontend to always pass THdfsCompression::SNAPPY_BLOCKED for Snappy-compressed text. This reworks query_test/test_compressed_formats.py to provide better coverage: - Changed the RC and Seq test cases to verify that the file extension doesn't matter. Added Avro to this case as well. - Fixed the text case to use appropriate extensions (fixing IMPALA-9004) - Changed the utility function so it doesn't use Hive. This allows it to be enabled on non-HDFS filesystems like S3. - Changed the test to use unique_database and allow parallel execution. - Changed the test to run in the core job, so it now has coverage on the usual S3 test configuration. It is reasonably quick (1-2 minutes) and runs in parallel. Testing: - Exhaustive job - Core s3 job - Changed the frontend to force it to use the code for non-block filesystems (i.e. the TFileSplitGeneratorSpec code) and verified that it is now able to read Snappy-compressed text. 
Change-Id: I0879f2fc0bf75bb5c15cecb845ece46a901601ac Reviewed-on: http://gerrit.cloudera.org:8080/16278 Tested-by: Impala Public Jenkins Reviewed-by: Sahil Takiar --- .../org/apache/impala/catalog/HdfsCompression.java | 20 +- tests/query_test/test_compressed_formats.py| 202 + 2 files changed, 135 insertions(+), 87 deletions(-) diff --git a/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java b/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java index df76463..153106d 100644 --- a/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java +++ b/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java @@ -24,13 +24,15 @@ import com.google.common.base.Preconditions; import com.google.common.collect.ImmutableMap; /** - * Support for recognizing compression suffixes on data files. + * Support for recognizing compression suffixes on data files. This is currently + * limited to text files. Other file formats embed metadata about the compression + * type and do not use the file suffixes. * Compression of a file is recognized in mapreduce by looking for suffixes of * supported codecs. - * For now Impala supports GZIP, SNAPPY, BZIP2 and some additional formats if plugins - * are available. Even if a plugin is available, we need to add the file suffixes here so - * that we can resolve the compression type from the file name. LZO can use the specific - * HIVE input class. + * For now Impala supports GZIP, SNAPPY_BLOCKED, BZIP2 and some additional formats if + * plugins are available. Even if a plugin is available, we need to add the file suffixes + * here so that we can resolve the compression type from the file name. LZO can use the + * specific HIVE input class. * Some compression types here are detected even though they are not supported. This * allows for better error messages (e.g. LZ4, LZO). */ @@ -39,7 +41,7 @@ public enum HdfsCompression { DEFLATE, GZIP, BZIP2, - SNAPPY, + SNAPPY_BLOCKED, LZO, LZO_INDEX, //Lzo index file. 
LZ4, @@ -51,7 +53,7 @@ public enum HdfsCompression { put("deflate", DEFLATE). put("gz", GZIP). put("bz2", BZIP2). - put("snappy", SNAPPY). + put("snappy", SNAPPY_BLOCKED). put("lzo", LZO). put("index", LZO_INDEX). put("lz4", LZ4). @@ -76,7 +78,7 @@ public enum HdfsCompression { case DEFLATE: return THdfsCompression.DEFLATE; case GZIP: return THdfsCompression.GZIP; case BZIP2: return THdfsCompression.BZIP2; -case SNAPPY: return THdfsCompression.SNAPPY_BLOCKED; +case SNAPPY_BLOCKED: return THdfsCompression.SNAPPY_BLOCKED; case LZO: return THdfsCompression.LZO; case LZ4: return THdfsCompression.LZ4; case ZSTD: return THdfsCompression.ZSTD; @@ -90,7 +92,7 @@ public enum HdfsCompression { case DEFLATE: return FbCompression.DEFLATE; case GZIP: return FbCompression.GZIP; case BZIP2: return FbCompression.BZIP2; - case SNAPPY: return FbCompression.SNAPPY; + case SNAPPY_BLOCKED
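The suffix-to-codec resolution this patch fixes boils down to one mapping change: the `snappy` file suffix now resolves to `SNAPPY_BLOCKED`, the only Snappy variant the backend supports for text. A Python sketch of the lookup (illustrative only; the real code is the Java `HdfsCompression` enum above):

```python
# Mirrors the fixed HdfsCompression suffix map; only the snappy entry changed.
SUFFIX_TO_CODEC = {
    "deflate": "DEFLATE",
    "gz": "GZIP",
    "bz2": "BZIP2",
    "snappy": "SNAPPY_BLOCKED",  # was SNAPPY before this fix
    "lzo": "LZO",
    "lz4": "LZ4",
}

def codec_from_filename(filename):
    """Resolve the compression codec from the file suffix; NONE if unknown."""
    suffix = filename.rsplit(".", 1)[-1].lower()
    return SUFFIX_TO_CODEC.get(suffix, "NONE")

print(codec_from_filename("data.snappy"))  # SNAPPY_BLOCKED
```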
[impala] branch master updated (c413f9b -> dbbd403)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from c413f9b IMPALA-10047: Revert core piece of IMPALA-6984 new 87aeb2a IMPALA-9963: Implement ds_kll_n() function new dbbd403 IMPALA-10005: Fix Snappy decompression for non-block filesystems The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/exprs/datasketches-common.h | 2 +- be/src/exprs/datasketches-functions-ir.cc | 11 ++ be/src/exprs/datasketches-functions.h | 5 + common/function-registry/impala_functions.py | 2 + .../org/apache/impala/catalog/HdfsCompression.java | 20 +- .../queries/QueryTest/datasketches-kll.test| 37 tests/query_test/test_compressed_formats.py| 202 + 7 files changed, 191 insertions(+), 88 deletions(-)
[impala] 01/02: IMPALA-9963: Implement ds_kll_n() function
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 87aeb2ad78e2106f1d8df84d4d84975c7cde5b5a Author: Gabor Kaszab AuthorDate: Thu Jul 30 09:41:00 2020 +0200 IMPALA-9963: Implement ds_kll_n() function This function receives a serialized Apache DataSketches KLL sketch and returns how many input values were fed into this sketch. Change-Id: I166e87a468e68e888ac15fca7429ac2552dbb781 Reviewed-on: http://gerrit.cloudera.org:8080/16259 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/exprs/datasketches-common.h | 2 +- be/src/exprs/datasketches-functions-ir.cc | 11 +++ be/src/exprs/datasketches-functions.h | 5 +++ common/function-registry/impala_functions.py | 2 ++ .../queries/QueryTest/datasketches-kll.test| 37 ++ 5 files changed, 56 insertions(+), 1 deletion(-) diff --git a/be/src/exprs/datasketches-common.h b/be/src/exprs/datasketches-common.h index 7560692..37a6458 100644 --- a/be/src/exprs/datasketches-common.h +++ b/be/src/exprs/datasketches-common.h @@ -37,7 +37,7 @@ const int DS_SKETCH_CONFIG = 12; /// Logs a common error message saying that sketch deserialization failed. void LogSketchDeserializationError(FunctionContext* ctx); -/// Receives a serialized DataSketches sketch (either Hll or KLL) in +/// Receives a serialized DataSketches sketch (either Hll or KLL) in /// 'serialized_sketch', deserializes it and puts the deserialized sketch into 'sketch'. /// The outgoing 'sketch' will hold the same configs as 'serialized_sketch' regardless of /// what was provided when it was constructed before this function call. 
Returns false if diff --git a/be/src/exprs/datasketches-functions-ir.cc b/be/src/exprs/datasketches-functions-ir.cc index d2898bc..b76cbe9 100644 --- a/be/src/exprs/datasketches-functions-ir.cc +++ b/be/src/exprs/datasketches-functions-ir.cc @@ -59,5 +59,16 @@ FloatVal DataSketchesFunctions::DsKllQuantile(FunctionContext* ctx, } } +BigIntVal DataSketchesFunctions::DsKllN(FunctionContext* ctx, +const StringVal& serialized_sketch) { + if (serialized_sketch.is_null || serialized_sketch.len == 0) return BigIntVal::null(); + datasketches::kll_sketch<float> sketch; + if (!DeserializeDsSketch(serialized_sketch, &sketch)) { +LogSketchDeserializationError(ctx); +return BigIntVal::null(); + } + return sketch.get_n(); +} + } diff --git a/be/src/exprs/datasketches-functions.h b/be/src/exprs/datasketches-functions.h index 143fd69..bd6b76c 100644 --- a/be/src/exprs/datasketches-functions.h +++ b/be/src/exprs/datasketches-functions.h @@ -42,6 +42,11 @@ public: /// of [0,1]. Otherwise this function returns error. static FloatVal DsKllQuantile(FunctionContext* ctx, const StringVal& serialized_sketch, const DoubleVal& rank); + + /// 'serialized_sketch' is expected as a serialized Apache DataSketches KLL sketch. If + /// it is not, then the query fails. + /// Returns the number of input values fed to 'serialized_sketch'.
+ static BigIntVal DsKllN(FunctionContext* ctx, const StringVal& serialized_sketch); }; } diff --git a/common/function-registry/impala_functions.py b/common/function-registry/impala_functions.py index 8398785..fbed357 100644 --- a/common/function-registry/impala_functions.py +++ b/common/function-registry/impala_functions.py @@ -935,6 +935,8 @@ visible_functions = [ '_ZN6impala21DataSketchesFunctions13DsHllEstimateEPN10impala_udf15FunctionContextERKNS1_9StringValE'], [['ds_kll_quantile'], 'FLOAT', ['STRING', 'DOUBLE'], '_ZN6impala21DataSketchesFunctions13DsKllQuantileEPN10impala_udf15FunctionContextERKNS1_9StringValERKNS1_9DoubleValE'], + [['ds_kll_n'], 'BIGINT', ['STRING'], + '_ZN6impala21DataSketchesFunctions6DsKllNEPN10impala_udf15FunctionContextERKNS1_9StringValE'], ] invisible_functions = [ diff --git a/testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test b/testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test index b7b734b..ee240bf 100644 --- a/testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test +++ b/testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test @@ -144,3 +144,40 @@ FLOAT,FLOAT,FLOAT,FLOAT,FLOAT,FLOAT RESULTS 100.169482422,25000.099609375,50.9152587891,NULL,50.5,NULL + QUERY +# Check that ds_kll_n() returns null for an empty sketch. +select ds_kll_n(ds_kll_sketch(cast(f2 as float))) from functional_parquet.emptytable; + RESULTS +NULL + TYPES +BIGINT + + QUERY +# Check that ds_kll_n() returns null for a null input. +select ds_kll_n(c) from functional_parquet.nulltable; + RESULTS +NULL + TYPES +BIGINT + + QUERY +# Check that ds_kll_n() returns e
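The null-handling contract of `ds_kll_n()` above is easy to state outside C++: NULL in, NULL out; undecodable sketch, NULL out; otherwise the count of values fed into the sketch. A Python sketch of that contract (the `deserialize` callable is a hypothetical stand-in for `DeserializeDsSketch`):

```python
def ds_kll_n(serialized_sketch, deserialize):
    """Mirror ds_kll_n()'s result contract: None (SQL NULL) for a NULL or
    empty input, None on deserialization failure, else the sketch's n."""
    if serialized_sketch is None or len(serialized_sketch) == 0:
        return None
    sketch = deserialize(serialized_sketch)
    if sketch is None:  # deserialization failed (logged in the real code)
        return None
    return sketch.get_n()
```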
[impala] branch master updated: IMPALA-10047: Revert core piece of IMPALA-6984
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new c413f9b IMPALA-10047: Revert core piece of IMPALA-6984 c413f9b is described below commit c413f9b558d51de877f497590baf14139ad5cf99 Author: Joe McDonnell AuthorDate: Tue Aug 4 17:29:19 2020 -0700 IMPALA-10047: Revert core piece of IMPALA-6984 Performance testing on TPC-DS found a performance regression on short queries due to delayed exec status reports. Further testing traced this back to IMPALA-6984's behavior of cancelling backends on EOS. The coordinator log shows the CancelBackends() call intermittently taking 10 seconds due to timing out in the RPC layer. As a temporary workaround, this reverts the core part of IMPALA-6984 that added that CancelBackends() call for EOS. It leaves the rest of IMPALA-6984 intact, as other code has built on top of it. Testing: - Core job - Performance tests Change-Id: Ibf00a56e91f0376eaaa552e3bb4763501bfb49e8 (cherry picked from commit b91f3c0e064d592f3cdf2a2e089ca6546133ba55) Reviewed-on: http://gerrit.cloudera.org:8080/16288 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins --- be/src/runtime/coordinator.cc | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/be/src/runtime/coordinator.cc b/be/src/runtime/coordinator.cc index b57d66f..0ceae83 100644 --- a/be/src/runtime/coordinator.cc +++ b/be/src/runtime/coordinator.cc @@ -714,9 +714,7 @@ void Coordinator::HandleExecStateTransition( // execution and release resources. ReleaseExecResources(); if (new_state == ExecState::RETURNED_RESULTS) { -// Cancel all backends, but wait for the final status reports to be received so that -// we have a complete profile for this successful query. -CancelBackends(/*fire_and_forget=*/ false); +// TODO: IMPALA-6984: cancel all backends in this case too.
WaitForBackends(); } else { CancelBackends(/*fire_and_forget=*/ true);
[impala] 02/02: IMPALA-9923: Load ORC serially to hack around flakiness
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit dc2fdabbd1f2c930348671e17f885c5c54b628e4 Author: Joe McDonnell AuthorDate: Tue Aug 4 22:08:22 2020 -0700 IMPALA-9923: Load ORC serially to hack around flakiness ORC dataload has been intermittently failing with "Fail to get checksum, since file .../_orc_acid_version is under construction." This is due to some Hive/HDFS interaction that seems to get worse with parallelism. This has been hitting a lot of developer tests. As a temporary workaround, this changes dataload to load ORC serially. This is slightly slower, but it should be more reliable. Testing: - Ran precommit tests, manually verified dataload logs Change-Id: I15eff1ec6cab32c1216ed7400e4c4b57bb81e4cd Reviewed-on: http://gerrit.cloudera.org:8080/16292 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- bin/load-data.py | 11 +++ 1 file changed, 11 insertions(+) diff --git a/bin/load-data.py b/bin/load-data.py index b461d7a..a7eb883 100755 --- a/bin/load-data.py +++ b/bin/load-data.py @@ -415,6 +415,7 @@ def main(): impala_create_files = [] hive_load_text_files = [] +hive_load_orc_files = [] hive_load_nontext_files = [] hbase_create_files = [] hbase_postload_files = [] @@ -426,6 +427,8 @@ def main(): elif hive_load_match in filename: if 'text-none-none' in filename: hive_load_text_files.append(filename) +elif 'orc-def-block' in filename: + hive_load_orc_files.append(filename) else: hive_load_nontext_files.append(filename) elif hbase_create_match in filename: @@ -448,6 +451,7 @@ def main(): log_file_list("Impala Create Files:", impala_create_files) log_file_list("Hive Load Text Files:", hive_load_text_files) +log_file_list("Hive Load Orc Files:", hive_load_orc_files) log_file_list("Hive Load Non-Text Files:", hive_load_nontext_files) log_file_list("HBase Create Files:", hbase_create_files) log_file_list("HBase Post-Load 
Files:", hbase_postload_files) @@ -472,6 +476,13 @@ def main(): # need to be loaded first assert(len(hive_load_text_files) <= 1) hive_exec_query_files_parallel(thread_pool, hive_load_text_files) +# IMPALA-9923: Run ORC serially separately from other non-text formats. This hacks +# around flakiness seen when loading this in parallel. This should be removed as +# soon as possible. +assert(len(hive_load_orc_files) <= 1) +hive_exec_query_files_parallel(thread_pool, hive_load_orc_files) + +# Load all non-text formats (goes parallel) hive_exec_query_files_parallel(thread_pool, hive_load_nontext_files) assert(len(hbase_postload_files) <= 1)
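The change to `load-data.py` above is a three-way classification of generated load files, with the ORC bucket run serially as the IMPALA-9923 workaround while the remaining non-text formats stay parallel. A minimal sketch of the classification (substring markers copied from the diff; file names in the test are hypothetical examples):

```python
def classify_load_file(filename):
    """Bucket a generated Hive load file the way load-data.py now does:
    'text' loads first, 'orc' loads serially, everything else in parallel."""
    if "text-none-none" in filename:
        return "text"
    elif "orc-def-block" in filename:
        return "orc"
    else:
        return "nontext"

print(classify_load_file("load-functional-orc-def-block.sql"))  # orc
```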
[impala] 01/02: IMPALA-10037: Remove flaky test_mt_dop_scan_node
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit f38ca7df8cf027fcaab4713a6b186b584cef Author: Bikramjeet Vig AuthorDate: Tue Aug 4 17:14:37 2020 -0700 IMPALA-10037: Remove flaky test_mt_dop_scan_node This test has inherent flakiness due to it relying on instances fetching scan ranges from a shared queue. Therefore, this patch removes the test since it was just a sanity check but its flakiness outweighed its usefulness. Change-Id: I1625872189ea7ac2d4e4d035956f784b6e18eb08 Reviewed-on: http://gerrit.cloudera.org:8080/16286 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- tests/query_test/test_mt_dop.py | 43 + 1 file changed, 1 insertion(+), 42 deletions(-) diff --git a/tests/query_test/test_mt_dop.py b/tests/query_test/test_mt_dop.py index 8af3fa8..4f5b50d 100644 --- a/tests/query_test/test_mt_dop.py +++ b/tests/query_test/test_mt_dop.py @@ -37,6 +37,7 @@ WAIT_TIME_MS = build_flavor_timeout(6, slow_build_timeout=10) # the value 0 to cover the non-MT path as well. MT_DOP_VALUES = [0, 1, 2, 8] + class TestMtDop(ImpalaTestSuite): @classmethod def add_test_dimensions(cls): @@ -97,48 +98,6 @@ class TestMtDop(ImpalaTestSuite): assert expected_results in results.data -class TestMtDopScanNode(ImpalaTestSuite): - @classmethod - def get_workload(self): -return 'functional-query' - - @classmethod - def add_test_dimensions(cls): -super(TestMtDopScanNode, cls).add_test_dimensions() -cls.ImpalaTestMatrix.add_constraint( - lambda v: v.get_value('table_format').file_format == 'text' and v.get_value( -'table_format').compression_codec == 'none') - - def test_mt_dop_scan_node(self, vector, unique_database): -"""Regression test to make sure scan ranges are shared among all scan node instances -when using mt_dop. This runs a selective hash join that will dynamically prune -partitions leaving less than 5% of the data. 
Before IMPALA-9655 this would almost -always result in a failure where at least one instance would have all its statically -assigned scan ranges pruned.""" -fq_table_name = "%s.store_sales_subset" % unique_database -self.execute_query("create table %s as select distinct(ss_sold_date_sk) as " - "sold_date from tpcds.store_sales limit 50" % fq_table_name) -vector.get_value('exec_option')['mt_dop'] = 8 -vector.get_value('exec_option')['runtime_filter_wait_time_ms'] = 10 - -# Since this depends on instances fetching scan ranges from a shared queue, running -# it multiple times ensures any flakiness is removed. On a release build it has a -# 0.05% failure rate. -NUM_TRIES = 100 -failed_count = 0 -for i in xrange(NUM_TRIES): - try: -result = self.execute_query( - "select count(ss_sold_date_sk) from tpcds.store_sales, %s where " - "ss_sold_date_sk = sold_date" % fq_table_name, - vector.get_value('exec_option')) -assert "- BytesRead: 0" not in result.runtime_profile, result.runtime_profile -break - except Exception: -failed_count += 1 -if i == NUM_TRIES - 1: raise -LOG.info("Num of times failed before success {0}".format(failed_count)) - class TestMtDopParquet(ImpalaTestSuite): @classmethod def get_workload(cls):
[impala] branch master updated (cc1eddb -> dc2fdab)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from cc1eddb Add logging when query unregisters new f38ca7d IMPALA-10037: Remove flaky test_mt_dop_scan_node new dc2fdab IMPALA-9923: Load ORC serially to hack around flakiness The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: bin/load-data.py| 11 +++ tests/query_test/test_mt_dop.py | 43 + 2 files changed, 12 insertions(+), 42 deletions(-)
[impala] branch master updated: Add logging when query unregisters
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new cc1eddb Add logging when query unregisters cc1eddb is described below commit cc1eddbe193daf228dee1d53bb1e4ccd064d90a5 Author: Bikramjeet Vig AuthorDate: Tue Aug 4 17:10:02 2020 -0700 Add logging when query unregisters This adds a log line which is printed when a query is successfully unregistered by the async unregister thread pool. Added only for additional observability. Change-Id: I09be63afbee6b338a952a9b12321e028be9d7cb0 Reviewed-on: http://gerrit.cloudera.org:8080/16285 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/service/impala-server.cc | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/be/src/service/impala-server.cc b/be/src/service/impala-server.cc index f1017ea..d4f05c8 100644 --- a/be/src/service/impala-server.cc +++ b/be/src/service/impala-server.cc @@ -1212,7 +1212,12 @@ void ImpalaServer::FinishUnregisterQuery(const QueryHandle& query_handle) { Status status = query_handle.query_driver()->Unregister(&query_driver_map_); string err_msg = "QueryDriver can only be deleted once: " + status.GetDetail(); DCHECK(status.ok()) << err_msg; - if (UNLIKELY(!status.ok())) LOG(ERROR) << status.GetDetail(); + if (UNLIKELY(!status.ok())) { +LOG(ERROR) << status.GetDetail(); + } else { +VLOG_QUERY << "Query successfully unregistered: query_id=" + << PrintId(query_handle->query_id()); + } } void ImpalaServer::UnregisterQueryDiscardResult(
[impala] branch master updated: IMPALA-9633: Implement ds_hll_union()
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 9c542ef IMPALA-9633: Implement ds_hll_union() 9c542ef is described below commit 9c542ef5891f984300f9e5f45406caf145039e75 Author: Gabor Kaszab AuthorDate: Fri Jun 5 10:53:11 2020 +0200 IMPALA-9633: Implement ds_hll_union() This function receives a set of sketches produced by ds_hll_sketch() and merges them into a single sketch. An example usage is to create a sketch for each partition of a table and write these sketches to a separate table; then, depending on which partitions the user is interested in, the relevant sketches can be unioned together to get an estimate. E.g.: SELECT ds_hll_estimate(ds_hll_union(sketch_col)) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Note: there is currently a known limitation when unioning string types where some input sketches come from Impala and some from Hive. In this case, if there is an overlap in the input data used by Impala and by Hive, the overlapping data is still counted twice due to a string representation difference between Impala and Hive. For more details see: https://issues.apache.org/jira/browse/IMPALA-9939 Testing: - Apart from the automated tests I added to this patch I also tested ds_hll_union() on a bigger dataset to check that the serialization, deserialization and merging steps work well. I took TPCH25.lineitem, created a number of sketches grouped by l_shipdate and called ds_hll_union() on those sketches.
Change-Id: I67cdbf6f3ebdb1296fea38465a15642bc9612d09 Reviewed-on: http://gerrit.cloudera.org:8080/16095 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/exprs/CMakeLists.txt| 1 + be/src/exprs/aggregate-functions-ir.cc | 100 + be/src/exprs/aggregate-functions.h | 8 +- ...ches-functions-ir.cc => datasketches-common.cc} | 28 +++--- be/src/exprs/datasketches-common.h | 49 ++ be/src/exprs/datasketches-functions-ir.cc | 17 ++-- .../java/org/apache/impala/catalog/BuiltinsDb.java | 14 +++ testdata/data/README | 4 + testdata/data/hll_sketches_from_impala.parquet | Bin 0 -> 3501 bytes .../queries/QueryTest/datasketches-hll.test| 60 + tests/query_test/test_datasketches.py | 1 + 11 files changed, 241 insertions(+), 41 deletions(-) diff --git a/be/src/exprs/CMakeLists.txt b/be/src/exprs/CMakeLists.txt index e0ed683..7af6145 100644 --- a/be/src/exprs/CMakeLists.txt +++ b/be/src/exprs/CMakeLists.txt @@ -36,6 +36,7 @@ add_library(Exprs compound-predicates-ir.cc conditional-functions.cc conditional-functions-ir.cc + datasketches-common.cc datasketches-functions-ir.cc date-functions-ir.cc decimal-functions-ir.cc diff --git a/be/src/exprs/aggregate-functions-ir.cc b/be/src/exprs/aggregate-functions-ir.cc index 06395f40..5b87d0b 100644 --- a/be/src/exprs/aggregate-functions-ir.cc +++ b/be/src/exprs/aggregate-functions-ir.cc @@ -29,6 +29,7 @@ #include "codegen/impala-ir.h" #include "common/logging.h" #include "exprs/anyval-util.h" +#include "exprs/datasketches-common.h" #include "exprs/hll-bias.h" #include "gutil/strings/substitute.h" #include "runtime/date-value.h" @@ -1611,23 +1612,18 @@ BigIntVal AggregateFunctions::HllFinalize(FunctionContext* ctx, const StringVal& return estimate; } -/// Config for DataSketches HLL algorithm to set the size of each entry within the -/// sketch. +/// Auxiliary function that receives an input type that has a serialize_compact() +/// function (e.g. 
hll_sketch or hll_union) and returns the serialized version of it +/// wrapped into a StringVal. /// Introducing this variable in the .cc to avoid including the whole DataSketches HLL /// functionality into the header. -const datasketches::target_hll_type DS_HLL_TYPE = datasketches::target_hll_type::HLL_4; - -/// Auxiliary function that receives a hll_sketch and returns the serialized version of -/// it wrapped into a StringVal. -/// Introducing this function in the .cc to avoid including the whole DataSketches HLL -/// functionality into the header. -StringVal SerializeDsHllSketch(FunctionContext* ctx, -const datasketches::hll_sketch& sketch) { - std::stringstream serialized_sketch; - sketch.serialize_compact(serialized_sketch); - std::string serialized_sketch_str = serialized_sketch.str(); - StringVal dst(ctx, serialized_sketch_str.size()); - memcpy(dst.ptr, serialized_sketch_str.c_str(), serialized_
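The per-partition union pattern the commit message describes can be illustrated with a toy model: Python sets stand in for HLL sketches, set union plays the role of `ds_hll_union()`, and `len()` plays the role of `ds_hll_estimate()`. This is purely illustrative; a real deployment would store serialized DataSketches HLL sketches:

```python
def union_sketches(sketches):
    """Merge a list of per-partition 'sketches' (sets here) into one."""
    merged = set()
    for s in sketches:
        merged |= s
    return merged

# One toy "sketch" per partition, as in the sketch_tbl example above.
partition_sketches = {1: {"a", "b"}, 3: {"b", "c"}, 5: {"c", "d"}}
# WHERE partition_col=1 OR partition_col=5
selected = [partition_sketches[p] for p in (1, 5)]
print(len(union_sketches(selected)))  # 4
```

Unlike exact sets, HLL sketches give an approximate distinct count, but the merge-then-estimate shape of the query is the same.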
[impala] branch master updated: IMPALA-9887: Add support for sharding end-to-end tests
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 605e301 IMPALA-9887: Add support for sharding end-to-end tests 605e301 is described below commit 605e301739b8ef7619482db9b13444e84145b219 Author: Joe McDonnell AuthorDate: Wed Jun 24 12:27:04 2020 -0700 IMPALA-9887: Add support for sharding end-to-end tests ASAN maintains stacks for each allocation and free of memory. Impala sometimes allocates/frees memory from codegen'd code, so this means that the number of distinct stacks is unbounded. ASAN is storing these stacks in a hash table with a fixed number of buckets (one million). As the stacks accumulate, allocations and frees get slower and slower, because the lookup in this hashtable gets slower. This causes test execution time to degrade over time. Since backend tests and custom cluster tests don't have long running daemons, only the end to end tests are affected. This adds support for breaking end-to-end test execution into shards, restarting Impala between each shard. This uses the preexisting shard_tests pytest functionality introduced for the docker-based tests in IMPALA-6070. The number of shards is configurable via the EE_TEST_SHARDS environment variable. By default, EE_TEST_SHARDS=1 and no sharding is used. Without sharding, an ASAN core job takes about 16-17 hours. With 6 shards, it takes about 9 hours. It is recommended to always use sharding with ASAN. 
Testing: - Ran core job - Ran ASAN with EE_TEST_SHARDS=6 Change-Id: I0bdbd79940df2bc7b951efdf0f044e6b40a3fda9 Reviewed-on: http://gerrit.cloudera.org:8080/16155 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- bin/run-all-tests.sh | 43 +-- tests/run-tests.py | 31 +++ 2 files changed, 64 insertions(+), 10 deletions(-) diff --git a/bin/run-all-tests.sh b/bin/run-all-tests.sh index 3a1f8b8..5287861 100755 --- a/bin/run-all-tests.sh +++ b/bin/run-all-tests.sh @@ -46,6 +46,7 @@ fi # Run End-to-end Tests : ${EE_TEST:=true} : ${EE_TEST_FILES:=} +: ${EE_TEST_SHARDS:=1} # Run JDBC Test : ${JDBC_TEST:=true} # Run Cluster Tests @@ -158,6 +159,8 @@ LOG_DIR="${IMPALA_EE_TEST_LOGS_DIR}" # Enable core dumps ulimit -c unlimited || true +TEST_RET_CODE=0 + # Helper function to start Impala cluster. start_impala_cluster() { # TODO: IMPALA-9812: remove --unlock_mt_dop when it is no longer needed. @@ -167,6 +170,21 @@ start_impala_cluster() { ${TEST_START_CLUSTER_ARGS} --impalad_args=--unlock_mt_dop=true } +run_ee_tests() { + if [[ $# -gt 0 ]]; then +EXTRA_ARGS=${1} + else +EXTRA_ARGS="" + fi + # Run end-to-end tests. + # KERBEROS TODO - this will need to deal with ${KERB_ARGS} + if ! "${IMPALA_HOME}/tests/run-tests.py" ${COMMON_PYTEST_ARGS} \ + ${RUN_TESTS_ARGS} ${EXTRA_ARGS} ${EE_TEST_FILES}; then +#${KERB_ARGS}; +TEST_RET_CODE=1 + fi +} + for i in $(seq 1 $NUM_TEST_ITERATIONS) do TEST_RET_CODE=0 @@ -231,12 +249,25 @@ do fi if [[ "$EE_TEST" == true ]]; then -# Run end-to-end tests. -# KERBEROS TODO - this will need to deal with ${KERB_ARGS} -if ! "${IMPALA_HOME}/tests/run-tests.py" ${COMMON_PYTEST_ARGS} \ -${RUN_TESTS_ARGS} ${EE_TEST_FILES}; then - #${KERB_ARGS}; - TEST_RET_CODE=1 +if [[ ${EE_TEST_SHARDS} -lt 2 ]]; then + # For runs without sharding, avoid adding the "--shard_tests" parameter. + # Some test frameworks (e.g. the docker-based tests) use this. 
+ run_ee_tests +else + # When the EE tests are sharded, it runs 1/Nth of the tests at a time, restarting + # Impala between the shards. There are two benefits: + # 1. It isolates errors so that if Impala crashes, the next shards will still run + #with a fresh Impala. + # 2. For ASAN runs, resources accumulate over test execution, so tests get slower + #over time (see IMPALA-9887). Running shards with regular restarts + #substantially speeds up execution time. + # + # Shards are 1 indexed (i.e. 1/N through N/N). This shards both serial and + # parallel tests. + for (( shard_idx=1 ; shard_idx <= ${EE_TEST_SHARDS} ; shard_idx++ )); do +run_ee_tests "--shard_tests=$shard_idx/${EE_TEST_SHARDS}" +start_impala_cluster + done fi fi diff --git a/tests/run-tests.py b/tests/run-tests.py index 55b002a..8f1e8d3 100755 --- a/tests/run-tests.py +++ b/tests/run-tests.py @@ -282,22 +282,44 @@ if __name__ == "__main__": run(sys.argv[1:]) else: print_metrics('connections') + +# If using sha
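The sharding contract described above is that shards are 1-indexed (1/N through N/N), every test lands in exactly one shard, and the shards together cover the whole suite. A round-robin sketch of that contract (the actual `--shard_tests` assignment in run-tests.py may partition differently, e.g. by hash, but must satisfy the same invariants):

```python
def shard(tests, shard_idx, num_shards):
    """Return the 1-indexed shard_idx/num_shards slice of the test list."""
    assert 1 <= shard_idx <= num_shards
    return [t for i, t in enumerate(tests) if i % num_shards == shard_idx - 1]

tests = ["test_%d" % i for i in range(10)]
all_shards = [shard(tests, i, 3) for i in (1, 2, 3)]
# Union of all shards covers the full suite exactly once.
assert sorted(sum(all_shards, [])) == sorted(tests)
```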
[impala] 01/02: IMPALA-9531: Dropped support for dateless timestamps
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 1bafb7bd29f4ecf1706d35e274c2b701a32281ac Author: Adam Tamas AuthorDate: Tue Apr 7 10:07:47 2020 +0200 IMPALA-9531: Dropped support for dateless timestamps Removed the support for dateless timestamps. During timestamp casts, if the format doesn't contain a date part, we get an error during tokenization of the format. If the input string doesn't contain a date part, we get a NULL result. Examples: select cast('01:02:59' as timestamp); This will come back as a NULL value. select to_timestamp('01:01:01', 'HH:mm:ss'); select cast('01:02:59' as timestamp format 'HH12:MI:SS'); select cast('12 AM' as timestamp FORMAT 'AM.HH12'); These will come back with parsing errors. Casting from a table will generate similar results. Testing: Modified the previous tests related to dateless timestamps. Added tests to read from tables which still contain dateless timestamps, and covered the timestamp-to-string path when no date tokens are requested in the output string.
Change-Id: I48c49bf027cc4b917849b3d58518facba372b322 Reviewed-on: http://gerrit.cloudera.org:8080/15866 Tested-by: Impala Public Jenkins Reviewed-by: Gabor Kaszab --- be/src/benchmarks/convert-timestamp-benchmark.cc | 2 +- be/src/benchmarks/parse-timestamp-benchmark.cc | 4 +- be/src/exec/text-converter.inline.h| 2 +- be/src/exprs/cast-functions-ir.cc | 6 - be/src/exprs/expr-test.cc | 35 +- be/src/exprs/scalar-expr-evaluator.cc | 2 +- be/src/exprs/timestamp-functions-ir.cc | 12 +- be/src/exprs/timestamp-functions.cc| 18 ++- be/src/exprs/timestamp-functions.h | 6 +- be/src/runtime/date-parse-util.cc | 2 +- be/src/runtime/date-test.cc| 6 +- .../runtime/datetime-iso-sql-format-tokenizer.cc | 3 + be/src/runtime/datetime-parser-common.cc | 3 + be/src/runtime/datetime-parser-common.h| 1 + .../runtime/datetime-simple-date-format-parser.cc | 136 - .../runtime/datetime-simple-date-format-parser.h | 10 +- be/src/runtime/timestamp-parse-util.cc | 6 +- be/src/runtime/timestamp-test.cc | 67 +- be/src/runtime/timestamp-value.h | 3 +- bin/rat_exclude_files.txt | 1 + common/function-registry/impala_functions.py | 10 +- .../apache/impala/analysis/AnalyzeKuduDDLTest.java | 3 +- testdata/data/README | 12 ++ testdata/data/dateless_timestamps.parq | Bin 0 -> 435 bytes testdata/data/dateless_timestamps.txt | 7 ++ testdata/data/lazy_timestamp.csv | 7 -- .../functional-query/queries/QueryTest/date.test | 7 -- .../QueryTest/dateless_timestamp_parquet.test | 25 .../queries/QueryTest/dateless_timestamp_text.test | 29 + .../functional-query/queries/QueryTest/exprs.test | 20 ++- .../queries/QueryTest/select-lazy-timestamp.test | 7 -- tests/data_errors/test_data_errors.py | 4 +- tests/query_test/test_cast_with_format.py | 23 +++- tests/query_test/test_scanners.py | 24 34 files changed, 275 insertions(+), 228 deletions(-) diff --git a/be/src/benchmarks/convert-timestamp-benchmark.cc b/be/src/benchmarks/convert-timestamp-benchmark.cc index a1d9331..c263fcf 100644 --- 
a/be/src/benchmarks/convert-timestamp-benchmark.cc +++ b/be/src/benchmarks/convert-timestamp-benchmark.cc @@ -166,7 +166,7 @@ fast path speedup: 10.2951 vector AddTestDataDateTimes(int n, const string& startstr) { DateTimeFormatContext dt_ctx; dt_ctx.Reset("-MMM-dd HH:mm:ss"); - SimpleDateFormatTokenizer::Tokenize(_ctx); + SimpleDateFormatTokenizer::Tokenize(_ctx, PARSE); random_device rd; mt19937 gen(rd()); diff --git a/be/src/benchmarks/parse-timestamp-benchmark.cc b/be/src/benchmarks/parse-timestamp-benchmark.cc index c7ca51a..8d42311 100644 --- a/be/src/benchmarks/parse-timestamp-benchmark.cc +++ b/be/src/benchmarks/parse-timestamp-benchmark.cc @@ -258,9 +258,9 @@ int main(int argc, char **argv) { timestamp_suite.AddBenchmark("Impala", TestImpalaSimpleDateFormat, ); dt_ctx_simple_date_format.Reset("-MM-dd HH:mm:ss", 19); - SimpleDateFormatTokenizer::Tokenize(_ctx_simple_date_format); + SimpleDateFormatTokenizer::Tokenize(_ctx_simple_date_format, PARSE); dt_ctx_tz_simple_date_format.Reset("-MM-dd HH:mm:ss+hh:mm", 25); - SimpleDateFormatTokenizer::Tokenize(_ctx_tz_simple_date_format); + SimpleDateFormatTokeni
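The new behavior described above — erroring out at tokenization time when a format string carries no date part at all — can be sketched as follows. This is a minimal illustration of the rule, not Impala's real tokenizer: the helper name and the character scan are assumptions, since `SimpleDateFormatTokenizer` actually works on parsed tokens rather than raw characters.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the tokenizer's new check: a simple-date-format
// string must contain at least one date token (year 'y', month 'M', day 'd')
// to be accepted for parsing. "HH:mm:ss" has only time tokens, so it is
// rejected, which is what makes the casts in the commit message fail.
bool FormatHasDatePart(const std::string& fmt) {
  for (char c : fmt) {
    if (c == 'y' || c == 'M' || c == 'd') return true;
  }
  return false;
}
```

With this rule, `to_timestamp('01:01:01', 'HH:mm:ss')` fails at format tokenization, while a plain `cast('01:02:59' as timestamp)` (no explicit format, no date part in the input) yields NULL instead.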
[impala] 02/02: IMPALA-7923: DecimalValue should be marked as packed
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 45c105d71d47c4c57e042b9bf8a0d8d8044083bc Author: Daniel Becker AuthorDate: Thu Jul 2 00:06:09 2020 +0200 IMPALA-7923: DecimalValue should be marked as packed IMPALA-7473 and IMPALA-9111 were symptoms of a more general problem that DecimalValue is not guaranteed to be aligned by the Impala runtime but the compiler assumes it is and under some circumstances, it will emit code for aligned loads to value_ when value_ is an int128. This commit marks DecimalValue as packed so that the compiler does not assume any alignment. TODO: Maybe benchmark if this introduces performance regressions, but it shouldn't. Change-Id: I55f936a4f4f4b5faf129a9265222e64fc486b8ed Reviewed-on: http://gerrit.cloudera.org:8080/16134 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/exec/hdfs-avro-scanner-test.cc | 17 +++-- be/src/exec/orc-column-readers.h | 2 +- be/src/runtime/decimal-value.h| 23 +-- be/src/runtime/decimal-value.inline.h | 5 + be/src/util/decimal-util.h| 2 +- be/src/util/dict-test.cc | 7 --- 6 files changed, 35 insertions(+), 21 deletions(-) diff --git a/be/src/exec/hdfs-avro-scanner-test.cc b/be/src/exec/hdfs-avro-scanner-test.cc index 621247b..e7086c8 100644 --- a/be/src/exec/hdfs-avro-scanner-test.cc +++ b/be/src/exec/hdfs-avro-scanner-test.cc @@ -461,7 +461,8 @@ TEST_F(HdfsAvroScannerTest, DecimalTest) { // Unscaled value can be stored in 4 bytes data[0] = 8; // decodes to 4 #if __BYTE_ORDER == __LITTLE_ENDIAN - BitUtil::ByteSwap([1], (), 4); + const Decimal4Value::StorageType d4v_value = d4v.value(); + BitUtil::ByteSwap([1], _value, 4); #else memcpy([1], (), 4); #endif @@ -482,7 +483,8 @@ TEST_F(HdfsAvroScannerTest, DecimalTest) { d8v = Decimal8Value(123456789012345678); data[0] = 16; // decodes to 8 #if __BYTE_ORDER == __LITTLE_ENDIAN - BitUtil::ByteSwap([1], (), 
8); + const Decimal8Value::StorageType d8v_value = d8v.value(); + BitUtil::ByteSwap([1], _value, 8); #else memcpy([1], (), 8); #endif @@ -495,7 +497,8 @@ TEST_F(HdfsAvroScannerTest, DecimalTest) { Decimal16Value d16v(1234567890); data[0] = 10; // decodes to 5 #if __BYTE_ORDER == __LITTLE_ENDIAN - BitUtil::ByteSwap([1], (), 5); + const Decimal16Value::StorageType d16v_value = d16v.value(); + BitUtil::ByteSwap([1], _value, 5); #else memcpy([1], (), 5); #endif @@ -506,12 +509,14 @@ TEST_F(HdfsAvroScannerTest, DecimalTest) { TestReadAvroDecimal(data, 4, d16v, -1, TErrorCode::AVRO_TRUNCATED_BLOCK); /// Produce a very large decimal value. - memset((), 0xFF, sizeof(d16v.value())); + Decimal16Value::StorageType d16v_value2; + memset(_value2, 0xFF, sizeof(d16v_value2)); + d16v.set_value(d16v_value2); data[0] = 32; // decodes to 16 #if __BYTE_ORDER == __LITTLE_ENDIAN - BitUtil::ByteSwap([1], (), 16); + BitUtil::ByteSwap([1], _value2, 16); #else - memcpy([1], (), 16); + memcpy([1], _value2, 16); #endif TestReadAvroDecimal(data, 17, d16v, 17); TestReadAvroDecimal(data, 20, d16v, 17); diff --git a/be/src/exec/orc-column-readers.h b/be/src/exec/orc-column-readers.h index e45b0a2..e50c216 100644 --- a/be/src/exec/orc-column-readers.h +++ b/be/src/exec/orc-column-readers.h @@ -420,7 +420,7 @@ class OrcDecimalColumnReader return Status::OK(); } int64_t val = batch_->values.data()[row_idx]; -reinterpret_cast(OrcColumnReader::GetSlot(tuple))->value() = val; + reinterpret_cast(OrcColumnReader::GetSlot(tuple))->set_value(val); return Status::OK(); } diff --git a/be/src/runtime/decimal-value.h b/be/src/runtime/decimal-value.h index e329476..c2744e0 100644 --- a/be/src/runtime/decimal-value.h +++ b/be/src/runtime/decimal-value.h @@ -40,8 +40,13 @@ namespace impala { /// Overflow is handled by an output return parameter. Functions should set this /// to true if overflow occured and leave it *unchanged* otherwise (e.g. |= rather than =). 
/// This allows the caller to not have to check overflow after every call. +/// +/// Values of this class may be unaligned so we mark it as "packed" so that the compiler +/// does not assume proper alignment. If the compiler assumes that the value is aligned it +/// may generate aligned load instructions (for example 'vmovdqa') which fail in case the +/// value is actually misaligned. template -class DecimalValue { +class __attribute__ ((packed)) DecimalValue { public: typedef T StorageType; @@ -49,8 +54,7 @@ class DecimalValue { DecimalValue(const T& s) : value_(s) { } DecimalValue& operator=(const T& s) { -// 'value_' may be unaligned. Use memcpy to avoid an unaligned store. -memcpy(_, , sizeof(T)
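The effect of the `packed` attribute can be shown in isolation. The sketch below uses a stand-in struct (`MiniDecimal`, with `int64_t` instead of the problematic `int128` for portability); the point is the same: once the struct is marked packed, the compiler assumes no alignment and emits unaligned loads/stores, so placing the value at an odd offset — as can happen inside Impala tuples — is safe.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for DecimalValue<T>: packed so the compiler never assumes the
// natural alignment of T. Without the attribute, a 16-byte T could be
// accessed with an aligned instruction (e.g. vmovdqa) and crash when the
// object is actually misaligned.
template <typename T>
struct __attribute__((packed)) MiniDecimal {
  T value_;
};

// Store and load a value at a deliberately misaligned address (offset 1 into
// a 16-byte-aligned buffer). Safe only because MiniDecimal is packed.
int64_t RoundTripMisaligned(int64_t v) {
  alignas(16) unsigned char buf[32] = {};
  auto* d = reinterpret_cast<MiniDecimal<int64_t>*>(buf + 1);
  d->value_ = v;        // compiler emits an unaligned store
  return d->value_;     // and an unaligned load
}
```

This is also why the pre-existing `operator=` used `memcpy` — a manual workaround that the packed attribute now makes unnecessary at every access site.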
[impala] branch master updated (3b820d7 -> 45c105d)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from 3b820d7 IMPALA-9921: Change error messages in checking needsQuotes to TRACE level logs new 1bafb7b IMPALA-9531: Dropped support for dateless timestamps new 45c105d IMPALA-7923: DecimalValue should be marked as packed The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/benchmarks/convert-timestamp-benchmark.cc | 2 +- be/src/benchmarks/parse-timestamp-benchmark.cc | 4 +- be/src/exec/hdfs-avro-scanner-test.cc | 17 ++- be/src/exec/orc-column-readers.h | 2 +- be/src/exec/text-converter.inline.h| 2 +- be/src/exprs/cast-functions-ir.cc | 6 - be/src/exprs/expr-test.cc | 35 +- be/src/exprs/scalar-expr-evaluator.cc | 2 +- be/src/exprs/timestamp-functions-ir.cc | 12 +- be/src/exprs/timestamp-functions.cc| 18 ++- be/src/exprs/timestamp-functions.h | 6 +- be/src/runtime/date-parse-util.cc | 2 +- be/src/runtime/date-test.cc| 6 +- .../runtime/datetime-iso-sql-format-tokenizer.cc | 3 + be/src/runtime/datetime-parser-common.cc | 3 + be/src/runtime/datetime-parser-common.h| 1 + .../runtime/datetime-simple-date-format-parser.cc | 136 - .../runtime/datetime-simple-date-format-parser.h | 10 +- be/src/runtime/decimal-value.h | 23 +++- be/src/runtime/decimal-value.inline.h | 5 +- be/src/runtime/timestamp-parse-util.cc | 6 +- be/src/runtime/timestamp-test.cc | 67 +- be/src/runtime/timestamp-value.h | 3 +- be/src/util/decimal-util.h | 2 +- be/src/util/dict-test.cc | 7 +- bin/rat_exclude_files.txt | 1 + common/function-registry/impala_functions.py | 10 +- .../apache/impala/analysis/AnalyzeKuduDDLTest.java | 3 +- testdata/data/README | 12 ++ testdata/data/dateless_timestamps.parq | Bin 0 -> 435 bytes 
testdata/data/dateless_timestamps.txt | 7 ++ testdata/data/lazy_timestamp.csv | 7 -- .../functional-query/queries/QueryTest/date.test | 7 -- .../QueryTest/dateless_timestamp_parquet.test | 25 .../queries/QueryTest/dateless_timestamp_text.test | 29 + .../functional-query/queries/QueryTest/exprs.test | 20 ++- .../queries/QueryTest/select-lazy-timestamp.test | 7 -- tests/data_errors/test_data_errors.py | 4 +- tests/query_test/test_cast_with_format.py | 23 +++- tests/query_test/test_scanners.py | 24 40 files changed, 310 insertions(+), 249 deletions(-) create mode 100644 testdata/data/dateless_timestamps.parq create mode 100644 testdata/data/dateless_timestamps.txt create mode 100644 testdata/workloads/functional-query/queries/QueryTest/dateless_timestamp_parquet.test create mode 100644 testdata/workloads/functional-query/queries/QueryTest/dateless_timestamp_text.test
[impala] branch master updated: IMPALA-9515: Full ACID Milestone 3: Read support for "original files"
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 930264a IMPALA-9515: Full ACID Milestone 3: Read support for "original files" 930264a is described below commit 930264afbdc6d309a30e2c7e1eef9fd7129ef29b Author: Zoltan Borok-Nagy AuthorDate: Tue May 19 11:47:08 2020 +0200 IMPALA-9515: Full ACID Milestone 3: Read support for "original files" "Original files" are files that don't have the full ACID schema. We can see such files if we upgrade a non-ACID table to full ACID. Also, the LOAD DATA statement can load non-ACID files into full ACID tables. Such files don't store the special ACID columns, which means we need to auto-generate their values. These are (operation, originalTransaction, bucket, rowid, and currentTransaction). With the exception of 'rowid', all of them can be calculated based on the file path, so I add their values to the scanner's template tuple. 'rowid' is the ordinal number of the row inside a bucket inside a directory. For now Impala only allows one file per bucket per directory. Therefore we can generate row ids for each file independently. Multiple files in a single bucket in a directory can only be present if the table was non-transactional earlier and we upgraded it to a full ACID table. After the first compaction we should only see one original file per bucket per directory. In HdfsOrcScanner we calculate the first row id for our split, then the OrcStructReader fills the rowid slot with the proper values.
Testing: * added e2e tests to check if the generated values are correct * added e2e test to reject tables that have multiple files per bucket * added unit tests to the new auxiliary functions Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953 Reviewed-on: http://gerrit.cloudera.org:8080/16001 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/exec/acid-metadata-utils-test.cc| 29 +++ be/src/exec/acid-metadata-utils.cc | 95 ++-- be/src/exec/acid-metadata-utils.h | 7 +- be/src/exec/hdfs-orc-scanner.cc| 75 ++- be/src/exec/hdfs-orc-scanner.h | 12 + be/src/exec/orc-column-readers.cc | 37 ++- be/src/exec/orc-column-readers.h | 9 + be/src/exec/orc-metadata-utils.cc | 161 ++ be/src/exec/orc-metadata-utils.h | 41 testdata/data/README | 5 + testdata/data/alltypes_non_acid.orc| Bin 0 -> 34176 bytes .../functional/functional_schema_template.sql | 25 +++ .../datasets/functional/schema_constraints.csv | 1 + .../queries/QueryTest/acid-negative.test | 20 ++ .../queries/QueryTest/full-acid-original-file.test | 247 + tests/query_test/test_acid.py | 23 ++ 16 files changed, 711 insertions(+), 76 deletions(-) diff --git a/be/src/exec/acid-metadata-utils-test.cc b/be/src/exec/acid-metadata-utils-test.cc index 7db3e57..e2c4266 100644 --- a/be/src/exec/acid-metadata-utils-test.cc +++ b/be/src/exec/acid-metadata-utils-test.cc @@ -207,3 +207,32 @@ TEST(ValidWriteIdListTest, IsCompacted) { EXPECT_FALSE(ValidWriteIdList::IsCompacted("/foo/000")); EXPECT_FALSE(ValidWriteIdList::IsCompacted("/foo/p=1/000")); } + +TEST(ValidWriteIdListTest, GetWriteIdRange) { + EXPECT_EQ((make_pair(0, 0)), + ValidWriteIdList::GetWriteIdRange("/foo/0_0")); + EXPECT_EQ((make_pair(5, 5)), + ValidWriteIdList::GetWriteIdRange("/foo/base_5/000")); + EXPECT_EQ((make_pair(5, 5)), + ValidWriteIdList::GetWriteIdRange("/foo/base_5_v123/000")); + EXPECT_EQ((make_pair(5 ,10)), + ValidWriteIdList::GetWriteIdRange("/foo/delta_5_00010/000")); + EXPECT_EQ((make_pair(5 ,10)), + 
ValidWriteIdList::GetWriteIdRange("/foo/delta_5_00010_0006/000")); + EXPECT_EQ((make_pair(5 ,10)), + ValidWriteIdList::GetWriteIdRange("/foo/delta_5_00010_v123/000")); +} + +TEST(ValidWriteIdListTest, GetBucketProperty) { + EXPECT_EQ(536870912, ValidWriteIdList::GetBucketProperty("/foo/000_0")); + EXPECT_EQ(536936448, ValidWriteIdList::GetBucketProperty("/foo/001_1")); + EXPECT_EQ(537001984, ValidWriteIdList::GetBucketProperty("/foo/bucket_2")); + EXPECT_EQ(537067520, ValidWriteIdList::GetBucketProperty( + "/foo/base_0001_v1/bucket_03_0")); + EXPECT_EQ(537133056, ValidWriteIdList::GetBucketProperty( + "/foo/delta_1_5/bucket_000
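The `GetBucketProperty` expectations above follow a regular pattern that can be reverse-engineered: 536870912 is `1 << 29`, and each successive bucket id adds `1 << 16`. The sketch below reproduces that encoding; it mirrors Hive's ACID bucket codec (version marker in bit 29, bucket id in bits 16..28), but treat the exact bit layout as an inference from the test values rather than a statement of Impala's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Inferred encoding of the auto-generated 'bucket' ACID column: a codec
// version marker (bit 29) OR'd with the bucket id extracted from the file
// name, shifted into bits 16..28. Low bits (statement id) are left zero here.
int32_t EncodeBucketProperty(int32_t bucket_id) {
  constexpr int32_t kVersionMarker = 1 << 29;  // 536870912
  return kVersionMarker | (bucket_id << 16);
}
```

Since the bucket id is parsed from the path (e.g. `bucket_00003`), this is one of the columns the commit fills in from the scanner's template tuple without reading any file data.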
[impala] branch master updated: IMPALA-9878: Fix use-after-free in TmpFileMgrTest's TestAllocation
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 7b1cfac IMPALA-9878: Fix use-after-free in TmpFileMgrTest's TestAllocation 7b1cfac is described below commit 7b1cfacbc6c4c709947cb91517baa9ec364afee1 Author: Joe McDonnell AuthorDate: Mon Jun 22 09:36:33 2020 -0700 IMPALA-9878: Fix use-after-free in TmpFileMgrTest's TestAllocation ASAN found a use-after-free in this code: file_group.Close(); <--- free underlying storage for 'file' EXPECT_FALSE(boost::filesystem::exists(file->path())); <-- use 'file' This switches it to a copy of file->path(). Testing: - Ran tmp-file-mgr-test under ASAN Change-Id: Idd5cbae70c287c78db8d1c560d8c777d6bed5b56 Reviewed-on: http://gerrit.cloudera.org:8080/16099 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- be/src/runtime/tmp-file-mgr-test.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/be/src/runtime/tmp-file-mgr-test.cc b/be/src/runtime/tmp-file-mgr-test.cc index 57a409b..fcf581c 100644 --- a/be/src/runtime/tmp-file-mgr-test.cc +++ b/be/src/runtime/tmp-file-mgr-test.cc @@ -278,7 +278,7 @@ TEST_F(TmpFileMgrTest, TestFileAllocation) { // tmp file is only allocated on writes. EXPECT_OK(FileSystemUtil::CreateFile(file->path())); file_group.Close(); - EXPECT_FALSE(boost::filesystem::exists(file->path())); + EXPECT_FALSE(boost::filesystem::exists(file_path)); CheckMetrics(_file_mgr); }
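The bug and fix above distill to a general pattern: once the owning object is closed, any reference into its storage is dangling, so take a copy of the value you still need before closing. A minimal stand-alone version (toy `File`/`FileGroup` types, not Impala's TmpFileMgr classes):

```cpp
#include <cassert>
#include <memory>
#include <string>

struct File { std::string path; };

struct FileGroup {
  std::unique_ptr<File> file = std::make_unique<File>();
  void Close() { file.reset(); }  // frees the File and its path storage
};

// Equivalent of the fixed test: copy the path BEFORE Close(), then use the
// copy. Reading fg.file->path after Close() would be the use-after-free that
// ASAN reported.
std::string SafePathAfterClose(FileGroup& fg) {
  const std::string path_copy = fg.file->path;
  fg.Close();
  return path_copy;
}
```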
[impala] branch master updated: IMPALA-3695: Remove KUDU_IS_SUPPORTED
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 6ec6aaa IMPALA-3695: Remove KUDU_IS_SUPPORTED 6ec6aaa is described below commit 6ec6aaae8edc552feb3416bebba0ed355c36e46e Author: Tim Armstrong AuthorDate: Mon Jun 15 21:33:34 2020 -0700 IMPALA-3695: Remove KUDU_IS_SUPPORTED Testing: Ran exhaustive tests. Change-Id: I059d7a42798c38b570f25283663c284f2fcee517 Reviewed-on: http://gerrit.cloudera.org:8080/16085 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- CMakeLists.txt | 2 - bin/bootstrap_toolchain.py | 137 + bin/impala-config.sh | 13 -- common/thrift/generate_error_codes.py | 2 +- docker/entrypoint.sh | 22 ++-- .../java/org/apache/impala/common/RuntimeEnv.java | 3 - .../org/apache/impala/analysis/AnalyzeDDLTest.java | 6 +- .../apache/impala/analysis/AnalyzeKuduDDLTest.java | 3 - .../impala/analysis/AnalyzeModifyStmtsTest.java| 7 -- .../apache/impala/analysis/AnalyzeStmtsTest.java | 14 +-- .../impala/analysis/AnalyzeUpsertStmtTest.java | 1 - .../apache/impala/analysis/AuditingKuduTest.java | 1 - .../apache/impala/analysis/ExprRewriterTest.java | 62 +- .../org/apache/impala/analysis/ParserTest.java | 1 - .../java/org/apache/impala/analysis/ToSqlTest.java | 2 - .../org/apache/impala/planner/PlannerTest.java | 6 - .../org/apache/impala/planner/PlannerTestBase.java | 4 +- .../java/org/apache/impala/testutil/TestUtils.java | 4 - infra/python/bootstrap_virtualenv.py | 6 +- testdata/bin/compute-table-stats.sh| 5 +- testdata/bin/create-load-data.sh | 2 +- testdata/cluster/admin | 4 +- tests/common/kudu_test_suite.py| 3 - tests/common/skip.py | 4 - tests/common/test_dimensions.py| 8 +- tests/comparison/leopard/impala_docker_env.py | 43 --- tests/metadata/test_ddl.py | 1 - tests/metadata/test_show_create_table.py | 2 - 
tests/query_test/test_resource_limits.py | 3 +- tests/shell/test_shell_commandline.py | 1 - 30 files changed, 80 insertions(+), 292 deletions(-) diff --git a/CMakeLists.txt b/CMakeLists.txt index bc8c983..0f273bd 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -402,8 +402,6 @@ else() set(kuduClient_DIR "$ENV{IMPALA_KUDU_HOME}/release/share/kuduClient/cmake") endif() endif() -# When KUDU_IS_SUPPORTED is false, the Kudu client is expected to be a non-functional -# stub. It's still needed to link though. find_package(kuduClient REQUIRED NO_DEFAULT_PATH) include_directories(SYSTEM ${KUDU_CLIENT_INCLUDE_DIR}) diff --git a/bin/bootstrap_toolchain.py b/bin/bootstrap_toolchain.py index bb71577..53d0f8b 100755 --- a/bin/bootstrap_toolchain.py +++ b/bin/bootstrap_toolchain.py @@ -41,9 +41,6 @@ # DOWNLOAD_CDH_COMPONENTS - When set to true, this script will also download and extract # the CDP Hadoop components (i.e. Hadoop, Hive, HBase, Ranger, etc) into # CDP_COMPONENTS_HOME as appropriate. -# KUDU_IS_SUPPORTED - If KUDU_IS_SUPPORTED is false, Kudu is disabled and we download -# the toolchain Kudu and use the symbols to compile a non-functional stub library so -# that Impala has something to link against. # IMPALA__VERSION - The version expected for . This is typically # configured in bin/impala-config.sh and must exist for every package. This is used # to construct an appropriate URL and expected archive name. @@ -405,115 +402,6 @@ def check_custom_toolchain(toolchain_packages_home, packages): raise Exception("Toolchain bootstrap failed: required packages were missing") -def build_kudu_stub(kudu_dir, gcc_dir): - """When Kudu isn't supported, the CentOS 7 Kudu package is downloaded from the - toolchain. This replaces the client lib with a stubbed client. The - 'kudu_dir' specifies the location of the unpacked CentOS 7 Kudu package. 
- The 'gcc_dir' specifies the location of the unpacked GCC/G++.""" - - print "Building kudu stub" - # Find the client lib files in the Kudu dir. There may be several files with - # various extensions. Also there will be a debug version. - client_lib_paths = [] - for path, _, files in os.walk(kudu_dir): -for file in files: - if not file.startswith("libkudu_client.so"): -continue - file_path = os.path.join(path, file) - if os.path.islink(file_path): -continue - cli
[impala] branch master updated: IMPALA-9862: Don't exclude Solr dependencies in frontend build
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new fd810e6 IMPALA-9862: Don't exclude Solr dependencies in frontend build fd810e6 is described below commit fd810e6f0926e0651c0ae30dd7ae654962701b95 Author: Joe McDonnell AuthorDate: Tue Jun 16 19:39:19 2020 -0700 IMPALA-9862: Don't exclude Solr dependencies in frontend build Ranger can be configured in a variety of ways and some have a runtime dependency on Solr. If Solr is excluded, then Impala can fail to startup due to ClassNotFoundException for org.apache.solr.SolrException. This removes the exclusion for Solr from fe/pom.xml. Testing: - Tests on a cluster that previously failed with ClassNotFoundException now pass. Change-Id: Ifb74c20a56e5795cba2efbe887d32392af4017f3 Reviewed-on: http://gerrit.cloudera.org:8080/16089 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins --- fe/pom.xml | 6 -- 1 file changed, 6 deletions(-) diff --git a/fe/pom.xml b/fe/pom.xml index ec40300..787cafa 100644 --- a/fe/pom.xml +++ b/fe/pom.xml @@ -212,10 +212,6 @@ under the License. org.apache.kafka kafka_2.11 - - org.apache.solr - * - @@ -692,8 +688,6 @@ under the License. io.netty:* org.rocksdb:* - -org.apache.solr:* com.sun.jersey:jersey-server com.sun.jersey:jersey-server
[impala] branch master updated (aa6d788 -> d38e4d1)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from aa6d788 IMPALA-9842: Fix hang when cancelling BufferedPlanRootSink new bf94582 IMPALA-9831: Fix off by one error in condition for ValidateColumnOffsets() new a89489c IMPALA-9604: Add TPCH-nested tests for column masking new d38e4d1 IMPALA-9435: Usability enhancements for data cache access trace The 3 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/exec/parquet/parquet-metadata-utils.cc | 2 +- be/src/runtime/io/CMakeLists.txt | 7 + be/src/runtime/io/data-cache-test.cc | 156 +++-- be/src/runtime/io/data-cache-trace-replayer.cc | 203 +++ be/src/runtime/io/data-cache-trace-test.cc | 338 ++ be/src/runtime/io/data-cache-trace.cc | 374 be/src/runtime/io/data-cache-trace.h | 247 ++ be/src/runtime/io/data-cache.cc| 378 + be/src/runtime/io/data-cache.h | 50 ++- bin/start-impala-cluster.py| 26 ++ .../queries/masked-tpch_nested-q10.test| 58 ...nested-q15.test => masked-tpch_nested-q15.test} | 2 +- .../queries/masked-tpch_nested-q18.test| 81 + .../tpch_nested/queries/masked-tpch_nested-q2.test | 147 .../queries/masked-tpch_nested-q20.test| 42 +++ .../queries/masked-tpch_nested-q21.test| 47 +++ .../tpch_nested/queries/masked-tpch_nested-q9.test | 37 ++ tests/authorization/test_ranger.py | 55 +++ 18 files changed, 1976 insertions(+), 274 deletions(-) create mode 100644 be/src/runtime/io/data-cache-trace-replayer.cc create mode 100644 be/src/runtime/io/data-cache-trace-test.cc create mode 100644 be/src/runtime/io/data-cache-trace.cc create mode 100644 be/src/runtime/io/data-cache-trace.h create mode 100644 testdata/workloads/tpch_nested/queries/masked-tpch_nested-q10.test copy 
testdata/workloads/tpch_nested/queries/{tpch_nested-q15.test => masked-tpch_nested-q15.test} (91%) create mode 100644 testdata/workloads/tpch_nested/queries/masked-tpch_nested-q18.test create mode 100644 testdata/workloads/tpch_nested/queries/masked-tpch_nested-q2.test create mode 100644 testdata/workloads/tpch_nested/queries/masked-tpch_nested-q20.test create mode 100644 testdata/workloads/tpch_nested/queries/masked-tpch_nested-q21.test create mode 100644 testdata/workloads/tpch_nested/queries/masked-tpch_nested-q9.test
[impala] 02/03: IMPALA-9604: Add TPCH-nested tests for column masking
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit a89489cbc8c8a4b4a9222e0de318ae0d0d8ad26e Author: stiga-huang AuthorDate: Mon Apr 6 11:26:27 2020 +0800 IMPALA-9604: Add TPCH-nested tests for column masking Add tests for TPCH-nested queries with column masking policies on the PII columns (phone, name, address). Some queries have the same results as without the column masking policies so we reuse their test files. Change-Id: I4a6c9fc480923369952e8e215f4a90b2f6448028 Reviewed-on: http://gerrit.cloudera.org:8080/15655 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- .../queries/masked-tpch_nested-q10.test| 58 .../queries/masked-tpch_nested-q15.test| 38 ++ .../queries/masked-tpch_nested-q18.test| 81 .../tpch_nested/queries/masked-tpch_nested-q2.test | 147 + .../queries/masked-tpch_nested-q20.test| 42 ++ .../queries/masked-tpch_nested-q21.test| 47 +++ .../tpch_nested/queries/masked-tpch_nested-q9.test | 37 ++ tests/authorization/test_ranger.py | 55 8 files changed, 505 insertions(+) diff --git a/testdata/workloads/tpch_nested/queries/masked-tpch_nested-q10.test b/testdata/workloads/tpch_nested/queries/masked-tpch_nested-q10.test new file mode 100644 index 000..2da39ee --- /dev/null +++ b/testdata/workloads/tpch_nested/queries/masked-tpch_nested-q10.test @@ -0,0 +1,58 @@ + + QUERY: TPCH-Q10 +# Q10 - Returned Item Reporting Query +# Converted select from multiple tables to joins +select + c_custkey, + c_name, + sum(l_extendedprice * (1 - l_discount)) as revenue, + c_acctbal, + n_name, + c_address, + c_phone, + c_comment +from + customer c, + c.c_orders o, + o.o_lineitems l, + region.r_nations n +where + o_orderdate >= '1993-10-01' + and o_orderdate < '1994-01-01' + and l_returnflag = 'R' + and c_nationkey = n_nationkey +group by + c_custkey, + c_name, + c_acctbal, + c_phone, + n_name, + c_address, + c_comment +order by 
+ revenue desc +limit 20 + RESULTS +57040,'Xxxx#n',734235.2455,632.87,'JAPAN','Xxxnxx','22-8xx-xxx-','sits. slyly regular requests sleep alongside of the regular inst' +143347,'Xxxx#n',721002.6948,2557.47,'EGYPT','nxXxXXx,Xxn','14-7xx-xxx-','ggle carefully enticing requests. final deposits use bold, bold pinto beans. ironic, idle re' +60838,'Xxxx#n',679127.3077,2454.77,'BRAZIL','nnXxXnxXxXXxXxxxXxnXXxXX','12-9xx-xxx-',' need to boost against the slyly regular account' +101998,'Xxxx#n',637029.5667,3790.89,'UNITED KINGDOM','nnxnXXXxXxxXXXxXx','33-5xx-xxx-','ress foxes wake slyly after the bold excuses. ironic platelets are furiously carefully bold theodolites' +125341,'Xxxx#n',633508.0860,4983.51,'GERMANY','XnnXXXnxxxXnXXxxXXxxxXxX','17-5xx-xxx-','arefully even depths. blithely even excuses sleep furiously. foxes use except the dependencies. ca' +25501,'Xxxx#n',620269.7849,7725.04,'ETHIOPIA',' XnnnXXxxXX,XxnXnxxXnXX','15-8xx-xxx-','he pending instructions wake carefully at the pinto beans. regular, final instructions along the slyly fina' +115831,'Xxxx#n',596423.8672,5098.10,'FRANCE','xXxXxXXxx xx xxnxXnxXnxXnnxXnxxxXxXx','16-7xx-xxx-','l somas sleep. furiously final deposits wake blithely regular pinto b' +84223,'Xxxx#n',594998.0239,528.65,'UNITED KINGDOM','xxnXxXxx xxXnnX nxXxxxnXXX','33-4xx-xxx-',' slyly final deposits haggle regular, pending dependencies. pending escapades wake ' +54289,'Xxxx#n',585603.3918,5583.02,'IRAN','xXXxxXxXnXxxnXXX ,X','20-8xx-xxx-','ely special foxes are quickly finally ironic p' +39922,'Xxxx#n',584878.1134,7321.11,'GERMANY','XxxnxnnxnXXXnxXnxnnnxXxnX','17-1xx-xxx-','y final requests. furiously final foxes cajole blithely special platelets. f' +6226,'Xxxx#n',576783.7606,2230.09,'UNITED KINGDOM','nxXxn,XXXxxxXXnxxx,xxXnx,','33-6xx-xxx-','ending platelets along the express deposits cajole carefully final ' +922,'Xxxx#n',576767.5333,3869.25,'GERMANY','XxnXXxxxnXxXxxnxXXnXxXxXxxnxXxx','17-9xx-xxx-','luffily fluffy deposits. 
packages c' +147946,'Xxxx#n',576455.1320,2030.13,'ALGERIA','xXXxXXxnXxxxnxXxXxxX','10-8xx-xxx-','ithely ironic deposits haggle blithely ironic requests. quickly regu' +115640,'Xxxx#n',569341.1933,6436.10,'ARGENTINA','XxnxX nXxXxxxXnX','11-4xx-xxx-','ost slyly along the patterns; pinto be' +73606,'Xxxx#n',568656.8578,1785.67,'JAPAN','xxXnXxxnxXxXxXXnxx','22-4xx-xxx-','he furiously regular ideas. slowly' +110246,'Xxxx#nnn
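The masked expected results above (names like 'Xxxx#n', comments left readable where they carry no PII structure) are consistent with Ranger's default MASK transform: uppercase letters become 'X', lowercase become 'x', digits become 'n', and everything else is preserved; phone numbers additionally appear to keep their first few characters (a show-first variant). The sketch below reproduces this reading of the test data — it is an inference, not Impala or Ranger code.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Apply a Ranger-style character-class mask: 'X' for uppercase, 'x' for
// lowercase, 'n' for digits; punctuation and spaces pass through unchanged.
std::string RangerStyleMask(const std::string& s) {
  std::string out = s;
  for (char& ch : out) {
    unsigned char c = static_cast<unsigned char>(ch);
    if (std::isupper(c)) ch = 'X';
    else if (std::islower(c)) ch = 'x';
    else if (std::isdigit(c)) ch = 'n';
  }
  return out;
}
```

This is why queries whose results are insensitive to these substitutions can reuse the unmasked `.test` files, while queries that project or order by masked columns need the new `masked-*` variants.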
[impala] 01/03: IMPALA-9831: Fix off by one error in condition for ValidateColumnOffsets()
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit bf945824e62a48822414c9375d6641dc499846a9 Author: Joe McDonnell AuthorDate: Tue Jun 16 08:18:36 2020 -0700 IMPALA-9831: Fix off by one error in condition for ValidateColumnOffsets() ParquetMetadataUtils::ValidateColumnOffsets() returns an error if the end of the column is beyond the end of the file (i.e. offset > end_of_file). Instead, because there is a footer, the end of column must not be the end of the file either, so it should use offset >= end_of_file. Otherwise, a subsequent DCHECK in ParquetPageReader using the stricter condition will fire. Testing: - Core job Change-Id: I16bd6dfbb8eeacc1cb854ed4a3c2ed9f1c3aa11f Reviewed-on: http://gerrit.cloudera.org:8080/16086 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/exec/parquet/parquet-metadata-utils.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/be/src/exec/parquet/parquet-metadata-utils.cc b/be/src/exec/parquet/parquet-metadata-utils.cc index aece0e1..7ec32b6 100644 --- a/be/src/exec/parquet/parquet-metadata-utils.cc +++ b/be/src/exec/parquet/parquet-metadata-utils.cc @@ -245,7 +245,7 @@ Status ParquetMetadataUtils::ValidateColumnOffsets(const string& filename, } int64_t col_len = col_chunk.meta_data.total_compressed_size; int64_t col_end = col_start + col_len; -if (col_end <= 0 || col_end > file_length) { +if (col_end <= 0 || col_end >= file_length) { return Status(Substitute("Parquet file '$0': metadata is corrupt. Column $1 has " "invalid column offsets (offset=$2, size=$3, file_size=$4).", filename, i, col_start, col_len, file_length));
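The corrected bound can be stated as a small predicate: because a Parquet file always ends with a footer, a column chunk must end strictly before end-of-file, so the validity check is `col_end < file_length`, not `col_end <= file_length`. A minimal extraction of the fixed condition (helper name is illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the fixed check in ValidateColumnOffsets(): the chunk's end offset
// must be positive and strictly less than the file length. A chunk ending
// exactly at end-of-file is corrupt, since the footer must follow it; the old
// '>' comparison wrongly accepted that case and tripped a DCHECK later.
bool ColumnOffsetsValid(int64_t col_start, int64_t col_len, int64_t file_length) {
  const int64_t col_end = col_start + col_len;
  return col_end > 0 && col_end < file_length;
}
```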
[impala] branch master updated: IMPALA-9842: Fix hang when cancelling BufferedPlanRootSink
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new aa6d788 IMPALA-9842: Fix hang when cancelling BufferedPlanRootSink aa6d788 is described below commit aa6d7887eec1efd33c73f77e5346a499569e5b6b Author: Joe McDonnell AuthorDate: Tue Jun 16 11:57:05 2020 -0700 IMPALA-9842: Fix hang when cancelling BufferedPlanRootSink In BufferedPlanRootSink::FlushFinal(), if Cancel() runs before FlushFinal() waits on the consumer_eos_ condition variable, the thread in FlushFinal() will wait forever. This is because it is not checking for cancellation or synchronizing with the Cancel() thread. Specifically: Thread A: Calls BufferedPlanRootSink::Cancel(), signalling any thread currently waiting on the consumer_eos_ condition variable. Thread B: Enters FlushFinal(). Never tests RuntimeState::is_cancelled() and calls Wait() on the consumer_eos_ condition variable. This waits forever. This changes BufferedPlanRootSink::Cancel() to get the lock_ before signalling the consumer_eos_ condition variable. It also changes FlushFinal() to call Wait() in a loop. It breaks out of the loop if it is cancelled or the batch_queue_ is empty. There are two cases: 1. FlushFinal() gets the lock_ first and only releases it when waiting on the consumer_eos_ condition variable. It will get signalled by Cancel(). 2. Cancel() gets the lock_ first and FlushFinal() will not wait, because is_cancelled() is true. 
Testing: - Run core tests Change-Id: Id6f3fbc05420ca95313fa79ea106547feb92b16b Reviewed-on: http://gerrit.cloudera.org:8080/16088 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- be/src/exec/buffered-plan-root-sink.cc | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/be/src/exec/buffered-plan-root-sink.cc b/be/src/exec/buffered-plan-root-sink.cc index 277eac6..6dd69dd 100644 --- a/be/src/exec/buffered-plan-root-sink.cc +++ b/be/src/exec/buffered-plan-root-sink.cc @@ -105,8 +105,9 @@ Status BufferedPlanRootSink::FlushFinal(RuntimeState* state) { // If no batches are ever added, wake up the consumer thread so it can check the // SenderState and return appropriately. rows_available_.NotifyAll(); - // Wait until the consumer has read all rows from the batch_queue_. - { + // Wait until the consumer has read all rows from the batch_queue_ or this has + // been cancelled. + while (!IsCancelledOrClosed(state) && !IsQueueEmpty(state)) { SCOPED_TIMER(profile()->inactive_timer()); consumer_eos_.Wait(l); } @@ -136,6 +137,14 @@ void BufferedPlanRootSink::Close(RuntimeState* state) { void BufferedPlanRootSink::Cancel(RuntimeState* state) { DCHECK(state->is_cancelled()); + // Get the lock_ to synchronize with FlushFinal(). Either FlushFinal() will be waiting + // on the consumer_eos_ condition variable and get signalled below, or it will see + // that is_cancelled() is true after it gets the lock. Drop the the lock before + // signalling the CV so that a blocked thread can immediately acquire the mutex when + // it wakes up. + { +unique_lock l(lock_); + } // Wake up all sleeping threads so they can check the cancellation state. // While it should be safe to call NotifyOne() here, prefer to use NotifyAll() to // ensure that all sleeping threads are awoken. The calls to NotifyAll() are not on the
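The fix above is an instance of the standard condition-variable discipline: never do a bare wait; re-check the wake-up predicate (queue drained OR cancelled) in a loop under the lock, and have the cancelling thread touch shared state under that same lock before notifying. A toy model of the FlushFinal()/Cancel() interaction (simplified types, not Impala's classes):

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

struct ToySink {
  std::mutex lock;
  std::condition_variable consumer_eos;
  int queued_rows = 3;
  bool cancelled = false;

  // Case 2 from the commit message: if Cancel() already ran, the predicate is
  // true and we never wait; otherwise we wait and Cancel()'s notify wakes us.
  void FlushFinal() {
    std::unique_lock<std::mutex> l(lock);
    while (!cancelled && queued_rows > 0) consumer_eos.wait(l);
  }

  void Cancel() {
    {  // acquire the lock so FlushFinal() cannot slip between its predicate
       // check and its wait; drop it before notifying so the woken thread can
       // grab the mutex immediately.
      std::lock_guard<std::mutex> l(lock);
      cancelled = true;
    }
    consumer_eos.notify_all();
  }
};
```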
[impala] 02/02: IMPALA-9858: Fix wrong partition metrics in LocalCatalog profile
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit ee70df2e9006d3592175af5f1fa7ec128f5f1b8d Author: stiga-huang AuthorDate: Mon Jun 15 15:07:19 2020 +0800 IMPALA-9858: Fix wrong partition metrics in LocalCatalog profile The hits and requests metrics of partitions were overcounted because they were computed from a map that had already been updated with the fetched misses. This patch fixes it and adds test coverage on partition metrics. Tests - Run CatalogdMetaProviderTest Change-Id: I10cabce2908f1d252b90390978e679d31003e89d Reviewed-on: http://gerrit.cloudera.org:8080/16080 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- .../impala/catalog/local/CatalogdMetaProvider.java | 4 +- .../catalog/local/CatalogdMetaProviderTest.java| 61 +++--- 2 files changed, 43 insertions(+), 22 deletions(-) diff --git a/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java b/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java index 03aba1e..02195ad 100644 --- a/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java +++ b/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java @@ -895,8 +895,8 @@ public class CatalogdMetaProvider implements MetaProvider { storePartitionsInCache(refImpl, hostIndex, fromCatalogd); } sw.stop(); -addStatsToProfile(PARTITIONS_STATS_CATEGORY, refToMeta.size(), numMisses, sw); -LOG.trace("Request for partitions of {}: hit {}/{}", table, refToMeta.size(), +addStatsToProfile(PARTITIONS_STATS_CATEGORY, numHits, numMisses, sw); +LOG.trace("Request for partitions of {}: hit {}/{}", table, numHits, partitionRefs.size()); // Convert the returned map to be by-name instead of by-ref. 
diff --git a/fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java b/fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java index c4dcf2d..b378ce0 100644 --- a/fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java +++ b/fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java @@ -20,7 +20,6 @@ package org.apache.impala.catalog.local; import static org.junit.Assert.*; import java.util.ArrayList; -import java.util.Collections; import java.util.List; import java.util.Map; import java.util.concurrent.ExecutorService; @@ -44,6 +43,7 @@ import org.apache.impala.thrift.TBackendGflags; import org.apache.impala.thrift.TBriefTableMeta; import org.apache.impala.thrift.TCatalogObject; import org.apache.impala.thrift.TCatalogObjectType; +import org.apache.impala.thrift.TCounter; import org.apache.impala.thrift.TDatabase; import org.apache.impala.thrift.TNetworkAddress; import org.apache.impala.thrift.TRuntimeProfileNode; @@ -58,11 +58,13 @@ import com.google.common.base.Stopwatch; import com.google.common.cache.CacheStats; import com.google.common.collect.ImmutableCollection; import com.google.common.collect.ImmutableList; +import com.google.common.collect.Maps; public class CatalogdMetaProviderTest { private final static Logger LOG = LoggerFactory.getLogger( CatalogdMetaProviderTest.class); + private final static ListMap HOST_INDEX = new ListMap<>(); private final CatalogdMetaProvider provider_; private final TableMetaRef tableRef_; @@ -113,33 +115,36 @@ public class CatalogdMetaProviderTest { public void testCachePartitionsByRef() throws Exception { List allRefs = provider_.loadPartitionList(tableRef_); List partialRefs = allRefs.subList(3, 8); -ListMap hostIndex = new ListMap<>(); CacheStats stats = diffStats(); // Should get no hits on the initial load of partitions. 
-Map partMap = provider_.loadPartitionsByRefs( -tableRef_, /* partitionColumnNames unused by this impl */null, hostIndex, -partialRefs); +Map partMap = loadPartitions(tableRef_, partialRefs); assertEquals(partialRefs.size(), partMap.size()); stats = diffStats(); assertEquals(0, stats.hitCount()); // Load the same partitions again and we should get a hit for each partition. -Map partMapHit = provider_.loadPartitionsByRefs( -tableRef_, /* partitionColumnNames unused by this impl */null, hostIndex, -partialRefs); +Map partMapHit = loadPartitions(tableRef_, partialRefs); stats = diffStats(); assertEquals(stats.hitCount(), partMapHit.size()); // Load all of the partitions: we should get some hits and some misses. -Map allParts = provider_.loadPartitionsByRefs( -tableRef_, /* partitionColumnNames unused by this impl */null, hostIndex, -allRefs); +Map allParts = loadPartitions(tableRef_, allRefs); assertEquals(allRefs.size(), allParts.size()); stats
[impala] branch master updated (13fbe51 -> ee70df2)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from 13fbe51 IMPALA-9838: Switch to GCC 7.5.0 new 419aa2e IMPALA-9778: Refactor partition modifications in DDL/DMLs new ee70df2 IMPALA-9858: Fix wrong partition metrics in LocalCatalog profile The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../impala/catalog/CatalogServiceCatalog.java | 6 +- .../org/apache/impala/catalog/FeCatalogUtils.java | 28 +- .../org/apache/impala/catalog/HdfsPartition.java | 615 + .../java/org/apache/impala/catalog/HdfsTable.java | 384 - .../impala/catalog/ParallelFileMetadataLoader.java | 23 +- .../apache/impala/catalog/PartitionStatsUtil.java | 2 +- .../main/java/org/apache/impala/catalog/Table.java | 15 + .../impala/catalog/local/CatalogdMetaProvider.java | 4 +- .../apache/impala/service/CatalogOpExecutor.java | 202 +++ .../org/apache/impala/util/HdfsCachingUtil.java| 21 +- .../catalog/CatalogObjectToFromThriftTest.java | 10 +- .../org/apache/impala/catalog/CatalogTest.java | 7 +- .../catalog/local/CatalogdMetaProviderTest.java| 61 +- 13 files changed, 853 insertions(+), 525 deletions(-)
[impala] 01/02: IMPALA-9778: Refactor partition modifications in DDL/DMLs
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 419aa2e30db326f02e9b4ec563ef7864e82df86e Author: stiga-huang AuthorDate: Mon May 25 18:01:38 2020 +0800 IMPALA-9778: Refactor partition modifications in DDL/DMLs After this patch, in DDL/DMLs that update metadata of partitions, instead of updating partitions in place, we always create new ones and use them to replace the existing instances. This is enforced by making HdfsPartition immutable. There are several benefits to this: - HdfsPartition can be shared across table versions. In full catalog update mode, catalog update can ignore unchanged partitions (IMPALA-3234) and send the update at partition granularity. - Aborted DDL/DMLs won't leave partition metadata in a bad shape (e.g. IMPALA-8406), which usually requires invalidation to recover. - Fetch-on-demand coordinators can cache partition meta using the partition id as the key. When the table version updates, only metadata of changed partitions needs to be reloaded (IMPALA-7533). - In the work of decoupling partitions from tables (IMPALA-3127), we don't need to assign a catalog version to partitions since the partition ids already identify the partitions. However, HdfsPartition is not strictly immutable. Although all its fields are final, some fields are still referencing mutable objects. We need more refactoring to achieve this. This patch focuses on refactoring the DDL/DML code paths. Changes: - Make all fields of HdfsPartition final. Move HdfsPartition constructor logic and all its update methods into HdfsPartition.Builder. - Refactor in-place updates on HdfsPartition to create a new one and drop the old one. HdfsPartition.Builder represents the in-progress modifications. Once all modifications are done, call its build() method to create the new HdfsPartition instance. 
The old HdfsPartition instance is only replaced at the end of the modifications. - Move the "dirty" marker of HdfsPartition into a map of HdfsTable. It maps from the old partition id to the in-progress partition builder. For "dirty" partitions, we’ll reload its HMS meta and file meta. Tests: - No new tests are added since the existing tests already provide sufficient coverage - Run CORE tests Change-Id: Ib52e5810d01d5e0c910daacb9c98977426d3914c Reviewed-on: http://gerrit.cloudera.org:8080/15985 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- .../impala/catalog/CatalogServiceCatalog.java | 6 +- .../org/apache/impala/catalog/FeCatalogUtils.java | 28 +- .../org/apache/impala/catalog/HdfsPartition.java | 615 + .../java/org/apache/impala/catalog/HdfsTable.java | 384 - .../impala/catalog/ParallelFileMetadataLoader.java | 23 +- .../apache/impala/catalog/PartitionStatsUtil.java | 2 +- .../main/java/org/apache/impala/catalog/Table.java | 15 + .../apache/impala/service/CatalogOpExecutor.java | 202 +++ .../org/apache/impala/util/HdfsCachingUtil.java| 21 +- .../catalog/CatalogObjectToFromThriftTest.java | 10 +- .../org/apache/impala/catalog/CatalogTest.java | 7 +- 11 files changed, 810 insertions(+), 503 deletions(-) diff --git a/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java b/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java index 994abff..fafad2c 100644 --- a/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java +++ b/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java @@ -35,6 +35,7 @@ import java.util.concurrent.Semaphore; import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicLong; import java.util.concurrent.locks.ReentrantReadWriteLock; +import java.util.stream.Collectors; import org.apache.commons.collections.MapUtils; import org.apache.hadoop.fs.RemoteIterator; @@ -3125,8 +3126,11 @@ public class CatalogServiceCatalog extends Catalog { "Unable 
to fetch valid transaction ids while loading file metadata for table " + table.getFullName(), ex); } + List partBuilders = partToPartialInfoMap.keySet().stream() + .map(HdfsPartition.Builder::new) + .collect(Collectors.toList()); Map> fdsByPart = new ParallelFileMetadataLoader( - table, partToPartialInfoMap.keySet(), reqWriteIdList, validTxnList, logPrefix) + table, partBuilders, reqWriteIdList, validTxnList, logPrefix) .loadAndGet(); for (HdfsPartition partition : fdsByPart.keySet()) { TPartialPartitionInfo partitionInfo = partToPartialInfoMap.get(par
[impala] 01/03: IMPALA-9849: Set halt_on_error=1 for TSAN builds
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 38b96174621c7d2f58b580d1e2bba4b95b261d1c Author: Sahil Takiar AuthorDate: Mon Jun 8 11:35:05 2020 -0700 IMPALA-9849: Set halt_on_error=1 for TSAN builds Set halt_on_error to true by default for TSAN builds (we already do this for ASAN builds). This ensures that Impala crashes whenever a TSAN error is detected. IMPALA-9568 accidentally broke this. Testing: * Ran dataload + be tests in a TSAN build Change-Id: I268c338d9194a66b37c3ccd97027e3543d27bea7 Reviewed-on: http://gerrit.cloudera.org:8080/16069 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/common/init.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/be/src/common/init.cc b/be/src/common/init.cc index db47282..4524f44 100644 --- a/be/src/common/init.cc +++ b/be/src/common/init.cc @@ -426,7 +426,7 @@ extern "C" const char* __tsan_default_options() { #else "1 " #endif - "history_size=7 allocator_may_return_null=1 " + "halt_on_error=1 history_size=7 allocator_may_return_null=1 " "suppressions=" THREAD_SANITIZER_SUPPRESSIONS; } #endif
[impala] 02/03: IMPALA-9709: Remove Impala-lzo from the development environment
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit f15a311065f2d30b727d53d96fae87f07132e4d9 Author: Joe McDonnell AuthorDate: Sun Apr 26 18:38:26 2020 -0700 IMPALA-9709: Remove Impala-lzo from the development environment This removes Impala-lzo from the Impala development environment. Impala-lzo is not built as part of the Impala build. The LZO plugin is no longer loaded. LZO tables are not loaded during dataload, and LZO is no longer tested. This removes some obsolete scan APIs that were only used by Impala-lzo. With this commit, Impala-lzo would require code changes to build against Impala. The plugin infrastructure is not removed, and this leaves some LZO support code in place. If someone were to decide to revive Impala-lzo, they would still be able to load it as a plugin and get the same functionality as before. This plugin support may be removed later. Testing: - Dryrun of GVO - Modified TestPartitionMetadataUncompressedTextOnly's test_unsupported_text_compression() to add LZO case Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e Reviewed-on: http://gerrit.cloudera.org:8080/15814 Reviewed-by: Bikramjeet Vig Tested-by: Joe McDonnell --- CMakeLists.txt | 11 --- be/src/exec/hdfs-plugin-text-scanner.cc| 6 ++-- be/src/exec/hdfs-scan-node-base.cc | 10 +- be/src/exec/hdfs-scan-node-base.h | 12 ++-- be/src/util/codec.cc | 2 +- bin/bootstrap_system.sh| 23 ++ bin/clean.sh | 7 - bin/impala-config.sh | 18 +++ bin/set-ld-library-path.sh | 3 -- bin/start-impala-cluster.py| 7 - buildall.sh| 10 -- docker/entrypoint.sh | 8 - docker/impala_base/Dockerfile | 4 +-- docker/test-with-docker.py | 13 +--- .../org/apache/impala/analysis/ToSqlUtils.java | 3 +- .../org/apache/impala/catalog/HdfsCompression.java | 4 ++- .../org/apache/impala/catalog/HdfsFileFormat.java | 6 ++-- .../org/apache/impala/planner/HdfsScanNode.java| 1 - 
.../org/apache/impala/planner/HdfsTableSink.java | 6 ++-- .../apache/impala/analysis/AnalyzeStmtsTest.java | 10 +++--- .../org/apache/impala/analysis/AnalyzerTest.java | 10 +++--- testdata/bad_text_lzo/bad_text.lzo | Bin 736999 -> 0 bytes testdata/bad_text_lzo/bad_text.lzo.index | Bin 5192 -> 0 bytes testdata/bin/create-load-data.sh | 22 - testdata/bin/generate-schema-statements.py | 31 +++ testdata/bin/generate-test-vectors.py | 1 - testdata/bin/load_nested.py| 5 ++- testdata/bin/lzo_indexer.sh| 20 .../common/etc/hadoop/conf/core-site.xml.py| 3 -- .../common/etc/hadoop/conf/yarn-site.xml.py| 4 +-- .../functional/functional_schema_template.sql | 11 --- .../datasets/functional/schema_constraints.csv | 4 --- .../joins-hdfs-num-rows-est-enabled.test | 8 ++--- .../queries/PlannerTest/joins.test | 8 ++--- .../functional-query_dimensions.csv| 2 +- .../functional-query_exhaustive.csv| 1 - .../DataErrorsTest/hdfs-scan-node-errors.test | 18 --- .../queries/QueryTest/disable-lzo-plugin.test | 7 - .../queries/QueryTest/show-create-table.test | 12 .../unsupported-compression-partitions.test| 9 +- .../perf-regression/perf-regression_dimensions.csv | 2 +- .../perf-regression/perf-regression_exhaustive.csv | 1 - .../perf-regression/perf-regression_pairwise.csv | 1 - .../targeted-perf/targeted-perf_dimensions.csv | 2 +- .../targeted-perf/targeted-perf_exhaustive.csv | 1 - .../targeted-perf/targeted-perf_pairwise.csv | 1 - .../targeted-stress/targeted-stress_dimensions.csv | 2 +- .../targeted-stress/targeted-stress_exhaustive.csv | 1 - .../targeted-stress/targeted-stress_pairwise.csv | 1 - .../tpcds-unmodified_dimensions.csv| 2 +- .../tpcds-unmodified_exhaustive.csv| 1 - .../tpcds-unmodified/tpcds-unmodified_pairwise.csv | 1 - testdata/workloads/tpcds/tpcds_dimensions.csv | 2 +- testdata/workloads/tpcds/tpcds_exhaustive.csv | 1 - testdata/workloads/tpcds/tpcds_pairwise.csv| 1 - testdata/workloads/tpch/tpch_dimensions.csv
[impala] 03/03: IMPALA-9838: Switch to GCC 7.5.0
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 13fbe510c0d70a8cbe82f0ca83f59b3faf5353c8 Author: Joe McDonnell AuthorDate: Thu May 28 22:12:21 2020 -0700 IMPALA-9838: Switch to GCC 7.5.0 This upgrades GCC and libstdc++ to version 7.5.0. There have been ABI changes since 4.9.2, so this means that the native-toolchain produced with the new compiler is not interoperable with one produced by the old compiler. To allow that transition, IMPALA_TOOLCHAIN_PACKAGES_HOME is now a subdirectory of IMPALA_TOOLCHAIN (toolchain-packages-gcc${IMPALA_GCC_VERSION}) to distinguish it from the old packages. Some Python packages in the impala-python virtualenv are compiled using the toolchain GCC and now use the new ABI. This leads to two changes: 1. When constructing the LD_LIBRARY_PATH for impala-python, we include the GCC libstdc++ libraries. Otherwise, certain Python packages that use C++ fail on older OSes like Centos 7. This fixes IMPALA-9804. 2. Since developers work on various branches, this changes the virtualenv's directory location to a directory with the GCC version in the name. This allows the virtualenv built with GCC 7 to coexist with the current virtualenv built with GCC 4.9.2. The location for the old virtualenv is ${IMPALA_HOME}/infra/python/env. The new location is ${IMPALA_HOME}/infra/python/env-gcc${IMPALA_GCC_VERSION}. This required updating several impala-python scripts. There are various odds-and-ends related to the transition: 1. Due to the small string optimization, the size of std::string changed, which means that various data structures also changed in size. This required updating some static asserts. 2. There is a bug in clang-tidy that reports a use-after-free for some code using std::shared_ptr. Clang is not modeling the shared_ptr correctly, so it is a false-positive. 
As a workaround, this disables the clang-analyzer-cplusplus.NewDelete diagnostic. 3. Various small compilation fixes (includes, etc). Performance testing: - Ran single-node performance tests on TPC-H for the following configurations: - TPC-H Parquet scale 30 with normal configurations - TPC-H Parquet scale 30 with codegen disabled - TPC-H Kudu scale 10 None found any significant regressions. Full results are posted on the JIRA. - Ran single-node performance tests on targeted-perf scale 10. No significant regressions. - The size of binaries (impalad, etc) is slightly smaller with the new GCC: GCC 4.9.2 release impalad binary: 545664 GCC 7.5.0 release impalad binary: 539900 - Compilation in DEBUG mode is roughly 15-25% faster Functional testing: - Ran core jobs, exhaustive release jobs, UBSAN Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4 Reviewed-on: http://gerrit.cloudera.org:8080/16045 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins --- .clang-tidy | 1 + be/src/runtime/sorter-internal.h | 2 ++ be/src/runtime/sorter.cc | 4 be/src/runtime/thread-resource-mgr.cc | 1 + be/src/util/container-util.h | 4 ++-- bin/impala-config.sh | 14 +++--- bin/impala-flake8 | 2 +- bin/impala-gcovr | 2 +- bin/impala-ipython| 2 +- bin/impala-pip| 2 +- bin/impala-py.test| 2 +- bin/impala-python | 2 +- bin/impala-python-common.sh | 1 + bin/impala-shell.sh | 5 +++-- bin/set-pythonpath.sh | 4 ++-- infra/python/bootstrap_virtualenv.py | 13 ++--- tests/comparison/ORACLE.txt | 2 +- 17 files changed, 36 insertions(+), 27 deletions(-) diff --git a/.clang-tidy b/.clang-tidy index faf4b7b..cc70284 100644 --- a/.clang-tidy +++ b/.clang-tidy @@ -24,6 +24,7 @@ Checks: "-*,clang*,\ -clang-analyzer-core.uninitialized.ArraySubscript,\ -clang-analyzer-core.uninitialized.Assign,\ -clang-analyzer-core.uninitialized.Branch,\ +-clang-analyzer-cplusplus.NewDelete,\ -clang-analyzer-cplusplus.NewDeleteLeaks,\ -clang-analyzer-deadcode.DeadStores,\ -clang-analyzer-optin.performance.Padding,\ diff 
--git a/be/src/runtime/sorter-internal.h b/be/src/runtime/sorter-internal.h index ea8275a..492fc95 100644 --- a/be/src/runtime/sorter-internal.h +++ b/be/src/runtime/sorter-internal.h @@ -21,6 +21,8 @@ #include "sorter.h" +#include + namespace impala { /// Wrapper around BufferPool::PageHandle that tracks additional info about the page. diff --git a/be/src/runtime/sorter.cc b/be/src/runtime/sorter.cc index 339e0b9..f30ecc4 1006
[impala] branch master updated (f8c28f8 -> 13fbe51)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from f8c28f8 IMPALA-9843: Add support for metastore db schema upgrade new 38b9617 IMPALA-9849: Set halt_on_error=1 for TSAN builds new f15a311 IMPALA-9709: Remove Impala-lzo from the development environment new 13fbe51 IMPALA-9838: Switch to GCC 7.5.0 The 3 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .clang-tidy| 1 + CMakeLists.txt | 11 --- be/src/common/init.cc | 2 +- be/src/exec/hdfs-plugin-text-scanner.cc| 6 ++-- be/src/exec/hdfs-scan-node-base.cc | 10 +- be/src/exec/hdfs-scan-node-base.h | 12 ++-- be/src/runtime/sorter-internal.h | 2 ++ be/src/runtime/sorter.cc | 4 --- be/src/runtime/thread-resource-mgr.cc | 1 + be/src/util/codec.cc | 2 +- be/src/util/container-util.h | 4 +-- bin/bootstrap_system.sh| 23 ++ bin/clean.sh | 7 - bin/impala-config.sh | 32 +++ bin/impala-flake8 | 2 +- bin/impala-gcovr | 2 +- bin/impala-ipython | 2 +- bin/impala-pip | 2 +- bin/impala-py.test | 2 +- bin/impala-python | 2 +- bin/impala-python-common.sh| 1 + bin/impala-shell.sh| 5 +-- bin/set-ld-library-path.sh | 3 -- bin/set-pythonpath.sh | 4 +-- bin/start-impala-cluster.py| 7 - buildall.sh| 10 -- docker/entrypoint.sh | 8 - docker/impala_base/Dockerfile | 4 +-- docker/test-with-docker.py | 13 +--- .../org/apache/impala/analysis/ToSqlUtils.java | 3 +- .../org/apache/impala/catalog/HdfsCompression.java | 4 ++- .../org/apache/impala/catalog/HdfsFileFormat.java | 6 ++-- .../org/apache/impala/planner/HdfsScanNode.java| 1 - .../org/apache/impala/planner/HdfsTableSink.java | 6 ++-- .../apache/impala/analysis/AnalyzeStmtsTest.java | 10 +++--- .../org/apache/impala/analysis/AnalyzerTest.java | 10 +++--- 
infra/python/bootstrap_virtualenv.py | 13 ++-- testdata/bad_text_lzo/bad_text.lzo | Bin 736999 -> 0 bytes testdata/bad_text_lzo/bad_text.lzo.index | Bin 5192 -> 0 bytes testdata/bin/create-load-data.sh | 22 - testdata/bin/generate-schema-statements.py | 31 +++ testdata/bin/generate-test-vectors.py | 1 - testdata/bin/load_nested.py| 5 ++- testdata/bin/lzo_indexer.sh| 20 .../common/etc/hadoop/conf/core-site.xml.py| 3 -- .../common/etc/hadoop/conf/yarn-site.xml.py| 4 +-- .../functional/functional_schema_template.sql | 11 --- .../datasets/functional/schema_constraints.csv | 4 --- .../joins-hdfs-num-rows-est-enabled.test | 8 ++--- .../queries/PlannerTest/joins.test | 8 ++--- .../functional-query_dimensions.csv| 2 +- .../functional-query_exhaustive.csv| 1 - .../DataErrorsTest/hdfs-scan-node-errors.test | 18 --- .../queries/QueryTest/disable-lzo-plugin.test | 7 - .../queries/QueryTest/show-create-table.test | 12 .../unsupported-compression-partitions.test| 9 +- .../perf-regression/perf-regression_dimensions.csv | 2 +- .../perf-regression/perf-regression_exhaustive.csv | 1 - .../perf-regression/perf-regression_pairwise.csv | 1 - .../targeted-perf/targeted-perf_dimensions.csv | 2 +- .../targeted-perf/targeted-perf_exhaustive.csv | 1 - .../targeted-perf/targeted-perf_pairwise.csv | 1 - .../targeted-stress/targeted-stress_dimensions.csv | 2 +- .../targeted-stress/targeted-stress_exhaustive.csv | 1 - .../targeted-stress/targeted-stress_pairwise.csv | 1 - .../tpcds-unmodified_dimensions.csv| 2 +- .../tpcds-unmodified_exhaustive.csv|
[impala] 01/03: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 0cb44242d20532945e5fb09f5bbef6c65415a753 Author: Vihang Karajgaonkar AuthorDate: Fri May 22 14:56:43 2020 -0700 IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API This change enhances the Catalog-v2 API getPartialCatalogObject to support ValidWriteIdList as an optional field in the TableInfoSelector. When such a field is provided by the clients, the catalog compares the provided ValidWriteIdList with the cached ValidWriteIdList of the table. The catalog reloads the table if it determines that the cached table is stale with respect to the ValidWriteIdList provided. If the table is already at or above the requested ValidWriteIdList, the catalog uses the cached table metadata information to filter out file descriptors pertaining to the provided ValidWriteIdList. Note that in case of compactions it is possible that the requested ValidWriteIdList cannot be satisfied using the cached file-metadata for some partitions. For such partitions, the catalog re-fetches the file-metadata from the FileSystem. In order to implement the fall-back to getting the file-metadata from the filesystem, the patch refactors some of the file-metadata loading logic into ParallelFileMetadataLoader, which also helps simplify some methods in HdfsTable.java. Additionally, it modifies the WriteIdBasedPredicate to optionally do a strict check which throws an exception in some scenarios. This is helpful to provide a snapshot view of the table metadata during query compilation with respect to other changes happening to the table concurrently. Note that this change does not implement the coordinator-side changes needed for catalog clients to use such a field. That would be taken up in a separate change to keep this patch smaller. Testing: 1. Ran existing file metadata loader tests. 2. 
Added a new test which exercises the various cases for ValidWriteIdList comparison. 3. Ran core tests along with the dependent MetastoreClientPool patch (IMPALA-9824). Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a Reviewed-on: http://gerrit.cloudera.org:8080/16008 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- common/thrift/CatalogService.thrift| 7 + .../impala/catalog/CatalogServiceCatalog.java | 110 +++- .../apache/impala/catalog/FileMetadataLoader.java | 15 +- .../java/org/apache/impala/catalog/HdfsTable.java | 122 +++-- .../impala/catalog/ParallelFileMetadataLoader.java | 101 +++- .../main/java/org/apache/impala/catalog/Table.java | 2 +- .../org/apache/impala/catalog/TableLoadingMgr.java | 2 +- .../impala/catalog/local/DirectMetaProvider.java | 7 +- .../apache/impala/catalog/local/LocalFsTable.java | 2 +- .../apache/impala/catalog/local/MetaProvider.java | 3 +- .../apache/impala/service/CatalogOpExecutor.java | 8 +- .../java/org/apache/impala/util/AcidUtils.java | 224 +++- .../catalog/CatalogObjectToFromThriftTest.java | 14 +- .../org/apache/impala/catalog/CatalogTest.java | 107 ++-- .../catalog/CatalogdTableInvalidatorTest.java | 2 +- .../impala/catalog/FileMetadataLoaderTest.java | 20 +- .../catalog/PartialCatalogInfoWriteIdTest.java | 587 + .../events/MetastoreEventsProcessorTest.java | 5 +- .../apache/impala/testutil/ImpalaJdbcClient.java | 6 + .../apache/impala/testutil/ImpaladTestCatalog.java | 2 +- .../java/org/apache/impala/util/AcidUtilsTest.java | 3 +- shaded-deps/pom.xml| 1 + 22 files changed, 1168 insertions(+), 182 deletions(-) diff --git a/common/thrift/CatalogService.thrift b/common/thrift/CatalogService.thrift index 0ab972d..8c42471 100644 --- a/common/thrift/CatalogService.thrift +++ b/common/thrift/CatalogService.thrift @@ -329,6 +329,10 @@ struct TTableInfoSelector { // The response should contain table constraints like primary keys // and foreign keys 8: bool want_table_constraints + + // If this is for a ACID 
table and this is set, this table info returned + // will be consistent with the provided valid_write_ids + 9: optional CatalogObjects.TValidWriteIdList valid_write_ids } // Returned information about a particular partition. @@ -488,6 +492,9 @@ struct TGetCatalogObjectResponse { struct TGetPartitionStatsRequest { 1: required CatalogServiceVersion protocol_version = CatalogServiceVersion.V1 2: required CatalogObjects.TTableName table_name + // if the table is transactional then this field represents the client's view + // of the table snapshot view in terms of ValidWriteIdList. + 3: optional CatalogObjects.TValidWriteIdList valid_write_ids } // Response for requesting
[impala] 02/03: IMPALA-9847: reduce web UI serialized JSON size
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 6ca6e403580dc592c026b4f684d31f8a4dcfae11 Author: Tim Armstrong AuthorDate: Wed Jun 10 16:52:08 2020 -0700 IMPALA-9847: reduce web UI serialized JSON size Switch to using the plain writer in some places, and tweak PrettyWriter to produce denser output for the debug UI JSON (so that it's still human readable but denser). Testing: Manually tested. The profile for the below query went from 338kB to 134kB. select min(l_orderkey) from tpch_parquet.lineitem; Change-Id: I66af9d00f0f0fc70e324033b6464b75a6adadd6f Reviewed-on: http://gerrit.cloudera.org:8080/16068 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/service/impala-hs2-server.cc | 3 ++- be/src/service/impala-http-handler.cc | 6 -- be/src/util/webserver.cc | 4 3 files changed, 10 insertions(+), 3 deletions(-) diff --git a/be/src/service/impala-hs2-server.cc b/be/src/service/impala-hs2-server.cc index 3cca3e5..757c4e9 100644 --- a/be/src/service/impala-hs2-server.cc +++ b/be/src/service/impala-hs2-server.cc @@ -1042,8 +1042,9 @@ void ImpalaServer::GetRuntimeProfile( if (request.format == TRuntimeProfileFormat::THRIFT) { return_val.__set_thrift_profile(thrift_profile); } else if (request.format == TRuntimeProfileFormat::JSON) { +// Serialize to JSON without extra whitespace/formatting. 
rapidjson::StringBuffer sb; -rapidjson::PrettyWriter writer(sb); +rapidjson::Writer writer(sb); json_profile.Accept(writer); ss << sb.GetString(); return_val.__set_profile(ss.str()); diff --git a/be/src/service/impala-http-handler.cc b/be/src/service/impala-http-handler.cc index b2ece97..197100d 100644 --- a/be/src/service/impala-http-handler.cc +++ b/be/src/service/impala-http-handler.cc @@ -,9 +,11 @@ void ImpalaHttpHandler::AdmissionStateHandler( string staleness_detail = ac->GetStalenessDetail("", _since_last_statestore_update); // In order to embed a plain json inside the webpage generated by mustache, we need - // to stringify it and write it out as a json element. + // to stringify it and write it out as a json element. We do not need to pretty-print + // it, so use the basic writer. rapidjson::StringBuffer strbuf; - PrettyWriter writer(strbuf); + Writer writer(strbuf); + resource_pools.Accept(writer); Value raw_json(strbuf.GetString(), document->GetAllocator()); document->AddMember("resource_pools_plain_json", raw_json, document->GetAllocator()); diff --git a/be/src/util/webserver.cc b/be/src/util/webserver.cc index cbf2874..fa6a317 100644 --- a/be/src/util/webserver.cc +++ b/be/src/util/webserver.cc @@ -805,7 +805,11 @@ void Webserver::RenderUrlWithTemplate(const struct sq_connection* connection, // Callbacks may optionally be rendered as a text-only, pretty-printed Json document // (mostly for debugging or integration with third-party tools). StringBuffer strbuf; +// Write the JSON out with human-readable formatting. The settings are tweaked to +// reduce extraneous whitespace characters, compared to the default formatting. PrettyWriter writer(strbuf); +writer.SetIndent('\t', 1); +writer.SetFormatOptions(kFormatSingleLineArray); document.Accept(writer); (*output) << strbuf.GetString(); *content_type = JSON;
[impala] 03/03: IMPALA-9843: Add support for metastore db schema upgrade
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit f8c28f8adfd781727c311b15546a532ce65881e0 Author: Vihang Karajgaonkar AuthorDate: Tue Jun 9 12:44:21 2020 -0700 IMPALA-9843: Add support for metastore db schema upgrade This change adds support to upgrade the HMS database schema using the hive schema tool. It adds a new option to the buildall.sh script which can be provided to upgrade the HMS db schema. Alternatively, users can directly upgrade the schema using the create-test-configuration.sh script. The logs for the schema upgrade are available in logs/cluster/schematool.log. Following invocations will upgrade the HMS database schema. 1. buildall.sh -upgrade_metastore_db 2. bin/create-test-configuration.sh -upgrade_metastore_db This upgrade option is idempotent. It is a no-op if the metastore schema is already at its latest version. In case of any errors, the only fallback currently is to format the metastore schema and load the test data again. Testing: Upgraded the HMS schema on my local dev environment and made sure that the HMS service starts without any errors. 
Change-Id: I85af8d57e110ff284832056a1661f94b85ed3b09 Reviewed-on: http://gerrit.cloudera.org:8080/16054 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- bin/create-test-configuration.sh | 13 + buildall.sh | 20 +--- 2 files changed, 30 insertions(+), 3 deletions(-) diff --git a/bin/create-test-configuration.sh b/bin/create-test-configuration.sh index 8ab2e48..83d500b 100755 --- a/bin/create-test-configuration.sh +++ b/bin/create-test-configuration.sh @@ -66,6 +66,7 @@ function generate_config { CREATE_METASTORE=0 CREATE_RANGER_POLICY_DB=0 +UPGRADE_METASTORE_DB=0 # parse command line options for ARG in $* @@ -77,9 +78,13 @@ do -create_ranger_policy_db) CREATE_RANGER_POLICY_DB=1 ;; +-upgrade_metastore_db) + UPGRADE_METASTORE_DB=1 + ;; -help|*) echo "[-create_metastore] : If true, creates a new metastore." echo "[-create_ranger_policy_db] : If true, creates a new Ranger policy db." + echo "[-upgrade_metastore_db] : If true, upgrades the schema of HMS db." exit 1 ;; esac @@ -163,12 +168,20 @@ if [ $CREATE_METASTORE -eq 1 ]; then # version and invokes the appropriate scripts CLASSPATH={$CLASSPATH}:${CONFIG_DIR} ${HIVE_HOME}/bin/schematool -initSchema -dbType \ postgres 1>${IMPALA_CLUSTER_LOGS_DIR}/schematool.log 2>&1 + # TODO: We probably don't need to do this anymore # Increase the size limit of PARAM_VALUE from SERDE_PARAMS table to be able to create # HBase tables with large number of columns. echo "alter table \"SERDE_PARAMS\" alter column \"PARAM_VALUE\" type character varying" \ | psql -q -U hiveuser -d ${METASTORE_DB} fi +if [ $UPGRADE_METASTORE_DB -eq 1 ]; then + echo "Upgrading the schema of metastore db ${METASTORE_DB}. Check \ +${IMPALA_CLUSTER_LOGS_DIR}/schematool.log for details." 
+ CLASSPATH={$CLASSPATH}:${CONFIG_DIR} ${HIVE_HOME}/bin/schematool -upgradeSchema \ +-dbType postgres 1>${IMPALA_CLUSTER_LOGS_DIR}/schematool.log 2>&1 +fi + if [ $CREATE_RANGER_POLICY_DB -eq 1 ]; then echo "Creating Ranger Policy Server DB" dropdb -U hiveuser "${RANGER_POLICY_DB}" 2> /dev/null || true diff --git a/buildall.sh b/buildall.sh index 158de01..dbe4030 100755 --- a/buildall.sh +++ b/buildall.sh @@ -58,6 +58,7 @@ TESTDATA_ACTION=0 TESTS_ACTION=1 FORMAT_CLUSTER=0 FORMAT_METASTORE=0 +UPGRADE_METASTORE_SCHEMA=0 FORMAT_RANGER_POLICY_DB=0 NEED_MINICLUSTER=0 START_IMPALA_CLUSTER=0 @@ -114,6 +115,9 @@ do -format_metastore) FORMAT_METASTORE=1 ;; +-upgrade_metastore_db) + UPGRADE_METASTORE_SCHEMA=1 + ;; -format_ranger_policy_db) FORMAT_RANGER_POLICY_DB=1 ;; @@ -201,6 +205,8 @@ do "[Default: False]" echo "[-format_cluster] : Format the minicluster [Default: False]" echo "[-format_metastore] : Format the metastore db [Default: False]" + echo "[-upgrade_metastore_db] : Upgrades the schema of metastore db"\ + "[Default: False]" echo "[-format_ranger_policy_db] : Format the Ranger policy db [Default: False]" echo "[-release_and_debug] : Build both release and debug binaries. Overrides "\ "other build types [Default: false]" @@ -269,7 +275,10 @@ Examples of common tasks: ./buildall.sh -testdata # Build, format mini-cluster and metastore, load all test data, run tests - ./buildall.sh -testdata
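The upgrade path described above is idempotent: schematool compares the schema version recorded in the metastore database against the latest version shipped with Hive and does nothing when they already match. A minimal sketch of that guard, with a hypothetical in-memory version store (the version strings and step mechanism are illustrative, not Hive's actual implementation):

```python
def upgrade_schema(db, latest_version="3.1.3000", upgrade_steps=None):
    """Idempotently upgrade a metastore-like schema dict.

    `db` records its current version under "schema_version"; the call is a
    no-op when the schema is already at `latest_version`.
    """
    current = db["schema_version"]
    if current == latest_version:
        return "already at latest version {}".format(current)
    for step in upgrade_steps or []:
        step(db)  # each step migrates one version increment
    db["schema_version"] = latest_version
    return "upgraded {} -> {}".format(current, latest_version)

db = {"schema_version": "2.3.0"}
print(upgrade_schema(db))  # performs the upgrade
print(upgrade_schema(db))  # second run is a no-op
```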
[impala] branch master updated (67b4764 -> f8c28f8)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from 67b4764 IMPALA-9752: aggregate profile stats on executor new 0cb4424 IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API new 6ca6e40 IMPALA-9847: reduce web UI serialized JSON size new f8c28f8 IMPALA-9843: Add support for metastore db schema upgrade The 3 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/service/impala-hs2-server.cc| 3 +- be/src/service/impala-http-handler.cc | 6 +- be/src/util/webserver.cc | 4 + bin/create-test-configuration.sh | 13 + buildall.sh| 20 +- common/thrift/CatalogService.thrift| 7 + .../impala/catalog/CatalogServiceCatalog.java | 110 +++- .../apache/impala/catalog/FileMetadataLoader.java | 15 +- .../java/org/apache/impala/catalog/HdfsTable.java | 122 +++-- .../impala/catalog/ParallelFileMetadataLoader.java | 101 +++- .../main/java/org/apache/impala/catalog/Table.java | 2 +- .../org/apache/impala/catalog/TableLoadingMgr.java | 2 +- .../impala/catalog/local/DirectMetaProvider.java | 7 +- .../apache/impala/catalog/local/LocalFsTable.java | 2 +- .../apache/impala/catalog/local/MetaProvider.java | 3 +- .../apache/impala/service/CatalogOpExecutor.java | 8 +- .../java/org/apache/impala/util/AcidUtils.java | 224 +++- .../catalog/CatalogObjectToFromThriftTest.java | 14 +- .../org/apache/impala/catalog/CatalogTest.java | 107 ++-- .../catalog/CatalogdTableInvalidatorTest.java | 2 +- .../impala/catalog/FileMetadataLoaderTest.java | 20 +- .../catalog/PartialCatalogInfoWriteIdTest.java | 587 + .../events/MetastoreEventsProcessorTest.java | 5 +- .../apache/impala/testutil/ImpalaJdbcClient.java | 6 + .../apache/impala/testutil/ImpaladTestCatalog.java | 2 +- 
.../java/org/apache/impala/util/AcidUtilsTest.java | 3 +- shaded-deps/pom.xml| 1 + 27 files changed, 1208 insertions(+), 188 deletions(-) create mode 100644 fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
[impala] branch master updated: IMPALA-9107 (part 2): Add script to use the m2 archive tarball
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new fb28285 IMPALA-9107 (part 2): Add script to use the m2 archive tarball fb28285 is described below commit fb282852ef52d72079a86c55a90982ffac567cc7 Author: Joe McDonnell AuthorDate: Thu Apr 2 17:28:45 2020 -0700 IMPALA-9107 (part 2): Add script to use the m2 archive tarball This adds a script to find an appropriate m2 archive tarball, download it, and use it to prepopulate the ~/.m2 directory. The script uses the JSON interface for Jenkins to search through the all-build-options-ub1604 builds on jenkins.impala.io to find one that: 1. Is building the "master" branch 2. Has the m2_archive.tar.gz Then, it downloads the m2 archive and uses it to populate ~/.m2. It does not overwrite or remove any files already in ~/.m2. The build scripts that call populate_m2_directory.py do not rely on the script succeeding. They will continue even if the script fails. This also modifies the build-all-flag-combinations.sh script to only build the m2 archive if the GENERATE_M2_ARCHIVE environment variable is true. GENERATE_M2_ARCHIVE=true will clear out the ~/.m2 directory to build an accurate m2 archive. Precommit jobs will use GENERATE_M2_ARCHIVE=false, which will allow them to use the m2 archive to speed up the build. 
Testing: - Ran gerrify-verify-dryrun - Tested locally Change-Id: I5065658d8c0514550927161855b0943fa7b3a402 Reviewed-on: http://gerrit.cloudera.org:8080/15735 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins --- bin/bootstrap_build.sh | 5 + bin/bootstrap_system.sh| 5 + bin/jenkins/build-all-flag-combinations.sh | 17 ++- bin/jenkins/populate_m2_directory.py | 172 + 4 files changed, 195 insertions(+), 4 deletions(-) diff --git a/bin/bootstrap_build.sh b/bin/bootstrap_build.sh index 1168bb0..a450ef7 100755 --- a/bin/bootstrap_build.sh +++ b/bin/bootstrap_build.sh @@ -54,4 +54,9 @@ if [ ! -d /usr/local/apache-maven-3.5.4 ]; then sudo ln -s /usr/local/apache-maven-3.5.4/bin/mvn /usr/local/bin fi +# Try to prepopulate the m2 directory to save time +if ! bin/jenkins/populate_m2_directory.py ; then + echo "Failed to prepopulate the m2 directory. Continuing..." +fi + ./buildall.sh -notests -so diff --git a/bin/bootstrap_system.sh b/bin/bootstrap_system.sh index a52083d..18cce2b 100755 --- a/bin/bootstrap_system.sh +++ b/bin/bootstrap_system.sh @@ -471,3 +471,8 @@ fi cd "$HADOOP_LZO_HOME" time -p ant package cd "$IMPALA_HOME" + +# Try to prepopulate the m2 directory to save time +if ! bin/jenkins/populate_m2_directory.py ; then + echo "Failed to prepopulate the m2 directory. Continuing..." +fi diff --git a/bin/jenkins/build-all-flag-combinations.sh b/bin/jenkins/build-all-flag-combinations.sh index a6a0d2c..9209e48 100755 --- a/bin/jenkins/build-all-flag-combinations.sh +++ b/bin/jenkins/build-all-flag-combinations.sh @@ -32,6 +32,8 @@ export IMPALA_MAVEN_OPTIONS="-U" . bin/impala-config.sh +: ${GENERATE_M2_ARCHIVE:=false} + # These are configurations for buildall. CONFIGS=( # Test gcc builds with and without -so: @@ -46,6 +48,13 @@ CONFIGS=( FAILED="" +if [[ "$GENERATE_M2_ARCHIVE" == true ]]; then + # The m2 archive relies on parsing the maven log to get a list of jars downloaded + # from particular repositories. 
To accurately produce the archive every time, we + # need to clear out the ~/.m2 directory before producing the archive. + rm -rf ~/.m2 +fi + TMP_DIR=$(mktemp -d) function onexit { echo "$0: Cleaning up temporary directory" @@ -53,8 +62,6 @@ function onexit { } trap onexit EXIT -mkdir -p ${TMP_DIR} - for CONFIG in "${CONFIGS[@]}"; do DESCRIPTION="Options $CONFIG" @@ -91,7 +98,9 @@ then exit 1 fi -# Make a tarball of the .m2 directory -bin/jenkins/archive_m2_directory.sh logs/mvn/mvn_accumulated.log logs/m2_archive.tar.gz +if [[ "$GENERATE_M2_ARCHIVE" == true ]]; then + # Make a tarball of the .m2 directory + bin/jenkins/archive_m2_directory.sh logs/mvn/mvn_accumulated.log logs/m2_archive.tar.gz +fi # Note: The exit callback handles cleanup of the temp directory. diff --git a/bin/jenkins/populate_m2_directory.py b/bin/jenkins/populate_m2_directory.py new file mode 100755 index 000..1570189 --- /dev/null +++ b/bin/jenkins/populate_m2_directory.py @@ -0,0 +1,172 @@ +#!/usr/bin/python +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for
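The selection logic populate_m2_directory.py performs — scan recent builds, keep only those on the master branch that archived m2_archive.tar.gz, and take the newest — reduces to a pure function over the build metadata the Jenkins JSON API returns. The field names below are illustrative, not the exact Jenkins schema:

```python
def pick_m2_build(builds):
    """Return the newest build dict that is on the master branch and has an
    m2_archive.tar.gz artifact, or None if no build qualifies."""
    candidates = [
        b for b in builds
        if b.get("branch") == "master"
        and any(a.endswith("m2_archive.tar.gz") for a in b.get("artifacts", []))
    ]
    # Jenkins build numbers increase monotonically, so "newest" == max number.
    return max(candidates, key=lambda b: b["number"], default=None)

builds = [
    {"number": 101, "branch": "master", "artifacts": ["logs/m2_archive.tar.gz"]},
    {"number": 102, "branch": "asf-site", "artifacts": ["logs/m2_archive.tar.gz"]},
    {"number": 103, "branch": "master", "artifacts": []},  # archive skipped
]
print(pick_m2_build(builds)["number"])  # -> 101
```

Returning None on no match mirrors the script's contract with its callers: bootstrap_build.sh and bootstrap_system.sh treat failure as non-fatal and continue.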
[impala] branch master updated: IMPALA-8860: Improve /log_level usability on WebUI
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new ad8f468 IMPALA-8860: Improve /log_level usability on WebUI ad8f468 is described below commit ad8f468871d3a893bf4c7b702025ec70765ce8e1 Author: Zoltan Garaguly AuthorDate: Tue May 12 15:20:08 2020 +0200 IMPALA-8860: Improve /log_level usability on WebUI Add glog level fetching logic and fetch glog level on every request which allows: - showing glog level on page load/reload - showing current glog level in "Log level" combo box - showing current glog level in text format Add log4j log levels fetching logic and fetch all java class log levels on every request: - log4j levels for all java classes previously set are shown on page as a list, fetching of individual class log levels not needed anymore Page layout standardization: - glog/log4j part has similar layout - using terms of frontend/backend logs Change-Id: I2fbf2ef21f4af297913a4e9b16a391768624da33 Reviewed-on: http://gerrit.cloudera.org:8080/15903 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/util/logging-support.cc | 80 +++--- common/thrift/Logging.thrift | 18 ++--- .../java/org/apache/impala/util/GlogAppender.java | 36 ++ tests/webserver/test_web_pages.py | 41 --- www/log_level.tmpl | 53 +++--- 5 files changed, 113 insertions(+), 115 deletions(-) diff --git a/be/src/util/logging-support.cc b/be/src/util/logging-support.cc index 9d0232d..d05d8dc 100644 --- a/be/src/util/logging-support.cc +++ b/be/src/util/logging-support.cc @@ -86,7 +86,7 @@ int FLAGS_v_original_value; static jclass log4j_logger_class_; // Jni method descriptors corresponding to getLogLevel() and setLogLevel() operations. 
-static jmethodID get_log_level_method; // GlogAppender.getLogLevel() +static jmethodID get_log_levels_method; // GlogAppender.getLogLevels() static jmethodID set_log_level_method; // GlogAppender.setLogLevel() static jmethodID reset_log_levels_method; // GlogAppender.resetLogLevels() @@ -98,27 +98,18 @@ void AddDocumentMember(const string& message, const char* member, document->AddMember(key, output, document->GetAllocator()); } -template -Webserver::UrlCallback MakeCallback(const F& fnc, bool display_log4j_handlers) { - return [fnc, display_log4j_handlers](const auto& req, auto* doc) { -// Display log4j log level handlers only when display_log4j_handlers is true. -if (display_log4j_handlers) AddDocumentMember("true", "include_log4j_handlers", doc); -(*fnc)(req, doc); - }; -} - void InitDynamicLoggingSupport() { JNIEnv* env = JniUtil::GetJNIEnv(); ABORT_IF_ERROR(JniUtil::GetGlobalClassRef(env, "org/apache/impala/util/GlogAppender", _logger_class_)); - JniMethodDescriptor get_log_level_method_desc = - {"getLogLevel", "([B)Ljava/lang/String;", _log_level_method}; + JniMethodDescriptor get_log_levels_method_desc = + {"getLogLevels", "()[B", _log_levels_method}; JniMethodDescriptor set_log_level_method_desc = {"setLogLevel", "([B)Ljava/lang/String;", _log_level_method}; JniMethodDescriptor reset_log_level_method_desc = {"resetLogLevels", "()V", _log_levels_method}; ABORT_IF_ERROR(JniUtil::LoadStaticJniMethod( - env, log4j_logger_class_, _log_level_method_desc)); + env, log4j_logger_class_, _log_levels_method_desc)); ABORT_IF_ERROR(JniUtil::LoadStaticJniMethod( env, log4j_logger_class_, _log_level_method_desc)); ABORT_IF_ERROR(JniUtil::LoadStaticJniMethod( @@ -132,38 +123,28 @@ void InitDynamicLoggingSupport() { [](const char* flagname, int value) { return value >= 0 && value <= 3; }); } -// Helper method to get the log level of given Java class. It is a JNI wrapper around -// GlogAppender.getLogLevel(). 
-Status GetJavaLogLevel(const TGetJavaLogLevelParams& params, string* result) { - return JniCall::static_method(log4j_logger_class_, get_log_level_method) - .with_thrift_arg(params).Call(result); -} - Status ResetJavaLogLevels() { return JniCall::static_method(log4j_logger_class_, reset_log_levels_method).Call(); } -// Callback handler for /get_java_loglevel. -void GetJavaLogLevelCallback(const Webserver::WebRequest& req, Document* document) { - const auto& args = req.parsed_args; - Webserver::ArgumentMap::const_iterator log_getclass = args.find("class"); - if (log_getclass == args.end() || log_getclass->second.empty()) { -
[impala] branch master updated: IMPALA-9318: Add admission control setting to cap MT_DOP

This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 9125de7 IMPALA-9318: Add admission control setting to cap MT_DOP 9125de7 is described below commit 9125de7ae3d2ba0eca59097fd9732a6fbb609107 Author: Joe McDonnell AuthorDate: Sat May 16 20:33:49 2020 -0700 IMPALA-9318: Add admission control setting to cap MT_DOP This introduces the max-mt-dop setting for admission control. If a statement runs with an MT_DOP setting that exceeds the max-mt-dop, then the MT_DOP setting is downgraded to the max-mt-dop value. If max-mt-dop is set to a negative value, no limit is applied. max-mt-dop is set via the llama-site.xml and can be set at the daemon level or at the resource pool level. When there is no max-mt-dop setting, it defaults to -1, so no limit is applied. The max-mt-dop is evaluated once prior to query planning. The MT_DOP settings for queries past planning are not reevaluated if the policy changes. If a statement is downgraded, its runtime profile contains a message explaining the downgrade: MT_DOP limited by admission control: Requested MT_DOP=9 reduced to MT_DOP=4.
Testing: - Added custom cluster test with various max-mt-dop settings - Ran core tests Change-Id: I3affb127a5dca517591323f2b1c880aa4b38badd Reviewed-on: http://gerrit.cloudera.org:8080/16020 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/service/client-request-state.cc | 7 be/src/service/impala-server.cc| 16 be/src/service/impala-server.h | 5 +++ common/thrift/ImpalaInternalService.thrift | 9 + .../org/apache/impala/util/RequestPoolService.java | 5 +++ fe/src/test/resources/fair-scheduler-maxmtdop.xml | 21 ++ fe/src/test/resources/llama-site-maxmtdop.xml | 30 ++ .../queries/QueryTest/max-mt-dop.test | 47 ++ tests/custom_cluster/test_mt_dop.py| 31 +- 9 files changed, 170 insertions(+), 1 deletion(-) diff --git a/be/src/service/client-request-state.cc b/be/src/service/client-request-state.cc index 5a54dbc..c919123 100644 --- a/be/src/service/client-request-state.cc +++ b/be/src/service/client-request-state.cc @@ -196,6 +196,13 @@ Status ClientRequestState::Exec() { DebugQueryOptions(query_ctx_.client_request.query_options)); summary_profile_->AddInfoString("Query Options (set by configuration and planner)", DebugQueryOptions(exec_request_->query_options)); + if (query_ctx_.__isset.overridden_mt_dop_value) { +DCHECK(query_ctx_.client_request.query_options.__isset.mt_dop); +summary_profile_->AddInfoString("MT_DOP limited by admission control", +Substitute("Requested MT_DOP=$0 reduced to MT_DOP=$1", +query_ctx_.overridden_mt_dop_value, +query_ctx_.client_request.query_options.mt_dop)); + } switch (exec_request_->stmt_type) { case TStmtType::QUERY: diff --git a/be/src/service/impala-server.cc b/be/src/service/impala-server.cc index 3251456..6d0a4dd 100644 --- a/be/src/service/impala-server.cc +++ b/be/src/service/impala-server.cc @@ -910,6 +910,9 @@ void ImpalaServer::AddPoolConfiguration(TQueryCtx* ctx, << " overlay_mask=" << overlay_mask.to_string(); OverlayQueryOptions(pool_options, overlay_mask, >client_request.query_options); + // 
Enforce the max mt_dop after the defaults and overlays have already been done. + EnforceMaxMtDop(ctx, config.max_mt_dop); + status = ValidateQueryOptions(_options); if (!status.ok()) { VLOG_QUERY << "Ignoring errors while validating default query options for pool=" @@ -917,6 +920,19 @@ void ImpalaServer::AddPoolConfiguration(TQueryCtx* ctx, } } +void ImpalaServer::EnforceMaxMtDop(TQueryCtx* query_ctx, int64_t max_mt_dop) { + TQueryOptions& query_options = query_ctx->client_request.query_options; + // The mt_dop is overridden if all three conditions are met: + // 1. There is a nonnegative max mt_dop setting + // 2. The mt_dop query option is set + // 3. The specified mt_dop is larger than the max mt_dop setting + if (max_mt_dop >= 0 && query_options.__isset.mt_dop && + max_mt_dop < query_options.mt_dop) { +query_ctx->__set_overridden_mt_dop_value(query_options.mt_dop); +query_options.__set_mt_dop(max_mt_dop); + } +} + Status ImpalaServer::Execute(TQueryCtx* query_ctx, shared_ptr session_state, QueryHandle* query_handle) { PrepareQueryContext(query_ctx); diff --git a/be/src/service/impala-server.h b/be/src/service/impala-server.h index cfe3fc8..91
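The three-condition override in EnforceMaxMtDop is easy to get subtly wrong: a negative max must mean "no limit", and an unset MT_DOP query option must stay unset rather than being defaulted. A standalone sketch of the same decision table (modeling the unset Thrift option as None):

```python
def enforce_max_mt_dop(mt_dop, max_mt_dop):
    """Clamp a query's MT_DOP to the pool's max-mt-dop setting.

    Returns (effective_mt_dop, overridden_value). overridden_value is the
    original request when a downgrade happened, else None. mt_dop=None
    models the query option being unset.
    """
    if max_mt_dop >= 0 and mt_dop is not None and mt_dop > max_mt_dop:
        return max_mt_dop, mt_dop  # downgraded; remembered for the profile
    return mt_dop, None

# Requested MT_DOP=9 with max-mt-dop=4 -> downgraded to 4, and the profile
# message from ClientRequestState::Exec() can be reconstructed:
effective, overridden = enforce_max_mt_dop(9, 4)
print("MT_DOP limited by admission control: "
      "Requested MT_DOP={} reduced to MT_DOP={}".format(overridden, effective))
```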
[impala] branch master updated: IMPALA-9673: Add external warehouse dir variable in E2E test
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new d45e3a5 IMPALA-9673: Add external warehouse dir variable in E2E test d45e3a5 is described below commit d45e3a50b003259e4ef102b47781a028eb19 Author: xiaomeng AuthorDate: Fri May 22 11:13:55 2020 -0700 IMPALA-9673: Add external warehouse dir variable in E2E test Updated CDP build to 7.2.1.0-57 to include new Hive features such as HIVE-22995. In the minicluster, hive.create.as.acid and hive.create.as.insert.only have their default values, which are false. So by default Hive creates external-type tables located in the external warehouse directory. Due to HIVE-22995, desc db returns the external warehouse directory. For the above reasons, we need to use the external warehouse dir in some tests. Also add a new test for "CREATE DATABASE ... LOCATION". Tested: Re-ran the failed tests in the minicluster. Ran exhaustive tests.
Change-Id: I57926babf4caebfd365e6be65a399f12ea68687f Reviewed-on: http://gerrit.cloudera.org:8080/15990 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- bin/impala-config.sh | 16 +- .../functional/functional_schema_template.sql | 23 -- .../queries/QueryTest/create-database.test | 36 -- .../queries/QueryTest/describe-db.test | 6 ++-- .../queries/QueryTest/describe-hive-db.test| 6 ++-- tests/common/environ.py| 1 + tests/common/impala_test_suite.py | 12 ++-- tests/query_test/test_compressed_formats.py| 12 +--- 8 files changed, 78 insertions(+), 34 deletions(-) diff --git a/bin/impala-config.sh b/bin/impala-config.sh index 481ea4e..c6387b7 100755 --- a/bin/impala-config.sh +++ b/bin/impala-config.sh @@ -172,16 +172,16 @@ export CDH_BUILD_NUMBER=1814051 export CDH_MAVEN_REPOSITORY=\ "https://${IMPALA_TOOLCHAIN_HOST}/build/cdh_components/${CDH_BUILD_NUMBER}/maven; -export CDP_BUILD_NUMBER=2523282 +export CDP_BUILD_NUMBER=3192304 export CDP_MAVEN_REPOSITORY=\ "https://${IMPALA_TOOLCHAIN_HOST}/build/cdp_components/${CDP_BUILD_NUMBER}/maven; -export CDP_HADOOP_VERSION=3.1.1.7.1.1.0-380 -export CDP_HBASE_VERSION=2.2.3.7.1.1.0-380 -export CDP_HIVE_VERSION=3.1.3000.7.1.1.0-380 -export CDP_KNOX_VERSION=1.3.0.7.1.1.0-380 -export CDP_OZONE_VERSION=0.4.0.7.1.1.0-380 -export CDP_RANGER_VERSION=2.0.0.7.1.1.0-380 -export CDP_TEZ_VERSION=0.9.1.7.1.1.0-380 +export CDP_HADOOP_VERSION=3.1.1.7.2.1.0-57 +export CDP_HBASE_VERSION=2.2.3.7.2.1.0-57 +export CDP_HIVE_VERSION=3.1.3000.7.2.1.0-57 +export CDP_KNOX_VERSION=1.3.0.7.2.1.0-57 +export CDP_OZONE_VERSION=0.6.0.7.2.1.0-57 +export CDP_RANGER_VERSION=2.0.0.7.2.1.0-57 +export CDP_TEZ_VERSION=0.9.1.7.2.1.0-57 export IMPALA_PARQUET_VERSION=1.10.99-cdh6.x-SNAPSHOT export IMPALA_AVRO_JAVA_VERSION=1.8.2-cdh6.x-SNAPSHOT diff --git a/testdata/datasets/functional/functional_schema_template.sql b/testdata/datasets/functional/functional_schema_template.sql index dc53371..c323666 100644 --- 
a/testdata/datasets/functional/functional_schema_template.sql +++ b/testdata/datasets/functional/functional_schema_template.sql @@ -2273,16 +2273,6 @@ TBLPROPERTIES('transactional'='true'); DATASET functional BASE_TABLE_NAME -materialized_view - HIVE_MAJOR_VERSION -3 - CREATE_HIVE -CREATE MATERIALIZED VIEW IF NOT EXISTS {db_name}{db_suffix}.{table_name} - AS SELECT * FROM {db_name}{db_suffix}.insert_only_transactional_table; - - DATASET -functional - BASE_TABLE_NAME insert_only_transactional_bucketed_table HIVE_MAJOR_VERSION 3 @@ -2323,6 +2313,19 @@ SELECT * from functional.{table_name}; DATASET functional BASE_TABLE_NAME +materialized_view + HIVE_MAJOR_VERSION +3 + CREATE_HIVE +-- The create materialized view command is moved down so that the database's +-- managed directory has been created. Otherwise the command would fail. This +-- is a bug in Hive. +CREATE MATERIALIZED VIEW IF NOT EXISTS {db_name}{db_suffix}.{table_name} + AS SELECT * FROM {db_name}{db_suffix}.insert_only_transactional_table; += + DATASET +functional + BASE_TABLE_NAME uncomp_src_alltypes CREATE_HIVE CREATE TABLE {db_name}{db_suffix}.{table_name} LIKE functional.alltypes STORED AS ORC; diff --git a/testdata/workloads/functional-query/queries/QueryTest/create-database.test b/testdata/workloads/functional-query/queries/QueryTest/create-database.test index 1b698b5..5cdaed3 100644 --- a/testdata/workloads/functional-query/queries/QueryTest/create-database.test +++ b/testdata/workloads/functional-query/queries/QueryTest/create-database.test @@ -16,7 +16,7 @@ STRING, STRING # for a newly created datab
[impala] branch master updated (6a1c448 -> 03f2b55)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from 6a1c448 IMPALA-9782: fix Kudu DML with mt_dop new c62a680 IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu new 03f2b55 Filter out "Checksum validation failed" messages during the maven build The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/CMakeLists.txt | 3 +- be/src/benchmarks/bloom-filter-benchmark.cc| 26 +- be/src/codegen/gen_ir_descriptions.py | 7 +- be/src/exec/filter-context.cc | 40 +- be/src/exec/kudu-scanner.cc| 134 --- be/src/runtime/raw-value-ir.cc | 77 +++- be/src/runtime/raw-value.h | 17 + be/src/runtime/raw-value.inline.h | 125 +++ be/src/runtime/runtime-filter-bank.cc | 6 +- be/src/runtime/runtime-filter-ir.cc| 4 +- be/src/runtime/runtime-filter.h| 1 + be/src/service/query-options-test.cc | 4 + be/src/service/query-options.cc| 8 + be/src/service/query-options.h | 6 +- be/src/util/bloom-filter-ir.cc | 13 +- be/src/util/bloom-filter-test.cc | 65 ++-- be/src/util/bloom-filter.cc| 248 - be/src/util/bloom-filter.h | 201 -- be/src/util/debug-util.cc | 1 + be/src/util/debug-util.h | 1 + bin/impala-config.sh | 6 +- bin/mvn-quiet.sh | 8 +- common/thrift/ImpalaInternalService.thrift | 4 + common/thrift/ImpalaService.thrift | 8 + common/thrift/PlanNodes.thrift | 7 + .../impala/planner/RuntimeFilterGenerator.java | 63 +++- .../org/apache/impala/planner/PlannerTest.java | 24 +- .../PlannerTest/bloom-filter-assignment.test | 408 + .../queries/PlannerTest/kudu-update.test | 20 +- .../queries/PlannerTest/kudu.test | 4 +- .../PlannerTest/runtime-filter-query-options.test | 117 ++ .../queries/PlannerTest/tpch-kudu.test | 381 ++- ...n_max_filters.test => all_runtime_filters.test} | 188 
++ .../QueryTest/diff_runtime_filter_types.test | 151 .../queries/QueryTest/runtime_filters.test | 5 + tests/query_test/test_runtime_filters.py | 33 +- tests/query_test/test_spilling.py | 6 +- 37 files changed, 1696 insertions(+), 724 deletions(-) create mode 100644 testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test copy testdata/workloads/functional-query/queries/QueryTest/{min_max_filters.test => all_runtime_filters.test} (67%) create mode 100644 testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
[impala] 02/02: Filter out "Checksum validation failed" messages during the maven build
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 03f2b559c31af7fc11165cf3b00876900e234663 Author: Joe McDonnell AuthorDate: Fri Apr 17 19:20:53 2020 -0700 Filter out "Checksum validation failed" messages during the maven build Some Impala dependencies come from repositories that don't have checksums available. During the build, this produces a large number of messages like: [WARNING] Checksum validation failed, no checksums available from the repository for ... or: [WARNING] Checksum validation failed, could not read expected checksum ... These messages are not very useful, and they make it harder to search the console output for failed tests. This filters them out of the maven output. Different versions of maven structure the messages differently, so this filters all the "Checksum validation failed" messages that happen at WARNING level. Testing: - Ran core tests, verified the messages are gone Change-Id: I19afbd157533e52ef3157730c7ec5159241749bc Reviewed-on: http://gerrit.cloudera.org:8080/15775 Tested-by: Impala Public Jenkins Reviewed-by: Anurag Mantripragada --- bin/mvn-quiet.sh | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/bin/mvn-quiet.sh b/bin/mvn-quiet.sh index f782ff4..c7c557e 100755 --- a/bin/mvn-quiet.sh +++ b/bin/mvn-quiet.sh @@ -34,10 +34,16 @@ EOF LOGGING_OPTIONS="-Dorg.slf4j.simpleLogger.showDateTime \ -Dorg.slf4j.simpleLogger.dateTimeFormat=HH:mm:ss" +# Filter out "Checksum validation failed" messages, as they are mostly harmless and +# make it harder to search for failed tests in the console output. Limit the filtering +# to WARNING messages. +CHECKSUM_VALIDATION_FAILED_REGEX="[WARNING].*Checksum validation failed" + # Always use maven's batch mode (-B), as it produces output that is easier to parse. if !
mvn -B $IMPALA_MAVEN_OPTIONS $LOGGING_OPTIONS "$@" | \ tee -a "$LOG_FILE" | \ - grep -E -e WARNING -e ERROR -e SUCCESS -e FAILURE -e Test -e "Found Banned"; then + grep -E -e WARNING -e ERROR -e SUCCESS -e FAILURE -e Test -e "Found Banned" | \ + grep -v -i "${CHECKSUM_VALIDATION_FAILED_REGEX}"; then echo "mvn $IMPALA_MAVEN_OPTIONS $@ exited with code $?" exit 1 fi
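The pipeline keeps only interesting lines (the first grep) and then drops the checksum noise (the second, case-insensitive grep -v). The same two-stage filter expressed with Python regexes, with made-up sample log lines:

```python
import re

# Lines worth surfacing from the maven output.
KEEP = re.compile(r"WARNING|ERROR|SUCCESS|FAILURE|Test|Found Banned")
# WARNING-level checksum messages, matched loosely since different maven
# versions word the rest of the line differently.
DROP = re.compile(r"\[WARNING\].*Checksum validation failed", re.IGNORECASE)

def filter_mvn_output(lines):
    return [line for line in lines if KEEP.search(line) and not DROP.search(line)]

log = [
    "[INFO] Downloading artifact...",
    "[WARNING] Checksum validation failed, no checksums available from the repository",
    "[ERROR] Test run failed: TestFoo",
    "[INFO] BUILD SUCCESS",
]
for line in filter_mvn_output(log):
    print(line)
```

One nuance the sketch sidesteps: in the shell version the brackets in "[WARNING]" are unescaped, so grep treats them as a character class — harmless here, since the literal match is a superset of what gets dropped.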
[impala] 02/02: IMPALA-9820: Pull Datasketches-5 HLL MurmurHash fix
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 3713d5db8dcac540ce0b5cb45974054ca87792db Author: Gabor Kaszab AuthorDate: Tue Jun 2 22:08:38 2020 +0200 IMPALA-9820: Pull Datasketches-5 HLL MurmurHash fix There is a bug in DataSketches HLL MurmurHash where long strings are over-read, resulting in a cardinality estimate that is more than 15% off from the correct cardinality number. A recent upstream fix in Apache DataSketches addresses this issue and this patch pulls it to Impala. https://issues.apache.org/jira/browse/DATASKETCHES-5 Testing: - I used ds_hll_sketch() and ds_hll_estimate() functions from IMPALA-9632 to trigger DataSketches HLL functionality. - Ran DataSketches HLL on lineitem.l_comment in TPCH25_parquet to reproduce the issue. The symptom was that the actual result was around 15% off from the correct cardinality result (~69M vs 79M). - After applying this fix re-running the query gives much closer results, usually under a 3% error range. Change-Id: I84d73fce1e7a197c1f8fb49404b58ed9bb0b843d Reviewed-on: http://gerrit.cloudera.org:8080/16026 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/thirdparty/datasketches/MurmurHash3.h | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/be/src/thirdparty/datasketches/MurmurHash3.h b/be/src/thirdparty/datasketches/MurmurHash3.h index 45a64c6..f68e989 100644 --- a/be/src/thirdparty/datasketches/MurmurHash3.h +++ b/be/src/thirdparty/datasketches/MurmurHash3.h @@ -104,14 +104,12 @@ FORCE_INLINE void MurmurHash3_x64_128(const void* key, int lenBytes, uint64_t se out.h2 = seed; // Number of full 128-bit blocks of 16 bytes. - // Possible exclusion fo a remainder of up to 15 bytes. + // Possible exclusion of a remainder of up to 15 bytes.
const int nblocks = lenBytes >> 4; // bytes / 16 - // Process the 128-bit blocks (the body) into teh hash + // Process the 128-bit blocks (the body) into the hash const uint64_t* blocks = (const uint64_t*)(data); for (int i = 0; i < nblocks; ++i) { // 16 bytes per block -//uint64_t k1 = getblock64(blocks, 0); -//uint64_t k2 = getblock64(blocks, 1); uint64_t k1 = getblock64(blocks,i*2+0); uint64_t k2 = getblock64(blocks,i*2+1); @@ -124,12 +122,9 @@ FORCE_INLINE void MurmurHash3_x64_128(const void* key, int lenBytes, uint64_t se out.h2 = ROTL64(out.h2,31); out.h2 += out.h1; out.h2 = out.h2*5+0x38495ab5; - -blocks += 2; } // tail - //const uint8_t * tail = (const uint8_t*)blocks; const uint8_t * tail = (const uint8_t*)(data + (nblocks << 4)); uint64_t k1 = 0; @@ -175,4 +170,4 @@ FORCE_INLINE void MurmurHash3_x64_128(const void* key, int lenBytes, uint64_t se //- -#endif // _MURMURHASH3_H_ \ No newline at end of file +#endif // _MURMURHASH3_H_
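The over-read is easiest to see in isolation. The sketch below is a hypothetical distillation of the loop structure only (the names and the trivial "checksum" are illustrative, not the real MurmurHash mixing): the pre-fix code both indexed the block array by `i*2` and advanced the pointer every iteration, so every block after the first was read from a doubled offset, past the end of long inputs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// BUGGY shape (as before DATASKETCHES-5): offsets computed from i are
// combined with a per-iteration pointer advance, so block i is actually
// read from byte offset i*16 + i*16 = i*32.
inline uint64_t SumBlocksBuggy(const uint8_t* data, int nblocks) {
  const uint8_t* blocks = data;
  uint64_t acc = 0;
  for (int i = 0; i < nblocks; ++i) {
    uint64_t k1, k2;
    memcpy(&k1, blocks + i * 16, sizeof(k1));      // index-based offset...
    memcpy(&k2, blocks + i * 16 + 8, sizeof(k2));
    acc += k1 + k2;
    blocks += 16;  // BUG: ...plus a pointer advance doubles the stride.
  }
  return acc;
}

// FIXED shape: the pointer advance is dropped; the i-based offset alone
// visits each 16-byte block exactly once.
inline uint64_t SumBlocksFixed(const uint8_t* data, int nblocks) {
  const uint8_t* blocks = data;
  uint64_t acc = 0;
  for (int i = 0; i < nblocks; ++i) {
    uint64_t k1, k2;
    memcpy(&k1, blocks + i * 16, sizeof(k1));
    memcpy(&k2, blocks + i * 16 + 8, sizeof(k2));
    acc += k1 + k2;
  }
  return acc;
}
```

For a one-block input the two versions agree, which is why the bug only surfaced on long strings.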
[impala] branch master updated (37b5599 -> 3713d5d)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from 37b5599 IMPALA-9809: Multi-aggregation query on particular dataset crashes impalad new 3c71586 IMPALA-9723: Raise error when when Hive Streaming side-file is found new 3713d5d IMPALA-9820: Pull Datasketches-5 HLL MurmurHash fix The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/thirdparty/datasketches/MurmurHash3.h | 11 +++-- .../java/org/apache/impala/util/AcidUtils.java | 9 +++- .../java/org/apache/impala/util/AcidUtilsTest.java | 27 ++ 3 files changed, 38 insertions(+), 9 deletions(-)
[impala] branch master updated: IMPALA-9702: Cleanup unique_database directories
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new bfdc5bf IMPALA-9702: Cleanup unique_database directories bfdc5bf is described below commit bfdc5bf6af2703127d4ff5611ed049f11b2cb004 Author: Joe McDonnell AuthorDate: Sun May 31 14:56:01 2020 -0700 IMPALA-9702: Cleanup unique_database directories If there are external tables in a database, drop database cascade won't remove the external table locations. If those locations are inside the database, then the database directory does not get removed. Some tests that use unique_database fail when running for the second time (or with a data snapshot) due to the preexisting files. This adds code to remove the database directory for unique_database. It also adds some debugging statements that list the files at the beginning of bin/run-all-tests.sh and again at the end. Testing: - Ran a core job and verified that the unique database directories are being removed - Ran TestMixedPartitions::test_incompatible_avro_partition_in_non_avro_table() multiple times and it passes when it previously failed. Change-Id: I0530c028e5e7c241dfc054f04c78e2a045c2d035 Reviewed-on: http://gerrit.cloudera.org:8080/16015 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- bin/run-all-tests.sh | 9 + tests/conftest.py| 23 --- 2 files changed, 25 insertions(+), 7 deletions(-) diff --git a/bin/run-all-tests.sh b/bin/run-all-tests.sh index 7e61068..77ce7c4 100755 --- a/bin/run-all-tests.sh +++ b/bin/run-all-tests.sh @@ -171,6 +171,9 @@ for i in $(seq 1 $NUM_TEST_ITERATIONS) do TEST_RET_CODE=0 + # Store a list of the files at the beginning of each iteration. + hdfs dfs -ls -R /test-warehouse > ${IMPALA_LOGS_DIR}/file-list-begin-${i}.log 2>&1 + start_impala_cluster if [[ "$BE_TEST" == true ]]; then @@ -276,6 +279,12 @@ do # succeed. 
# ${IMPALA_HOME}/tests/run-process-failure-tests.sh + # Store a list of the files at the end of each iteration. This can be compared + # to the file-list-begin*.log from the beginning of the iteration to see if files + # are not being cleaned up. This is most useful on the first iteration, when + # the list of files is from dataload. + hdfs dfs -ls -R /test-warehouse > ${IMPALA_LOGS_DIR}/file-list-end-${i}.log 2>&1 + # Finally, kill the spawned timeout process and its child sleep process. # There may not be a sleep process, so ignore failure. pkill -P $TIMEOUT_PID || true diff --git a/tests/conftest.py b/tests/conftest.py index b544c5e..f8c98f6 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -34,7 +34,7 @@ from tests.common.environ import build_flavor_timeout from common.test_result_verifier import QueryTestResult from tests.common.patterns import is_valid_impala_identifier from tests.comparison.db_connection import ImpalaConnection -from tests.util.filesystem_utils import FILESYSTEM, ISILON_WEBHDFS_PORT +from tests.util.filesystem_utils import FILESYSTEM, ISILON_WEBHDFS_PORT, WAREHOUSE LOG = logging.getLogger('test_configuration') LOG_FORMAT = "-- %(asctime)s %(levelname)-8s %(threadName)s: %(message)s" @@ -343,22 +343,31 @@ def unique_database(request, testid_checksum): ' test function name or any prefixes for long length or invalid ' 'characters.'.format(db_name)) + def cleanup_database(db_name, must_exist): +request.instance.execute_query_expect_success(request.instance.client, +'DROP DATABASE {0} `{1}` CASCADE'.format( +"" if must_exist else "IF EXISTS", db_name), +{'sync_ddl': sync_ddl}) +# The database directory may not be removed if there are external tables in the +# database when it is dropped. The external locations are not removed by cascade. +# These preexisting files/directories can cause errors when tests run repeatedly or +# use a data snapshot (see IMPALA-9702), so this forces cleanup of the database +# directory. 
+db_location = "{0}/{1}.db".format(WAREHOUSE, db_name).lstrip('/') +request.instance.filesystem_client.delete_file_dir(db_location, recursive=True) + def cleanup(): # Make sure we don't try to drop the current session database request.instance.execute_query_expect_success(request.instance.client, "use default") for db_name in db_names: - request.instance.execute_query_expect_success( - request.instance.client, 'DROP DATABASE `{0}` CASCADE'.format(db_name), - {'sync_ddl': sync_ddl}) + cleanup_database(db_name, True) LOG.info('Dropped datab
[impala] branch master updated: Reapply IMPALA-9781: Fix GCC 7 unaligned 128-bit loads / stores
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new b188340 Reapply IMPALA-9781: Fix GCC 7 unaligned 128-bit loads / stores b188340 is described below commit b1883405cd59988b92df279965ff2f733c0e Author: Joe McDonnell AuthorDate: Wed May 27 16:21:36 2020 -0700 Reapply IMPALA-9781: Fix GCC 7 unaligned 128-bit loads / stores When running a release binary built with GCC 7.5.0, it crashes with an unaligned memory error in multiple pieces of code. In these locations, we are doing stores to 128-bit values, but we cannot guarantee alignment. GCC 7 must be optimizing the code to use instructions that require a higher level of alignment than we can provide. This switches the code locations to use memcpy to avoid the unaligned stores (with local variables as necessary). Testing: - Ran exhaustive tests with a release binary built by GCC 7.5.0 - Ran UBSAN core tests - Add unaligned test case in decimal-test Change-Id: I7edd8beeb15e4fbb69126a9f97a1476a4b8f12a9 Reviewed-on: http://gerrit.cloudera.org:8080/16009 Tested-by: Impala Public Jenkins Reviewed-by: Tim Armstrong --- be/src/exprs/slot-ref.cc | 5 - be/src/runtime/decimal-test.cc | 4 be/src/runtime/decimal-value.h | 3 ++- be/src/util/dict-encoding.h| 5 +++-- 4 files changed, 13 insertions(+), 4 deletions(-) diff --git a/be/src/exprs/slot-ref.cc b/be/src/exprs/slot-ref.cc index 634a989..661c7ef 100644 --- a/be/src/exprs/slot-ref.cc +++ b/be/src/exprs/slot-ref.cc @@ -422,7 +422,10 @@ DecimalVal SlotRef::GetDecimalValInterpreted( case 8: return DecimalVal(*reinterpret_cast<int64_t*>(t->GetSlot(slot_offset_))); case 16: - return DecimalVal(*reinterpret_cast<__int128_t*>(t->GetSlot(slot_offset_))); + // Avoid an unaligned load by using memcpy + __int128_t val; + memcpy(&val, t->GetSlot(slot_offset_), sizeof(val)); + return DecimalVal(val); default: DCHECK(false);
return DecimalVal::null(); diff --git a/be/src/runtime/decimal-test.cc b/be/src/runtime/decimal-test.cc index 9a2f7a8..c5aed53 100644 --- a/be/src/runtime/decimal-test.cc +++ b/be/src/runtime/decimal-test.cc @@ -726,6 +726,10 @@ TEST(DecimalTest, UnalignedValues) { stringstream ss; RawValue::PrintValue(unaligned, ColumnType::CreateDecimalType(28, 2), 0, &ss); EXPECT_EQ("123.45", ss.str()); + // Regression test for IMPALA-9781: Verify that operator=() works + *unaligned = 0; + __int128_t val = unaligned->value(); + EXPECT_EQ(val, 0); free(unaligned_mem); } diff --git a/be/src/runtime/decimal-value.h b/be/src/runtime/decimal-value.h index 761d474..e329476 100644 --- a/be/src/runtime/decimal-value.h +++ b/be/src/runtime/decimal-value.h @@ -49,7 +49,8 @@ class DecimalValue { DecimalValue(const T& s) : value_(s) { } DecimalValue& operator=(const T& s) { -value_ = s; +// 'value_' may be unaligned. Use memcpy to avoid an unaligned store. +memcpy(&value_, &s, sizeof(T)); return *this; } diff --git a/be/src/util/dict-encoding.h b/be/src/util/dict-encoding.h index f440332..e6e01bc 100644 --- a/be/src/util/dict-encoding.h +++ b/be/src/util/dict-encoding.h @@ -346,10 +346,11 @@ class DictDecoder : public DictDecoderBase { virtual int num_entries() const { return dict_.size(); } virtual void GetValue(int index, void* buffer) { -T* val_ptr = reinterpret_cast<T*>(buffer); DCHECK_GE(index, 0); DCHECK_LT(index, dict_.size()); -*val_ptr = dict_[index]; +// Avoid an unaligned store by using memcpy +T val = dict_[index]; +memcpy(buffer, reinterpret_cast<uint8_t*>(&val), sizeof(T)); } /// Returns the next value. Returns false if the data is invalid.
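The idiom the patch applies can be shown in a self-contained sketch. The helper names below are illustrative (they are not Impala's API); `__int128` is a GCC/Clang extension matching the `__int128_t` used in slot-ref.cc. The point is that dereferencing a cast pointer of unknown alignment is undefined behavior, while routing the access through `memcpy` is safe for any alignment and is typically lowered by the compiler to the same single unaligned-capable instruction on x86-64.

```cpp
#include <cassert>
#include <cstring>

// Load a 128-bit value from an arbitrarily aligned address. Writing
// *reinterpret_cast<__int128*>(p) here would be UB when p is not
// 16-byte aligned, which is what crashed GCC 7 release builds.
inline __int128 LoadInt128(const void* p) {
  __int128 v;
  memcpy(&v, p, sizeof(v));  // safe for any alignment of p
  return v;
}

// Store a 128-bit value to an arbitrarily aligned address.
inline void StoreInt128(void* p, __int128 v) {
  memcpy(p, &v, sizeof(v));  // safe for any alignment of p
}
```

The "local variables as necessary" note in the commit message corresponds to first copying the value into a properly aligned local, then `memcpy`-ing it to the destination buffer.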
[impala] branch master updated: IMPALA-9749: ASAN builds should not run FE tests.
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 30f68db IMPALA-9749: ASAN builds should not run FE tests. 30f68db is described below commit 30f68dbe111c6c23394e805dfdf8ae63cedce57c Author: Anurag Mantripragada AuthorDate: Wed May 6 18:41:10 2020 -0700 IMPALA-9749: ASAN builds should not run FE tests. https://gerrit.cloudera.org/#/c/15778/ inadvertently changed the behaviour of ASAN builds to run FE tests. After this change, FE custom cluster tests run immediately after other FE tests when FE_TEST is true. Testing: Ran private parametrized job with ASAN. Change-Id: I26c469a20032bdc1f4f0bb3938d9f1c50163c99a Reviewed-on: http://gerrit.cloudera.org:8080/15921 Tested-by: Impala Public Jenkins Reviewed-by: Thomas Tauber-Marshall --- bin/run-all-tests.sh | 41 + 1 file changed, 25 insertions(+), 16 deletions(-) diff --git a/bin/run-all-tests.sh b/bin/run-all-tests.sh index 46e35ae..7e61068 100755 --- a/bin/run-all-tests.sh +++ b/bin/run-all-tests.sh @@ -158,14 +158,20 @@ LOG_DIR="${IMPALA_EE_TEST_LOGS_DIR}" # Enable core dumps ulimit -c unlimited || true +# Helper function to start Impala cluster. +start_impala_cluster() { + # TODO-MT: remove --unlock_mt_dop when it is no longer needed. + run-step "Starting Impala cluster" start-impala-cluster.log \ + "${IMPALA_HOME}/bin/start-impala-cluster.py" \ + --log_dir="${IMPALA_EE_TEST_LOGS_DIR}" \ + ${TEST_START_CLUSTER_ARGS} --impalad_args=--unlock_mt_dop=true +} + for i in $(seq 1 $NUM_TEST_ITERATIONS) do TEST_RET_CODE=0 - # TODO-MT: remove --unlock_mt_dop when it is no longer needed.
- run-step "Starting Impala cluster" start-impala-cluster.log \ - "${IMPALA_HOME}/bin/start-impala-cluster.py" --log_dir="${IMPALA_EE_TEST_LOGS_DIR}" \ - ${TEST_START_CLUSTER_ARGS} --impalad_args=--unlock_mt_dop=true + start_impala_cluster if [[ "$BE_TEST" == true ]]; then if [[ "$TARGET_FILESYSTEM" == "local" ]]; then @@ -200,12 +206,25 @@ do if [[ "$CODE_COVERAGE" == true ]]; then MVN_ARGS+="-DcodeCoverage" fi -# Don't run the FE custom cluster/service tests here since they restart Impala. We'll -# run them with the other custom cluster/service tests below. +# Run the FE tests first. We run the FE custom cluster tests below since they +# restart Impala. +MVN_ARGS_TEMP=$MVN_ARGS MVN_ARGS+=" -Dtest=!org.apache.impala.custom*.*Test" if ! "${IMPALA_HOME}/bin/mvn-quiet.sh" -fae test ${MVN_ARGS}; then TEST_RET_CODE=1 fi + +# Run the FE custom cluster tests only if not running against S3 +if [[ "${TARGET_FILESYSTEM}" != "s3" ]]; then + MVN_ARGS=$MVN_ARGS_TEMP + MVN_ARGS+=" -Dtest=org.apache.impala.custom*.*Test" + if ! "${IMPALA_HOME}/bin/mvn-quiet.sh" -fae test ${MVN_ARGS}; then +TEST_RET_CODE=1 + fi + # Restart the minicluster after running the FE custom cluster tests. + # TODO-MT: remove --unlock_mt_dop when it is no longer needed. + start_impala_cluster +fi popd fi @@ -250,16 +269,6 @@ do TEST_RET_CODE=1 fi export IMPALA_MAX_LOG_FILES="${IMPALA_MAX_LOG_FILES_SAVE}" - -# Run the FE custom cluster tests only if not running against S3. -if [[ "${TARGET_FILESYSTEM}" != "s3" ]]; then - pushd "${IMPALA_FE_DIR}" - MVN_ARGS=" -Dtest=org.apache.impala.custom*.*Test " - if ! "${IMPALA_HOME}/bin/mvn-quiet.sh" -fae test ${MVN_ARGS}; then -TEST_RET_CODE=1 - fi - popd -fi fi # Run the process failure tests.
[impala] branch master updated: IMPALA-9760: Add IMPALA_TOOLCHAIN_PACKAGES_HOME to prepare for GCC7
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 56ee90c IMPALA-9760: Add IMPALA_TOOLCHAIN_PACKAGES_HOME to prepare for GCC7 56ee90c is described below commit 56ee90c598dcc637f10647ffc3e03cc0a70b92ce Author: Joe McDonnell AuthorDate: Wed May 27 13:32:43 2020 -0700 IMPALA-9760: Add IMPALA_TOOLCHAIN_PACKAGES_HOME to prepare for GCC7 The locations for native-toolchain packages in IMPALA_TOOLCHAIN currently do not include the compiler version. This means that the toolchain can't distinguish between native-toolchain packages built with gcc 4.9.2 versus gcc 7.5.0. The collisions can cause issues when switching back and forth between branches. This introduces the IMPALA_TOOLCHAIN_PACKAGES_HOME environment variable, which is a location inside IMPALA_TOOLCHAIN that would hold native-toolchain packages. Currently, it is set to the same as IMPALA_TOOLCHAIN, so there is no difference in behavior. This lays the groundwork to add the compiler version to this path when switching to GCC7. Testing: - The only impediment to building with IMPALA_TOOLCHAIN_PACKAGES_HOME=$IMPALA_TOOLCHAIN/test is Impala-lzo. With a custom Impala-lzo, compilation succeeds. Either Impala-lzo will be fixed or it will be removed. 
- Core tests Change-Id: I1ff641e503b2161baf415355452f86b6c8bfb15b Reviewed-on: http://gerrit.cloudera.org:8080/15991 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins --- CMakeLists.txt | 17 +++--- be/CMakeLists.txt | 2 +- be/src/service/CMakeLists.txt | 4 +-- bin/bootstrap_toolchain.py | 58 +++--- bin/distcc/distcc.sh | 5 +++ bin/dump_breakpad_symbols.py | 12 +++ bin/impala-config.sh | 21 bin/impala-shell.sh| 2 +- bin/jenkins/finalize.sh| 2 +- bin/run-backend-tests.sh | 2 +- bin/run-binary.sh | 2 +- bin/run-jvm-binary.sh | 2 +- bin/run_clang_tidy.sh | 4 +-- bin/set-ld-library-path.sh | 3 +- bin/set-pythonpath.sh | 2 +- cmake_modules/clang_toolchain.cmake| 10 +++--- cmake_modules/toolchain.cmake | 5 +-- docker/setup_build_context.py | 7 ++-- fe/pom.xml | 2 +- infra/python/bootstrap_virtualenv.py | 13 shell/make_shell_tarball.sh| 2 +- shell/packaging/make_python_package.sh | 2 +- testdata/datasets/tpcds/preload| 2 +- testdata/datasets/tpch/preload | 2 +- 24 files changed, 105 insertions(+), 78 deletions(-) diff --git a/CMakeLists.txt b/CMakeLists.txt index 484f741..5719249 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -83,12 +83,13 @@ function(set_dep_root NAME) string(TOLOWER ${NAME} NAME_LOWER) string(REPLACE "_" "-" NAME_LOWER ${NAME_LOWER}) set(VAL_NAME "IMPALA_${NAME}_VERSION") - set(${NAME}_ROOT $ENV{IMPALA_TOOLCHAIN}/${NAME_LOWER}-$ENV{${VAL_NAME}} PARENT_SCOPE) + set(${NAME}_ROOT $ENV{IMPALA_TOOLCHAIN_PACKAGES_HOME}/${NAME_LOWER}-$ENV{${VAL_NAME}} + PARENT_SCOPE) endfunction() # Define root path for all dependencies, this is in the form of # set_dep_root(PACKAGE) -> -# PACKAGE_ROOT set to $ENV{IMPALA_TOOLCHAIN}/PACKAGE-$ENV{IMPALA_PACKAGE_VERSION} +# PACKAGE_ROOT set to $ENV{IMPALA_TOOLCHAIN_PACKAGES_HOME}/PACKAGE-$ENV{IMPALA_PACKAGE_VERSION} set_dep_root(AVRO) set_dep_root(ORC) set_dep_root(BOOST) @@ -104,7 +105,8 @@ set_dep_root(GTEST) set_dep_root(LIBEV) set_dep_root(LIBUNWIND) set_dep_root(LLVM) -set(LLVM_DEBUG_ROOT 
$ENV{IMPALA_TOOLCHAIN}/llvm-$ENV{IMPALA_LLVM_DEBUG_VERSION}) +set(LLVM_DEBUG_ROOT +$ENV{IMPALA_TOOLCHAIN_PACKAGES_HOME}/llvm-$ENV{IMPALA_LLVM_DEBUG_VERSION}) set_dep_root(LZ4) set_dep_root(ZSTD) set_dep_root(OPENLDAP) @@ -113,7 +115,8 @@ set_dep_root(RE2) set_dep_root(RAPIDJSON) set_dep_root(SNAPPY) set_dep_root(THRIFT) -set(THRIFT11_ROOT $ENV{IMPALA_TOOLCHAIN}/thrift-$ENV{IMPALA_THRIFT11_VERSION}) +set(THRIFT11_ROOT +$ENV{IMPALA_TOOLCHAIN_PACKAGES_HOME}/thrift-$ENV{IMPALA_THRIFT11_VERSION}) set_dep_root(ZLIB) set_dep_root(CCTZ) @@ -435,10 +438,14 @@ add_custom_target(cscope ALL DEPENDS gen-deps COMMAND "${CMAKE_SOURCE_DIR}/bin/gen-cscope.sh" ) +# This call is passing IMPALA_TOOLCHAIN_PACKAGES_HOME into Impala-lzo's build.sh, +# but this is known not to work with the current version of Impala-lzo when +# IMPALA_TOOLCHAIN_PACKAGES_HOME is a subdirectory of IMPALA_TOOLCHAIN. Either +# Impala-lzo will need to be fixed or it will need
[impala] branch master updated: IMPALA-9762: Fix GCC7 shift-count-overflow in tuple-row-compare.cc
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new f474b03 IMPALA-9762: Fix GCC7 shift-count-overflow in tuple-row-compare.cc f474b03 is described below commit f474b03dce35956d0762159eed16516b793903eb Author: Joe McDonnell AuthorDate: Tue May 19 18:36:48 2020 -0700 IMPALA-9762: Fix GCC7 shift-count-overflow in tuple-row-compare.cc This fixes a GCC 7 compilation error for this code in TupleRowZOrderComparator's GetSharedIntRepresentation() and GetSharedFloatRepresentation(): return (static_cast<U>(val) << std::max((sizeof(U) - sizeof(T)) * 8, (uint64_t)0)) ^ mask; In this case, the std::max is running with uint64_t arguments. For template instantiations with sizeof(T) > sizeof(U), this results in integer overflow and a very large positive integer, causing the shift-count-overflow. These instantiations are not used by Impala, but the compiler still needs to generate them. This changes the logic to use signed integers for the std::max, avoiding the shift-count-overflow.
Testing: - Build on GCC 4.9.2 and GCC 7 - Core tests Change-Id: I518e8bed1bb8d49d9cb76a33b07b665e15dfef87 Reviewed-on: http://gerrit.cloudera.org:8080/15962 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/util/tuple-row-compare.cc | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/be/src/util/tuple-row-compare.cc b/be/src/util/tuple-row-compare.cc index d960bbe..099fa46 100644 --- a/be/src/util/tuple-row-compare.cc +++ b/be/src/util/tuple-row-compare.cc @@ -439,8 +439,9 @@ U TupleRowZOrderComparator::GetSharedRepresentation(void* val, ColumnType type) template <typename T, typename U> U inline TupleRowZOrderComparator::GetSharedIntRepresentation(const T val, U mask) const { - return (static_cast<U>(val) << - std::max((sizeof(U) - sizeof(T)) * 8, (uint64_t)0)) ^ mask; + uint64_t shift_size = static_cast<uint64_t>( + std::max(static_cast<int64_t>((sizeof(U) - sizeof(T)) * 8), (int64_t) 0)); + return (static_cast<U>(val) << shift_size) ^ mask; } template <typename T, typename U> @@ -449,13 +450,14 @@ U inline TupleRowZOrderComparator::GetSharedFloatRepresentation(void* val, U mas T floating_value = *reinterpret_cast<T*>(val); memcpy(&tmp, &floating_value, sizeof(T)); if (UNLIKELY(std::isnan(floating_value))) return 0; + uint64_t shift_size = static_cast<uint64_t>( + std::max(static_cast<int64_t>((sizeof(U) - sizeof(T)) * 8), (int64_t) 0)); if (floating_value < 0.0) { // Flipping all bits for negative values. -return static_cast<U>(~tmp) << std::max((sizeof(U) - sizeof(T)) * 8, (uint64_t)0); +return static_cast<U>(~tmp) << shift_size; } else { // Flipping only first bit. -return (static_cast<U>(tmp) << std::max((sizeof(U) - sizeof(T)) * 8, (uint64_t)0)) ^ -mask; +return (static_cast<U>(tmp) << shift_size) ^ mask; } }
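The clamping trick in the patch can be isolated into a standalone sketch (the helper name is illustrative). `sizeof()` arithmetic is done in the unsigned `size_t`, so when `sizeof(T) > sizeof(U)` the subtraction wraps around to a huge value and `std::max` against `(uint64_t)0` never clamps it, producing a shift count far larger than the type width. Converting to a signed 64-bit type first turns the wrapped value back into a negative number, which the `max` then clamps to zero:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Shift needed to left-align a T-sized value inside a U-sized word,
// clamped to 0 for instantiations where T is wider than U (which are
// never executed, but must still compile without shift-count-overflow).
template <typename T, typename U>
uint64_t ShiftSize() {
  return static_cast<uint64_t>(
      std::max(static_cast<int64_t>((sizeof(U) - sizeof(T)) * 8), int64_t{0}));
}
```

With the original all-unsigned `std::max`, the `<int64_t, int32_t>` instantiation below would have produced a shift count near 2^64 instead of 0.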
[impala] branch master updated: Revert "IMPALA-9781: Fix GCC 7 unaligned 128-bit loads / stores"
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 7a2e80c Revert "IMPALA-9781: Fix GCC 7 unaligned 128-bit loads / stores" 7a2e80c is described below commit 7a2e80cf602b8c13d935cfc06a2a55a3c48f8d0b Author: Joe McDonnell AuthorDate: Fri May 29 12:32:01 2020 -0700 Revert "IMPALA-9781: Fix GCC 7 unaligned 128-bit loads / stores" The change in decimal-util.h introduced undefined behavior. See IMPALA-9800. This reverts commit 227da84c3757eb857008e7b82aad622ed959eb84. Change-Id: Id2b2e43c478a220ff545fdbca712e47905c8d22b Reviewed-on: http://gerrit.cloudera.org:8080/16006 Reviewed-by: Joe McDonnell Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- be/src/exprs/slot-ref.cc| 5 + be/src/util/decimal-util.h | 3 +-- be/src/util/dict-encoding.h | 5 ++--- 3 files changed, 4 insertions(+), 9 deletions(-) diff --git a/be/src/exprs/slot-ref.cc b/be/src/exprs/slot-ref.cc index 661c7ef..634a989 100644 --- a/be/src/exprs/slot-ref.cc +++ b/be/src/exprs/slot-ref.cc @@ -422,10 +422,7 @@ DecimalVal SlotRef::GetDecimalValInterpreted( case 8: return DecimalVal(*reinterpret_cast<int64_t*>(t->GetSlot(slot_offset_))); case 16: - // Avoid an unaligned load by using memcpy - __int128_t val; - memcpy(&val, t->GetSlot(slot_offset_), sizeof(val)); - return DecimalVal(val); + return DecimalVal(*reinterpret_cast<__int128_t*>(t->GetSlot(slot_offset_))); default: DCHECK(false); return DecimalVal::null(); diff --git a/be/src/util/decimal-util.h b/be/src/util/decimal-util.h index 4ddfe23..f505ecc 100644 --- a/be/src/util/decimal-util.h +++ b/be/src/util/decimal-util.h @@ -128,8 +128,7 @@ class DecimalUtil { const uint8_t* buffer, int fixed_len_size, T* v) { DCHECK_GT(fixed_len_size, 0); DCHECK_LE(fixed_len_size, sizeof(T)); -// Avoid an unaligned store by using memset -memset(v, 0, sizeof(T)); +*v = 0; // We need to sign
extend val. For example, if the original value was // -1, the original bytes were -1,-1,-1,-1. If we only wrote out 1 byte, after // the encode step above, val would contain (-1, 0, 0, 0). We need to sign diff --git a/be/src/util/dict-encoding.h b/be/src/util/dict-encoding.h index e6e01bc..f440332 100644 --- a/be/src/util/dict-encoding.h +++ b/be/src/util/dict-encoding.h @@ -346,11 +346,10 @@ class DictDecoder : public DictDecoderBase { virtual int num_entries() const { return dict_.size(); } virtual void GetValue(int index, void* buffer) { +T* val_ptr = reinterpret_cast<T*>(buffer); DCHECK_GE(index, 0); DCHECK_LT(index, dict_.size()); -// Avoid an unaligned store by using memcpy -T val = dict_[index]; -memcpy(buffer, reinterpret_cast<uint8_t*>(&val), sizeof(T)); +*val_ptr = dict_[index]; } /// Returns the next value. Returns false if the data is invalid.
[impala] 03/04: IMPALA-9415: Switch result set size calculations from capacity() to size()
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 4cc1b4ad04cd5770a41961269d69a65cdfac1dcf Author: Joe McDonnell AuthorDate: Wed May 27 15:05:04 2020 -0700 IMPALA-9415: Switch result set size calculations from capacity() to size() The behavior of string's capacity() is implementation specific. In GCC 7.5.0, the implementation has different behavior compared to GCC 4.9.2. This is causing a DCHECK to fire in ClientRequestState::FetchRowsInternal(): // Confirm that this was not an underestimate of the memory required. DCHECK_GE(before + delta_bytes, after) What happens on GCC 7.5.0 is that the capacity of the string before the copy is 29, but after the copy to the result set, the capacity is 30. The size remains unchanged. This switches the code to use size(), which is guaranteed to be consistent across copies. This loses some accuracy, because there is some string object overhead and excess capacity that no longer counts. However, this is not code that requires perfect accuracy. 
Testing: - Ran core tests with GCC 4.9.2 and GCC 7.5.0 Change-Id: I3f9ab260927e14d8951b7c7661f2b5b18a1da39a Reviewed-on: http://gerrit.cloudera.org:8080/15992 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/service/query-result-set.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/be/src/service/query-result-set.cc b/be/src/service/query-result-set.cc index 5445ec7..f2d5b8e 100644 --- a/be/src/service/query-result-set.cc +++ b/be/src/service/query-result-set.cc @@ -226,7 +226,7 @@ int64_t AsciiQueryResultSet::ByteSize(int start_idx, int num_rows) { int64_t bytes = 0; const int end = min(static_cast(num_rows), result_set_->size() - start_idx); for (int i = start_idx; i < start_idx + end; ++i) { -bytes += sizeof(result_set_[i]) + result_set_[i].capacity(); +bytes += sizeof(result_set_[i]) + result_set_[i].size(); } return bytes; } @@ -237,7 +237,7 @@ namespace { // Utility functions for computing the size of HS2 Thrift structs in bytes. inline int64_t ByteSize(const ThriftTColumnValue& val) { - return sizeof(val) + val.stringVal.value.capacity(); + return sizeof(val) + val.stringVal.value.size(); } int64_t ByteSize(const TRow& row) {
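The distinction the patch relies on can be demonstrated directly (the helper name below is illustrative, not Impala's API): `size()` is part of a string's value and is guaranteed to be preserved by copies, while `capacity()` is implementation-defined and may legally differ between a string and its copy — which is exactly what made the before/after DCHECK fire under GCC 7. An estimate based on `size()` is therefore stable across copies, at the acknowledged cost of ignoring allocator overhead and slack capacity.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Copy-stable memory estimate for a result string: object header plus the
// number of characters actually stored. Deliberately ignores capacity(),
// whose value may change when the string is copied into the result set.
inline int64_t ApproxByteSize(const std::string& s) {
  return static_cast<int64_t>(sizeof(s) + s.size());
}
```

A `capacity()`-based version of this function could return different values for `s` and for a copy of `s`, violating the "estimate before == usage after" invariant the DCHECK asserts.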
[impala] 01/04: IMPALA-9692 (part 2): Refactor parts of TExecPlanFragmentInfo to protobuf
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit b67c0906f596ca336d0ea0e8cbc618a20ac0e563 Author: Thomas Tauber-Marshall AuthorDate: Tue Mar 24 14:04:53 2020 -0700 IMPALA-9692 (part 2): Refactor parts of TExecPlanFragmentInfo to protobuf The new admission control service will be written in protobuf, so there are various admission control related structures currently stored in Thrift that it would be convenient to convert to protobuf, to minimize the amount of converting back and forth that needs to be done. This patch converts some portions of TExecPlanFragmentInfo to protobuf. TExecPlanFragmentInfo is sent as a sidecar with the Exec() RPC, so the refactored parts are now just directly included in the ExecQueryFInstancesRequestPB. The portions that are converted are those that are part of the QuerySchedule, in particular the TPlanFragmentDestination, TScanRangeParams, and TJoinBuildInput. This patch is just a refactor and doesn't contain any functional changes. One notable related change is that DataSink::CreateSink() has two parameters removed - TPlanFragmentCtx (which no longer exists) and TPlanFragmentInstanceCtx. These variables and the new PB equivalents are available via the RuntimeState that was already being passed in as another parameter and don't need to be individually passed in. Testing: - Passed a full run of existing tests. - Ran the single node perf test and didn't detect any regressions.
Change-Id: I3a8e46767b257bbf677171ac2f4efb1b623ba41b Reviewed-on: http://gerrit.cloudera.org:8080/15844 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/benchmarks/expr-benchmark.cc| 7 +- be/src/benchmarks/hash-benchmark.cc| 5 +- be/src/codegen/llvm-codegen-test.cc| 6 +- be/src/exec/blocking-join-node.cc | 16 +++-- be/src/exec/data-sink.h| 3 +- be/src/exec/hbase-scan-node.cc | 16 ++--- be/src/exec/hbase-table-sink.cc| 5 +- be/src/exec/hbase-table-sink.h | 4 +- be/src/exec/hdfs-scan-node-base.cc | 66 +- be/src/exec/hdfs-scan-node-base.h | 2 +- be/src/exec/hdfs-table-sink.cc | 5 +- be/src/exec/hdfs-table-sink.h | 4 +- be/src/exec/kudu-scan-node-base.cc | 9 +-- be/src/exec/kudu-table-sink.cc | 5 +- be/src/exec/kudu-table-sink.h | 4 +- be/src/exec/nested-loop-join-builder.cc| 6 +- be/src/exec/nested-loop-join-builder.h | 4 +- be/src/exec/partitioned-hash-join-builder.cc | 5 +- be/src/exec/partitioned-hash-join-builder.h| 4 +- be/src/exec/plan-root-sink.cc | 7 +- be/src/exec/plan-root-sink.h | 4 +- be/src/exec/scan-node.h| 7 +- be/src/exprs/expr-codegen-test.cc | 6 +- be/src/rpc/CMakeLists.txt | 2 + be/src/runtime/coordinator-backend-state.cc| 35 ++ be/src/runtime/data-stream-test.cc | 40 +-- be/src/runtime/fragment-instance-state.cc | 34 + be/src/runtime/fragment-instance-state.h | 13 ++-- be/src/runtime/fragment-state.cc | 52 -- be/src/runtime/fragment-state.h| 30 +--- be/src/runtime/krpc-data-stream-sender.cc | 60 be/src/runtime/krpc-data-stream-sender.h | 8 +-- be/src/runtime/query-state.cc | 37 ++ be/src/runtime/row-batch.cc| 10 +-- be/src/runtime/runtime-state.cc| 24 --- be/src/runtime/runtime-state.h | 18 +++-- be/src/runtime/test-env.cc | 16 +++-- be/src/scheduling/query-schedule.h | 7 +- be/src/scheduling/scheduler-test-util.cc | 20 +++--- be/src/scheduling/scheduler-test-util.h| 3 +- be/src/scheduling/scheduler-test.cc| 34 - be/src/scheduling/scheduler.cc | 80 +++--- be/src/scheduling/scheduler.h | 6 +- 
be/src/service/fe-support.cc | 5 +- be/src/util/CMakeLists.txt | 1 + be/src/util/compression-util.cc| 64 + .../src/util/compression-util.h| 28 +++- be/src/util/container-util.h | 8 +++ be/src/util/uid-util.h | 6 ++ common/protobuf
[impala] branch master updated (a148517 -> 227da84)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from a148517 IMPALA-9787: fix spinning thread with memory-based table invalidation new b67c090 IMPALA-9692 (part 2): Refactor parts of TExecPlanFragmentInfo to protobuf new f39ddb1 IMPALA-9761: Fix GCC7 ambiguous else warning for gtest macros new 4cc1b4a IMPALA-9415: Switch result set size calculations from capacity() to size() new 227da84 IMPALA-9781: Fix GCC 7 unaligned 128-bit loads / stores The 4 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/benchmarks/expr-benchmark.cc | 7 +- be/src/benchmarks/hash-benchmark.cc | 5 +- be/src/codegen/llvm-codegen-test.cc | 6 +- be/src/exec/blocking-join-node.cc | 16 +++-- be/src/exec/data-sink.h | 3 +- be/src/exec/hbase-scan-node.cc| 16 ++--- be/src/exec/hbase-table-sink.cc | 5 +- be/src/exec/hbase-table-sink.h| 4 +- be/src/exec/hdfs-scan-node-base.cc| 66 ++- be/src/exec/hdfs-scan-node-base.h | 2 +- be/src/exec/hdfs-table-sink.cc| 5 +- be/src/exec/hdfs-table-sink.h | 4 +- be/src/exec/kudu-scan-node-base.cc| 9 +-- be/src/exec/kudu-table-sink.cc| 5 +- be/src/exec/kudu-table-sink.h | 4 +- be/src/exec/nested-loop-join-builder.cc | 6 +- be/src/exec/nested-loop-join-builder.h| 4 +- be/src/exec/parquet/parquet-common-test.cc| 8 ++- be/src/exec/partitioned-hash-join-builder.cc | 5 +- be/src/exec/partitioned-hash-join-builder.h | 4 +- be/src/exec/plan-root-sink.cc | 7 +- be/src/exec/plan-root-sink.h | 4 +- be/src/exec/scan-node.h | 7 +- be/src/exprs/expr-codegen-test.cc | 6 +- be/src/exprs/slot-ref.cc | 5 +- be/src/rpc/CMakeLists.txt | 2 + be/src/runtime/buffered-tuple-stream-test.cc | 4 +- be/src/runtime/bufferpool/buffer-pool-test.cc | 8 ++- 
be/src/runtime/coordinator-backend-state.cc | 35 ++ be/src/runtime/data-stream-test.cc| 40 ++-- be/src/runtime/fragment-instance-state.cc | 34 +- be/src/runtime/fragment-instance-state.h | 13 ++-- be/src/runtime/fragment-state.cc | 52 --- be/src/runtime/fragment-state.h | 30 ++--- be/src/runtime/io/disk-io-mgr-test.cc | 4 +- be/src/runtime/krpc-data-stream-sender.cc | 60 - be/src/runtime/krpc-data-stream-sender.h | 8 +-- be/src/runtime/query-state.cc | 37 +++ be/src/runtime/row-batch.cc | 10 +-- be/src/runtime/runtime-state.cc | 24 --- be/src/runtime/runtime-state.h| 18 +++-- be/src/runtime/test-env.cc| 16 +++-- be/src/runtime/timestamp-test.cc | 12 ++-- be/src/scheduling/query-schedule.h| 7 +- be/src/scheduling/scheduler-test-util.cc | 20 +++--- be/src/scheduling/scheduler-test-util.h | 3 +- be/src/scheduling/scheduler-test.cc | 34 +- be/src/scheduling/scheduler.cc| 80 --- be/src/scheduling/scheduler.h | 6 +- be/src/service/fe-support.cc | 5 +- be/src/service/query-result-set.cc| 4 +- be/src/testutil/cpu-util.h| 4 +- be/src/util/CMakeLists.txt| 1 + be/src/util/compression-util.cc | 64 ++ be/src/util/{flat_buffer.h => compression-util.h} | 14 ++-- be/src/util/container-util.h | 8 +++ be/src/util/decimal-util.h| 3 +- be/src/util/dict-encoding.h | 5 +- be/src/util/uid-util.h| 6 ++ common/protobuf/CMakeLists.txt| 2 +- common/protobuf/common.proto | 21 -- common/protobuf/control_service.proto | 73 + common/protobuf/planner.proto | 76 + common/protobuf/row_batch.proto | 2 +- common/thrift/ImpalaInternalService.thrift| 58 ++-- common/thrift/PlanNodes.thrift| 8 ++-
[impala] 02/04: IMPALA-9761: Fix GCC7 ambiguous else warning for gtest macros
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit f39ddb1443eaa588ec85254d0a4aefe95f105b7a Author: Joe McDonnell AuthorDate: Tue May 19 19:39:43 2020 -0700 IMPALA-9761: Fix GCC7 ambiguous else warning for gtest macros On GCC7, a dangling-else warning is firing for code like: if (cond1) ASSERT_TRUE(cond2) This is true for several ASSERT_* and EXPECT_* gtest macros. gtest had some code to avoid warnings for code of this form, but that code is no longer effective. gtest now disables the dangling-else warning. Since this is just a matter of adding braces, this adds braces for all those locations. For consistency, this may include locations that were not failing. I found locations by doing: git grep EXPECT_ | grep if git grep ASSERT_ | grep if and manually looking through the output. Testing: - Builds successfully Change-Id: Ieb664afe83736a71508302575e8e66a1b506c985 Reviewed-on: http://gerrit.cloudera.org:8080/15964 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/exec/parquet/parquet-common-test.cc| 8 ++-- be/src/runtime/buffered-tuple-stream-test.cc | 4 +++- be/src/runtime/bufferpool/buffer-pool-test.cc | 8 ++-- be/src/runtime/io/disk-io-mgr-test.cc | 4 +++- be/src/runtime/timestamp-test.cc | 12 be/src/testutil/cpu-util.h| 4 +++- 6 files changed, 29 insertions(+), 11 deletions(-) diff --git a/be/src/exec/parquet/parquet-common-test.cc b/be/src/exec/parquet/parquet-common-test.cc index b67603a..0dc519f 100644 --- a/be/src/exec/parquet/parquet-common-test.cc +++ b/be/src/exec/parquet/parquet-common-test.cc @@ -30,7 +30,9 @@ void ValidateRanges(RangeVec skip_ranges, int num_rows, const RangeVec& expected RangeVec result; bool success = ComputeCandidateRanges(num_rows, &skip_ranges, &result); EXPECT_EQ(should_succeed, success); - if (success) EXPECT_EQ(expected, result); + if (success) { +EXPECT_EQ(expected, result); + } }
void ValidateRangesError(RangeVec skip_ranges, int num_rows, const RangeVec& expected) { @@ -76,7 +78,9 @@ void ValidatePages(const vector& first_row_indexes, const RangeVec& ran bool success = ComputeCandidatePages(page_locations, ranges, num_rows, &candidate_pages); EXPECT_EQ(should_succeed, success); - if (success) EXPECT_EQ(expected_page_indexes, candidate_pages); + if (success) { +EXPECT_EQ(expected_page_indexes, candidate_pages); + } } void ValidatePagesError(const vector& first_row_indexes, const RangeVec& ranges, diff --git a/be/src/runtime/buffered-tuple-stream-test.cc b/be/src/runtime/buffered-tuple-stream-test.cc index 37d0323..1c42f07 100644 --- a/be/src/runtime/buffered-tuple-stream-test.cc +++ b/be/src/runtime/buffered-tuple-stream-test.cc @@ -854,7 +854,9 @@ void SimpleTupleStreamTest::TestAttachMemory(bool pin_stream, bool attach_on_rea } else { EXPECT_EQ(0, num_buffers_attached) << "No buffers attached during iteration."; } - if (attach_on_read || !pin_stream) EXPECT_EQ(4, num_flushes); + if (attach_on_read || !pin_stream) { +EXPECT_EQ(4, num_flushes); + } out_batch->Reset(); stream.Close(out_batch, RowBatch::FlushMode::FLUSH_RESOURCES); if (attach_on_read) { diff --git a/be/src/runtime/bufferpool/buffer-pool-test.cc b/be/src/runtime/bufferpool/buffer-pool-test.cc index ea788c4..2c9add7 100644 --- a/be/src/runtime/bufferpool/buffer-pool-test.cc +++ b/be/src/runtime/bufferpool/buffer-pool-test.cc @@ -584,7 +584,9 @@ void BufferPoolTest::TestBufferAllocation(bool reserved) { BufferPool::ClientHandle client; ASSERT_OK(pool.RegisterClient("test client", NULL, &global_reservations_, NULL, TOTAL_MEM, NewProfile(), &client)); - if (reserved) ASSERT_TRUE(client.IncreaseReservationToFit(TOTAL_MEM)); + if (reserved) { +ASSERT_TRUE(client.IncreaseReservationToFit(TOTAL_MEM)); + } vector handles(NUM_BUFFERS); @@ -2095,7 +2097,9 @@ void BufferPoolTest::TestRandomInternalImpl(BufferPool* pool, TmpFileGroup* file int rand_pick = uniform_int_distribution(0, pages.size() - 1)(*rng);
PageHandle* page = &pages[rand_pick].first; if (!client.IncreaseReservationToFit(page->len())) continue; - if (!page->is_pinned() || multiple_pins) ASSERT_OK(pool->Pin(&client, page)); + if (!page->is_pinned() || multiple_pins) { +ASSERT_OK(pool->Pin(&client, page)); + } // Block on the pin and verify data for sync pins. if (p < 0.35) VerifyData(*page, pages[rand_pick].second); } else if (p < 0.70) { diff --git a/be/src/runtime/io/disk-io-mgr-test.cc b/be/src/runtime/io/disk-io-mgr-test.cc index 2cf4642..f549e19 100644 --- a/be/src/runtime/io/disk-io-mgr-test.cc +++ b/be/s
[impala] 04/04: IMPALA-9781: Fix GCC 7 unaligned 128-bit loads / stores
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 227da84c3757eb857008e7b82aad622ed959eb84 Author: Joe McDonnell AuthorDate: Wed May 27 16:21:36 2020 -0700 IMPALA-9781: Fix GCC 7 unaligned 128-bit loads / stores When running a release binary built with GCC 7.5.0, it crashes with an unaligned memory error in multiple pieces of code. In these locations, we are doing stores to 128-bit values, but we cannot guarantee alignment. GCC 7 must be optimizing the code to use instructions that require a higher level of alignment than we can provide. This switches the code locations to use memset / memcpy with local variables to avoid the unaligned stores. Testing: - Ran exhaustive tests with a release binary built by GCC 7.5.0 - Ran exhaustive tests Change-Id: I67320790789d5b57aeaf2dff0eae7352a1cbf81e Reviewed-on: http://gerrit.cloudera.org:8080/15993 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- be/src/exprs/slot-ref.cc| 5 - be/src/util/decimal-util.h | 3 ++- be/src/util/dict-encoding.h | 5 +++-- 3 files changed, 9 insertions(+), 4 deletions(-) diff --git a/be/src/exprs/slot-ref.cc b/be/src/exprs/slot-ref.cc index 634a989..661c7ef 100644 --- a/be/src/exprs/slot-ref.cc +++ b/be/src/exprs/slot-ref.cc @@ -422,7 +422,10 @@ DecimalVal SlotRef::GetDecimalValInterpreted( case 8: return DecimalVal(*reinterpret_cast<int64_t*>(t->GetSlot(slot_offset_))); case 16: - return DecimalVal(*reinterpret_cast<__int128_t*>(t->GetSlot(slot_offset_))); + // Avoid an unaligned load by using memcpy + __int128_t val; + memcpy(&val, t->GetSlot(slot_offset_), sizeof(val)); + return DecimalVal(val); default: DCHECK(false); return DecimalVal::null(); diff --git a/be/src/util/decimal-util.h b/be/src/util/decimal-util.h index f505ecc..4ddfe23 100644 --- a/be/src/util/decimal-util.h +++ b/be/src/util/decimal-util.h @@ -128,7 +128,8 @@ class DecimalUtil { const uint8_t* buffer,
int fixed_len_size, T* v) { DCHECK_GT(fixed_len_size, 0); DCHECK_LE(fixed_len_size, sizeof(T)); -*v = 0; +// Avoid an unaligned store by using memset +memset(v, 0, sizeof(T)); // We need to sign extend val. For example, if the original value was // -1, the original bytes were -1,-1,-1,-1. If we only wrote out 1 byte, after // the encode step above, val would contain (-1, 0, 0, 0). We need to sign diff --git a/be/src/util/dict-encoding.h b/be/src/util/dict-encoding.h index f440332..e6e01bc 100644 --- a/be/src/util/dict-encoding.h +++ b/be/src/util/dict-encoding.h @@ -346,10 +346,11 @@ class DictDecoder : public DictDecoderBase { virtual int num_entries() const { return dict_.size(); } virtual void GetValue(int index, void* buffer) { -T* val_ptr = reinterpret_cast<T*>(buffer); DCHECK_GE(index, 0); DCHECK_LT(index, dict_.size()); -*val_ptr = dict_[index]; +// Avoid an unaligned store by using memcpy +T val = dict_[index]; +memcpy(buffer, &val, sizeof(T)); } /// Returns the next value. Returns false if the data is invalid.
[impala] 01/02: IMPALA-9775: Fix test failure in TestAcid.test_acid_heartbeats
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit febce6519f5791e5578608eac4d68f5f9ccb0457 Author: wzhou-code AuthorDate: Sun May 24 20:59:20 2020 -0700 IMPALA-9775: Fix test failure in TestAcid.test_acid_heartbeats The failure was caused by IMPALA-9764, which changed the sleep interval between heartbeats. To fix it, add an upper limit of the sleep interval as 100 seconds, and increase the execution time for the query in test case TestAcid::test_acid_heartbeats. Skip the test for table formats with compression to reduce the total execution time. Testing: - Ran the following command to verify that the bug was fixed: ./bin/impala-py.test tests/query_test/test_acid.py\ ::TestAcid::test_acid_heartbeats \ --workload_exploration_strategy=functional-query:exhaustive - Passed all exhaustive tests. Change-Id: I7922797d7e3ce94a2c8948211245f4e77fdb08b7 Reviewed-on: http://gerrit.cloudera.org:8080/15984 Reviewed-by: Zoltan Borok-Nagy Tested-by: Impala Public Jenkins --- .../main/java/org/apache/impala/common/TransactionKeepalive.java | 8 ++-- tests/query_test/test_acid.py | 8 +--- 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/fe/src/main/java/org/apache/impala/common/TransactionKeepalive.java b/fe/src/main/java/org/apache/impala/common/TransactionKeepalive.java index 50e35a2..52dff28 100644 --- a/fe/src/main/java/org/apache/impala/common/TransactionKeepalive.java +++ b/fe/src/main/java/org/apache/impala/common/TransactionKeepalive.java @@ -48,6 +48,10 @@ import com.sun.tools.javac.code.Attribute.Array; public class TransactionKeepalive { public static final Logger LOG = Logger.getLogger(TransactionKeepalive.class); + // (IMPALA-9775) The sleep interval is deduced from Hive configuration parameter + // hive.txn.timeout.
To be safe, set an upper limit for sleep interval as 100 + seconds for carrying through the test case TestAcid.test_acid_heartbeats. + private static final long MAX_SLEEP_INTERVAL_MILLISECONDS = 100000; private static final long MILLION = 1000000L; private final long sleepIntervalMs_; @@ -209,8 +213,8 @@ public class TransactionKeepalive { */ public TransactionKeepalive(MetaStoreClientPool metaStoreClientPool) { HiveConf hiveConf = new HiveConf(TransactionKeepalive.class); -sleepIntervalMs_ = hiveConf.getTimeVar( -HiveConf.ConfVars.HIVE_TXN_TIMEOUT, TimeUnit.MILLISECONDS) / 3; +sleepIntervalMs_ = Math.min(MAX_SLEEP_INTERVAL_MILLISECONDS, hiveConf.getTimeVar( +HiveConf.ConfVars.HIVE_TXN_TIMEOUT, TimeUnit.MILLISECONDS) / 3); Preconditions.checkState(sleepIntervalMs_ > 0); Preconditions.checkNotNull(metaStoreClientPool); metaStoreClientPool_ = metaStoreClientPool; diff --git a/tests/query_test/test_acid.py b/tests/query_test/test_acid.py index ed564a4..f9e3f02 100644 --- a/tests/query_test/test_acid.py +++ b/tests/query_test/test_acid.py @@ -174,13 +174,15 @@ class TestAcid(ImpalaTestSuite): @SkipIfADLS.hive @SkipIfIsilon.hive @SkipIfLocal.hive - @pytest.mark.execute_serially def test_acid_heartbeats(self, vector, unique_database): """Tests heartbeating of transactions. Creates a long-running query via some jitting and in the meanwhile it periodically checks whether there is a transaction that has sent a heartbeat since its start.
""" if self.exploration_strategy() != 'exhaustive': pytest.skip() +table_format = vector.get_value('table_format') +if table_format.compression_codec != 'none': pytest.skip() + last_open_txn_start_time = self._latest_open_transaction() dummy_tbl = "{}.{}".format(unique_database, "dummy") self.execute_query("create table {} (i int) tblproperties" @@ -188,8 +190,8 @@ class TestAcid(ImpalaTestSuite): "'transactional_properties'='insert_only')".format(dummy_tbl)) try: handle = self.execute_query_async( - "insert into {} values (sleep(20))".format(dummy_tbl)) - MAX_ATTEMPTS = 10 + "insert into {} values (sleep(32))".format(dummy_tbl)) + MAX_ATTEMPTS = 16 attempt = 0 success = False while attempt < MAX_ATTEMPTS:
[impala] 02/02: Run test_row_validation only on HDFS
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit b8496d56e49814c36edadbff17b80b10bf40a3a2 Author: Zoltan Borok-Nagy AuthorDate: Wed May 27 11:10:59 2020 +0200 Run test_row_validation only on HDFS Added pytest.skip() when the test is being run on a filesystem other than HDFS. The test only makes sense on filesystems that support APPEND because it simulates Hive Streaming V2. And currently Hive Streaming only works on HDFS. Change-Id: Id2a647ba5c75a600f177f82290241a93afc71ea7 Reviewed-on: http://gerrit.cloudera.org:8080/15988 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- tests/query_test/test_acid_row_validation.py | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/tests/query_test/test_acid_row_validation.py b/tests/query_test/test_acid_row_validation.py index c6438c0..041e754 100644 --- a/tests/query_test/test_acid_row_validation.py +++ b/tests/query_test/test_acid_row_validation.py @@ -18,11 +18,12 @@ # Functional tests for ACID integration with Hive. import os +import pytest from tests.common.impala_test_suite import ImpalaTestSuite from tests.common.skip import SkipIfLocal from tests.util.acid_txn import AcidTxn - +from tests.util.filesystem_utils import IS_HDFS # Tests that Impala validates rows against a validWriteIdList correctly. class TestAcidRowValidation(ImpalaTestSuite): @@ -63,6 +64,10 @@ class TestAcidRowValidation(ImpalaTestSuite): """Tests reading from a file written by Hive Streaming Ingestion. In the first no rows are valid. Then we commit the first transaction and read the table. Then we commit the last transaction and read the table.""" +# This test only makes sense on a filesystem that supports the file append operation +# (e.g. S3 doesn't) because it simulates Hive Streaming V2. So let's run it only on +# HDFS. 
+if not IS_HDFS: pytest.skip() tbl_name = "streaming" self._create_test_table(vector, unique_database, tbl_name) self.run_test_case('QueryTest/acid-row-validation-0', vector, use_db=unique_database)
[impala] branch master updated (5c69e7b -> b8496d5)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from 5c69e7b IMPALA-9597: Eliminate redundant Ranger audits for column masking new febce65 IMPALA-9775: Fix test failure in TestAcid.test_acid_heartbeats new b8496d5 Run test_row_validation only on HDFS The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../main/java/org/apache/impala/common/TransactionKeepalive.java | 8 ++-- tests/query_test/test_acid.py | 8 +--- tests/query_test/test_acid_row_validation.py | 7 ++- 3 files changed, 17 insertions(+), 6 deletions(-)
[impala] branch master updated: IMPALA-9597: Eliminate redundant Ranger audits for column masking
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 5c69e7b IMPALA-9597: Eliminate redundant Ranger audits for column masking 5c69e7b is described below commit 5c69e7ba583297dc886652ac5952816882b928af Author: Fang-Yu Rao AuthorDate: Mon Apr 27 08:41:53 2020 -0700 IMPALA-9597: Eliminate redundant Ranger audits for column masking After IMPALA-9350, Impala is able to produce the corresponding Ranger audits when a query involves policies of column masking. However, redundant audit events could be produced due to the fact that the analysis of the TableRef containing a column involved in a column masking policy could be performed more than once for a query that has to be analyzed more than once. For example, a query consisting of a WithClause or a query that requires a rewrite operation followed by a re-analysis phase would result in RangerImpalaPlugin#evalDataMaskPolicies() being invoked multiple times, each producing an audit log entry for the same column. Moreover, for a query involving column masking policies, the corresponding audit log entries will still be generated even though there is an AuthorizationException thrown in the authorization phase. This patch fixes those two issues described above by adding some post-processing steps after the analysis of a query to deduplicate the List of AuthzAuditEvent's for column masking policies. Specifically, we stash the deduplicated audit events after the analysis of the query and will add back those deduplicated events only if the authorization of the query is successful. On the other hand, this patch also resolves an inconsistency when an "Unmasked" policy is involved in a query that retains the original column value. 
Specifically, when an "Unmasked" policy is the only column masking policy involved in this query, RangerAuthorizationChecker#createColumnMask() will not be called to produce the corresponding AuthzAuditEvent, whereas createColumnMask() will be invoked to produce the respective AuthzAuditEvent if there are policies of other types. Since an "Unmasked" policy essentially does not change the original column value, we filter out the respective events with mask type equal to "MASK_NONE" which corresponds to an "Unmasked" policy. Testing: - Added three test cases in RangerAuditLogTest#testAuditsForColumnMasking() to make sure the issues above are resolved. - Verified that this patch passes the FE tests in the DEBUG build. Change-Id: I42d60130fba93d63fbc36949f2bf746b7ae2497d Reviewed-on: http://gerrit.cloudera.org:8080/15854 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- .../apache/impala/analysis/AnalysisContext.java| 29 ++-- .../impala/authorization/AuthorizationChecker.java | 10 +- .../authorization/BaseAuthorizationChecker.java| 2 +- .../authorization/NoopAuthorizationFactory.java| 4 + .../ranger/RangerAuthorizationChecker.java | 48 +-- .../ranger/RangerAuthorizationContext.java | 42 ++ .../authorization/ranger/RangerImpalaPlugin.java | 22 +++ .../authorization/ranger/RangerAuditLogTest.java | 153 ++--- .../org/apache/impala/common/FrontendTestBase.java | 4 + 9 files changed, 259 insertions(+), 55 deletions(-) diff --git a/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java b/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java index feddb4c..df68afb 100644 --- a/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java +++ b/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java @@ -416,27 +416,18 @@ public class AnalysisContext { // Analyze statement and record exception. 
AnalysisException analysisException = null; -TClientRequest clientRequest; -AuthorizationContext authzCtx = null; - +TClientRequest clientRequest = queryCtx_.getClient_request(); +AuthorizationContext authzCtx = authzChecker.createAuthorizationContext(true, +clientRequest.isSetRedacted_stmt() ? +clientRequest.getRedacted_stmt() : clientRequest.getStmt(), +queryCtx_.getSession(), Optional.of(timeline_)); +Preconditions.checkState(authzCtx != null); try { - clientRequest = queryCtx_.getClient_request(); - authzCtx = authzChecker.createAuthorizationContext(true, - clientRequest.isSetRedacted_stmt() ? - clientRequest.getRedacted_stmt() : clientRequest.getStmt(), - queryCtx_.getSession(), Optional.of(timeline_)); - // TODO (IMPALA-9597): Generating column masking audit events in the
[impala] 01/02: IMPALA-9755: Flaky test: test_global_exchange_counters
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 95ee26354dc0ce61e5844430d1eaf553fd13d154 Author: Sahil Takiar AuthorDate: Tue May 19 14:58:23 2020 -0700 IMPALA-9755: Flaky test: test_global_exchange_counters De-flake TestObservability.test_global_exchange_counters in test_observability.py. IMPALA-6984 added a feature to send a Cancel RPC to running fragments when the coordinator fragment fetches all rows defined by a limit. This causes fragments to terminate early (which is a good thing). However, test_global_exchange_counters expects each fragment to produce some rows, which is why it recently became flaky. This patch modifies test_global_exchange_counters so that it allows for some fragments to produce 0 rows. Testing: * Ran test_observability.py locally * Looped 8 concurrent streams of test_global_exchange_counters for an hour, no failures (previously I was able to reproduce the test issue within 5 minutes) Change-Id: Icb3a1b5ccb5695eb71343e96cc830f12d5c72f1e Reviewed-on: http://gerrit.cloudera.org:8080/15960 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- tests/query_test/test_observability.py | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/tests/query_test/test_observability.py b/tests/query_test/test_observability.py index 101f086..8f27c72 100644 --- a/tests/query_test/test_observability.py +++ b/tests/query_test/test_observability.py @@ -404,7 +404,6 @@ class TestObservability(ImpalaTestSuite): def __verify_profile_event_sequence(self, event_regexes, runtime_profile): """Check that 'event_regexes' appear in a consecutive series of lines in 'runtime_profile'""" -lines = runtime_profile.splitlines() event_regex_index = 0 # Check that the strings appear in the above order with no gaps in the profile. 
@@ -501,9 +500,13 @@ class TestObservability(ImpalaTestSuite): if key in line: # Match byte count within parentheses m = re.search("\(([0-9]+)\)", line) - assert m, "Cannot match pattern for key %s in line '%s'" % (key, line) - # Only keep first (query-level) counter - if counters[key] == 0: + + # If a match was not found, then the value of the key should be 0 + if not m: +assert key + ": 0" in line, "Invalid format for key %s" % key +assert counters[key] != 0, "Query level counter for key %s cannot be 0" % key + elif counters[key] == 0: +# Only keep first (query-level) counter counters[key] = int(m.group(1)) # All counters have values
[impala] branch master updated (a11106e -> 3e76da9)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from a11106e IMPALA-9764: TransactionKeepalive should set sleep interval based on Hive Configuration new 95ee263 IMPALA-9755: Flaky test: test_global_exchange_counters new 3e76da9 IMPALA-9708: Remove Sentry support The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: README-build.md| 7 +- be/src/catalog/catalog-server.cc | 11 - be/src/catalog/catalog-service-client-wrapper.h| 8 - be/src/catalog/catalog.cc | 11 - be/src/catalog/catalog.h | 7 - be/src/common/global-flags.cc | 3 + be/src/exec/catalog-op-executor.cc | 15 - be/src/exec/catalog-op-executor.h | 6 - be/src/service/fe-support.cc | 29 - be/src/service/frontend.cc | 8 +- be/src/transport/TSasl.cpp | 2 +- be/src/util/backend-gflag-util.cc | 8 - bin/bootstrap_toolchain.py | 49 +- bin/create-test-configuration.sh | 19 - bin/impala-config.sh | 52 +- buildall.sh| 18 +- common/thrift/BackendGflags.thrift | 6 +- common/thrift/CatalogService.thrift| 24 - fe/pom.xml | 173 +--- .../impala/analysis/AlterDbSetOwnerStmt.java | 6 +- .../analysis/AlterTableOrViewSetOwnerStmt.java | 6 +- .../apache/impala/analysis/AuthorizationStmt.java | 3 +- .../apache/impala/analysis/CreateTableStmt.java| 2 +- .../authorization/AuthorizationProvider.java | 2 - .../authorization/PrivilegeRequestBuilder.java | 2 +- .../impala/authorization/sentry/ImpalaAction.java | 87 --- .../authorization/sentry/ImpalaActionFactory.java | 57 -- .../authorization/sentry/ImpalaPrivilegeModel.java | 44 -- .../authorization/sentry/SentryAuthProvider.java | 70 -- .../authorization/sentry/SentryAuthorizable.java | 59 -- .../sentry/SentryAuthorizableColumn.java | 83 -- 
.../authorization/sentry/SentryAuthorizableDb.java | 54 -- .../sentry/SentryAuthorizableFactory.java | 86 -- .../authorization/sentry/SentryAuthorizableFn.java | 61 -- .../sentry/SentryAuthorizableServer.java | 53 -- .../sentry/SentryAuthorizableTable.java| 71 -- .../sentry/SentryAuthorizableUri.java | 53 -- .../sentry/SentryAuthorizationChecker.java | 155 .../sentry/SentryAuthorizationConfig.java | 147 .../sentry/SentryAuthorizationFactory.java | 109 --- .../sentry/SentryAuthorizationPolicy.java | 167 .../sentry/SentryCatalogdAuthorizationManager.java | 541 - .../impala/authorization/sentry/SentryConfig.java | 74 -- .../sentry/SentryImpaladAuthorizationManager.java | 306 .../sentry/SentryPolicyReaderException.java| 35 - .../authorization/sentry/SentryPolicyService.java | 544 - .../impala/authorization/sentry/SentryProxy.java | 651 .../sentry/SentryUnavailableException.java | 35 - .../impala/authorization/sentry/SentryUtil.java| 48 -- .../org/apache/impala/service/BackendConfig.java | 3 - .../java/org/apache/impala/service/FeSupport.java | 12 - .../java/org/apache/impala/service/Frontend.java | 3 +- .../java/org/apache/impala/service/JniCatalog.java | 21 - .../org/apache/impala/util/AuthorizationUtil.java | 5 +- .../impala/analysis/AnalyzeAuthStmtsTest.java | 17 +- .../authorization/AuthorizationStmtTest.java | 85 +- .../impala/authorization/AuthorizationTest.java| 673 .../authorization/AuthorizationTestBase.java | 89 +-- .../sentry/ImpalaActionFactoryTest.java| 135 .../authorization/sentry/SentryProxyTest.java | 614 --- .../org/apache/impala/catalog/CatalogTest.java | 2 +- .../impala/testutil/SentryServicePinger.java | 99 --- .../impala/testutil/TestSentryGroupMapper.java | 80 -- .../apache/impala/util/AuthorizationUtilTest.java | 25 +- fe/src/test/resources/hive-site.xml.py | 10 +- fe/src/test/resources/sentry-site.xml.py | 65 -- impala-parent/pom.xml | 4 +- infra/deploy/deploy.py | 1 - testdata/bin/run-all.sh| 14 +- testdata/bin/run-hive-server.sh
[impala] 02/04: IMPALA-9585: [DOCS] update mt_dop docs
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 2dd20dc1e409fb6c460f9dbb1b995044ba8b858e Author: Tim Armstrong AuthorDate: Thu May 7 16:12:23 2020 -0700 IMPALA-9585: [DOCS] update mt_dop docs Updated to reflect changes in IMPALA-9099 and IMPALA-9736. Change-Id: Ifc7511fede5f9b36ae8250d3acf8d0061b48106f Reviewed-on: http://gerrit.cloudera.org:8080/15883 Reviewed-by: Tamas Mate Tested-by: Impala Public Jenkins Reviewed-by: Bikramjeet Vig --- docs/topics/impala_mt_dop.xml | 39 +++ 1 file changed, 27 insertions(+), 12 deletions(-) diff --git a/docs/topics/impala_mt_dop.xml b/docs/topics/impala_mt_dop.xml index 04fb1c0..5ca2be6 100644 --- a/docs/topics/impala_mt_dop.xml +++ b/docs/topics/impala_mt_dop.xml @@ -51,8 +51,7 @@ under the License. -Currently, the operations affected by the MT_DOP -query option are: +Currently, MT_DOP support varies by statement type: @@ -64,11 +63,28 @@ under the License. -Queries with execution plans containing only scan and aggregation operators, -or local joins that do not need data exchanges (such as for nested types). -Other queries produce an error if MT_DOP is set to a non-zero -value. Therefore, this query option is typically only set for the duration of -specific long-running, CPU-intensive queries. +SELECT statements. MT_DOP is 0 by default +for SELECT statements but can be set to a value greater +than 0 to control intra-node parallelism. This may be useful to tune +query performance and in particular to reduce execution time of +long-running, CPU-intensive queries. + + + + +DML statements. MT_DOP values greater +than zero are not currently supported for DML statements. DML statements +will produce an error if MT_DOP is set to a non-zero value. + + + + +In and earlier, not all SELECT +statements support setting MT_DOP. 
Specifically, only +scan and aggregation operators, and +local joins that do not need data exchanges (such as for nested types) are +supported. Other SELECT statements produce an error if +MT_DOP is set to a non-zero value. @@ -149,7 +165,7 @@ compute stats billion_rows_parquet; The following example shows the effects of setting MT_DOP - for a query involving only scan and aggregation operations for a Parquet table: + for a query on a Parquet table:
[impala] 04/04: IMPALA-9714: Fix edge cases in SimpleLogger and add test
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 0815a184fdfeb3293849a8441ba003d63a588dab Author: Joe McDonnell AuthorDate: Mon May 4 14:03:30 2020 -0700 IMPALA-9714: Fix edge cases in SimpleLogger and add test SimpleLogger is used for several existing log types and a change to use it for the data cache access trace is underway. Since this is commonly used, it is useful to nail down specific semantics and test them. This fixes the following edge cases: 1. LoggingSupport::DeleteOldLogs() currently maintains a map from mtime to the filename in order to decide which files need to be deleted. This stops working when there are fast updates to the log, because mtime has seconds resolution and DeleteOldLogs() is only able to recognize a single file per mtime with the current map. This changes the map to a set of pairs of mtime + filename. The behavior is identical except that if there are multiple files with the same mtime, they each get their own entry in the set. This allows DeleteOldLogs() to more accurately maintain the maximum log files. 2. SimpleLogger::Init() now enforces the limit on the maximum number of log files. This provides a clear semantic when dealing with preexisting files from a previous incarnation of the same logger. 3. SimpleLogger will now create any intermediate directories when creating the logging directory (i.e. existingdir/a/b/c works). 4. This moves enforcement of max_audit_event_log_files to use the limits provided by SimpleLogger rather than a background thread calling DeleteOldLogs() periodically. This also introduces SimpleLogger::GetLogFiles(), which is a static function to get the log files given a directory and prefix. This is necessary for testing, but it also will be useful for code that wants to process logs from SimpleLogger.
Testing: - Added a new simple-logger-test that codifies the expected behavior - Ran core tests Change-Id: Idd092a65b31d34f40a660cab7b5e0695a3627c78 Reviewed-on: http://gerrit.cloudera.org:8080/15861 Reviewed-by: Thomas Tauber-Marshall Tested-by: Impala Public Jenkins --- be/src/common/init.cc | 6 - be/src/common/logging.cc| 16 +- be/src/common/logging.h | 5 - be/src/service/impala-server.cc | 8 +- be/src/service/impala-server.h | 3 - be/src/util/CMakeLists.txt | 2 + be/src/util/filesystem-util-test.cc | 37 + be/src/util/filesystem-util.cc | 9 +- be/src/util/filesystem-util.h | 7 +- be/src/util/logging-support.cc | 15 +- be/src/util/simple-logger-test.cc | 290 be/src/util/simple-logger.cc| 56 +-- be/src/util/simple-logger.h | 14 +- 13 files changed, 419 insertions(+), 49 deletions(-) diff --git a/be/src/common/init.cc b/be/src/common/init.cc index 278a563..db47282 100644 --- a/be/src/common/init.cc +++ b/be/src/common/init.cc @@ -74,10 +74,6 @@ DECLARE_string(redaction_rules_file); DECLARE_string(reserved_words_version); DECLARE_bool(symbolize_stacktrace); -DEFINE_int32(max_audit_event_log_files, 0, "Maximum number of audit event log files " -"to retain. The most recent audit event log files are retained. If set to 0, " -"all audit event log files are retained."); - DEFINE_int32(memory_maintenance_sleep_time_ms, 1, "Sleep time in milliseconds " "between memory maintenance iterations"); @@ -146,8 +142,6 @@ extern "C" { void __gcov_flush(); } if (impala::TestInfo::is_test()) continue; // Check for log rotation in every interval of the maintenance thread impala::CheckAndRotateLogFiles(FLAGS_max_log_files); -// Check for audit event log rotation in every interval of the maintenance thread -impala::CheckAndRotateAuditEventLogFiles(FLAGS_max_audit_event_log_files); // Check for minidump rotation in every interval of the maintenance thread. This is // necessary since an arbitrary number of minidumps can be written by sending SIGUSR1 // to the process. 
diff --git a/be/src/common/logging.cc b/be/src/common/logging.cc index 7e9e4f7..297bedb 100644 --- a/be/src/common/logging.cc +++ b/be/src/common/logging.cc @@ -32,7 +32,7 @@ #include #include "common/logging.h" -#include "service/impala-server.h" +#include "util/container-util.h" #include "util/debug-util.h" #include "util/error-util.h" #include "util/logging-support.h" @@ -45,7 +45,6 @@ DECLARE_string(redaction_rules_file); DECLARE_string(log_filename); DECLARE_bool(redirect_stdout_stderr); -D
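The retention fix described in point 1 above can be sketched in isolation. The following is a minimal, hypothetical stand-in (FilesToDelete is not Impala's actual function, and the real code works on filesystem mtimes rather than plain integers): it shows why ordering a std::set on (mtime, filename) pairs keeps one entry per file even when several files share a seconds-resolution mtime, whereas a std::map keyed only on mtime would collapse them.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Given (mtime, filename) pairs and a retention limit, return the filenames
// that should be deleted, oldest first. std::pair orders by mtime first and
// filename second, so files with identical mtimes each keep their own entry
// and iteration from begin() visits the oldest files first.
std::vector<std::string> FilesToDelete(
    const std::vector<std::pair<int64_t, std::string>>& log_files,
    size_t max_log_files) {
  std::set<std::pair<int64_t, std::string>> by_age(
      log_files.begin(), log_files.end());
  std::vector<std::string> to_delete;
  while (by_age.size() > max_log_files) {
    // Erase the oldest entry; ties on mtime are broken by filename, so no
    // file is ever hidden behind another with the same timestamp.
    to_delete.push_back(by_age.begin()->second);
    by_age.erase(by_age.begin());
  }
  return to_delete;
}
```

With a map keyed only on the mtime, the two files stamped 100 below would occupy a single entry and the retention count would be wrong; with the set of pairs, both are tracked.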
[impala] branch master updated (7295edc -> 0815a18)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from 7295edc IMPALA-9680: Fixed compressed inserts failing new 7d260b6 IMPALA-9727: Fix HBaseScanNode explain formatting new 2dd20dc IMPALA-9585: [DOCS] update mt_dop docs new a93f2c2 IMPALA-8205: Support number of true and false statistics for boolean column new 0815a18 IMPALA-9714: Fix edge cases in SimpleLogger and add test The 4 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/common/init.cc |6 - be/src/common/logging.cc | 16 +- be/src/common/logging.h|5 - be/src/exec/catalog-op-executor.cc |7 +- be/src/exec/incr-stats-util-test.cc| 20 +- be/src/exec/incr-stats-util.cc | 46 +- be/src/exec/incr-stats-util.h | 18 +- be/src/service/impala-server.cc|8 +- be/src/service/impala-server.h |3 - be/src/util/CMakeLists.txt |2 + be/src/util/filesystem-util-test.cc| 37 + be/src/util/filesystem-util.cc |9 +- be/src/util/filesystem-util.h |7 +- be/src/util/logging-support.cc | 15 +- be/src/util/simple-logger-test.cc | 290 +++ be/src/util/simple-logger.cc | 56 +- be/src/util/simple-logger.h| 14 +- common/thrift/CatalogObjects.thrift|8 + docs/topics/impala_mt_dop.xml | 39 +- .../impala/analysis/AlterTableSetColumnStats.java |8 +- .../apache/impala/analysis/ComputeStatsStmt.java | 11 + .../org/apache/impala/catalog/ColumnStats.java | 54 +- .../org/apache/impala/planner/HBaseScanNode.java |6 +- .../java/org/apache/impala/service/Frontend.java | 13 +- .../org/apache/impala/analysis/ParserTest.java |3 + .../org/apache/impala/catalog/CatalogTest.java |2 + .../queries/PlannerTest/hbase.test | 24 +- .../queries/QueryTest/acid-compute-stats.test | 22 +- .../QueryTest/alter-table-set-column-stats.test| 120 +- 
.../queries/QueryTest/alter-table.test | 12 +- .../QueryTest/compute-stats-avro-catalog-v2.test | 264 +-- .../queries/QueryTest/compute-stats-avro.test | 262 +-- .../queries/QueryTest/compute-stats-date.test | 32 +- .../queries/QueryTest/compute-stats-decimal.test | 24 +- .../QueryTest/compute-stats-incremental.test | 172 +- .../queries/QueryTest/compute-stats.test | 2446 ++-- .../QueryTest/hbase-compute-stats-incremental.test | 40 +- .../queries/QueryTest/hbase-compute-stats.test | 104 +- .../queries/QueryTest/hbase-show-stats.test| 30 +- .../queries/QueryTest/show-stats.test | 64 +- .../queries/QueryTest/truncate-table.test | 76 +- 41 files changed, 2449 insertions(+), 1946 deletions(-) create mode 100644 be/src/util/simple-logger-test.cc
[impala] 01/04: IMPALA-9727: Fix HBaseScanNode explain formatting
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 7d260b602895280fab1a1a543a3e9700493febbd Author: Shant Hovsepian AuthorDate: Fri Apr 10 16:01:51 2020 -0400 IMPALA-9727: Fix HBaseScanNode explain formatting In the case with more than one hbase predicate the indentation level wasn't correctly formatted in the explain string. Instead of: | | 13:SCAN HBASE [default.dimension d] | | hbase filters: | | d:foo EQUAL '1' | | d:bar EQUAL '2' | | d:baz EQUAL '3' | | predicate: This was produced: | | 13:SCAN HBASE [default.dimension d] | | hbase filters: d:foo EQUAL '1' d:bar EQUAL '2' d:baz EQUAL '3' | | predicate: Change-Id: I30fad791408a1f7e35e9b3f2e6cb4958952dd567 Reviewed-on: http://gerrit.cloudera.org:8080/15749 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- .../org/apache/impala/planner/HBaseScanNode.java | 6 +++--- .../queries/PlannerTest/hbase.test | 24 +- 2 files changed, 17 insertions(+), 13 deletions(-) diff --git a/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java b/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java index bcfbc00..999c5bf 100644 --- a/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java +++ b/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java @@ -585,9 +585,9 @@ public class HBaseScanNode extends ScanNode { } else { for (int i = 0; i < filters_.size(); ++i) { THBaseFilter filter = filters_.get(i); -output.append("\n " + filter.family + ":" + filter.qualifier + " " + -CompareFilter.CompareOp.values()[filter.op_ordinal].toString() + " " + -"'" + filter.filter_constant + "'"); +output.append("\n" + detailPrefix + filter.family + ":" + filter.qualifier ++ " " + CompareFilter.CompareOp.values()[filter.op_ordinal].toString() ++ " " + "'" + filter.filter_constant + "'"); } } output.append('\n'); diff --git 
a/testdata/workloads/functional-planner/queries/PlannerTest/hbase.test b/testdata/workloads/functional-planner/queries/PlannerTest/hbase.test index 886fb05..5a26b0f 100644 --- a/testdata/workloads/functional-planner/queries/PlannerTest/hbase.test +++ b/testdata/workloads/functional-planner/queries/PlannerTest/hbase.test @@ -690,6 +690,7 @@ from functional_hbase.alltypessmall a, functional_hbase.alltypessmall c where + b.string_col > '1' and b.string_col < '3000' and b.bool_col = false and c.month = 4 and a.int_col = b.int_col and @@ -698,20 +699,23 @@ where PLAN-ROOT SINK | 04:HASH JOIN [INNER JOIN] -| hash predicates: a.int_col = b.int_col -| row-size=29B cardinality=300 +| hash predicates: b.int_col = c.int_col +| row-size=42B cardinality=130 | -|--00:SCAN HBASE [functional_hbase.alltypessmall b] -| predicates: b.bool_col = FALSE -| row-size=9B cardinality=25 +|--02:SCAN HBASE [functional_hbase.alltypessmall c] +| predicates: c.`month` = 4 +| row-size=12B cardinality=13 | 03:HASH JOIN [INNER JOIN] -| hash predicates: a.int_col = c.int_col -| row-size=20B cardinality=120 +| hash predicates: a.int_col = b.int_col +| row-size=30B cardinality=40 | -|--02:SCAN HBASE [functional_hbase.alltypessmall c] -| predicates: c.`month` = 4 -| row-size=12B cardinality=12 +|--00:SCAN HBASE [functional_hbase.alltypessmall b] +| hbase filters: +| d:string_col GREATER '1' +| d:string_col LESS '3000' +| predicates: b.string_col > '1', b.bool_col = FALSE, b.string_col < '3000' +| row-size=22B cardinality=4 | 01:SCAN HBASE [functional_hbase.alltypessmall a] row-size=8B cardinality=50
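The HBaseScanNode fix above replaces a hard-coded indent with the caller-supplied detail prefix. A minimal sketch of the idea, rendered in C++ rather than the Java of the actual planner (FormatFilters and HBaseFilter are hypothetical names, not Impala's API): re-emitting the prefix at the start of every filter line keeps each filter aligned with the surrounding plan tree, which is what the corrected explain output shows.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one THBaseFilter entry.
struct HBaseFilter {
  std::string family;
  std::string qualifier;
  std::string op;
  std::string constant;
};

// Build the "hbase filters:" section of an explain string. The key detail is
// that each filter line begins with detail_prefix (e.g. "| | "), instead of a
// fixed two-space indent, so nested plan fragments stay correctly indented.
std::string FormatFilters(
    const std::string& detail_prefix, const std::vector<HBaseFilter>& filters) {
  std::string output = detail_prefix + "hbase filters:";
  for (const HBaseFilter& f : filters) {
    output += "\n" + detail_prefix + f.family + ":" + f.qualifier + " " + f.op
        + " '" + f.constant + "'";
  }
  return output;
}
```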
[impala] 01/03: IMPALA-9570: [DOCS] add memory management
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 4846751c84b70673ea25cd88e8c9d2085f7ae55e Author: Shajini Thayasingh AuthorDate: Wed Apr 29 15:02:53 2020 -0700 IMPALA-9570: [DOCS] add memory management add memory management and fix broken links. Incorporated review changes. Change-Id: I6e8b6d0c3fe2e1746831665b3d3ae98a0beaa1e7 Reviewed-on: http://gerrit.cloudera.org:8080/15836 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- docs/impala_keydefs.ditamap | 4 ++-- docs/topics/impala_udf.xml | 8 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/docs/impala_keydefs.ditamap b/docs/impala_keydefs.ditamap index d133c0d..594fa4d 100644 --- a/docs/impala_keydefs.ditamap +++ b/docs/impala_keydefs.ditamap @@ -69,13 +69,13 @@ under the License. impala-udf-samples - + https://github.com/cloudera/impala-udf-samples/blob/master/uda-sample.cc; scope="external" format="html" keys="uda-sample.cc"> uda-sample.cc udf-sample.h - + https://github.com/cloudera/impala-udf-samples/blob/master/uda-sample.h; scope="external" format="html" keys="uda-sample.h"> uda-sample.h https://github.com/apache/impala/blob/master/be/src/testutil/test-udas.cc; scope="external" format="html" keys="test-udas.cc"> diff --git a/docs/topics/impala_udf.xml b/docs/topics/impala_udf.xml index 8d6c382..60735a4 100644 --- a/docs/topics/impala_udf.xml +++ b/docs/topics/impala_udf.xml @@ -920,6 +920,14 @@ within UDAs, you can return without specifying a value. + Intermediate values returned by the init, update and merge functions that referred to allocations + must be allocated using FunctionContext::Allocate() and freed using FunctionContext::Free(). + Both serialize and finalize functions are responsible for cleaning up the intermediate value and freeing such allocations. 
+ StringVals returned to Impala directly by Serialize(), Finalize() or GetValue() functions should be backed by + temporary results memory allocated using the StringVal(FunctionContext*, int) constructor, + StringVal::CopyFrom(FunctionContext*, const uint8_t*, size_t), or StringVal::Resize(). + + In the SQL syntax, you create a UDAF by using the statement CREATE AGGREGATE FUNCTION. You specify the entry points of the underlying C++ functions using the clauses INIT_FN, UPDATE_FN, MERGE_FN, SERIALIZE_FN, and
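The documented allocation contract for UDAs can be illustrated with a toy aggregate. This is a sketch only: MockFunctionContext below is a hypothetical stand-in for Impala's FunctionContext (the real API is declared in udf/udf.h), and SumInit/SumUpdate/SumFinalize are invented names. It demonstrates the invariant the doc change describes: intermediate state lives in memory obtained from FunctionContext::Allocate(), and the finalizing function is responsible for freeing it with FunctionContext::Free().

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical mock of the FunctionContext allocation interface. It only
// tracks that every Allocate() is balanced by a Free().
struct MockFunctionContext {
  int live_allocations = 0;
  uint8_t* Allocate(int bytes) {
    ++live_allocations;
    return new uint8_t[bytes];
  }
  void Free(uint8_t* ptr) {
    --live_allocations;
    delete[] ptr;
  }
};

// Init: the intermediate value (a running int64_t sum) is backed by
// context-managed memory, as the documentation above requires.
uint8_t* SumInit(MockFunctionContext* ctx) {
  uint8_t* state = ctx->Allocate(sizeof(int64_t));
  std::memset(state, 0, sizeof(int64_t));
  return state;
}

// Update: mutate the intermediate value in place.
void SumUpdate(uint8_t* state, int64_t value) {
  int64_t current;
  std::memcpy(&current, state, sizeof(current));
  current += value;
  std::memcpy(state, &current, sizeof(current));
}

// Finalize: produce the result and free the intermediate allocation,
// mirroring the cleanup responsibility assigned to serialize/finalize.
int64_t SumFinalize(MockFunctionContext* ctx, uint8_t* state) {
  int64_t result;
  std::memcpy(&result, state, sizeof(result));
  ctx->Free(state);
  return result;
}
```

After finalize runs, no context allocations remain live; forgetting the Free() here is exactly the leak the documentation warns about.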
[impala] branch master updated (f4f7fb5 -> dcf4979)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from f4f7fb5 IMPALA-9729: consistent GetExecSummary() behaviour new 4846751 IMPALA-9570: [DOCS] add memory management new e8d1794 IMPALA-9716: Add jitter to the exponential backoff in status reporting new dcf4979 IMPALA-9736: fix mt_dop not supported error The 3 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: be/src/runtime/query-state.cc | 10 +- docs/impala_keydefs.ditamap| 4 ++-- docs/topics/impala_udf.xml | 8 fe/src/main/java/org/apache/impala/planner/Planner.java| 3 +-- .../queries/PlannerTest/mt-dop-validation.test | 4 ++-- 5 files changed, 22 insertions(+), 7 deletions(-)
[impala] branch master updated: Revert "IMPALA-9718: Delete pkg_resources from IMPALA_HOME/shell/"
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 0a0001e Revert "IMPALA-9718: Delete pkg_resources from IMPALA_HOME/shell/" 0a0001e is described below commit 0a0001e1a85462c81c9c4617a2e864c98913f229 Author: Joe McDonnell AuthorDate: Thu May 7 10:03:45 2020 -0700 Revert "IMPALA-9718: Delete pkg_resources from IMPALA_HOME/shell/" The fix for IMPALA-9718 introduced test failures on Centos 7. See IMPALA-9735. This reverts commit 75d98b4b081df95b58d7388da39bb1ec7c2f4f67. Change-Id: Id09c55435f432a8626a45079f58860d6e27ac55e Reviewed-on: http://gerrit.cloudera.org:8080/15881 Reviewed-by: Tim Armstrong Tested-by: Joe McDonnell --- LICENSE.txt |1 + shell/make_shell_tarball.sh |1 + shell/pkg_resources.py | 2700 +++ 3 files changed, 2702 insertions(+) diff --git a/LICENSE.txt b/LICENSE.txt index b4cedd9..c76c157 100644 --- a/LICENSE.txt +++ b/LICENSE.txt @@ -396,6 +396,7 @@ www/DataTables* and www/datatables*: MIT license +shell/pkg_resources.py: Python Software License V2 Parts of be/src/runtime/string-search.h: Python Software License V2 Parts of shell/impala_shell.py: Python Software License V2 shell/ext-py/bitarray*: Python Software License V2 diff --git a/shell/make_shell_tarball.sh b/shell/make_shell_tarball.sh index 6626e15..d5c0c2c 100755 --- a/shell/make_shell_tarball.sh +++ b/shell/make_shell_tarball.sh @@ -128,6 +128,7 @@ cp ${SHELL_HOME}/TSSLSocketWithWildcardSAN.py ${TARBALL_ROOT}/lib cp ${SHELL_HOME}/ImpalaHttpClient.py ${TARBALL_ROOT}/lib cp ${SHELL_HOME}/shell_exceptions.py ${TARBALL_ROOT}/lib cp ${SHELL_HOME}/shell_output.py ${TARBALL_ROOT}/lib +cp ${SHELL_HOME}/pkg_resources.py ${TARBALL_ROOT}/lib cp ${SHELL_HOME}/impala-shell ${TARBALL_ROOT} cp ${SHELL_HOME}/impala_shell.py ${TARBALL_ROOT} cp ${SHELL_HOME}/compatibility.py ${TARBALL_ROOT} diff --git 
a/shell/pkg_resources.py b/shell/pkg_resources.py new file mode 100644 index 000..70ecc44 --- /dev/null +++ b/shell/pkg_resources.py @@ -0,0 +1,2700 @@ +from __future__ import print_function, unicode_literals + +""" + This file is redistributed under the Python Software Foundation License: + http://docs.python.org/2/license.html +""" + +"""Package resource API + + +A resource is a logical file contained within a package, or a logical +subdirectory thereof. The package resource API expects resource names +to have their path parts separated with ``/``, *not* whatever the local +path separator is. Do not use os.path operations to manipulate resource +names being passed into the API. + +The package resource API is designed to work with normal filesystem packages, +.egg files, and unpacked .egg files. It can also work in a limited way with +.zip files and with custom PEP 302 loaders that support the ``get_data()`` +method. +""" + +import sys, os, zipimport, time, re, imp, types +from urlparse import urlparse, urlunparse + +try: +frozenset +except NameError: +from sets import ImmutableSet as frozenset + +# capture these to bypass sandboxing +from os import utime +try: +from os import mkdir, rename, unlink +WRITE_SUPPORT = True +except ImportError: +# no write support, probably under GAE +WRITE_SUPPORT = False + +from os import open as os_open +from os.path import isdir, split + +# This marker is used to simplify the process that checks is the +# setuptools package was installed by the Setuptools project +# or by the Distribute project, in case Setuptools creates +# a distribution with the same version. 
+# +# The bootstrapping script for instance, will check if this +# attribute is present to decide wether to reinstall the package +_distribute = True + +def _bypass_ensure_directory(name, mode=0777): +# Sandbox-bypassing version of ensure_directory() +if not WRITE_SUPPORT: +raise IOError('"os.mkdir" not supported on this platform.') +dirname, filename = split(name) +if dirname and filename and not isdir(dirname): +_bypass_ensure_directory(dirname) +mkdir(dirname, mode) + + + + + + + + +def get_supported_platform(): +"""Return this platform's maximum compatible version. + +distutils.util.get_platform() normally reports the minimum version +of Mac OS X that would be required to *use* extensions produced by +distutils. But what we want when checking compatibility is to know the +version of Mac OS X that we are *running*. To allow usage of packages that +explicitly require a newer version of Mac OS X, we must also know
[impala] branch master updated: IMPALA-9731: Remove USE_CDP_HIVE=false and Hive 2 support
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new f241fd0 IMPALA-9731: Remove USE_CDP_HIVE=false and Hive 2 support f241fd0 is described below commit f241fd08ac97a9c20a3c97a86f45b9ba5e7ec2fb Author: Joe McDonnell AuthorDate: Wed May 6 18:25:13 2020 -0700 IMPALA-9731: Remove USE_CDP_HIVE=false and Hive 2 support Impala 4 moved to using CDP versions for components, which involves adopting Hive 3. This removes the old code supporting CDH components and Hive 2. Specifically, it does the following: 1. Remove USE_CDP_HIVE and default to the values from USE_CDP_HIVE=true. USE_CDP_HIVE now has no effect on the Impala environment. This also means that bin/jenkins/build-all-flag-combinations.sh no longer includes USE_CDP_HIVE=false as a configuration. 2. Remove USE_CDH_KUDU and default to getting Kudu from the native toolchain. 3. Ban IMPALA_HIVE_MAJOR_VERSION<3 and remove related code, including the IMPALA_HIVE_MAJOR_VERSION=2 maven profile in fe/pom.xml. There is a fair amount of code that still references the Hive major version. Upstream Hive is now working on Hive 4, so there is a high likelihood that we'll need some code to deal with that transition. This leaves some code (such as maven profiles) and test logic in place.
Change-Id: Id85e849beaf4e19dda4092874185462abd2ec608 Reviewed-on: http://gerrit.cloudera.org:8080/15869 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- README-build.md| 10 +- bin/bootstrap_toolchain.py | 100 +--- bin/impala-config.sh | 135 ++--- bin/jenkins/build-all-flag-combinations.sh | 11 +- fe/pom.xml | 239 - .../hadoop/hive/common/ValidWriteIdList.java | 74 --- .../org/apache/impala/compat/MetastoreShim.java| 556 - testdata/bin/create-load-data.sh | 5 - testdata/bin/run-hive-server.sh| 30 +- testdata/cluster/admin | 15 +- .../common/etc/hadoop/conf/core-site.xml.py| 14 +- 11 files changed, 81 insertions(+), 1108 deletions(-) diff --git a/README-build.md b/README-build.md index 1297b86..c604716 100644 --- a/README-build.md +++ b/README-build.md @@ -29,7 +29,7 @@ can do so through the environment variables and scripts listed below. | SKIP_TOOLCHAIN_BOOTSTRAP | "false" | Skips downloading the toolchain any python dependencies if "true" | | CDH_BUILD_NUMBER | | Identifier to indicate the CDH build number | CDH_COMPONENTS_HOME | "${IMPALA_HOME}/toolchain/cdh_components-${CDH_BUILD_NUMBER}" | Location of the CDH components within the toolchain. | -| CDH_MAJOR_VERSION | "5" | Identifier used to uniqueify paths for potentially incompatible component builds. | +| CDH_MAJOR_VERSION | "7" | Identifier used to uniqueify paths for potentially incompatible component builds. | | IMPALA_CONFIG_SOURCED | "1" | Set by ${IMPALA_HOME}/bin/impala-config.sh (internal use) | | JAVA_HOME | "/usr/lib/jvm/${JAVA_VERSION}" | Used to locate Java | | JAVA_VERSION | "java-7-oracle-amd64" | Can override to set a local Java version. | @@ -59,11 +59,11 @@ can do so through the environment variables and scripts listed below. 
## Dependencies | Environment variable | Default value | Description | |--|---|-| -| HADOOP_HOME | "${CDH_COMPONENTS_HOME}/hadoop-${IMPALA_HADOOP_VERSION}/" | Used to locate Hadoop | +| HADOOP_HOME | "${CDP_COMPONENTS_HOME}/hadoop-${IMPALA_HADOOP_VERSION}/" | Used to locate Hadoop | | HADOOP_INCLUDE_DIR | "${HADOOP_HOME}/include" | For 'hdfs.h' | | HADOOP_LIB_DIR | "${HADOOP_HOME}/lib" | For 'libhdfs.a' or 'libhdfs.so' | -| HIVE_HOME| "${CDH_COMPONENTS_HOME}/{hive-${IMPALA_HIVE_VERSION}/" | | -| HBASE_HOME | "${CDH_COMPONENTS_HOME}/hbase-${IMPALA_HBASE_VERSION}/" | | -| SENTRY_HOME | "${CDH_COMPONENTS_HOME}/sentry-${IMPALA_SENTRY_VERSION}/" | Used to setup test data | +| HIVE_HOME| "${CDP_COMPONENTS_HOME}/{hive-${IMPALA_HIVE_VERSION}/" | | +| HBASE_HOME | "${CDP_COMPONENTS_HOME}/hbase-${IMPALA_HBASE_VERSION}/" | | +| SENTRY_HOME | "${CDP_COMPONENTS_HOME}/sentry-${IMPALA_SENTRY_VERSION}/" | Used to setup test data | | THRIFT_HOME | "${IMPALA_TOOLCHAIN}/thrift-${IMPALA_THRIFT_VERSION}" | | diff --git a/bin/bootstrap_too
[impala] branch master updated: IMPALA-9639: [DOCS] Document Impala support for Kudu DATE type
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git The following commit(s) were added to refs/heads/master by this push: new 39c5c4d IMPALA-9639: [DOCS] Document Impala support for Kudu DATE type 39c5c4d is described below commit 39c5c4d01db7ad60d21d8df6b681738a3f3b09b1 Author: Kris Hahn AuthorDate: Thu Apr 9 16:51:15 2020 -0700 IMPALA-9639: [DOCS] Document Impala support for Kudu DATE type Documented read/write support for DATE type in 3.4. Made review changes. Change-Id: I865599587817358b0c94debfcb0e9644fab4ae00 Reviewed-on: http://gerrit.cloudera.org:8080/15702 Tested-by: Impala Public Jenkins Reviewed-by: Tamas Mate Reviewed-by: Thomas Tauber-Marshall --- docs/topics/impala_date.xml | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/topics/impala_date.xml b/docs/topics/impala_date.xml index 556936c..8fa4561 100644 --- a/docs/topics/impala_date.xml +++ b/docs/topics/impala_date.xml @@ -42,8 +42,8 @@ under the License. Use the DATE data type to store date values. The -DATE type is supported for HBase, Text, Avro, and - Parquet. +DATE type is supported for Avro, HBase, Kudu, Parquet, + and Text. Range: @@ -199,6 +199,8 @@ under the License. The DATE type is available in Impala 3.3 and higher. + +In Impala 3.4, you can read and write DATE values to Kudu tables.
[impala] 01/02: IMPALA-9539: Enable CNF rewrites by default
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit d0325b2ac176d536fc9c8959b1f2d01335bc32a2 Author: Aman Sinha AuthorDate: Fri Apr 24 17:32:38 2020 -0700 IMPALA-9539: Enable CNF rewrites by default This patch enables the conjunctive normal form rewrites by default by setting enable_cnf_rewrites to true. Because the CNF rule explicitly analyzes the predicate if it was not previously analyzed, we were previously returning the analyzed predicate even when no rewrite was done. This causes some side effects, so I have fixed it by returning the original un-analyzed predicate when no rewrite is done. Other functional and performance testing with this flag set to true did not uncover major regressions and showed significant performance gains for queries with disjunctions in the tpch and tpcds suites. Testing: - Updated the PlannerTest tests with plan changes in various test suites. Removed previously added tpch tests which were explicitly setting this flag to true. - I had previously added a test in convert-to-cnf.test with enable_cnf_rewrites=false, so I did not add any new tests with this flag disabled.
Change-Id: I4dde86e092c61d71ddf9081f768072ced470b589 Reviewed-on: http://gerrit.cloudera.org:8080/15807 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- common/thrift/ImpalaInternalService.thrift | 2 +- .../apache/impala/rewrite/ConvertToCNFRule.java| 6 +- .../org/apache/impala/planner/PlannerTest.java | 6 +- .../queries/PlannerTest/constant-folding.test | 23 +- .../queries/PlannerTest/tpcds-all.test | 4 +- .../queries/PlannerTest/tpch-all.test | 376 ++--- .../queries/PlannerTest/tpch-kudu.test | 70 ++-- .../queries/PlannerTest/tpch-nested.test | 220 ++-- .../queries/PlannerTest/tpch-views.test| 222 ++-- 9 files changed, 386 insertions(+), 543 deletions(-) diff --git a/common/thrift/ImpalaInternalService.thrift b/common/thrift/ImpalaInternalService.thrift index 1ca85b4..d447d69 100644 --- a/common/thrift/ImpalaInternalService.thrift +++ b/common/thrift/ImpalaInternalService.thrift @@ -412,7 +412,7 @@ struct TQueryOptions { 99: optional i64 preagg_bytes_limit = -1; // See comment in ImpalaService.thrift - 100: optional bool enable_cnf_rewrites = false; + 100: optional bool enable_cnf_rewrites = true; // See comment in ImpalaService.thrift 101: optional i32 max_cnf_exprs = 0; diff --git a/fe/src/main/java/org/apache/impala/rewrite/ConvertToCNFRule.java b/fe/src/main/java/org/apache/impala/rewrite/ConvertToCNFRule.java index 9b95f1a..0925fd5 100644 --- a/fe/src/main/java/org/apache/impala/rewrite/ConvertToCNFRule.java +++ b/fe/src/main/java/org/apache/impala/rewrite/ConvertToCNFRule.java @@ -110,11 +110,15 @@ public class ConvertToCNFRule implements ExprRewriteRule { // we can skip the rewrite since the disjunct can be pushed down as-is List tids = new ArrayList<>(); if (!cpred.isAnalyzed()) { + // clone before analyzing to avoid side effects of analysis + cpred = (CompoundPredicate) (cpred.clone()); cpred.analyzeNoThrow(analyzer); } cpred.getIds(tids, null); if (tids.size() <= 1) { - return cpred; + // if no transform is done, return the 
original predicate, + // not the one that that may have been analyzed above + return pred; } } if (cpred.getOp() == CompoundPredicate.Operator.OR) { diff --git a/fe/src/test/java/org/apache/impala/planner/PlannerTest.java b/fe/src/test/java/org/apache/impala/planner/PlannerTest.java index c452332..d0cbb3f 100644 --- a/fe/src/test/java/org/apache/impala/planner/PlannerTest.java +++ b/fe/src/test/java/org/apache/impala/planner/PlannerTest.java @@ -129,7 +129,7 @@ public class PlannerTest extends PlannerTestBase { } @Test - public void testConstantPropagataion() { + public void testConstantPropagation() { runPlannerTestFile("constant-propagation"); } @@ -1016,8 +1016,6 @@ public class PlannerTest extends PlannerTestBase { */ @Test public void testConvertToCNF() { -TQueryOptions options = new TQueryOptions(); -options.setEnable_cnf_rewrites(true); -runPlannerTestFile("convert-to-cnf", "tpch_parquet", options); +runPlannerTestFile("convert-to-cnf", "tpch_parquet"); } } diff --git a/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test b/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test index 0c9..6eeb8c0 100644 --- a/testdat
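The clone-before-analyze pattern in the ConvertToCNFRule fix above generalizes: analysis mutates the expression, so a rule that must analyze a predicate just to decide whether to rewrite should analyze a copy and hand back the caller's untouched original when no transform applies. A minimal sketch, in C++ rather than the planner's Java, with hypothetical names (Expr, Analyze, MaybeRewrite are not Impala's API); analysis is reduced to setting a flag:

```cpp
#include <cassert>

// Hypothetical expression node. Analysis is a mutation: it sets analyzed.
struct Expr {
  bool analyzed = false;
  int table_refs = 1;  // stand-in for the number of referenced table ids
};

void Analyze(Expr* e) { e->analyzed = true; }

// Rewrite rule skeleton. Returns `pred` itself when no rewrite applies;
// crucially, `pred` is never mutated, only the local clone is analyzed.
const Expr* MaybeRewrite(const Expr* pred) {
  Expr copy = *pred;  // clone before analyzing to avoid side effects
  if (!copy.analyzed) Analyze(&copy);
  if (copy.table_refs <= 1) {
    // No transform needed: return the original, still-unanalyzed predicate,
    // not the clone that was analyzed above.
    return pred;
  }
  // (A real rule would build and return the rewritten expression here; this
  // sketch only demonstrates the no-rewrite path.)
  return pred;
}
```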
[impala] branch master updated (1a36a03 -> 53ff6f9)
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/impala.git. from 1a36a03 IMPALA-9398: Fix shell history duplication when cmdloop breaks new d0325b2 IMPALA-9539: Enable CNF rewrites by default new 53ff6f9 IMPALA-9649: Exclude shiro* and add to banned dependency maven plugin The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: common/thrift/ImpalaInternalService.thrift | 2 +- fe/pom.xml | 50 +- .../apache/impala/rewrite/ConvertToCNFRule.java| 6 +- .../impala/authorization/AuthorizationTest.java| 111 .../org/apache/impala/planner/PlannerTest.java | 6 +- .../TestSentryResourceAuthorizationProvider.java | 36 -- .../queries/PlannerTest/constant-folding.test | 23 +- .../queries/PlannerTest/tpcds-all.test | 4 +- .../queries/PlannerTest/tpch-all.test | 376 -- .../queries/PlannerTest/tpch-kudu.test | 70 +-- .../queries/PlannerTest/tpch-nested.test | 220 .../queries/PlannerTest/tpch-views.test| 222 tests/authorization/test_authorization.py | 29 -- tests/authorization/test_grant_revoke.py | 47 -- tests/authorization/test_owner_privileges.py | 571 - tests/authorization/test_sentry.py | 53 -- tests/authorization/test_show_grant.py | 150 -- 17 files changed, 429 insertions(+), 1547 deletions(-) delete mode 100644 fe/src/test/java/org/apache/impala/testutil/TestSentryResourceAuthorizationProvider.java delete mode 100644 tests/authorization/test_owner_privileges.py delete mode 100644 tests/authorization/test_show_grant.py
[impala] 02/02: IMPALA-9649: Exclude shiro* and add to banned dependency maven plugin
This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git commit 53ff6f9bf5ca907f15ec1187eb5d4007d46eb61e Author: David Knupp AuthorDate: Thu Apr 23 17:06:42 2020 -0700 IMPALA-9649: Exclude shiro* and add to banned dependency maven plugin The earlier attempt to exclude the shiro-core and shiro-crypto-cipher jars from fe/pom.xml failed to find all instances, and security scans picked them up again. This patch also excludes the jar from the following: - sentry-core-common - sentry-provider-cache - sentry-provider-db - sentry-provider-file Furthermore, to avoid compilation errors related to the absence of shiro, it was necessary to remove the TestSentryResourceAuthorizationProvider class, and any tests that referenced it. Since Sentry is not being used any longer, this shouldn't be an issue. Tested by running build, which didn't fail from banned dependency plugin, as well as running the standard set of tests on jenkins.impala.io. Change-Id: I9f9994bf81c1d2e025a03925e8eccb147c34d66e Reviewed-on: http://gerrit.cloudera.org:8080/15796 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins --- fe/pom.xml | 50 +- .../impala/authorization/AuthorizationTest.java| 111 .../TestSentryResourceAuthorizationProvider.java | 36 -- tests/authorization/test_authorization.py | 29 -- tests/authorization/test_grant_revoke.py | 47 -- tests/authorization/test_owner_privileges.py | 571 - tests/authorization/test_sentry.py | 53 -- tests/authorization/test_show_grant.py | 150 -- 8 files changed, 43 insertions(+), 1004 deletions(-) diff --git a/fe/pom.xml b/fe/pom.xml index 61e26a2..94c0cf8 100644 --- a/fe/pom.xml +++ b/fe/pom.xml @@ -192,10 +192,17 @@ under the License. + org.apache.impala + yarn-extras + ${yarn-extras.version} + + + org.apache.sentry sentry-core-common ${sentry.version} + org.apache.shiro shiro-crypto-cipher @@ -208,12 +215,6 @@ under the License.
- org.apache.impala - yarn-extras - ${yarn-extras.version} - - - org.apache.sentry sentry-core-model-db ${sentry.version} @@ -237,6 +238,16 @@ under the License. sentry-provider-db ${sentry.version} + + + org.apache.shiro + shiro-crypto-cipher + + + org.apache.shiro + shiro-core + + net.minidev @@ -269,6 +280,17 @@ under the License. org.apache.sentry sentry-provider-file ${sentry.version} + + + + org.apache.shiro + shiro-crypto-cipher + + + org.apache.shiro + shiro-core + + @@ -276,6 +298,16 @@ under the License. sentry-provider-cache ${sentry.version} + + + org.apache.shiro + shiro-crypto-cipher + + + org.apache.shiro + shiro-core + + net.minidev @@ -349,7 +381,7 @@ under the License. org.apache.hadoop hadoop-common - + org.apache.hive * @@ -758,6 +790,9 @@ under the License. org.fusesource.leveldbjni:* org.apache.httpcomponents:fluent-hc + +org.apache.shiro:shiro-core:* +org.apache.shiro:shiro-crypto-cipher:* org.apache.hadoop:* @@ -1380,6 +1415,7 @@ under the License. javax.el 3.0.1-b08 + diff --git a/fe/src/test/java/org/apache/impala/authorization/AuthorizationTest.java b/fe/src/test/java/org/apache/impala/authorization/AuthorizationTest.java index 620246f..fda304d 100644 --- a/fe/src/test/java/org/apache/impala/authorization/AuthorizationTest.java +++ b/fe/src/test/java/org/apache/impala/authorization/AuthorizationTest.java @@ -46,7 +46,6 @@ import org.apache.impala.common.ImpalaException; import org.apache.impala.common.InternalException; import org.apache.impala.common.RuntimeEnv; import org.apache.impala.testutil.TestSentryGroupMapper; -import org.apache.impala.testutil.TestSentryResourceAuthorizationProvider; import org.apache.impala.service.Frontend; import org.apache.impala.testutil.ImpaladTestCatalog; import org.apache.impala.thrift.TMetadataOpRequest; @@ -519,79 +518,6 @@ public class AuthorizationTest extends FrontendTestBase { } @Test - public void
[impala] 02/02: IMPALA-9701: fix data race in BTS
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit f4258b5f971f90390b93aa7a2e76dd0b8a1d8825
Author: Tim Armstrong
AuthorDate: Mon Apr 27 16:52:40 2020 -0700

    IMPALA-9701: fix data race in BTS

    A benign data race in BufferedTupleStream was flagged by TSAN.

    Testing:
    Reran the unit test under TSAN; it succeeded.

    Change-Id: Ie2c4464adbc51bb8b0214ba0adbfa71217b87c86
    Reviewed-on: http://gerrit.cloudera.org:8080/15826
    Reviewed-by: Impala Public Jenkins
    Tested-by: Impala Public Jenkins
---
 be/src/runtime/buffered-tuple-stream.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/be/src/runtime/buffered-tuple-stream.cc b/be/src/runtime/buffered-tuple-stream.cc
index a35a89e..1ca8a92 100644
--- a/be/src/runtime/buffered-tuple-stream.cc
+++ b/be/src/runtime/buffered-tuple-stream.cc
@@ -1079,7 +1079,9 @@ void BufferedTupleStream::ReadIterator::Init(bool attach_on_read) {
   valid_ = true;
   rows_returned_ = 0;
   DCHECK(!attach_on_read_) << "attach_on_read can only be set once";
-  attach_on_read_ = attach_on_read;
+  // Only set 'attach_on_read' if needed. Otherwise, if this is the builtin
+  // iterator, a benign data race may be flagged by TSAN (see IMPALA-9701).
+  if (attach_on_read) attach_on_read_ = attach_on_read;
 }

 void BufferedTupleStream::ReadIterator::SetReadPage(list<Page>::iterator read_page) {
[impala] branch master updated (afe765e -> f4258b5)
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.

    from afe765e  Don't filter maven messages about banned dependencies
     new 75a6d7b  IMPALA-9097: Don't require minicluster for backend tests
     new f4258b5  IMPALA-9701: fix data race in BTS

The 2 revisions listed above as "new" are entirely new to this repository and
will be described in separate emails. The revisions listed as "add" were
already present in the repository and have only been added to this reference.

Summary of changes:
 be/src/runtime/buffered-tuple-stream.cc            |  4 +++-
 be/src/service/frontend.cc                         |  8 ++--
 .../java/org/apache/impala/service/Frontend.java   | 23 ++
 .../org/apache/impala/service/JniFrontend.java     |  5 +++--
 4 files changed, 27 insertions(+), 13 deletions(-)
[impala] 01/02: IMPALA-9097: Don't require minicluster for backend tests
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 75a6d7b2bba66825efb3a37f14c9447e64ea584f
Author: Joe McDonnell
AuthorDate: Fri Nov 22 17:58:16 2019 -0800

    IMPALA-9097: Don't require minicluster for backend tests

    Currently, many backend tests require a running minicluster, because
    they initialize a Frontend object that requires a connection to the
    Hive Metastore. If the minicluster is not running or if cluster
    configurations are missing (i.e. bin/create-test-configurations.sh
    needs to run), the backend tests will fail. The docker-based tests
    always hit this, because they run the backend tests without a
    minicluster.

    The HMS dependency comes from the Frontend's MetaStoreClientPool,
    which is unnecessary for backend tests. This modifies the code so
    that it does not initialize this for backend tests, and thus backend
    tests pass without a running minicluster.

    Testing:
    - Ran backend tests without a running minicluster

    Change-Id: I8f1b1385853fb23df28d24d38761237e6e5c97a7
    Reviewed-on: http://gerrit.cloudera.org:8080/15641
    Reviewed-by: Impala Public Jenkins
    Tested-by: Impala Public Jenkins
---
 be/src/service/frontend.cc                         |  8 ++--
 .../java/org/apache/impala/service/Frontend.java   | 23 ++
 .../org/apache/impala/service/JniFrontend.java     |  5 +++--
 3 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/be/src/service/frontend.cc b/be/src/service/frontend.cc
index baf1089..346ff0c 100644
--- a/be/src/service/frontend.cc
+++ b/be/src/service/frontend.cc
@@ -25,6 +25,7 @@
 #include "rpc/jni-thrift-util.h"
 #include "util/backend-gflag-util.h"
 #include "util/jni-util.h"
+#include "util/test-info.h"
 #include "util/time.h"
 #include "common/names.h"
@@ -81,7 +82,7 @@ DEFINE_string(kudu_master_hosts, "", "Specifies the default Kudu master(s). The
 Frontend::Frontend() {
   JniMethodDescriptor methods[] = {
-    {"<init>", "([B)V", &fe_ctor_},
+    {"<init>", "([BZ)V", &fe_ctor_},
     {"createExecRequest", "([B)[B", &create_exec_request_id_},
     {"getExplainPlan", "([B)Ljava/lang/String;", &get_explain_plan_id_},
     {"getHadoopConfig", "([B)[B", &get_hadoop_config_id_},
@@ -130,7 +131,10 @@ Frontend::Frontend() {
   jbyteArray cfg_bytes;
   ABORT_IF_ERROR(GetThriftBackendGflags(jni_env, &cfg_bytes));
-  jobject fe = jni_env->NewObject(fe_class, fe_ctor_, cfg_bytes);
+  // Pass in whether this is a backend test, so that the Frontend can avoid certain
+  // unnecessary initialization that introduces dependencies on a running minicluster.
+  jboolean is_be_test = TestInfo::is_be_test();
+  jobject fe = jni_env->NewObject(fe_class, fe_ctor_, cfg_bytes, is_be_test);
   ABORT_IF_EXC(jni_env);
   ABORT_IF_ERROR(JniUtil::LocalToGlobalRef(jni_env, fe, &fe_));
 }
diff --git a/fe/src/main/java/org/apache/impala/service/Frontend.java b/fe/src/main/java/org/apache/impala/service/Frontend.java
index 1715233..d4ed406 100644
--- a/fe/src/main/java/org/apache/impala/service/Frontend.java
+++ b/fe/src/main/java/org/apache/impala/service/Frontend.java
@@ -281,8 +281,9 @@ public class Frontend {
   private static ExecutorService checkAuthorizationPool_;

-  public Frontend(AuthorizationFactory authzFactory) throws ImpalaException {
-    this(authzFactory, FeCatalogManager.createFromBackendConfig());
+  public Frontend(AuthorizationFactory authzFactory, boolean isBackendTest)
+      throws ImpalaException {
+    this(authzFactory, FeCatalogManager.createFromBackendConfig(), isBackendTest);
   }

   /**
@@ -292,11 +293,12 @@ public class Frontend {
   @VisibleForTesting
   public Frontend(AuthorizationFactory authzFactory, FeCatalog testCatalog)
       throws ImpalaException {
-    this(authzFactory, FeCatalogManager.createForTests(testCatalog));
+    // This signature is only used for frontend tests, so pass false for isBackendTest
+    this(authzFactory, FeCatalogManager.createForTests(testCatalog), false);
   }

-  private Frontend(AuthorizationFactory authzFactory, FeCatalogManager catalogManager)
-      throws ImpalaException {
+  private Frontend(AuthorizationFactory authzFactory, FeCatalogManager catalogManager,
+      boolean isBackendTest) throws ImpalaException {
     catalogManager_ = catalogManager;
     authzFactory_ = authzFactory;
@@ -323,10 +325,15 @@
     impaladTableUsageTracker_ = ImpaladTableUsageTracker.createFromConfig(
         BackendConfig.INSTANCE);
     queryHookManager_ = QueryEventHookManager.createFromConfig(BackendConfig.INSTANCE);
-    metaStoreClientPool_ = new MetaStoreClientPool(1, 0);
-    if (MetastoreShim.getMajor
[impala] 02/02: Don't filter maven messages about banned dependencies
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit afe765e3bdf8facb1940f4c7620eb7f9084bcb1f
Author: Joe McDonnell
AuthorDate: Mon Apr 27 12:01:41 2020 -0700

    Don't filter maven messages about banned dependencies

    The frontend build uses the maven-enforcer-plugin to ban some
    dependencies or require specific versions of dependencies. The
    messages look like:

    Found Banned Dependency: foo.bar.baz:1.2.3

    These are currently filtered by bin/mvn-quiet.sh. This adds an
    exception for "Found Banned" so they are not filtered.

    Testing:
    - Ran on a branch with a known banned dependency and verified the
      output

    Change-Id: I24abe59ad6bffb28ac63d014aa0ec7388ef5478f
    Reviewed-on: http://gerrit.cloudera.org:8080/15820
    Tested-by: Impala Public Jenkins
    Reviewed-by: David Knupp
---
 bin/mvn-quiet.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/bin/mvn-quiet.sh b/bin/mvn-quiet.sh
index cc673da..f782ff4 100755
--- a/bin/mvn-quiet.sh
+++ b/bin/mvn-quiet.sh
@@ -36,7 +36,8 @@ LOGGING_OPTIONS="-Dorg.slf4j.simpleLogger.showDateTime \

 # Always use maven's batch mode (-B), as it produces output that is easier to parse.
 if ! mvn -B $IMPALA_MAVEN_OPTIONS $LOGGING_OPTIONS "$@" | \
-  tee -a "$LOG_FILE" | grep -E -e WARNING -e ERROR -e SUCCESS -e FAILURE -e Test; then
+  tee -a "$LOG_FILE" | \
+  grep -E -e WARNING -e ERROR -e SUCCESS -e FAILURE -e Test -e "Found Banned"; then
   echo "mvn $IMPALA_MAVEN_OPTIONS $@ exited with code $?"
   exit 1
 fi
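The effect of the extra `-e "Found Banned"` pattern can be seen with a self-contained pipe; the sample log lines below are made up for illustration, but the grep invocation is the one used by bin/mvn-quiet.sh after this change:

```shell
# Simulate a few lines of mvn output and run them through the same filter
# that bin/mvn-quiet.sh applies. Only lines matching one of the -e patterns
# survive; the enforcer's "Found Banned" line now makes it through.
printf '%s\n' \
  '[INFO] Building Impala frontend' \
  'Found Banned Dependency: org.apache.shiro:shiro-core:jar:1.4.0' \
  '[ERROR] Rule 0: BannedDependencies failed' |
  grep -E -e WARNING -e ERROR -e SUCCESS -e FAILURE -e Test -e "Found Banned"
# keeps only the "Found Banned" and [ERROR] lines; the [INFO] line is dropped
```

Before the change, the "Found Banned" line matched none of the patterns, so a build failing on a banned dependency produced no hint of the cause in the filtered output.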
[impala] 01/02: IMPALA-9613: [DOCS] Document the data_cache_eviction_policy
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 961519747bf75bc2ba8519d00757b8e176d14538
Author: Kris Hahn
AuthorDate: Wed Apr 8 20:39:00 2020 -0700

    IMPALA-9613: [DOCS] Document the data_cache_eviction_policy

    Describe the start-up flag to set the LRU or LIRS policy. Tweak the
    LIRS description.

    Change-Id: Ic46ae00549157535c12f761aff7747fc90249d98
    Reviewed-on: http://gerrit.cloudera.org:8080/15694
    Tested-by: Impala Public Jenkins
    Reviewed-by: Joe McDonnell
---
 docs/topics/impala_data_cache.xml | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/docs/topics/impala_data_cache.xml b/docs/topics/impala_data_cache.xml
index fed4181..210b181 100644
--- a/docs/topics/impala_data_cache.xml
+++ b/docs/topics/impala_data_cache.xml
@@ -85,6 +85,17 @@ under the License.
 --data_cache=/data/0,/data/1:500GB
+ In Impala 3.4 and higher, you can configure one of the following cache
+ eviction policies for the data cache:
+LRU (Least Recently Used--the default)
+LIRS (Low Inter-reference Recency Set)
+ LIRS is a scan-resistant, low performance-overhead policy. You configure a cache
+ eviction policy using the --data_cache_eviction_policy Impala Daemon start-up
+ flag:
+
+--data_cache_eviction_policy=policy
+
+
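Putting the two flags from this doc change together, an impalad start-up line might look like the following; the cache paths and quota are illustrative, only the flag names and the LIRS policy value come from the documented change:

```
impalad --data_cache=/data/0,/data/1:500GB \
        --data_cache_eviction_policy=LIRS
```

With `--data_cache_eviction_policy` omitted, the cache uses the default LRU policy, so only deployments that want scan resistance need to set it.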
[impala] branch master updated (c4ac9d2 -> afe765e)
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.

    from c4ac9d2  Revert "IMPALA-9648: Exclude netty and netty-all from hadoop-hdfs mvn download"
     new 9615197  IMPALA-9613: [DOCS] Document the data_cache_eviction_policy
     new afe765e  Don't filter maven messages about banned dependencies

The 2 revisions listed above as "new" are entirely new to this repository and
will be described in separate emails. The revisions listed as "add" were
already present in the repository and have only been added to this reference.

Summary of changes:
 bin/mvn-quiet.sh                  |  3 ++-
 docs/topics/impala_data_cache.xml | 11 +++
 2 files changed, 13 insertions(+), 1 deletion(-)
[impala] 02/03: Revert "IMPALA-9648: Don't ban netty 3* from fe/pom.xml"
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 326ef554caef75c50a1394240766b501e5a699c3
Author: David Knupp
AuthorDate: Mon Apr 27 15:38:49 2020 -0700

    Revert "IMPALA-9648: Don't ban netty 3* from fe/pom.xml"

    This patch was leading to CI builds failing in some environments.

    This reverts commit f129a179a2c1b304e4d15fe4950449c5786abda1.

    Change-Id: I4f38cab4deb0d9457d50d1e1a899af4cb90d3c24
    Reviewed-on: http://gerrit.cloudera.org:8080/15824
    Reviewed-by: David Knupp
    Tested-by: David Knupp
---
 fe/pom.xml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fe/pom.xml b/fe/pom.xml
index 59d6eeb..70d17d6 100644
--- a/fe/pom.xml
+++ b/fe/pom.xml
@@ -758,6 +758,8 @@ under the License.
 org.fusesource.leveldbjni:*
 org.apache.httpcomponents:fluent-hc
+
+io.netty:netty:[3.10.6,)
 io.netty:netty-all:[4.1.46,)
[impala] 01/03: IMPALA-9640: [DOCS] Document Impala support for Kudu VARCHAR type
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit b9e84738b68710e00afb7efb3cf0c25c0582f7f8
Author: Kris Hahn
AuthorDate: Thu Apr 9 17:37:42 2020 -0700

    IMPALA-9640: [DOCS] Document Impala support for Kudu VARCHAR type

    Removed VARCHAR from unsupported types in "Kudu considerations".

    Change-Id: I61ad6982c35a009b15a2a082692f118a0fbcee65
    Reviewed-on: http://gerrit.cloudera.org:8080/15703
    Tested-by: Impala Public Jenkins
    Reviewed-by: Tamas Mate
    Reviewed-by: Joe McDonnell
---
 docs/shared/impala_common.xml | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml
index cdf25d8..6b0e812 100644
--- a/docs/shared/impala_common.xml
+++ b/docs/shared/impala_common.xml
@@ -4509,10 +4509,9 @@ sudo pip-python install ssl
 Currently, the INSERT OVERWRITE syntax cannot be used with Kudu tables.
-
-Currently, the data types CHAR, VARCHAR,
-ARRAY, MAP, and STRUCT cannot be used
-with Kudu tables.
+ Currently, the data types
+CHAR, ARRAY, MAP, and
+ STRUCT cannot be used with Kudu tables.