[Impala-ASF-CR] IMPALA-9838: Switch to GCC 7.5.0

2020-06-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16045 )

Change subject: IMPALA-9838: Switch to GCC 7.5.0
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6235/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4
Gerrit-Change-Number: 16045
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Mon, 08 Jun 2020 04:20:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9709: Remove Impala-lzo from the development environment

2020-06-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15814 )

Change subject: IMPALA-9709: Remove Impala-lzo from the development environment
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6234/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15814

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
Gerrit-Change-Number: 15814
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Mon, 08 Jun 2020 03:48:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9838: Switch to GCC 7.5.0

2020-06-07 Thread Joe McDonnell (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16045

to look at the new patch set (#4).

Change subject: IMPALA-9838: Switch to GCC 7.5.0
..

IMPALA-9838: Switch to GCC 7.5.0

This upgrades GCC and libstdc++ to version 7.5.0. There
have been ABI changes since 4.9.2, so this means that
the native-toolchain produced with the new compiler is
not interoperable with one produced by the old compiler.
To allow that transition, IMPALA_TOOLCHAIN_PACKAGES_HOME
is now a subdirectory of IMPALA_TOOLCHAIN
(toolchain-packages-gcc${IMPALA_GCC_VERSION}) to distinguish
it from the old packages.

Some Python packages in the impala-python virtualenv are
compiled using the toolchain GCC and now use the new ABI.
This leads to two changes:
1. When constructing the LD_LIBRARY_PATH for impala-python,
we include the GCC libstdc++ libraries. Otherwise, certain
Python packages that use C++ fail on older OSes like CentOS 7.
This fixes IMPALA-9804.
2. Since developers work on various branches, this changes
the virtualenv's directory location to a directory with
the GCC version in the name. This allows the virtualenv
built with GCC 7 to coexist with the current virtualenv
built with GCC 4.9.2. The location for the old virtualenv is
${IMPALA_HOME}/infra/python/env. The new location is
${IMPALA_HOME}/infra/python/env-gcc${IMPALA_GCC_VERSION}. This
required updating several impala-python scripts.
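
The two virtualenv changes above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code from the patch: the helper names virtualenv_dir and ld_library_path_with_gcc, and the gcc-<version>/lib64 layout under IMPALA_TOOLCHAIN_PACKAGES_HOME, are hypothetical. Only the env-gcc${IMPALA_GCC_VERSION} directory pattern and the idea of adding the GCC libstdc++ libraries to LD_LIBRARY_PATH come from the description.

```python
import os


def virtualenv_dir(impala_home, gcc_version):
    """Return the virtualenv directory, named after the GCC version."""
    # New location per the description:
    #   ${IMPALA_HOME}/infra/python/env-gcc${IMPALA_GCC_VERSION}
    return os.path.join(impala_home, "infra", "python", "env-gcc" + gcc_version)


def ld_library_path_with_gcc(packages_home, gcc_version, current=""):
    """Prepend the toolchain GCC library directory to an LD_LIBRARY_PATH value."""
    # Assumed (hypothetical) layout: <packages_home>/gcc-<version>/lib64
    gcc_lib = os.path.join(packages_home, "gcc-" + gcc_version, "lib64")
    # Avoid a trailing separator when there is no existing value.
    return gcc_lib + (os.pathsep + current if current else "")
```

Because the GCC version is part of the directory name, a checkout built with 4.9.2 and one built with 7.5.0 resolve to different virtualenv directories and do not overwrite each other.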

There are various odds-and-ends related to the transition:
1. Due to the small string optimization, the size of std::string
changed, which means that various data structures also changed
in size. This required updating some static asserts.
2. There is a bug in clang-tidy that reports a use-after-free
for some code using std::shared_ptr. Clang is not modeling
the shared_ptr correctly, so it is a false-positive. As a workaround,
this disables the clang-analyzer-cplusplus.NewDelete diagnostic.
3. Various small compilation fixes (includes, etc).

Performance testing:
 - Ran single-node performance tests on TPC-H for the following
   configurations:
     - TPC-H Parquet scale 30 with normal configurations
     - TPC-H Parquet scale 30 with codegen disabled
     - TPC-H Kudu scale 10
   None found any significant regressions. Full results are
   posted on the JIRA.
 - Ran single-node performance tests on targeted-perf scale 10.
   No significant regressions.
 - The size of binaries (impalad, etc) is about 1% smaller with the new GCC:
   GCC 4.9.2 release impalad binary: 545664
   GCC 7.5.0 release impalad binary: 539900
 - Compilation in DEBUG mode is roughly 15-25% faster

Functional testing:
 - Ran core jobs, exhaustive release jobs, UBSAN

Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4
---
M .clang-tidy
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/thread-resource-mgr.cc
M be/src/util/container-util.h
M bin/impala-config.sh
M bin/impala-flake8
M bin/impala-gcovr
M bin/impala-ipython
M bin/impala-pip
M bin/impala-py.test
M bin/impala-python
M bin/impala-python-common.sh
M bin/impala-shell.sh
M bin/set-pythonpath.sh
M infra/python/bootstrap_virtualenv.py
M tests/comparison/ORACLE.txt
17 files changed, 36 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/16045/4
--
To view, visit http://gerrit.cloudera.org:8080/16045

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4
Gerrit-Change-Number: 16045
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9838: Switch to GCC 7.5.0

2020-06-07 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16045 )

Change subject: IMPALA-9838: Switch to GCC 7.5.0
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16045/3/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/16045/3/bin/impala-config.sh@217
PS3, Line 217: export 
IMPALA_TOOLCHAIN_PACKAGES_HOME=${IMPALA_TOOLCHAIN}/toolchain-packages-gcc${IMPALA_GCC_VERSION}
> line too long (101 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16045/3/infra/python/bootstrap_virtualenv.py
File infra/python/bootstrap_virtualenv.py:

http://gerrit.cloudera.org:8080/#/c/16045/3/infra/python/bootstrap_virtualenv.py@398
PS3, Line 398:
> flake8: E201 whitespace after '['
Done



--
To view, visit http://gerrit.cloudera.org:8080/16045

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4
Gerrit-Change-Number: 16045
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Mon, 08 Jun 2020 03:35:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9709: Remove Impala-lzo from the development environment

2020-06-07 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15814 )

Change subject: IMPALA-9709: Remove Impala-lzo from the development environment
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15814/4/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

http://gerrit.cloudera.org:8080/#/c/15814/4/tests/metadata/test_partition_metadata.py@215
PS4, Line 215: F
> flake8: E122 continuation line missing indentation or outdented
Done



--
To view, visit http://gerrit.cloudera.org:8080/15814

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
Gerrit-Change-Number: 15814
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Mon, 08 Jun 2020 03:35:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9709: Remove Impala-lzo from the development environment

2020-06-07 Thread Joe McDonnell (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15814

to look at the new patch set (#5).

Change subject: IMPALA-9709: Remove Impala-lzo from the development environment
..

IMPALA-9709: Remove Impala-lzo from the development environment

This removes Impala-lzo from the Impala development environment.
Impala-lzo is not built as part of the Impala build. LZO is no
longer loaded as a plugin. LZO tables are not loaded during dataload,
and LZO is no longer tested.

This removes some obsolete scan APIs that were only used by Impala-lzo.
With this commit, Impala-lzo would require code changes to build
against Impala.

The plugin infrastructure is not removed, and this leaves some
LZO support code in place. If someone were to decide to revive
Impala-lzo, they would still be able to load it as a plugin
and get the same functionality as before. This plugin support
may be removed later.

Testing:
 - Dryrun of GVO
 - Modified TestPartitionMetadataUncompressedTextOnly's
   test_unsupported_text_compression() to add LZO case

Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
---
M CMakeLists.txt
M be/src/exec/hdfs-plugin-text-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M bin/bootstrap_system.sh
M bin/clean.sh
M bin/impala-config.sh
M bin/set-ld-library-path.sh
M bin/start-impala-cluster.py
M buildall.sh
M docker/entrypoint.sh
M docker/impala_base/Dockerfile
M docker/test-with-docker.py
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
D testdata/bad_text_lzo/bad_text.lzo
D testdata/bad_text_lzo/bad_text.lzo.index
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/generate-test-vectors.py
M testdata/bin/load_nested.py
D testdata/bin/lzo_indexer.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
M testdata/cluster/node_templates/common/etc/hadoop/conf/yarn-site.xml.py
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/joins-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-query/functional-query_dimensions.csv
M testdata/workloads/functional-query/functional-query_exhaustive.csv
M testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-scan-node-errors.test
D testdata/workloads/functional-query/queries/QueryTest/disable-lzo-plugin.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M testdata/workloads/functional-query/queries/QueryTest/unsupported-compression-partitions.test
M testdata/workloads/perf-regression/perf-regression_dimensions.csv
M testdata/workloads/perf-regression/perf-regression_exhaustive.csv
M testdata/workloads/perf-regression/perf-regression_pairwise.csv
M testdata/workloads/targeted-perf/targeted-perf_dimensions.csv
M testdata/workloads/targeted-perf/targeted-perf_exhaustive.csv
M testdata/workloads/targeted-perf/targeted-perf_pairwise.csv
M testdata/workloads/targeted-stress/targeted-stress_dimensions.csv
M testdata/workloads/targeted-stress/targeted-stress_exhaustive.csv
M testdata/workloads/targeted-stress/targeted-stress_pairwise.csv
M testdata/workloads/tpcds-unmodified/tpcds-unmodified_dimensions.csv
M testdata/workloads/tpcds-unmodified/tpcds-unmodified_exhaustive.csv
M testdata/workloads/tpcds-unmodified/tpcds-unmodified_pairwise.csv
M testdata/workloads/tpcds/tpcds_dimensions.csv
M testdata/workloads/tpcds/tpcds_exhaustive.csv
M testdata/workloads/tpcds/tpcds_pairwise.csv
M testdata/workloads/tpch/tpch_dimensions.csv
M testdata/workloads/tpch/tpch_exhaustive.csv
M testdata/workloads/tpch/tpch_pairwise.csv
M tests/common/test_dimensions.py
M tests/custom_cluster/test_hive_text_codec_interop.py
D tests/custom_cluster/test_scanner_plugin.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/query_test/test_compressed_formats.py
M tests/query_test/test_scanners_fuzz.py
62 files changed, 85 insertions(+), 333 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/15814/5
--
To view, visit http://gerrit.cloudera.org:8080/15814

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
Gerrit-Change-Number: 15814
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9838: Switch to GCC 7.5.0

2020-06-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16045 )

Change subject: IMPALA-9838: Switch to GCC 7.5.0
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6233/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16045

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4
Gerrit-Change-Number: 16045
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 08 Jun 2020 03:33:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9709: Remove Impala-lzo from the development environment

2020-06-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15814 )

Change subject: IMPALA-9709: Remove Impala-lzo from the development environment
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6232/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15814

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
Gerrit-Change-Number: 15814
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 08 Jun 2020 03:32:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9838: Switch to GCC 7.5.0

2020-06-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16045 )

Change subject: IMPALA-9838: Switch to GCC 7.5.0
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16045/3/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/16045/3/bin/impala-config.sh@217
PS3, Line 217: export 
IMPALA_TOOLCHAIN_PACKAGES_HOME=${IMPALA_TOOLCHAIN}/toolchain-packages-gcc${IMPALA_GCC_VERSION}
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/16045/3/infra/python/bootstrap_virtualenv.py
File infra/python/bootstrap_virtualenv.py:

http://gerrit.cloudera.org:8080/#/c/16045/3/infra/python/bootstrap_virtualenv.py@398
PS3, Line 398:
flake8: E201 whitespace after '['



--
To view, visit http://gerrit.cloudera.org:8080/16045

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4
Gerrit-Change-Number: 16045
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 08 Jun 2020 02:48:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9838: Switch to GCC 7.5.0

2020-06-07 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16045 )


Change subject: IMPALA-9838: Switch to GCC 7.5.0
..

IMPALA-9838: Switch to GCC 7.5.0

This upgrades GCC and libstdc++ to version 7.5.0. There
have been ABI changes since 4.9.2, so this means that
the native-toolchain produced with the new compiler is
not interoperable with one produced by the old compiler.
To allow that transition, IMPALA_TOOLCHAIN_PACKAGES_HOME
is now a subdirectory of IMPALA_TOOLCHAIN
(toolchain-packages-gcc${IMPALA_GCC_VERSION}) to distinguish
it from the old packages.

Some Python packages in the impala-python virtualenv are
compiled using the toolchain GCC and now use the new ABI.
This leads to two changes:
1. When constructing the LD_LIBRARY_PATH for impala-python,
we include the GCC libstdc++ libraries. Otherwise, certain
Python packages that use C++ fail on older OSes like CentOS 7.
This fixes IMPALA-9804.
2. Since developers work on various branches, this changes
the virtualenv's directory location to a directory with
the GCC version in the name. This allows the virtualenv
built with GCC 7 to coexist with the current virtualenv
built with GCC 4.9.2. The location for the old virtualenv is
${IMPALA_HOME}/infra/python/env. The new location is
${IMPALA_HOME}/infra/python/env-gcc${IMPALA_GCC_VERSION}. This
required updating several impala-python scripts.

There are various odds-and-ends related to the transition:
1. Due to the small string optimization, the size of std::string
changed, which means that various data structures also changed
in size. This required updating some static asserts.
2. There is a bug in clang-tidy that reports a use-after-free
for some code using std::shared_ptr. Clang is not modeling
the shared_ptr correctly, so it is a false-positive. As a workaround,
this disables the clang-analyzer-cplusplus.NewDelete diagnostic.
3. Various small compilation fixes (includes, etc).

Performance testing:
 - Ran single-node performance tests on TPC-H for the following
   configurations:
     - TPC-H Parquet scale 30 with normal configurations
     - TPC-H Parquet scale 30 with codegen disabled
     - TPC-H Kudu scale 10
   None found any significant regressions. Full results are
   posted on the JIRA.
 - Ran single-node performance tests on targeted-perf scale 10.
   No significant regressions.
 - The size of binaries (impalad, etc) is about 1% smaller with the new GCC:
   GCC 4.9.2 release impalad binary: 545664
   GCC 7.5.0 release impalad binary: 539900
 - Compilation in DEBUG mode is roughly 15-25% faster

Functional testing:
 - Ran core jobs, exhaustive release jobs, UBSAN

Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4
---
M .clang-tidy
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/thread-resource-mgr.cc
M be/src/util/container-util.h
M bin/impala-config.sh
M bin/impala-flake8
M bin/impala-gcovr
M bin/impala-ipython
M bin/impala-pip
M bin/impala-py.test
M bin/impala-python
M bin/impala-python-common.sh
M bin/impala-shell.sh
M bin/set-pythonpath.sh
M infra/python/bootstrap_virtualenv.py
M tests/comparison/ORACLE.txt
17 files changed, 35 insertions(+), 27 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/16045/3
--
To view, visit http://gerrit.cloudera.org:8080/16045

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia0beb2b618ba669c9699f8dbc0c52d1203d004e4
Gerrit-Change-Number: 16045
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9709: Remove Impala-lzo from the development environment

2020-06-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15814 )

Change subject: IMPALA-9709: Remove Impala-lzo from the development environment
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15814/4/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

http://gerrit.cloudera.org:8080/#/c/15814/4/tests/metadata/test_partition_metadata.py@215
PS4, Line 215: F
flake8: E122 continuation line missing indentation or outdented



--
To view, visit http://gerrit.cloudera.org:8080/15814

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
Gerrit-Change-Number: 15814
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 08 Jun 2020 02:48:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9709: Remove Impala-lzo from the development environment

2020-06-07 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15814 )


Change subject: IMPALA-9709: Remove Impala-lzo from the development environment
..

IMPALA-9709: Remove Impala-lzo from the development environment

This removes Impala-lzo from the Impala development environment.
Impala-lzo is not built as part of the Impala build. LZO is no
longer loaded as a plugin. LZO tables are not loaded during dataload,
and LZO is no longer tested.

This removes some obsolete scan APIs that were only used by Impala-lzo.
With this commit, Impala-lzo would require code changes to build
against Impala.

The plugin infrastructure is not removed, and this leaves some
LZO support code in place. If someone were to decide to revive
Impala-lzo, they would still be able to load it as a plugin
and get the same functionality as before. This plugin support
may be removed later.

Testing:
 - Dryrun of GVO
 - Modified TestPartitionMetadataUncompressedTextOnly's
   test_unsupported_text_compression() to add LZO case

Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
---
M CMakeLists.txt
M be/src/exec/hdfs-plugin-text-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M bin/bootstrap_system.sh
M bin/clean.sh
M bin/impala-config.sh
M bin/set-ld-library-path.sh
M bin/start-impala-cluster.py
M buildall.sh
M docker/entrypoint.sh
M docker/impala_base/Dockerfile
M docker/test-with-docker.py
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
D testdata/bad_text_lzo/bad_text.lzo
D testdata/bad_text_lzo/bad_text.lzo.index
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/generate-test-vectors.py
M testdata/bin/load_nested.py
D testdata/bin/lzo_indexer.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
M testdata/cluster/node_templates/common/etc/hadoop/conf/yarn-site.xml.py
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/joins-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-query/functional-query_dimensions.csv
M testdata/workloads/functional-query/functional-query_exhaustive.csv
M testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-scan-node-errors.test
D testdata/workloads/functional-query/queries/QueryTest/disable-lzo-plugin.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M testdata/workloads/functional-query/queries/QueryTest/unsupported-compression-partitions.test
M testdata/workloads/perf-regression/perf-regression_dimensions.csv
M testdata/workloads/perf-regression/perf-regression_exhaustive.csv
M testdata/workloads/perf-regression/perf-regression_pairwise.csv
M testdata/workloads/targeted-perf/targeted-perf_dimensions.csv
M testdata/workloads/targeted-perf/targeted-perf_exhaustive.csv
M testdata/workloads/targeted-perf/targeted-perf_pairwise.csv
M testdata/workloads/targeted-stress/targeted-stress_dimensions.csv
M testdata/workloads/targeted-stress/targeted-stress_exhaustive.csv
M testdata/workloads/targeted-stress/targeted-stress_pairwise.csv
M testdata/workloads/tpcds-unmodified/tpcds-unmodified_dimensions.csv
M testdata/workloads/tpcds-unmodified/tpcds-unmodified_exhaustive.csv
M testdata/workloads/tpcds-unmodified/tpcds-unmodified_pairwise.csv
M testdata/workloads/tpcds/tpcds_dimensions.csv
M testdata/workloads/tpcds/tpcds_exhaustive.csv
M testdata/workloads/tpcds/tpcds_pairwise.csv
M testdata/workloads/tpch/tpch_dimensions.csv
M testdata/workloads/tpch/tpch_exhaustive.csv
M testdata/workloads/tpch/tpch_pairwise.csv
M tests/common/test_dimensions.py
M tests/custom_cluster/test_hive_text_codec_interop.py
D tests/custom_cluster/test_scanner_plugin.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/query_test/test_compressed_formats.py
M tests/query_test/test_scanners_fuzz.py
62 files changed, 85 insertions(+), 333 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/15814/4
--
To view, visit http://gerrit.cloudera.org:8080/15814

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3a4f12247d8872b7e14c9feb4b2c58cfd60d4c0e
Gerrit-Change-Number: 15814
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6231/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15963

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 08 Jun 2020 02:27:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-07 Thread Riza Suminto (Code Review)
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15963

to look at the new patch set (#11).

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..

IMPALA-6692: Trigger sort node run before hitting memory limit.

The sorter node works by adding row batches to a sort run. Once all
batches have been added to the current unsorted run, or the memory
limit is hit, the sorter immediately starts sorting the run. In the
latter case, the sorter spills the sorted run to disk after the sort
completes, creates a new unsorted run object, and continues adding the
next row batches, and so on.

This algorithm tries to fit as many rows as possible into memory
before it starts sorting. However, in the case of a partitioned sort
with a large number of row batches, fitting too many rows into memory
makes the sort slow and blocks the sorter node for a long time before
it can release some memory and continue accepting the next row batch
from the exchange node. One slow sorter node can block the exchange
node from sending row batches to other sorter nodes that are free.

This patch speeds up the decision to start the sort, without waiting
for the memory limit to be hit first, by capping the intermediate
quicksort run at a lower memory limit, determined by the query option
'sort_run_bytes_limit'. If the total used reservation of the quicksort
has exceeded sort_run_bytes_limit, the current unsorted_run_ is wrapped
up, sorted, and then spilled. The next sort run thus overlaps with the
spill of the previous sort run.

To reduce regressions in cases where the total input size of the sort
node might fit fully into available memory, sort_run_bytes_limit is not
enforced for the first sort run. However, the first run stays limited
by sort_run_bytes_limit if the planner estimates hint that a spill
will inevitably happen.
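
The decision described in the two paragraphs above can be sketched as a small Python function. This is an illustrative model only, not the C++ code in be/src/runtime/sorter.cc: the function name should_close_run, its parameters, and the treatment of a non-positive limit as "no limit" are assumptions made for the example.

```python
def should_close_run(used_bytes, sort_run_bytes_limit, is_first_run,
                     spill_expected, mem_limit_hit):
    """Decide whether the current unsorted run should be sorted (and spilled) now."""
    if mem_limit_hit:
        # Pre-existing behaviour: hitting the memory limit always ends the run.
        return True
    if sort_run_bytes_limit <= 0:
        # Assumption: a non-positive value means the option is unspecified.
        return False
    if is_first_run and not spill_expected:
        # The first run is exempt unless the planner estimate predicts a spill.
        return False
    # Otherwise cap the run once its used reservation exceeds the limit.
    return used_bytes > sort_run_bytes_limit
```

With a 512 MB limit, for example, a second run holding 600 MB would be closed early, while a first run of the same size would keep growing unless a spill is already expected.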

We also add a new summary counter 'AddBatchTime' that summarizes how
much time is spent in Sorter::AddBatch. The max of 'AddBatchTime'
indicates the longest time spent in Sorter::AddBatch, presumably busy
doing an intermediate sort.

Testing:
- Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits
- Run core tests
- Run data loading of the 3 largest TPC-DS fact tables at 300GB scale
  into a real cluster using 5 backends and a 4GB mem_limit.
  sort_run_bytes_limit is varied between unspecified (not limited) and
  512 MB. The performance results are summarized in the following table.

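+---------------+---------+--------------+-----------------------+-----------------------+
| Insert table  |  #Rows  |     Avg      |       no limit        |     512 MB limit      |
|               |         | SortDataSize +--------+--------------+--------+--------------+
|               |         |   per Node   | Query  |     Max      | Query  |     Max      |
|               |         |              |  Time  | AddBatchTime |  Time  | AddBatchTime |
+---------------+---------+--------------+--------+--------------+--------+--------------+
| store_sales   | 864.00M |     15.29 GB | 30m18s |     53s311ms |    20m |      5s634ms |
+---------------+---------+--------------+--------+--------------+--------+--------------+
| catalog_sales | 431.97M |     11.34 GB | 23m24s |     31s212ms | 15m27s |      3s603ms |
+---------------+---------+--------------+--------+--------------+--------+--------------+
| web_sales     | 216.01M |      5.67 GB |  8m16s |     29s250ms |  6m41s |      3s856ms |
+---------------+---------+--------------+--------+--------------+--------+--------------+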

Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
---
M be/src/exec/partial-sort-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_sort.py
11 files changed, 141 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/11
--
To view, visit http://gerrit.cloudera.org:8080/15963

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9341: Set delegateAdmin to false for REVOKE without GRANT OPTION

2020-06-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16046 )

Change subject: IMPALA-9341: Set delegateAdmin to false for REVOKE without 
GRANT OPTION
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6230/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16046
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19ff45a5a30293e9c6cf35b22ea4aa5cb10355c9
Gerrit-Change-Number: 16046
Gerrit-PatchSet: 1
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sun, 07 Jun 2020 22:53:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9341: Set delegateAdmin to false for REVOKE without GRANT OPTION

2020-06-07 Thread Fang-Yu Rao (Code Review)
Fang-Yu Rao has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16046 )


Change subject: IMPALA-9341: Set delegateAdmin to false for REVOKE without 
GRANT OPTION
..

IMPALA-9341: Set delegateAdmin to false for REVOKE without GRANT OPTION

When executing a GRANT or REVOKE statement with Ranger as the
authorization provider, Impala has to prepare a GrantRevokeRequest so
that Ranger can add or delete the corresponding RangerPolicy, or modify
the existing RangerPolicyItems in the related RangerPolicy. One of the
fields that has to be set in a GrantRevokeRequest is delegateAdmin,
which dictates whether the grantee is allowed to transfer the privilege
on the resource to other principals. Specifically, the delegateAdmin
field of the updated RangerPolicyItem corresponding to the grantee is
set to the value of delegateAdmin in the GrantRevokeRequest prepared by
Impala.

Before this patch, when executing a REVOKE statement without the GRANT
OPTION, Impala would set delegateAdmin in the GrantRevokeRequest to
true. This is fine if the privilege to be revoked is the only privilege
that was previously granted to the grantee. However, when the privilege
to be revoked was never granted and a RangerPolicyItem exists for
another privilege on the same resource, the grantee ends up with
permission to transfer that non-matching privilege. The root cause is
that the privileges on the same resource share a single delegateAdmin
field in the corresponding RangerPolicyItem, a current limitation of
Ranger. As a workaround, we set delegateAdmin in the GrantRevokeRequest
to false for a REVOKE statement without the GRANT OPTION.
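
The effect of the shared flag can be illustrated with a toy model. This is
a sketch only: PolicyItem and apply_revoke are hypothetical names made up
for this example, not Ranger or Impala APIs, and the model assumes (as
described above) that the item's delegateAdmin is overwritten with the
value carried by the request.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyItem:
    """Toy stand-in for a RangerPolicyItem: one grantee on one resource."""
    accesses: set = field(default_factory=set)
    delegate_admin: bool = False  # one flag shared by all accesses


def apply_revoke(item, access, delegate_admin):
    """Apply a simplified revoke request to the item.

    Assumption: the shared flag takes the value sent in the request,
    whether or not the revoked access was present on the item.
    """
    item.accesses.discard(access)
    item.delegate_admin = delegate_admin
    return item


# The grantee holds SELECT only and may not transfer it. INSERT, which
# was never granted, is then revoked without the GRANT OPTION.

# Old behavior: the request carries delegateAdmin=true.
old = apply_revoke(PolicyItem({"select"}, False), "insert", delegate_admin=True)

# Patched behavior: the request carries delegateAdmin=false.
new = apply_revoke(PolicyItem({"select"}, False), "insert", delegate_admin=False)

print("old:", old)
print("new:", new)
```

In this model the old request leaves the grantee able to transfer SELECT
even though nothing was actually revoked, while the patched request leaves
the flag off.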

This workaround has a limitation. When the grantee was already
permitted to transfer the non-matching privilege, setting delegateAdmin
to false in the GrantRevokeRequest takes away a permission that should
not have been revoked. This is inconvenient for both the administrator
and the grantee, since the permission to transfer the non-matching
privilege has to be restored afterwards if it is still needed.

An alternative approach is for Impala to always check the current
delegateAdmin value when performing a REVOKE statement without the GRANT
OPTION. Specifically, we could resolve this problem by 1) checking
whether there exists a RangerPolicyItem for the same resource and
grantee whose delegateAdmin field is set to true and 2) setting the
delegateAdmin field in the GrantRevokeRequest accordingly. This
alternative, however, requires additional logic to iterate over the
RangerPolicy objects for the resource in the query, which slows down
query execution. Therefore, we chose the approach proposed in this
patch over the less efficient alternative.
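
The alternative that was not adopted can be sketched as a lookup over the
existing policies. The data layout and the function name
delegate_admin_for_revoke are assumptions made for this illustration;
they do not mirror Ranger's actual data model or any Impala code.

```python
def delegate_admin_for_revoke(policies, resource, grantee):
    """Return the delegateAdmin value the alternative would put in the request.

    Scans every policy for the resource and every item for the grantee,
    which is the extra iteration the commit message wants to avoid.
    """
    for policy in policies:
        if policy["resource"] != resource:
            continue
        for item in policy["items"]:
            if item["grantee"] == grantee and item["delegate_admin"]:
                return True
    return False


# Hypothetical policies: alice may transfer her privilege on db1.t1,
# bob may not.
policies = [
    {
        "resource": "db1.t1",
        "items": [
            {"grantee": "alice", "accesses": {"select"}, "delegate_admin": True},
            {"grantee": "bob", "accesses": {"select"}, "delegate_admin": False},
        ],
    },
]

# Existing flag is kept for alice, so her transfer permission survives.
print(delegate_admin_for_revoke(policies, "db1.t1", "alice"))
# bob never had the permission, so the request would carry false.
print(delegate_admin_for_revoke(policies, "db1.t1", "bob"))
```

Compared with always sending false, this keeps an existing transfer
permission intact, at the cost of walking the policies on every such
REVOKE.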

Testing:
- Revised a test case in test_ranger.py to reflect the behavior change
  of Impala when a REVOKE statement without the GRANT OPTION is
  executed.
- Verified that this patch passed the exhaustive tests in the DEBUG
  build.

Change-Id: I19ff45a5a30293e9c6cf35b22ea4aa5cb10355c9
---
M fe/src/main/java/org/apache/impala/authorization/ranger/RangerCatalogdAuthorizationManager.java
M tests/authorization/test_ranger.py
2 files changed, 7 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/16046/1
--
To view, visit http://gerrit.cloudera.org:8080/16046
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I19ff45a5a30293e9c6cf35b22ea4aa5cb10355c9
Gerrit-Change-Number: 16046
Gerrit-PatchSet: 1
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Quanlong Huang