[Impala-ASF-CR] Add all build targets to CMake and speed up builds
Tim Armstrong has posted comments on this change. Change subject: Add all build targets to CMake and speed up builds .. Patch Set 5: There were a couple of races exposed by running builds with different settings:
* The shell tarball must depend on thrift-deps
* impala-config.sh had side-effects that caused problems if run concurrently
I've included fixes for both in the latest patch. -- To view, visit http://gerrit.cloudera.org:8080/4790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] Add all build targets to CMake and speed up builds
Hello Jim Apple, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4790 to look at the new patch set (#4). Change subject: Add all build targets to CMake and speed up builds .. Add all build targets to CMake and speed up builds Always use CMake's dependency resolution instead of serial execution of targets via shell scripts. This improves parallelism by building fe, be, and other targets at the same time, and avoids some overhead from invoking "make" multiple times. This reduces the time taken for an incremental compilation of fe and be from 56s to 24s with this command: ./buildall.sh -debug -noclean -notests -skiptests -ninja Also use Impala-lzo's build script. This depends on the IMPALA-4277 fixes to the Impala-lzo build script. Log directory creation is also moved from impala-config.sh to buildall.sh. This means that impala-config.sh has no side-effects and can be run concurrently with no issues. Also make sure that "make" builds all the same artifacts as buildall.sh when run with no args. Testing: Ran a jenkins core job, also experimented locally. Ran a jenkins core job with distcc disabled - this exposed some concurrency bugs where impala-config.sh fails if run concurrently. 
Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26 --- M CMakeLists.txt M be/src/benchmarks/CMakeLists.txt M be/src/service/CMakeLists.txt M bin/impala-config.sh M bin/make_impala.sh M buildall.sh M ext-data-source/CMakeLists.txt M fe/CMakeLists.txt 8 files changed, 85 insertions(+), 66 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/4790/4 -- To view, visit http://gerrit.cloudera.org:8080/4790 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I23617adf13bdeb034c24f6bba14b5ae480e8dd26 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
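The idea in the commit message above, letting a dependency graph decide what can build at the same time rather than running targets one after another, can be illustrated with a small Python sketch. This is not how CMake or buildall.sh is implemented. The target names and edges in `DEPS` are a hypothetical graph loosely based on names mentioned in this thread, the `build` callback is a placeholder, and `graphlib` requires Python 3.9 or later.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# Hypothetical graph (an assumption for illustration): target -> prerequisites.
DEPS = {
    "thrift-deps": set(),
    "fe": {"thrift-deps"},
    "be": {"thrift-deps"},
    "shell_tarball": {"thrift-deps"},
}


def build_all(deps, build, max_workers=4):
    """Run build(target) for every target, starting each one as soon as
    all of its prerequisites have finished. Returns completion order."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    finished = []
    pending = {}  # future -> target name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every target whose prerequisites are all done.
            for target in sorter.get_ready():
                pending[pool.submit(build, target)] = target
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any build failure
                target = pending.pop(future)
                finished.append(target)
                sorter.done(target)  # may unblock dependent targets
    return finished


if __name__ == "__main__":
    print(build_all(DEPS, lambda target: None))
```

In this sketch, fe, be, and shell_tarball all become ready once thrift-deps completes and are submitted together. A missing edge (such as the shell tarball not depending on thrift-deps) would let a target start too early, which is the kind of race that only shows up once builds actually run in parallel.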
[Impala-ASF-CR] IMPALA-4365: Enabling end-to-end tests on a remote cluster
David Knupp has uploaded a new patch set (#7). Change subject: IMPALA-4365: Enabling end-to-end tests on a remote cluster .. IMPALA-4365: Enabling end-to-end tests on a remote cluster This patch lays the groundwork for loading data and running end-to-end tests on a remote CDH cluster. The requirements for the cluster to run the tests are:
- Managed by Cloudera Manager (CM)
- GPL Extras need to be installed
- KMS and KeyTrustee installed and available as a service
- SERDEPROPERTIES in the Hive DB modified to accept wide tables
- Hive warehouse dir points to /test-warehouse
The actual data loading is done via a new script, remote_data_load.py, which takes the CM host as an argument. It can be run from a client machine that is not a node of the cluster, but it needs to have the Impala repo checked out and Impala built. This ensures that all of the necessary data load scripts are available and that the environment is set up properly (client binaries like beeline and the hbase shell are available, python libraries like cm_api are installed, necessary environment variables are defined, etc.). It should be noted that running remote_data_load.py will overwrite any local XML config files with the configurations downloaded from the remote cluster.
Usage: remote_data_load.py [options]
Options:
  -h, --help            show this help message and exit
  --snapshot-file=SNAPSHOT_FILE
                        Path to the test-warehouse archive
  --cm-user=CM_USER     Cloudera Manager admin user
  --cm-pass=CM_PASS     Cloudera Manager admin user password
  --gateway=GATEWAY     Gateway host to upload the data from. If not set, uses the CM host as gateway.
  --ssh-user=SSH_USER   System user on the remote machine with passwordless SSH configured.
  --no-load             Do not try to load the snapshot
  --exploration-strategy=EXPLORATION_STRATEGY
  --test                Run end-to-end tests against cluster
Testing: This patch is being submitted with the understanding that there are still problems to work out with the remote data load script itself. 
However, since many of the existing build scripts also had to be modified, it is more important to make sure that no regressions were inadvertently introduced into the existing data load process. Loading data to a local mini-cluster was checked repeatedly while this patch was being developed, as well as running it against the Jenkins job that provides the test-warehouse snapshot used by the many other Impala CI builds that run daily. Remote data loading is working for the most part, although recent Kudu-related changes have introduced unforeseen problems: https://github.com/apache/incubator-impala/commit/041fa6d In the meantime, setting KUDU_IS_SUPPORTED to false provides a temporary workaround. Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9 --- M bin/load-data.py A bin/remote_data_load.py M testdata/bin/compute-table-stats.sh M testdata/bin/create-load-data.sh M testdata/bin/create-table-many-blocks.sh M testdata/bin/generate-schema-statements.py M testdata/bin/load-test-warehouse-snapshot.sh M testdata/bin/load_nested.py M testdata/bin/run-step.sh M testdata/bin/setup-hdfs-env.sh 10 files changed, 754 insertions(+), 66 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/4769/7 -- To view, visit http://gerrit.cloudera.org:8080/4769 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Harrison Sheinblatt Gerrit-Reviewer: Martin Grund Gerrit-Reviewer: Michael Brown
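For readers who want to experiment with the command-line interface described in the IMPALA-4365 message above, the following is a minimal sketch that mirrors the listed options using Python's argparse. It is not the code from remote_data_load.py (the help format quoted in the message suggests the real script may use a different option parser). The positional `cm_host` argument name and the `gateway_host` helper are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options quoted in the commit message."""
    parser = argparse.ArgumentParser(prog="remote_data_load.py")
    # Assumption: the CM host is passed as a single positional argument.
    parser.add_argument("cm_host", help="Cloudera Manager host")
    parser.add_argument("--snapshot-file",
                        help="Path to the test-warehouse archive")
    parser.add_argument("--cm-user", help="Cloudera Manager admin user")
    parser.add_argument("--cm-pass",
                        help="Cloudera Manager admin user password")
    parser.add_argument("--gateway",
                        help="Gateway host to upload the data from. "
                             "If not set, uses the CM host as gateway.")
    parser.add_argument("--ssh-user",
                        help="System user on the remote machine with "
                             "passwordless SSH configured.")
    parser.add_argument("--no-load", action="store_true",
                        help="Do not try to load the snapshot")
    parser.add_argument("--exploration-strategy")
    parser.add_argument("--test", action="store_true",
                        help="Run end-to-end tests against cluster")
    return parser


def gateway_host(args):
    """Hypothetical helper: fall back to the CM host when no gateway is set."""
    return args.gateway or args.cm_host


if __name__ == "__main__":
    parsed = build_parser().parse_args()
    print("Uploading via", gateway_host(parsed))
```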
[Impala-ASF-CR] IMPALA-4330: Fix JSON syntax in generate_metrics.py
Lars Volker has uploaded a new change for review. http://gerrit.cloudera.org:8080/4887 Change subject: IMPALA-4330: Fix JSON syntax in generate_metrics.py .. IMPALA-4330: Fix JSON syntax in generate_metrics.py The hardcoded JSON string in MDL_BASE had a superfluous comma that tripped both the simplejson and json parsers. This change removes it so the string works with both parsers. Change-Id: I98456df28d48ed22cefcc570e88df78fdf441c23 --- M common/thrift/generate_metrics.py 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/4887/1 -- To view, visit http://gerrit.cloudera.org:8080/4887 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I98456df28d48ed22cefcc570e88df78fdf441c23 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker
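The class of failure described in IMPALA-4330 above is easy to reproduce with Python's standard json module, which rejects a comma before a closing brace. The two strings below are hypothetical stand-ins, not the actual contents of MDL_BASE.

```python
import json

# Hypothetical stand-ins for a hardcoded JSON template (not the real MDL_BASE).
BROKEN = '{"name": "impala", "version": 1,}'  # superfluous trailing comma
FIXED = '{"name": "impala", "version": 1}'


def parses(text):
    """Return True if the stdlib json parser accepts the text."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


if __name__ == "__main__":
    print("broken parses:", parses(BROKEN))
    print("fixed parses:", parses(FIXED))
```

A check like this in a unit test would catch a malformed hardcoded template before it reaches a stricter parser.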
[Impala-ASF-CR] IMPALA-3346: DeepCopy() Kudu rows into Impala tuples.
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3346: DeepCopy() Kudu rows into Impala tuples. .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/4862/6/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java: Line 145: public boolean isKuduScanSlot() { return getColumn() instanceof KuduColumn; } this assumes isScanSlot(). either checkstate that or test for it and return a null otherwise. http://gerrit.cloudera.org:8080/#/c/4862/6/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java: Line 62: * Null flags are omitted for non-nullable slots, except for Kudu scan slots which always what was the reason for not simplifying this and always having null flags for everything? -- To view, visit http://gerrit.cloudera.org:8080/4862 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic911e4eff9fe98bf28d8a1bab5c9d7e9ab66d9cb Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3346: DeepCopy() Kudu rows into Impala tuples.
Marcel Kornacker has submitted this change and it was merged. Change subject: IMPALA-3346: DeepCopy() Kudu rows into Impala tuples. .. IMPALA-3346: DeepCopy() Kudu rows into Impala tuples. Implements additional changes to make the memory layout of Kudu rows identical to Impala tuples. In particular, Kudu rows allocate a null bit even for non-nullable columns, and Impala now does the same for Kudu scan tuples. This change exploits the now-identical Kudu and Impala tuple layouts to avoid the expensive translation. Perf: Mostafa reported a 50% efficiency gain on full table scans. Testing: A private core/hdfs run passed. TODO: 1) Test cases with nullable/nonnullable non-PK slots. 2) Specify mem layout to client (depends on KUDU-1694) 3) Avoid mem copies (depends on KUDU-1695) Change-Id: Ic911e4eff9fe98bf28d8a1bab5c9d7e9ab66d9cb Reviewed-on: http://gerrit.cloudera.org:8080/4862 Reviewed-by: Dan Hecht Tested-by: Marcel Kornacker --- M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java 5 files changed, 67 insertions(+), 202 deletions(-) Approvals: Marcel Kornacker: Verified Dan Hecht: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4862 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ic911e4eff9fe98bf28d8a1bab5c9d7e9ab66d9cb Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Marcel Kornacker has submitted this change and it was merged. Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. IMPALA-3823: Add timer to measure Parquet footer reads It's been observed that Parquet footer reads perform poorly, especially when reading from S3. This patch adds a timer, "FooterProcessingTimer", which keeps track of the average time each split of each scan node spends reading and processing the Parquet footer. Added a new utility counter called SummaryStatsCounter which keeps track of the min, max and average values seen so far from a set of values. This counter is used to calculate the min, max and average time taken to scan and process Parquet footers per query per node. The RuntimeProfile has also been updated to track, display and serialize this new counter to Thrift. BE tests have been added to verify that this counter works correctly. Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Reviewed-on: http://gerrit.cloudera.org:8080/4371 Reviewed-by: Sailesh Mukil Tested-by: Marcel Kornacker --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile-test.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M common/thrift/RuntimeProfile.thrift M tests/query_test/test_scanners.py 8 files changed, 303 insertions(+), 16 deletions(-) Approvals: Marcel Kornacker: Verified Sailesh Mukil: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil
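The behaviour the IMPALA-3823 message attributes to SummaryStatsCounter (tracking the min, max and average of the values seen so far) can be sketched in a few lines of Python. This is an illustration only. The actual counter lives in Impala's C++ backend, and the class and method names below are assumptions rather than its real interface.

```python
class SummaryStats:
    """Illustrative running min/max/average tracker (hypothetical API)."""

    def __init__(self):
        self.count = 0
        self.total = 0
        self.min = None
        self.max = None

    def update(self, value):
        """Fold one new sample into the running statistics."""
        self.count += 1
        self.total += value
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)

    @property
    def average(self):
        """Mean of all samples so far, or 0 when nothing has been recorded."""
        return self.total / self.count if self.count else 0


if __name__ == "__main__":
    footer_times_ms = SummaryStats()
    for sample in (30, 10, 20):  # made-up footer processing times
        footer_times_ms.update(sample)
    print(footer_times_ms.min, footer_times_ms.max, footer_times_ms.average)
```

Keeping only a count and a running total means the average can be reported at any time without storing the individual samples.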
[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache
Marcel Kornacker has submitted this change and it was merged. Change subject: IMPALA-4223: Handle truncated file read from HDFS cache .. IMPALA-4223: Handle truncated file read from HDFS cache While overwriting files on HDFS via Hive it can happen that Impala sees a partially written, cached file. In these cases we did not correctly handle the partial cached read. This change adds a check and triggers a fall back to disk reads for such errors. If the file is partially written to disk, too, then the query will report a file corruption warning through the disk read path. Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a Reviewed-on: http://gerrit.cloudera.org:8080/4828 Reviewed-by: Dan Hecht Reviewed-by: Tim Armstrong Tested-by: Marcel Kornacker --- M be/src/runtime/disk-io-mgr-scan-range.cc 1 file changed, 13 insertions(+), 5 deletions(-) Approvals: Marcel Kornacker: Verified Tim Armstrong: Looks good to me, but someone else must approve Dan Hecht: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
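The control flow described in the IMPALA-4223 message above (check the cached read, fall back to a disk read if it is short, and let the disk path report corruption if the file is short there too) can be sketched as follows. This is a simplified illustration and not the logic in disk-io-mgr-scan-range.cc. The function name, the callables, and the use of IOError as the "corruption warning" are all assumptions.

```python
def read_scan_range(read_cached, read_disk, expected_len):
    """Return expected_len bytes, preferring the cache.

    read_cached and read_disk are hypothetical zero-argument callables that
    return bytes (read_cached may return None when nothing is cached).
    """
    cached = read_cached()
    if cached is not None and len(cached) == expected_len:
        return cached
    # Cached data is missing or truncated: fall back to the disk read path.
    data = read_disk()
    if len(data) < expected_len:
        # Stand-in for the file corruption warning reported by the disk path.
        raise IOError("file appears truncated: got %d of %d bytes"
                      % (len(data), expected_len))
    return data


if __name__ == "__main__":
    # A partially written cached copy triggers the fallback to disk.
    print(read_scan_range(lambda: b"abc", lambda: b"abcdef", 6))
```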
[Impala-ASF-CR] Add functional tests for compute stats with mt_dop > 0.
Marcel Kornacker has posted comments on this change. Change subject: Add functional tests for compute stats with mt_dop > 0. .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/4879/3/testdata/workloads/functional-query/queries/QueryTest/mt-dop-compute-stats.test File testdata/workloads/functional-query/queries/QueryTest/mt-dop-compute-stats.test: Line 3: compute stats alltypes > The existing compute stats test does not systematically run on all file for works for me, i just wanted to make sure we have enough coverage to release it as "mt works for compute stats". -- To view, visit http://gerrit.cloudera.org:8080/4879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icd4e7e44f9f23f66e59ad1fb298e13da76ad817a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4314: Standardize on MT-related data structures
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4314: Standardize on MT-related data structures .. Patch Set 4: Code-Review+2 final comment changes and rebase -- To view, visit http://gerrit.cloudera.org:8080/4853 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I465d0e15e2cf17cafe4c747d34c8f595d3645151 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4314: Standardize on MT-related data structures
Hello Henry Robinson, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4853 to look at the new patch set (#4). Change subject: IMPALA-4314: Standardize on MT-related data structures .. IMPALA-4314: Standardize on MT-related data structures This removes the data structures that were "superseded" in IMPALA-3903 and changes all control flow to utilize the new data structures. The new data structures are renamed to remove the "Mt" prefix. Change-Id: I465d0e15e2cf17cafe4c747d34c8f595d3645151 --- M be/src/benchmarks/expr-benchmark.cc M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/scheduling/query-schedule.cc M be/src/scheduling/query-schedule.h M be/src/scheduling/simple-scheduler-test-util.cc M be/src/scheduling/simple-scheduler-test-util.h M be/src/scheduling/simple-scheduler.cc M be/src/scheduling/simple-scheduler.h M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/query-exec-state.cc M common/thrift/Frontend.thrift M common/thrift/Planner.thrift M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test 23 files changed, 632 insertions(+), 1,169 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/4853/4 -- To view, visit http://gerrit.cloudera.org:8080/4853 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I465d0e15e2cf17cafe4c747d34c8f595d3645151 
Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3346: DeepCopy() Kudu rows into Impala tuples.
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3346: DeepCopy() Kudu rows into Impala tuples. .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4862 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic911e4eff9fe98bf28d8a1bab5c9d7e9ab66d9cb Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4223: Handle truncated file read from HDFS cache
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4223: Handle truncated file read from HDFS cache .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4314: Standardize on MT-related data structures
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4314: Standardize on MT-related data structures .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/4853/3/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS3, Line 246: boost > remove boost:: Done PS3, Line 456: runtime > Will you file a JIRA to fix this? filed https://issues.cloudera.org/browse/IMPALA-4400 http://gerrit.cloudera.org:8080/#/c/4853/3/be/src/scheduling/query-schedule.h File be/src/scheduling/query-schedule.h: Line 116: void Validate() const; > Still missing mention of how this signals failure. Done -- To view, visit http://gerrit.cloudera.org:8080/4853 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I465d0e15e2cf17cafe4c747d34c8f595d3645151 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes