[Impala-ASF-CR] IMPALA-5990: Part 1: JNI-based LZ4 de/compression
Alex Behm has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8150 Change subject: IMPALA-5990: Part 1: JNI-based LZ4 de/compression .. IMPALA-5990: Part 1: JNI-based LZ4 de/compression Adds LZ4 de/compression in FeSupport geared towards eventually compressing catalog objects end-to-end in their journey from the catalogd over the statestored to coordinator impalads. A follow-on change will use the new LZ4 functions to do the actual compression. Testing: - Added a new unit test Change-Id: I237802944875b07080db0159ff8ec548150fd95e --- M be/src/service/fe-support.cc M be/src/util/jni-util.cc M be/src/util/jni-util.h M fe/src/main/java/org/apache/impala/service/FeSupport.java A fe/src/test/java/org/apache/impala/service/Lz4CompressionTest.java 5 files changed, 424 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/8150/1 -- To view, visit http://gerrit.cloudera.org:8080/8150 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I237802944875b07080db0159ff8ec548150fd95e Gerrit-Change-Number: 8150 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Behm
[Impala-ASF-CR] IMPALA-5940: Avoid log spew by using Status::Expected.
Hello Alex Behm, Mostafa Mokhtar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8100 to look at the new patch set (#3). Change subject: IMPALA-5940: Avoid log spew by using Status::Expected. .. IMPALA-5940: Avoid log spew by using Status::Expected. In IMPALA-5926, we fixed a case where closing the session triggered a stack trace in the logs which impacted performance for short-running queries. Looking at log files from several active clusters, I identified a few other cases where we could clean up log spew with the same (trivial) approach. The following messages will no longer log: "Failed to codegen MaterializeTuple() due to unsupported type: $0" "Not implemented for this format." "ScalarFnCall Codegen not supported for CHAR" "Could not codegen CodegenMaterializeExprs: " "Could not codegen TupleRowComparator::Compare(): $0" These codegen related messages were happening at every execution node, and were therefore very common. In addition, the following messages, which were very frequent in the logs, now will log, but without the back trace: "... Client ... timed-out during recv call." "Query Id ... not found." To do this, I changed them from logging implicitly via Status(...) to logging explicitly. The snippet I used to identify these was: find . -type f -name '*IMPALAD*.gz' | xargs gzcat | awk '/^I/ { if(x) { print x; } x = "" } /status.cc/ { x=" "; } { if(x) { x=x $0 } }' | sed -e 's/0x[0-9a-fx]* //g' | sed -e 's/[0-9a-f]\{16\}:[0-9a-f]*/QUERYID/g' | tr -s '\t' ' ' | tr '[0-9]' 'N' | sort | uniq -c | sort -n | tee output.txt I also analyzed some logs using SQL, against a pre-processed logs table: with v as ( select regexp_replace( regexp_replace( translate(substr(message, 42), "\n\t", " "), "[a-zA-Z0-9.-]*[.][a-zA-Z0-9-]*:[0-9]*", ""), "@.*$", "@@@...") as m from logs_table where `class`="status.cc") select m, count(*) from v group by 1 order by 2 desc limit 100 Testing: * Automated tests. * Manual testing for one of the new back-trace-suppressed paths: $ impala-python shell/gen-py/ImpalaService/ImpalaService-remote -h localhost:21000 GetRuntimeProfile 'beeswaxd.ttypes.QueryHandle()' Traceback (most recent call last): File "shell/gen-py/ImpalaService/ImpalaService-remote", line 106, in pp.pprint(client.GetRuntimeProfile(eval(args[0]),)) File "/home/philip/src/impala/shell/gen-py/ImpalaService/ImpalaService.py", line 161, in GetRuntimeProfile return self.recv_GetRuntimeProfile() File "/home/philip/src/impala/shell/gen-py/ImpalaService/ImpalaService.py", line 184, in recv_GetRuntimeProfile raise result.error beeswaxd.ttypes.BeeswaxException: BeeswaxException(handle=QueryHandle(log_context='', id=''), log_context='', SQLState='HY000', _message='GetRuntimeProfile error: Query id 0:0 not found.\n', errorCode=0) $ grep 'Query id' logs/cluster/impalad.INFO | tail -n 1 I0926 20:29:09.44 6787 impala-server.cc:642] Query id 0:0 not found. Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330 --- M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exprs/scalar-fn-call.cc M be/src/runtime/client-cache.h M be/src/runtime/tuple.cc M be/src/service/impala-server.cc M be/src/util/tuple-row-compare.cc 7 files changed, 23 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8100/3 -- To view, visit http://gerrit.cloudera.org:8080/8100 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330 Gerrit-Change-Number: 8100 Gerrit-PatchSet: 3 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-5975: Work around broken beeline clients
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8132 ) Change subject: IMPALA-5975: Work around broken beeline clients .. IMPALA-5975: Work around broken beeline clients To make statements execute, some clients require always appending a semi-colon to the end. The workaround is quite simple. Change-Id: Id8b9f3dde4445513f1f389785a002c6cc6b3dada Reviewed-on: http://gerrit.cloudera.org:8080/8132 Reviewed-by: Joe McDonnellTested-by: Impala Public Jenkins --- M testdata/bin/create-load-data.sh M testdata/bin/create-table-many-blocks.sh 2 files changed, 4 insertions(+), 4 deletions(-) Approvals: Joe McDonnell: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8132 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Id8b9f3dde4445513f1f389785a002c6cc6b3dada Gerrit-Change-Number: 8132 Gerrit-PatchSet: 3 Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5975: Work around broken beeline clients
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8132 ) Change subject: IMPALA-5975: Work around broken beeline clients .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8132 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id8b9f3dde4445513f1f389785a002c6cc6b3dada Gerrit-Change-Number: 8132 Gerrit-PatchSet: 2 Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 27 Sep 2017 03:27:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5940: Avoid log spew by using Status::Expected.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8100 ) Change subject: IMPALA-5940: Avoid log spew by using Status::Expected. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8100/2/be/src/runtime/client-cache.h File be/src/runtime/client-cache.h: http://gerrit.cloudera.org:8080/#/c/8100/2/be/src/runtime/client-cache.h@243 PS2, Line 243: return Status::NoTrace(TErrorCode::RPC_RECV_TIMEOUT, strings::Substitute( > Is it necessary to have Status write the message to the log? is it not suf Yep, you're right. Much lighter change. I'll update. -- To view, visit http://gerrit.cloudera.org:8080/8100 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330 Gerrit-Change-Number: 8100 Gerrit-PatchSet: 2 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 27 Sep 2017 03:20:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 5: (1 comment) > The Parquet writer does something similar for the Parquet > statistics, can any of that code be reused with regards to codegen? This was discussed in the original design thread, and we felt that there was too much parquet specific stuff current in/going to be added soon in the parquet statistics code, so we decided to just keep it separate. http://gerrit.cloudera.org:8080/#/c/7793/2/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/7793/2/common/thrift/ImpalaInternalService.thrift@740 PS2, Line 740: 4: optional i64 queue_timeout_ms; : : // Default query options that are applied to requests mapped to this pool. : > This could be a reasonable approach, or another idea is: Done -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Wed, 27 Sep 2017 03:08:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Zach Amsden has posted comments on this change. ( http://gerrit.cloudera.org:8080/7438 ) Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. Patch Set 6: I totally agree about doing anything fancy as a followup. In particular, I did not find anything at all useful in AVX to help. There is no vectored multiply large enough for this, and FMA operations won't help either as they basically only help with precision loss on floating point types. Considering the perf difference is so extreme in some cases, I think we should strongly consider either living with the broken behavior until we can come up with a solution, or else verifying with users making use of the DECIMAL_V2 feature that this will not be a problem. -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-Change-Number: 7438 Gerrit-PatchSet: 6 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 27 Sep 2017 03:00:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Hello Michael Ho, Lars Volker, Matthew Jacobs, Mostafa Mokhtar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#5). Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter and/or a min-max filter, depending on if it has HDFS and/or Kudu targets, respectively. Min-max filters are generated by the PartitionedHashJoinBuilder. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Because the Kudu client doesn't provide a way to specify generic filter exprs, min-max filters are only generated when the target expr is a bare Kudu column ref. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Codegen is used to eliminate branching on the type of the min-max filter. Testing: - Updated planner tests. - Ran existing runtime filter tests. - Ran preliminary perf tests to demonstrate that it works. Will update with more specific results. - Still needs more e2e tests. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/ImpalaInternalService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test 37 files changed, 1,464 insertions(+), 146 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/7793/5 -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar
[Impala-ASF-CR] DRAFT - IMPALA-4252: Min-max runtime filters for Kudu
Hello Michael Ho, Lars Volker, Matthew Jacobs, Mostafa Mokhtar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#4). Change subject: DRAFT - IMPALA-4252: Min-max runtime filters for Kudu .. DRAFT - IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter and/or a min-max filter, depending on if it has HDFS and/or Kudu targets, respectively. Min-max filters are generated by the PartitionedHashJoinBuilder. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Because the Kudu client doesn't provide a way to specify generic filter exprs, min-max filters are only generated when the target expr is a bare Kudu column ref. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Codegen is used to eliminate branching on the type of the min-max filter. Testing: - Updated planner tests. - Ran existing runtime filter tests. - Ran preliminary perf tests to demonstrate that it works. Will update with more specific results. - Still needs more e2e tests. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/ImpalaInternalService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test 37 files changed, 1,464 insertions(+), 146 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/7793/4 -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 4 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar
[Impala-ASF-CR] IMPALA-4252: Move runtime filters to ScanNode
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8148 Change subject: IMPALA-4252: Move runtime filters to ScanNode .. IMPALA-4252: Move runtime filters to ScanNode As a preliminary step towards adding runtime filters for Kudu scans, this patch moves runtime filter related code from HdfsScanNodeBase to ScanNode so that it's available to KuduScanNodeBase. Testing: - Ran existing runtime filter tests. Change-Id: I17bdc869046dc2cd837d02f333441fa6324ff086 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/scan-node.cc M be/src/exec/scan-node.h 4 files changed, 126 insertions(+), 107 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/8148/1 -- To view, visit http://gerrit.cloudera.org:8080/8148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I17bdc869046dc2cd837d02f333441fa6324ff086 Gerrit-Change-Number: 8148 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.
Tim Wood has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/8140 ) Change subject: IMPALA-5376: Implement all TPCDS test cases or alternates for Impala. .. IMPALA-5376: Implement all TPCDS test cases or alternates for Impala. Main source for TPCDS query and result definitions: https://github.com/gregrahn/tpcds-kit. TPC-DS v2.5.0 qualification queries from G. Rahn, Cloudera, Inc. Data set constructed in mini-cluster using $IMPALA_HOME/buildall.sh -testdata This commit continues previous work on IMPALA-5376 in the ASF Impala repo and the Cloudera Gerrit service. This commit splits multi-query tests in the TPC-DS suite definition into one query and result set per test file, as the test framework requires. Names for such files have -1, -2... inner suffixes. The complete TPC-DS test suite runs with passes, skips and xfails, but no failures, as reflected by runs of $IMPALA_HOME/tests/run-tests.py query_test/test_tpcds_queries.py ... Expected result sets come from the TPC-DS kit. Some TPC-DS test cases in this commit have been modified in sematically-neutral ways so as to pass on Impala; others are marked to skip or xfail due to bugs. Tests that flap are marked to skip, with a bug ID, since they don't reliably pass or xfail. The tests/query_test/test_tpcds_queries.py driver file is authoritative for the active/skip/xfail status for each case and a brief reason. The following list describes the current status as: --- test-name deviance from TPC-DS spec changes made --- tpcds-q22a.test RESULT MISMATCH in LSD of AVG() values Fixed AVG()s --- tpcds-q30.test UNRECOGNIZED CHARACTER MARKED XFAIL, IMPALA-5961. --- tpcds-q31.test RESULT MISMATCH in LSD of DECIMAL values ADDED TRUNCATE(2)s AROUND LAST 4 COLUMNS. MARKED SKIP, IMPALA-5956 --- tpcds-q35a.test RESULT MISMATCH MARKED XFAIL, IMPALA-5950. --- tpcds-q36a.test RESULT MISMATCH MARKED XFAIL, IMPALA-4741 --- tpcds-q39.test MULTIPLE RESULT SET not recognized by test framework MARKED XFAIL. --- tpcds-q47.test RESULT MISMATCH in LSD of DECIMAL values ADDED TRUNCATE(2) TO 8th COLUMN OF WITH TABLE, TAKE ACTUAL RESULT AS EXPECTED. --- tpcds-q49.test RESULT MISMATCH in LSD of DECIMAL values MARKED XFAIL, IMPALA-5945 --- tpcds-q57.test RESULT MISMATCH, excess scale in DECIMAL values FIXED, ADDED TRUNCATE(2) AROUND 6th COLUMN. --- tpcds-q58.test RESULT MISMATCH in DECIMAL values MARKED XFAIL. IMPALA-5946 --- tpcds-q59.test RESULT MISMATCH, excess scale in DECIMAL values FIXED, ADDED TRUNCATE(2) AROUND 4th-10th COLUMNS. --- tpcds-q61.test RESULT MISMATCH in DECIMAL value FIXED. CAST RESULT QUOTIENT TO DECIMAL(15, 4), TAKE ACTUAL RESULT AS EXPECTED --- tpcds-q63.test RESULT MISMATCH, excess scale in DECIMAL values ADDED TRUNCATE(2) TO 3rd COLUMN --- tpcds-q64.test RESULT MISMATCH ADDED ORDER BY COLUMNS. --- tpcds-q66.test RESULT MISMATCH MARKED XFAIL, IMPALA-4741 --- tpcds-q77a.test RESULT MISMATCH FIXED. TAKE ACTUAL RESULT AS EXPECTED --- tpcds-q78.test RESULT MISMATCH FIXED. TAKE ACTUAL RESULT AS EXPECTED --- tpcds-q83.test RESULT MISMATCH MARKED XFAIL. IMPALA-5945. --- tpcds-q85.test MISSING TABLE "reason" MARKED XFAIL, IMPALA-5960 --- tpcds-q86a.test RESULT MISMATCH FIXED. TAKE ACTUAL RESULT AS EXPECTED --- tpcds-q89.test RESULT MISMATCH, DECIMAL values flap MARKED XFAIL. ADDED ROUND(2) TO 8th COLUMN, TAKE ACTUAL RESULTS AS EXPECTED, IMPALA-5956. --- tpcds-q90.test RESULT MISMATCH MARKED XFAIL, IMPALA-5945. --- tpcds-q93.test MISSING TABLE "reason" MARKED XFAIL, IMPALA-5960 --- tpcds-q98.test RESULT MISMATCH FIXED, ADDED ROUND() TO LAST COLUMN IMPALA-5986: Allow SET option names to contain digits when resetting them between queries. Change-Id: I026a0123b035b6828ef35b599dda50aadf0171bb --- A testdata/workloads/tpcds/queries/tpcds-q10.test A testdata/workloads/tpcds/queries/tpcds-q10a.test A testdata/workloads/tpcds/queries/tpcds-q11.test A testdata/workloads/tpcds/queries/tpcds-q12.test A testdata/workloads/tpcds/queries/tpcds-q13.test A testdata/workloads/tpcds/queries/tpcds-q14-1.test A testdata/workloads/tpcds/queries/tpcds-q14-2.test A testdata/workloads/tpcds/queries/tpcds-q14a-1.test A testdata/workloads/tpcds/queries/tpcds-q14a-2.test A testdata/workloads/tpcds/queries/tpcds-q15.test A testdata/workloads/tpcds/queries/tpcds-q16.test A testdata/workloads/tpcds/queries/tpcds-q17.test A testdata/workloads/tpcds/queries/tpcds-q18.test A testdata/workloads/tpcds/queries/tpcds-q18a.test A testdata/workloads/tpcds/queries/tpcds-q20.test A testdata/workloads/tpcds/queries/tpcds-q21.test A testdata/workloads/tpcds/queries/tpcds-q22.test A testdata/workloads/tpcds/queries/tpcds-q22a.test M testdata/workloads/tpcds/queries/tpcds-q23-1.test M testdata/workloads/tpcds/queries/tpcds-q23-2.test A testdata/workloads/tpcds/queries/tpcds-q24-1.test A testdata/workloads/tpcds/queries/tpcds-q24-2.test A testdata/workloads/tpcds/queries/tpcds-q25.test A
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/7438 ) Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. Patch Set 6: I think we should explore your suggestion to try SEE/AVX in a follow on patch. (to optimze both multiplication and division if it's possible) -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-Change-Number: 7438 Gerrit-PatchSet: 6 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 27 Sep 2017 01:24:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4939, IMPALA-4940: Decimal V2 multiplication
Taras Bobrovytsky has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/7438 ) Change subject: IMPALA-4939, IMPALA-4940: Decimal V2 multiplication .. IMPALA-4939, IMPALA-4940: Decimal V2 multiplication Implement the new DECIMAL return type rules for multiply expressions, active when query option DECIMAL_V2=1. The algorithm for determining the type of the result of multiplication is described in the JIRA. DECIMAL V1: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,38)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,30)| +---+ DECIMAL V2: +---+ | typeof(cast('0.1' as decimal(38,38)) * cast('0.1' as decimal(38,38))) | +---+ | DECIMAL(38,37)| +---+ +---+ | typeof(cast('0.1' as decimal(38,15)) * cast('0.1' as decimal(38,15))) | +---+ | DECIMAL(38,6) | +---+ In this patch, we also fix the early multiplication overflow. We compute a 256 bit integer intermediate value, which we then attempt to scale down and round. Performance: I ran TPCH 300 and TPCDS 1000 workloads and the performance is almost identical. For TPCH Q1, there was an improvement from 21 seconds to 16 seconds. I did not see any regressions. The performance improvement is due to the way we check for overflows after this patch (by counting the leading zeros instead of dividing). It can be clealy seen in this query: select cast(2.2 as decimal(38, 1)) * cast(2.2 as decimal(38, 1)) before: 7.85s after: 2.03s I noticed performance regressions in the following cases: - When we need to convert to a 256 bit integer before multiplying, which was introduced in this patch. Whether this happens depends on the resulting precision and the value of the inputs. In the following extreme case, the intermediate value is converted to a 256 bit integer every time. select cast(1.1 as decimal(38, 37)) * cast(1.1 as decimal(38, 37)) before: 14.56s (returns null) after: 126.17s - When we need to scale down the intermediate value. In the following query the result is decimal(38,6) after the patch, so the intermediate needs to be scaled down. select cast(2.2 as decimal(38,1)) * cast(2.2 as decimal(38,19)) before: 7.25s after: 13.06s These regressions are possible only when the resulting precision is 38 which is not common in typical workloads. Note: The actual queries that I ran for the benchmark are not exactly as above. I constructed tables with millions of rows with those values. I ran the queries with DECIMAL_v2=1 option before and after the patch. Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 --- M be/src/exprs/expr-test.cc M be/src/runtime/decimal-value.inline.h M be/src/util/bit-util.h M fe/src/main/java/org/apache/impala/analysis/TypesUtil.java 4 files changed, 289 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7438/6 -- To view, visit http://gerrit.cloudera.org:8080/7438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I37ad6232d7953bd75c18dc86e665b2b501a1ebe1 Gerrit-Change-Number: 7438 Gerrit-PatchSet: 6 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5986: Allow SET option names to contain digits when resetting them between queries.
Tim Wood has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/8140 ) Change subject: IMPALA-5986: Allow SET option names to contain digits when resetting them between queries. .. IMPALA-5986: Allow SET option names to contain digits when resetting them between queries. Change-Id: I026a0123b035b6828ef35b599dda50aadf0171bb --- M tests/common/impala_test_suite.py 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/8140/3 -- To view, visit http://gerrit.cloudera.org:8080/8140 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I026a0123b035b6828ef35b599dda50aadf0171bb Gerrit-Change-Number: 8140 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Wood
[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8123 ) Change subject: IMPALA-5870: Improve runtime profile for partial sort .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b Gerrit-Change-Number: 8123 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 27 Sep 2017 00:08:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc@245 PS5, Line 245: context_->ReleaseCompletedResources(nullptr, true); > I fixed it while I was checking the callsites of ReleaseCompletedResources( I can obviously fix it if you feel strongly but it just doesn't seem that important to me. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 5 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 26 Sep 2017 23:50:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Hello Lars Volker, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8085 to look at the new patch set (#6). Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet This change only affects uncompressed plain-encoded Parquet where RowBatches may directly reference strings stored in the I/O buffers. The proposed fix is to simply copy the data pages if needed then use the same logic that we use for decompressed data pages. This copy inevitably adds some CPU overhead, but I believe this is acceptable because: * We generally recommend using compression, and optimize for that case. * Copying memory is cheaper than decompressing data. * Scans of uncompressed data are very likely to be I/O bound. This allows several major simplifications: * The resource management for compressed and uncompressed scans is much more similar. * We don't need to attach Disk I/O buffers to RowBatches. * We don't need to deal with attaching I/O buffers in ScannerContext. * Column readers can release each I/O buffer *before* advancing to the next one, making it easier to reason about resource consumption. E.g. each Parquet column only needs one I/O buffer at a time to make progress. Future changes will apply to Avro, Sequence Files and Text. Once all scanners are converted, ScannerContext::contains_tuple_data_ will always be false and we can remove some dead code. Testing === Ran core ASAN and exhaustive debug builds. Perf No difference in most cases when scanning uncompressed parquet. There is a significant regression (50% increase in runtime) in targeted perf tests scanning non-dictionary-encoded strings (see benchmark output below). After the regression performance is comparable to Snappy compression. I also did a TPC-H run but ran into some issues with the report generator. I manually compared times and there were no regressions. ++---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | ++---+-++++ | TARGETED-PERF(_61) | parquet / none / none | 23.02 | +0.60% | 4.23 | +5.97% | ++---+-++++ +++---++-++++-+---+ | Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | +++---++-++++-+---+ | TARGETED-PERF(_61) | PERF_STRING-Q2 | parquet / none / none | 3.00 | 1.98| R +52.10% | 0.97%| 1.25%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q1 | parquet / none / none | 2.86 | 1.92| R +49.11% | 0.34%| 2.34%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q3 | parquet / none / none | 3.16 | 2.15| R +47.04% | 1.03%| 0.72%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q4 | parquet / none / none | 3.16 | 2.17| R +45.60% | 0.14%| 1.11%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q5 | parquet / none / none | 3.51 | 2.55| R +37.88% | 0.83%| 0.49%| 1 | 5 | | TARGETED-PERF(_61) | PERF_AGG-Q5| parquet / none / none | 0.79 | 0.61| R +30.86% | 1.54%| 4.10%| 1 | 5 | | TARGETED-PERF(_61) | primitive_top-n_al | parquet / none / none | 39.45 | 35.07 | +12.51% | 0.29%| 0.29%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q7 | parquet / none / none | 6.78 | 6.10| +11.13% | 0.99%| 0.74%| 1 | 5 | | TARGETED-PERF(_61) | PERF_STRING-Q6 | parquet / none / none | 8.83 | 8.14| +8.52% | 0.15%| 0.32%| 1 | 5 | ... Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M be/src/exec/scanner-context.h 4 files changed, 64 insertions(+), 53 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/8085/6 -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType:
[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 ) Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet .. Patch Set 5: (8 comments) http://gerrit.cloudera.org:8080/#/c/8085/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8085/5//COMMIT_MSG@7 PS5, Line 7: IMPALA-5307 > Can you add a line at the bottom what the other part(s) would look like? Done http://gerrit.cloudera.org:8080/#/c/8085/5//COMMIT_MSG@56 PS5, Line 56: +++---++-++++-+---+ > Nit: You could make the second column smaller to make this more readable, a Done http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc@245 PS5, Line 245: context_->ReleaseCompletedResources(nullptr, true); > I think it's best to change the whole file at once, or only change occurren I fixed it while I was checking the callsites of ReleaseCompletedResources(). http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h@476 PS5, Line 476: Status AllocateUncompressedDataPage( > Should we call this "AllocateUncompressedDataBuffer"? Otherwise it sounds t That seems vaguer. It would also be confusing since the pool is data_page_pool_. I updated the comment to mention that the contents are uncompressed rather than the page. http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h@477 PS5, Line 477: int64_t size, const std::string& desc, uint8_t** buffer); > Maybe err_desc, err_detail, or detail? "desc" reminds me of descriptors. Done. Also switch to const char* so we don't need to create a std::string for every call. http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h@485 PS5, Line 485: IsStringType > This does not say "VarLenStringType" but above in a comment you refer to va Fixed it. We don't need to copy if it's a CHAR. http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.cc@1075 PS5, Line 1075: uncompressed_size, "uncompressed variable-length data", _buffer)); > DCHECK(copy_buffer != nullptr); And maybe initialize it to nullptr, so that It doesn't seem that necessary to me since it's part of the contract of the function and it will crash immediately anyway. http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.cc@1105 PS5, Line 1105: *buffer = data_page_pool_->TryAllocate(size); Bad indentation. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 5 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 26 Sep 2017 23:49:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8136 ) Change subject: IMPALA-5310: Add COMPUTE STATS TABLESAMPLE. .. Patch Set 1: (26 comments) First pass. High level approach seems reasonable to me. Will take a look at the tests next. http://gerrit.cloudera.org:8080/#/c/8136/1/be/src/exec/catalog-op-executor.cc File be/src/exec/catalog-op-executor.cc: http://gerrit.cloudera.org:8080/#/c/8136/1/be/src/exec/catalog-op-executor.cc@218 PS1, Line 218: // The first column is the COUNT(*) expr of the original query. Can you put a similar comment above L206. I had a hard time figuring out what rows[0].colVals[0] represented in L208. http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@58 PS1, Line 58: Existing partition-level row counts are not modified. Mention what happens to the partitions that don't have the row count set. http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@60 PS1, Line 60: unmodified remove http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@61 PS1, Line 61: extrapolated Maybe add some details about how extrapolation is computed. I haven't looked at the entire patch so you may mention it elsewhere. If so, you can just point to the file that provides that description. http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@62 PS1, Line 62: so as not to confuse other engines like Hive/SparkSQL which may rely on : * the shared HMS fields being accurate. That part may be confusing to someone with no background knowledge. For instance, the fact that other engines require "accurate" statistics which we update using sampling and extrapolation :), kind of oxymoron. I think it is ok to just say that we store the extrapolated stats and remove that part. Thoughts? http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@64 PS1, Line 64: Independently, row-count extrapolation is also done during planning based on the : * numRows / totalSize ratio because the table contents may have changed since the : * last time COMPUTE STATS was run. Remove and put it near the relevant code (planning). http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@77 PS1, Line 77: Not supported for now to keep the logic/code simple. Ha, this is the opposite in the COMPUTE STATS case. http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@106 PS1, Line 106: totalFileBytes_ Can't you use the HdfsTable.totalFileBytes_ instead? http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@108 PS1, Line 108: Only set when : // TABLESAMPLE was specified. Set to -1 for non-HDFS tables Maybe "Set to -1 for non-HDFS tables or when TABLESAMPLE is not specified"? http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@305 PS1, Line 305: expectAllPartitions_ = !updateTableStatsOnly() So if I do not update table stats only I should expect to compute stats on all partitions? Essentially, is isIncremental check blended in this function? I think it may make sense to extract that check. http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@344 PS1, Line 344: expectAllPartitions_ = false; Why is this needed? Isn't L305 sufficient? http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@440 PS1, Line 440: // Only add group by clause for HdfsTables. Comment doesn't seem relevant to the code that follows. http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@442 PS1, Line 442: !updateTableStatsOnly() expectAllPartitions_? http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@452 PS1, Line 452: !updateTableStatsOnly() expectAllPartitions_? http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@506 PS1, Line 506: sets 'expectAllPartitions_' to false Hm, I don't see that in the code. http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@511 PS1, Line 511: base remove?
[Impala-ASF-CR] IMPALA-5448: fix invalid number of splits reported in Parquet scan node
Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8147 Change subject: IMPALA-5448: fix invalid number of splits reported in Parquet scan node .. IMPALA-5448: fix invalid number of splits reported in Parquet scan node Parquet splits with multi columns are marked as completed by using HdfsScanNodeBase::RangeComplete(). It duplicately counts the file types as column codec types. Thus the number of parquet splits are the real count multiplies number of materialized columns. Furthermore, according to the Parquet definition, it allows mixed compression codecs on different columns. This’s handled in this patch as well. A parquet file using gzip and snappy compression codec will be reported as: FileFormats: PARQUET/(GZIP,SNAPPY):1 This patch introduces a compression types set for the above cases. Testing: Add end-to-end tests handling parquet files with all columns compressed in snappy, and handling parquet files with multi compression codec. Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1 --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/multi_compression_parquet_data/README A testdata/multi_compression_parquet_data/tinytable_0_gzip_snappy.parq A testdata/multi_compression_parquet_data/tinytable_1_snappy_gzip.parq A testdata/workloads/functional-query/queries/QueryTest/hdfs_parquet_scan_node_profile.test M tests/query_test/test_scanners.py 9 files changed, 85 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8147/1 -- To view, visit http://gerrit.cloudera.org:8080/8147 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1 Gerrit-Change-Number: 8147 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang
[Impala-ASF-CR] IMPALA-5975: Work around broken beeline clients
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8132 ) Change subject: IMPALA-5975: Work around broken beeline clients .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1268/ -- To view, visit http://gerrit.cloudera.org:8080/8132 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id8b9f3dde4445513f1f389785a002c6cc6b3dada Gerrit-Change-Number: 8132 Gerrit-PatchSet: 2 Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Tue, 26 Sep 2017 23:26:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5975: Work around broken beeline clients
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8132 ) Change subject: IMPALA-5975: Work around broken beeline clients .. Patch Set 2: Code-Review+2 Carrying +2 -- To view, visit http://gerrit.cloudera.org:8080/8132 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id8b9f3dde4445513f1f389785a002c6cc6b3dada Gerrit-Change-Number: 8132 Gerrit-PatchSet: 2 Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Tue, 26 Sep 2017 23:25:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5940: Avoid log spew by using Status::Expected.
Mostafa Mokhtar has posted comments on this change. ( http://gerrit.cloudera.org:8080/8100 ) Change subject: IMPALA-5940: Avoid log spew by using Status::Expected. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8100/2/be/src/runtime/client-cache.h File be/src/runtime/client-cache.h: http://gerrit.cloudera.org:8080/#/c/8100/2/be/src/runtime/client-cache.h@243 PS2, Line 243: return Status::NoTrace(TErrorCode::RPC_RECV_TIMEOUT, strings::Substitute( Is it necessary to have Status write the message to the log? is it not sufficient to write an info message via LOG.info. -- To view, visit http://gerrit.cloudera.org:8080/8100 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330 Gerrit-Change-Number: 8100 Gerrit-PatchSet: 2 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 26 Sep 2017 23:08:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5940: Avoid log spew by using Status::Expected.
Hello Alex Behm, Mostafa Mokhtar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8100 to look at the new patch set (#2). Change subject: IMPALA-5940: Avoid log spew by using Status::Expected. .. IMPALA-5940: Avoid log spew by using Status::Expected. In IMPALA-5926, we fixed a case where closing the session triggered a stack trace in the logs which impacted performance for short-running queries. Looking at log files from several active clusters, I identified a few other cases where we could clean up log spew with the same (trivial) approach. The following messages will no longer log: "Failed to codegen MaterializeTuple() due to unsupported type: $0" "Not implemented for this format." "ScalarFnCall Codegen not supported for CHAR" "Could not codegen CodegenMaterializeExprs: " "Could not codegen TupleRowComparator::Compare(): $0" These codegen related messages were happening at every execution node, and were therefore very common. In addition, the following messages, which were very frequent in the logs, now will log, but without the back trace: "... Client ... timed-out during recv call." "Query Id ... not found." To preserve the source of the messages, I used preprocessor macros __FILE__ and __LINE__ and stuffed them into the error string. As you can see in the worked example in the testing section below, __FILE__ includes the full source code path, which is uglier than the logging equivalent. (glog has a helper called "const_basename" which handles this at https://github.com/google/glog/blob/2a6df66252d266080489c310b8146e63b66b2add/src/utilities.cc#L279) The snippet I used to identify these was: find . -type f -name '*IMPALAD*.gz' | xargs gzcat | awk '/^I/ { if(x) { print x; } x = "" } /status.cc/ { x=" "; } { if(x) { x=x $0 } }' | sed -e 's/0x[0-9a-fx]* //g' | sed -e 's/[0-9a-f]\{16\}:[0-9a-f]*/QUERYID/g' | tr -s '\t' ' ' | tr '[0-9]' 'N' | sort | uniq -c | sort -n | tee output.txt I also analyzed some logs using SQL, against a pre-processed logs table: with v as ( select regexp_replace( regexp_replace( translate(substr(message, 42), "\n\t", " "), "[a-zA-Z0-9.-]*[.][a-zA-Z0-9-]*:[0-9]*", ""), "@.*$", "@@@...") as m from logs_table where `class`="status.cc") select m, count(*) from v group by 1 order by 2 desc limit 100 Testing: * Automated tests. * Manual testing for one NoTrace code path: $ impala-python shell/gen-py/ImpalaService/ImpalaService-remote -h localhost:21000 GetRuntimeProfile 'beeswaxd.ttypes.QueryHandle()' Traceback (most recent call last): File "shell/gen-py/ImpalaService/ImpalaService-remote", line 106, in pp.pprint(client.GetRuntimeProfile(eval(args[0]),)) File "/home/philip/src/impala/shell/gen-py/ImpalaService/ImpalaService.py", line 161, in GetRuntimeProfile return self.recv_GetRuntimeProfile() File "/home/philip/src/impala/shell/gen-py/ImpalaService/ImpalaService.py", line 184, in recv_GetRuntimeProfile raise result.error beeswaxd.ttypes.BeeswaxException: BeeswaxException(handle=QueryHandle(log_context='', id=''), log_context='', SQLState='HY000', _message='GetRuntimeProfile error: /home/philip/src/impala/be/src/service/impala-server.cc:641: Query id 0:0 not found.\n', errorCode=0) $ tail -n 1 logs/cluster/impalad.INFO I0926 14:51:19.867837 30582 status.cc:147] /home/philip/src/impala/be/src/service/impala-server.cc:641: Query id 0:0 not found. Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330 --- M be/src/common/status.cc M be/src/common/status.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exprs/scalar-fn-call.cc M be/src/runtime/client-cache.h M be/src/runtime/tuple.cc M be/src/service/impala-server.cc M be/src/util/tuple-row-compare.cc 9 files changed, 70 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8100/2 -- To view, visit http://gerrit.cloudera.org:8080/8100 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330 Gerrit-Change-Number: 8100 Gerrit-PatchSet: 2 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-5940. Avoid log spew by using Status::Expected.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8100 ) Change subject: IMPALA-5940. Avoid log spew by using Status::Expected. .. Patch Set 1: (1 comment) Thanks for the review! I added in the two messages that Mostafa mentioned on the JIRA. Doing so introduced two complexities: * Whereas I believe we could never log the messages in the first draft, the new messages being silenced here (especially RPC timeout) should be logged. * Now that there's no stack trace, the fact that the messages come as being from "status.cc" is not useful. I see the following ways forward: * Use draft 1 to silence the easy messages; ignore this new draft 2. * Use draft 2. Notably, it uses the __FILE__ preprocessor macro. __func__ is also a possibility. * Change the usage pattern. For example STATUS_AND_LOG(...) could be a macro that does VLOG and the Status generation. I was very hesitant to add macros for this so didn't go that route. * Something else? I want to highlight a few more things about patch 2: * I started off with a signature like Status(..., bool silent, bool omit_backtrace), but there were only 3 possible good states. I had bugs in my implementation, so decided on an enum instead. It's all private, so there's little API risk here. * There's a copy of ErrorMsg going on when Status::NoTrace() takes an error code and a string. We do several of them in various Status signatures. Given that we used to also compute an entire back trace, I think this is ok. * I didn't add a new test. I didn't find any tests that look like they're testing our logging behavior. Happy for suggestions, though I think it's overkill. * I don't know how to reproduce RPC timeouts. I "kill -STOP"ed an impalad, but that wasn't enough. Curious for pointers. http://gerrit.cloudera.org:8080/#/c/8100/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8100/1//COMMIT_MSG@7 PS1, Line 7: IMPALA-5940. Avoid log spew by using Status::Expected. > IMPALA-5940: (colon not dot) Done -- To view, visit http://gerrit.cloudera.org:8080/8100 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330 Gerrit-Change-Number: 8100 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 26 Sep 2017 23:03:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5975: Work around broken beeline clients
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8132 ) Change subject: IMPALA-5975: Work around broken beeline clients .. Patch Set 1: Thanks for testing! Still lgtm. -- To view, visit http://gerrit.cloudera.org:8080/8132 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id8b9f3dde4445513f1f389785a002c6cc6b3dada Gerrit-Change-Number: 8132 Gerrit-PatchSet: 1 Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Tue, 26 Sep 2017 22:39:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5975: Work around broken beeline clients
Zach Amsden has posted comments on this change. ( http://gerrit.cloudera.org:8080/8132 ) Change subject: IMPALA-5975: Work around broken beeline clients .. Patch Set 1: >From private build and test 6418, 14:21:05 Executing: create-load-data.sh 14:21:05 Loading Hive Builtins (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-hive-builtins.log)... 14:23:20 OK (Took: 2 min 15 sec) 14:23:20 Generating HBase data (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/create-hbase.log)... 14:23:49 OK (Took: 0 min 29 sec) 14:23:49 Creating /test-warehouse HDFS directory (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/create-test-warehouse-dir.log)... 14:23:53 OK (Took: 0 min 4 sec) 14:23:53 Derived params for create-load-data.sh: 14:23:53 EXPLORATION_STRATEGY=exhaustive 14:23:53 SKIP_METADATA_LOAD=0 14:23:53 SKIP_SNAPSHOT_LOAD=0 14:23:53 SNAPSHOT_FILE= 14:23:53 CM_HOST= 14:23:53 REMOTE_LOAD= 14:23:53 Starting Impala cluster (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/start-impala-cluster.log)... 14:24:06 OK (Took: 0 min 13 sec) 14:24:06 Setting up HDFS environment (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/setup-hdfs-env.log)... 14:24:21 OK (Took: 0 min 15 sec) 14:24:21 Loading custom schemas (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-custom-schemas.log)... 14:25:24 OK (Took: 1 min 3 sec) 14:25:24 Loading functional-query data (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-functional-query.log)... 15:17:32 OK (Took: 52 min 8 sec) 15:17:32 Loading TPC-H data (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-tpch.log)... 15:42:56 OK (Took: 25 min 24 sec) 15:42:56 Loading nested data (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-nested.log)... 15:48:33 OK (Took: 5 min 37 sec) 15:48:33 Loading TPC-DS data (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-tpcds.log)... 16:16:20 OK (Took: 27 min 46 sec) 16:16:20 Loading auxiliary workloads (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-aux-workloads.log)... 16:27:10 OK (Took: 10 min 51 sec) 16:27:10 Loading dependent tables (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/copy-and-load-dependent-tables.log)... 16:28:18 OK (Took: 1 min 7 sec) 16:28:18 Loading custom data (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-custom-data.log)... 16:30:16 OK (Took: 1 min 58 sec) 16:30:16 Creating many block table (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/create-table-many-blocks.log)... 16:31:18 OK (Took: 1 min 2 sec) 16:31:18 Loading Kudu functional (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-kudu.log)... 16:35:57 OK (Took: 4 min 39 sec) 16:35:57 Loading Kudu TPCH (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-kudu-tpch.log)... 16:42:17 OK (Took: 6 min 20 sec) 16:42:17 Loading Hive UDFs (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/build-and-copy-hive-udfs.log)... 16:43:46 OK (Took: 1 min 29 sec) 16:43:46 Running custom post-load steps (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/custom-post-load-steps.log)... 16:44:20 OK (Took: 0 min 34 sec) 16:44:20 Caching test tables (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/cache-test-tables.log)... 16:44:28 OK (Took: 0 min 8 sec) 16:44:28 Loading external data sources (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/load-ext-data-source.log)... 16:44:39 OK (Took: 0 min 11 sec) 16:44:39 Splitting HBase (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/create-hbase.log)... 16:45:15 OK (Took: 0 min 36 sec) 16:45:15 Creating internal HBase table (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/create-internal-hbase-table.log)... 16:46:01 OK (Took: 0 min 46 sec) 16:46:01 Computing table stats (logging to /data/jenkins/workspace/impala-private-build-and-test/repos/Impala/logs/data_loading/compute-table-stats.log)... 16:58:20 OK
[Impala-ASF-CR] IMPALA-4682 Fix IllegalStateException issue
anujphadke has posted comments on this change. ( http://gerrit.cloudera.org:8080/8143 ) Change subject: IMPALA-4682 Fix IllegalStateException issue .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/8143/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8143/1//COMMIT_MSG@7 PS1, Line 7: IMPALA-4682 Fix IllegalStateException issue Can you state the issue in short? http://gerrit.cloudera.org:8080/#/c/8143/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8143/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@731 PS1, Line 731: + selectListItem.toSql()); Can we add a test for this? -- To view, visit http://gerrit.cloudera.org:8080/8143 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I57c20aeed401275d45913fedfd61c206c38641b7 Gerrit-Change-Number: 8143 Gerrit-PatchSet: 1 Gerrit-Owner: Zoram ThangaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Tue, 26 Sep 2017 21:44:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4682 Fix IllegalStateException issue
Zoram Thanga has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8143 Change subject: IMPALA-4682 Fix IllegalStateException issue .. IMPALA-4682 Fix IllegalStateException issue When one runs a query like 'select * from t order by count(a)' we are throwing an IllegalStateException. The desired behavior here is to handle the problem as an AnalysisExeption with a more meaningful error message. This patch fixes the handling of '*' in this context, so that the error becomes "ERROR: AnalysisException: select list expression not produced by aggregation output (missing from GROUP BY clause?): *" Note that the second changed line is required because selectListItem.Expr_ is null when SelectListItem.isStar_ is true. Change-Id: I57c20aeed401275d45913fedfd61c206c38641b7 --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java 1 file changed, 1 insertion(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/8143/1 -- To view, visit http://gerrit.cloudera.org:8080/8143 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I57c20aeed401275d45913fedfd61c206c38641b7 Gerrit-Change-Number: 8143 Gerrit-PatchSet: 1 Gerrit-Owner: Zoram Thanga
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/7731 ) Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. IMPALA-5538: Use explicit catalog versions for deleted objects This commit changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. Previously, deletions of catalog objects were not explicitly annotated with the catalog version in which the deletion occured and the impalads were using the max catalog version in a catalog update in order to decide whether a deletion should be applied to the local catalog cache or not. This works correctly under the assumption that all the changes that occurred in the catalog between an update's min and max catalog version are included in that update, i.e. no gaps or missing updates. With the upcoming fix for IMPALA-5058, that constraint will be relaxed, thus allowing for gaps in the catalog updates. To avoid breaking the existing behavior, this patch introduced the following changes: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core tests. Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Reviewed-on: http://gerrit.cloudera.org:8080/7731 Reviewed-by: Dimitris TsirogiannisTested-by: Impala Public Jenkins --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalog-util.cc M be/src/catalog/catalog-util.h M be/src/catalog/catalog.cc M be/src/catalog/catalog.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M tests/statestore/test_statestore.py 21 files changed, 480 insertions(+), 441 deletions(-) Approvals: Dimitris Tsirogiannis: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-Change-Number: 7731 Gerrit-PatchSet: 9 Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7731 ) Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-Change-Number: 7731 Gerrit-PatchSet: 8 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 26 Sep 2017 20:20:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8123 ) Change subject: IMPALA-5870: Improve runtime profile for partial sort .. Patch Set 3: The GVO failed because there was a kudu insert test that checked the profile for the 'SpilledRuns: 0' line. Since that is now gone, the test checks for 'SortType: Partial' (the real point of the test was that the query wouldn't have run previously due to the mem limit, which still applies, so we're not particularly losing any coverage). -- To view, visit http://gerrit.cloudera.org:8080/8123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b Gerrit-Change-Number: 8123 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 26 Sep 2017 19:46:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8123 to look at the new patch set (#3). Change subject: IMPALA-5870: Improve runtime profile for partial sort .. IMPALA-5870: Improve runtime profile for partial sort A recent change (IMPALA-5498) added the ability to do partial sorts, which divide their input up into runs each of which is sorted individually, avoiding the need to spill. Some of the debug output wasn't updated vs. regular sorts, leading to confusion. This patch removes the counters 'SpilledRuns' and 'MergesPerformed' since they will always be 0, and it renames the 'IntialRunsCreated' counter to 'RunsCreated' since the 'Initial' refers to the fact that in a regular sort those runs may be spilled or merged. It also adds a profile info string 'SortType' that can take the values 'Total', 'TopN', or 'Partial' to reflect the type of exec node being used. Example profile snippet for a partial sort: SORT_NODE (id=2):(Total: 403.261us, non-child: 382.029us, % non-child: 94.73%) SortType: Partial ExecOption: Codegen Enabled - NumRowsPerRun: (Avg: 44 (44) ; Min: 44 (44) ; Max: 44 (44) ; Number of samples: 1) - InMemorySortTime: 34.201us - PeakMemoryUsage: 2.02 MB (2117632) - RowsReturned: 44 (44) - RowsReturnedRate: 109.11 K/sec - RunsCreated: 1 (1) - SortDataSize: 572.00 B (572) Testing: - Manually ran several sorting queries and inspected their profiles - Updated a kudu_insert test that relied on the 'SpilledRuns' counter to be 0 for a partial sort. Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b --- M be/src/exec/partial-sort-node.cc M be/src/exec/sort-node.cc M be/src/exec/topn-node.cc M be/src/runtime/sorter.cc M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test 5 files changed, 11 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8123/3 -- To view, visit http://gerrit.cloudera.org:8080/8123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b Gerrit-Change-Number: 8123 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.
Tim Wood has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/8140 ) Change subject: IMPALA-5376: Implement all TPCDS test cases or alternates for Impala. .. IMPALA-5376: Implement all TPCDS test cases or alternates for Impala. Main source for TPCDS query and result definitions: https://github.com/gregrahn/tpcds-kit. TPC-DS v2.5.0 qualification queries from G. Rahn, Cloudera, Inc. Data set constructed in mini-cluster using $IMPALA_HOME/buildall.sh -testdata This commit continues previous work on IMPALA-5376 in the ASF Impala repo and the Cloudera Gerrit service. This commit splits multi-query tests in the TPC-DS suite definition into one query and result set per test file, as the test framework requires. Names for such files have -1, -2... inner suffixes. The complete TPC-DS test suite runs with passes, skips and xfails, but no failures, as reflected by runs of $IMPALA_HOME/tests/run-tests.py query_test/test_tpcds_queries.py ... Expected result sets come from the TPC-DS kit. Some TPC-DS test cases in this commit have been modified in sematically-neutral ways so as to pass on Impala; others are marked to skip or xfail due to bugs. Tests that flap are marked to skip, with a bug ID, since they don't reliably pass or xfail. The tests/query_test/test_tpcds_queries.py driver file is authoritative for the active/skip/xfail status for each case and a brief reason. The following list describes the current status as: --- test-name deviance from TPC-DS spec changes made --- tpcds-q22a.test RESULT MISMATCH in LSD of AVG() values Fixed AVG()s --- tpcds-q30.test UNRECOGNIZED CHARACTER MARKED XFAIL, IMPALA-5961. --- tpcds-q31.test RESULT MISMATCH in LSD of DECIMAL values ADDED TRUNCATE(2)s AROUND LAST 4 COLUMNS. MARKED SKIP, IMPALA-5956 --- tpcds-q35a.test RESULT MISMATCH MARKED XFAIL, IMPALA-5950. --- tpcds-q36a.test RESULT MISMATCH MARKED XFAIL, IMPALA-4741 --- tpcds-q39.test MULTIPLE RESULT SET not recognized by test framework MARKED XFAIL. --- tpcds-q47.test RESULT MISMATCH in LSD of DECIMAL values ADDED TRUNCATE(2) TO 8th COLUMN OF WITH TABLE, TAKE ACTUAL RESULT AS EXPECTED. --- tpcds-q49.test RESULT MISMATCH in LSD of DECIMAL values MARKED XFAIL, IMPALA-5945 --- tpcds-q57.test RESULT MISMATCH, excess scale in DECIMAL values FIXED, ADDED TRUNCATE(2) AROUND 6th COLUMN. --- tpcds-q58.test RESULT MISMATCH in DECIMAL values MARKED XFAIL. IMPALA-5946 --- tpcds-q59.test RESULT MISMATCH, excess scale in DECIMAL values FIXED, ADDED TRUNCATE(2) AROUND 4th-10th COLUMNS. --- tpcds-q61.test RESULT MISMATCH in DECIMAL value FIXED. CAST RESULT QUOTIENT TO DECIMAL(15, 4), TAKE ACTUAL RESULT AS EXPECTED --- tpcds-q63.test RESULT MISMATCH, excess scale in DECIMAL values ADDED TRUNCATE(2) TO 3rd COLUMN --- tpcds-q64.test RESULT MISMATCH ADDED ORDER BY COLUMNS. --- tpcds-q66.test RESULT MISMATCH MARKED XFAIL, IMPALA-4741 --- tpcds-q77a.test RESULT MISMATCH FIXED. TAKE ACTUAL RESULT AS EXPECTED --- tpcds-q78.test RESULT MISMATCH FIXED. TAKE ACTUAL RESULT AS EXPECTED --- tpcds-q83.test RESULT MISMATCH MARKED XFAIL. IMPALA-5945. --- tpcds-q85.test MISSING TABLE "reason" MARKED XFAIL, IMPALA-5960 --- tpcds-q86a.test RESULT MISMATCH FIXED. TAKE ACTUAL RESULT AS EXPECTED --- tpcds-q89.test RESULT MISMATCH, DECIMAL values flap MARKED XFAIL. ADDED ROUND(2) TO 8th COLUMN, TAKE ACTUAL RESULTS AS EXPECTED, IMPALA-5956. --- tpcds-q90.test RESULT MISMATCH MARKED XFAIL, IMPALA-5945. --- tpcds-q93.test MISSING TABLE "reason" MARKED XFAIL, IMPALA-5960 --- tpcds-q98.test RESULT MISMATCH FIXED, ADDED ROUND() TO LAST COLUMN Trial run to suppress fractional comparison diffs. Still not sure of the general approach for DOUBLE & DECIMAL in all tests. Change-Id: I026a0123b035b6828ef35b599dda50aadf0171bb --- A testdata/workloads/tpcds/queries/tpcds-q10.test A testdata/workloads/tpcds/queries/tpcds-q10a.test A testdata/workloads/tpcds/queries/tpcds-q11.test A testdata/workloads/tpcds/queries/tpcds-q12.test A testdata/workloads/tpcds/queries/tpcds-q13.test A testdata/workloads/tpcds/queries/tpcds-q14-1.test A testdata/workloads/tpcds/queries/tpcds-q14-2.test A testdata/workloads/tpcds/queries/tpcds-q14a-1.test A testdata/workloads/tpcds/queries/tpcds-q14a-2.test A testdata/workloads/tpcds/queries/tpcds-q15.test A testdata/workloads/tpcds/queries/tpcds-q16.test A testdata/workloads/tpcds/queries/tpcds-q17.test A testdata/workloads/tpcds/queries/tpcds-q18.test A testdata/workloads/tpcds/queries/tpcds-q18a.test A testdata/workloads/tpcds/queries/tpcds-q20.test A testdata/workloads/tpcds/queries/tpcds-q21.test A testdata/workloads/tpcds/queries/tpcds-q22.test A testdata/workloads/tpcds/queries/tpcds-q22a.test M testdata/workloads/tpcds/queries/tpcds-q23-1.test M testdata/workloads/tpcds/queries/tpcds-q23-2.test A testdata/workloads/tpcds/queries/tpcds-q24-1.test A testdata/workloads/tpcds/queries/tpcds-q24-2.test A
[Impala-ASF-CR] Trial run to suppress fractional comparison diffs. Still not sure of the general approach for DOUBLE & DECIMAL in all tests.
Tim Wood has restored this change. ( http://gerrit.cloudera.org:8080/8140 ) Change subject: Trial run to suppress fractional comparison diffs. Still not sure of the general approach for DOUBLE & DECIMAL in all tests. .. Restored -- To view, visit http://gerrit.cloudera.org:8080/8140 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: restore Gerrit-Change-Id: I026a0123b035b6828ef35b599dda50aadf0171bb Gerrit-Change-Number: 8140 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Wood
[Impala-ASF-CR] [DOCS] Fill in release note subtopics for Apache Impala 2.10
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/7958 ) Change subject: [DOCS] Fill in release note subtopics for Apache Impala 2.10 .. [DOCS] Fill in release note subtopics for Apache Impala 2.10 Primarily just pointing to the list of issues in the changelog. Those cover the different use cases for the different parts of the release notes -- fixed issues, new features, and incompatible changes. Change-Id: Ide38c1e1c64dac287b180b39ebb8e7735d457ce3 Reviewed-on: http://gerrit.cloudera.org:8080/7958 Reviewed-by: Michael BrownTested-by: Impala Public Jenkins --- M docs/impala_keydefs.ditamap M docs/topics/impala_fixed_issues.xml M docs/topics/impala_incompatible_changes.xml M docs/topics/impala_new_features.xml 4 files changed, 54 insertions(+), 1 deletion(-) Approvals: Michael Brown: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ide38c1e1c64dac287b180b39ebb8e7735d457ce3 Gerrit-Change-Number: 7958 Gerrit-PatchSet: 4 Gerrit-Owner: John Russell Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] [DOCS] Fill in release note subtopics for Apache Impala 2.10
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7958 ) Change subject: [DOCS] Fill in release note subtopics for Apache Impala 2.10 .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide38c1e1c64dac287b180b39ebb8e7735d457ce3 Gerrit-Change-Number: 7958 Gerrit-PatchSet: 3 Gerrit-Owner: John RussellGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Comment-Date: Tue, 26 Sep 2017 18:44:18 + Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Fill in release note subtopics for Apache Impala 2.10
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7958 ) Change subject: [DOCS] Fill in release note subtopics for Apache Impala 2.10 .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-docs-submit/155/ -- To view, visit http://gerrit.cloudera.org:8080/7958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide38c1e1c64dac287b180b39ebb8e7735d457ce3 Gerrit-Change-Number: 7958 Gerrit-PatchSet: 3 Gerrit-Owner: John RussellGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Comment-Date: Tue, 26 Sep 2017 18:39:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8023 ) Change subject: IMPALA-4856: Port data stream service to KRPC .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.cc File be/src/runtime/krpc-data-stream-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.cc@198 PS1, Line 198: AddData Isn't the point of the deserialize pool to deserialize the payload early? Here, we're just calling AddData() on the payloads for early senders after the corresponding receiver has been created. http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-recvr.cc File be/src/runtime/krpc-data-stream-recvr.cc: http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-recvr.cc@165 PS1, Line 165: Either we'll consume a row batch from batch_queue_, or it's empty Shouldn't there always be something in the batch_queue_ if there's something in the blocked_senders_ list? Since we fill the blocked_senders_ only if the queue is at its limit. And we also logically service the batches from batch_queue_ first before servicing the batches from the blocked_senders_ list. http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-recvr.cc@166 PS1, Line 166: There is a window Just to make things clearer, could you specify what there's a window for? http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-recvr.cc@225 PS1, Line 225: There is a problem here. When we release lock_here, an arbitrary number of senders could call AddBatch(), and all their batches would get enqueued even though the ExceedsLimit() would be true. This breaks the guarantee of the queue not being over committed more than a single batch. http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-recvr.cc@284 PS1, Line 284: for (const auto& queue_entry: batch_queue_) delete queue_entry.second; batch_queue_.clear() ? -- To view, visit http://gerrit.cloudera.org:8080/8023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1 Gerrit-Change-Number: 8023 Gerrit-PatchSet: 1 Gerrit-Owner: Michael HoGerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Tue, 26 Sep 2017 18:11:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Trial run to suppress fractional comparison diffs. Still not sure of the general approach for DOUBLE & DECIMAL in all tests.
Tim Wood has abandoned this change. ( http://gerrit.cloudera.org:8080/8140 ) Change subject: Trial run to suppress fractional comparison diffs. Still not sure of the general approach for DOUBLE & DECIMAL in all tests. .. Abandoned Wrong Gerrit changeID. -- To view, visit http://gerrit.cloudera.org:8080/8140 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I026a0123b035b6828ef35b599dda50aadf0171bb Gerrit-Change-Number: 8140 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Wood
[Impala-ASF-CR] Trial run to suppress fractional comparison diffs. Still not sure of the general approach for DOUBLE & DECIMAL in all tests.
Tim Wood has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8140 Change subject: Trial run to suppress fractional comparison diffs. Still not sure of the general approach for DOUBLE & DECIMAL in all tests. .. Trial run to suppress fractional comparison diffs. Still not sure of the general approach for DOUBLE & DECIMAL in all tests. Change-Id: I026a0123b035b6828ef35b599dda50aadf0171bb --- M testdata/workloads/tpcds/queries/tpcds-q26.test M testdata/workloads/tpcds/queries/tpcds-q53.test 2 files changed, 3 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/8140/1 -- To view, visit http://gerrit.cloudera.org:8080/8140 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I026a0123b035b6828ef35b599dda50aadf0171bb Gerrit-Change-Number: 8140 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Wood
[Impala-ASF-CR] IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet
Hello Joe McDonnell, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8056 to look at the new patch set (#5). Change subject: IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet .. IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet test_scanners_fuzz.py currently tests compressed parquet but does not test uncompressed parquet. This fix adds a new test case for uncompressed parquet. Change-Id: I760de7203a51cf82b16016fa8043cadc7c8325bc --- M tests/query_test/test_scanners_fuzz.py 1 file changed, 42 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/8056/5 -- To view, visit http://gerrit.cloudera.org:8080/8056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I760de7203a51cf82b16016fa8043cadc7c8325bc Gerrit-Change-Number: 8056 Gerrit-PatchSet: 5 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Joe McDonnellGerrit-Reviewer: Pranay Singh
[Impala-ASF-CR] IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/8056 ) Change subject: IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet .. Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/8056/2/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/8056/2/tests/query_test/test_scanners_fuzz.py@97 PS2, Line 97: if table_format.file_format != 'parquet': pytest.sk > Another option is to keep the clone code in run_fuzz_test, but change the a Done http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@101 PS4, Line 101: """ Clone an existing parquet table with codec as none in the : unique database. This cloned table is passed to run_fuzz_test : which clones the table and corrupts the table. The test later : checks that there is no crash while performing SQL queries on : a corrupt table. : """ > I think this comment should focus on why this test is different from the ot Done http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@111 PS4, Line 111: db_name = unique_database > I would prefer to emphasize that the source and destination are the unique_ Done http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@117 PS4, Line 117: functional_parquet.alltypes > Can we extend this to do fuzzing on decimal_tbl as well? I was thinking thi Done http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@118 PS4, Line 118: .format(fq_tbl_name)) > This indentation is a bit awkward. I don't think .format should be on its o Moved the format in the same line as select. -- To view, visit http://gerrit.cloudera.org:8080/8056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I760de7203a51cf82b16016fa8043cadc7c8325bc Gerrit-Change-Number: 8056 Gerrit-PatchSet: 4 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Joe McDonnellGerrit-Reviewer: Pranay Singh Gerrit-Comment-Date: Tue, 26 Sep 2017 17:29:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7731 ) Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1267/ -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-Change-Number: 7731 Gerrit-PatchSet: 8 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 26 Sep 2017 16:20:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/7731 ) Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 8: Code-Review+2 Rebase and keep Dan's +2 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-Change-Number: 7731 Gerrit-PatchSet: 8 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 26 Sep 2017 16:20:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5983: Fix crash in to/from utc timestamp("10:00:00", 'MSK')
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8139 ) Change subject: IMPALA-5983: Fix crash in to/from_utc_timestamp("10:00:00", 'MSK') .. Patch Set 1: Thanks for fixing this. Can you add some tests? -- To view, visit http://gerrit.cloudera.org:8080/8139 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I592389333a16a15a680beed389e2702dfc16fe1d Gerrit-Change-Number: 8139 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Tue, 26 Sep 2017 16:16:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5983: Fix crash in to/from utc timestamp("10:00:00", 'MSK')
Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8139 Change subject: IMPALA-5983: Fix crash in to/from_utc_timestamp("10:00:00", 'MSK') .. IMPALA-5983: Fix crash in to/from_utc_timestamp("10:00:00", 'MSK') Moscow timezone is handled differrently than other timezones, and it was possible to cause unhandled boost exception by trying to convert "dateless" timestamps like "10:00:00". These timestamps cannot be handled by timestamp conversion generally, because daylight saving time logic needs month and day to work correctly. The condition HasDateOrTime() incorrectly suggested that these timestamps can be handled, so I made the condition stricter. Change-Id: I592389333a16a15a680beed389e2702dfc16fe1d --- M be/src/exprs/timestamp-functions.cc 1 file changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/8139/1 -- To view, visit http://gerrit.cloudera.org:8080/8139 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I592389333a16a15a680beed389e2702dfc16fe1d Gerrit-Change-Number: 8139 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer
[Impala-ASF-CR] IMPALA-4786: Clean up how ImpalaServers are created
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8076 ) Change subject: IMPALA-4786: Clean up how ImpalaServers are created .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/8076/2/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8076/2/be/src/service/impala-server.h@999 PS2, Line 999: /// Init() were <= 0. > add to comment: Done http://gerrit.cloudera.org:8080/#/c/8076/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8076/2/be/src/service/impala-server.cc@2078 PS2, Line 2078: } > i think we still need to reset these pointers since they hold a shared_ptr Done -- To view, visit http://gerrit.cloudera.org:8080/8076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If388c5618258a9c4529cd1d63e956566b92bd0d8 Gerrit-Change-Number: 8076 Gerrit-PatchSet: 3 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Tue, 26 Sep 2017 14:38:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4786: Clean up how ImpalaServers are created
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8076 to look at the new patch set (#3). Change subject: IMPALA-4786: Clean up how ImpalaServers are created .. IMPALA-4786: Clean up how ImpalaServers are created ImpalaServer had to be created via an awkward CreateImpalaServer() method that took a lot of arguments. This patch refactors that code (in anticipation of some KRPC changes) to follow a more normal lifecycle: 1. ImpalaServer* server = new ImpalaServer(ExecEnv*) 2. RETURN_IF_ERROR(server->Init()) // for error-returning init operations 3. RETURN_IF_ERROR(server->Start()) 4. server->Join() Also add ExecEnv::Init(), and move calls to ExecEnv::StartServices() to ImpalaServer::StartServices(). This captures a dependency that KRPC will rely on - where initialization of both ExecEnv and ImpalaServer need to happen before services are started. This sets up a clean-up of InProcessImpalaServer, which is too heavy for the work that it does. That work is deferred to a follow-on patch. This is a slightly cleaned up version of Henry's abandoned patch: https://gerrit.cloudera.org/#/c/7673/ Change-Id: If388c5618258a9c4529cd1d63e956566b92bd0d8 --- M be/src/exprs/expr-test.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/testutil/in-process-servers.cc M be/src/testutil/in-process-servers.h M be/src/util/hdfs-util-test.cc 10 files changed, 128 insertions(+), 137 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/8076/3 -- To view, visit http://gerrit.cloudera.org:8080/8076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If388c5618258a9c4529cd1d63e956566b92bd0d8 Gerrit-Change-Number: 8076 Gerrit-PatchSet: 3 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil