[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has uploaded a new patch set (#5). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this transaction: 1. Enable file handle caching by default 2. Share the file handle between scan ranges from the same file The scan range no longer maintains its own Hdfs file handle. On each read, the io thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. Scan ranges that are accessing data cached by Hdfs will get a file handle from the cache, but the file handle will be kept on the scan range for the time that the scan range is in use. This prevents the cache from closing the file handle while the data buffer is in use. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0, file handle caching is off and the previous behavior applies. Tests: test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. It queries the external table, then uses the hdfs commandline to manipulate the hdfs file (delete, move, etc). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". In the delete case, we expect zero rows. In the move case, we expect the same number of rows. TODO: 1. Determine appropriate defaults. 2. Other tests a. File overwrite b. Any others? 3. For scan ranages that use Hdfs caching, should there be some sharing at the scanner level? Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h A be/src/runtime/disk-io-mgr-handle-cache.h A be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/query_test/test_hdfs_fd_caching.py 11 files changed, 732 insertions(+), 368 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/5 -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-2550: Switch to per-query exec rpc .. Patch Set 14: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel KornackerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5197: Erroneous corrupted Parquet file message
Hello Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6787 to look at the new patch set (#6). Change subject: IMPALA-5197: Erroneous corrupted Parquet file message .. IMPALA-5197: Erroneous corrupted Parquet file message The Parquet file column reader may fail in the middle of producing a scratch tuple batch for various reasons such as exceeding memory limit or cancellation. In which case, the scratch tuple batch may not have materialized all the rows in a row group. We shouldn't erroneously report that the file is corrupted in this case as the column reader didn't completely read the entire row group. A new test case is added to verify that we won't see this error message. A new failpoint phase GETNEXT_SCANNER is also added to differentiate it from the GETNEXT in the scan node itself. Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M common/thrift/PlanNodes.thrift M tests/common/impala_service.py M tests/failure/test_failpoints.py 9 files changed, 40 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/6787/6 -- To view, visit http://gerrit.cloudera.org:8080/6787 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Wood
[Impala-ASF-CR] IMPALA-5197: Erroneous corrupted Parquet file message
Michael Ho has posted comments on this change. Change subject: IMPALA-5197: Erroneous corrupted Parquet file message .. Patch Set 5: Code-Review+2 (1 comment) Rebase. Carry +2. http://gerrit.cloudera.org:8080/#/c/6787/4/tests/failure/test_failpoints.py File tests/failure/test_failpoints.py: Line 140: # IMPALA-5197: None of the debug action should trigger corrupted file message > actions Done -- To view, visit http://gerrit.cloudera.org:8080/6787 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Wood Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization
Alex Behm has posted comments on this change. Change subject: IMPALA-5036: Parquet count star optimization .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/6812/1//COMMIT_MSG Commit Message: PS1, Line 10: statistic > How about "we use the Parquet field RowGroup.num_rows"? Works for me. http://gerrit.cloudera.org:8080/#/c/6812/1/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 440: *dst_slot = file_metadata_.row_groups[row_group_idx_].num_rows; > There's also FileMetaData::num_rows. Can't we use that instead of looping o We could, but not sure it's worth it. One scanner does not necessarily process an entire Parquet file, so we'd need to make sure that exactly one scanner thread deals with the entire file just for this special case. Taras, maybe you can take a look and see how invasive that would be? Line 1455: // Column readers are not needed because we are not reading from any columns if this > Can we then optimize something like The transformation is only valid if l_comment is non-nullable. We have no concept of nullability for HDFS tables. http://gerrit.cloudera.org:8080/#/c/6812/1/testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test File testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test: Line 34: | output: sum_zero_if_empty(functional_parquet.alltypes.parquet-stats: num_rows) > i don't know what this means. Personally, I prefer to show what is actually being executed in the explain plan. Otherwise, if something goes wrong it could be hard to debug because we do not know which code path it is taking. Do you have an alternative proposal for showing that the optimized path is being taken? How would we debug/support/test this feature? How will users understand their the query plan? Let's start a thread/doc about these usability/supportability issues. -- To view, visit http://gerrit.cloudera.org:8080/6812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP .. Patch Set 7: (2 comments) Thanks for the reviews. I'm going to squash in the "TimestampValue constructors refactoring" change [1] since that also has a +2 and it's related. 1: https://gerrit.cloudera.org/#/c/6510/11 http://gerrit.cloudera.org:8080/#/c/6526/7/be/src/exec/kudu-util.h File be/src/exec/kudu-util.h: Line 89: if (COPY_STRINGS) { > it doesn't feel like this particular if should be very performance-relevant Done http://gerrit.cloudera.org:8080/#/c/6526/7/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: Line 1: // Licensed to the Apache Software Foundation (ASF) under one > this file is included in bunch of .h files. now that you pulled out a bunch Good call -- To view, visit http://gerrit.cloudera.org:8080/6526 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5287: Test skip.header.line.count on gzip .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5287: Test skip.header.line.count on gzip .. IMPALA-5287: Test skip.header.line.count on gzip This change fixed IMPALA-4873 by adding the capability to supply a dict 'test_file_vars' to run_test_case(). Keys in this dict will be replaced with their values inside test queries before they are executed. Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 Reviewed-on: http://gerrit.cloudera.org:8080/6817 Reviewed-by: Michael BrownTested-by: Impala Public Jenkins --- M testdata/bin/generate-schema-statements.py M testdata/data/README A testdata/data/table_with_header.gz A testdata/data/table_with_header_2.gz M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/QueryTest/hdfs-text-scan-with-header.test M tests/common/impala_test_suite.py M tests/query_test/test_scanners.py 9 files changed, 75 insertions(+), 39 deletions(-) Approvals: Impala Public Jenkins: Verified Michael Brown: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization
Mostafa Mokhtar has posted comments on this change. Change subject: IMPALA-5036: Parquet count star optimization .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/6812/1/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 455: DCHECK_LE(row_group_rows_read_, file_metadata_.num_rows); What if file_metadata_.num_rows or file_metadata_.row_groups[row_group_idx_].num_rows have negative values? We have seen cases where a single file had too many rows which causes an overflow and stats had a negative value. Line 1455: // Column readers are not needed because we are not reading from any columns if this > DCHECK that there is exactly one materialized slot Can we then optimize something like select count(l_comment) from lineitem to select count(*) from lineitem The later is 7x faster. http://gerrit.cloudera.org:8080/#/c/6812/1/testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test File testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test: Line 34: | output: sum_zero_if_empty(functional_parquet.alltypes.parquet-stats: num_rows) > i don't know what this means. Why do we need to print this information in the plan? Won't this be enabled for all Parquet files moving forward? -- To view, visit http://gerrit.cloudera.org:8080/6812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5245: fix ASAN buffer-allocator-test
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5245: fix ASAN buffer-allocator-test .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5245: fix ASAN buffer-allocator-test
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5245: fix ASAN buffer-allocator-test .. IMPALA-5245: fix ASAN buffer-allocator-test * Use the allocator_may_return_null=1 ASAN option so that the allocation just fails instead of crashing the process. * Work around an bug in the implementation of the option where posix_memalign() does not return ENOMEM. I filed an upstream bug https://bugs.llvm.org/show_bug.cgi?id=32968. Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c Reviewed-on: http://gerrit.cloudera.org:8080/6819 Reviewed-by: Jim AppleTested-by: Impala Public Jenkins --- M be/src/runtime/bufferpool/buffer-allocator-test.cc M be/src/runtime/bufferpool/system-allocator.cc M be/src/testutil/mem-util.h M be/src/util/aligned-new.h M be/src/util/bloom-filter.cc M bin/run-backend-tests.sh M bin/start-catalogd.sh M bin/start-impalad.sh M bin/start-statestored.sh 9 files changed, 12 insertions(+), 10 deletions(-) Approvals: Impala Public Jenkins: Verified Jim Apple: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4866: Hash join node does not apply limits correctly
anujphadke has posted comments on this change. Change subject: IMPALA-4866: Hash join node does not apply limits correctly .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/6778/1//COMMIT_MSG Commit Message: PS1, Line 11: exchange > typo Done http://gerrit.cloudera.org:8080/#/c/6778/1/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 506: DCHECK(status.ok()); > there are a lot of other places that this loop can exit. Can't the out_batc I have moved the check outside the while loop, so that this gets invoked everytime we break out of it. Ran a bunch of queries and tried to verify that the limit gets applied correctly Found a few cases where num_rows_returned was not incremented and the ReachedLimit check was not correctly applied. PS1, Line 580: > >= ? Removed this check now. Please ignore. Line 581: DCHECK(current_probe_row_ == NULL); > I think you need to update num_rows_returned_ as well Removed this check. Please ignore. http://gerrit.cloudera.org:8080/#/c/6778/1/tests/common/test_dimensions.py File tests/common/test_dimensions.py: PS1, Line 114: # Don't run with NUM_NODES=1 due to IMPALA-561 : # ALL_CLUSTER_SIZES = [0, 1] > update comment Ran the private tests by changing this value ( but without the fix to correctly apply the limits). Current tests did not catch this. I am reverting it back and adding a few tests which can catch this issue. -- To view, visit http://gerrit.cloudera.org:8080/6778 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-5036: Parquet count star optimization .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6812/1/testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test File testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test: Line 34: | output: sum_zero_if_empty(functional_parquet.alltypes.parquet-stats: num_rows) i don't know what this means. could we reference 'count(*)' directly? that's going to be more intuitive than (num_rows) -- To view, visit http://gerrit.cloudera.org:8080/6812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4866: Hash join node does not apply limits correctly
anujphadke has uploaded a new patch set (#2). Change subject: IMPALA-4866: Hash join node does not apply limits correctly .. IMPALA-4866: Hash join node does not apply limits correctly Hash join node currently does not apply the limits correctly. This issue gets masked most of the times since the planner sticks an exchange node on top of most of the joins. This issue gets exposed when NUM_NODES=1. Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f --- M be/src/exec/partitioned-hash-join-node.cc A testdata/workloads/functional-query/queries/QueryTest/single-node-joins-with-limits.test M tests/query_test/test_join_queries.py 3 files changed, 56 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6778/2 -- To view, visit http://gerrit.cloudera.org:8080/6778 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I414124f8bb6f8b2af2df468e1c23418d05a0e29f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadkeGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP .. Patch Set 7: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/6526/7/be/src/exec/kudu-util.h File be/src/exec/kudu-util.h: Line 89: if (COPY_STRINGS) { it doesn't feel like this particular if should be very performance-relevant. let's make it a standard function again, with an extra bool param. http://gerrit.cloudera.org:8080/#/c/6526/7/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: Line 1: // Licensed to the Apache Software Foundation (ASF) under one this file is included in bunch of .h files. now that you pulled out a bunch of function bodies, those includes may not be necessary anymore. (i see one in util/promise.h, for instance). would be nice to clean that up as well. doesn't have to be as part of this change, but it's good to do some housecleaning as you go. -- To view, visit http://gerrit.cloudera.org:8080/6526 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5085: large rows in BufferedTupleStreamV2
Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2 .. IMPALA-5085: large rows in BufferedTupleStreamV2 The stream defaults to pages of default_page_len_. If a row doesn't fit in that page, it will allocate another page up to max_page_len_ bytes and append a single row to that page, then immediately advance to the next page. This means that when writing a stream, the large page only needs to be kept in memory temporarily, which helps with memory requirements. E.g. consider a hash join that is repartitioning 1 unpinned stream into 16 unpinned streams. We will need default_page_len_ * 14 + max_page_len_ * 2 buffers because when processing a large row we only need one large write buffer at a time. The large row cases are not as optimised for memory consumption or performance - queries processing very large numbers of large rows are an extreme edge case that is likely to hit other performance bottlenecks first. Pages with large rows can have up to 50% internal fragmentation. I tried to avoid adding empty pages to the stream, but it was tricky to avoid in the case of a read/write stream where the read iterator was already pointing at the write page - in that case we can end up with an empty page. To avoid duplicating more logic between AddRow() and AllocateRow() I restructured things so that AddRowSlow() is implemented in terms of AllocateRowSlow(). AllocateRow() now takes a function as an argument to populate the row. Testing: * Added tests for the case where 0 rows are added to the stream * Extend BigRow to exercise the new code. * Also test large strings and read/write streams. Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5 --- M be/src/runtime/buffered-tuple-stream-v2-test.cc M be/src/runtime/buffered-tuple-stream-v2.cc M be/src/runtime/buffered-tuple-stream-v2.h M be/src/runtime/buffered-tuple-stream-v2.inline.h M common/thrift/generate_error_codes.py 5 files changed, 537 insertions(+), 231 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6638/3 -- To view, visit http://gerrit.cloudera.org:8080/6638 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5085: large rows in BufferedTupleStreamV2
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2 .. Patch Set 3: Rebased onto the async pin change -- To view, visit http://gerrit.cloudera.org:8080/6638 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 9: (5 comments) http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/aggregation-node.cc File be/src/exec/aggregation-node.cc: Line 66: singleton_intermediate_tuple_(nullptr), initialize in .h where possible Line 88: RowDescriptor build_row_desc(intermediate_tuple_desc_, false); i would intuitively think that the build row is the input row. Line 97: RETURN_IF_ERROR(build_expr->Init(state, build_row_desc)); it feels dicey to pass a reference to an automatic variable, Init() might hold on to it (and there is no reason it shouldn't - descriptors should last for the lifetime of the query). this row descriptor should be member var and created in the c'tor. Line 232: ScalarExprEvaluator* const* evaluators = _evaluators_[0]; isn't that the same as const X**? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr.h File be/src/exprs/scalar-expr.h: Line 156: friend class AggFnEvaluator; why are these all friends? it looks like more of the "protected" functions need to be public. (why is Init() protected?) -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5158: Part 1: include untracked memory in MemTracker dumps
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-5158: Part 1: include untracked memory in MemTracker dumps .. IMPALA-5158: Part 1: include untracked memory in MemTracker dumps The MemTracker dumps are easier to understand if all children of the process tracker actually add up to the process consumption. There is always a discrepancy, which corresponds (mostly) to untracked memory. This commit adds a new line item in the MemTracker dump which is the process consumption value minus the sum of all its children. Here is an example of the new output: Process: Limit=8.35 GB Total=123.74 MB Peak=219.93 MB Free Disk IO Buffers: Total=1.37 MB Peak=1.37 MB RequestPool=default-pool: Total=609.09 KB Peak=10.14 MB Query(fd40f6b7542f7e7c:4aef2723): Total=609.09 KB Peak=1.19 MB Fragment fd40f6b7542f7e7c:4aef2723: Total=609.09 KB Peak=1.19 MB AGGREGATION_NODE (id=3): Total=24.00 KB Peak=36.00 KB Exprs: Total=4.00 KB Peak=4.00 KB NESTED_LOOP_JOIN_NODE (id=2): Total=412.90 KB Peak=412.90 KB Nested Loop Join Builder: Total=392.00 KB Peak=392.00 KB HDFS_SCAN_NODE (id=1): Total=160.00 KB Peak=768.00 KB Exprs: Total=4.00 KB Peak=4.00 KB HDFS_SCAN_NODE (id=0): Total=0 Peak=344.00 KB PLAN_ROOT_SINK: Total=0 Peak=0 CodeGen: Total=4.19 KB Peak=267.50 KB Block Manager: Limit=6.68 GB Total=0 Peak=0 RequestPool=fe-eval-exprs: Total=0 Peak=4.00 KB Untracked Memory: Total=121.78 MB Change-Id: I77e5b2ef5f85f82ee0226e4dad638233fec9428f --- M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h 2 files changed, 46 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/6820/2 -- To view, visit http://gerrit.cloudera.org:8080/6820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77e5b2ef5f85f82ee0226e4dad638233fec9428f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-2550: Switch to per-query exec rpc .. Patch Set 14: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/549/ -- To view, visit http://gerrit.cloudera.org:8080/6535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel KornackerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2550: Switch to per-query exec rpc .. Patch Set 14: Code-Review+2 fixing some 'unused results' warnings. -- To view, visit http://gerrit.cloudera.org:8080/6535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel KornackerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc
Hello Impala Public Jenkins, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6535 to look at the new patch set (#14). Change subject: IMPALA-2550: Switch to per-query exec rpc .. IMPALA-2550: Switch to per-query exec rpc Coordinator: - FragmentInstanceState -> BackendState, which in turn records FragmentInstanceStats QueryState - does query-wide setup in a separate thread (which also launches the instance exec threads) - has a query-wide 'prepared' state at which point all static setup is done and all FragmentInstanceStates are accessible Also renamed QueryExecState to ClientRequestState. Simplified handling of execution status (in FragmentInstanceState): - status only transmitted via ReportExecStatus rpc - in particular, it's not returned anymore from the Cancel rpc FIS: Fixed bugs related to partially-prepared state (in Close() and ReleaseThreadToken()) Change-Id: I20769e420711737b6b385c744cef4851cee3facd --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/codegen-anyval.h M be/src/common/status.cc M be/src/common/status.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exprs/expr-test.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/buffered-block-mgr-test.cc M be/src/runtime/buffered-tuple-stream-v2-test.cc M be/src/runtime/bufferpool/buffer-pool-test.cc A be/src/runtime/coordinator-backend-state.cc A be/src/runtime/coordinator-backend-state.h A be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/data-stream-test.cc A be/src/runtime/debug-options.cc A be/src/runtime/debug-options.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h D be/src/runtime/plan-fragment-executor.cc D be/src/runtime/plan-fragment-executor.h M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-exec-mgr.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/test-env.cc M be/src/runtime/test-env.h M be/src/scheduling/query-schedule.cc M be/src/scheduling/query-schedule.h M be/src/service/CMakeLists.txt M be/src/service/child-query.cc M be/src/service/child-query.h R be/src/service/client-request-state.cc R be/src/service/client-request-state.h M be/src/service/fe-support.cc M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-internal-service.cc M be/src/service/impala-internal-service.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/testutil/desc-tbl-builder.cc M be/src/testutil/fault-injection-util.h M be/src/util/error-util-test.cc M be/src/util/error-util.cc M be/src/util/error-util.h M be/src/util/uid-util.h M common/thrift/ExecStats.thrift M common/thrift/ImpalaInternalService.thrift M tests/common/test_result_verifier.py 73 files changed, 3,884 insertions(+), 3,782 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/6535/14 -- To view, visit http://gerrit.cloudera.org:8080/6535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel KornackerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
If you don't populate it for CREATE TABLE SORT BY() then I think you should remove it altogether. Dimitris On Mon, May 8, 2017 at 4:33 PM, Lars Volkerwrote: > > > On Tue, May 9, 2017 at 1:24 AM, Dimitris Tsirogiannis < > dtsirogian...@cloudera.com> wrote: > >> The other alternative would be to populate it with the partitioning >> columns, if any. Thoughts? >> >> > Adding the partitioning columns to the sort by list is not supported. They > can be added to the pre-insert sort not using the clustered hint. I'm not > sure though if I understood your statement correctly. > > My question was what to do if a user executes "ALTER TABLE SORT BY();", > meaning to remove all sort by columns. Should we remove all columns from > the property, or remove the property altogether? > > >> Dimitris >> >> On Mon, May 8, 2017 at 4:02 PM, Lars Volker wrote: >> >>> >>> On Mon, May 8, 2017 at 11:41 PM, Marcel Kornacker >>> wrote: >>> On Mon, May 8, 2017 at 9:50 AM, Dimitris Tsirogiannis < dtsirogian...@cloudera.com> wrote: > I believe it sends the wrong message and is probably confusing to > throw an error when someone writes a CREATE TABLE with an empty SORT BY() > but allow the same clause in an ALTER. No doubt the users can read the > documentation and figure it out but its an extra step. Also, as Marcel > mentions, scripting may be easier as you don't have to figure out whether > to add a clause or not. Anyways, the patch is great as is, I just wanted > to > mention this. > Sounds like we should allow an empty list then. >>> >>> I will change the parser accordingly. On a related note, ALTER TABLE >>> SORT BY () will leave an empty property 'sort.by.columns'. Should it remove >>> the property altogether instead? >>> > > > Dimitris > > On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker > wrote: > >> >> >> On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis < >> dtsirogian...@cloudera.com> wrote: >> >>> For consistency, I believe we should at least allow an empty SORT >>> BY() clause in the CREATE TABLE statement, but I'll defer the decision >>> to >>> Alex or Marcel. >>> >> >> Do we think there'll be scripts generating create table statements >> with empty sort-by clauses? I'm not sure there's a benefit to supporting >> that (but happy to stand corrected). >> >> >>> >>> Dimitris >>> >>> On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) < >>> ger...@cloudera.org> wrote: >>> Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 20: > There is still one thing that is not clear to me. Why is it allowed > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE > TABLE? The easiest way to specify an empty list of sort by columns during CREATE TABLE is to omit the SORT BY clause altogether. To keep things simple I did not add additional support for an empty SORT BY () clause. For ALTER TABLE there needs to be a way to specify an empty list of columns, but since SORT BY is used to identify the command, the most simple form seemed to be an empty column list(). Do you think we should allow CREATE TABLE SORT BY() in addition to specify an empty set of column? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 20 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No >>> >>> >> > >>> >> >
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 9: (10 comments) Still trying to come to grips with it all but did an initial pass over the query execution changes. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/data-sink.h File be/src/exec/data-sink.h: Line 122: /// Output expressions to convert row batches onto output values. It's confusing that these aren't used in all subclasses of DataSink (e.g. DataStreamSender, JoinBuildSink). Clarify that not all sinks use these? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: PS9, Line 69: expr_mem_pool Does this mean that multiple scanner threads share a single MemPool? That doesn't seem safe. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/sort-node.h File be/src/exec/sort-node.h: PS9, Line 68: materialize_tuple_ materialize_tuple_ doesn't seem to be defined. I think we always need these exprs anyway, right? Line 69: std::vectorslot_materialize_exprs_; See my comment about the name in sorter.h. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/subplan-node.h File be/src/exec/subplan-node.h: Line 65: /// tree rooted at 'node'. * and do any initialization that is required as a result of setting the subplan. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exec/topn-node.h File be/src/exec/topn-node.h: PS9, Line 78: slot_materialize_exprs_ output_tuple_exprs_? Usually when we have exprs materializing tuples it's named after the tuple it's materializing. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.h File be/src/exprs/agg-fn-evaluator.h: PS9, Line 94: Clone Maybe this should be called ShallowClone() to make it more explicit. Without reading the function comment, I would assume it did a deep clone. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/runtime/sorter.h File be/src/runtime/sorter.h: PS9, Line 182: slot_materialize_exprs_ Maybe something like 'sort_tuple_exprs_'? I think the important thing is that there is one expr per slot in the sort tuple. http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/util/tuple-row-compare.cc File be/src/util/tuple-row-compare.cc: Line 49: ScalarExprEvaluator::Close(key_expr_evaluators_lhs_, state); opened_ = false? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/util/tuple-row-compare.h File be/src/util/tuple-row-compare.h: PS9, Line 141: key_expr_evaluators_lhs_ ordering_expr_evaluators_lhs_ to match the expr name? -- To view, visit http://gerrit.cloudera.org:8080/5483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefdc9aeeba033355cb9497e3a5d2363627dcf2f3 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5197: Erroneous corrupted Parquet file message
Dan Hecht has posted comments on this change. Change subject: IMPALA-5197: Erroneous corrupted Parquet file message .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6787 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Wood Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 21: Rebased, added support for CREATE TABLE SORT BY() with empty sort by list. Will need to rebase again though, since some conflicting change got merge in the meantime. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 21 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#21). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Until then, it is possible to specify sort by columns using both ways at the same time and the column lists will be concatenated. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 33 files changed, 1,379 insertions(+), 89 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/21 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 21 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5715 to look at the new patch set (#10). Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build. .. IMPALA-4669: [KUTIL] Add kudu_util library to the build. A few miscellaneous changes to allow kudu_util to compile with Impala. Add kudu_version.cc to substitute for the version.cc file that is automatically built during the full Kudu build. Set LZ4_DISABLE_DEPRECATE_WARNINGS to allow Kudu's compressor utility to use deprecated names for LZ4 methods. Add NO_NVM_SUPPORT flag to Kudu build (plan to upstream this later) to disable building with nvm support, removing a library dependency. Also remove imported FindOpenSSL.cmake in favour of the standard one provided by cmake itself. Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 --- M CMakeLists.txt M be/CMakeLists.txt M be/src/common/CMakeLists.txt A be/src/common/kudu_version.cc M be/src/common/logging.cc M be/src/exec/kudu-util.h A be/src/kudu/gutil M be/src/kudu/util/CMakeLists.txt M be/src/kudu/util/cache.cc M be/src/kudu/util/compression/compression_codec.cc M be/src/kudu/util/env.cc M be/src/kudu/util/flags.cc A be/src/kudu/util/kudu_export.h M be/src/kudu/util/logging.cc M be/src/kudu/util/minidump.cc D cmake_modules/FindOpenSSL.cmake 16 files changed, 137 insertions(+), 87 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5715/10 -- To view, visit http://gerrit.cloudera.org:8080/5715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
On Tue, May 9, 2017 at 1:24 AM, Dimitris Tsirogiannis < dtsirogian...@cloudera.com> wrote: > The other alternative would be to populate it with the partitioning > columns, if any. Thoughts? > > Adding the partitioning columns to the sort by list is not supported. They can be added to the pre-insert sort not using the clustered hint. I'm not sure though if I understood your statement correctly. My question was what to do if a user executes "ALTER TABLE SORT BY();", meaning to remove all sort by columns. Should we remove all columns from the property, or remove the property altogether? > Dimitris > > On Mon, May 8, 2017 at 4:02 PM, Lars Volkerwrote: > >> >> On Mon, May 8, 2017 at 11:41 PM, Marcel Kornacker >> wrote: >> >>> >>> >>> On Mon, May 8, 2017 at 9:50 AM, Dimitris Tsirogiannis < >>> dtsirogian...@cloudera.com> wrote: >>> I believe it sends the wrong message and is probably confusing to throw an error when someone writes a CREATE TABLE with an empty SORT BY() but allow the same clause in an ALTER. No doubt the users can read the documentation and figure it out but its an extra step. Also, as Marcel mentions, scripting may be easier as you don't have to figure out whether to add a clause or not. Anyways, the patch is great as is, I just wanted to mention this. >>> >>> Sounds like we should allow an empty list then. >>> >> >> I will change the parser accordingly. On a related note, ALTER TABLE SORT >> BY () will leave an empty property 'sort.by.columns'. Should it remove the >> property altogether instead? >> >>> >>> Dimitris On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker wrote: > > > On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis < > dtsirogian...@cloudera.com> wrote: > >> For consistency, I believe we should at least allow an empty SORT >> BY() clause in the CREATE TABLE statement, but I'll defer the decision to >> Alex or Marcel. >> > > Do we think there'll be scripts generating create table statements > with empty sort-by clauses? I'm not sure there's a benefit to supporting > that (but happy to stand corrected). > > >> >> Dimitris >> >> On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) < >> ger...@cloudera.org> wrote: >> >>> Lars Volker has posted comments on this change. >>> >>> Change subject: IMPALA-4166: Add SORT BY sql clause >>> >>> .. >>> >>> >>> Patch Set 20: >>> >>> > There is still one thing that is not clear to me. Why is it allowed >>> > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE >>> > TABLE? >>> >>> The easiest way to specify an empty list of sort by columns during >>> CREATE TABLE is to omit the SORT BY clause altogether. To keep things >>> simple I did not add additional support for an empty SORT BY () clause. >>> >>> For ALTER TABLE there needs to be a way to specify an empty list of >>> columns, but since SORT BY is used to identify the command, the most >>> simple >>> form seemed to be an empty column list(). >>> >>> Do you think we should allow CREATE TABLE SORT BY() in addition to >>> specify an empty set of column? >>> >>> -- >>> To view, visit http://gerrit.cloudera.org:8080/6495 >>> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings >>> >>> Gerrit-MessageType: comment >>> Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 >>> Gerrit-PatchSet: 20 >>> Gerrit-Project: Impala-ASF >>> Gerrit-Branch: master >>> Gerrit-Owner: Lars Volker >>> Gerrit-Reviewer: Alex Behm >>> Gerrit-Reviewer: Dimitris Tsirogiannis >>> Gerrit-Reviewer: Lars Volker >>> Gerrit-Reviewer: Marcel Kornacker >>> Gerrit-Reviewer: Thomas Tauber-Marshall >>> Gerrit-HasComments: No >>> >> >> > >>> >> >
[Impala-ASF-CR] Bump Kudu version to 7533364
Impala Public Jenkins has posted comments on this change. Change subject: Bump Kudu version to 7533364 .. Patch Set 2: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/545/ -- To view, visit http://gerrit.cloudera.org:8080/6797 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I88dc2d425bd3aff70c95d51818d0450709123d27 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
The other alternative would be to populate it with the partitioning columns, if any. Thoughts? Dimitris On Mon, May 8, 2017 at 4:02 PM, Lars Volkerwrote: > > On Mon, May 8, 2017 at 11:41 PM, Marcel Kornacker > wrote: > >> >> >> On Mon, May 8, 2017 at 9:50 AM, Dimitris Tsirogiannis < >> dtsirogian...@cloudera.com> wrote: >> >>> I believe it sends the wrong message and is probably confusing to throw >>> an error when someone writes a CREATE TABLE with an empty SORT BY() but >>> allow the same clause in an ALTER. No doubt the users can read the >>> documentation and figure it out but its an extra step. Also, as Marcel >>> mentions, scripting may be easier as you don't have to figure out whether >>> to add a clause or not. Anyways, the patch is great as is, I just wanted to >>> mention this. >>> >> >> Sounds like we should allow an empty list then. >> > > I will change the parser accordingly. On a related note, ALTER TABLE SORT > BY () will leave an empty property 'sort.by.columns'. Should it remove the > property altogether instead? > >> >> >>> >>> >>> Dimitris >>> >>> On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker >>> wrote: >>> On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis < dtsirogian...@cloudera.com> wrote: > For consistency, I believe we should at least allow an empty SORT BY() > clause in the CREATE TABLE statement, but I'll defer the decision to Alex > or Marcel. > Do we think there'll be scripts generating create table statements with empty sort-by clauses? I'm not sure there's a benefit to supporting that (but happy to stand corrected). > > Dimitris > > On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) < > ger...@cloudera.org> wrote: > >> Lars Volker has posted comments on this change. >> >> Change subject: IMPALA-4166: Add SORT BY sql clause >> >> .. >> >> >> Patch Set 20: >> >> > There is still one thing that is not clear to me. Why is it allowed >> > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE >> > TABLE? >> >> The easiest way to specify an empty list of sort by columns during >> CREATE TABLE is to omit the SORT BY clause altogether. To keep things >> simple I did not add additional support for an empty SORT BY () clause. >> >> For ALTER TABLE there needs to be a way to specify an empty list of >> columns, but since SORT BY is used to identify the command, the most >> simple >> form seemed to be an empty column list(). >> >> Do you think we should allow CREATE TABLE SORT BY() in addition to >> specify an empty set of column? >> >> -- >> To view, visit http://gerrit.cloudera.org:8080/6495 >> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings >> >> Gerrit-MessageType: comment >> Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 >> Gerrit-PatchSet: 20 >> Gerrit-Project: Impala-ASF >> Gerrit-Branch: master >> Gerrit-Owner: Lars Volker >> Gerrit-Reviewer: Alex Behm >> Gerrit-Reviewer: Dimitris Tsirogiannis >> Gerrit-Reviewer: Lars Volker >> Gerrit-Reviewer: Marcel Kornacker >> Gerrit-Reviewer: Thomas Tauber-Marshall >> Gerrit-HasComments: No >> > > >>> >> >
[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build. .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/5715/7/be/src/kudu/util/status.h File be/src/kudu/util/status.h: Line 122: #ifndef KUDU_LOG > Fixed it properly: the alternative definition was coming from stubs.h, but I don't see a change here -- To view, visit http://gerrit.cloudera.org:8080/5715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-2550: Switch to per-query exec rpc .. Patch Set 13: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/544/ -- To view, visit http://gerrit.cloudera.org:8080/6535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel KornackerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
On Mon, May 8, 2017 at 11:41 PM, Marcel Kornackerwrote: > > > On Mon, May 8, 2017 at 9:50 AM, Dimitris Tsirogiannis < > dtsirogian...@cloudera.com> wrote: > >> I believe it sends the wrong message and is probably confusing to throw >> an error when someone writes a CREATE TABLE with an empty SORT BY() but >> allow the same clause in an ALTER. No doubt the users can read the >> documentation and figure it out but its an extra step. Also, as Marcel >> mentions, scripting may be easier as you don't have to figure out whether >> to add a clause or not. Anyways, the patch is great as is, I just wanted to >> mention this. >> > > Sounds like we should allow an empty list then. > I will change the parser accordingly. On a related note, ALTER TABLE SORT BY () will leave an empty property 'sort.by.columns'. Should it remove the property altogether instead? > > >> >> >> Dimitris >> >> On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker >> wrote: >> >>> >>> >>> On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis < >>> dtsirogian...@cloudera.com> wrote: >>> For consistency, I believe we should at least allow an empty SORT BY() clause in the CREATE TABLE statement, but I'll defer the decision to Alex or Marcel. >>> >>> Do we think there'll be scripts generating create table statements with >>> empty sort-by clauses? I'm not sure there's a benefit to supporting that >>> (but happy to stand corrected). >>> >>> Dimitris On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) < ger...@cloudera.org> wrote: > Lars Volker has posted comments on this change. > > Change subject: IMPALA-4166: Add SORT BY sql clause > .. > > > Patch Set 20: > > > There is still one thing that is not clear to me. Why is it allowed > > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE > > TABLE? > > The easiest way to specify an empty list of sort by columns during > CREATE TABLE is to omit the SORT BY clause altogether. To keep things > simple I did not add additional support for an empty SORT BY () clause. > > For ALTER TABLE there needs to be a way to specify an empty list of > columns, but since SORT BY is used to identify the command, the most > simple > form seemed to be an empty column list(). > > Do you think we should allow CREATE TABLE SORT BY() in addition to > specify an empty set of column? > > -- > To view, visit http://gerrit.cloudera.org:8080/6495 > To unsubscribe, visit http://gerrit.cloudera.org:8080/settings > > Gerrit-MessageType: comment > Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 > Gerrit-PatchSet: 20 > Gerrit-Project: Impala-ASF > Gerrit-Branch: master > Gerrit-Owner: Lars Volker > Gerrit-Reviewer: Alex Behm > Gerrit-Reviewer: Dimitris Tsirogiannis > Gerrit-Reviewer: Lars Volker > Gerrit-Reviewer: Marcel Kornacker > Gerrit-Reviewer: Thomas Tauber-Marshall > Gerrit-HasComments: No > >>> >> >
[Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP
Matthew Jacobs has uploaded a new patch set (#7). Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP .. IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP Adds Impala support for TIMESTAMP types stored in Kudu. Impala stores TIMESTAMP values in 96-bits and has nanosecond precision. Kudu's timestamp is a 64-bit microsecond delta from the Unix epoch (called UNIXTIME_MICROS), so a conversion is necessary. When writing to Kudu, TIMESTAMP values in nanoseconds are averaged to the nearest microsecond. When reading from Kudu, the KuduScanner returns UNIXTIME_MICROS with 8bytes of padding so Impala can convert the value to a TimestampValue in-line and copy the entire row. TODO: Kudu still needs to provide a knob to enable this: https://gerrit.cloudera.org/#/c/6624/ Testing: Updated the functional_kudu schema to use TIMESTAMPs instead of converting to STRING, so this provides some decent coverage. Some BE tests were added, and some EE tests as well. TODO: Support pushing down TIMESTAMP predicates TODO: Support TIMESTAMPs in range partitioning expressions Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d --- M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/cast-functions-ir.cc M be/src/exprs/kudu-partition-expr.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h A be/src/runtime/timestamp-value.inline.h M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/util/KuduUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/datasets/functional/functional_schema_template.sql A testdata/workloads/functional-query/queries/QueryTest/kudu-overflow-ts-abort-on-error.test A testdata/workloads/functional-query/queries/QueryTest/kudu-overflow-ts.test M testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test M tests/query_test/test_kudu.py M tests/query_test/test_queries.py M tests/query_test/test_scanners.py 26 files changed, 497 insertions(+), 231 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6526/7 -- To view, visit http://gerrit.cloudera.org:8080/6526 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types
Hello Marcel Kornacker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6563 to look at the new patch set (#12). Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types .. IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types This change adds functionality to write and read parquet::Statistics for Decimal, String, and Timestamp values. As an exception, we don't read statistics for CHAR columns, since CHAR support is broken in Impala (IMPALA-1652). This change also switches from using the deprecated fields 'min' and 'max' to populate the new fields 'min_value' and 'max_value' in parquet::Statistics, that were added in parquet-format pull request #46. The HdfsParquetScanner will preferably read the new fields if they are populated and if the column order 'TypeDefinedOrder' has been used to compute the statistics. For columns without a column order set or with only the deprecated fields populated, the scanner will read them only if they are of simple numeric type, i.e. boolean, integer, or floating point. This change removes the validation of the Parquet Statistics we write to Hive from the tests, since Hive does not write the new fields. Instead it adds a parquet file written by Hive that uses the deprecated fields for its statistics. It uses that file to exercise the fallback logic for supported types in a test. This change also cleans up the interface of ParquetPlainEncoder in parquet-common.h. Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exec/parquet-plain-test.cc M be/src/util/dict-encoding.h M common/thrift/parquet.thrift M testdata/data/README A testdata/data/deprecated_statistics.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test M testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test M tests/query_test/test_insert_parquet.py M tests/query_test/test_parquet_stats.py 19 files changed, 976 insertions(+), 349 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6563/12 -- To view, visit http://gerrit.cloudera.org:8080/6563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types
Lars Volker has posted comments on this change. Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types .. Patch Set 11: Thanks for having a look. -- To view, visit http://gerrit.cloudera.org:8080/6563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types
Lars Volker has posted comments on this change. Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/6563/11/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test File testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test: Line 184: aggregation(SUM, NumDictFilteredRowGroups): 2 > I don't think it will reduce coverage. '01/01/11' is a value that doesn't e Done -- To view, visit http://gerrit.cloudera.org:8080/6563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5158: Part 1: include untracked memory in MemTracker dumps
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/6820 Change subject: IMPALA-5158: Part 1: include untracked memory in MemTracker dumps .. IMPALA-5158: Part 1: include untracked memory in MemTracker dumps The MemTracker dumps are easier to understand if all children of the process tracker actually add up to the process consumption. There is always a discrepancy, which corresponds (mostly) to untracked memory. This commit adds a new line item in the MemTracker dump which is the process consumption value minus the sum of all its children. Here is an example of the new output: Process: Limit=8.35 GB Total=123.74 MB Peak=219.93 MB Free Disk IO Buffers: Total=1.37 MB Peak=1.37 MB RequestPool=default-pool: Total=609.09 KB Peak=10.14 MB Query(fd40f6b7542f7e7c:4aef2723): Total=609.09 KB Peak=1.19 MB Fragment fd40f6b7542f7e7c:4aef2723: Total=609.09 KB Peak=1.19 MB AGGREGATION_NODE (id=3): Total=24.00 KB Peak=36.00 KB Exprs: Total=4.00 KB Peak=4.00 KB NESTED_LOOP_JOIN_NODE (id=2): Total=412.90 KB Peak=412.90 KB Nested Loop Join Builder: Total=392.00 KB Peak=392.00 KB HDFS_SCAN_NODE (id=1): Total=160.00 KB Peak=768.00 KB Exprs: Total=4.00 KB Peak=4.00 KB HDFS_SCAN_NODE (id=0): Total=0 Peak=344.00 KB PLAN_ROOT_SINK: Total=0 Peak=0 CodeGen: Total=4.19 KB Peak=267.50 KB Block Manager: Limit=6.68 GB Total=0 Peak=0 RequestPool=fe-eval-exprs: Total=0 Peak=4.00 KB Untracked Memory: Total=121.78 MB Change-Id: I77e5b2ef5f85f82ee0226e4dad638233fec9428f --- M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h 2 files changed, 47 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/6820/1 -- To view, visit http://gerrit.cloudera.org:8080/6820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I77e5b2ef5f85f82ee0226e4dad638233fec9428f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/6563/11/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test File testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test: Line 184: aggregation(SUM, NumDictFilteredRowGroups): 2 > Joe, stats filtering reduced the number or row groups that make it into dic I don't think it will reduce coverage. '01/01/11' is a value that doesn't exist, and I just want to make sure we can eliminate row groups based on that. Could you add a line showing that stats filtering is eliminating the rest: aggregation(SUM, NumStatsFilteredRowGroups): 22 -- To view, visit http://gerrit.cloudera.org:8080/6563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5207,IMPALA-5214: distcc fixes
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5207,IMPALA-5214: distcc fixes .. IMPALA-5207,IMPALA-5214: distcc fixes enable_distcc should reset IMPALA_DISTCC_ENABLED, otherwise disable_distcc is not reversible. Remove the requirement of a toolchain at /opt/Impala-Toolchain. We can easily identify paths starting with $IMPALA_TOOLCHAIN and remap them in distcc.sh. Testing: Did a local build with distcc with IMPALA_TOOLCHAIN at a different location. Tried toggling disable_distcc and enable_distcc. Change-Id: Ic6456d0101cd15287c543cb576be6cd2391f1f26 Reviewed-on: http://gerrit.cloudera.org:8080/6655 Reviewed-by: Tim ArmstrongReviewed-by: Matthew Jacobs Tested-by: Impala Public Jenkins --- M bin/distcc/README.md M bin/distcc/distcc.sh M bin/distcc/distcc_env.sh 3 files changed, 12 insertions(+), 25 deletions(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, approved Tim Armstrong: Looks good to me, but someone else must approve -- To view, visit http://gerrit.cloudera.org:8080/6655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ic6456d0101cd15287c543cb576be6cd2391f1f26 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5197: Erroneous corrupted Parquet file message
Alex Behm has posted comments on this change. Change subject: IMPALA-5197: Erroneous corrupted Parquet file message .. Patch Set 4: Code-Review+1 (1 comment) Thanks for trying alternate testing routes. I'm ok with these changes. http://gerrit.cloudera.org:8080/#/c/6787/4/tests/failure/test_failpoints.py File tests/failure/test_failpoints.py: Line 140: # IMPALA-5197: None of the debug action should trigger corrupted file message actions -- To view, visit http://gerrit.cloudera.org:8080/6787 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Wood Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types
Lars Volker has posted comments on this change. Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/6563/11/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test File testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test: Line 184: aggregation(SUM, NumDictFilteredRowGroups): 2 Joe, stats filtering reduced the number or row groups that make it into dict filtering. Will this reduce test coverage for dict filtering? -- To view, visit http://gerrit.cloudera.org:8080/6563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types
Hello Marcel Kornacker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6563 to look at the new patch set (#11). Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types .. IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types This change adds functionality to write and read parquet::Statistics for Decimal, String, and Timestamp values. As an exception, we don't read statistics for CHAR columns, since CHAR support is broken in Impala (IMPALA-1652). This change also switches from using the deprecated fields 'min' and 'max' to populate the new fields 'min_value' and 'max_value' in parquet::Statistics, that were added in parquet-format pull request #46. The HdfsParquetScanner will preferably read the new fields if they are populated and if the column order 'TypeDefinedOrder' has been used to compute the statistics. For columns without a column order set or with only the deprecated fields populated, the scanner will read them only if they are of simple numeric type, i.e. boolean, integer, or floating point. This change removes the validation of the Parquet Statistics we write to Hive from the tests, since Hive does not write the new fields. Instead it adds a parquet file written by Hive that uses the deprecated fields for its statistics. It uses that file to exercise the fallback logic for supported types in a test. This change also cleans up the interface of ParquetPlainEncoder in parquet-common.h. Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exec/parquet-plain-test.cc M be/src/util/dict-encoding.h M common/thrift/parquet.thrift M testdata/data/README A testdata/data/deprecated_statistics.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test M testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test M tests/query_test/test_insert_parquet.py M tests/query_test/test_parquet_stats.py 19 files changed, 975 insertions(+), 349 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6563/11 -- To view, visit http://gerrit.cloudera.org:8080/6563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.
Henry Robinson has posted comments on this change. Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build. .. Patch Set 8: (2 comments) PS8 is just a rebase. PS9 includes the review responses. http://gerrit.cloudera.org:8080/#/c/5715/7/be/src/kudu/util/compression/compression_codec.cc File be/src/kudu/util/compression/compression_codec.cc: Line 28: #ifdef DISALLOW_COPY_AND_ASSIGN > Maybe comment that snappy.h redefines this so that we don't lose track of w Done http://gerrit.cloudera.org:8080/#/c/5715/7/be/src/kudu/util/status.h File be/src/kudu/util/status.h: Line 122: #ifndef KUDU_LOG > Is this still needed? I don't see how this can get defined twice. Fixed it properly: the alternative definition was coming from stubs.h, but we can prevent that include with a compile-time definition. -- To view, visit http://gerrit.cloudera.org:8080/5715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5715 to look at the new patch set (#9). Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build. .. IMPALA-4669: [KUTIL] Add kudu_util library to the build. A few miscellaneous changes to allow kudu_util to compile with Impala. Add kudu_version.cc to substitute for the version.cc file that is automatically built during the full Kudu build. Set LZ4_DISABLE_DEPRECATE_WARNINGS to allow Kudu's compressor utility to use deprecated names for LZ4 methods. Add NO_NVM_SUPPORT flag to Kudu build (plan to upstream this later) to disable building with nvm support, removing a library dependency. Also remove imported FindOpenSSL.cmake in favour of the standard one provided by cmake itself. Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 --- M CMakeLists.txt M be/CMakeLists.txt M be/src/common/CMakeLists.txt A be/src/common/kudu_version.cc M be/src/common/logging.cc M be/src/exec/kudu-util.h A be/src/kudu/gutil M be/src/kudu/util/CMakeLists.txt M be/src/kudu/util/cache.cc M be/src/kudu/util/compression/compression_codec.cc M be/src/kudu/util/env.cc M be/src/kudu/util/flags.cc A be/src/kudu/util/kudu_export.h M be/src/kudu/util/logging.cc M be/src/kudu/util/minidump.cc M be/src/kudu/util/status.h D cmake_modules/FindOpenSSL.cmake 17 files changed, 139 insertions(+), 87 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5715/9 -- To view, visit http://gerrit.cloudera.org:8080/5715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP .. Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: Line 150: row_format_flags |= kudu::client::KuduScanner::PAD_UNIXTIME_MICROS_TO_16_BYTES; > why start with no_flags instead of setting this directly in l149? Done Line 201: if (slot->type().type != TYPE_TIMESTAMP) continue; > pull out timestamp slots into a separate vector (which you set up in prepar Done http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/exec/kudu-util.cc File be/src/exec/kudu-util.cc: Line 111: Status WriteKuduRowValue(kudu::KuduPartialRow* row, int col, PrimitiveType type, > fix signature (output params go at end). Done. I think the order may have been such that copy_strings can have a default parameter. I moved 'copy_strings' to a template variable though to avoid branching when calling this. That also means this fn is moved to the header, feel free to push back if you think that warrants more build-time impact than it is worth. http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: Line 19: #include "runtime/timestamp-value.inline.h" > remove this, unless you're calling these functions here (which doesn't appe Done Line 98: if (UNLIKELY(nullptr == localtime_r(, ))) { > fix order Done http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: Line 25 > are there other includes you can remove? Looks like I can remove another boost header. I think I still need gregorian and local_time for types used in function definitions in this file. compiler_config appears to affect compilation options [1], and seems risky to touch without understanding it more. 1: http://www.boost.org/doc/libs/1_64_0/boost/date_time/compiler_config.hpp http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/runtime/timestamp-value.inline.h File be/src/runtime/timestamp-value.inline.h: Line 34: bool TimestampValue::UtcToUnixTime(time_t* unix_time) const { > you need to prefix all of these with inline. Done -- To view, visit http://gerrit.cloudera.org:8080/6526 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5287: Test skip.header.line.count on gzip .. Patch Set 4: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/547/ -- To view, visit http://gerrit.cloudera.org:8080/6817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip
Lars Volker has posted comments on this change. Change subject: IMPALA-5287: Test skip.header.line.count on gzip .. Patch Set 3: (1 comment) Thanks for the reviews! http://gerrit.cloudera.org:8080/#/c/6817/3/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: PS3, Line 2092: '${{env:IMPALA_HOME}} > Sorry that you did more work. I just wanted to know the difference and why No worries, I had tried {impala_home} myself but it didn't work. Your question made me curious why, so I learned something. -- To view, visit http://gerrit.cloudera.org:8080/6817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5239: transfer reservations between trackers
Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-5239: transfer reservations between trackers .. IMPALA-5239: transfer reservations between trackers This is a primitive needed to implement claiming and distribution of initial reservations. It supports transferring reservation between any two ReservationTrackers under the same query. Testing: * Added a test that exercises transfer between trackers at different levels and with different relationships. Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3 --- M be/src/runtime/bufferpool/reservation-tracker-test.cc M be/src/runtime/bufferpool/reservation-tracker.cc M be/src/runtime/bufferpool/reservation-tracker.h 3 files changed, 243 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/6708/3 -- To view, visit http://gerrit.cloudera.org:8080/6708 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I21f008abaf1aa4fcd2d854769a603b97589af3b3 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-5197: Erroneous corrupted Parquet file message
Michael Ho has uploaded a new patch set (#4). Change subject: IMPALA-5197: Erroneous corrupted Parquet file message .. IMPALA-5197: Erroneous corrupted Parquet file message The Parquet file column reader may fail in the middle of producing a scratch tuple batch for various reasons such as exceeding memory limit or cancellation. In which case, the scratch tuple batch may not have materialized all the rows in a row group. We shouldn't erroneously report that the file is corrupted in this case as the column reader didn't completely read the entire row group. A new test case is added to verify that we won't see this error message. A new failpoint phase GETNEXT_SCANNER is also added to differentiate it from the GETNEXT in the scan node itself. Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M common/thrift/PlanNodes.thrift M tests/common/impala_service.py M tests/failure/test_failpoints.py 9 files changed, 40 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/6787/4 -- To view, visit http://gerrit.cloudera.org:8080/6787 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Wood
[Impala-ASF-CR] IMPALA-5245: fix ASAN buffer-allocator-test
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5245: fix ASAN buffer-allocator-test .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/546/ -- To view, visit http://gerrit.cloudera.org:8080/6819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip
Michael Brown has posted comments on this change. Change subject: IMPALA-5287: Test skip.header.line.count on gzip .. Patch Set 4: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/6817/3/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: PS3, Line 2092: '{impala_home}/testda > Specifying {impala_home} here didn't work, since it does not get replaced i Sorry that you did more work. I just wanted to know the difference and why it existed. -- To view, visit http://gerrit.cloudera.org:8080/6817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization
Lars Volker has posted comments on this change. Change subject: IMPALA-5036: Parquet count star optimization .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/6812/1//COMMIT_MSG Commit Message: PS1, Line 10: statistic > Suggestions? How important do you think it is to distinguish the Parquet nu How about "we use the Parquet field RowGroup.num_rows"? http://gerrit.cloudera.org:8080/#/c/6812/1/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 440: *dst_slot = file_metadata_.row_groups[row_group_idx_].num_rows; There's also FileMetaData::num_rows. Can't we use that instead of looping over the row groups? -- To view, visit http://gerrit.cloudera.org:8080/6812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5197: Erroneous corrupted Parquet file message
Michael Ho has posted comments on this change. Change subject: IMPALA-5197: Erroneous corrupted Parquet file message .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6787/3/tests/failure/test_failpoints.py File tests/failure/test_failpoints.py: PS3, Line 140: IMPALA-5251 Oops. I meant IMPALA-5197. -- To view, visit http://gerrit.cloudera.org:8080/6787 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Wood Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5184: build fe against both Hive 1 & 2 APIs
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5184: build fe against both Hive 1 & 2 APIs .. IMPALA-5184: build fe against both Hive 1 & 2 APIs This adds a compatibility shim layer with Hive 1 and Hive 2 implementations. The version-specific code lives in fe/src/compat-hive-$IMPALA_HIVE_MAJOR_VERSION and common/thrift/hive-$IMPALA_HIVE_MAJOR_VERSION-api/ The shim adds wrapper methods to handle differing method signatures and and config variables that changed slightly. Some thrift classes were also moved from the 'cli' to 'rpc' package. We work around these by implementing subclasses with the same name in a different package for compatibility or by implementing shim methods that operate on the classes. We also need to change the package in the TCLIService.thrift, which is done with a search-and-replace. Also avoid the sticky config variable problem with some of the source paths by requiring an _OVERRIDE suffix on the variable to override it from the environment. Testing: Made sure that I could build Impala on master as normal, and also with the following config overrides in bin/impala-config-local.sh: export IMPALA_HADOOP_VERSION=3.0.0-alpha1-cdh6.x-SNAPSHOT export IMPALA_HBASE_VERSION=2.0.0-cdh6.x-SNAPSHOT export IMPALA_HIVE_VERSION=2.1.0-cdh6.x-SNAPSHOT export IMPALA_SENTRY_VERSION=1.5.1-cdh6.x-SNAPSHOT export IMPALA_PARQUET_VERSION=1.5.0-cdh6.x-SNAPSHOT I manually assembled the dependencies by copying the following files from the Hive 2 source and from a Hadoop 3 build: $CDH_COMPONENTS_HOME/hive-2.1.0-cdh6.x-SNAPSHOT/src/metastore/if/hive_metastore.thrift $CDH_COMPONENTS_HOME/hadoop-3.0.0-alpha1-cdh6.x-SNAPSHOT/lib/native/* $CDH_COMPONENTS_HOME/hadoop-3.0.0-alpha1-cdh6.x-SNAPSHOT/include/hdfs.h Change-Id: Ifbc265281c04fe3136bc3c920dbac966742ce09a Reviewed-on: http://gerrit.cloudera.org:8080/5538 Reviewed-by: Tim ArmstrongTested-by: Impala Public Jenkins --- M bin/impala-config.sh M common/thrift/.gitignore M common/thrift/CMakeLists.txt R common/thrift/hive-1-api/TCLIService.thrift M fe/pom.xml A fe/src/compat-hive-1/java/org/apache/hive/service/rpc/thrift/TGetCatalogsReq.java A fe/src/compat-hive-1/java/org/apache/hive/service/rpc/thrift/TGetColumnsReq.java A fe/src/compat-hive-1/java/org/apache/hive/service/rpc/thrift/TGetFunctionsReq.java A fe/src/compat-hive-1/java/org/apache/hive/service/rpc/thrift/TGetInfoReq.java A fe/src/compat-hive-1/java/org/apache/hive/service/rpc/thrift/TGetSchemasReq.java A fe/src/compat-hive-1/java/org/apache/hive/service/rpc/thrift/TGetTablesReq.java A fe/src/compat-hive-1/java/org/apache/impala/compat/MetastoreShim.java A fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateDataSrcStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateDbStmt.java M fe/src/main/java/org/apache/impala/analysis/DropDataSrcStmt.java M fe/src/main/java/org/apache/impala/analysis/Subquery.java M fe/src/main/java/org/apache/impala/analysis/TableName.java M fe/src/main/java/org/apache/impala/analysis/TypeDef.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/service/FrontendTest.java 27 files changed, 492 insertions(+), 85 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ifbc265281c04fe3136bc3c920dbac966742ce09a Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5184: build fe against both Hive 1 & 2 APIs
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5184: build fe against both Hive 1 & 2 APIs .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifbc265281c04fe3136bc3c920dbac966742ce09a Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5197: Erroneous corrupted Parquet file message
Michael Ho has posted comments on this change. Change subject: IMPALA-5197: Erroneous corrupted Parquet file message .. Patch Set 3: PS3 added a new assert in test_failpoints.py and removed the new test in test_scanners.py. -- To view, visit http://gerrit.cloudera.org:8080/6787 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Wood Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip
Lars Volker has posted comments on this change. Change subject: IMPALA-5287: Test skip.header.line.count on gzip .. Patch Set 3: (5 comments) Thanks for the comments. Please see PS4. http://gerrit.cloudera.org:8080/#/c/6817/3/testdata/data/README File testdata/data/README: PS3, Line 100: ,, > double comma Done http://gerrit.cloudera.org:8080/#/c/6817/3/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: PS3, Line 2092: '${{env:IMPALA_HOME}} > How does this differ from {impala_home} ? Specifying {impala_home} here didn't work, since it does not get replaced in DEPENDENT_LOAD sections. This is due to the somewhat convoluted way we build these. Instead, one pair of {} will be stripped by the .format() call and the result will be passed to Hive, which (I think) will then expand the variable. I added the replacement. Should we replace all other occurrences in this file in this commit? http://gerrit.cloudera.org:8080/#/c/6817/3/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: PS3, Line 288: replace > "replaced" Done Line 289: values in queries before they are executed. > Mention that callers need to be careful about key name choices. For example Done http://gerrit.cloudera.org:8080/#/c/6817/3/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: PS3, Line 685: cls.ImpalaTestMatrix.add_constraint(lambda v:\ : v.get_value('table_format').file_format == 'text' and\ > The backslashes are not needed. Fixed here and elsewhere. -- To view, visit http://gerrit.cloudera.org:8080/6817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6817 to look at the new patch set (#4). Change subject: IMPALA-5287: Test skip.header.line.count on gzip .. IMPALA-5287: Test skip.header.line.count on gzip This change fixed IMPALA-4873 by adding the capability to supply a dict 'test_file_vars' to run_test_case(). Keys in this dict will be replaced with their values inside test queries before they are executed. Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 --- M testdata/bin/generate-schema-statements.py M testdata/data/README A testdata/data/table_with_header.gz A testdata/data/table_with_header_2.gz M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/QueryTest/hdfs-text-scan-with-header.test M tests/common/impala_test_suite.py M tests/query_test/test_scanners.py 9 files changed, 75 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/6817/4 -- To view, visit http://gerrit.cloudera.org:8080/6817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-4192: Disentangle Expr and ExprContext
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4192: Disentangle Expr and ExprContext .. Patch Set 9: (45 comments) first half http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exec/exec-node.h File be/src/exec/exec-node.h: Line 285: boost::scoped_ptr expr_mem_tracker_; > There will be one per PlanNode in the future. Currently, it still counts to if this is per querystate, the mempool would need to be thread-safe, not sure that's worth it (and if this is for evaluators, why have it per plannode?). also, since we now clearly distinguish exprs and their evaluators, and this is specifically for evaluators, let's express that in the variable name. expr_eval_mem_pool_? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.cc File be/src/exprs/agg-fn-evaluator.cc: Line 96: const AggFn& agg_fn, AggFnEvaluator** result) { move input params to front Line 161: Status AggFnEvaluator::Open(vector& evaluators, RuntimeState* state) { why not const&? Line 514: for (int i = 0; i < evaluators.size(); ++i) { single line, here and subsequent functions, where possible http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn-evaluator.h File be/src/exprs/agg-fn-evaluator.h: Line 80: /// and caches all constant input arguments. regarding the caching: is there already a todo to move that into the expr setup (as opposed to evaluator setup)? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/agg-fn.cc File be/src/exprs/agg-fn.cc: Line 68: input_expr->AssignFnCtxIdx(_ctx_idx); scalarexpr does that in create. do the same for this class? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.cc File be/src/exprs/expr.cc: Line 87: const int num_children = nodes[*node_idx].num_children; we don't use 'const ..' elsewhere for automatic variables. Line 88: const bool isAggFn = root->IsAggFn(); formatting Line 92: // Reset the function context index for each input expression. what is this referring to? http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 81: /// status on failure. > 'fn_ctx_idx_ptr' removed. It doesn't take Expr** as 'root' is expected to b should this be a private function and the two subclasses be friends (to call this)? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/expr.h File be/src/exprs/expr.h: Line 82: static Status CreateTree(ObjectPool* pool, const TExpr& texpr, Expr* root); this is really CreateChildren, not CreateTree. move in/out params to end Line 134: /// Creates an expr tree with root 'parent' via depth-first traversal. misleading: the root of the created tree is not 'parent' (the root is installed as a new child into 'parent') Line 135: /// Called recursively to create expr trees of sub-expressions. "for subexpressions" Line 140: /// node_idx: index in 'nodes' of the root of the next child TExprNode tree let's call this root_idx, makes it easier to follow Line 146: static Status CreateTreeFromThrift(ObjectPool* pool, while we're cleaning up, move in/out params to end http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr-evaluator.cc File be/src/exprs/scalar-expr-evaluator.cc: Line 1: // Licensed to the Apache Software Foundation (ASF) under one could you flag what's new in this file that i should look at? Line 69: inited_(false), initialize vars in .h where practical Line 82: (*evaluator)->fn_ctxs_.clear(); why do you need that (for a newly created object)? Line 86: for (FunctionContext* fn_ctx : (*evaluator)->fn_ctxs_) DCHECK(fn_ctx != nullptr); do that in CreateFnCtxs? Line 133: if (opened_) return Status::OK(); what's the need for guarding against multiple open calls? (how would that happen?) Line 138: is_clone_? FunctionContext::THREAD_LOCAL : FunctionContext::FRAGMENT_LOCAL; is this really instance_local? if so, leave todo in the right places (presumably at least udf.h) Line 155: delete fn_ctxs_[i]; instead put a unique_ptr into fn_ctxs? http://gerrit.cloudera.org:8080/#/c/5483/7/be/src/exprs/scalar-expr-evaluator.h File be/src/exprs/scalar-expr-evaluator.h: Line 72: std::vector * evaluators) WARN_UNUSED_RESULT; > Yes, ideally, that should be done in Expr::Init(). This is being tracked by todo? Line 111: /// should only be called after Open() has been called on this expr. Returns an error if > There is subtlety in this. "expr" is potentially a sub-expression of root_ but why not just call GetValue(nullptr) on the subexpr? http://gerrit.cloudera.org:8080/#/c/5483/9/be/src/exprs/scalar-expr-evaluator.h File be/src/exprs/scalar-expr-evaluator.h: Line 55: /// rooted at a node is defind by [fn_ctx_idx_start_, fn_ctx_idx_end_). defined Line 65: static Status Create(ObjectPool* pool, RuntimeState* state, MemPool* mem_pool,
[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build. .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/5715/7/be/src/kudu/util/compression/compression_codec.cc File be/src/kudu/util/compression/compression_codec.cc: Line 28: #ifdef DISALLOW_COPY_AND_ASSIGN Maybe comment that snappy.h redefines this so that we don't lose track of why this is here. http://gerrit.cloudera.org:8080/#/c/5715/7/be/src/kudu/util/status.h File be/src/kudu/util/status.h: Line 122: #ifndef KUDU_LOG Is this still needed? I don't see how this can get defined twice. -- To view, visit http://gerrit.cloudera.org:8080/5715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5036: Parquet count star optimization
Alex Behm has posted comments on this change. Change subject: IMPALA-5036: Parquet count star optimization .. Patch Set 1: (36 comments) http://gerrit.cloudera.org:8080/#/c/6812/1//COMMIT_MSG Commit Message: PS1, Line 10: statistic > I would prefer to use a different word than "statistic" in this context. We Suggestions? How important do you think it is to distinguish the Parquet num_rows from parquet::Statistics? In the not-too-distant future we will use parquet::Statistics for optimizing min/max aggregates also. Seems fair enough to call all these "Parquet stats". Line 13: Mention the new rewrite rule http://gerrit.cloudera.org:8080/#/c/6812/1/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 426: // Populate the slot with Parquet num rows statistic. Populate the single slot with the Parquet num rows statistic. Line 431: memset(tuple_buf, 0, tuple_buf_size); Don't think we usually do this, and I don't think it's necessary. Line 432: while (!row_batch->AtCapacity()) { Do we have a suitable file with many row groups for testing this logic? We can decrease the batch_size in the test. Line 488: DCHECK(row_group_idx_ <= file_metadata_.row_groups.size()); Why this change? Line 1455: // Column readers are not needed because we are not reading from any columns if this DCHECK that there is exactly one materialized slot http://gerrit.cloudera.org:8080/#/c/6812/1/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 158: const bool optimize_parquet_count_star() { return optimize_parquet_count_star_; } const function Line 373: bool optimize_parquet_count_star_; const http://gerrit.cloudera.org:8080/#/c/6812/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 585: if (copiedInputExprs.size() != inputExprs.size()) return; Have you tried also substituting the FunctionCallExpr.mergeAggInputFn_ in AggregateInfo.substitute() so the Update() and Merge() are consistent again? I understand what the issue is, but this does not seem quite right. http://gerrit.cloudera.org:8080/#/c/6812/1/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java: Line 331:* Return true if the slots being materialized are all partition columns. The new behavior is tricky to reason about, can we simplify it? Line 335: if (materializedSlots.isEmpty()) return true; I think this might break the behavior of OPTIMIZE_PARTITION_KEY_SCANS=true for unpartitioned tables. What happens when you SET OPTIMIZE_PARTITION_KEY_SCANS=true and then run "select count(distinct 1) from unpartitioned_table"? http://gerrit.cloudera.org:8080/#/c/6812/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: Line 131: // Set to true when this scan node can optimize a count(*) query by populating the Sentence is a little misleading because this flag is initialized to true if the query block has an aggregation that is amenable to being optimized with Parquet stats. But then the flag is changed to have a different meaning, so it's a little difficult to understand what it does. How about something along the lines of this: 1. Pass in a 'private final' flag that indicates whether the aggregation is amenable to optimization. This flag never changes. 2. If the optimization can be performed, set the smap. Use the smap nullness to determine whether the optimization was applied. Line 136: protected ExprSubstitutionMap aggSmap_; // Should be applied to the AggregateInfo from the same query block. Expand the comment to explain why we cannot use PlanNode.outputSmap_ for this purpose. Suggest: optimizedAggSmap_ Line 243: if (optimizeParquetCountStar_ && fileFormats.size() == 1 && conjuncts_.isEmpty() && Create a helper function for this optimization, and describe in its function comment what this optimization does and how it works. In particular, that we are adding a new scan slot. Line 245: // Create two functions that we will put into an smap. We want to replace the Preconditions.checkState(desc_.getPath().destTable() == null); Preconditions.checkState(collectionConjuncts_.isEmpty()); The first condition asserts that we are not scanning a nested collection. Line 246: // count function with a sum function. count() and sum() mention that it is a special sum() function http://gerrit.cloudera.org:8080/#/c/6812/1/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: Line 213:* @param limit can just remove this http://gerrit.cloudera.org:8080/#/c/6812/1/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File
[Impala-ASF-CR] IMPALA-5197: Erroneous corrupted Parquet file message
Michael Ho has uploaded a new patch set (#3). Change subject: IMPALA-5197: Erroneous corrupted Parquet file message .. IMPALA-5197: Erroneous corrupted Parquet file message The Parquet file column reader may fail in the middle of producing a scratch tuple batch for various reasons such as exceeding memory limit or cancellation. In which case, the scratch tuple batch may not have materialized all the rows in a row group. We shouldn't erroneously report that the file is corrupted in this case as the column reader didn't completely read the entire row group. A new test case is added to verify that we won't see this error message. A new failpoint phase GETNEXT_SCANNER is also added to differentiate it from the GETNEXT in the scan node itself. Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M common/thrift/PlanNodes.thrift M tests/common/impala_service.py M tests/failure/test_failpoints.py 9 files changed, 40 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/6787/3 -- To view, visit http://gerrit.cloudera.org:8080/6787 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Wood
[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip
David Knupp has posted comments on this change. Change subject: IMPALA-5287: Test skip.header.line.count on gzip .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6817/1/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: PS1, Line 688: > Cool idea. I did this. Can you add this JIRA to the commit message? :-) https://issues.apache.org/jira/browse/IMPALA-4873 -- To view, visit http://gerrit.cloudera.org:8080/6817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip
Michael Brown has posted comments on this change. Change subject: IMPALA-5287: Test skip.header.line.count on gzip .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/6817/3/testdata/data/README File testdata/data/README: PS3, Line 100: ,, double comma http://gerrit.cloudera.org:8080/#/c/6817/3/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: PS3, Line 2092: '${{env:IMPALA_HOME}} How does this differ from {impala_home} ? http://gerrit.cloudera.org:8080/#/c/6817/3/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: PS3, Line 288: replace "replaced" Line 289: values in queries before they are executed. Mention that callers need to be careful about key name choices. For example, '$FILESYSTEM_PREFIX' is a bad choice for a key when using this dict. http://gerrit.cloudera.org:8080/#/c/6817/3/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: PS3, Line 685: cls.ImpalaTestMatrix.add_constraint(lambda v:\ : v.get_value('table_format').file_format == 'text' and\ The backslashes are not needed. -- To view, visit http://gerrit.cloudera.org:8080/6817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6817 to look at the new patch set (#3). Change subject: IMPALA-5287: Test skip.header.line.count on gzip .. IMPALA-5287: Test skip.header.line.count on gzip This change adds the capability to supply a dict 'test_file_vars' to run_test_case(). Keys in this dict will be replaced with their values inside test queries before they are executed. Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 --- M testdata/data/README A testdata/data/table_with_header.gz A testdata/data/table_with_header_2.gz M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/QueryTest/hdfs-text-scan-with-header.test M tests/common/impala_test_suite.py M tests/query_test/test_scanners.py 8 files changed, 60 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/6817/3 -- To view, visit http://gerrit.cloudera.org:8080/6817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip
Lars Volker has posted comments on this change. Change subject: IMPALA-5287: Test skip.header.line.count on gzip .. Patch Set 2: (1 comment) Thanks for the review comments and the good suggestions. Please see PS 3. http://gerrit.cloudera.org:8080/#/c/6817/1/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: PS1, Line 688: > I didn't realize there was magic that changed the default database dependin Cool idea. I did this. -- To view, visit http://gerrit.cloudera.org:8080/6817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5169: Add support for async pins in buffer pool
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5169: Add support for async pins in buffer pool .. Patch Set 7: Rebased -- To view, visit http://gerrit.cloudera.org:8080/6612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5169: Add support for async pins in buffer pool
Tim Armstrong has uploaded a new patch set (#7). Change subject: IMPALA-5169: Add support for async pins in buffer pool .. IMPALA-5169: Add support for async pins in buffer pool Makes Pin() do async reads behind-the-scenes, instead of blocking until the read completes. The blocking is done instead when the client tries to access the buffer via PageHandle::GetBuffer() or ExtractBuffer(). This is implemented with a new sub-state of "pinned" where the page has a buffer and consumes reservation but the buffer does not contain valid data. Motivation: This unlocks various opportunities to overlap read I/Os with other work: * Reads to different disks can execute in parallel * I/O and computation can be overlapped. This initially benefits BufferedTupleStream::PinStream(), where many pages are pinned at once. With this change the reads run asynchronously. This can potentially lead to large speedups when spilling. E.g. if the pages for a Hash Join's partition are spread across 10 disks, we could get 10x the read throughput, plus overlap the I/O with hash table build. In future we can use this to do read-ahead over unpinned BufferedTupleStreams or for unpinned Runs in Sorter, but this requires changes to the client code to Pin() pages in advance. Testing: * BufferedTupleStreamV2 already exercises this. * Various BufferPool tests already exercise this. * Added a basic test to cover edge cases made possible by the new state transitions. * Extended the randomised test to cover this. Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79 --- M be/src/runtime/buffered-tuple-stream-v2.cc M be/src/runtime/buffered-tuple-stream-v2.h M be/src/runtime/bufferpool/buffer-pool-internal.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h 8 files changed, 418 insertions(+), 153 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6612/7 -- To view, visit http://gerrit.cloudera.org:8080/6612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibdf074c1ac4405d6f08d623ba438a85f7d39fd79 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6817 to look at the new patch set (#2). Change subject: IMPALA-5287: Test skip.header.line.count on gzip .. IMPALA-5287: Test skip.header.line.count on gzip This change adds the capability to supply a dict 'test_file_vars' to run_test_case(). Keys in this dict will be replaced with their values inside test queries before they are executed. Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 --- A testdata/data/table_with_header.gz A testdata/data/table_with_header_2.gz M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-query/queries/QueryTest/hdfs-text-scan-with-header.test M tests/common/impala_test_suite.py M tests/query_test/test_scanners.py 7 files changed, 46 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/6817/2 -- To view, visit http://gerrit.cloudera.org:8080/6817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] Bump Kudu version to 7533364
Impala Public Jenkins has posted comments on this change. Change subject: Bump Kudu version to 7533364 .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/545/ -- To view, visit http://gerrit.cloudera.org:8080/6797 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I88dc2d425bd3aff70c95d51818d0450709123d27 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] Bump Kudu version to 7533364
Matthew Jacobs has posted comments on this change. Change subject: Bump Kudu version to 7533364 .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6797 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I88dc2d425bd3aff70c95d51818d0450709123d27 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip
Michael Brown has posted comments on this change. Change subject: IMPALA-5287: Test skip.header.line.count on gzip .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6817/1/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: PS1, Line 688: > Good catch. I want to use the default databases for the scans, so that the I didn't realize there was magic that changed the default database depending on fileformat/encoding/compression/etc. I was going to suggest you preserve the unique_database usage and access the 'mixed' table via full path: "$DATABASE.mixed". But I just looked at the code, and use_db is overloaded to deal with both what you want to use, and what I was going to suggest, and both aren't possible together, I think. What you really need is a way to pass in another database to be used in substitution in this test and access it in the test files, say "$DB2.mixed". Idea: What do you think about adding a new parameter to run_test_case() that takes in a dictionary, where the keys are testfile substitution variables and the values are the value to be substituted? This means you don't add a separate parameter "db2" to run_test_suite() but instead support arbitrary substitution. The literal way this would be done would be: def test_text_scanner_with_header(self, vector, unique_database): self.run_test_case( 'QueryTest/hdfs-text-scan-with-header', vector, testfile_vars={ '$DB2': unique_database, }) run_test_case() would just iterate over the keys and values in testfile_vars, and your testfile now can specify any variables it wants for substitution. Your default database will still be correct based on file format etc. You refer to the mixed table as "$DB2.mixed". -- To view, visit http://gerrit.cloudera.org:8080/6817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5166: clean up BufferPool counters
Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-5166: clean up BufferPool counters .. IMPALA-5166: clean up BufferPool counters Misc changes to improve usability of the profiles. * Separate out detailed BufferPool metrics into a "Buffer pool" sub-profile. * Only create the limit counter if there is a limit * Show BufferPool using in query MemTracker (it was accidentally disabled before because there was no query-level profile). * Reduce clutter in MemTracker dump by only showing buffer pool reservation, not usage (the usage was misleading anyway because it didn't include child usage). * Remove TotalUnpinnedBytes, which had limited value - WriteIoBytes and PeakUnpinnedBytes can answer most of the same questions - i.e. did it unpin any pages, and how many did it need to write to disk. * Add buffer pool metrics to /memz (if buffer pool is enabled) and reorder /memz so more useful information is up the top. Change-Id: I34b7f4d94c3d396ac89026c7559d6b2c6e02697c --- M be/src/runtime/bufferpool/buffer-allocator.cc M be/src/runtime/bufferpool/buffer-pool-counters.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/reservation-tracker.cc M be/src/runtime/mem-tracker.cc M be/src/util/default-path-handlers.cc M be/src/util/memory-metrics.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M common/thrift/metrics.json M www/memz.tmpl 12 files changed, 137 insertions(+), 90 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/6690/3 -- To view, visit http://gerrit.cloudera.org:8080/6690 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34b7f4d94c3d396ac89026c7559d6b2c6e02697c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-2550: Switch to per-query exec rpc .. Patch Set 13: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/544/ -- To view, visit http://gerrit.cloudera.org:8080/6535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel KornackerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2550: Switch to per-query exec rpc .. Patch Set 13: Code-Review+2 rebase -- To view, visit http://gerrit.cloudera.org:8080/6535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel KornackerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6535 to look at the new patch set (#13). Change subject: IMPALA-2550: Switch to per-query exec rpc .. IMPALA-2550: Switch to per-query exec rpc Coordinator: - FragmentInstanceState -> BackendState, which in turn records FragmentInstanceStats QueryState - does query-wide setup in a separate thread (which also launches the instance exec threads) - has a query-wide 'prepared' state at which point all static setup is done and all FragmentInstanceStates are accessible Also renamed QueryExecState to ClientRequestState. Simplified handling of execution status (in FragmentInstanceState): - status only transmitted via ReportExecStatus rpc - in particular, it's not returned anymore from the Cancel rpc FIS: Fixed bugs related to partially-prepared state (in Close() and ReleaseThreadToken()) Change-Id: I20769e420711737b6b385c744cef4851cee3facd --- M be/src/benchmarks/expr-benchmark.cc M be/src/codegen/codegen-anyval.h M be/src/common/status.cc M be/src/common/status.h M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hdfs-avro-table-writer.cc M be/src/exec/hdfs-avro-table-writer.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/exec/hdfs-table-writer.cc M be/src/exec/hdfs-table-writer.h M be/src/exprs/expr-test.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/buffered-block-mgr-test.cc M be/src/runtime/buffered-tuple-stream-v2-test.cc M be/src/runtime/bufferpool/buffer-pool-test.cc A be/src/runtime/coordinator-backend-state.cc A be/src/runtime/coordinator-backend-state.h A be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/data-stream-test.cc A be/src/runtime/debug-options.cc A be/src/runtime/debug-options.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h D be/src/runtime/plan-fragment-executor.cc D be/src/runtime/plan-fragment-executor.h M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-exec-mgr.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/test-env.cc M be/src/runtime/test-env.h M be/src/scheduling/query-schedule.cc M be/src/scheduling/query-schedule.h M be/src/service/CMakeLists.txt M be/src/service/child-query.cc M be/src/service/child-query.h R be/src/service/client-request-state.cc R be/src/service/client-request-state.h M be/src/service/fe-support.cc M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-http-handler.cc M be/src/service/impala-internal-service.cc M be/src/service/impala-internal-service.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/testutil/desc-tbl-builder.cc M be/src/testutil/fault-injection-util.h M be/src/util/error-util-test.cc M be/src/util/error-util.cc M be/src/util/error-util.h M be/src/util/uid-util.h M common/thrift/ExecStats.thrift M common/thrift/ImpalaInternalService.thrift M tests/common/test_result_verifier.py 73 files changed, 3,873 insertions(+), 3,775 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/6535/13 -- To view, visit http://gerrit.cloudera.org:8080/6535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel KornackerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip
Lars Volker has posted comments on this change. Change subject: IMPALA-5287: Test skip.header.line.count on gzip .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6817/1/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: PS1, Line 688: > Can you talk about why you removed unique_database? When I look at QueryTes Good catch. I want to use the default databases for the scans, so that the test runs with functional and functional_text_gzip. Do you see a better option than splitting it up into two tests, one for the scans and one containing the DDLs? -- To view, visit http://gerrit.cloudera.org:8080/6817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6626 to look at the new patch set (#9). Change subject: IMPALA-5220: memory maintenance cleanup .. IMPALA-5220: memory maintenance cleanup Remove logic that tries to release pages from TcMalloc's page heap: TCMalloc's behaviour changed so that it automatically does this with "aggressive decommit" and committed spans can't accumulate in the page heap. Also increase the memory maintenance interval - 1s is too aggressive and can free memory that will be imminently reused by a running query, e.g. particularly buffer pool buffers. Testing: Tried to reproduce IMPALA-2800 in a couple of ways: * Ran a big aggregation locally and cancelled it * Looked at memz/ of some live clusters (production and stress test). In all cases "Bytes in page heap freelist" was 0. This confirms that IMPALA-2800 was already solved, probably by the gperftools 2.5 upgrade, where aggressive decommit would mean that memory is released to the system in free() instead of the ReleaseFreeMemory() callst. I was able to confirm that the ReleaseFreeMemory() calls were unnecessary to avoid retaining memory by running a couple of stress tests locally with this patch and checking that "Bytes in page heap freelist" was 0 after the change and that memory consumption was generally sensible. Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701 --- M be/src/common/init.cc M be/src/runtime/exec-env.cc M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h M be/src/util/memory-metrics.h 5 files changed, 25 insertions(+), 89 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6626/9 -- To view, visit http://gerrit.cloudera.org:8080/6626 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6626 to look at the new patch set (#8). Change subject: IMPALA-5220: memory maintenance cleanup .. IMPALA-5220: memory maintenance cleanup Remove logic that tries to release pages from TcMalloc's page heap: TCMalloc's behaviour changed so that it automatically does this with "aggressive decommit" and committed spans can't accumulate in the page heap. Also increase the memory maintenance interval - 1s is too aggressive and can free memory that will be imminently reused by a running query, e.g. particularly buffer pool buffers. Testing: Tried to reproduce IMPALA-2800 in a couple of ways: * Ran a big aggregation locally and cancelled it * Looked at memz/ of some live clusters (production and stress test). In all cases "Bytes in page heap freelist" was 0. This confirms that IMPALA-2800 was already solved, probably by the gperftools 2.5 upgrade, where aggressive decommit would mean that memory is released to the system in free() instead of the ReleaseFreeMemory() callst. I was able to confirm that the ReleaseFreeMemory() calls were unnecessary to avoid retaining memory by running a couple of stress tests locally with this patch and checking that "Bytes in page heap freelist" was 0 after the change and that memory consumption was generally sensible. Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701 --- M be/src/common/init.cc M be/src/runtime/exec-env.cc M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h M be/src/util/memory-metrics.h 5 files changed, 25 insertions(+), 89 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6626/8 -- To view, visit http://gerrit.cloudera.org:8080/6626 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-5220: memory maintenance cleanup
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5220: memory maintenance cleanup .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/6626/7/be/src/common/init.cc File be/src/common/init.cc: PS7, Line 137: whenever : // consumption is checked > is that true? consumption() doesn't refresh it -- I think only mem-tracker You're right. Updated the comment. -- To view, visit http://gerrit.cloudera.org:8080/6626 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0f822b294ab253d6f2828fc52f353aecaaf9b701 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip
Michael Brown has posted comments on this change. Change subject: IMPALA-5287: Test skip.header.line.count on gzip .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/6817/1/testdata/workloads/functional-query/queries/QueryTest/hdfs-text-scan-with-header.test File testdata/workloads/functional-query/queries/QueryTest/hdfs-text-scan-with-header.test: PS1, Line 115: drop table if exists mixed; DDL http://gerrit.cloudera.org:8080/#/c/6817/1/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: PS1, Line 688: Can you talk about why you removed unique_database? When I look at QueryTest/hdfs-text-scan-with-header I see DDL statements which could cause collisions. There are 4 versions of this test in exhaustive: -- To view, visit http://gerrit.cloudera.org:8080/6817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 20: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 20 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5207,IMPALA-5214: distcc fixes
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5207,IMPALA-5214: distcc fixes .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6655/3/bin/distcc/distcc_env.sh File bin/distcc/distcc_env.sh: PS3, Line 101: > why is this not necessary? switch_compiler does it already -- To view, visit http://gerrit.cloudera.org:8080/6655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic6456d0101cd15287c543cb576be6cd2391f1f26 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5207,IMPALA-5214: distcc fixes
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5207,IMPALA-5214: distcc fixes .. Patch Set 3: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/543/ -- To view, visit http://gerrit.cloudera.org:8080/6655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic6456d0101cd15287c543cb576be6cd2391f1f26 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5245: fix ASAN buffer-allocator-test
Jim Apple has posted comments on this change. Change subject: IMPALA-5245: fix ASAN buffer-allocator-test .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5245: fix ASAN buffer-allocator-test
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5245: fix ASAN buffer-allocator-test .. Patch Set 2: I ran buffer-allocator-test locally with an ASAN build. I was able to reproduce the problem before the fix and confirmed it went away after the fix. -- To view, visit http://gerrit.cloudera.org:8080/6819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip
Dan Hecht has posted comments on this change. Change subject: IMPALA-5287: Test skip.header.line.count on gzip .. Patch Set 1: Update testdata/data/README to describe those two new files? -- To view, visit http://gerrit.cloudera.org:8080/6817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5287: Test skip.header.line.count on gzip
Alex Behm has posted comments on this change. Change subject: IMPALA-5287: Test skip.header.line.count on gzip .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie3f3c29a42501cfb2751f7ad0af166eb88f63b70 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5207,IMPALA-5214: distcc fixes
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5207,IMPALA-5214: distcc fixes .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/6655/3/bin/distcc/distcc_env.sh File bin/distcc/distcc_env.sh: PS3, Line 101: why is this not necessary? -- To view, visit http://gerrit.cloudera.org:8080/6655 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic6456d0101cd15287c543cb576be6cd2391f1f26 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5245: fix ASAN buffer-allocator-test
Jim Apple has posted comments on this change. Change subject: IMPALA-5245: fix ASAN buffer-allocator-test .. Patch Set 2: Looks good. One last question - how did you test? -- To view, visit http://gerrit.cloudera.org:8080/6819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5245: fix ASAN buffer-allocator-test
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5245: fix ASAN buffer-allocator-test .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6819/1/be/src/runtime/bufferpool/system-allocator.cc File be/src/runtime/bufferpool/system-allocator.cc: Line 124: // Workaround ASAN bug where posix_memalign returns 0 even when allocation fails. > There are a couple of other places where posix_memalign is called, too. Can I think we can avoid adding the workaround there since OOM in those places is unexpected (and ASAN is not a production config, so it's fine to fail hard). I added DCHECKs instead. -- To view, visit http://gerrit.cloudera.org:8080/6819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5245: fix ASAN buffer-allocator-test
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-5245: fix ASAN buffer-allocator-test .. IMPALA-5245: fix ASAN buffer-allocator-test * Use the allocator_may_return_null=1 ASAN option so that the allocation just fails instead of crashing the process. * Work around an bug in the implementation of the option where posix_memalign() does not return ENOMEM. I filed an upstream bug https://bugs.llvm.org/show_bug.cgi?id=32968. Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c --- M be/src/runtime/bufferpool/buffer-allocator-test.cc M be/src/runtime/bufferpool/system-allocator.cc M be/src/testutil/mem-util.h M be/src/util/aligned-new.h M be/src/util/bloom-filter.cc M bin/run-backend-tests.sh M bin/start-catalogd.sh M bin/start-impalad.sh M bin/start-statestored.sh 9 files changed, 12 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/6819/2 -- To view, visit http://gerrit.cloudera.org:8080/6819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types
Hello Marcel Kornacker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6563 to look at the new patch set (#10). Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types .. IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types This change adds functionality to write and read parquet::Statistics for Decimal, String, and Timestamp values. As an exception, we don't read statistics for CHAR columns, since CHAR support is broken in Impala (IMPALA-1652). This change also switches from using the deprecated fields 'min' and 'max' to populate the new fields 'min_value' and 'max_value' in parquet::Statistics, that were added in parquet-format pull request #46. The HdfsParquetScanner will preferably read the new fields if they are populated and if the column order 'TypeDefinedOrder' has been used to compute the statistics. For columns without a column order set or with only the deprecated fields populated, the scanner will read them only if they are of simple numeric type, i.e. boolean, integer, or floating point. This change removes the validation of the Parquet Statistics we write to Hive from the tests, since Hive does not write the new fields. Instead it adds a parquet file written by Hive that uses the deprecated fields for its statistics. It uses that file to exercise the fallback logic for supported types in a test. This change also cleans up the interface of ParquetPlainEncoder in parquet-common.h. Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exec/parquet-plain-test.cc M be/src/util/dict-encoding.h M common/thrift/parquet.thrift M testdata/data/README A testdata/data/deprecated_statistics.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test M testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test M tests/query_test/test_insert_parquet.py M tests/query_test/test_parquet_stats.py 18 files changed, 974 insertions(+), 348 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6563/10 -- To view, visit http://gerrit.cloudera.org:8080/6563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5184: build fe against both Hive 1 & 2 APIs
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5184: build fe against both Hive 1 & 2 APIs .. Patch Set 11: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/541/ -- To view, visit http://gerrit.cloudera.org:8080/5538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifbc265281c04fe3136bc3c920dbac966742ce09a Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5245: fix ASAN buffer-allocator-test
Jim Apple has posted comments on this change. Change subject: IMPALA-5245: fix ASAN buffer-allocator-test .. Patch Set 1: (1 comment) Thanks for fixing this! http://gerrit.cloudera.org:8080/#/c/6819/1/be/src/runtime/bufferpool/system-allocator.cc File be/src/runtime/bufferpool/system-allocator.cc: Line 124: // Workaround ASAN bug where posix_memalign returns 0 even when allocation fails. There are a couple of other places where posix_memalign is called, too. Can you fix those, too? -- To view, visit http://gerrit.cloudera.org:8080/6819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f295f841b8b6e7996d54bdf2dfc0a092dcb864c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Jim Apple Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types
Lars Volker has posted comments on this change. Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types .. Patch Set 8: (2 comments) Thanks for the review. I addressed your comments in PS9. Next, I'll rebase the change. http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/parquet-column-stats.cc File be/src/exec/parquet-column-stats.cc: Line 104: case TYPE_DECIMAL: > doesn't look like it does, but if i'm wrong, that would need to be disabled The stats Update() method receives the same values that the ParquetPlainEncoder writes to the file as data. The len in those values is set to UnpaddedCharLength(). test_write_statistics_char_types() also ensures that they are written unpadded. http://gerrit.cloudera.org:8080/#/c/6563/8/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 99: // Copies the memory of 'value' into 'buffer' and make 'value' point to 'buffer'. > i didn't realize that this resets 'buffer' ('copies into' can also mean app I expanded the comment. -- To view, visit http://gerrit.cloudera.org:8080/6563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types
Hello Marcel Kornacker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6563 to look at the new patch set (#9). Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types .. IMPALA-4815, IMPALA-4817, IMPALA-4819: Write and Read Parquet Statistics for remaining types This change adds functionality to write and read parquet::Statistics for Decimal, String, and Timestamp values. As an exception, we don't read statistics for CHAR columns, since CHAR support is broken in Impala (IMPALA-1652). This change also switches from using the deprecated fields 'min' and 'max' to populate the new fields 'min_value' and 'max_value' in parquet::Statistics, that were added in parquet-format pull request #46. The HdfsParquetScanner will preferably read the new fields if they are populated and if the column order 'TypeDefinedOrder' has been used to compute the statistics. For columns without a column order set or with only the deprecated fields populated, the scanner will read them only if they are of simple numeric type, i.e. boolean, integer, or floating point. This change removes the validation of the Parquet Statistics we write to Hive from the tests, since Hive does not write the new fields. Instead it adds a parquet file written by Hive that uses the deprecated fields for its statistics. It uses that file to exercise the fallback logic for supported types in a test. This change also cleans up the interface of ParquetPlainEncoder in parquet-common.h. Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exec/parquet-plain-test.cc M be/src/util/dict-encoding.h M common/thrift/parquet.thrift M testdata/data/README A testdata/data/deprecated_statistics.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test M testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test M tests/query_test/test_insert_parquet.py M tests/query_test/test_parquet_stats.py 18 files changed, 911 insertions(+), 285 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6563/9 -- To view, visit http://gerrit.cloudera.org:8080/6563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 4: (8 comments) http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 1309: string key = string(fname) + ":" + std::to_string(mtime); you can get rid of dealing with strings here (and doing a bunch of allocations) entirely by using something out of util/hash-util.h to compute the hash on fname and mtime separately (no need to convert it to a string). Line 1315: // Get lock that's obvious, no need to paraphrase a single statement (unless there's something subtle going on). Line 1318: // Look up in the multimap (find range of values for key) same here Line 1326: if (!elem->reference_count_) { use explicit numeric comparisons, this isn't a bool (although maybe it should be). Line 1331: ImpaladMetrics::IO_MGR_NUM_CACHED_FILE_HANDLES->Increment(-1L); hm, i'm not sure we should do that. it's still a cached file handle. Line 1395: if (p.size_ > p.capacity_) { single line http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: Line 273:: fh_(fh), reference_count_(0) {} single line Line 276: int reference_count_; why refcount instead of just 'bool in_use' or something like that? -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes