[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8461 ) Change subject: IMPALA-6151: add query-level fragment/backend counters .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc@993 PS2, Line 993: RETURN_IF_ERROR(CancelInternal(query_id, check_inflight, cause)); > The only reason CancelInternal() can fail is if the query isn't registered Good point http://gerrit.cloudera.org:8080/#/c/8461/3/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8461/3/be/src/service/impala-server.cc@1052 PS3, Line 1052: ImpaladMetrics::NUM_QUERIES_REGISTERED->Increment(-1L); Why keep it above ArchiveQuery()? Not your change, but the function comment on ArchiveQuery seems wrong and confusing with decrementing the metric here. -- To view, visit http://gerrit.cloudera.org:8080/8461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 Gerrit-Change-Number: 8461 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 04 Nov 2017 04:23:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8023 ) Change subject: IMPALA-4856: Port data stream service to KRPC .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc File be/src/runtime/krpc-data-stream-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@199 PS7, Line 199: DeserializeTask payload = : {DeserializeTaskType::EARLY_SENDERS, finst_id, dest_node_id, 0}; : deserialize_pool_.Offer(move(payload)); > doesn't this mean we make early sender draining single threaded? shoudl we Yes. Actually, I just realized that early senders may also be passed by incoming row batches before they are deserialized by the deserialization thread pool, leading to extended response time for early senders. A simpler scheme would be to actually put the early senders into the 'deferred_batches' queue of the corresponding sender's queue so new incoming row batches cannot pass it. Regarding the parallelism for draining the deferred_batches queue, one simple thing to do is to enqueue as many deserialization requests as there are entries in the 'deferred_batch' queue. The deserialization thread logic will simply peek the first entry of the queue and try to insert it if there is space. An entry is popped off the queue only if it can be inserted. This may be wasteful if the 'batch_queue' fills up before all deserialization thread requests are drained but hopefully the peeking logic shouldn't take too long. We can be more fancy and record the deserialized size of each entry in deferred_batches_ and determine how many entries we can pop off deferred_batches_ queue without going over the limit. http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@235 PS7, Line 235: for (const unique_ptr& ctx : early_senders.waiting_sender_ctxs) { > shouldn't we process waiting_sender_ctxs before closed_sender_ctxs? Otherwi Yes, it's impossible for the same sender in both queues at the same time but yeah, I can switch the order if it's easier to understand. -- To view, visit http://gerrit.cloudera.org:8080/8023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1 Gerrit-Change-Number: 8023 Gerrit-PatchSet: 7 Gerrit-Owner: Michael HoGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Sat, 04 Nov 2017 04:00:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6144: PublishFilter() continues to run after query failure/cancellation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8455 ) Change subject: IMPALA-6144: PublishFilter() continues to run after query failure/cancellation .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed Gerrit-Change-Number: 8455 Gerrit-PatchSet: 2 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 04 Nov 2017 02:08:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6144: PublishFilter() continues to run after query failure/cancellation
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8455 ) Change subject: IMPALA-6144: PublishFilter() continues to run after query failure/cancellation .. IMPALA-6144: PublishFilter() continues to run after query failure/cancellation The CoordinatorBackendState::PublishFilter() function does not check for query failure/cancellation. So if runtime filters are being published during/after a failure, they will not be cancelled and still be sent out which may take a while depending on the size of the cluster. Also, these functions could potentially hold very large amounts of untracked memory. This patch fixes it by checking for cancellation/failure in PublishFilter. Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed Reviewed-on: http://gerrit.cloudera.org:8080/8455 Reviewed-by: Bharath VissapragadaReviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator.cc 2 files changed, 9 insertions(+), 0 deletions(-) Approvals: Bharath Vissapragada: Looks good to me, but someone else must approve Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed Gerrit-Change-Number: 8455 Gerrit-PatchSet: 3 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Fix errant, newline-including log directory.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8459 ) Change subject: Fix errant, newline-including log directory. .. Fix errant, newline-including log directory. We've seen cases where a directory named "cluster\n " sneaks into the logs directory. The intention is that $IMPALA_ALL_LOGS_DIRS is a space-separated list, but clean.sh (as opposed to buildall.sh) treats it as a single argument. I tested this manually: Before: $ls logs/ be_tests/ custom_cluster_tests/ data_loading/ ee_tests/ fe_tests/ Bad command: $mkdir -p "${IMPALA_ALL_LOGS_DIRS}" After: $ls logs/ be_tests/ cluster? / custom_cluster_tests/ data_loading/ ee_tests/ fe_tests/ Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317 Reviewed-on: http://gerrit.cloudera.org:8080/8459 Reviewed-by: Tim ArmstrongTested-by: Impala Public Jenkins --- M bin/clean.sh 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317 Gerrit-Change-Number: 8459 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Fix errant, newline-including log directory.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8459 ) Change subject: Fix errant, newline-including log directory. .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317 Gerrit-Change-Number: 8459 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 04 Nov 2017 01:42:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8461 ) Change subject: IMPALA-6151: add query-level fragment/backend counters .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc@993 PS2, Line 993: RETURN_IF_ERROR(CancelInternal(query_id, check_inflight, cause)); > What if we hit an error here? Need to still maintain the metric. The only reason CancelInternal() can fail is if the query isn't registered - otherwise this would be really broken. By decrementing it below we're keeping the count in sync with the entries in client_request_state_map_, which seems better. http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc@1008 PS2, Line 1008: ImpaladMetrics::NUM_QUERIES_REGISTERED->Increment(-1L); > To be most accurate, shouldn't we move this to the end of this function? Th Done http://gerrit.cloudera.org:8080/#/c/8461/2/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/8461/2/common/thrift/metrics.json@477 PS2, Line 477: "label": "Queries Registered", > Imo we need to be careful with terminology. Our query states are already ha I added to the description. I agree that we should be careful here - I picked this definition of queries since it's the most expansive and doesn't depend on the different query states directly. -- To view, visit http://gerrit.cloudera.org:8080/8461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 Gerrit-Change-Number: 8461 Gerrit-PatchSet: 2 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 04 Nov 2017 00:39:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters
Hello Thomas Tauber-Marshall, Mostafa Mokhtar, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8461 to look at the new patch set (#3). Change subject: IMPALA-6151: add query-level fragment/backend counters .. IMPALA-6151: add query-level fragment/backend counters This adds NumBackends, NumFragments and NumFragmentInstances counters to the query profile to make it easier to manually or programmatically analyse the query. Also add a num-queries-registered metric to track the number of queries that have been executed but are not yet unregistered. Testing: Ran "select count(*) from alltypessmall" and checked profile: Backend startup latencies: Count: 3, min / max: 1ms / 1ms, 25th %-ile: 1ms, 50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 1ms, 95th %-ile: 1ms, 99.9th %-ile: 1ms Per Node Peak Memory Usage: tarmstrong-box:22000(1.10 MB) tarmstrong-box:22001(1.02 MB) tarmstrong-box:22002(1.02 MB) - FiltersReceived: 0 (0) - FinalizationTimer: 0.000ns - NumBackends: 3 (3) - NumFragmentInstances: 4 (4) - NumFragments: 2 (2) Ran some queries and manually checked the num-queries-registered metric on the /metrics page when the queries were running and after they finished. Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 --- M be/src/runtime/coordinator.cc M be/src/service/impala-server.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M common/thrift/metrics.json 5 files changed, 36 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/8461/3 -- To view, visit http://gerrit.cloudera.org:8080/8461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 Gerrit-Change-Number: 8461 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 ) Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. Patch Set 9: Code-Review+2 LGTM. Probably best to wait until after the weekend to merge it though :). -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 9 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 04 Nov 2017 00:29:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2281: Replace FNV with FastHash in exchange nodes
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8417 ) Change subject: IMPALA-2281: Replace FNV with FastHash in exchange nodes .. Patch Set 2: (12 comments) Did a first pass over it. http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/codegen/gen_ir_descriptions.py File be/src/codegen/gen_ir_descriptions.py: http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/codegen/gen_ir_descriptions.py@99 PS2, Line 99: ["HASH_FAST", "IrFastHash"], Do we use this? I don't see any places where it's used currently. http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/runtime/data-stream-sender.cc File be/src/runtime/data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/runtime/data-stream-sender.cc@a468 PS2, Line 468: Lol, weird. http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/runtime/data-stream-sender.cc@474 PS2, Line 474: partition_expr_evals_[j Use 'eval' directly. http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/runtime/raw-value.cc File be/src/runtime/raw-value.cc: http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/runtime/raw-value.cc@220 PS2, Line 220: return HashUtil::FastHash64(v, static_cast(type.GetByteSize()), seed); I think this might be slightly slower - with the previous approach I think we get a specialised version of the hash function for that input length for each data type, whereas here we have to go through another switch in GetByteSize(). http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h File be/src/util/hash-util.h: http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h@235 PS2, Line 235: /* The MIT License I think this should go at the top of the file beneath the apache license. Then we can just say that the FastHash64 implementation came with that license. E.g. /* FastHash64 implementation derived from MIT-licensed code written by Zilong Tan The MIT License ... */ http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h@263 PS2, Line 263: size_t use int64_t - we generally use signed integers for lengths. http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h@266 PS2, Line 266: (const uint64_t *) We use c-style casts - i.e. reinterpret_cast<> http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h@267 PS2, Line 267: pos I believe the C++ standard doesn't allow pointer arithmetic on void - we should convert to uint8_t for doing that. http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h@278 PS2, Line 278: (const unsigned char* See above comments. http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h@282 PS2, Line 282: (uint64_t) should use static_cast() http://gerrit.cloudera.org:8080/#/c/8417/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-runtime.test File testdata/workloads/functional-query/queries/QueryTest/nested-types-runtime.test: http://gerrit.cloudera.org:8080/#/c/8417/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-runtime.test@145 PS2, Line 145: 8,'k1',-1,'k1',1 Why does this make a difference when we have VERIFY_IS_EQUAL_SORTED above? http://gerrit.cloudera.org:8080/#/c/8417/2/testdata/workloads/tpcds/queries/tpcds-q77a.test File testdata/workloads/tpcds/queries/tpcds-q77a.test: http://gerrit.cloudera.org:8080/#/c/8417/2/testdata/workloads/tpcds/queries/tpcds-q77a.test@125 PS2, Line 125: 'catalog channel',NULL,538912.55,2050279.74,-1383554.73 I'll file a JIRA to make this test deterministically pass: IMPALA-6155. In the meantime, can you change this to RESULTS: VERIFY_IS_EQUAL_SORTED That will make it ignore the order of results. -- To view, visit http://gerrit.cloudera.org:8080/8417 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2 Gerrit-Change-Number: 8417 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 04 Nov 2017 00:28:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 ) Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. Patch Set 8: (2 comments) @Tim, can you take a last look at the changes and carry over the +2 if they are ok? http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc File be/src/exec/parquet-plain-test.cc: http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc@33 PS7, Line 33: int Encode(const InternalType& t, int encoded_byte_size, uint8_t* buffer, > Oh ok, thanks for the explanation. I think we should also add a comment men Done http://gerrit.cloudera.org:8080/#/c/7822/8/testdata/data/binary_decimal_dictionary.parquet File testdata/data/binary_decimal_dictionary.parquet: PS8: > We should also mention how these files were generated in the README Done -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 8 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 04 Nov 2017 00:23:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Hello Lars Volker, Matthew Jacobs, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7822 to look at the new patch set (#9). Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. IMPALA-2494: Support for byte array encoded decimals in Parquet scanner Extendes parquet column reader and associated classes to allow for more than one possible physical type for a given logical type. This patch only adds support for variable sized byte array encoded decimals and more will be added in upcoming commits. Also, column level metadata verification which was currently being done per row group will now only be done once per column per file. Testing: Added backend test for verifying newly added decimal types are decoded correctly. Added Query test that decodes both plain and dictionary-encoded decimals using binary encoding. Performance: Initial perf testing using tpcds_1000 shows no regression. Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e --- M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exec/parquet-plain-test.cc M be/src/util/dict-encoding.h M be/src/util/dict-test.cc M testdata/data/README A testdata/data/binary_decimal_dictionary.parquet A testdata/data/binary_decimal_no_dictionary.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-formats.test M tests/query_test/test_scanners.py 19 files changed, 639 insertions(+), 283 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7822/9 -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 9 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4835: Part 1: simplify I/O mgr mem mgmt
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8414 ) Change subject: IMPALA-4835: Part 1: simplify I/O mgr mem mgmt .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/8414/4/be/src/runtime/disk-io-mgr-reader-context.cc File be/src/runtime/disk-io-mgr-reader-context.cc: http://gerrit.cloudera.org:8080/#/c/8414/4/be/src/runtime/disk-io-mgr-reader-context.cc@50 PS4, Line 50: DCHECK(buffer->scan_range()->external_buffer_tag_ == ScanRange::ExternalBufferTag::NO_BUFFER) long line http://gerrit.cloudera.org:8080/#/c/8414/4/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8414/4/be/src/runtime/disk-io-mgr.cc@462 PS4, Line 462: // Buffers the were not allocated by DiskIoMgr don't need to be freed. existing typo: that -- To view, visit http://gerrit.cloudera.org:8080/8414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1 Gerrit-Change-Number: 8414 Gerrit-PatchSet: 4 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Sat, 04 Nov 2017 00:20:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE
Tim Wood has posted comments on this change. ( http://gerrit.cloudera.org:8080/8372 ) Change subject: IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q47.test File testdata/workloads/tpcds/queries/tpcds-q47.test: http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q47.test@3 PS4, Line 3: -- RESULT MISMATCH FROM ORIGINAL in LSD of DECIMAL values, E.G, > Would be good to use an example to better describe the mismatch Done -- To view, visit http://gerrit.cloudera.org:8080/8372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I79d2e34621639c8f8c4c4eb0b0944eaefca13a7a Gerrit-Change-Number: 8372 Gerrit-PatchSet: 5 Gerrit-Owner: Tim WoodGerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Wood Gerrit-Comment-Date: Sat, 04 Nov 2017 00:19:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 ) Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h File be/src/statestore/statestore.h: http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@383 PS2, Line 383: d subscriber > > Where ever the 'SubscriberId' is Thanks for the explanation. Yea my point was if we're going to have a unique RegistrationId anyway, why have a SubscriberId. It seemed redundant. But as you pointed out, it looks like the subscriber chooses the subscriber_id and not the statestore. So, it would be hard to enforce this. Let's leave this for now. http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@414 PS2, Line 414: onst SubscriberId& subscriber_id, : const TUniqueId& registration_id > Not sure I understand. We get the subscriber/registration_id from the Sched Nvm, my bad, I thought both were coming from the Subscriber object. Ignore this. http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@278 PS3, Line 278: lock_guard l(subscribers_lock_); : lock_guard t(topic_lock_); I just noticed this. Getting a SpinLock before getting a mutex is an anti-pattern. Even attempting to get a spinlock while already holding a spinlock is also not exactly a great idea. However, our SpinLock implementation sleeps after a few cycles of trying to obtain the lock anyway. Do we know if we do a lot of work holding the topic_lock_? If not, let's change this to a SpinLock too. (The GatherTopicUpdates() holds topic_lock_ and iterates through a nested loop, but I'm not sure how many iterations that would be in the worst case). If it looks like we will end up doing a lot of work holing the lock, we can be safe and just turn the 'subscribers_lock_' back to a mutex. http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.cc@415 PS3, Line 415: const TUniqueId& const RegistrationId& -- To view, visit http://gerrit.cloudera.org:8080/8449 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65 Gerrit-Change-Number: 8449 Gerrit-PatchSet: 3 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Sat, 04 Nov 2017 00:18:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8322 ) Change subject: IMPALA-1422: support a constant on LHS of IN predicates. .. Patch Set 6: (12 comments) Some high-level comments before I dig deeper. http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@94 PS2, Line 94: // TODO: derived slot refs (e.g., star-expanded) will not have rawPath set. Why can't this be addressed in this patch? http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@287 PS2, Line 287:* rewritten, null is returned. If 'inPred' is rewritten, the resulting expression ... and the RHS is a subquery. http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@299 PS2, Line 299:* 2) Predicate is NOT IN and RHS returns a single row. What if the RHS subquery is correlated? http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@319 PS2, Line 319:* NOT EXISTS (RHS AND (C IS NULL OR (CASE WHEN e IS NULL THEN C ELSE e END) = C) Is the transformation even correct if RHS is correlated? http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@312 PS6, Line 312:*a) No group by or analytic function in subquery. Ideally, we should not have to distinguish between cases 3a and 3b, and we should always do this rewrite: C NOT IN (RHS) Rewrites to: NOT EXISTS (SELECT 1 FROM (RHS) v WHERE C IS NULL OR IFNULL(v.e, C) = C) My understanding is that such a rewrite dues not work for case 3a when there is correlation, so we use an alternate rewrite that does not require a new inline view. I think we should reorganize the comments here to reflect this line of thinking and the limitations. For example, something like this: 3) Predicate is NOT IN and RHS returns multiple rows. General rewrite: a) No group by or analytic b) Group by or analytic function in subquery http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@316 PS6, Line 316:*REWRITE: Use IFNULL() instead of CASE to simplify. http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@319 PS6, Line 319:* NOT EXISTS (RHS AND (C IS NULL OR (CASE WHEN e IS NULL THEN C ELSE e END) = C) I think there's another case where this rewrite is not correct: when the subquery has an order by + limit. Basically, we are reimplementing the "predicate push down" correctness checks here. I'm wondering whether we should reduce the scope of this patch and only allow those case where the generic rewrite works. In a separate JIRA we should consider whether we can make the generic rewrite also work for case 3a, and if not what to do instead. What do you think? http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@321 PS6, Line 321:*Example: The following query gives me an IllegalStateException: explain select id from functional.alltypes t1 where 10 not in (select id from functional.alltypestiny order by id limit 2); Existing bug or related to this change? http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@328 PS6, Line 328:*Note: it useful to think of C NOT IN (RHS) as the boolean expansion: it is useful http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@340 PS6, Line 340:*b) Group by or analytic function in subquery. extra space after "by" http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@345 PS6, Line 345:* NOT EXISTS (SELECT x FROM (RHS) t WHERE C = t or t IS NOT NULL) What does C=t mean here? "t" is a table alias and cannot be compared to a C It's not immediately clear to me why the rewrite here is equivalent to the generic rewrite described in 3a. Please explain (see my suggestion above). Alternatively, I'm also ok if you want to just use the generic rewrite here. http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@354 PS6, Line 354:* - Assumes that subquery is uncorrelated,
[Impala-ASF-CR] IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE
Tim Wood has posted comments on this change. ( http://gerrit.cloudera.org:8080/8372 ) Change subject: IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE .. Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q26.test File testdata/workloads/tpcds/queries/tpcds-q26.test: http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q26.test@3 PS4, Line 3: -- RESULT MISMATCH FROM ORIGINAL, E.G., > Please use a more descriptive comment. Done http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q26.test@4 PS4, Line 4: -- Expected: 'AAID',78.33,126.37,0.00,61.94 > What does this mean? Done http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q59.test File testdata/workloads/tpcds/queries/tpcds-q59.test: http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q59.test@3 PS4, Line 3: with > Remove line Done http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q63.test File testdata/workloads/tpcds/queries/tpcds-q63.test: http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q63.test@3 PS4, Line 3: select > Remove line Done -- To view, visit http://gerrit.cloudera.org:8080/8372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I79d2e34621639c8f8c4c4eb0b0944eaefca13a7a Gerrit-Change-Number: 8372 Gerrit-PatchSet: 5 Gerrit-Owner: Tim WoodGerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Wood Gerrit-Comment-Date: Sat, 04 Nov 2017 00:15:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE
Hello Greg Rahn, Matthew Mulder, Michael Brown, Mostafa Mokhtar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8372 to look at the new patch set (#5). Change subject: IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE .. IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE The .test files in this commit are ones held out from IMPALA-5376 because of observed anomalies with returned decimal precision that I previously controlled with TRUNCATE(). This ticket was filed after team members expressed a preference not to use TRUNCATE() and to use actual results generated within numerical range of expected, unless the results could not be controlled otherwise. Rounds of testing for this commit show that the earlier anomalies no longer occur, clearing the way for their inclusion in our TPC-DS suite. It has been observed that these tests tend to fail with the DECIMAL_V2 option set (unless the test does so explicitly). Testing: The gerrit_verify_dryrun_external (build #24) job passes with this change. The fix (rebased hereon) to IMPALA-6106 (finally) cleared up a source of flapping for these tests. Change-Id: I79d2e34621639c8f8c4c4eb0b0944eaefca13a7a --- A testdata/workloads/tpcds/queries/tpcds-q26.test A testdata/workloads/tpcds/queries/tpcds-q28.test M testdata/workloads/tpcds/queries/tpcds-q39-1.test M testdata/workloads/tpcds/queries/tpcds-q39-2.test A testdata/workloads/tpcds/queries/tpcds-q47.test A testdata/workloads/tpcds/queries/tpcds-q57.test A testdata/workloads/tpcds/queries/tpcds-q59.test M testdata/workloads/tpcds/queries/tpcds-q61.test A testdata/workloads/tpcds/queries/tpcds-q63.test M testdata/workloads/tpcds/queries/tpcds-q77a.test M testdata/workloads/tpcds/queries/tpcds-q78.test M testdata/workloads/tpcds/queries/tpcds-q86a.test M tests/query_test/test_tpcds_queries.py 13 files changed, 849 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/8372/5 -- To view, visit http://gerrit.cloudera.org:8080/8372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I79d2e34621639c8f8c4c4eb0b0944eaefca13a7a Gerrit-Change-Number: 8372 Gerrit-PatchSet: 5 Gerrit-Owner: Tim WoodGerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Wood
[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8461 ) Change subject: IMPALA-6151: add query-level fragment/backend counters .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc@993 PS2, Line 993: RETURN_IF_ERROR(CancelInternal(query_id, check_inflight, cause)); What if we hit an error here? Need to still maintain the metric. http://gerrit.cloudera.org:8080/#/c/8461/2/be/src/service/impala-server.cc@1008 PS2, Line 1008: ImpaladMetrics::NUM_QUERIES_REGISTERED->Increment(-1L); To be most accurate, shouldn't we move this to the end of this function? There's still some non-trivial work in thus function after this point. http://gerrit.cloudera.org:8080/#/c/8461/2/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/8461/2/common/thrift/metrics.json@477 PS2, Line 477: "label": "Queries Registered", Imo we need to be careful with terminology. Our query states are already hard to reason about for users. How do the queries displayed in our "/queries" page relate to "registered" queries? I know the answer, but users might not due to confusion with what "registered", "in-flight", and "waiting-to-be-closed" mean. I don't really have a good alternative to "registered", but we should at least add in the description that "registered" refers to "in-flight" and "waiting-to-be-closed" queries from the "/queries" page. -- To view, visit http://gerrit.cloudera.org:8080/8461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 Gerrit-Change-Number: 8461 Gerrit-PatchSet: 2 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 23:51:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8464 ) Change subject: IMPALA-4591: Bound Kudu client error mem usage .. Patch Set 1: (8 comments) This is a nice improvement. Had some questions/ideas but as-is this seems strictly better than before. http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h File be/src/exec/kudu-table-sink.h: http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h@105 PS1, Line 105: allocated nit: "consumed" to be consistent with the memtracker terminology. http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h@106 PS1, Line 106: allocated_mem_ Maybe client_tracked_bytes_ to make it clearer that the unit is bytes and it's tracked for accounting purposes. http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc File be/src/exec/kudu-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@a52 PS1, Line 52: Was this flag documented? Just wondering if we should consider what happens if someone set this manually. http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@39 PS1, Line 39: DEFINE_int32(kudu_mutation_buffer_size, DEFAULT_KUDU_MUTATION_BUFFER_SIZE, Just throwing out ideas here, but did we think about pros/cons of making these query options? I guess mostly these defaults should be fine but sometimes it turns out to be inconvenient to only have a global setting (and to have to restart Impala for it to take effect.) http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@124 PS1, Line 124: FLAGS_kudu_error_buffer_size > 1024 ? FLAGS_kudu_error_buffer_size : 1024; Is this equivalent to the following? max(1024, FLAGS_kudu_error_buffer_size) http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@132 PS1, Line 132: RETURN_IF_ERROR(state->exec_env()->GetKuduClient(table_desc_->kudu_master_addresses(), _)); nit: long line. http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py File tests/custom_cluster/test_kudu.py: http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py@66 PS1, Line 66: @CustomClusterTestSuite.with_args(impalad_args="-kudu_error_buffer_size=1024") It might be faster to make this a regular query test but insert more data so that it exceeds the default error buffer limit. Starting up a new cluster takes a long time (to be honest, this is fine though). http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py@74 PS1, Line 74: prsent present -- To view, visit http://gerrit.cloudera.org:8080/8464 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 Gerrit-Change-Number: 8464 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 23:41:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4835 (prep only): create io subfolder and namespace
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8424 ) Change subject: IMPALA-4835 (prep only): create io subfolder and namespace .. Patch Set 5: Rebased since some things changed underneath this patch. -- To view, visit http://gerrit.cloudera.org:8080/8424 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b Gerrit-Change-Number: 8424 Gerrit-PatchSet: 5 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 23:24:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4835 (prep only): create io subfolder and namespace
Hello Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8424 to look at the new patch set (#5). Change subject: IMPALA-4835 (prep only): create io subfolder and namespace .. IMPALA-4835 (prep only): create io subfolder and namespace Instead of using the DiskIoMgr class as a namespace, which prevents forward-declaration of inner classes, create an impala::io namespace and unnested the inner class. This is done in anticipation of DiskIoMgr depending on BufferPool. This helps avoid a circular dependency between DiskIoMgr, TmpFileMgr and BufferPool headers that could not be broken with forward declarations. Testing: Ran core tests. Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b --- M be/CMakeLists.txt M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node-mt.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/runtime/CMakeLists.txt D be/src/runtime/disk-io-mgr.h M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h A be/src/runtime/io/CMakeLists.txt R be/src/runtime/io/disk-io-mgr-internal.h R be/src/runtime/io/disk-io-mgr-stress-test.cc R be/src/runtime/io/disk-io-mgr-stress.cc R be/src/runtime/io/disk-io-mgr-stress.h R be/src/runtime/io/disk-io-mgr-test.cc R be/src/runtime/io/disk-io-mgr.cc A be/src/runtime/io/disk-io-mgr.h R be/src/runtime/io/handle-cache.h R be/src/runtime/io/handle-cache.inline.h R be/src/runtime/io/request-context.cc R be/src/runtime/io/request-context.h A be/src/runtime/io/request-ranges.h R be/src/runtime/io/scan-range.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/test-env.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h 38 files changed, 1,417 insertions(+), 1,311 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/8424/5 -- To view, visit http://gerrit.cloudera.org:8080/8424 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b Gerrit-Change-Number: 8424 Gerrit-PatchSet: 5 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 11: (7 comments) http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@27 PS9, Line 27: > RUNTIME_FILTER_WAIT_TIME_MS applies the same to bloom and min-max - its the Thanks for explaining. Can you add this text into a Google Doc for us to keep track of the evolution/meaning of these query options? No need to polish, just put it somewhere. I think the new types of filters will affect our query options in non-trivial ways and we should come up with a plan that minimizes user confusion, adding new options, and deprecating options. http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@41 PS9, Line 41: > Those results are posted in the review comments. I can include them here as Thanks. No need to add here. http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift@202 PS9, Line 202: // Maximum number of bloom runtime filters allowed per query > There's basically two reasons for this: Thanks. Please add this text to a Google Doc for tracking the evolution of these query options. Even though the min/max filters are smaller and bounded in size, I think extreme queries with a large number of joins and join conditions can still cause havoc. Keep in mind that we now allow an *unbounded* number of such filters. Crazy queries will happen. We should not hold this patch, but the lack of safeguards is concerning to me and we should continue thinking. http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/PlanNodes.thrift@103 PS9, Line 103: 12: optional string kudu_col_name case sensitive? http://gerrit.cloudera.org:8080/#/c/7793/11/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/7793/11/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@359 PS11, Line 359: public Operator getJoinOp() { return exprCmpOp_; } getExprCmpOp() http://gerrit.cloudera.org:8080/#/c/7793/11/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@602 PS11, Line 602: if (!(targetExpr instanceof SlotRef) I think only explicit casts are problematic. Implicit casts should be ok, or am I missing something? http://gerrit.cloudera.org:8080/#/c/7793/11/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test File testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test: http://gerrit.cloudera.org:8080/#/c/7793/11/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test@144 PS11, Line 144: on a.month = cast(b.month + 1 as int); Was this your change? Why the change? -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 11 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 03 Nov 2017 23:22:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5017: Error on decimal overflow
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8404 ) Change subject: IMPALA-5017: Error on decimal overflow .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/8404/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8404/1//COMMIT_MSG@12 PS1, Line 12: In this patch, the beharior is changed so that an error is produced in > typo: beharior should be behavior I'll also accept behaviour ;) http://gerrit.cloudera.org:8080/#/c/8404/1//COMMIT_MSG@27 PS1, Line 27: select avg(dec_38_19) from decimal_tbl I guess these regressions occur regardless of the width of the input decimal, rigth? http://gerrit.cloudera.org:8080/#/c/8404/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/8404/1/be/src/exprs/aggregate-functions-ir.cc@413 PS1, Line 413: abs(avg->sum_val16) > DecimalUtil::MAX_UNSCALED_DECIMAL16 - abs(src.val4))) { This isn't correct if sum_val16 and src.val* have opposite sign, is it? http://gerrit.cloudera.org:8080/#/c/8404/1/be/src/exprs/aggregate-functions-ir.cc@420 PS1, Line 420: abs(avg->sum_val16) > DecimalUtil::MAX_UNSCALED_DECIMAL16 - abs(src.val8))) { What do you think about only checking for overflow of the integer type at this point and checking for whether it fits in the decimal type during finalisation? Might be cheaper. We could also probably do a cheaper initial check of sum_val16 versus as constant in the 4 and 8 byte cases since they can't possible overflow unless the sum is already quite large. -- To view, visit http://gerrit.cloudera.org:8080/8404 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019 Gerrit-Change-Number: 8404 Gerrit-PatchSet: 1 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 03 Nov 2017 23:21:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8034 ) Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 15: (3 comments) http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@77 PS15, Line 77: void ClearIndices() { > We generally use the more concise one-line formatting for very short single To add to what Tim said, this code didn't actually change, so it is nice to avoid touching code if you can. It makes it easier to track down what code change last impacted a piece of code. http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@152 PS15, Line 152: /// Destructor, release the bytes used by Memtracker. : ~DictEncoder() { : ReleaseBytes(); : DCHECK(dict_bytes_cnt_ == 0); : } I think we can omit this destructor, since DictEncoderBase does the same thing. http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@296 PS15, Line 296: /// Destructor, release the bytes used by Memtracker and close. : ~DictDecoder() { : ReleaseBytes(); : DCHECK(dict_bytes_cnt_ == 0); :} I think we can omit this destructor, as DictDecoderBase does this. -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-Change-Number: 8034 Gerrit-PatchSet: 15 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bikramjeet VigGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 23:14:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8034 ) Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 15: (8 comments) Pretty close, have some formatting nits (nothing horrible, just trying to keep the codebase consistent) and one perf concern. http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@54 PS15, Line 54: : dict_encoded_size_(0), pool_(NULL), dict_bytes_cnt_(0), dict_mem_tracker_(NULL) {} Ok to leave for now but we've generally been moving towards using the recent C++ extension that allows initialising member variables to constants at their declartion. http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@59 PS15, Line 59: DCHECK DCHECK_EQ(dict_bytes_cnt_, 0); http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@77 PS15, Line 77: void ClearIndices() { We generally use the more concise one-line formatting for very short single-statement functions like this one. http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@123 PS15, Line 123: formatting nit: * goes on the left with the type name. http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@168 PS15, Line 168: inline nit: inline isn't necessary. It isn't harmful but can be confusing to have unnecessary modifiers. http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@269 PS15, Line 269: * nit: * should be on left. http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@330 PS15, Line 330: inline nit: unnecessary inline http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@374 PS15, Line 374: ConsumeBytes(sizeof(node)); I'm concerned that calling MemTracker::Consume for every node added to the table could hurt performance in some workloads since it will end up incrementing the memory consumption counter in the root process-wide MemTracker up to 40k times per dictionary. (In contrast, MemPool::Allocate() is generally fine since it allocates memory in chunks). How about we track the memory for some number of nodes at a time. E.g. const int NODE_MEM_TRACKING_GRANULARITY = 4096; ... if (nodes % NODE_MEM_TRACKING_GRANULARITY == 0) { ConsumeBytes(sizeof(node) * NODE_MEM_TRACKING_GRANULARITY); } -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-Change-Number: 8034 Gerrit-PatchSet: 15 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bikramjeet VigGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 23:07:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8023 ) Change subject: IMPALA-4856: Port data stream service to KRPC .. Patch Set 7: (37 comments) http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.h File be/src/runtime/krpc-data-stream-mgr.h: http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.h@102 PS7, Line 102: the batch is added : /// to the receiver's 'deferred_batches_' it's not really the batch added. and it's not just a single structure for the receiver (it may go into one of many queues for merging exchange). So how about saying: ... the RPC state is saved into a deferred queue. http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.h@104 PS7, Line 104: from the pending sender queue how about: ... from a deferred RPC queue and the row batch is deserialized. http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.h@393 PS7, Line 393: quick comment for why we define a move constructor and move operator=, since we don't typically want to define those. http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc File be/src/runtime/krpc-data-stream-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@199 PS7, Line 199: DeserializeTask payload = : {DeserializeTaskType::EARLY_SENDERS, finst_id, dest_node_id, 0}; : deserialize_pool_.Offer(move(payload)); doesn't this mean we make early sender draining single threaded? shoudl we instead use the sender_id in this case as well and offer work per sender? or do we think this doesn't matter? http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@217 PS7, Line 217: already_unregistered that shouldn't be possible in the DEFERRED_BATCHES case, right? so i'd probably move this DCHECK into the cases below so you can tighten it up. http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@235 PS7, Line 235: for (const unique_ptr& ctx : early_senders.waiting_sender_ctxs) { shouldn't we process waiting_sender_ctxs before closed_sender_ctxs? Otherwise, if the same sender is in both lists we'll process those RPCs out of order. I guess that can't really happen given the current implementation of not responding to early RPCs and that senders only let one in flight, but it still seems to make more sense to do it the other way around. http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@236 PS7, Line 236: already_unregistered why is this possible in the waiting_sender_ctxs case but not the closed_sender_ctxs case? http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@247 PS7, Line 247: already_unregistered why is that possible? http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@248 PS7, Line 248: recvr->AddDeferredBatches(task.sender_id); So I guess we no longer multithread within a single sender queue (and for non-merging, within a single receiver) doing it this way. I think that's okay but was it intentional? http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-recvr.h File be/src/runtime/krpc-data-stream-recvr.h: http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-recvr.h@78 PS7, Line 78: The caller must acquire data from the : /// returned batch is that talking about calling TransferAllResources(), or can the caller do it directly? http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc File be/src/runtime/krpc-data-stream-recvr.cc: http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@127 PS6, Line 127: // If true, the receiver fragment for this stream got cancelled. > For the non-merging case, there is essentially only one queue. As mentioned elsewhere, I'm not totally convinced yet that this is the right way to do it but, yes, we can think about it more and change it later if necessary. http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-recvr.cc File be/src/runtime/krpc-data-stream-recvr.cc: http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-recvr.cc@72 PS7, Line 72: s typo http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-recvr.cc@77 PS7, Line 77: Adds as many deferred batches as possible hmm I'm still not convinced this is the right thing to do (in the merging case). It seems like it's left up to chance as to the order that deferred batches are drained across the sender queues. But we can think about this more and address it later. http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-recvr.cc@97 PS7, Line 97: (1) 'batch_queue' is empty and there is no
[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8418 ) Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8418/2/docs/topics/impala_known_issues.xml File docs/topics/impala_known_issues.xml: http://gerrit.cloudera.org:8080/#/c/8418/2/docs/topics/impala_known_issues.xml@951 PS2, Line 951: Medium Missed this on the first pass - shouldn't this be high? -- To view, visit http://gerrit.cloudera.org:8080/8418 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb Gerrit-Change-Number: 8418 Gerrit-PatchSet: 2 Gerrit-Owner: John RussellGerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 22:48:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6144: PublishFilter() continues to run after query failure/cancellation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8455 ) Change subject: IMPALA-6144: PublishFilter() continues to run after query failure/cancellation .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1434/ -- To view, visit http://gerrit.cloudera.org:8080/8455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed Gerrit-Change-Number: 8455 Gerrit-PatchSet: 2 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 22:47:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6144: PublishFilter() continues to run after query failure/cancellation
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8455 ) Change subject: IMPALA-6144: PublishFilter() continues to run after query failure/cancellation .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed Gerrit-Change-Number: 8455 Gerrit-PatchSet: 2 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 22:45:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 ) Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h@376 PS1, Line 376: virtual bool DictionaryReferencesBuffer() const { return false; } > Add comment. Make pure if possible. Alternatively, you could do something s I have removed this and the other similar functions, and used directly slot_desc_->type().IsVarLenStringType() instead. http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@272 PS1, Line 272: virtual bool DictionaryReferencesBuffer() const { > add "override" This function was removed. http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@541 PS1, Line 541: some > how about rewording this to "Column readers for types that require ..." This function was removed. http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@542 PS1, Line 542: inline bool DictionaryReferencesBufferInline() const { > nit: single line This function was removed. http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@945 PS1, Line 945: tmp_buffer > better name Done http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@947 PS1, Line 947: int uncompressed_size = decompressor_.get() != nullptr > Move this to L954 Done http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@951 PS1, Line 951: if (decompressor_.get() == nullptr && !DictionaryReferencesBuffer()) { > Can you add a brief comment to explain how the 4 different cases are handle Done -- To view, visit http://gerrit.cloudera.org:8080/8436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 Gerrit-Change-Number: 8436 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 03 Nov 2017 22:39:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8436 to look at the new patch set (#4). Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction .. IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction During dictionary constructon, most types are copied from the parquet dictionary page, but StringValues keep pointers to it. In this case, the dictionary page must be kept and attached to the last row batch that references it. In case of other types, it is safe do delete the dictionary page after the construction of the dictionary. This patch contains two optimizations: - dictionary pages are deleted as soon as possible for non string types - in non-compressed and non-string case, an unnecessary copy is avoided Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 --- M be/src/exec/parquet-column-readers.cc 1 file changed, 35 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/4 -- To view, visit http://gerrit.cloudera.org:8080/8436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 Gerrit-Change-Number: 8436 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE
Mostafa Mokhtar has posted comments on this change. ( http://gerrit.cloudera.org:8080/8372 ) Change subject: IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE .. Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q26.test File testdata/workloads/tpcds/queries/tpcds-q26.test: http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q26.test@3 PS4, Line 3: -- RESULT MISMATCH FROM ORIGINAL Please use a more descriptive comment. What kind of mismatch? Which column? Description of mismatch. http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q26.test@4 PS4, Line 4: -- TAKE ACTUAL RESULT AS EXPECTED What does this mean? http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q47.test File testdata/workloads/tpcds/queries/tpcds-q47.test: http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q47.test@3 PS4, Line 3: -- RESULT MISMATCH FROM ORIGINAL in LSD of DECIMAL values Would be good to use an example to better describe the mismatch http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q59.test File testdata/workloads/tpcds/queries/tpcds-q59.test: http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q59.test@3 PS4, Line 3: Remove line http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q63.test File testdata/workloads/tpcds/queries/tpcds-q63.test: http://gerrit.cloudera.org:8080/#/c/8372/4/testdata/workloads/tpcds/queries/tpcds-q63.test@3 PS4, Line 3: Remove line -- To view, visit http://gerrit.cloudera.org:8080/8372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I79d2e34621639c8f8c4c4eb0b0944eaefca13a7a Gerrit-Change-Number: 8372 Gerrit-PatchSet: 4 Gerrit-Owner: Tim WoodGerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Wood Gerrit-Comment-Date: Fri, 03 Nov 2017 22:35:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8436 to look at the new patch set (#3). Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction .. IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction During dictionary constructon, most types are copied from the parquet dictionary page, but StringValues keep pointers to it. In this case, the dictionary page must be kept and attached to the last row batch that references it. In case of other types, it is safe do delete the dictionary page after the construction of the dictionary. This patch contains two optimizations: - dictionary pages are deleted as soon as possible for non string types - in non-compressed and non-string case, an unnecessary copy is avoided Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 --- M be/src/exec/parquet-column-readers.cc 1 file changed, 36 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/3 -- To view, visit http://gerrit.cloudera.org:8080/8436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 Gerrit-Change-Number: 8436 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-6121: remove I/O mgr request context cache
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8408 ) Change subject: IMPALA-6121: remove I/O mgr request context cache .. Patch Set 6: Code-Review+1 Rebased since some of the patches under it changed. -- To view, visit http://gerrit.cloudera.org:8080/8408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I91414eceaa4938fccd74686fe6bebede6ef36108 Gerrit-Change-Number: 8408 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 22:30:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8436 to look at the new patch set (#2). Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction .. IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction During dictionary constructon, most types are copied from the parquet dictionary page, but StringValues keep pointers to it. In this case, the dictionary page must be kept and attached to the last row batch that references it. In case of other types, it is safe do delete the dictionary page after the construction of the dictionary. This patch contains two optimizations: - dictionary pages are deleted as soon as possible for non string types - in non-compressed and non-string case, an unnecessary copy is avoided Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h 2 files changed, 38 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/2 -- To view, visit http://gerrit.cloudera.org:8080/8436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 Gerrit-Change-Number: 8436 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-6134: Update code base to use impala::ConditionVariable
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8428 ) Change subject: IMPALA-6134: Update code base to use impala::ConditionVariable .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h File be/src/util/condition-variable.h: http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h@65 PS1, Line 65: bool TimedWait(boost::unique_lock& lock, > Now that I looked into the callsites more carefully, I realized that there I like those names - good to distinguish between the two cases. Seems reasonable, especially considering there's precedent in the standard library. -- To view, visit http://gerrit.cloudera.org:8080/8428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3085c6dcb42350b61244df6e7f091a1e7db356c9 Gerrit-Change-Number: 8428 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 03 Nov 2017 22:22:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Fix errant, newline-including log directory.
Tim Armstrong has removed a vote on this change. Change subject: Fix errant, newline-including log directory. .. Removed Verified-1 by Impala Public Jenkins (255) -- To view, visit http://gerrit.cloudera.org:8080/8459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317 Gerrit-Change-Number: 8459 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Fix errant, newline-including log directory.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8459 ) Change subject: Fix errant, newline-including log directory. .. Patch Set 1: Looks like IMPALA-6092. We definitely shouldn't be hitting a DCHECK regardless so I filed IMPALA-6154 -- To view, visit http://gerrit.cloudera.org:8080/8459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317 Gerrit-Change-Number: 8459 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 22:20:22 + Gerrit-HasComments: No
[Impala-ASF-CR] Fix errant, newline-including log directory.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8459 ) Change subject: Fix errant, newline-including log directory. .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1433/ -- To view, visit http://gerrit.cloudera.org:8080/8459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317 Gerrit-Change-Number: 8459 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 22:20:31 + Gerrit-HasComments: No
[Impala-ASF-CR] Fix errant, newline-including log directory.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8459 ) Change subject: Fix errant, newline-including log directory. .. Patch Set 1: Looking through https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/574/artifact/Impala/logs_static/logs/, I think the following is the relevant issue. I'm hesitant to believe that a change to bin/clean.sh could have caused this. F1103 20:52:33.350702 67042 scalar-expr-evaluator.cc:70] Check failed: !initialized_ || closed_ I1103 20:52:33.415714 67008 status.cc:124] ImpalaRuntimeException: Unable to find class. CAUSED BY: ClassNotFoundException: org.apache.impala.TestUdf @ 0x15a0bef impala::Status::Status() @ 0x19a42c8 impala::JniUtil::GetJniExceptionMsg() @ 0x1d28b85 impala::HiveUdfCall::OpenEvaluator() @ 0x1d2b76d impala::ScalarExpr::OpenEvaluator() @ 0x1d6f55d impala::ScalarFnCall::OpenEvaluator() @ 0x1d31d94 impala::ScalarExprEvaluator::Clone() @ 0x1d31fa6 impala::ScalarExprEvaluator::Clone() @ 0x1ad088c impala::HdfsScanner::Open() @ 0x1b10f80 impala::HdfsTextScanner::Open() @ 0x1ab4fdf impala::HdfsScanNodeBase::CreateAndOpenScanner() @ 0x1aa9f99 impala::HdfsScanNode::ProcessSplit() @
[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8464 Change subject: IMPALA-4591: Bound Kudu client error mem usage .. IMPALA-4591: Bound Kudu client error mem usage Previously, Kudu client errors could grow in size unbounded, potentially causing the process to be killed. This patch sets a bound on the mem that can be used for these error messages, with the size determined by the flag 'kudu_error_buffer_size'. If the errors for a Kudu client exceed this size, the query will fail, as some errors will be dropped and we won't be able to tell if all of the errors can be safely ignored. Testing: - Added a custom cluster test that verifies that a query that exceeds the limit fails. Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M tests/custom_cluster/test_kudu.py 3 files changed, 39 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/8464/1 -- To view, visit http://gerrit.cloudera.org:8080/8464 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 Gerrit-Change-Number: 8464 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] Fix errant, newline-including log directory.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8459 ) Change subject: Fix errant, newline-including log directory. .. Patch Set 1: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1432/ -- To view, visit http://gerrit.cloudera.org:8080/8459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317 Gerrit-Change-Number: 8459 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 21:39:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8438 ) Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. IMPALA-6137: fix text scanner split delim mem mgmt The bug was that the buffer pointed to by byte_buffer_ptr_ could be freed by ReleaseCompletedResources() before CheckForSplitDelimiter() was called. The simple fix is to copy out the single byte that is needed each time the buffer is filled. Testing: Ran exhaustive query tests under ASAN with --disable_mem_pools=true. Before the change test_text_split_delimiters reliably caused an ASAN failure when run with --disable_mem_pools=true. We should get this coverage automatically once the I/O mgr switches to using the buffer pool, which uses ASAN poisoning on freed buffers. Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Reviewed-on: http://gerrit.cloudera.org:8080/8438 Reviewed-by: Tim ArmstrongTested-by: Impala Public Jenkins --- M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h 2 files changed, 43 insertions(+), 7 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8438 ) Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Fri, 03 Nov 2017 21:38:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6144: PublishFilter() continues to run after query failure/cancellation
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8455 ) Change subject: IMPALA-6144: PublishFilter() continues to run after query failure/cancellation .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed Gerrit-Change-Number: 8455 Gerrit-PatchSet: 2 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 21:06:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8461 ) Change subject: IMPALA-6151: add query-level fragment/backend counters .. Patch Set 2: I thought about adding metrics to track queries in other states - e.g. running but not cancelled, but that is trickier since the query lifecycle is more complicated and varies more between DML/DDL/queries. -- To view, visit http://gerrit.cloudera.org:8080/8461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 Gerrit-Change-Number: 8461 Gerrit-PatchSet: 2 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 21:00:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8461 ) Change subject: IMPALA-6151: add query-level fragment/backend counters .. Patch Set 2: I also added a metric to track running queries, which mostafa requested. -- To view, visit http://gerrit.cloudera.org:8080/8461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 Gerrit-Change-Number: 8461 Gerrit-PatchSet: 2 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 20:59:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters
Hello Thomas Tauber-Marshall, Mostafa Mokhtar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8461 to look at the new patch set (#2). Change subject: IMPALA-6151: add query-level fragment/backend counters .. IMPALA-6151: add query-level fragment/backend counters This adds NumBackends, NumFragments and NumFragmentInstances counters to the query profile to make it easier to manually or programmatically analyse the query. Also add a num-queries-registered metric to track the number of queries that have been executed but are not yet unregistered. Testing: Ran "select count(*) from alltypessmall" and checked profile: Backend startup latencies: Count: 3, min / max: 1ms / 1ms, 25th %-ile: 1ms, 50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 1ms, 95th %-ile: 1ms, 99.9th %-ile: 1ms Per Node Peak Memory Usage: tarmstrong-box:22000(1.10 MB) tarmstrong-box:22001(1.02 MB) tarmstrong-box:22002(1.02 MB) - FiltersReceived: 0 (0) - FinalizationTimer: 0.000ns - NumBackends: 3 (3) - NumFragmentInstances: 4 (4) - NumFragments: 2 (2) Ran some queries and manually checked the num-queries-registered metric on the /metrics page when the queries were running and after they finished. Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 --- M be/src/runtime/coordinator.cc M be/src/service/impala-server.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h M common/thrift/metrics.json 5 files changed, 36 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/8461/2 -- To view, visit http://gerrit.cloudera.org:8080/8461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 Gerrit-Change-Number: 8461 Gerrit-PatchSet: 2 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8418 ) Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/8418/1/docs/topics/impala_known_issues.xml File docs/topics/impala_known_issues.xml: http://gerrit.cloudera.org:8080/#/c/8418/1/docs/topics/impala_known_issues.xml@936 PS1, Line 936: Examine the HDFS_SCAN_NODE portion of a query profile that scans the > This unfortunately won't give accurate info for all queries: if the query i Done http://gerrit.cloudera.org:8080/#/c/8418/1/docs/topics/impala_known_issues.xml@937 PS1, Line 937: suspected table. Use a query that performs a full table scan, and materializes the column > It might be helpful to note common cases where uncompressed Parquet is/isn' Done -- To view, visit http://gerrit.cloudera.org:8080/8418 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb Gerrit-Change-Number: 8418 Gerrit-PatchSet: 2 Gerrit-Owner: John RussellGerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 20:37:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness
Hello Greg Rahn, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8418 to look at the new patch set (#2). Change subject: IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness .. IMPALA-4539: [DOCS] Add known issue for uncompressed Parquet correctness Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb --- M docs/topics/impala_known_issues.xml 1 file changed, 45 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/8418/2 -- To view, visit http://gerrit.cloudera.org:8080/8418 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I731eb0e029dc3cc251f4df0c5a8ad281c81595cb Gerrit-Change-Number: 8418 Gerrit-PatchSet: 2 Gerrit-Owner: John RussellGerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6144: PublishFilter() continues to run after query failure/cancellation
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8455 ) Change subject: IMPALA-6144: PublishFilter() continues to run after query failure/cancellation .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed Gerrit-Change-Number: 8455 Gerrit-PatchSet: 2 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 20:24:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/8372 ) Change subject: IMPALA-6087: Revisit tests withheld from TPC-DS suite for use of TRUNCATE .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I79d2e34621639c8f8c4c4eb0b0944eaefca13a7a Gerrit-Change-Number: 8372 Gerrit-PatchSet: 4 Gerrit-Owner: Tim WoodGerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Matthew Mulder Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Wood Gerrit-Comment-Date: Fri, 03 Nov 2017 20:23:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6144: PublishFilter() continues to run after query failure/cancellation
Hello Bharath Vissapragada, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8455 to look at the new patch set (#2). Change subject: IMPALA-6144: PublishFilter() continues to run after query failure/cancellation .. IMPALA-6144: PublishFilter() continues to run after query failure/cancellation The CoordinatorBackendState::PublishFilter() function does not check for query failure/cancellation. So if runtime filters are being published during/after a failure, they will not be cancelled and still be sent out which may take a while depending on the size of the cluster. Also, these functions could potentially hold very large amounts of untracked memory. This patch fixes it by checking for cancellation/failure in PublishFilter. Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator.cc 2 files changed, 9 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/8455/2 -- To view, visit http://gerrit.cloudera.org:8080/8455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed Gerrit-Change-Number: 8455 Gerrit-PatchSet: 2 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6144: PublishFilter() continues to run after query failure/cancellation
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8455 ) Change subject: IMPALA-6144: PublishFilter() continues to run after query failure/cancellation .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator-backend-state.cc@394 PS1, Line 394: { > same Done http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator-backend-state.cc@394 PS1, Line 394: { > Also, should we check if the backend is done too? Done http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc@1082 PS1, Line 1082: TPublishFilterParams rpc_params; > lock_? I removed this check since the check in L1106 should be sufficient. http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc@1106 PS1, Line 1106: if (filter_updates_received_->value() == 0) { > I think checking for the filter disabled is sufficient in this function, si Good point. I removed the above check. Plus acquiring the lock above to check the status would be very expensive. http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc@1130 PS1, Line 1130: swap(rpc_params.bloom_filter, aggregated_filter); > I think we should also track the rpc_params memory and prevent the query fr Sure I opened IMPALA-6153 and linked it to IMPALA-1575. -- To view, visit http://gerrit.cloudera.org:8080/8455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed Gerrit-Change-Number: 8455 Gerrit-PatchSet: 2 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 20:13:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Explain Boost setting needed for 96-bit timestamps
John Russell has abandoned this change. ( http://gerrit.cloudera.org:8080/7983 ) Change subject: [DOCS] Explain Boost setting needed for 96-bit timestamps .. Abandoned Suggested workaround was not a robust or future-proof solution. -- To view, visit http://gerrit.cloudera.org:8080/7983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I4b67cd7762f682c3a054e0d9641080aa51801c83 Gerrit-Change-Number: 7983 Gerrit-PatchSet: 2 Gerrit-Owner: John RussellGerrit-Reviewer: John Russell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8461 ) Change subject: IMPALA-6151: add query-level fragment/backend counters .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 Gerrit-Change-Number: 8461 Gerrit-PatchSet: 1 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 03 Nov 2017 19:07:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 ) Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq .. Patch Set 10: Code-Review+1 Carry Alex's +1 -- To view, visit http://gerrit.cloudera.org:8080/8172 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941 Gerrit-Change-Number: 8172 Gerrit-PatchSet: 10 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 18:59:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 ) Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq .. Patch Set 9: (1 comment) Rebased onto master now that the predecessor patch is merged. http://gerrit.cloudera.org:8080/#/c/8172/9/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/8172/9/be/src/exec/hdfs-scan-node-base.cc@401 PS9, Line 401: state->AcquireReaderContext(reader_context_); It's also possible to remove this logic as a consequence of this patch, but I ended up doing it in a follow-on patch. https://gerrit.cloudera.org/#/c/8408/ -- To view, visit http://gerrit.cloudera.org:8080/8172 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941 Gerrit-Change-Number: 8172 Gerrit-PatchSet: 9 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 18:58:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6151: add query-level fragment/backend counters
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8461 Change subject: IMPALA-6151: add query-level fragment/backend counters .. IMPALA-6151: add query-level fragment/backend counters This adds NumBackends, NumFragments and NumFragmentInstances counters to the query profile to make it easier to manually or programmatically analyse the query. Testing: Ran "select count(*) from alltypessmall" and checked profile: Backend startup latencies: Count: 3, min / max: 1ms / 1ms, 25th %-ile: 1ms, 50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 1ms, 95th %-ile: 1ms, 99.9th %-ile: 1ms Per Node Peak Memory Usage: tarmstrong-box:22000(1.10 MB) tarmstrong-box:22001(1.02 MB) tarmstrong-box:22002(1.02 MB) - FiltersReceived: 0 (0) - FinalizationTimer: 0.000ns - NumBackends: 3 (3) - NumFragmentInstances: 4 (4) - NumFragments: 2 (2) Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 --- M be/src/runtime/coordinator.cc 1 file changed, 12 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/8461/1 -- To view, visit http://gerrit.cloudera.org:8080/8461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3df350414733e98d1ec28adc1c98f45bb0c4e3e9 Gerrit-Change-Number: 8461 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8368 ) Change subject: IMPALA-2235: Fix current db when shell auto-reconnects .. Patch Set 3: Code-Review+1 Will give phil another chance to look -- To view, visit http://gerrit.cloudera.org:8080/8368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e Gerrit-Change-Number: 8368 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 03 Nov 2017 18:21:30 + Gerrit-HasComments: No
[Impala-ASF-CR] Fix errant, newline-including log directory.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8459 ) Change subject: Fix errant, newline-including log directory. .. Patch Set 1: Thanks for fixing this, this has caused me headaches. -- To view, visit http://gerrit.cloudera.org:8080/8459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317 Gerrit-Change-Number: 8459 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 18:18:04 + Gerrit-HasComments: No
[Impala-ASF-CR] Fix errant, newline-including log directory.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8459 ) Change subject: Fix errant, newline-including log directory. .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317 Gerrit-Change-Number: 8459 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 18:17:39 + Gerrit-HasComments: No
[Impala-ASF-CR] Fix errant, newline-including log directory.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8459 ) Change subject: Fix errant, newline-including log directory. .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1432/ -- To view, visit http://gerrit.cloudera.org:8080/8459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317 Gerrit-Change-Number: 8459 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 18:17:51 + Gerrit-HasComments: No
[Impala-ASF-CR] Fix errant, newline-including log directory.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8459 Change subject: Fix errant, newline-including log directory. .. Fix errant, newline-including log directory. We've seen cases where a directory named "cluster\n " sneaks into the logs directory. The intention is that $IMPALA_ALL_LOGS_DIRS is a space-separated list, but clean.sh (as opposed to buildall.sh) treats it as a single argument. I tested this manually: Before: $ls logs/ be_tests/ custom_cluster_tests/ data_loading/ ee_tests/ fe_tests/ Bad command: $mkdir -p "${IMPALA_ALL_LOGS_DIRS}" After: $ls logs/ be_tests/ cluster? / custom_cluster_tests/ data_loading/ ee_tests/ fe_tests/ Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317 --- M bin/clean.sh 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/8459/1 -- To view, visit http://gerrit.cloudera.org:8080/8459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317 Gerrit-Change-Number: 8459 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8438 ) Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/8438/5/be/src/exec/hdfs-text-scanner.h File be/src/exec/hdfs-text-scanner.h: http://gerrit.cloudera.org:8080/#/c/8438/5/be/src/exec/hdfs-text-scanner.h@144 PS5, Line 144: in > if? Done -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 5 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Fri, 03 Nov 2017 18:15:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
Hello anujphadke, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8438 to look at the new patch set (#6). Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. IMPALA-6137: fix text scanner split delim mem mgmt The bug was that the buffer pointed to by byte_buffer_ptr_ could be freed by ReleaseCompletedResources() before CheckForSplitDelimiter() was called. The simple fix is to copy out the single byte that is needed each time the buffer is filled. Testing: Ran exhaustive query tests under ASAN with --disable_mem_pools=true. Before the change test_text_split_delimiters reliably caused an ASAN failure when run with --disable_mem_pools=true. We should get this coverage automatically once the I/O mgr switches to using the buffer pool, which uses ASAN poisoning on freed buffers. Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c --- M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h 2 files changed, 43 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8438/6 -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8438 ) Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1431/ -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Fri, 03 Nov 2017 18:16:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8438 ) Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. Patch Set 6: Code-Review+2 carry -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Fri, 03 Nov 2017 18:15:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8456 Change subject: IMPALA-6148: Specifying thirdparty deps as URLs .. IMPALA-6148: Specifying thirdparty deps as URLs If the environment variable $IMPALA__URL is configured for a thirdparty dependency, use that to download it instead of the s3://native-toolchain bucket. This makes testing against arbitrary versions of the dependencies easier. I did a little bit of refactoring while here, creating a small class for a Package to handle reading the environment variables. I also changed bootstrap_toolchain.py to use Python logging, which cleans up the output during the multi-threaded downloading. I tested this by both with customized URLs and by running the regular build (pre-review-test, without most of the slow test suites). Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422 --- M bin/bootstrap_toolchain.py 1 file changed, 77 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/8456/2 -- To view, visit http://gerrit.cloudera.org:8080/8456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422 Gerrit-Change-Number: 8456 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8023 ) Change subject: IMPALA-4856: Port data stream service to KRPC .. Patch Set 7: (5 comments) http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.h File be/src/runtime/krpc-data-stream-mgr.h: http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.h@430 PS7, Line 430: i id http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.h@436 PS7, Line 436: specifies identifies http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc File be/src/runtime/krpc-data-stream-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@61 PS7, Line 61: nit: blank space http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-sender.h File be/src/runtime/krpc-data-stream-sender.h: http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-sender.h@136 PS7, Line 136: // nit: /// http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-sender.cc File be/src/runtime/krpc-data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-sender.cc@182 PS7, Line 182: It points to delete. -- To view, visit http://gerrit.cloudera.org:8080/8023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1 Gerrit-Change-Number: 8023 Gerrit-PatchSet: 7 Gerrit-Owner: Michael HoGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Fri, 03 Nov 2017 18:13:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8438 ) Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. Patch Set 5: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/8438/5/be/src/exec/hdfs-text-scanner.h File be/src/exec/hdfs-text-scanner.h: http://gerrit.cloudera.org:8080/#/c/8438/5/be/src/exec/hdfs-text-scanner.h@144 PS5, Line 144: in if? -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 5 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Fri, 03 Nov 2017 18:10:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6134: Update code base to use impala::ConditionVariable
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8428 ) Change subject: IMPALA-6134: Update code base to use impala::ConditionVariable .. Patch Set 1: I don't think it's necessary to unit-test it, given it's a thin wrapper. The test code would be more complex than the product code. -- To view, visit http://gerrit.cloudera.org:8080/8428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3085c6dcb42350b61244df6e7f091a1e7db356c9 Gerrit-Change-Number: 8428 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 03 Nov 2017 17:45:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
Hello anujphadke, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8438 to look at the new patch set (#5). Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. IMPALA-6137: fix text scanner split delim mem mgmt The bug was that the buffer pointed to by byte_buffer_ptr_ could be freed by ReleaseCompletedResources() before CheckForSplitDelimiter() was called. The simple fix is to copy out the single byte that is needed each time the buffer is filled. Testing: Ran exhaustive query tests under ASAN with --disable_mem_pools=true. Before the change test_text_split_delimiters reliably caused an ASAN failure when run with --disable_mem_pools=true. We should get this coverage automatically once the I/O mgr switches to using the buffer pool, which uses ASAN poisoning on freed buffers. Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c --- M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h 2 files changed, 43 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8438/5 -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 5 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8438 ) Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/8438/4/be/src/exec/hdfs-text-scanner.h File be/src/exec/hdfs-text-scanner.h: http://gerrit.cloudera.org:8080/#/c/8438/4/be/src/exec/hdfs-text-scanner.h@90 PS4, Line 90: /// CheckForSplitDelimiter(). Valid if 'byte_buffer_filled_' is true. > Maybe briefly explain why we have it: the character is needed after the byt Done http://gerrit.cloudera.org:8080/#/c/8438/4/be/src/exec/hdfs-text-scanner.h@144 PS4, Line 144: virtual > I'm not following why we need a second virtual method. Is the wrapper meant We don't, that was my mistake. Removed the virtual. -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 4 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Fri, 03 Nov 2017 17:40:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6137: fix text scanner split delim mem mgmt
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8438 ) Change subject: IMPALA-6137: fix text scanner split delim mem mgmt .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/8438/4/be/src/exec/hdfs-text-scanner.h File be/src/exec/hdfs-text-scanner.h: http://gerrit.cloudera.org:8080/#/c/8438/4/be/src/exec/hdfs-text-scanner.h@90 PS4, Line 90: /// CheckForSplitDelimiter(). Valid if 'byte_buffer_filled_' is true. Maybe briefly explain why we have it: the character is needed after the byte buffer is returned, or whatever. http://gerrit.cloudera.org:8080/#/c/8438/4/be/src/exec/hdfs-text-scanner.h@144 PS4, Line 144: virtual I'm not following why we need a second virtual method. Is the wrapper meant to always apply to all subclasses? If so, it shouldn't need (or want it) to be virtual right? -- To view, visit http://gerrit.cloudera.org:8080/8438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddbb5cf6acc8f0b0e0b4c205c334f21e03d06f1c Gerrit-Change-Number: 8438 Gerrit-PatchSet: 4 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Fri, 03 Nov 2017 16:52:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6134: Update code base to use impala::ConditionVariable
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8428 ) Change subject: IMPALA-6134: Update code base to use impala::ConditionVariable .. Patch Set 1: (1 comment) > Should I create tests for condition-variable.h? We don't seem to have unit > tests yet. I think ideally we'd have tests for helper code like this. I'm ok if we think it's not worth it here given the low complexity. http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h File be/src/util/condition-variable.h: http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h@38 PS1, Line 38: void Wait(boost::unique_lock& lock) { > About the 'struct' keyword: it is something that C requires, but in C++ it Thanks! -- To view, visit http://gerrit.cloudera.org:8080/8428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3085c6dcb42350b61244df6e7f091a1e7db356c9 Gerrit-Change-Number: 8428 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 03 Nov 2017 16:38:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8434 ) Change subject: IMPALA-5564: Release lock during planning. (wip) .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8434/1/be/src/service/client-request-state.h File be/src/service/client-request-state.h: http://gerrit.cloudera.org:8080/#/c/8434/1/be/src/service/client-request-state.h@342 PS1, Line 342: is_planning_ > I started out with this being planning_finished_ (or some such). I ran into I see - the ExecuteMetadataOp() cases. -- To view, visit http://gerrit.cloudera.org:8080/8434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b Gerrit-Change-Number: 8434 Gerrit-PatchSet: 1 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 03 Nov 2017 16:33:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2181: Add query option levels for display
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8447 ) Change subject: IMPALA-2181: Add query option levels for display .. Patch Set 5: Code-Review-1 (3 comments) I took a quick look through. I'm uncomfortable with this change as it stands, because I feel it abuses the Thrift structures too much. Specifically, keeping the map of query option levels in TQueryOptions seems wrong: you're really describing "Query Option Metadata" (which might include name, description/help, and level), rather than the query options themselves. And, for sending this stuff to impala-shell, abusing the 'description' field seems wrong. (In fact, impala-shell should actually show useful help for these options. Things like "MT_DOP" are completely opaque. Implementation wise, I think we also need to worry about what happens for the HiveServer2 side of things. Specifically, we respond to "SET" queries over HS2 at https://github.com/apache/incubator-impala/blob/1803b403e38b3f952afc4ff3fd3e5f4d14c088f8/be/src/service/client-request-state.cc#L194. I think we want to stay away from adding information into the Beeswax API that isn't also exposed in HS2. One option is to let "SET ALL" be meaningful as a query for HS2, though I don't quite know what that means in terms of the schema returned: how would you divide the regular/advanced/deprecated options over that API? http://gerrit.cloudera.org:8080/#/c/8447/5/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/8447/5/common/thrift/ImpalaInternalService.thrift@65 PS5, Line 65: // Levels to use when displaying query options from Impala shell Let's leave a pointer here indicating where the mapping from the options in TQueryOptions to these levels is defined. http://gerrit.cloudera.org:8080/#/c/8447/5/common/thrift/ImpalaInternalService.thrift@297 PS5, Line 297: // Map for query option name to option level mapping. Populated by ImpalaServer : // on startup : 61: required mapquery_option_levels; This worries me a bit, for a couple of reasons. First, 'required' is forever. (See https://diwakergupta.github.io/thrift-missing-guide/#_defining_structs.) I think it means that a Thrift client will refuse to serialize a struct if this isn't set. For some uses, like TClientRequest, it makes no sense for us to carry this around. It's possible it works out because it can be set to the empty map often, but it's still odd. Second, this isn't a query option. It's certainly describing query options, but it's very very different from, say, "mem_limit", which applies to the query. http://gerrit.cloudera.org:8080/#/c/8447/5/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/8447/5/shell/impala_client.py@100 PS5, Line 100: def get_query_option_level_from_description(self, description): Have we decided we can't change ConfigVariable to include a level? It seems like this is very much not a description, and we're abusing it. /** Represents a Hadoop-style configuration variable. */ struct ConfigVariable { 1: string key, 2: string value, 3: string description } -- To view, visit http://gerrit.cloudera.org:8080/8447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 Gerrit-Change-Number: 8447 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 03 Nov 2017 16:33:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2181: Add query option levels for display
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8447 ) Change subject: IMPALA-2181: Add query option levels for display .. Patch Set 5: Sorry I didn't see the discussion on the JIRA about "advanced". I don't think that solves the problem that the JIRA was meaning to solve (though I don't think implementation would have to change too much). I'll add a comment to the JIRA. -- To view, visit http://gerrit.cloudera.org:8080/8447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 Gerrit-Change-Number: 8447 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 03 Nov 2017 16:17:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6144: UpdateFilter()/PublishFilter() continue to run after query failure/cancellation
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8455 ) Change subject: IMPALA-6144: UpdateFilter()/PublishFilter() continue to run after query failure/cancellation .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc@1130 PS1, Line 1130: swap(rpc_params.bloom_filter, aggregated_filter); I think we should also track the rpc_params memory and prevent the query from being unregistered and releasing admission control resources until this function has exited - could you maybe file a follow-up JIRA? That might depend on my admission control change anyway. -- To view, visit http://gerrit.cloudera.org:8080/8455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed Gerrit-Change-Number: 8455 Gerrit-PatchSet: 1 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 16:08:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 11: (19 comments) http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@13 PS9, Line 13: In RuntimeFilterGenerator in the planner, each hash join node > ... each hash join node generates a bloom and min-max filter for each equi- Done http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@27 PS9, Line 27: > Not specific to the code changes, and I don't expect a response here (proba RUNTIME_FILTER_WAIT_TIME_MS applies the same to bloom and min-max - its the time a scan node will wait for any filter of either type to arrive. Similarly, RUNTIME_FILTER_MODE works the same for min-max as its worked for bloom. The size related params - RUNTIME_BLOOM_FILTER_SIZE, RUNTIME_FILTER_MIN/MAX_SIZE only apply to bloom filters as the mem used by min-max is small and bounded. I've updated some comments to make the above clearer. One tricky case is DISABLE_ROW_RUNTIME_FILTERING. Kudu applies the filters on both a per-partition and per-row basis. At the moment, there's no way to change this - the filters are applied through the same interface as eg. predicates that are pushed down, so Kudu assumes that they have to be applied for correctness, not just for performance. So, I could see the argument either way - we could disable Kudu filters if DISABLE_ROW_RUNTIME_FILTERING is true, though it would also disable the partition filtering, or we could just not consider DISABLE_ROW_RUNTIME_FILTERING for Kudu filters. The remaining filter related option is MAX_NUM_RUNTIME_FILTERS, which I've addressed in another comment. http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@41 PS9, Line 41: > Contrived extreme queries are good data points, but how about running the T Those results are posted in the review comments. I can include them here as well, I just felt it made the commit message excessively large. http://gerrit.cloudera.org:8080/#/c/7793/9//COMMIT_MSG@44 PS9, Line 44: timings are averages of 3 runs. > does not eliminate Done http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/runtime/timestamp-value.h@164 PS8, Line 164: tvalue.ti > I'm confused by this variable name. Done http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter-test.cc File be/src/util/min-max-filter-test.cc: http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter-test.cc@27 PS8, Line 27: > Can you maybe briefly describe what is tested for each filter type? There s Done http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter-test.cc@252 PS8, Line 252: MinMaxFilter::Create(tFilter, string_type, _pool, _pool); > Would TestEnv work? That is the semi-standard way to create an ExecEnv for Done http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter-test.cc@353 PS8, Line 353: t3.ToTColumnValue(); > Remove? Done http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter.cc File be/src/util/min-max-filter.cc: http://gerrit.cloudera.org:8080/#/c/7793/8/be/src/util/min-max-filter.cc@246 PS8, Line 246: null > nullptr? Done http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/7793/9/common/thrift/ImpalaInternalService.thrift@202 PS9, Line 202: // Maximum number of bloom runtime filters allowed per query > I think I understand why you did this, but it seems confusing from a user's There's basically two reasons for this: - We don't want to regress queries by eliminating bloom filters that used to be generated. - The purpose of this param in the first place was to prevent the coordinator from being overwhelmed. Min-max filters are less heavy-weight than bloom filters so they don't put as much stress on the coordinator anyways. Given that, I think that it would be fine to maintain this behavior even once HDFS can do min-max and Kudu can do bloom. Another concern though is what to do if we add in-list filters, which are probably similarly heavy-weight to bloom filters. One option would be to deprecate this and have separate max_num_bloom and max_num_in_list params, but that may be confusing and requires user action to keep things working as expected anyways. Another option at that point would be to keep this param and make it "Maximum number of expensive filters (bloom or in-list)", and then there's just once we have to worry about regressing queries. Or maybe we should just apply this to min-max filters as well, since its the least confusing option in the long run, possibly bumping up the default a
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim Armstrong, Todd Lipcon, Mostafa Mokhtar, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#11). Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter or a min-max filter, depending on if it has HDFS or Kudu targets, respectively. In RuntimeFilterGenerator in the planner, each hash join node generates a bloom and min-max filter for each equi-join predicate, but only those filters that end up being assigned to a target make it into the final plan. Min-max filters are only assigned to Kudu scans if the target expr is a column, as Kudu doesn't support bounds on general exprs, and only if the join op is '=' and not 'is distinct from', as Kudu doesn't support returning NULLs if a bound is set. Min-max filters are inserted into by the PartitionedHashJoinBuilder. Codegen is used to eliminate branching on the type of filter. String min-max filters truncate their bounds at 1024 chars, so that the max amount of memory used by min-max filters is negligible. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Functional Testing: - Added new planner tests and updated the old ones. (in old tests, a lot of runtime filters are renumbered as we always generate min-max filters even if they don't end up getting assigned and they take up some of the RF ids). - Updated existing runtime filter tests to work with Kudu. - Added e2e tests for min-max filter specific functionality. Perf Testing: - All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu, timings are averages of 3 runs. - Ran a contrived query with a filter that does not eliminate any rows (full self join of lineitem). The difference in running time was negligible - 24.46s with filters on, 24.15s with filters off for a ~1% slowdown. - Ran a contrived query with a filter that elimiates all rows (self join on lineitem with a join condition that never matches). The filters resulted in a significant speedup - 0.26s with filters on, 1.46s with filters off for a ~5.6x speedup. This query is added to targeted-perf. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-mt.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/runtime/timestamp-value.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter-test.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/Data.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim Armstrong, Todd Lipcon, Mostafa Mokhtar, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#10). Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter or a min-max filter, depending on if it has HDFS or Kudu targets, respectively. In RuntimeFilterGenerator in the planner, each partitioned hash join generates a bloom and min-max filter, but only those filters that end up being assigned to a target make it into the final plan. Min-max filters are only assigned to Kudu scans if the target expr is a column, as Kudu doesn't support bounds on general exprs, and only if the join op is '=' and not 'is distinct from', as Kudu doesn't support returning NULLs if a bound is set. Min-max filters are inserted into by the PartitionedHashJoinBuilder. Codegen is used to eliminate branching on the type of filter. String min-max filters truncate their bounds at 1024 chars, so that the max amount of memory used by min-max filters is negligible. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Functional Testing: - Added new planner tests and updated the old ones. (in old tests, a lot of runtime filters are renumbered as we always generate min-max filters even if they don't end up getting assigned and they take up some of the RF ids). - Updated existing runtime filter tests to work with Kudu. - Added e2e tests for min-max filter specific functionality. Perf Testing: - All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu, timings are averages of 3 runs. - Ran a contrived query with a filter that does eliminate any rows (full self join of lineitem). The difference in running time was negligible - 24.46s with filters on, 24.15s with filters off for a ~1% slowdown. - Ran a contrived query with a filter that elimiates all rows (self join on lineitem with a join condition that never matches). The filters resulted in a significant speedup - 0.26s with filters on, 1.46s with filters off for a ~5.6x speedup. This query is added to targeted-perf. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-mt.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/runtime/timestamp-value.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter-test.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/Data.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M
[Impala-ASF-CR] IMPALA-6144: UpdateFilter()/PublishFilter() continue to run after query failure/cancellation
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8455 ) Change subject: IMPALA-6144: UpdateFilter()/PublishFilter() continue to run after query failure/cancellation .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator-backend-state.cc@394 PS1, Line 394: if (!status_.ok()) return; > same Also, should we check if the backend is done too? I think we should also include a comment explaining what this achieves. http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc@1106 PS1, Line 1106: if (state->disabled()) return; I think checking for the filter disabled is sufficient in this function, since it's disabled when resources are released. -- To view, visit http://gerrit.cloudera.org:8080/8455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed Gerrit-Change-Number: 8455 Gerrit-PatchSet: 1 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 16:02:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8023 ) Change subject: IMPALA-4856: Port data stream service to KRPC .. Patch Set 7: (81 comments) http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/exec/kudu-util.h File be/src/exec/kudu-util.h: http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/exec/kudu-util.h@45 PS6, Line 45: .ok()) > nit: use same mangling between this and KUDU_RETURN_IF_ERROR (prepend seems Done http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/exec-env.cc@89 PS6, Line 89: "processing threads. If left at default value 0, it will be set to number of CPU " > how are these defaults chosen? I didn't really change these parameters from Henry's patch. There are no good defaults for the service queue size any way (IMPALA-6116) but 1024 seems to be a reasonable bound for untracked memory consumption. The latest PS sets num_svc_threads to match the number of cores on the system. http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.h File be/src/runtime/krpc-data-stream-mgr.h: http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.h@97 PS6, Line 97: etion at any one : /// time). The rate of transmission is controlled by the receiver: a sender will only : /// schedule batch transmission when the previous transmissi > i'm confused what the "two" cases are from this comment. Also, I think it' Done http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.h@103 PS6, Line 103: batch que > deserialized? Done http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.h@104 PS6, Line 104: moved from t > what is this "respectively" referring to? The two cases. Removed as this seems unnecessary. http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.h@249 PS6, Line 249: for Transm > unclear what that means, maybe stale comment? And this should say something Done http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.h@258 PS6, Line 258: > isn't that part of "memory pointed to by 'request'"? If so and you want to Done http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.h@294 PS6, Line 294: > How about getting rid of this typedef? The code seems easier to understand This data structure has been renamed to DeserializeTask and also changed in the latest patch. http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.h@428 PS6, Line 428: > a sender Done http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h File be/src/runtime/krpc-data-stream-mgr.h: http://gerrit.cloudera.org:8080/#/c/8023/3/be/src/runtime/krpc-data-stream-mgr.h@314 PS3, Line 314: t > Done Oops..missed it. Will update in later patch. http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.cc File be/src/runtime/krpc-data-stream-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.cc@59 PS6, Line 59: > I don't have a strong preference either way, but it'd be nice to be consist My general guideline is to use lambda if it's one or two liners and use bind for larger functions. I guess there many different opinions on this topic. I try to optimize for readability. http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-mgr.cc@322 PS6, Line 322: tonicMillis() + STREAM_E > shouldn't be plural Done http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.h File be/src/runtime/krpc-data-stream-recvr.h: http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.h@119 PS6, Line 119: row batch i > row batch is deserialized Done http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc File be/src/runtime/krpc-data-stream-recvr.cc: http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@52 PS6, Line 52: total amount of > it's not clear what that means just from reading the comment. It'd be nice Done http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@113 PS6, Line 113: erialized > how about using unique_ptr since this owns the row batch (until it's transf Done http://gerrit.cloudera.org:8080/#/c/8023/6/be/src/runtime/krpc-data-stream-recvr.cc@127 PS6, Line 127: // If true, the receiver fragment for this stream got cancelled. > given that a single soft limit is imposed across all sender queues, does it For the non-merging case, there is essentially only one queue. For the merging case, I can see that it's possible for a blocked row batch to a sender queue being passed by either new incoming row batches or blocked row batches to another sender queue.
[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC
Hello Sailesh Mukil, Mostafa Mokhtar, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8023 to look at the new patch set (#7). Change subject: IMPALA-4856: Port data stream service to KRPC .. IMPALA-4856: Port data stream service to KRPC This patch implements a new data stream service which utilizes KRPC. Similar to the thrift RPC implementation, there are 3 major components to the data stream services: KrpcDataStreamSender serializes and sends row batches materialized by a fragment instance to a KrpcDataStreamRecvr. KrpcDataStreamMgr is responsible for routing an incoming row batch to the appropriate receiver. The data stream service runs on the port FLAGS_krpc_port which is 29000 by default. Unlike the implementation with thrift RPC, KRPC provides an asynchronous interface for invoking remote methods. As a result, KrpcDataStreamSender doesn't need to create a thread per connection. There is one connection between two Impalad nodes for each direction (i.e. client and server). Multiple queries can multi-plex on the same connection for transmitting row batches between two Impalad nodes. The asynchronous interface also prevents some issues with thrift RPC in which a thread may be stuck in an RPC call without being able to check for cancellation. A TransmitData() call with KRPC is in essence a trio of RpcController, a serialized protobuf request buffer and a protobuf response buffer. The call is invoked via a DataStreamService proxy object. The serialized tuple offsets and row batches are sent via "sidecars" in KRPC to avoid extra copy into the serialized request buffer. Each impalad node creates a singleton DataStreamService object at start-up time. All incoming calls are served by a service thread pool created as part of DataStreamService. By default, there are 64 service threads. The service threads are shared across all queries so the RPC handler should avoid blocking as much as possible. In thrift RPC implementation, we make a thrift thread handling a TransmitData() RPC to block for extended period of time when the receiver is not yet created when the call arrives. In KRPC implementation, we store TransmitData() or EndDataStream() requests which arrive before the receiver is ready in a per-receiver early sender list stored in KrpcDataStreamMgr. These RPC calls will be processed and responded to when the receiver is created or when timeout occurs. Similarly, there is limited space in the sender queues in KrpcDataStreamRecvr. If adding a row batch to a queue in KrpcDataStreamRecvr causes the buffer limit to exceed, the request will be stashed in a blocked_sender_ queue to be processed later. The stashed RPC request will not be responded to until it is processed so as to exert back pressure to the client. An alternative would be to reply with an error and the request / row batches need to be sent again. This may end up consuming more network bandwidth than the thrift RPC implementation. This change adopts the behavior of allowing one stashed request per sender. All rpc requests and responses are serialized using protobuf. The equivalent of TRowBatch would be ProtoRowBatch which contains a serialized header about the meta-data of the row batch and two Kudu Slice objects which contain pointers to the actual data (i.e. tuple offsets and tuple data). This patch is based on an abandoned patch by Henry Robinson. TESTING --- * Build passed with FLAGS_use_krpc=true. TO DO - * Port some BE tests to KRPC services. Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1 --- M be/src/common/status.cc M be/src/common/status.h M be/src/exec/data-sink.cc M be/src/exec/exchange-node.cc M be/src/exec/kudu-util.h M be/src/rpc/CMakeLists.txt M be/src/rpc/rpc-mgr.cc M be/src/rpc/rpc-mgr.h M be/src/runtime/CMakeLists.txt M be/src/runtime/data-stream-mgr-base.h M be/src/runtime/data-stream-mgr.h M be/src/runtime/data-stream-recvr.h M be/src/runtime/data-stream-sender.h M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/krpc-data-stream-mgr.cc M be/src/runtime/krpc-data-stream-mgr.h M be/src/runtime/krpc-data-stream-recvr.cc M be/src/runtime/krpc-data-stream-recvr.h A be/src/runtime/krpc-data-stream-sender.cc A be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/CMakeLists.txt A be/src/service/data-stream-service.cc A be/src/service/data-stream-service.h M be/src/service/impala-server.cc M cmake_modules/FindProtobuf.cmake M common/protobuf/CMakeLists.txt A common/protobuf/common.proto A common/protobuf/data_stream_service.proto A common/protobuf/row_batch.proto M common/thrift/generate_error_codes.py 33 files changed, 3,124 insertions(+), 180 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8023/7 -- To view, visit http://gerrit.cloudera.org:8080/8023 To unsubscribe,
[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8368 ) Change subject: IMPALA-2235: Fix current db when shell auto-reconnects .. Patch Set 3: (2 comments) I removed ClusterController and left test_breakpad.py untouched. The functionality available in CustomClusterTestSuite was enough for my test cases. http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py File tests/custom_cluster/test_shell_interactive_reconnect.py: http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py@25 PS2, Line 25: > Perhaps use an actual tempfile created with tempfile.[something] Done http://gerrit.cloudera.org:8080/#/c/8368/2/tests/shell/util.py File tests/shell/util.py: http://gerrit.cloudera.org:8080/#/c/8368/2/tests/shell/util.py@115 PS2, Line 115: move_shell_h > maybe move_shell_history()/restore_shell_history()? It's more verbose but I Done -- To view, visit http://gerrit.cloudera.org:8080/8368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e Gerrit-Change-Number: 8368 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 03 Nov 2017 15:03:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects
Hello Philip Zeyliger, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8368 to look at the new patch set (#3). Change subject: IMPALA-2235: Fix current db when shell auto-reconnects .. IMPALA-2235: Fix current db when shell auto-reconnects When precmd tested the connection it didn't validate that if we are using the previously selected DB. The _validate_database method is responsible for that, but it only appended the "use " command to the cmdqueue (command queue of Cmd class). But, at this point we might already have commands in the command queue that will run before the "use " command. Also, the command processed by precmd can entirely skip the cmdqueue, therefore it is not enough to insert the "use " command to the front of cmdqueue. We need to issue the USE command with the onecmd() method to execute it immediately. I extended the _validate_database method with an "immediately" flag. If this is true, _validate_database will use the onecmd() method. Otherwise, it will append the USE command to the cmdqueue to maintain the previous behaviour. I added a new automated test suite named test_shell_interactive_reconnect.py to the "custom cluster" tests. It sets the default database, and after reconnection it checks if the shell set it again automatically. One test case checks if the shell set the default db after manually reconnecting to the impala daemon by issuing the CONNECT command. The other test case checks if the shell set the default db after automatic reconnection due to cluster restart. I needed to backup the impala shell history file because I didn't want to pollute it by the test cases (just like the way it is done in tests/shell/test_shell_interactive.py). I created utility functions for this in tests/shell/util.py and now test_shell_interactive.py and the newly created test suite are using these utility functions. Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e --- M shell/impala_shell.py A tests/custom_cluster/test_shell_interactive_reconnect.py M tests/shell/test_shell_interactive.py M tests/shell/util.py 4 files changed, 102 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/8368/3 -- To view, visit http://gerrit.cloudera.org:8080/8368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e Gerrit-Change-Number: 8368 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-2181: Add query option levels for display
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/8447 ) Change subject: IMPALA-2181: Add query option levels for display .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/8447/5/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/8447/5/be/src/service/query-options.cc@618 PS5, Line 618: Probably I'm missing something here, but shouldn't this be ADVANCED? http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/8447/2/shell/impala_shell.py@127 PS2, Line 127: PRINT_REGULAR_OPTIONS_ONLY = 1 : PRINT_ALL_OPTIONS = 2 > I found it more verbose to have 2 const instead of a bool. Ok, I see your point. You could also put the PRINT_* members into a separate class to emulate enums (see class CmdStatus in L61-L69). Same goes for QUERY_OPTION_* members. On the other hand this might be an overkill, so use your best judgement :) http://gerrit.cloudera.org:8080/#/c/8447/2/tests/shell/test_shell_interactive.py File tests/shell/test_shell_interactive.py: http://gerrit.cloudera.org:8080/#/c/8447/2/tests/shell/test_shell_interactive.py@351 PS2, Line 351:result = shell1.get_result() : assert "Query options (defaults shown in []):" in result.stdout : assert "ABORT_ON_ERROR" in result.stdout > No. SUPPORT_START_OVER as I discussed with Tim is a tricky one and we don't Thanks for the explanation. I still find confusing that in impala::PopulateQueryOptionLevels() the query option level for SUPPORT_START_OVER is explicitly set to REGULAR. Shouldn't it be ADVANCED? -- To view, visit http://gerrit.cloudera.org:8080/8447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 Gerrit-Change-Number: 8447 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 03 Nov 2017 11:20:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6134: Update code base to use impala::ConditionVariable
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8428 ) Change subject: IMPALA-6134: Update code base to use impala::ConditionVariable .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h File be/src/util/condition-variable.h: http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h@48 PS1, Line 48: const timespec* timeout > I feel like this should be a const&, since it has to be non-null. That woul OK, will do it. http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h@65 PS1, Line 65: bool TimedWait(boost::unique_lock& lock, > This looks like it only has one callsite. There's another callsite in Block Now that I looked into the callsites more carefully, I realized that there are two use cases for TimedWait. In Impala both of them are used. One use case is when we wait until a point in time: auto deadline = calculate_deadline(); while (! condition) { if (!cv.TimedWait(lock, deadline) { // deadline reached without notification return Status::ERROR; } } The other use case is waiting in a loop, and signaling some status periodically: while (! condition) { cv.TimedWait(lock, seconds(1)); if (condition) break; else SendReport(); } Both API can be expressed in terms of the other, however it is a bit awkward (expressing deadline in terms of duration is more awkward, so if we have to choose, I think we should go with supporting deadlines). There are examples for both in the code base: FragmentInstanceState::ReportProfileThread: wait duration in terms of wait until deadline. Promise::Get: wait until deadline in terms of wait duration. Now I am on the side of support both use case with the two APIs, like std::condition_variable has wait_for and wait_until member functions. Maybe we can use these names also to be more explicit. What do you think? -- To view, visit http://gerrit.cloudera.org:8080/8428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3085c6dcb42350b61244df6e7f091a1e7db356c9 Gerrit-Change-Number: 8428 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 03 Nov 2017 10:05:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Hello Sailesh Mukil, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8449 to look at the new patch set (#3). Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances .. IMPALA-3613: Avoid topic updates to unregistered subscriber instances Bug: Without this patch, when a subscriber repeatedly reconnects to the statestore, the latter queues the initial heartbeat message and a bunch of topic updates to every instance of the registered subscriber. These queued updates are eventually picked up by the heartbeating/topic update threads and the corresponding RPCs are made to the subscribers. The subscriber then rejects these updates since they were meant for an earlier registration. This is usually possible if the subscriber has some network problems leading to failing RPCs. Such a node is eventually marked by the statestore as bad, but depending on the configurations set, the issue can snowball into DDOS kind of attack when the entire thread pool of heartbeating/topic updates is filled with instances from the problematic host. This can result in the statestore missing timely heartbeats to other subscribers making them reconnect. This worsens the situation and the resulting topic updates for the reconnects will fully saturate the network on the statestore host, until the statestore daemon is restarted. Fix: This patch maps topic updates/heartbeats to a specific subscriber registered instance rather to a subscriber id (that stays same across reconnects). That way, when we encounter a topic update that was meant to a stale subscriber, we can simply reject it. Testing: Tested this locally by adding relevant logging. I made the subscribers to reconnect aggressively(a) and delaying heartbeats from the statestore side (b,c). (a) --statestore_subscriber_timeout_seconds=1 (b) --statestore_max_missed_heartbeats=1000 (c) --statestore_heartbeat_frequency_ms=6 Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65 --- M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h 4 files changed, 78 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8449/3 -- To view, visit http://gerrit.cloudera.org:8080/8449 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65 Gerrit-Change-Number: 8449 Gerrit-PatchSet: 3 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-6144: UpdateFilter()/PublishFilter() continue to run after query failure/cancellation
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8455 ) Change subject: IMPALA-6144: UpdateFilter()/PublishFilter() continue to run after query failure/cancellation .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator-backend-state.cc@394 PS1, Line 394: if (!status_.ok()) return; same http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/8455/1/be/src/runtime/coordinator.cc@1082 PS1, Line 1082: if (!query_status_.ok()) return; lock_? -- To view, visit http://gerrit.cloudera.org:8080/8455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I400456ad85adb9c23d2d432d772311fa4dcff2ed Gerrit-Change-Number: 8455 Gerrit-PatchSet: 1 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bharath Vissapragada Gerrit-Comment-Date: Fri, 03 Nov 2017 06:44:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Install OpenJDK-dbg for development environments.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8431 ) Change subject: Install OpenJDK-dbg for development environments. .. Install OpenJDK-dbg for development environments. Updates the boostrap script to install the OpenJDK debug symbols. OpenJDK comes with a gdb stack unwinder that errors out if the OpenJDK debugging symbols aren't installed. This fixes the following error message in gdb: Installing openjdk unwinder Traceback (most recent call last): File "/usr/share/gdb/auto-load/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so-gdb.py", line 52, in class Types(object): File "/usr/share/gdb/auto-load/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so-gdb.py", line 66, in Types nmethodp_t = gdb.lookup_type('nmethod').pointer() gdb.error: No type named nmethod. I tested this by installing the package manually on Ubuntu16.04 and using gdb a bit. Change-Id: I9aa3a5e1bbba55a4b0b489e4a4dfa39e35fc0ae0 Reviewed-on: http://gerrit.cloudera.org:8080/8431 Reviewed-by: Jim AppleTested-by: Impala Public Jenkins --- M bin/bootstrap_system.sh 1 file changed, 2 insertions(+), 1 deletion(-) Approvals: Jim Apple: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8431 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I9aa3a5e1bbba55a4b0b489e4a4dfa39e35fc0ae0 Gerrit-Change-Number: 8431 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] Install OpenJDK-dbg for development environments.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8431 ) Change subject: Install OpenJDK-dbg for development environments. .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8431 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9aa3a5e1bbba55a4b0b489e4a4dfa39e35fc0ae0 Gerrit-Change-Number: 8431 Gerrit-PatchSet: 2 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 03 Nov 2017 06:41:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 ) Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore-subscriber.h File be/src/statestore/statestore-subscriber.h: http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore-subscriber.h@165 PS2, Line 165: typedef TUniqueId RegistrationId; > You can move this typedef to statestore.h and use the same type in statesto Done http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h File be/src/statestore/statestore.h: http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@383 PS2, Line 383: SubscriberId > Where ever the 'SubscriberId' is > required, we just get it from the Subscriber object anyway, and > that object can be retrieved using the unique registration id. Not sure I follow. If you see OfferUpdate()/DoSubscriberUpdate(), we only keep track of ScheduledSubscriberUpdate objects and get the corresponding Subscriber based on ScheduledSubscriberUpdate.subscriber_Id. So we don't have a Subscriber object handy to get the subscriber_id in all cases. Also, if we remove SubscriberId everywhere, this means that we need to change the subscribers_ map structure to map from RegistrationId -> Subscriber objects. Doing so we can't look up by subscriber_id, which is required in RegisterSubscriber() (At that point, we don't assign a RegistrationId yet to the new instance) SubscriberMap::iterator subscriber_it = subscribers_.find(subscriber_id); if (subscriber_it != subscribers_.end()) { UnregisterSubscriber(subscriber_it->second.get()); } We can still figure out a way, but it seemed unnecessarily complex to me. Thoughts? http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@481 PS2, Line 481: subscriber exists > Could you clarify what "exists" means here exactly? It could be confused wi Done http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@482 PS2, Line 482: registration_ids > nit: registration_id Done http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.h@484 PS2, Line 484: std::shared_ptr* subscriber > Add a comment about what is returned in this out parameter. clarified http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@414 PS2, Line 414: onst SubscriberId& subscriber_id, : const TUniqueId& registration_id > It looks like it just makes sense to pass 'const Subscriber&' here? Is ther Not sure I understand. We get the subscriber/registration_id from the ScheduledSubscriberUpdate object and not the Subscriber object. Do you mean we should pass directly pass ScheduledSubscriberUpdate instead? If thats the case, the signature seems kinda weird :) http://gerrit.cloudera.org:8080/#/c/8449/2/be/src/statestore/statestore.cc@634 PS2, Line 634: if (!RegisteredSubscriberExists(subscriber_to_update.first, subscriber_to_update.second, > I'm a little worried that we're contending for a mutex two more times in th Good point. I too think (theoretically) spinlock is probably a better choice to avoid context switching. Also, ~1000 entries seem like a reasonable estimate for foreseeable future :). Also, I think for string based hashing of ~1000 entries, it is reasonable to assume a O(1) average case lookup (even though the worst case is O(N)). I created a microbenchmark to see if the lock type makes a difference. In the benchmark, I measured how long it takes to get 100 heartbeats for a given subscriber (with heart beating thread pool sizes of 10/100). I didn't see any noticeable difference for 100 subscribers, but beyond that, the test runs into flaky socket connection issues. I admit that this is not representative of the real world use case because in real clusters, the statestore CPU would be much busier and the context switching could be more expensive. -- To view, visit http://gerrit.cloudera.org:8080/8449 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65 Gerrit-Change-Number: 8449 Gerrit-PatchSet: 2 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Fri, 03 Nov 2017 06:22:18 + Gerrit-HasComments: Yes