[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Henry Robinson has submitted this change and it was merged. Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. IMPALA-5773: Correctly account for memory used in data stream receiver queue DataStreamRecvrs keep one or more queues of batches received to provide some buffering. Each queue has a fixed byte size capacity. The estimate of the contribution of a new RowBatch to that queue was using the compressed size of the TRowBatch it would be deserialized from, which is the wrong value (since the batch is uncompressed after deserialization). * Add RowBatch::Get[Des|S]erializedSize(const TRowBatch&) to RowBatch * Fix the estimate to use the uncompressed size. * Add a DataStreamReceiver child profile to the exchg node so that the peak memory used by the receiver can be monitored easily. Confirmed that the following query: select count(distinct concat(cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120))) from lineitem; succeeds with a mem-limit of 800Mb. Before this patch it would fail in a one-node cluster as the datastream recvr would buffer more batches than the memory limit would allow. Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Reviewed-on: http://gerrit.cloudera.org:8080/7646 Tested-by: Impala Public Jenkins Reviewed-by: Henry Robinson --- M be/src/benchmarks/row-batch-serialize-benchmark.cc M be/src/exec/exchange-node.cc M be/src/runtime/data-stream-mgr-base.h M be/src/runtime/data-stream-mgr.cc M be/src/runtime/data-stream-mgr.h M be/src/runtime/data-stream-recvr.cc M be/src/runtime/data-stream-recvr.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/krpc-data-stream-mgr.cc M be/src/runtime/krpc-data-stream-mgr.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h 12 files changed, 54 insertions(+), 48 deletions(-) Approvals: Impala Public Jenkins: Verified Henry Robinson: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Henry Robinson has posted comments on this change. Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. Patch Set 4: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1036/ -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1037/ -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Jim Apple has posted comments on this change. Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. Patch Set 4: > Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1036/ https://jenkins.impala.io/view/Utility/job/ubuntu-16.04-from-scratch/43/ Failed. I think this might need a rebase. -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1036/ -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Michael Ho has posted comments on this change. Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Henry Robinson has posted comments on this change. Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. Patch Set 3: (2 comments) I didn't notice any significant performance regressions when I ran TPCDS on a 16-node cluster. But this is an obvious correctness issue; we'll do more perf analysis shortly. http://gerrit.cloudera.org:8080/#/c/7646/3/be/src/runtime/data-stream-mgr.cc File be/src/runtime/data-stream-mgr.cc: PS3, Line 80: int buffer_size > For consistency, should this be int64_t too ? Done http://gerrit.cloudera.org:8080/#/c/7646/3/be/src/runtime/data-stream-recvr.cc File be/src/runtime/data-stream-recvr.cc: PS3, Line 154: int64_t batch_size = RowBatch::GetDeserializedSize(thrift_batch); > nit: move this closer to l165 where it's used Done -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Hello Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7646 to look at the new patch set (#4). Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. IMPALA-5773: Correctly account for memory used in data stream receiver queue DataStreamRecvrs keep one or more queues of batches received to provide some buffering. Each queue has a fixed byte size capacity. The estimate of the contribution of a new RowBatch to that queue was using the compressed size of the TRowBatch it would be deserialized from, which is the wrong value (since the batch is uncompressed after deserialization). * Add RowBatch::Get[Des|S]erializedSize(const TRowBatch&) to RowBatch * Fix the estimate to use the uncompressed size. * Add a DataStreamReceiver child profile to the exchg node so that the peak memory used by the receiver can be monitored easily. Confirmed that the following query: select count(distinct concat(cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120))) from lineitem; succeeds with a mem-limit of 800Mb. Before this patch it would fail in a one-node cluster as the datastream recvr would buffer more batches than the memory limit would allow. Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a --- M be/src/benchmarks/row-batch-serialize-benchmark.cc M be/src/exec/exchange-node.cc M be/src/runtime/data-stream-mgr-base.h M be/src/runtime/data-stream-mgr.cc M be/src/runtime/data-stream-mgr.h M be/src/runtime/data-stream-recvr.cc M be/src/runtime/data-stream-recvr.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/krpc-data-stream-mgr.cc M be/src/runtime/krpc-data-stream-mgr.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h 12 files changed, 54 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/7646/4 -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. Patch Set 3: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/7646/3/be/src/runtime/data-stream-recvr.cc File be/src/runtime/data-stream-recvr.cc: PS3, Line 154: int64_t batch_size = RowBatch::GetDeserializedSize(thrift_batch); nit: move this closer to l165 where it's used -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Michael Ho has posted comments on this change. Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. Patch Set 3: (1 comment) Is there any regression in performance ? http://gerrit.cloudera.org:8080/#/c/7646/3/be/src/runtime/data-stream-mgr.cc File be/src/runtime/data-stream-mgr.cc: PS3, Line 80: int buffer_size For consistency, should this be int64_t too ? -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. Patch Set 3: Code-Review+1 Looks good to me, I'll just give others a chance to take a look. -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Henry Robinson has posted comments on this change. Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7646/2/be/src/runtime/row-batch.cc File be/src/runtime/row-batch.cc: PS2, Line 344: int > Needs to be an int64_t now Done PS2, Line 351: int > Needs to be an int64_t now Done -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Henry Robinson has uploaded a new patch set (#3). Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. IMPALA-5773: Correctly account for memory used in data stream receiver queue DataStreamRecvrs keep one or more queues of batches received to provide some buffering. Each queue has a fixed byte size capacity. The estimate of the contribution of a new RowBatch to that queue was using the compressed size of the TRowBatch it would be deserialized from, which is the wrong value (since the batch is uncompressed after deserialization). * Add RowBatch::Get[Des|S]erializedSize(const TRowBatch&) to RowBatch * Fix the estimate to use the uncompressed size. * Add a DataStreamReceiver child profile to the exchg node so that the peak memory used by the receiver can be monitored easily. Confirmed that the following query: select count(distinct concat(cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120))) from lineitem; succeeds with a mem-limit of 800Mb. Before this patch it would fail in a one-node cluster as the datastream recvr would buffer more batches than the memory limit would allow. Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a --- M be/src/benchmarks/row-batch-serialize-benchmark.cc M be/src/runtime/data-stream-mgr.cc M be/src/runtime/data-stream-recvr.cc M be/src/runtime/data-stream-recvr.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h 7 files changed, 39 insertions(+), 32 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/7646/3 -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/7646/2/be/src/runtime/row-batch.cc File be/src/runtime/row-batch.cc: PS2, Line 344: int Needs to be an int64_t now PS2, Line 351: int Needs to be an int64_t now http://gerrit.cloudera.org:8080/#/c/7646/1/be/src/runtime/row-batch.h File be/src/runtime/row-batch.h: Line 304: /// deserialized form. If a row batch is compressed, its serialized size can be much > Done - not all serialized batches are compressed, so need to be a bit caref Good point. I think the new comment is helpful. -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Henry Robinson has uploaded a new patch set (#2). Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. IMPALA-5773: Correctly account for memory used in data stream receiver queue DataStreamRecvrs keep one or more queues of batches received to provide some buffering. Each queue has a fixed byte size capacity. The estimate of the contribution of a new RowBatch to that queue was using the compressed size of the TRowBatch it would be deserialized from, which is the wrong value (since the batch is uncompressed after deserialization). * Add RowBatch::Get[Des|S]erializedSize(const TRowBatch&) to RowBatch * Fix the estimate to use the uncompressed size. * Add a DataStreamReceiver child profile to the exchg node so that the peak memory used by the receiver can be monitored easily. Confirmed that the following query: select count(distinct concat(cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120))) from lineitem; succeeds with a mem-limit of 800Mb. Before this patch it would fail in a one-node cluster as the datastream recvr would buffer more batches than the memory limit would allow. Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a --- M be/src/benchmarks/row-batch-serialize-benchmark.cc M be/src/runtime/data-stream-mgr.cc M be/src/runtime/data-stream-recvr.cc M be/src/runtime/data-stream-recvr.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h 7 files changed, 38 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/7646/2 -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Henry Robinson has posted comments on this change. Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/7646/1/be/src/runtime/row-batch.cc File be/src/runtime/row-batch.cc: PS1, Line 343: int > Would it make sense to change to int64_t to avoid potential overflows? Not Done http://gerrit.cloudera.org:8080/#/c/7646/1/be/src/runtime/row-batch.h File be/src/runtime/row-batch.h: Line 304: /// deserialized form. > Maybe mention compression explicitly. E.g. serialized (i.e. compressed) or Done - not all serialized batches are compressed, so need to be a bit careful there, but have tried to word it clearly. -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/7646/1/be/src/runtime/row-batch.cc File be/src/runtime/row-batch.cc: PS1, Line 343: int Would it make sense to change to int64_t to avoid potential overflows? Not sure if that causes problems at any callsites. http://gerrit.cloudera.org:8080/#/c/7646/1/be/src/runtime/row-batch.h File be/src/runtime/row-batch.h: Line 304: /// deserialized form. Maybe mention compression explicitly. E.g. serialized (i.e. compressed) or deserialized (i.e. uncompressed). -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue
Henry Robinson has uploaded a new change for review. http://gerrit.cloudera.org:8080/7646 Change subject: IMPALA-5773: Correctly account for memory used in data stream receiver queue .. IMPALA-5773: Correctly account for memory used in data stream receiver queue DataStreamRecvrs keep one or more queues of batches received to provide some buffering. Each queue has a fixed byte size capacity. The estimate of the contribution of a new RowBatch to that queue was using the compressed size of the TRowBatch it would be deserialized from, which is the wrong value (since the batch is uncompressed after deserialization). * Add RowBatch::Get[Des|S]erializedSize(const TRowBatch&) to RowBatch * Fix the estimate to use the uncompressed size. * Add a DataStreamReceiver child profile to the exchg node so that the peak memory used by the receiver can be monitored easily. Confirmed that the following query: select count(distinct concat(cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120)), cast(l_comment as char(120))) from lineitem; succeeds with a mem-limit of 800Mb. Before this patch it would fail in a one-node cluster as the datastream recvr would buffer more batches than the memory limit would allow. Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a --- M be/src/benchmarks/row-batch-serialize-benchmark.cc M be/src/runtime/data-stream-mgr.cc M be/src/runtime/data-stream-recvr.cc M be/src/runtime/data-stream-sender.cc M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h 6 files changed, 31 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/7646/1 -- To view, visit http://gerrit.cloudera.org:8080/7646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson