[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-11 Thread Henry Robinson (Code Review)
Henry Robinson has submitted this change and it was merged.

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..


IMPALA-5773: Correctly account for memory used in data stream receiver queue

DataStreamRecvrs keep one or more queues of batches received to provide
some buffering. Each queue has a fixed byte size capacity. The estimate
of the contribution of a new RowBatch to that queue was using the
compressed size of the TRowBatch it would be deserialized from, which is
the wrong value (since the batch is uncompressed after deserialization).

* Add RowBatch::Get[Des|S]erializedSize(const TRowBatch&) to RowBatch
* Fix the estimate to use the uncompressed size.
* Add a DataStreamReceiver child profile to the exchg node so that the
  peak memory used by the receiver can be monitored easily.

Confirmed that the following query:

select count(distinct concat(cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120))) from lineitem;

succeeds with a mem-limit of 800Mb. Before this patch it would fail in a
one-node cluster as the datastream recvr would buffer more batches than
the memory limit would allow.

Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Reviewed-on: http://gerrit.cloudera.org:8080/7646
Tested-by: Impala Public Jenkins
Reviewed-by: Henry Robinson 
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/exec/exchange-node.cc
M be/src/runtime/data-stream-mgr-base.h
M be/src/runtime/data-stream-mgr.cc
M be/src/runtime/data-stream-mgr.h
M be/src/runtime/data-stream-recvr.cc
M be/src/runtime/data-stream-recvr.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/krpc-data-stream-mgr.cc
M be/src/runtime/krpc-data-stream-mgr.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
12 files changed, 54 insertions(+), 48 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Henry Robinson: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-11 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..


Patch Set 5: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1036/

-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1037/

-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-11 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..


Patch Set 4:

> Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1036/

https://jenkins.impala.io/view/Utility/job/ubuntu-16.04-from-scratch/43/ 
Failed. I think this might need a rebase.

-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1036/

-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-11 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-11 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..


Patch Set 3:

(2 comments)

I didn't notice any significant performance regressions when I ran TPCDS on a 
16-node cluster. But this is an obvious correctness issue; we'll do more perf 
analysis shortly.

http://gerrit.cloudera.org:8080/#/c/7646/3/be/src/runtime/data-stream-mgr.cc
File be/src/runtime/data-stream-mgr.cc:

PS3, Line 80:  int buffer_size
> For consistency, should this be int64_t too ?
Done


http://gerrit.cloudera.org:8080/#/c/7646/3/be/src/runtime/data-stream-recvr.cc
File be/src/runtime/data-stream-recvr.cc:

PS3, Line 154: int64_t batch_size = RowBatch::GetDeserializedSize(thrift_batch);
> nit: move this closer to l165 where it's used
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-11 Thread Henry Robinson (Code Review)
Hello Matthew Jacobs, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7646

to look at the new patch set (#4).

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..

IMPALA-5773: Correctly account for memory used in data stream receiver queue

DataStreamRecvrs keep one or more queues of batches received to provide
some buffering. Each queue has a fixed byte size capacity. The estimate
of the contribution of a new RowBatch to that queue was using the
compressed size of the TRowBatch it would be deserialized from, which is
the wrong value (since the batch is uncompressed after deserialization).

* Add RowBatch::Get[Des|S]erializedSize(const TRowBatch&) to RowBatch
* Fix the estimate to use the uncompressed size.
* Add a DataStreamReceiver child profile to the exchg node so that the
  peak memory used by the receiver can be monitored easily.

Confirmed that the following query:

select count(distinct concat(cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120))) from lineitem;

succeeds with a mem-limit of 800Mb. Before this patch it would fail in a
one-node cluster as the datastream recvr would buffer more batches than
the memory limit would allow.

Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/exec/exchange-node.cc
M be/src/runtime/data-stream-mgr-base.h
M be/src/runtime/data-stream-mgr.cc
M be/src/runtime/data-stream-mgr.h
M be/src/runtime/data-stream-recvr.cc
M be/src/runtime/data-stream-recvr.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/krpc-data-stream-mgr.cc
M be/src/runtime/krpc-data-stream-mgr.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
12 files changed, 54 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/7646/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-11 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7646/3/be/src/runtime/data-stream-recvr.cc
File be/src/runtime/data-stream-recvr.cc:

PS3, Line 154: int64_t batch_size = RowBatch::GetDeserializedSize(thrift_batch);
nit: move this closer to l165 where it's used


-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-11 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..


Patch Set 3:

(1 comment)

Is there any regression in performance ?

http://gerrit.cloudera.org:8080/#/c/7646/3/be/src/runtime/data-stream-mgr.cc
File be/src/runtime/data-stream-mgr.cc:

PS3, Line 80:  int buffer_size
For consistency, should this be int64_t too ?


-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..


Patch Set 3: Code-Review+1

Looks good to me, I'll just give others a chance to take a look.

-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-11 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7646/2/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

PS2, Line 344: int
> Needs to be an int64_t now
Done


PS2, Line 351: int
> Needs to be an int64_t now
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-11 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#3).

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..

IMPALA-5773: Correctly account for memory used in data stream receiver queue

DataStreamRecvrs keep one or more queues of batches received to provide
some buffering. Each queue has a fixed byte size capacity. The estimate
of the contribution of a new RowBatch to that queue was using the
compressed size of the TRowBatch it would be deserialized from, which is
the wrong value (since the batch is uncompressed after deserialization).

* Add RowBatch::Get[Des|S]erializedSize(const TRowBatch&) to RowBatch
* Fix the estimate to use the uncompressed size.
* Add a DataStreamReceiver child profile to the exchg node so that the
  peak memory used by the receiver can be monitored easily.

Confirmed that the following query:

select count(distinct concat(cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120))) from lineitem;

succeeds with a mem-limit of 800Mb. Before this patch it would fail in a
one-node cluster as the datastream recvr would buffer more batches than
the memory limit would allow.

Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/data-stream-mgr.cc
M be/src/runtime/data-stream-recvr.cc
M be/src/runtime/data-stream-recvr.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
7 files changed, 39 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/7646/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7646/2/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

PS2, Line 344: int
Needs to be an int64_t now


PS2, Line 351: int
Needs to be an int64_t now


http://gerrit.cloudera.org:8080/#/c/7646/1/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

Line 304:   /// deserialized form. If a row batch is compressed, its serialized 
size can be much
> Done - not all serialized batches are compressed, so need to be a bit caref
Good point. I think the new comment is helpful.


-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-10 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#2).

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..

IMPALA-5773: Correctly account for memory used in data stream receiver queue

DataStreamRecvrs keep one or more queues of batches received to provide
some buffering. Each queue has a fixed byte size capacity. The estimate
of the contribution of a new RowBatch to that queue was using the
compressed size of the TRowBatch it would be deserialized from, which is
the wrong value (since the batch is uncompressed after deserialization).

* Add RowBatch::Get[Des|S]erializedSize(const TRowBatch&) to RowBatch
* Fix the estimate to use the uncompressed size.
* Add a DataStreamReceiver child profile to the exchg node so that the
  peak memory used by the receiver can be monitored easily.

Confirmed that the following query:

select count(distinct concat(cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120))) from lineitem;

succeeds with a mem-limit of 800Mb. Before this patch it would fail in a
one-node cluster as the datastream recvr would buffer more batches than
the memory limit would allow.

Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/data-stream-mgr.cc
M be/src/runtime/data-stream-recvr.cc
M be/src/runtime/data-stream-recvr.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
7 files changed, 38 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/7646/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-10 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7646/1/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

PS1, Line 343: int
> Would it make sense to change to int64_t to avoid potential overflows? Not 
Done


http://gerrit.cloudera.org:8080/#/c/7646/1/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

Line 304:   /// deserialized form.
> Maybe mention compression explicitly. E.g. serialized (i.e. compressed) or 
Done - not all serialized batches are compressed, so need to be a bit careful 
there, but have tried to word it clearly.


-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7646/1/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

PS1, Line 343: int
Would it make sense to change to int64_t to avoid potential overflows? Not sure 
if that causes problems at any callsites.


http://gerrit.cloudera.org:8080/#/c/7646/1/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

Line 304:   /// deserialized form.
Maybe mention compression explicitly. E.g. serialized (i.e. compressed) or 
deserialized (i.e. uncompressed).


-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5773: Correctly account for memory used in data stream receiver queue

2017-08-10 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7646

Change subject: IMPALA-5773: Correctly account for memory used in data stream 
receiver queue
..

IMPALA-5773: Correctly account for memory used in data stream receiver queue

DataStreamRecvrs keep one or more queues of batches received to provide
some buffering. Each queue has a fixed byte size capacity. The estimate
of the contribution of a new RowBatch to that queue was using the
compressed size of the TRowBatch it would be deserialized from, which is
the wrong value (since the batch is uncompressed after deserialization).

* Add RowBatch::Get[Des|S]erializedSize(const TRowBatch&) to RowBatch
* Fix the estimate to use the uncompressed size.
* Add a DataStreamReceiver child profile to the exchg node so that the
  peak memory used by the receiver can be monitored easily.

Confirmed that the following query:

select count(distinct concat(cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120)),
 cast(l_comment as char(120))) from lineitem;

succeeds with a mem-limit of 800Mb. Before this patch it would fail in a
one-node cluster as the datastream recvr would buffer more batches than
the memory limit would allow.

Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/data-stream-mgr.cc
M be/src/runtime/data-stream-recvr.cc
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
6 files changed, 31 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/7646/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I9e90f9596ee984438e3373af05e84d361702ca6a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson