[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage

2017-11-08 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8464 )

Change subject: IMPALA-4591: Bound Kudu client error mem usage
..


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h
File be/src/exec/kudu-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h@105
PS1, Line 105: consumed
> nit: "consumed" to be consistent with the memtracker terminology.
Done


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h@106
PS1, Line 106: client_tracked
> Maybe client_tracked_bytes_ to make it clearer that the unit is bytes and i
Done


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@a52
PS1, Line 52:
> Was this flag documented? Just wondering if we should consider what happens
No, its not documented, and of course it specifically says that it may be 
changed/removed.

Certainly happy to consider other options if you think removing the flag is too 
disruptive, eg. the error buffer size could be calculated as the difference 
between "kudu_sink_mem_required" and "kudu_mutation_buffer_size", that just 
seemed a little complicated.


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@39
PS1, Line 39: DEFINE_int32(kudu_mutation_buffer_size, 
DEFAULT_KUDU_MUTATION_BUFFER_SIZE,
> Just throwing out ideas here, but did we think about pros/cons of making th
I think that's a reasonable idea, though like you say these probably don't need 
to be modified very often.

At the least, I think it makes sense to get this in and file a JIRA for 
followup.


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@124
PS1, Line 124:   int64_t required_mem = FLAGS_kudu_mutation_buffer_size + 
error_buffer_size;
> Is this equivalent to the following?
Done


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@132
PS1, Line 132:   
state->exec_env()->GetKuduClient(table_desc_->kudu_master_addresses(), 
_));
> nit: long line.
Done


http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py@66
PS1, Line 66:   
@CustomClusterTestSuite.with_args(impalad_args="-kudu_error_buffer_size=1024")
> It might be faster to make this a regular query test but insert more data s
It takes a very large number of errors to hit the default limit (>10m "Key 
already present" errors), so I don't think it ends up being any faster.


http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py@74
PS1, Line 74: presen
> present
Done



--
To view, visit http://gerrit.cloudera.org:8080/8464
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14
Gerrit-Change-Number: 8464
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 08 Nov 2017 19:40:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage

2017-11-08 Thread Thomas Tauber-Marshall (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8464

to look at the new patch set (#2).

Change subject: IMPALA-4591: Bound Kudu client error mem usage
..

IMPALA-4591: Bound Kudu client error mem usage

Previously, Kudu client errors could grow in size unbounded,
potentially causing the process to be killed. This patch sets a
bound on the mem that can be used for these error messages, with
the size determined by the flag 'kudu_error_buffer_size'.

If the errors for a Kudu client exceed this size, the query will fail,
as some errors will be dropped and we won't be able to tell if all of
the errors can be safely ignored.

Testing:
- Added a custom cluster test that verifies that a query that exceeds
  the limit fails.

Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M tests/custom_cluster/test_kudu.py
3 files changed, 41 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/8464/2
--
To view, visit http://gerrit.cloudera.org:8080/8464
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14
Gerrit-Change-Number: 8464
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage

2017-11-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8464 )

Change subject: IMPALA-4591: Bound Kudu client error mem usage
..


Patch Set 1:

(8 comments)

This is a nice improvement. Had some questions/ideas but as-is this seems 
strictly better than before.

http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h
File be/src/exec/kudu-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h@105
PS1, Line 105: allocated
nit: "consumed" to be consistent with the memtracker terminology.


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h@106
PS1, Line 106: allocated_mem_
Maybe client_tracked_bytes_ to make it clearer that the unit is bytes and it's 
tracked for accounting purposes.


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@a52
PS1, Line 52:
Was this flag documented? Just wondering if we should consider what happens if 
someone set this manually.


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@39
PS1, Line 39: DEFINE_int32(kudu_mutation_buffer_size, 
DEFAULT_KUDU_MUTATION_BUFFER_SIZE,
Just throwing out ideas here, but did we think about pros/cons of making these 
query options? I guess mostly these defaults should be fine but sometimes it 
turns out to be inconvenient to only have a global setting (and to have to 
restart Impala for it to take effect.)


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@124
PS1, Line 124:   FLAGS_kudu_error_buffer_size > 1024 ? 
FLAGS_kudu_error_buffer_size : 1024;
Is this equivalent to the following?

  max(1024, FLAGS_kudu_error_buffer_size)


http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@132
PS1, Line 132:   
RETURN_IF_ERROR(state->exec_env()->GetKuduClient(table_desc_->kudu_master_addresses(),
 _));
nit: long line.


http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py@66
PS1, Line 66:   
@CustomClusterTestSuite.with_args(impalad_args="-kudu_error_buffer_size=1024")
It might be faster to make this a regular query test but insert more data so 
that it exceeds the default error buffer limit. Starting up a new cluster takes 
a long time (to be honest, this is fine though).


http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py@74
PS1, Line 74: prsent
present



--
To view, visit http://gerrit.cloudera.org:8080/8464
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14
Gerrit-Change-Number: 8464
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 03 Nov 2017 23:41:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage

2017-11-03 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8464


Change subject: IMPALA-4591: Bound Kudu client error mem usage
..

IMPALA-4591: Bound Kudu client error mem usage

Previously, Kudu client errors could grow in size unbounded,
potentially causing the process to be killed. This patch sets a
bound on the mem that can be used for these error messages, with
the size determined by the flag 'kudu_error_buffer_size'.

If the errors for a Kudu client exceed this size, the query will fail,
as some errors will be dropped and we won't be able to tell if all of
the errors can be safely ignored.

Testing:
- Added a custom cluster test that verifies that a query that exceeds
  the limit fails.

Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M tests/custom_cluster/test_kudu.py
3 files changed, 39 insertions(+), 26 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/8464/1
--
To view, visit http://gerrit.cloudera.org:8080/8464
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14
Gerrit-Change-Number: 8464
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall