[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8464 ) Change subject: IMPALA-4591: Bound Kudu client error mem usage .. Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h File be/src/exec/kudu-table-sink.h: http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h@105 PS1, Line 105: consumed > nit: "consumed" to be consistent with the memtracker terminology. Done http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h@106 PS1, Line 106: client_tracked > Maybe client_tracked_bytes_ to make it clearer that the unit is bytes and i Done http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc File be/src/exec/kudu-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@a52 PS1, Line 52: > Was this flag documented? Just wondering if we should consider what happens No, its not documented, and of course it specifically says that it may be changed/removed. Certainly happy to consider other options if you think removing the flag is too disruptive, eg. the error buffer size could be calculated as the difference between "kudu_sink_mem_required" and "kudu_mutation_buffer_size", that just seemed a little complicated. http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@39 PS1, Line 39: DEFINE_int32(kudu_mutation_buffer_size, DEFAULT_KUDU_MUTATION_BUFFER_SIZE, > Just throwing out ideas here, but did we think about pros/cons of making th I think that's a reasonable idea, though like you say these probably don't need to be modified very often. At the least, I think it makes sense to get this in and file a JIRA for followup. http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@124 PS1, Line 124: int64_t required_mem = FLAGS_kudu_mutation_buffer_size + error_buffer_size; > Is this equivalent to the following? Done http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@132 PS1, Line 132: state->exec_env()->GetKuduClient(table_desc_->kudu_master_addresses(), _)); > nit: long line. Done http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py File tests/custom_cluster/test_kudu.py: http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py@66 PS1, Line 66: @CustomClusterTestSuite.with_args(impalad_args="-kudu_error_buffer_size=1024") > It might be faster to make this a regular query test but insert more data s It takes a very large number of errors to hit the default limit (>10m "Key already present" errors), so I don't think it ends up being any faster. http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py@74 PS1, Line 74: presen > present Done -- To view, visit http://gerrit.cloudera.org:8080/8464 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 Gerrit-Change-Number: 8464 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 08 Nov 2017 19:40:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8464 to look at the new patch set (#2). Change subject: IMPALA-4591: Bound Kudu client error mem usage .. IMPALA-4591: Bound Kudu client error mem usage Previously, Kudu client errors could grow in size unbounded, potentially causing the process to be killed. This patch sets a bound on the mem that can be used for these error messages, with the size determined by the flag 'kudu_error_buffer_size'. If the errors for a Kudu client exceed this size, the query will fail, as some errors will be dropped and we won't be able to tell if all of the errors can be safely ignored. Testing: - Added a custom cluster test that verifies that a query that exceeds the limit fails. Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M tests/custom_cluster/test_kudu.py 3 files changed, 41 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/8464/2 -- To view, visit http://gerrit.cloudera.org:8080/8464 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 Gerrit-Change-Number: 8464 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8464 ) Change subject: IMPALA-4591: Bound Kudu client error mem usage .. Patch Set 1: (8 comments) This is a nice improvement. Had some questions/ideas but as-is this seems strictly better than before. http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h File be/src/exec/kudu-table-sink.h: http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h@105 PS1, Line 105: allocated nit: "consumed" to be consistent with the memtracker terminology. http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.h@106 PS1, Line 106: allocated_mem_ Maybe client_tracked_bytes_ to make it clearer that the unit is bytes and it's tracked for accounting purposes. http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc File be/src/exec/kudu-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@a52 PS1, Line 52: Was this flag documented? Just wondering if we should consider what happens if someone set this manually. http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@39 PS1, Line 39: DEFINE_int32(kudu_mutation_buffer_size, DEFAULT_KUDU_MUTATION_BUFFER_SIZE, Just throwing out ideas here, but did we think about pros/cons of making these query options? I guess mostly these defaults should be fine but sometimes it turns out to be inconvenient to only have a global setting (and to have to restart Impala for it to take effect.) http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@124 PS1, Line 124: FLAGS_kudu_error_buffer_size > 1024 ? FLAGS_kudu_error_buffer_size : 1024; Is this equivalent to the following? max(1024, FLAGS_kudu_error_buffer_size) http://gerrit.cloudera.org:8080/#/c/8464/1/be/src/exec/kudu-table-sink.cc@132 PS1, Line 132: RETURN_IF_ERROR(state->exec_env()->GetKuduClient(table_desc_->kudu_master_addresses(), _)); nit: long line. http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py File tests/custom_cluster/test_kudu.py: http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py@66 PS1, Line 66: @CustomClusterTestSuite.with_args(impalad_args="-kudu_error_buffer_size=1024") It might be faster to make this a regular query test but insert more data so that it exceeds the default error buffer limit. Starting up a new cluster takes a long time (to be honest, this is fine though). http://gerrit.cloudera.org:8080/#/c/8464/1/tests/custom_cluster/test_kudu.py@74 PS1, Line 74: prsent present -- To view, visit http://gerrit.cloudera.org:8080/8464 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 Gerrit-Change-Number: 8464 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 23:41:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4591: Bound Kudu client error mem usage
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8464 Change subject: IMPALA-4591: Bound Kudu client error mem usage .. IMPALA-4591: Bound Kudu client error mem usage Previously, Kudu client errors could grow in size unbounded, potentially causing the process to be killed. This patch sets a bound on the mem that can be used for these error messages, with the size determined by the flag 'kudu_error_buffer_size'. If the errors for a Kudu client exceed this size, the query will fail, as some errors will be dropped and we won't be able to tell if all of the errors can be safely ignored. Testing: - Added a custom cluster test that verifies that a query that exceeds the limit fails. Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M tests/custom_cluster/test_kudu.py 3 files changed, 39 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/8464/1 -- To view, visit http://gerrit.cloudera.org:8080/8464 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I186ddb3f3b5865e08f17dba57cf6640591d06b14 Gerrit-Change-Number: 8464 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall