Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9783 )
Change subject: KUDU-16 pt 1: add server-side limits on scanners ...................................................................... Patch Set 3: (10 comments) http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/common/scan_spec.h File src/kudu/common/scan_spec.h: http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/common/scan_spec.h@42 PS2, Line 42: lower_bound_partition_key_(), > warning: initializer for member 'lower_bound_partition_key_' is redundant [ Done http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/common/scan_spec.h@43 PS2, Line 43: exclusive_upper_bound_partition_key_(), > warning: initializer for member 'exclusive_upper_bound_partition_key_' is r Done http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/common/scan_spec.h@45 PS2, Line 45: has_limit_(false), > do you really need this. can't you get away with just having a has_limit() I think having limit = 0 meaning "no limit" is not intuitive, although yes some invalid limit (e.g. -1) would work. This is actually an artifact of the fact that I implemented this with uint64_t first and didn't want limit = 0 to mean "no limit". Even with int64_t, having this as a separate field also simplifies cases where a user might specify a negative limit. Sure, we could disallow that and return an error or something, but that should be done at the client level. On the server side, I think it makes sense to just short-circuit (which isn't how it is now, but let me change that), and that's a harder call to make if we base short circuiting off of the value alone e.g. if we have -1 be the default, should we scan everything, or scan nothing? I suppose if we make the default LLONG_MAX, we could have it scan nothing. WDYT? http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/common/scan_spec.cc File src/kudu/common/scan_spec.cc: http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/common/scan_spec.cc@72 PS2, Line 72: has_limit_ && limit_ <= 0 > right, same point. this seems redudant. could just call the has_limit() fun Addressed in the other comment http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/common/wire_protocol.h File src/kudu/common/wire_protocol.h: http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/common/wire_protocol.h@157 PS2, Line 157: const boost::optional<int64_t>& max_num_rows_to_return = boost::none > likely best move this before the pad_ argument. it's possible we remove it Done http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/tserver/scanners.h File src/kudu/tserver/scanners.h: http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/tserver/scanners.h@368 PS2, Line 368: : // The number of rows that > docs Done http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/tserver/tablet_server-test.cc File src/kudu/tserver/tablet_server-test.cc: http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/tserver/tablet_server-test.cc@2102 PS2, Line 2102: const int64_t kHighLimit = kNumRows * 2; > warning: either cast from 'int' to 'const int64_t' (aka 'const long') is in Done http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/tserver/tablet_server-test.cc@2107 PS2, Line 2107: int num_rows = AllowSlowTests() ? 10000 : 1000; > warning: missing username/bug in TODO [google-readability-todo] This comment is five years old and I think is out of datae. http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/tserver/tserver.proto File src/kudu/tserver/tserver.proto: http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/tserver/tserver.proto@229 PS2, Line 229: repeated ColumnRangePredicatePB DEPRECATED_range_predicates = 3; > seems like an unrelated change that could go in a separate change set? Done http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/tserver/tserver.proto@282 PS2, Line 282: // The maximum number of rows to scan with the new scanner. : // : // The scanner will automatically stop yielding results and close itself : // after reaching this num > how does this relate to 'batch_size_bytes' in the scan request below. we sh It doesn't really relate `batch_size_bytes`. It imposes a limit to the total limit returned by a scanner, whereas `batch_size_bytes` imposes how many can be returned at a time. I updated the comment to maybe make it more clear that this is a property of the scanner (whereas batch_size_bytes is a property of each scan). I'm not sure I follow the second point. AFAIK we don't, we'll scan for as long as there are more rowsets to scan. -- To view, visit http://gerrit.cloudera.org:8080/9783 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3ca5ddff09f4910d12f80259bd11e4027f99aa7c Gerrit-Change-Number: 9783 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Fri, 23 Mar 2018 21:39:56 +0000 Gerrit-HasComments: Yes
