Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9783 )

Change subject: KUDU-16 pt 1: add server-side limits on scanners
......................................................................


Patch Set 3:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/common/scan_spec.h
File src/kudu/common/scan_spec.h:

http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/common/scan_spec.h@42
PS2, Line 42:       lower_bound_partition_key_(),
> warning: initializer for member 'lower_bound_partition_key_' is redundant [
Done


http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/common/scan_spec.h@43
PS2, Line 43:       exclusive_upper_bound_partition_key_(),
> warning: initializer for member 'exclusive_upper_bound_partition_key_' is r
Done


http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/common/scan_spec.h@45
PS2, Line 45:       has_limit_(false),
> do you really need this. can't you get away with just having a has_limit()
I think having limit = 0 meaning "no limit" is not intuitive, although yes some 
invalid limit (e.g. -1) would work. This is actually an artifact of the fact 
that I implemented this with uint64_t first and didn't want limit = 0 to mean 
"no limit".

Even with int64_t, having this as a separate field also simplifies cases where 
a user might specify a negative limit. Sure, we could disallow that and return 
an error or something, but that should be done at the client level. On the 
server side, I think it makes sense to just short-circuit (which isn't how it 
is now, but let me change that), and that's a harder call to make if we base 
short circuiting off of the value alone e.g. if we have -1 be the default, 
should we scan everything, or scan nothing? I suppose if we make the default 
LLONG_MAX, we could have it scan nothing. WDYT?


http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/common/scan_spec.cc
File src/kudu/common/scan_spec.cc:

http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/common/scan_spec.cc@72
PS2, Line 72: has_limit_ && limit_ <= 0
> right, same point. this seems redudant. could just call the has_limit() fun
Addressed in the other comment


http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/common/wire_protocol.h
File src/kudu/common/wire_protocol.h:

http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/common/wire_protocol.h@157
PS2, Line 157: const boost::optional<int64_t>& max_num_rows_to_return = 
boost::none
> likely best move this before the pad_ argument. it's possible we remove it
Done


http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/tserver/scanners.h
File src/kudu/tserver/scanners.h:

http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/tserver/scanners.h@368
PS2, Line 368:
             :   // The number of rows that
> docs
Done


http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/tserver/tablet_server-test.cc@2102
PS2, Line 2102:   const int64_t kHighLimit = kNumRows * 2;
> warning: either cast from 'int' to 'const int64_t' (aka 'const long') is in
Done


http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/tserver/tablet_server-test.cc@2107
PS2, Line 2107:   int num_rows = AllowSlowTests() ? 10000 : 1000;
> warning: missing username/bug in TODO [google-readability-todo]
This comment is five years old and I think is out of datae.


http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/tserver/tserver.proto
File src/kudu/tserver/tserver.proto:

http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/tserver/tserver.proto@229
PS2, Line 229:   repeated ColumnRangePredicatePB DEPRECATED_range_predicates = 
3;
> seems like an unrelated change that could go in a separate change set?
Done


http://gerrit.cloudera.org:8080/#/c/9783/2/src/kudu/tserver/tserver.proto@282
PS2, Line 282:   // The maximum number of rows to scan with the new scanner.
             :   //
             :   // The scanner will automatically stop yielding results and 
close itself
             :   // after reaching this num
> how does this relate to 'batch_size_bytes' in the scan request below. we sh
It doesn't really relate `batch_size_bytes`. It imposes a limit to the total 
limit returned by a scanner, whereas `batch_size_bytes` imposes how many can be 
returned at a time. I updated the comment to maybe make it more clear that this 
is a property of the scanner (whereas batch_size_bytes is a property of each 
scan).

I'm not sure I follow the second point. AFAIK we don't, we'll scan for as long 
as there are more rowsets to scan.



--
To view, visit http://gerrit.cloudera.org:8080/9783
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ca5ddff09f4910d12f80259bd11e4027f99aa7c
Gerrit-Change-Number: 9783
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Fri, 23 Mar 2018 21:39:56 +0000
Gerrit-HasComments: Yes

Reply via email to