[kudu-CR] WIP: Expose "raw" mode in KuduScanner and allow to pass flags

2017-05-01 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: WIP: Expose "raw" mode in KuduScanner and allow to pass flags
..


Patch Set 5:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/6624/5//COMMIT_MSG
Commit Message:

PS5, Line 10: should only
> Is this a stylistic choice? Or is it by necessity? Meaning, is it impossibl
I changed this according to feedback.


PS5, Line 10: retreived
> retrieved
Done


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/client.h
File src/kudu/client/client.h:

Line 2018:   enum { kDefaultRawModeFlags = -1 };
> This is confusing. If the idea is to provide a constant that logically mean
k, changed this to 0, changed EnableRawModeWithFlags to SetRawModeFlags


Line 2018:   enum { kDefaultRawModeFlags = -1 };
> yea, I'd expect the default to be 0
Done


Line 2028: PAD_UNIXTIME_MICROS_TO_16_BYTES = 1
> The idea is for these flags not to be mutually exclusive, right? If so, the
yeah, it's just that at this point we only have one. I'll use that notation to 
make it clear.


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/scan_configuration.cc
File src/kudu/client/scan_configuration.cc:

Line 31: using strings::Substitute;
> Not used?
Done


Line 182:   raw_mode_flags_ = flags;
> we should probably check that there are no unsupported flags set
Done


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/scan_configuration.h
File src/kudu/client/scan_configuration.h:

Line 139: return raw_mode_flags_ != KuduScanner::kDefaultRawModeFlags;
> strikes me as odd, since this is -1 (all flags set)
yeah, was kind of thinking of using the first (sign) bit to distinguish between 
raw mode and non-raw mode. changed this to 0 anyway.


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

Line 245:   (configuration().raw_mode_flags() & 
KuduScanner::PAD_UNIXTIME_MICROS_TO_16_BYTES) ==
> == shouldn't be necessary.
Done


PS5, Line 536: raw_mode_flags_ != KuduScanner::kDefaultRawModeFlags 
> yea, this defaultRawModeFlags thing is funny looking here - I'd expect the 
I've changed this to NO_FLAGS. I don't think we need to go into format 
versioning since it's not really required for this patch or presently. If we 
ever add it in the future we can add a flag for the old format (i.e. 
RAW_MODE_V1 as you suggest) and have NO_FLAGS mean the unaltered "client 
supported" format. Maybe it would make sense then to also have ROW_FORMAT_V2 so 
that you could padd other format modifiers with both versions, but again I 
don't think we need to handle that right now. and the current impl doesn't 
preclude adding that later.


Line 550: LOG(FATAL) << "Cannot extract rows, \"raw\" mode";
> would it be possible for a user in a release build to hit this case? Or wou
Users can hit this if they set a modifier flag but then use 
KuduScanner::NextBatch(vector* rows) to decode the rows.


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/scanner-internal.h
File src/kudu/client/scanner-internal.h:

Line 256:   Status Reset(rpc::RpcController* controller,
> warning: function 'kudu::client::KuduScanBatch::Data::Reset' has a definiti
Done


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/tserver/scanners.h
File src/kudu/tserver/scanners.h:

Line 100:   int64_t raw_mode_flags() const { return raw_mode_flags_; }
> yea
oops. Done


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

Line 342:   virtual void set_raw_mode_flags(int64_t raw_mode_flags) {}
> If ScanResultCollector is to be a generic "interface" as its class comment 
I was torn about this. I pondered the alternative you suggest, but I don't like 
that I'd have to have either checksum collector either implement this as a no 
op or have to make this return a status. Both options seem worse than having an 
empty impl here, IMO


Line 410:   void set_raw_mode_flags(int64_t raw_mode_flags) override {
> This is unintuitive; since collectors are constructed per RPC, I would have
Done


Line 410:   void set_raw_mode_flags(int64_t raw_mode_flags) override {
> would it be simpler if we pass the raw-mode flags on every ContinueScan RPC
In order to do that we'd have to move the field from NewScanRequestPB to 
ScanRequestPB which I don't think makes much sense. Particularly because, as we 
add flags, we'd have to perform all the necessary validation once per continue 
scan request versus once per new scan.


Line 411: if (raw_mode_flags == -1) return;
> This special case won't be necessary if the default value is 0 rather than 
Done


Line 413: == RawModeFlags::PAD_UNIX_TIME_MICROS_TO_16_BYTES) {
> Why is the == necessary? If you mask one bit flag from raw_mode_flags, the 
Done


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/tserver/tserver.proto
File src/kudu/tserver/tserver.proto:


[kudu-CR] WIP: Expose "raw" mode in KuduScanner and allow to pass flags

2017-04-20 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: WIP: Expose "raw" mode in KuduScanner and allow to pass flags
..


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/client.h
File src/kudu/client/client.h:

Line 2018:   enum { kDefaultRawModeFlags = -1 };
> This is confusing. If the idea is to provide a constant that logically mean
yea, I'd expect the default to be 0


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/scan_configuration.cc
File src/kudu/client/scan_configuration.cc:

Line 182:   raw_mode_flags_ = flags;
we should probably check that there are no unsupported flags set


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/scan_configuration.h
File src/kudu/client/scan_configuration.h:

Line 139: return raw_mode_flags_ != KuduScanner::kDefaultRawModeFlags;
strikes me as odd, since this is -1 (all flags set)


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

PS5, Line 536: raw_mode_flags_ != KuduScanner::kDefaultRawModeFlags 
yea, this defaultRawModeFlags thing is funny looking here - I'd expect the 
default "0" to mean not enabled, and maybe allocate a flag called 'RAW_MODE_V1" 
or something? That way if we ever changed our internal format, we could switch 
to setting RAW_MODE_V2 instead so the client can indicate its compatibility 
with the current version


Line 550: LOG(FATAL) << "Cannot extract rows, \"raw\" mode";
would it be possible for a user in a release build to hit this case? Or would 
this really be indicative of a bug on our side?


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/tserver/scanners.h
File src/kudu/tserver/scanners.h:

Line 100:   int64_t raw_mode_flags() const { return raw_mode_flags_; }
> Why is this a ScannerManager thing? I thought it'd only be per scanner?
yea


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

Line 410:   void set_raw_mode_flags(int64_t raw_mode_flags) override {
> This is unintuitive; since collectors are constructed per RPC, I would have
would it be simpler if we pass the raw-mode flags on every ContinueScan RPC 
rather than storing it as part of the scanner object? Then we would be able to 
avoid the changes to the Scanner interface, as well as solve this problem


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/tserver/tserver.proto
File src/kudu/tserver/tserver.proto:

Line 270:   // "Raw" mode flags.
> I understand the desire for a "raw" mode client-side, so that users who sig
+1, it shouldn't distinguish raw vs not-raw. At this level all scans are kind 
of "raw"

Perhaps it should be called "format_flags" or something? As for a flag bitset 
vs separate booleans I dont have a strong opinion. I suppose separate booleans 
would be less dense in memory than a bitset, but doubt it matters.


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: Expose "raw" mode in KuduScanner and allow to pass flags

2017-04-19 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: WIP: Expose "raw" mode in KuduScanner and allow to pass flags
..


Patch Set 5:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/6624/5//COMMIT_MSG
Commit Message:

PS5, Line 10: retreived
retrieved


PS5, Line 10: should only
Is this a stylistic choice? Or is it by necessity? Meaning, is it impossible to 
safely retrieve data from a "normal" scanner via direct_data()/indirect_data()?


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/client.h
File src/kudu/client/client.h:

Line 2018:   enum { kDefaultRawModeFlags = -1 };
This is confusing. If the idea is to provide a constant that logically means 
"raw mode on but no flags", it should have a value 0 and be part of the 
RawModeFlags enum. But if it means "no raw mode at all"...isn't that the state 
of the world if you don't call EnableRawModeWithFlags()?

Or is this some internal thing? In which case, why is it here at all?


Line 2028: PAD_UNIXTIME_MICROS_TO_16_BYTES = 1
The idea is for these flags not to be mutually exclusive, right? If so, the 
value of the first one should be "1 << 0" so it's clear that the next ones 
should be "1 << 1", "1 << 2", etc. That's how they can be bitflags and OR'ed 
together.


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/scan_configuration.cc
File src/kudu/client/scan_configuration.cc:

Line 31: using strings::Substitute;
Not used?


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

Line 245:   (configuration().raw_mode_flags() & 
KuduScanner::PAD_UNIXTIME_MICROS_TO_16_BYTES) ==
== shouldn't be necessary.


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/tserver/scanners.h
File src/kudu/tserver/scanners.h:

Line 100:   int64_t raw_mode_flags() const { return raw_mode_flags_; }
Why is this a ScannerManager thing? I thought it'd only be per scanner?


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

Line 342:   virtual void set_raw_mode_flags(int64_t raw_mode_flags) {}
> warning: parameter 'raw_mode_flags' is unused [misc-unused-parameters]
If ScanResultCollector is to be a generic "interface" as its class comment 
suggests, it shouldn't provide this default no-op implementation.


Line 410:   void set_raw_mode_flags(int64_t raw_mode_flags) override {
This is unintuitive; since collectors are constructed per RPC, I would have 
expected the right value for pad_unixtime_micros_to_16_bytes_ to be provided 
directly to the constructor. But I think I understand why: the collector is 
stack-allocated before fetching the Scanner, and reversing the order would 
require heap-allocating the collector.

Maybe you can try to explain this chicken-and-egg problem somewhere in the 
collector comments so it's clear for others?


Line 411: if (raw_mode_flags == -1) return;
This special case won't be necessary if the default value is 0 rather than -1.


Line 413: == RawModeFlags::PAD_UNIX_TIME_MICROS_TO_16_BYTES) {
Why is the == necessary? If you mask one bit flag from raw_mode_flags, the 
result will be either non-zero (requested) or zero (not requested).


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/tserver/tserver.proto
File src/kudu/tserver/tserver.proto:

Line 270:   // "Raw" mode flags.
I understand the desire for a "raw" mode client-side, so that users who sign up 
for timestamp padding are forced to also call  direct_data()/indirect_data() 
instead of row-by-row access.

But why is it exposed to the server? AFAICT, there's no server-side effect. The 
only thing that matters is the timestamp padding, and that can be exposed via 
something like TimestampPaddingMode (similar to ReadMode or OrderMode).


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] WIP: Expose "raw" mode in KuduScanner and allow to pass flags

2017-04-19 Thread David Ribeiro Alves (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6624

to look at the new patch set (#5).

Change subject: WIP: Expose "raw" mode in KuduScanner and allow to pass flags
..

WIP: Expose "raw" mode in KuduScanner and allow to pass flags

This adds a way to set a KuduScanner to "raw" mode. In this mode
the data should only be retreived through the direct_data() and
indirect_data() in KuduScanBatch.

It also adds a way to pass "flags" encoded as bits in an int64_t
'raw_mode_flags_' var.

The only use for these flags, presently, is to set
PAD_UNIXTIME_MICROS_TO_16_BYTES, making sure that the server
pads slots for UNIXTIME_MICROS with an additional 8 bytes to
the left.

WIP: Working on a directed test, just making sure I didn't
break anything with the piping.

Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/scanners-test.cc
M src/kudu/tserver/scanners.cc
M src/kudu/tserver/scanners.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tserver.proto
13 files changed, 160 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6624/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon