[kudu-CR] KUDU-981 (part 1): validate identifiers as UTF8 with no null bytes

2016-12-01 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: KUDU-981 (part 1): validate identifiers as UTF8 with no null 
bytes
..


Patch Set 2: Code-Review+2

Carrying +2s

-- 
To view, visit http://gerrit.cloudera.org:8080/5296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3b05e2882c9c2ce9b47c16450d9d54a04d3e38b
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: No


[kudu-CR] KUDU-981 (part 1): validate identifiers as UTF8 with no null bytes

2016-12-01 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged.

Change subject: KUDU-981 (part 1): validate identifiers as UTF8 with no null 
bytes
..


KUDU-981 (part 1): validate identifiers as UTF8 with no null bytes

Ensures that table names and identifiers are valid UTF8 with no embedded
null bytes which could cause problems.

KUDU-981 proposes more stringent restrictions such as limiting to only
ASCII. We should collect more info from users before making such a
limitation since it may affect Chinese users, etc. This more lenient
validation is meant more at catching bugs and identifiers that may cause
downstream software to crash.

Change-Id: Ic3b05e2882c9c2ce9b47c16450d9d54a04d3e38b
Reviewed-on: http://gerrit.cloudera.org:8080/5296
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon 
---
M src/kudu/client/client-test.cc
M src/kudu/master/catalog_manager.cc
2 files changed, 43 insertions(+), 15 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ic3b05e2882c9c2ce9b47c16450d9d54a04d3e38b
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-981 (part 1): validate identifiers as UTF8 with no null bytes

2016-12-01 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: KUDU-981 (part 1): validate identifiers as UTF8 with no null 
bytes
..


Patch Set 1:

> Forcing everyone to use UTF-8 is pragmatic, but it's also restrictive for 
> users who want to use their own random encoding (granted, the Linux kernel 
> policy is incompatible with many encodings). What do you think of adopting a 
> similar policy for Kudu?

I think the days of non-UTF8 international encodings are starting to wane, no?

More importantly, we currently use the 'string' type in our .protos for such 
identifiers, and protobuf prescribes that strings should be utf8. That has 
specific impact on the Java APIs which expose these things as 'String' and 
hard-code UTF8. So, I think in practice we are already enforcing UTF8 for any 
apps that use the Java client, and this is just adding server-side validation 
that prevents C++ users from creating tables that can't be read from Java.

-- 
To view, visit http://gerrit.cloudera.org:8080/5296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3b05e2882c9c2ce9b47c16450d9d54a04d3e38b
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: No


[kudu-CR] KUDU-981 (part 1): validate identifiers as UTF8 with no null bytes

2016-12-01 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change.

Change subject: KUDU-981 (part 1): validate identifiers as UTF8 with no null 
bytes
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5296/1//COMMIT_MSG
Commit Message:

PS1, Line 12: limiting to only
: ASCII
> I'm pretty strongly opposed to this.
Me too. Though Impala does restrict identifiers to ASCII-only. See 
http://www.cloudera.com/documentation/enterprise/5-7-x/topics/impala_identifiers.html


-- 
To view, visit http://gerrit.cloudera.org:8080/5296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3b05e2882c9c2ce9b47c16450d9d54a04d3e38b
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-981 (part 1): validate identifiers as UTF8 with no null bytes

2016-11-30 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: KUDU-981 (part 1): validate identifiers as UTF8 with no null 
bytes
..


Patch Set 1: Code-Review+2

(1 comment)

I'm fond of the policy followed by the Linux kernel with respect to directory 
entries:
1. The kernel uses '\0' as string termination markers, so userspace must do the 
same (and can't use them mid-string).
2. The kernel will look for '/' (encoded as per US-ASCII) to split paths, so 
they take on special meaning in userspace.
3. The rest is fair game.

Forcing everyone to use UTF-8 is pragmatic, but it's also restrictive for users 
who want to use their own random encoding (granted, the Linux kernel policy is 
incompatible with many encodings). What do you think of adopting a similar 
policy for Kudu?

Separately, this change is backwards incompatible and should be documented, and 
perhaps even gated with a flag (not sure if opt-in or opt-out though).

http://gerrit.cloudera.org:8080/#/c/5296/1//COMMIT_MSG
Commit Message:

PS1, Line 12: limiting to only
: ASCII
I'm pretty strongly opposed to this.


-- 
To view, visit http://gerrit.cloudera.org:8080/5296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3b05e2882c9c2ce9b47c16450d9d54a04d3e38b
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-981 (part 1): validate identifiers as UTF8 with no null bytes

2016-11-30 Thread Todd Lipcon (Code Review)
Todd Lipcon has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5296

Change subject: KUDU-981 (part 1): validate identifiers as UTF8 with no null 
bytes
..

KUDU-981 (part 1): validate identifiers as UTF8 with no null bytes

Ensures that table names and identifiers are valid UTF8 with no embedded
null bytes which could cause problems.

KUDU-981 proposes more stringent restrictions such as limiting to only
ASCII. We should collect more info from users before making such a
limitation since it may affect Chinese users, etc. This more lenient
validation is meant more at catching bugs and identifiers that may cause
downstream software to crash.

Change-Id: Ic3b05e2882c9c2ce9b47c16450d9d54a04d3e38b
---
M src/kudu/client/client-test.cc
M src/kudu/master/catalog_manager.cc
2 files changed, 43 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/96/5296/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic3b05e2882c9c2ce9b47c16450d9d54a04d3e38b
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon