[kudu-CR] KUDU-2303: Add ignoreNull option to upsertRows

2018-03-29 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9834 )

Change subject: KUDU-2303: Add ignoreNull option to upsertRows
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@7
PS2, Line 7: ignoreNull
Curious to hear others' opinions, but to me a name like 'treatNullAsUnset' or 
'nullIsUnset' is more clear than 'ignoreNull'. Thoughts?



--
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 2
Gerrit-Owner: Fengling Wang 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Thu, 29 Mar 2018 16:56:28 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2303: Add ignoreNull option to upsertRows

2018-03-29 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9834 )

Change subject: KUDU-2303: Add ignoreNull option to upsertRows
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@7
PS2, Line 7: KUDU-2303
I think you mean 2250?



--
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 2
Gerrit-Owner: Fengling Wang 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Thu, 29 Mar 2018 16:48:19 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2303: Add ignoreNull option to upsertRows

2018-03-29 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9834 )

Change subject: KUDU-2303: Add ignoreNull option to upsertRows
..


Patch Set 2:

(1 comment)

> Patch Set 2:
>
> (5 comments)

http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala:

http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala@43
PS2, Line 43: private[kudu] case object UpsertIgnoreNull extends OperationType {
> If we keep adding options to modify the behavior of these operations we're
On second thought it may be cleaner to just introduce a WriteOptions class and 
allow extensibility via adding fields.  This could be passed to the 
insert/upsert/etc calls in KuduContext.



--
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 2
Gerrit-Owner: Fengling Wang 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Thu, 29 Mar 2018 14:45:11 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2303: Add ignoreNull option to upsertRows

2018-03-29 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9834 )

Change subject: KUDU-2303: Add ignoreNull option to upsertRows
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@9
PS2, Line 9: upsert only non-Null
nit: Try to wrap commit message lines at about 80 chars.


http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@10
PS2, Line 10:
nit:  Add "the".


http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@12
PS2, Line 12: This
nit: "For example, this feature is useful..."


http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@13
PS2, Line 13: where
nit: s/where/because



--
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 2
Gerrit-Owner: Fengling Wang 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Thu, 29 Mar 2018 08:14:54 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2303: Add ignoreNull option to upsertRows

2018-03-28 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9834 )

Change subject: KUDU-2303: Add ignoreNull option to upsertRows
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@9
PS2, Line 9: upsertRows
Ignoring nulls can apply equally well to any write type, not just upserts.  How 
do you feel about making this a general option for all writes?


http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@11
PS2, Line 11:
This patch doesn't expose this functionality through 
DefaultSource.createRelation, which is another important way that Spark can do 
writes to Kudu.


http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@259
PS2, Line 259: value
s/value/values


http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@326
PS2, Line 326: if (!operationType.ignoreNull) 
operation.getRow.setNull(kuduIdx)
It's probably worth adding a check that the column isn't in the primary key.  
Even though it's an error to set a primary key to null (since PK columns are 
always non-nullable), I think the error message will be better as 'can't set 
non-nullable column to null' than 'primary key column not set'.


http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala:

http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala@43
PS2, Line 43: private[kudu] case object UpsertIgnoreNull extends OperationType {
If we keep adding options to modify the behavior of these operations we're 
quickly going to have an explosion of operation types.  Might be time to 
refactor along these lines:

private[kudu] case class Insert(val ignoreDuplicateRowErrors: Boolean) extends 
OperationType {
  override def operation(table: KuduTable): Operation = table.newInsert()
}
private[kudu] case object Update extends OperationType {
  override def operation(table: KuduTable): Operation = table.newUpdate()
}
private[kudu] case class Upsert(val ignoreNull: Boolean) extends OperationType {
  override def operation(table: KuduTable): Operation = table.newUpsert()
}
private[kudu] case object Delete extends OperationType {
  override def operation(table: KuduTable): Operation = table.newDelete()
}



--
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 2
Gerrit-Owner: Fengling Wang 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Thu, 29 Mar 2018 03:12:42 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2303: Add ignoreNull option to upsertRows

2018-03-27 Thread Fengling Wang (Code Review)
Fengling Wang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9834


Change subject: KUDU-2303: Add ignoreNull option to upsertRows
..

KUDU-2303: Add ignoreNull option to upsertRows

This patch adds the ignoreNull option to upsertRows so that users can upsert 
only non-Null
columns and leave the rest of columns unchanged.

This feature is useful when users use Spark streaming to process JSON and 
upsert to Kudu,
where missing column values from JSON are set to NULL, resulting in some 
existing row
values being upserted to Null, which is not desired.

Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
M 
java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduContextTest.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
4 files changed, 44 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/9834/1
--
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 1
Gerrit-Owner: Fengling Wang