[kudu-CR] Add RegexpKuduOperationsProducer class
Mike Percy has posted comments on this change. Change subject: Add RegexpKuduOperationsProducer class .. Patch Set 5: > Late thought: do we want this in 1.0? If so will make note to Todd > to add to his release note rework patch once this one merged (or > will add myself if Todd's gets merged first) I'm OK with it going into 1.0. What you can do is check out his doc patch using the Gerrit "download" link, add a patch on top of it, then submit it to Gerrit. Gerrit will keep track of the dependency. -- To view, visit http://gerrit.cloudera.org:8080/3883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Ara Ebrahimi Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] Add RegexpKuduOperationsProducer class
Mike Percy has submitted this change and it was merged. Change subject: Add RegexpKuduOperationsProducer class .. Add RegexpKuduOperationsProducer class This patch adds the RegexpKuduOperationsProducer class. This class serializes Event objects to Kudu inserts or upserts by decoding the body into a string, parsing the string using a regular expression, and finally mapping match groups to columns by matching the name of the match group to the name of the column. Parsed values are naively coerced to the proper type. This provides an easy-to-use but flexible way to ingest data with varying schemas into Kudu from Flume. Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea Reviewed-on: http://gerrit.cloudera.org:8080/3883 Tested-by: Kudu Jenkins Reviewed-by: Mike Percy --- A java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducer.java A java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducerTest.java 2 files changed, 498 insertions(+), 0 deletions(-) Approvals: Mike Percy: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/3883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Ara Ebrahimi Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley
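The pipeline described in the commit message (decode the body to a string, match a regular expression, map named groups to columns, coerce values) can be illustrated with a minimal sketch using only the JDK. This is not the code from the patch: the class `RegexpRowSketch`, the `ColType` enum, and the `Map`-based row are hypothetical stand-ins for the Kudu schema and row types, and the column names `key` and `payload` are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexpRowSketch {
  // Hypothetical stand-in for a column type; not the Kudu Type enum.
  enum ColType { INT32, STRING }

  // Naive coercion of a raw string to the column's type.
  static Object coerce(String rawVal, ColType type) {
    switch (type) {
      case INT32:
        return Integer.parseInt(rawVal);
      default:
        return rawVal;
    }
  }

  // Decodes the body, applies the pattern, and fills one row (a plain map here)
  // by looking up a named group for each column name in the schema.
  static Map<String, Object> parse(byte[] body, Pattern pattern, Map<String, ColType> schema) {
    String text = new String(body, StandardCharsets.UTF_8);
    Map<String, Object> row = new LinkedHashMap<>();
    Matcher m = pattern.matcher(text);
    if (!m.find()) {
      return row; // no match: empty row
    }
    for (Map.Entry<String, ColType> col : schema.entrySet()) {
      String raw;
      try {
        raw = m.group(col.getKey());
      } catch (IllegalArgumentException e) {
        continue; // the pattern has no group named after this column
      }
      if (raw != null) {
        row.put(col.getKey(), coerce(raw, col.getValue()));
      }
    }
    return row;
  }

  public static void main(String[] args) {
    Map<String, ColType> schema = new LinkedHashMap<>();
    schema.put("key", ColType.INT32);
    schema.put("payload", ColType.STRING);
    schema.put("unmatched", ColType.STRING); // no group with this name
    Pattern p = Pattern.compile("(?<key>\\d+),(?<payload>\\w+)");
    byte[] body = "7,taco".getBytes(StandardCharsets.UTF_8);
    System.out.println(parse(body, p, schema));
  }
}
```

Iterating over the schema and asking the matcher for each column name is one way to do the mapping on older JDKs, where `Pattern` offers no method to list its group names; whether the patch does it this way is not stated in this thread.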
[kudu-CR] Add RegexpKuduOperationsProducer class
Mike Percy has posted comments on this change. Change subject: Add RegexpKuduOperationsProducer class .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/3883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Ara Ebrahimi Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] Add RegexpKuduOperationsProducer class
Will Berkeley has posted comments on this change. Change subject: Add RegexpKuduOperationsProducer class .. Patch Set 5: Late thought: do we want this in 1.0? If so will make note to Todd to add to his release note rework patch once this one merged (or will add myself if Todd's gets merged first) -- To view, visit http://gerrit.cloudera.org:8080/3883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Ara Ebrahimi Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] Add RegexpKuduOperationsProducer class
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3883 to look at the new patch set (#5). Change subject: Add RegexpKuduOperationsProducer class .. Add RegexpKuduOperationsProducer class This patch adds the RegexpKuduOperationsProducer class. This class serializes Event objects to Kudu inserts or upserts by decoding the body into a string, parsing the string using a regular expression, and finally mapping match groups to columns by matching the name of the match group to the name of the column. Parsed values are naively coerced to the proper type. This provides an easy-to-use but flexible way to ingest data with varying schemas into Kudu from Flume. Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea --- A java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducer.java A java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducerTest.java 2 files changed, 498 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/3883/5 -- To view, visit http://gerrit.cloudera.org:8080/3883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Ara Ebrahimi Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley
[kudu-CR] Add RegexpKuduOperationsProducer class
Kudu Jenkins has posted comments on this change. Change subject: Add RegexpKuduOperationsProducer class .. Patch Set 5: Build Started http://104.196.14.100/job/kudu-gerrit/3346/ -- To view, visit http://gerrit.cloudera.org:8080/3883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Ara Ebrahimi Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] Add RegexpKuduOperationsProducer class
Will Berkeley has posted comments on this change.

Change subject: Add RegexpKuduOperationsProducer class
..

Patch Set 4:

(7 comments)

> Cool idea!

Actually, I was talking to Jeremy Beard and mentioned that you had suggested an OperationsProducer that parsed Avro. He said he'd rather have one that just used regexp so it'd be a little quicker to get moving!

http://gerrit.cloudera.org:8080/#/c/3883/4/java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducer.java
File java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducer.java:

Line 93: */
> Needs a very simple example here somewhere, say with 2 fields.
Done

Line 209: private void CoerceAndSet(PartialRow row, String rawVal, String colName, Type type)
> Needs doc comment.
Done

http://gerrit.cloudera.org:8080/#/c/3883/4/java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducerTest.java
File java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducerTest.java:

Line 53: "(?\\d+),(?\\d+),(?\\d+),(?\\d+)," +
> As in the other patch, I think naming the fields something other than a typ
Done

Line 70: new CreateTableOptions().addHashPartitions(ImmutableList.of("key"), 3).setNumReplicas(1);
> style nit: indent 4 spaces for continuation
Done

Line 136: //
> remove
Done

Line 152: String mismatchInInt = "|1,2,taco,4,5,x,y,true,1.0.2.0,999|";
> https://media.giphy.com/media/dLwB8eG7wwUDe/giphy.gif :)
Tacos have successfully infiltrated the Kudu codebase ;D

Line 173: eventCount * perEventRowCount,
> indent 4 spaces for continuation
Fixed here and elsewhere.

--
To view, visit http://gerrit.cloudera.org:8080/3883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley
Gerrit-Reviewer: Ara Ebrahimi
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Will Berkeley
Gerrit-HasComments: Yes
[kudu-CR] Add RegexpKuduOperationsProducer class
Mike Percy has posted comments on this change. Change subject: Add RegexpKuduOperationsProducer class .. Patch Set 4: -Code-Review Oops, a couple things to tweak before commit I think -- To view, visit http://gerrit.cloudera.org:8080/3883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Ara Ebrahimi Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: No
[kudu-CR] Add RegexpKuduOperationsProducer class
Mike Percy has posted comments on this change.

Change subject: Add RegexpKuduOperationsProducer class
..

Patch Set 4: Code-Review+2

(7 comments)

Cool idea!

http://gerrit.cloudera.org:8080/#/c/3883/4/java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducer.java
File java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducer.java:

Line 93: */
Needs a very simple example here somewhere, say with 2 fields.

Also, note that this relies on JDK7 "named-capturing groups" which are documented in {@link Pattern}. Add: @see Pattern

Line 209: private void CoerceAndSet(PartialRow row, String rawVal, String colName, Type type)
Needs doc comment.

Also, prefer the following arg order:

CoerceAndSet(String rawVal, String colName, Type type, PartialRow row)

Since row is an in-out param and the rest are in params

http://gerrit.cloudera.org:8080/#/c/3883/4/java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducerTest.java
File java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducerTest.java:

Line 53: "(?\\d+),(?\\d+),(?\\d+),(?\\d+)," +
As in the other patch, I think naming the fields something other than a type name might make it a little clearer.

Just byteField or something would help understandability, I think

Line 70: new CreateTableOptions().addHashPartitions(ImmutableList.of("key"), 3).setNumReplicas(1);
style nit: indent 4 spaces for continuation

Line 136: //
remove

Line 152: String mismatchInInt = "|1,2,taco,4,5,x,y,true,1.0.2.0,999|";
https://media.giphy.com/media/dLwB8eG7wwUDe/giphy.gif :)

Line 173: eventCount * perEventRowCount,
indent 4 spaces for continuation

--
To view, visit http://gerrit.cloudera.org:8080/3883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley
Gerrit-Reviewer: Ara Ebrahimi
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-HasComments: Yes
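The review's suggestion for `CoerceAndSet` (add a doc comment, put the in params first and the in-out `row` last) can be sketched as follows. This is an illustration under assumptions, not the method from the patch: `CoerceAndSetSketch`, its `ColType` enum, and the `Map` used in place of `PartialRow` are hypothetical, and the method name is lower-cased here only to follow the usual Java convention.

```java
import java.util.HashMap;
import java.util.Map;

class CoerceAndSetSketch {
  // Hypothetical stand-in for a column type; not the Kudu Type enum.
  enum ColType { INT8, INT32, BOOL, DOUBLE, STRING }

  /**
   * Coerces rawVal to the given type and stores it in row under colName.
   * The in params come first; row, the in-out param, comes last.
   *
   * @throws NumberFormatException if a numeric column receives non-numeric text
   */
  static void coerceAndSet(String rawVal, String colName, ColType type, Map<String, Object> row) {
    switch (type) {
      case INT8:
        row.put(colName, Byte.parseByte(rawVal));
        break;
      case INT32:
        row.put(colName, Integer.parseInt(rawVal));
        break;
      case BOOL:
        row.put(colName, Boolean.parseBoolean(rawVal));
        break;
      case DOUBLE:
        row.put(colName, Double.parseDouble(rawVal));
        break;
      default:
        row.put(colName, rawVal);
    }
  }

  public static void main(String[] args) {
    Map<String, Object> row = new HashMap<>();
    coerceAndSet("4", "intField", ColType.INT32, row);
    try {
      // Mirrors the "taco" mismatch from the test: text in an integer column.
      coerceAndSet("taco", "intField", ColType.INT32, row);
    } catch (NumberFormatException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    System.out.println(row);
  }
}
```

Because parsing happens before `row.put`, a failed coercion in this sketch leaves the row unchanged. How the real class handles a mismatch is not described in this thread.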
[kudu-CR] Add RegexpKuduOperationsProducer class
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3883 to look at the new patch set (#4). Change subject: Add RegexpKuduOperationsProducer class .. Add RegexpKuduOperationsProducer class This patch adds the RegexpKuduOperationsProducer class. This class serializes Event objects to Kudu inserts or upserts by decoding the body into a string, parsing the string using a regular expression, and finally mapping match groups to columns by matching the name of the match group to the name of the column. Parsed values are naively coerced to the proper type. This provides an easy-to-use but flexible way to ingest data with varying schemas into Kudu from Flume. Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea --- A java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducer.java A java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducerTest.java 2 files changed, 471 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/3883/4 -- To view, visit http://gerrit.cloudera.org:8080/3883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Ara Ebrahimi Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy
[kudu-CR] Add RegexpKuduOperationsProducer class
Kudu Jenkins has posted comments on this change. Change subject: Add RegexpKuduOperationsProducer class .. Patch Set 4: Build Started http://104.196.14.100/job/kudu-gerrit/3293/ -- To view, visit http://gerrit.cloudera.org:8080/3883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Ara Ebrahimi Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: No
[kudu-CR] Add RegexpKuduOperationsProducer class
Kudu Jenkins has posted comments on this change. Change subject: Add RegexpKuduOperationsProducer class .. Patch Set 3: Build Started http://104.196.14.100/job/kudu-gerrit/3292/ -- To view, visit http://gerrit.cloudera.org:8080/3883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Ara Ebrahimi Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: No
[kudu-CR] Add RegexpKuduOperationsProducer class
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3883 to look at the new patch set (#3). Change subject: Add RegexpKuduOperationsProducer class .. Add RegexpKuduOperationsProducer class This patch adds the RegexpKuduOperationsProducer class. This class serializes Event objects to Kudu inserts or upserts by decoding the body into a string, parsing the string using a regular expression, and finally mapping match groups to columns by matching the name of the match group to the name of the column. Parsed values are naively coerced to the proper type. This provides an easy-to-use but flexible way to ingest data with varying schemas into Kudu from Flume. Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea --- A java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducer.java A java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducerTest.java 2 files changed, 473 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/3883/3 -- To view, visit http://gerrit.cloudera.org:8080/3883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Ara Ebrahimi Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy