[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Jean-Daniel Cryans has posted comments on this change. Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. Patch Set 8: Verified+1 Overriding Jenkins since it's an unrelated failure in the master's stress test. Thank you for your contribution! -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Sandish Kumar HN Gerrit-HasComments: No
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Jean-Daniel Cryans has submitted this change and it was merged. Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. kudu client tools for hadoop and spark import/export(csv,parquet,avro) Change-Id: If462af948651f3869b444e82151c3559fde19142 Reviewed-on: http://gerrit.cloudera.org:8080/7421 Reviewed-by: Jean-Daniel CryansTested-by: Jean-Daniel Cryans --- M java/kudu-client-tools/pom.xml A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquetPreCheck.java M java/kudu-spark-tools/pom.xml A java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala A java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala M java/pom.xml 13 files changed, 1,200 insertions(+), 7 deletions(-) Approvals: Jean-Daniel Cryans: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HN Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Sandish Kumar HN
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Jean-Daniel Cryans has posted comments on this change. Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN Gerrit-HasComments: No
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Sandish Kumar HN has posted comments on this change. Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. Patch Set 8: (5 comments) nit http://gerrit.cloudera.org:8080/#/c/7421/7/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala: PS7, Line 41: from > nit: from Done PS7, Line 41: Kudu > nit: Kudu Done PS7, Line 42: the following format > replace with "the following formats are supported" Done PS7, Line 107: doesn't exi > huh? :) Done PS7, Line 154: CSV > nit: CSV Done -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN Gerrit-HasComments: Yes
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7421 to look at the new patch set (#8). Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. kudu client tools for hadoop and spark import/export(csv,parquet,avro) Change-Id: If462af948651f3869b444e82151c3559fde19142 --- M java/kudu-client-tools/pom.xml A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquetPreCheck.java M java/kudu-spark-tools/pom.xml A java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala A java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala M java/pom.xml 13 files changed, 1,200 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/7421/8 -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Jean-Daniel Cryans has posted comments on this change. Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. Patch Set 7: (6 comments) Took a closer look at the scala files, we're almost there :) http://gerrit.cloudera.org:8080/#/c/7421/7/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala: PS7, Line 41: kudu nit: Kudu PS7, Line 41: form nit: from PS7, Line 42: currently we support replace with "the following formats are supported" PS7, Line 107: doesn't not huh? :) PS7, Line 154: Csv nit: CSV Line 157: LOG.info(s"Spark version detected as" + sc.version) I'm not really familiar with Spark, is this really useful? -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN Gerrit-HasComments: Yes
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Sandish Kumar HN has posted comments on this change. Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/7421/6//COMMIT_MSG Commit Message: PS6, Line 7: avro > Did you mean to include the avro part? Avro is available in spark client tools. people didn't want me to add much to MapReduce client tools. http://gerrit.cloudera.org:8080/#/c/7421/6/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java: Line 32: import org.apache.parquet.column.ColumnDescriptor; > nit: remove Done PS6, Line 103: > nit: Parquet is always stylized with an upper case P, please fix here and b Done PS6, Line 103: > nit: exist, same below Done -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN Gerrit-HasComments: Yes
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7421 to look at the new patch set (#7). Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. kudu client tools for hadoop and spark import/export(csv,parquet,avro) Change-Id: If462af948651f3869b444e82151c3559fde19142 --- M java/kudu-client-tools/pom.xml A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquetPreCheck.java M java/kudu-spark-tools/pom.xml A java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala A java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala M java/pom.xml 13 files changed, 1,202 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/7421/7 -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Jean-Daniel Cryans has posted comments on this change. Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/7421/6//COMMIT_MSG Commit Message: PS6, Line 7: avro Did you mean to include the avro part? http://gerrit.cloudera.org:8080/#/c/7421/6/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java: Line 32: import org.apache.parquet.Log; nit: remove PS6, Line 103: exists nit: exist, same below PS6, Line 103: parquet nit: Parquet is always stylized with an upper case P, please fix here and below. -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN Gerrit-HasComments: Yes
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Sandish Kumar HN has posted comments on this change. Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. Patch Set 6: (24 comments) http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java: Line 69: StringBuilder buf = new StringBuilder(); > I'd still advocate removing this javadoc section, it adds nothing and it's Done http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java: Line 56: > I just noticed, you shouldn't use this Log that comes from Parquet. It's be Done Line 97: // Pre-flight checks of input parquet schema and table schema. > nit: missing space after //m Done Line 99: if (schema.containsField(columnSchema.getName())) { > Why do you not also check the type? Done Line 101: .equals(getTypeName(columnSchema.getType( { > Having System.exits in the code isn't good, ideally this case would be test Done Line 101: .equals(getTypeName(columnSchema.getType( { > I'd suggestion throwing an exception and catching it in run() below. Done Line 101: .equals(getTypeName(columnSchema.getType( { > what should I use instead of system.exit(0). just LOG.error("") enough?? Done Line 104: } > Well Kudu doesn't support Parquet's TIMESTAMP, it's not about a recommendat Done http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java File java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java: Line 17: > nit: missing a blank line. Done Line 68: > nit: end comments with a period. Done http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java File java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java: Line 17: > nit: missing blank line. Done Line 50: import org.apache.kudu.mapreduce.HadoopTestingUtility; > I'd suggest having a separate test that specifically verifies the pre-fligh Done Line 50: import org.apache.kudu.mapreduce.HadoopTestingUtility; > separate test-case for pre-flight checks? and what should be the expected r Done Line 50: import org.apache.kudu.mapreduce.HadoopTestingUtility; > The goal is to test all the major code paths, right now your patch isn't te Done Line 107: } > nit: long line Done Line 111: > nit: long line Done Line 115: > openTable isn't a cheap call, do it only once. Done Line 116: KuduTable openTable = openTable(TABLE_NAME); > Use scanTableToStrings and verify all the rows instead. Better for type con Done Line 130: + "required boolean column4_b; " > nit: long line Done Line 133: SimpleGroupFactory f = new SimpleGroupFactory(schema); > Those lines are all too long, also you could probably refactor this? Done http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java File java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java: PS3, Line 204: > What strings? Done http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala: Line 119: kc.upsertRows(df, args.tableName) > Forgot to remove? Done http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala File java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala: Line 17: > nit: add a blank line. Done Line 66: "--inferschema=true"), sc) > ? Done -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN Gerrit-HasComments: Yes
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7421 to look at the new patch set (#6). Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. kudu client tools for hadoop and spark import/export(csv,parquet,avro) Change-Id: If462af948651f3869b444e82151c3559fde19142 --- M java/kudu-client-tools/pom.xml A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquetPreCheck.java M java/kudu-spark-tools/pom.xml A java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala A java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala M java/pom.xml 13 files changed, 1,203 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/7421/6 -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Jean-Daniel Cryans has posted comments on this change. Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/7421/5/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java: Line 99: if (!schema.containsField(columnSchema.getName())) { > Kudu has primitive-types, parquet has just Type. Example: In Kudu binary ty The goal is to not hit weird errors in ImportParquetMapper#map when doing data conversions, so you have to find the equivalences. -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN Gerrit-HasComments: Yes
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Jean-Daniel Cryans has posted comments on this change. Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java: Line 56: private static final Log LOG = Log.getLog(ImportParquet.class); I just noticed, you shouldn't use this Log that comes from Parquet. It's best to use System.err.println like ImportCsv does. Line 101: System.exit(0); > what should I use instead of system.exit(0). just LOG.error("") enough?? I'd suggestion throwing an exception and catching it in run() below. http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java File java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java: Line 50: public class ITImportParquet extends BaseKuduTest { > separate test-case for pre-flight checks? and what should be the expected r The goal is to test all the major code paths, right now your patch isn't testing error cases. -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN Gerrit-HasComments: Yes
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Sandish Kumar HN has posted comments on this change. Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/7421/5/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java: Line 99: if (!schema.containsField(columnSchema.getName())) { Kudu has primitive-types, parquet has just Type. Example: In Kudu binary type and in parquet its string type. How can I compare ?. please give some idea on this?? Line 100: LOG.error("The column " + columnSchema.getName() + " does not exists in parquet schema"); what should I use instead of system.exit(0). just LOG.error("") enough?? -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN Gerrit-HasComments: Yes
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7421 to look at the new patch set (#5). Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. kudu client tools for hadoop and spark import/export(csv,parquet,avro) Change-Id: If462af948651f3869b444e82151c3559fde19142 --- M java/kudu-client-tools/pom.xml A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquetPreCheck.java M java/kudu-spark-tools/pom.xml A java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala A java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala M java/pom.xml 13 files changed, 1,167 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/7421/5 -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7421 to look at the new patch set (#4). Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. kudu client tools for hadoop and spark import/export(csv,parquet,avro) Change-Id: If462af948651f3869b444e82151c3559fde19142 --- M java/kudu-client-tools/pom.xml A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquetPreCheck.java M java/kudu-spark-tools/pom.xml A java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala A java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala M java/pom.xml 13 files changed, 1,167 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/7421/4 -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Sandish Kumar HN has posted comments on this change. Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java: Line 101: System.exit(0); > Having System.exits in the code isn't good, ideally this case would be test what should I use instead of system.exit(0). just LOG.error("") enough?? http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java File java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java: Line 50: public class ITImportParquet extends BaseKuduTest { > I'd suggest having a separate test that specifically verifies the pre-fligh separate test-case for pre-flight checks? and what should be the expected result ?? -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN Gerrit-HasComments: Yes
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Jean-Daniel Cryans has posted comments on this change. Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. Patch Set 3: (19 comments) http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java: Line 69:* converts RowResult to string. I'd still advocate removing this javadoc section, it adds nothing and it's a private method. http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java: Line 97: //pre-flight checks of input parquet schema and table schema nit: missing space after //m Also end your sentence with a period. Line 99: if (!schema.containsField(sche.getName())) { Why do you not also check the type? Line 101: System.exit(0); Having System.exits in the code isn't good, ideally this case would be tested and if you exit then how can you catch the error? Line 104: //Kudu does not recommend using TIMESTAMP Well Kudu doesn't support Parquet's TIMESTAMP, it's not about a recommendation. Also same nit as the comment above, and some comment regarding exit. http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java File java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java: Line 17: package org.apache.kudu.mapreduce.tools; nit: missing a blank line. Line 68: // Create a 2 lines input file nit: end comments with a period. Also I'm not following this comment. The next line creates a table with 4 tablets, 3 of which have 3 rows. Where's the 2 lines input file coming from? http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java File java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java: Line 17: package org.apache.kudu.mapreduce.tools; nit: missing blank line. Line 50: public class ITImportParquet extends BaseKuduTest { I'd suggest having a separate test that specifically verifies the pre-flight checks that are running. Line 107: String[] args = new String[] { "-D" + CommandLineParser.MASTER_ADDRESSES_KEY + "=" + getMasterAddresses(), nit: long line Line 111: Job job = ImportParquet.createSubmittableJob(parser.getConfiguration(), parser.getRemainingArgs()); nit: long line Line 115: client.newScannerBuilder(openTable(TABLE_NAME)).build())); openTable isn't a cheap call, do it only once. Line 116: assertEquals(4,getTableRows(openTable(TABLE_NAME)).get(0).getInt("key")); Use scanTableToStrings and verify all the rows instead. Better for type conversion checking. Line 130: ParquetWriter writer = new ParquetWriter(data, new GroupWriteSupport(), UNCOMPRESSED, 1024, 1024, 512, nit: long line Line 133: writer.write(f.newGroup().append("key", 1).append("column1_i", 3).append("column2_d", 2.3).append("column3_s", Those lines are all too long, also you could probably refactor this? http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java File java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java: PS3, Line 204: rowStrings What strings? http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala: Line 119: LOG.info(args.header+":"+args.delimiter+":"+args.path) Forgot to remove? http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala File java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala: Line 17: package org.apache.kudu.spark.tools nit: add a blank line. Line 66: //val table = kuduClient.openTable(TABLE_NAME) ? -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN Gerrit-HasComments: Yes
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Sandish Kumar HN has posted comments on this change. Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. Patch Set 3: (31 comments) http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java: PS2, Line 35: a CSV file. > nit: is there one or many files? Done Line 39: public class ExportCsv extends Configured implements Tool { > This needs a test. Done PS2, Line 50: the current configuration > I know the ImportCsv* classes is like this, but the javadoc format we use d Done PS2, Line 85: lumns into t > nit: rephrase to something like "the given table and columns into..." Done PS2, Line 86: ble > nit: Kudu Done http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java: Line 32: /** > nit: remove the extra blank line Done PS2, Line 48: > That made more sense in ImportCsbMapper, I'd just remove this javadoc compl Done Line 54: > single line: Done PS2, Line 61: > nits: Done Line 74: StringBuilder buf = new StringBuilder(); > Please revise this whole javadoc section. FWIW it might not be necessary, s Done Line 114: break; > Missing UNIXTIME_MICROS? Done http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java: PS2, Line 43: nt.Kudu > nit: Apache Parquet Done Line 47: > This needs a test. Done Line 59: static final String JOB_NAME_CONF_KEY = "importparquet.job.name"; > Same comment as before regarding javadoc. Done Line 75: Path inputDir = new Path(args[1]); > Did you forget to remove this? Done Line 88: job.setNumReduceTasks(0); > You could run some pre-flight checks like making sure that the columns matc Done Line 88: job.setNumReduceTasks(0); > you want me to read column names from Kudu table and check with input parqu Done Line 88: job.setNumReduceTasks(0); > Yes, this way the job won't start if some things just don't match. Done PS2, Line 105: Columns > Apache Parquet Done http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java: PS2, Line 39: Apache Parquet lines and turns them into Kudu I > some comments as before regarding PARQUET and Inserts Done Line 101: case DOUBLE: > I'd recommend in the Driver as part of checking the schema. Done Line 101: case DOUBLE: > If My understanding is correct. we should identify TIMESTAMP in the input p Done Line 101: case DOUBLE: > UNIXTIME_MICROS would be recognized but not supported, someone might have T Done http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java: Line 27: > Remove. Done PS2, Line 29: e > nit Done Line 37 > What's this about? Done Line 38 > Remove. Done http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala: Line 1: // Licensed to the Apache Software Foundation (ASF) under one > remove this blank line Done Line 18: package org.apache.kudu.spark.tools > add a blank line Done Line 24: import org.slf4j.{Logger, LoggerFactory} > please re-order this Done Line 29: object ImportExportKudu { > I'm less familiar with scala but at least we'll need a test. Done -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN Gerrit-HasComments: Yes
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7421 to look at the new patch set (#3). Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. kudu client tools for hadoop and spark import/export(csv,parquet,avro) Change-Id: If462af948651f3869b444e82151c3559fde19142 --- M java/kudu-client-tools/pom.xml A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java M java/kudu-spark-tools/pom.xml A java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala A java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala M java/pom.xml 13 files changed, 1,041 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/7421/3 -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN