Sandish Kumar HN has posted comments on this change. Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) ......................................................................
Patch Set 3: (31 comments) http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java: PS2, Line 35: a CSV file. > nit: is there one or many files? Done Line 39: public class ExportCsv extends Configured implements Tool { > This needs a test. Done PS2, Line 50: the current configuration > I know the ImportCsv* classes is like this, but the javadoc format we use d Done PS2, Line 85: lumns into t > nit: rephrase to something like "the given table and columns into..." Done PS2, Line 86: ble > nit: Kudu Done http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java: Line 32: /** > nit: remove the extra blank line Done PS2, Line 48: > That made more sense in ImportCsbMapper, I'd just remove this javadoc compl Done Line 54: > single line: Done PS2, Line 61: > nits: Done Line 74: StringBuilder buf = new StringBuilder(); > Please revise this whole javadoc section. FWIW it might not be necessary, s Done Line 114: break; > Missing UNIXTIME_MICROS? Done http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java: PS2, Line 43: nt.Kudu > nit: Apache Parquet Done Line 47: > This needs a test. Done Line 59: static final String JOB_NAME_CONF_KEY = "importparquet.job.name"; > Same comment as before regarding javadoc. Done Line 75: Path inputDir = new Path(args[1]); > Did you forget to remove this? Done Line 88: job.setNumReduceTasks(0); > You could run some pre-flight checks like making sure that the columns matc Done Line 88: job.setNumReduceTasks(0); > you want me to read column names from Kudu table and check with input parqu Done Line 88: job.setNumReduceTasks(0); > Yes, this way the job won't start if some things just don't match. Done PS2, Line 105: Columns > Apache Parquet Done http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java: PS2, Line 39: Apache Parquet lines and turns them into Kudu I > some comments as before regarding PARQUET and Inserts Done Line 101: case DOUBLE: > I'd recommend in the Driver as part of checking the schema. Done Line 101: case DOUBLE: > If My understanding is correct. we should identify TIMESTAMP in the input p Done Line 101: case DOUBLE: > UNIXTIME_MICROS would be recognized but not supported, someone might have T Done http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java: Line 27: > Remove. Done PS2, Line 29: e > nit Done Line 37 > What's this about? Done Line 38 > Remove. Done http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala: Line 1: // Licensed to the Apache Software Foundation (ASF) under one > remove this blank line Done Line 18: package org.apache.kudu.spark.tools > add a blank line Done Line 24: import org.slf4j.{Logger, LoggerFactory} > please re-order this Done Line 29: object ImportExportKudu { > I'm less familiar with scala but at least we'll need a test. Done -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HN <[email protected]> Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN <[email protected]> Gerrit-HasComments: Yes
