[jira] [Commented] (CARBONDATA-296) 1.Add CSVInputFormat to read csv files.
[ https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15600821#comment-15600821 ]

ASF GitHub Bot commented on CARBONDATA-296:
-------------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/233

> 1.Add CSVInputFormat to read csv files.
> ---------------------------------------
>
>                 Key: CARBONDATA-296
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-296
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: Ravindra Pesala
>            Assignee: QiangCai
>             Fix For: 0.2.0-incubating
>
> Add CSVInputFormat to read csv files; it should use the Univocity parser to
> read csv files to get optimal performance.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[ https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588762#comment-15588762 ]

ASF GitHub Bot commented on CARBONDATA-296:
-------------------------------------------

Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/233#discussion_r84068389

--- Diff: hadoop/src/test/java/org/apache/carbondata/hadoop/csv/CSVInputFormatTest.java ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.hadoop.csv;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+
+import org.apache.carbondata.hadoop.io.StringArrayWritable;
+
+import junit.framework.TestCase;
+import org.junit.Assert;
+import org.junit.Test;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.compress.BZip2Codec;
+import org.apache.hadoop.io.compress.CompressionOutputStream;
+import org.apache.hadoop.io.compress.GzipCodec;
+import org.apache.hadoop.io.compress.Lz4Codec;
+import org.apache.hadoop.io.compress.SnappyCodec;
+import org.apache.hadoop.mapreduce.Job;
+import org.apache.hadoop.mapreduce.Mapper;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
+
+public class CSVInputFormatTest extends TestCase {
+
+  /**
+   * generate compressed files, no need to call this method.
+   * @throws Exception
+   */
+  public void testGenerateCompressFiles() throws Exception {
+    String pwd = new File("src/test/resources").getCanonicalPath();
+    String inputFile = pwd + "/data.csv";
+    FileInputStream input = new FileInputStream(inputFile);
+    Configuration conf = new Configuration();
+
+    // .gz
+    String outputFile = pwd + "/data.csv.gz";
+    FileOutputStream output = new FileOutputStream(outputFile);
+    GzipCodec gzip = new GzipCodec();
+    gzip.setConf(conf);
+    CompressionOutputStream outputStream = gzip.createOutputStream(output);
+    int i = -1;
+    while ((i = input.read()) != -1) {
+      outputStream.write(i);
+    }
+    outputStream.close();
+    input.close();
+
+    // .bz2
+    input = new FileInputStream(inputFile);
+    outputFile = pwd + "/data.csv.bz2";
+    output = new FileOutputStream(outputFile);
+    BZip2Codec bzip2 = new BZip2Codec();
+    bzip2.setConf(conf);
+    outputStream = bzip2.createOutputStream(output);
+    i = -1;
+    while ((i = input.read()) != -1) {
+      outputStream.write(i);
+    }
+    outputStream.close();
+    input.close();
+
+    // .snappy
+    input = new FileInputStream(inputFile);
+    outputFile = pwd + "/data.csv.snappy";
+    output = new FileOutputStream(outputFile);
+    SnappyCodec snappy = new SnappyCodec();
+    snappy.setConf(conf);
+    outputStream = snappy.createOutputStream(output);
+    i = -1;
+    while ((i = input.read()) != -1) {
+      outputStream.write(i);
+    }
+    outputStream.close();
+    input.close();
+
+    // .lz4
+    input = new FileInputStream(inputFile);
+    outputFile = pwd + "/data.csv.lz4";
+    output = new FileOutputStream(outputFile);
+    Lz4Codec lz4 = new Lz4Codec();
+    lz4.setConf(conf);
+    outputStream = lz4.createOutputStream(output);
+    i = -1;
+    while ((i = input.read()) != -1) {
+      outputStream.write(i);
+    }
+    outputStream.close();
+    input.close();
+  }
+
+  /**
+   * CSVCheckMapper check the content of csv files.
+   */
+  public static class CSVCheckMapper extends Mapper {
+    @Override
+    protected void map(NullWritable ke
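Each compression branch of the quoted test depends on Hadoop codec classes, but the byte-copy loop it uses is plain Java. As a stand-alone illustration (a hypothetical `GzipRoundTrip` class, not the project's test code), the gzip case can be reproduced in memory with only the JDK's `java.util.zip`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTrip {

  // Copy every byte from in to out, mirroring the test's while-read loop.
  static void copy(InputStream in, OutputStream out) throws IOException {
    int i;
    while ((i = in.read()) != -1) {
      out.write(i);
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] csv = "id,name\n1,a\n2,b\n".getBytes("UTF-8");

    // Compress, as the test does for data.csv.gz, but in memory instead of on disk.
    ByteArrayOutputStream compressed = new ByteArrayOutputStream();
    OutputStream gz = new GZIPOutputStream(compressed);
    copy(new ByteArrayInputStream(csv), gz);
    gz.close();

    // Decompress and verify the round trip.
    ByteArrayOutputStream restored = new ByteArrayOutputStream();
    copy(new GZIPInputStream(new ByteArrayInputStream(compressed.toByteArray())), restored);
    if (!new String(restored.toByteArray(), "UTF-8").equals("id,name\n1,a\n2,b\n")) {
      throw new AssertionError("round trip failed");
    }
  }
}
```

The bzip2, snappy, and lz4 branches differ only in the codec object; they have no JDK equivalent, which is why the test reaches for Hadoop's `CompressionOutputStream`.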
[ https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575358#comment-15575358 ]

ASF GitHub Bot commented on CARBONDATA-296:
-------------------------------------------

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/233#discussion_r83422823

--- Diff: hadoop/src/test/java/org/apache/carbondata/hadoop/csv/CSVInputFormatTest.java ---
@@ -0,0 +1,171 @@
+  /**
+   * CSVCheckMapper check the content of csv files.
+   */
+  public static class CSVCheckMapper extends Mapper {
+    @Override
+    protected void map(NullWritable key
[ https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574786#comment-15574786 ]

ASF GitHub Bot commented on CARBONDATA-296:
-------------------------------------------

Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/233#discussion_r83387415

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java ---
@@ -0,0 +1,180 @@
+package org.apache.carbondata.hadoop.mapreduce;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.Reader;
+
+import org.apache.carbondata.hadoop.io.BoundedInputStream;
+import org.apache.carbondata.hadoop.io.StringArrayWritable;
+import org.apache.carbondata.hadoop.util.CSVInputFormatUtil;
+
+import com.univocity.parsers.csv.CsvParser;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.Seekable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.io.compress.CodecPool;
+import org.apache.hadoop.io.compress.CompressionCodec;
+import org.apache.hadoop.io.compress.CompressionCodecFactory;
+import org.apache.hadoop.io.compress.CompressionInputStream;
+import org.apache.hadoop.io.compress.Decompressor;
+import org.apache.hadoop.io.compress.SplitCompressionInputStream;
+import org.apache.hadoop.io.compress.SplittableCompressionCodec;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+import org.apache.hadoop.mapreduce.lib.input.FileSplit;
+import org.apache.hadoop.util.LineReader;
+
+/**
+ * An {@link org.apache.hadoop.mapreduce.InputFormat} for csv files. Files are broken into lines.
+ * Values are the line of csv files.
+ */
+public class CSVInputFormat extends FileInputFormat {
+
+  @Override
+  public RecordReader createRecordReader(InputSplit inputSplit,
+      TaskAttemptContext context) throws IOException, InterruptedException {
+    return new NewCSVRecordReader();
+  }
+
+  /**
+   * Treats value as line in file. Key is null.
+   */
+  public static class NewCSVRecordReader extends RecordReader {
+
+    private long start;
+    private long end;
+    private BoundedInputStream boundedInputStream;
+    private Reader reader;
+    private CsvParser csvParser;
+    private StringArrayWritable value;
+    private String[] columns;
+    private Seekable filePosition;
+    private boolean isCompressedInput;
+    private Decompressor decompressor;
+
+    @Override
+    public void initialize(InputSplit inputSplit, TaskAttemptContext context)
+        throws IOException, InterruptedException {
+      FileSplit split = (FileSplit) inputSplit;
+      this.start = split.getStart();
+      this.end = this.start + split.getLength();
+      Path file = split.getPath();
+      Configuration job = context.getConfiguration();
+      CompressionCodec codec = (new CompressionCodecFactory(job)).getCodec(file);
+      FileSystem fs = file.getFileSystem(job);
+      FSDataInputStream fileIn = fs.open(file);
+      InputStream inputStream = null;
+      if (codec != null) {
+        this.isCompressedInput = true;
+        this.decompressor = CodecPool.getDecompressor(codec);
+        if (codec instanceof SplittableCompressionCodec) {
+          SplitCompressionInputStream scIn = ((SplittableCompressionCodec) codec)
+              .createInputStream(fileIn, this.decompressor, this
[ https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574785#comment-15574785 ]

ASF GitHub Bot commented on CARBONDATA-296:
-------------------------------------------

Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/233#discussion_r83387366

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java ---
@@ -0,0 +1,180 @@
+    @Override
+    public void initialize(InputSplit inputSplit, TaskAttemptContext context)
+        throws IOException, InterruptedException {
+      FileSplit split = (FileSplit) inputSplit;
+      this.start = split.getStart();
--- End diff --

fixed
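The record reader imports `org.apache.carbondata.hadoop.io.BoundedInputStream` to stop reading at the split's end offset, but that class's body never appears in the quoted diffs. The idea can be sketched with only the JDK (a hypothetical `BoundedStreamSketch`; the PR's real implementation may differ):

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: serve at most `limit` bytes from the underlying stream,
// then report EOF, so a record reader never reads past its split boundary.
class BoundedStreamSketch extends FilterInputStream {
  private long remaining;

  BoundedStreamSketch(InputStream in, long limit) {
    super(in);
    this.remaining = limit;
  }

  @Override public int read() throws IOException {
    if (remaining <= 0) {
      return -1;                          // synthetic EOF at the split boundary
    }
    int b = in.read();
    if (b != -1) {
      remaining--;
    }
    return b;
  }

  @Override public int read(byte[] buf, int off, int len) throws IOException {
    if (remaining <= 0) {
      return -1;
    }
    int n = in.read(buf, off, (int) Math.min(len, remaining));
    if (n > 0) {
      remaining -= n;
    }
    return n;
  }
}
```

In `initialize`, the reader would seek `fileIn` to `start` and wrap it with a limit of `end - start`; real split handling also adjusts to the next line boundary, which this sketch deliberately omits.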
[ https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574771#comment-15574771 ]

ASF GitHub Bot commented on CARBONDATA-296:
-------------------------------------------

Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/233#discussion_r83386474

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java ---
@@ -0,0 +1,180 @@
+package org.apache.carbondata.hadoop.mapreduce;
--- End diff --

fixed
[ https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574761#comment-15574761 ]

ASF GitHub Bot commented on CARBONDATA-296:
-------------------------------------------

Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/233#discussion_r83386400

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/io/StringArrayWritable.java ---
@@ -0,0 +1,69 @@
+package org.apache.carbondata.hadoop.io;
+
+import java.io.DataInput;
+import java.io.DataOutput;
+import java.io.IOException;
+import java.nio.charset.Charset;
+import java.util.Arrays;
+
+import org.apache.hadoop.io.Writable;
+
+/**
+ * A String sequence that is usable as a key or value.
+ */
+public class StringArrayWritable implements Writable {
+  private String[] values;
+
+  public String[] toStrings() {
+    return values;
+  }
+
+  public void set(String[] values) {
+    this.values = values;
+  }
+
+  public String[] get() {
+    return values;
+  }
+
+  @Override public void readFields(DataInput in) throws IOException {
--- End diff --

fixed
[ https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574281#comment-15574281 ]

ASF GitHub Bot commented on CARBONDATA-296:
-------------------------------------------

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/233#discussion_r83359593

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java ---
@@ -0,0 +1,180 @@
+/**
+ * An {@link org.apache.hadoop.mapreduce.InputFormat} for csv files. Files are broken into lines.
+ * Values are the line of csv files.
+ */
+public class CSVInputFormat extends FileInputFormat {
+
+  @Override
+  public RecordReader createRecordReader(InputSplit inputSplit,
+      TaskAttemptContext context) throws IOException, InterruptedException {
+    return new NewCSVRecordReader();
+  }
+
+  /**
+   * Treats value as line in file. Key is null.
+   */
+  public static class NewCSVRecordReader extends RecordReader {
--- End diff --

Why is it a static class? And you can rename it to `CSVRecordReader`
[ https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574279#comment-15574279 ]

ASF GitHub Bot commented on CARBONDATA-296:
-------------------------------------------

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/233#discussion_r83355842

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/io/StringArrayWritable.java ---
@@ -0,0 +1,69 @@
+  @Override public void readFields(DataInput in) throws IOException {
--- End diff --

`@Override` should be put on the previous line
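The quoted diff cuts off before the bodies of `readFields` and `write`. One plausible wire format, a count followed by length-prefixed UTF-8 entries, can be sketched without the Hadoop `Writable` interface, since `DataInput`/`DataOutput` are plain `java.io` types. The class name and layout below are assumptions, not the PR's actual serialization:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Sketch of one plausible String[] serialization: an entry count followed by
// length-prefixed UTF-8 byte arrays. Illustration only; the PR's real
// readFields/write bodies are not visible in the quoted diff.
class StringArraySketch {
  private String[] values;

  public void set(String[] values) { this.values = values; }
  public String[] get() { return values; }

  // Counterpart of Writable.write(DataOutput)
  public void write(DataOutput out) throws IOException {
    out.writeInt(values.length);
    for (String s : values) {
      byte[] b = s.getBytes("UTF-8");
      out.writeInt(b.length);
      out.write(b);
    }
  }

  // Counterpart of Writable.readFields(DataInput)
  public void readFields(DataInput in) throws IOException {
    values = new String[in.readInt()];
    for (int i = 0; i < values.length; i++) {
      byte[] b = new byte[in.readInt()];
      in.readFully(b);
      values[i] = new String(b, "UTF-8");
    }
  }
}
```

Length-prefixing the raw bytes keeps delimiter and quote characters inside field values safe, which matters for CSV records.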
[ https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574277#comment-15574277 ]

ASF GitHub Bot commented on CARBONDATA-296:
-------------------------------------------

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/233#discussion_r83355953

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapred/CSVInputFormat.java ---
@@ -0,0 +1,193 @@
+package org.apache.carbondata.hadoop.mapred;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.Reader;
+
+import org.apache.carbondata.hadoop.io.BoundedInputStream;
+import org.apache.carbondata.hadoop.io.StringArrayWritable;
+import org.apache.carbondata.hadoop.util.CSVInputFormatUtil;
+
+import com.univocity.parsers.csv.CsvParser;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.Seekable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.io.compress.CodecPool;
+import org.apache.hadoop.io.compress.CompressionCodec;
+import org.apache.hadoop.io.compress.CompressionCodecFactory;
+import org.apache.hadoop.io.compress.CompressionInputStream;
+import org.apache.hadoop.io.compress.Decompressor;
+import org.apache.hadoop.io.compress.SplitCompressionInputStream;
+import org.apache.hadoop.io.compress.SplittableCompressionCodec;
+import org.apache.hadoop.mapred.FileInputFormat;
+import org.apache.hadoop.mapred.FileSplit;
+import org.apache.hadoop.mapred.InputSplit;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.JobConfigurable;
+import org.apache.hadoop.mapred.RecordReader;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.hadoop.util.LineReader;
+
+/**
+ * An {@link org.apache.hadoop.mapred.InputFormat} for csv files. Files are broken into lines.
+ * Values are the line of csv files.
+ */
+public class CSVInputFormat extends FileInputFormat implements
--- End diff --

Agree, there is no legacy application using Hadoop 1.x, right?
[jira] [Commented] (CARBONDATA-296) 1.Add CSVInputFormat to read csv files.
[ https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574283#comment-15574283 ]

ASF GitHub Bot commented on CARBONDATA-296:
-------------------------------------------

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/233#discussion_r83359938

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/CSVInputFormatUtil.java ---
@@ -0,0 +1,57 @@
+/*
+ * [Apache License 2.0 header elided]
+ */
+package org.apache.carbondata.hadoop.util;
+
+import com.univocity.parsers.csv.CsvParserSettings;
+import org.apache.hadoop.conf.Configuration;
+
+/**
+ * Utility class for building Univocity CsvParserSettings from a job Configuration.
+ */
+public class CSVInputFormatUtil {
+
+  public static final String DELIMITER = "carbon.csvinputformat.delimiter";
+  public static final String DELIMITER_DEFAULT = ",";
+  public static final String COMMENT = "carbon.csvinputformat.comment";
+  public static final String COMMENT_DEFAULT = "#";
+  public static final String QUOTE = "carbon.csvinputformat.quote";
+  public static final String QUOTE_DEFAULT = "\"";
+  public static final String ESCAPE = "carbon.csvinputformat.escape";
+  public static final String ESCAPE_DEFAULT = "\\";
+  public static final String HEADER_PRESENT = "carbon.csvinputformat.header.present";
+  public static final boolean HEADER_PRESENT_DEFAULT = false;
+
+  public static CsvParserSettings extractCsvParserSettings(Configuration job, long start) {

--- End diff --

I think this class is not needed; move this function into the CSVRecordReader class.
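The configuration keys above only choose characters; what they ultimately control is how each line is split into fields. The following stdlib-only sketch illustrates those delimiter/quote/escape semantics. It is a simplified stand-in for the Univocity parser, not CarbonData's actual code; the class name, method name, and parsing details are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvSettingsDemo {

  // Splits one line using the delimiter/quote/escape characters that the
  // CSVInputFormatUtil configuration keys control (simplified semantics).
  static List<String> parseLine(String line, char delimiter, char quote, char escape) {
    List<String> fields = new ArrayList<>();
    StringBuilder cur = new StringBuilder();
    boolean inQuotes = false;
    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      if (c == escape && i + 1 < line.length()) {
        cur.append(line.charAt(++i));       // escaped character is taken literally
      } else if (c == quote) {
        inQuotes = !inQuotes;               // toggle quoted section
      } else if (c == delimiter && !inQuotes) {
        fields.add(cur.toString());         // field boundary outside quotes
        cur.setLength(0);
      } else {
        cur.append(c);
      }
    }
    fields.add(cur.toString());             // last field has no trailing delimiter
    return fields;
  }

  public static void main(String[] args) {
    // Defaults from the keys above: delimiter ',', quote '"', escape '\'
    System.out.println(parseLine("a,\"b,c\",d\\,e", ',', '"', '\\')); // [a, b,c, d,e]
  }
}
```

With the default settings, a quoted comma and an escaped comma both stay inside their field, which is exactly why these characters must be configurable per data set.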
[jira] [Commented] (CARBONDATA-296) 1.Add CSVInputFormat to read csv files.
[ https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574278#comment-15574278 ]

ASF GitHub Bot commented on CARBONDATA-296:
-------------------------------------------

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/233#discussion_r83360081

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java ---
@@ -0,0 +1,180 @@
+/*
+ * [Apache License 2.0 header elided]
+ */
+package org.apache.carbondata.hadoop.mapreduce;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.Reader;
+
+import org.apache.carbondata.hadoop.io.BoundedInputStream;
+import org.apache.carbondata.hadoop.io.StringArrayWritable;
+import org.apache.carbondata.hadoop.util.CSVInputFormatUtil;
+
+import com.univocity.parsers.csv.CsvParser;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.Seekable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.io.compress.CodecPool;
+import org.apache.hadoop.io.compress.CompressionCodec;
+import org.apache.hadoop.io.compress.CompressionCodecFactory;
+import org.apache.hadoop.io.compress.CompressionInputStream;
+import org.apache.hadoop.io.compress.Decompressor;
+import org.apache.hadoop.io.compress.SplitCompressionInputStream;
+import org.apache.hadoop.io.compress.SplittableCompressionCodec;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+import org.apache.hadoop.mapreduce.lib.input.FileSplit;
+import org.apache.hadoop.util.LineReader;
+
+/**
+ * An {@link org.apache.hadoop.mapreduce.InputFormat} for csv files. Files are broken into lines.
+ * Each value is one line of the csv file.
+ */
+public class CSVInputFormat extends FileInputFormat<NullWritable, StringArrayWritable> {
+
+  @Override
+  public RecordReader<NullWritable, StringArrayWritable> createRecordReader(InputSplit inputSplit,
+      TaskAttemptContext context) throws IOException, InterruptedException {
+    return new NewCSVRecordReader();
+  }
+
+  /**
+   * Treats value as line in file. Key is null.
+   */
+  public static class NewCSVRecordReader extends RecordReader<NullWritable, StringArrayWritable> {
+
+    private long start;
+    private long end;
+    private BoundedInputStream boundedInputStream;
+    private Reader reader;
+    private CsvParser csvParser;
+    private StringArrayWritable value;
+    private String[] columns;
+    private Seekable filePosition;
+    private boolean isCompressedInput;
+    private Decompressor decompressor;
+
+    @Override
+    public void initialize(InputSplit inputSplit, TaskAttemptContext context)
+        throws IOException, InterruptedException {
+      FileSplit split = (FileSplit) inputSplit;
+      this.start = split.getStart();

--- End diff --

No need to use `this.start`; you can use `start` directly. The same applies to all occurrences in this file.
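The `start`/`end` fields discussed above matter because input splits cut a file at arbitrary byte offsets, usually mid-record. Hadoop's usual convention (which the `LineReader` import suggests this reader follows too, though that part of the code is not shown in the diff) is that a reader whose split does not begin at byte 0 discards bytes up to and including the first newline, because the previous split's reader consumes that spanning line. A stdlib-only sketch of that convention, with invented names:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SplitBoundaryDemo {

  // Returns the byte offset of the first full record in this split.
  // A split starting at byte 0 owns its first line; any other split skips
  // up to and including the first '\n' (the previous reader consumed it).
  static long skipPartialFirstLine(InputStream in, long start) throws IOException {
    if (start == 0) {
      return start;
    }
    long pos = start;
    int b;
    while ((b = in.read()) != -1) {
      pos++;
      if (b == '\n') {
        break;                         // the next full record starts here
      }
    }
    return pos;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "id,name\n1,a\n2,b\n".getBytes("UTF-8");
    // Pretend this reader's split starts at byte 3, mid-way through the first line.
    InputStream in = new ByteArrayInputStream(data, 3, data.length - 3);
    System.out.println(skipPartialFirstLine(in, 3)); // 8 = offset of "1,a"
  }
}
```

The symmetric rule is that each reader keeps consuming past `end` until it finishes the line it is in, so every line is read exactly once across all splits.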
[jira] [Commented] (CARBONDATA-296) 1.Add CSVInputFormat to read csv files.
[ https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574280#comment-15574280 ]

ASF GitHub Bot commented on CARBONDATA-296:
-------------------------------------------

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/233#discussion_r83359637

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java ---
@@ -0,0 +1,180 @@
+/*
+ * [Apache License 2.0 header elided]
+ */
+package org.apache.carbondata.hadoop.mapreduce;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.Reader;
+
+import org.apache.carbondata.hadoop.io.BoundedInputStream;
+import org.apache.carbondata.hadoop.io.StringArrayWritable;
+import org.apache.carbondata.hadoop.util.CSVInputFormatUtil;
+
+import com.univocity.parsers.csv.CsvParser;

--- End diff --

I think the code-style check will fail: incorrect import order.
[jira] [Commented] (CARBONDATA-296) 1.Add CSVInputFormat to read csv files.
[ https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574276#comment-15574276 ]

ASF GitHub Bot commented on CARBONDATA-296:
-------------------------------------------

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/233#discussion_r83360131

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java ---
@@ -0,0 +1,180 @@
+/*
+ * [Apache License 2.0 header elided]
+ */
+package org.apache.carbondata.hadoop.mapreduce;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.Reader;
+
+import org.apache.carbondata.hadoop.io.BoundedInputStream;
+import org.apache.carbondata.hadoop.io.StringArrayWritable;
+import org.apache.carbondata.hadoop.util.CSVInputFormatUtil;
+
+import com.univocity.parsers.csv.CsvParser;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.Seekable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.io.compress.CodecPool;
+import org.apache.hadoop.io.compress.CompressionCodec;
+import org.apache.hadoop.io.compress.CompressionCodecFactory;
+import org.apache.hadoop.io.compress.CompressionInputStream;
+import org.apache.hadoop.io.compress.Decompressor;
+import org.apache.hadoop.io.compress.SplitCompressionInputStream;
+import org.apache.hadoop.io.compress.SplittableCompressionCodec;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+import org.apache.hadoop.mapreduce.lib.input.FileSplit;
+import org.apache.hadoop.util.LineReader;
+
+/**
+ * An {@link org.apache.hadoop.mapreduce.InputFormat} for csv files. Files are broken into lines.
+ * Each value is one line of the csv file.
+ */
+public class CSVInputFormat extends FileInputFormat<NullWritable, StringArrayWritable> {
+
+  @Override
+  public RecordReader<NullWritable, StringArrayWritable> createRecordReader(InputSplit inputSplit,
+      TaskAttemptContext context) throws IOException, InterruptedException {
+    return new NewCSVRecordReader();
+  }
+
+  /**
+   * Treats value as line in file. Key is null.
+   */
+  public static class NewCSVRecordReader extends RecordReader<NullWritable, StringArrayWritable> {
+
+    private long start;
+    private long end;
+    private BoundedInputStream boundedInputStream;
+    private Reader reader;
+    private CsvParser csvParser;
+    private StringArrayWritable value;
+    private String[] columns;
+    private Seekable filePosition;
+    private boolean isCompressedInput;
+    private Decompressor decompressor;
+
+    @Override
+    public void initialize(InputSplit inputSplit, TaskAttemptContext context)
+        throws IOException, InterruptedException {
+      FileSplit split = (FileSplit) inputSplit;
+      this.start = split.getStart();
+      this.end = this.start + split.getLength();
+      Path file = split.getPath();
+      Configuration job = context.getConfiguration();
+      CompressionCodec codec = (new CompressionCodecFactory(job)).getCodec(file);
+      FileSystem fs = file.getFileSystem(job);
+      FSDataInputStream fileIn = fs.open(file);
+      InputStream inputStream = null;
+      if (codec != null) {
+        this.isCompressedInput = true;
+        this.decompressor = CodecPool.getDecompressor(codec);
+        if (codec instanceof SplittableCompressionCodec) {
+          SplitCompressionInputStream scIn = ((SplittableCompressionCodec) codec)
+              .createInputStream(fileIn, this.decompressor, this.
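The codec branch in `initialize()` above distinguishes three cases: a splittable codec (the codec itself can position the stream inside the compressed file), a non-splittable codec (the whole file must be decompressed from byte 0 as a single split), and no codec (seek the raw stream to the split start). The following stdlib-only sketch mimics that decision using gzip as the non-splittable case; the class, method, and in-memory "file" are invented for illustration and do not reflect CarbonData's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CodecChoiceDemo {

  // Simplified analogue of the codec branch: gzip is not splittable, so a
  // gzipped "file" is decompressed from byte 0 regardless of the split start;
  // a plain file is positioned at the split start (FSDataInputStream.seek
  // plays this role in the real code).
  static InputStream openForSplit(byte[] file, boolean gzipped, long start) throws IOException {
    if (gzipped) {
      return new GZIPInputStream(new ByteArrayInputStream(file)); // 'start' ignored
    }
    return new ByteArrayInputStream(file, (int) start, file.length - (int) start);
  }

  // Drains a stream to a UTF-8 string (helper for the demo).
  static String readAll(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[128];
    int n;
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
    return out.toString("UTF-8");
  }

  public static void main(String[] args) throws IOException {
    byte[] plain = "a,b\nc,d\n".getBytes("UTF-8");
    ByteArrayOutputStream gzBytes = new ByteArrayOutputStream();
    try (OutputStream gz = new GZIPOutputStream(gzBytes)) {
      gz.write(plain);
    }
    // Non-splittable: whole file comes back even though start = 4.
    System.out.println(readAll(openForSplit(gzBytes.toByteArray(), true, 4)));
    // Plain: stream begins at the split start.
    System.out.println(readAll(openForSplit(plain, false, 4)));
  }
}
```

This is why non-splittable compressed inputs effectively produce one map task per file, while plain or splittable-compressed files parallelize across splits.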
[jira] [Commented] (CARBONDATA-296) 1.Add CSVInputFormat to read csv files.
[ https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574282#comment-15574282 ]

ASF GitHub Bot commented on CARBONDATA-296:
-------------------------------------------

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/233#discussion_r83359457

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java ---
@@ -0,0 +1,180 @@
+/*
+ * [Apache License 2.0 header elided]
+ */
+package org.apache.carbondata.hadoop.mapreduce;

--- End diff --

Suggest moving it to org.apache.carbondata.hadoop.csv. It is a Carbon-internal class, not meant to be used by user applications.
[jira] [Commented] (CARBONDATA-296) 1.Add CSVInputFormat to read csv files.
[ https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571101#comment-15571101 ]

ASF GitHub Bot commented on CARBONDATA-296:
-------------------------------------------

GitHub user QiangCai opened a pull request:

https://github.com/apache/incubator-carbondata/pull/233

[CARBONDATA-296] 1.Add CSVInputFormat to read csv files.

**1 Add CSVInputFormat to read csv files**
MRv1: hadoop/src/main/java/org/apache/carbondata/hadoop/mapred/CSVInputFormat.java
MRv2: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java

**2 Use univocity parser to parse csv files.**

**3 Customize StringArrayWritable to wrap String array values of each line in csv files.**

**4 Add BoundedInputStream to limit input stream**

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/QiangCai/incubator-carbondata dataloadinginput

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/233.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #233

commit cfb177bbca12cbe72a5947d7fdec1bc906d8aa7e
Author: QiangCai
Date: 2016-10-12T09:53:05Z

    csvinputformat
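Item 4 of the pull request adds a BoundedInputStream to limit the input stream. The idea is to cap how many bytes the CSV parser can pull from the underlying file so it cannot read past its split. The actual CarbonData class is not shown in this thread, so the sketch below is a stdlib-only illustration of the concept: an invented `FilterInputStream` subclass that reports end-of-stream once the byte budget is spent.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative bounded stream: wraps a raw stream and enforces a byte limit,
// so a parser layered on top sees EOF at the split boundary. Names and
// behavior are assumed for this sketch, not taken from the PR.
public class BoundedStreamDemo extends FilterInputStream {

  private long remaining;

  public BoundedStreamDemo(InputStream in, long limit) {
    super(in);
    this.remaining = limit;
  }

  @Override
  public int read() throws IOException {
    if (remaining <= 0) {
      return -1;                        // budget exhausted: report EOF
    }
    int b = in.read();
    if (b != -1) {
      remaining--;
    }
    return b;
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    if (remaining <= 0) {
      return -1;
    }
    int n = in.read(buf, off, (int) Math.min(len, remaining));
    if (n > 0) {
      remaining -= n;
    }
    return n;
  }

  public static void main(String[] args) throws IOException {
    // Limit the reader to the first 8 bytes of a 12-byte "file".
    InputStream bounded = new BoundedStreamDemo(
        new ByteArrayInputStream("1,a\n2,b\n3,c\n".getBytes("UTF-8")), 8);
    byte[] buf = new byte[64];
    int n = bounded.read(buf, 0, buf.length);
    System.out.println(new String(buf, 0, n, "UTF-8")); // first two records only
  }
}
```

Layering the parser on a bounded stream keeps the split logic out of the parser itself, which is a common way to adapt a whole-stream parser like Univocity to Hadoop's split model.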