[jira] [Commented] (CARBONDATA-286) Support Append mode when writing Dataframe to CarbonData

2016-10-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15577450#comment-15577450
 ] 

ASF GitHub Bot commented on CARBONDATA-286:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/213


> Support Append mode when writing Dataframe to CarbonData
> 
>
> Key: CARBONDATA-286
> URL: https://issues.apache.org/jira/browse/CARBONDATA-286
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Support df.write using Append as the save mode; the appended data will become 
> a new segment in CarbonData.
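
For illustration, a minimal Java sketch (against the Spark 1.6 API) of what such
an append looks like from the user side; the "carbondata" format name and the
"tableName" option key are assumptions for this example, not confirmed by the
issue:

    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SaveMode;

    public class AppendWriteExample {
      // Appends the rows of df to an existing CarbonData table. With
      // SaveMode.Append each write is expected to land as a new segment.
      public static void appendToTable(DataFrame df) {
        df.write()
            .format("carbondata")                // assumed datasource name
            .option("tableName", "carbontable")  // assumed option key
            .mode(SaveMode.Append)
            .save();
      }
    }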



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-318) Implement an ExternalSorter that makes maximum usage of memory while sorting

2016-10-14 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-318:
---

 Summary: Implement an ExternalSorter that makes maximum usage of 
memory while sorting
 Key: CARBONDATA-318
 URL: https://issues.apache.org/jira/browse/CARBONDATA-318
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Jacky Li


The external sorter should sort in memory until it reaches a configured size, then 
spill to disk. It should provide the following interfaces:
1. insertRow/insertRowBatch: insert rows into the sorter
2. getIterator: return an iterator that iterates over the sorted rows

The external sorter depends on a FileWriterFactory to get a FileWriter for 
spilling data into files. The FileWriterFactory should be provided by the user; 
multiple implementations are possible, such as writing into one folder or 
multiple folders. A minimal sketch of these interfaces follows.
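
A minimal Java sketch of the interfaces described above; everything beyond the
names insertRow/insertRowBatch/getIterator and FileWriterFactory (row
representation, exact signatures) is an assumption:

    import java.util.Iterator;

    // Spill files are written through a user-supplied factory, so different
    // layouts (one folder, many folders) can be plugged in.
    interface FileWriter {
      void write(Object[] row);   // assumed row representation
      void close();
    }

    interface FileWriterFactory {
      FileWriter createWriter();
    }

    public interface ExternalSorter {
      // Insert one row; once the configured in-memory size is reached, the
      // sorter sorts its buffer and spills it to disk via a FileWriter.
      void insertRow(Object[] row);

      // Insert a batch of rows.
      void insertRowBatch(Iterator<Object[]> rows);

      // Iterate over the globally sorted rows, merging the in-memory buffer
      // with any spilled files.
      Iterator<Object[]> getIterator();
    }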



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-298) 3. Add InputProcessorStep which should iterate recordreader and parse the data as per the data type.

2016-10-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575898#comment-15575898
 ] 

ASF GitHub Bot commented on CARBONDATA-298:
---

GitHub user ravipesala opened a pull request:

https://github.com/apache/incubator-carbondata/pull/240

[CARBONDATA-298]Added InputProcessorStep to read data from csv reader 
iterator.

Add an InputProcessorStep which iterates the record reader of the csv input and 
parses the data as per the data type.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/incubator-carbondata 
input-processor-step

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/240.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #240


commit 96c46d2d31c2f80b89ff755c3683c08b24eca042
Author: ravipesala 
Date:   2016-10-14T17:09:58Z

Added InputProcessorStep to read data from csv reader iterator.




> 3. Add InputProcessorStep which should iterate recordreader and parse the 
> data as per the data type.
> 
>
> Key: CARBONDATA-298
> URL: https://issues.apache.org/jira/browse/CARBONDATA-298
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add an InputProcessorStep which should iterate the 
> recordreader/RecordBufferedWriter and parse the data as per the data types.
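
For illustration, a small Java sketch of what "parse the data as per the data
type" means for a csv row; the DataType enum and field layout here are
hypothetical, not CarbonData classes:

    import java.util.Arrays;

    public class RowParseExample {
      enum DataType { INT, DOUBLE, STRING }

      // Convert each string field read from the csv into its column's type.
      static Object[] parseRow(String[] fields, DataType[] types) {
        Object[] out = new Object[fields.length];
        for (int i = 0; i < fields.length; i++) {
          switch (types[i]) {
            case INT:    out[i] = Integer.parseInt(fields[i]); break;
            case DOUBLE: out[i] = Double.parseDouble(fields[i]); break;
            default:     out[i] = fields[i];
          }
        }
        return out;
      }

      public static void main(String[] args) {
        Object[] row = parseRow(new String[] {"1", "2.5", "china"},
            new DataType[] {DataType.INT, DataType.DOUBLE, DataType.STRING});
        System.out.println(Arrays.toString(row)); // [1, 2.5, china]
      }
    }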



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CARBONDATA-299) 4. Add dictionary generator interfaces and give implementation for pre-created dictionary.

2016-10-14 Thread Ravindra Pesala (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala resolved CARBONDATA-299.

Resolution: Fixed

> 4. Add dictionary generator interfaces and give implementation for 
> pre-created dictionary.
> --
>
> Key: CARBONDATA-299
> URL: https://issues.apache.org/jira/browse/CARBONDATA-299
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Add dictionary generator interfaces and give an implementation for a 
> pre-created dictionary (which is generated separately).
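
A hedged Java sketch of what such an interface could look like; all names here
are illustrative, not the API that was committed:

    import java.util.Map;

    interface DictionaryGenerator {
      // Return the surrogate key for a column value, generating one if the
      // implementation allows it.
      int getOrGenerateKey(String value);
    }

    // Backed by a dictionary generated separately before the load; lookups
    // never create new keys, so an unknown value is an error.
    class PreCreatedDictionary implements DictionaryGenerator {
      private final Map<String, Integer> dictionary;

      PreCreatedDictionary(Map<String, Integer> dictionary) {
        this.dictionary = dictionary;
      }

      public int getOrGenerateKey(String value) {
        Integer key = dictionary.get(value);
        if (key == null) {
          throw new IllegalArgumentException("value not in dictionary: " + value);
        }
        return key;
      }
    }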



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CARBONDATA-288) In HDFS bad record logger is failing in writing the bad records

2016-10-14 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li resolved CARBONDATA-288.
-
Resolution: Fixed

> In HDFS bad record logger is failing in writing the bad records
> 
>
> Key: CARBONDATA-288
> URL: https://issues.apache.org/jira/browse/CARBONDATA-288
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 0.2.0-incubating
>Reporter: Mohammad Shahid Khan
>Assignee: Mohammad Shahid Khan
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> For the HDFS file system:
> CarbonFile logFile = FileFactory.getCarbonFile(filePath, FileType.HDFS);
> If filePath does not exist, then
> calling CarbonFile.getPath() throws a NullPointerException.
> Solution:
> If the file does not exist, it must be created before it is accessed.
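
A hedged sketch of the described fix, reusing the classes quoted above;
isFileExist and createNewFile are assumed FileFactory helpers, not confirmed by
the issue:

    // Ensure the bad-record log file exists before handing it to the logger,
    // so CarbonFile.getPath() can no longer hit a missing path.
    static CarbonFile getOrCreateLogFile(String filePath) throws java.io.IOException {
      if (!FileFactory.isFileExist(filePath, FileType.HDFS)) {   // assumed helper
        FileFactory.createNewFile(filePath, FileType.HDFS);      // assumed helper
      }
      return FileFactory.getCarbonFile(filePath, FileType.HDFS);
    }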



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-280) when a table property is repeated, only the last one is set

2016-10-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575510#comment-15575510
 ] 

ASF GitHub Bot commented on CARBONDATA-280:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/204


> when a table property is repeated, only the last one is set
> ---
>
> Key: CARBONDATA-280
> URL: https://issues.apache.org/jira/browse/CARBONDATA-280
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 0.1.1-incubating
>Reporter: zhangshunyu
>Assignee: zhangshunyu
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> When a table property is repeated, only the last one is set:
> For example,
> CREATE TABLE IF NOT EXISTS carbontable
> (ID Int, date Timestamp, country String,
> name String, phonetype String, serialname String, salary Int)
> STORED BY 'carbondata'
>  TBLPROPERTIES('DICTIONARY_EXCLUDE'='country','DICTIONARY_INCLUDE'='ID',
>  'DICTIONARY_EXCLUDE'='phonetype', 'DICTIONARY_INCLUDE'='salary')
> only salary is set to DICTIONARY_INCLUDE and only phonetype is set to 
> DICTIONARY_EXCLUDE.
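
The mechanics behind this are plain last-write-wins map semantics: if the parser
collects the TBLPROPERTIES pairs into a Map, each repeated key silently
overwrites the earlier entry, as this self-contained Java illustration shows (it
is not the actual parser code):

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class LastWinsExample {
      public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<String, String>();
        props.put("DICTIONARY_EXCLUDE", "country");
        props.put("DICTIONARY_INCLUDE", "ID");
        props.put("DICTIONARY_EXCLUDE", "phonetype"); // replaces "country"
        props.put("DICTIONARY_INCLUDE", "salary");    // replaces "ID"
        // prints {DICTIONARY_EXCLUDE=phonetype, DICTIONARY_INCLUDE=salary}
        System.out.println(props);
      }
    }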



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-315) Data loading fails if parsing a double value returns infinity

2016-10-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575376#comment-15575376
 ] 

ASF GitHub Bot commented on CARBONDATA-315:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/234#discussion_r83424392
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java ---
@@ -994,29 +994,31 @@ private String getCarbonLocalBaseStoreLocation() {
   }
 } else {
   try {
-out[memberMapping[dimLen + index] - meta.complexTypes.size()] =
-(isNull || msr == null || msr.length() == 0) ?
-null :
-DataTypeUtil
-.getMeasureValueBasedOnDataType(msr, msrDataType[meta.msrMapping[msrCount]],
-meta.carbonMeasures[meta.msrMapping[msrCount]]);
-  } catch (NumberFormatException e) {
-try {
-  msr = msr.replaceAll(",", "");
-  out[memberMapping[dimLen + index] - meta.complexTypes.size()] = DataTypeUtil
+if (!isNull && null != msr && msr.length() > 0) {
+  Object measureValueBasedOnDataType = DataTypeUtil
--- End diff --

put `DataTypeUtil` in next line


> Data loading fails if parsing a double value returns infinity
> -
>
> Key: CARBONDATA-315
> URL: https://issues.apache.org/jira/browse/CARBONDATA-315
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 0.1.0-incubating, 0.2.0-incubating
>Reporter: Manish Gupta
>Assignee: Manish Gupta
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> During data load, if a value is too big for a double DataType column, then 
> parsing that value as a double returns "Infinity". Because of this, the min 
> and max value calculation for measures in the carbon data writer step throws 
> an exception.
> ERROR 13-10 15:27:56,968 - [t3: Graph - MDKeyGent3][partitionID:0] 
> org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException
> java.util.concurrent.ExecutionException: 
> org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:188)
> at 
> org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.processWriteTaskSubmitList(CarbonFactDataHandlerColumnar.java:812)
> at 
> org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.finish(CarbonFactDataHandlerColumnar.java:779)
> at 
> org.apache.carbondata.processing.mdkeygen.MDKeyGenStep.processRow(MDKeyGenStep.java:222)
> at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException
> at 
> org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar$Producer.call(CarbonFactDataHandlerColumnar.java:1244)
> at 
> org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar$Producer.call(CarbonFactDataHandlerColumnar.java:1215)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> ... 1 more
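
The parsing behavior behind the bug can be reproduced in plain Java: an
out-of-range literal parses to Infinity instead of throwing, so the writer's
min/max computation receives a non-finite value:

    public class InfinityExample {
      public static void main(String[] args) {
        double d = Double.parseDouble("1e400");   // no NumberFormatException
        System.out.println(d);                    // Infinity
        System.out.println(Double.isInfinite(d)); // true
        System.out.println(Math.max(1.0, d));     // Infinity propagates into max
      }
    }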



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-296) 1.Add CSVInputFormat to read csv files.

2016-10-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575358#comment-15575358
 ] 

ASF GitHub Bot commented on CARBONDATA-296:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/233#discussion_r83422823
  
--- Diff: hadoop/src/test/java/org/apache/carbondata/hadoop/csv/CSVInputFormatTest.java ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.hadoop.csv;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+
+import org.apache.carbondata.hadoop.io.StringArrayWritable;
+
+import junit.framework.TestCase;
+import org.junit.Assert;
+import org.junit.Test;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.compress.BZip2Codec;
+import org.apache.hadoop.io.compress.CompressionOutputStream;
+import org.apache.hadoop.io.compress.GzipCodec;
+import org.apache.hadoop.io.compress.Lz4Codec;
+import org.apache.hadoop.io.compress.SnappyCodec;
+import org.apache.hadoop.mapreduce.Job;
+import org.apache.hadoop.mapreduce.Mapper;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
+
+public class CSVInputFormatTest extends TestCase {
+
+  /**
+   * generate compressed files, no need to call this method.
+   * @throws Exception
+   */
+  public void testGenerateCompressFiles() throws Exception {
+String pwd = new File("src/test/resources").getCanonicalPath();
+String inputFile = pwd + "/data.csv";
+FileInputStream input = new FileInputStream(inputFile);
+Configuration conf = new Configuration();
+
+// .gz
+String outputFile = pwd + "/data.csv.gz";
+FileOutputStream output = new FileOutputStream(outputFile);
+GzipCodec gzip = new GzipCodec();
+gzip.setConf(conf);
+CompressionOutputStream outputStream = gzip.createOutputStream(output);
+int i = -1;
+while ((i = input.read()) != -1) {
+  outputStream.write(i);
+}
+outputStream.close();
+input.close();
+
+// .bz2
+input = new FileInputStream(inputFile);
+outputFile = pwd + "/data.csv.bz2";
+output = new FileOutputStream(outputFile);
+BZip2Codec bzip2 = new BZip2Codec();
+bzip2.setConf(conf);
+outputStream = bzip2.createOutputStream(output);
+i = -1;
+while ((i = input.read()) != -1) {
+  outputStream.write(i);
+}
+outputStream.close();
+input.close();
+
+// .snappy
+input = new FileInputStream(inputFile);
+outputFile = pwd + "/data.csv.snappy";
+output = new FileOutputStream(outputFile);
+SnappyCodec snappy = new SnappyCodec();
+snappy.setConf(conf);
+outputStream = snappy.createOutputStream(output);
+i = -1;
+while ((i = input.read()) != -1) {
+  outputStream.write(i);
+}
+outputStream.close();
+input.close();
+
+//.lz4
+input = new FileInputStream(inputFile);
+outputFile = pwd + "/data.csv.lz4";
+output = new FileOutputStream(outputFile);
+Lz4Codec lz4 = new Lz4Codec();
+lz4.setConf(conf);
+outputStream = lz4.createOutputStream(output);
+i = -1;
+while ((i = input.read()) != -1) {
+  outputStream.write(i);
+}
+outputStream.close();
+input.close();
+
+  }
+
+  /**
+   * CSVCheckMapper check the content of csv files.
+   */
+  public static class CSVCheckMapper extends Mapper<NullWritable, StringArrayWritable, NullWritable, NullWritable> {

[jira] [Commented] (CARBONDATA-306) block size info should be shown in Desc Formatted and executor log

2016-10-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575341#comment-15575341
 ] 

ASF GitHub Bot commented on CARBONDATA-306:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/230#discussion_r83421288
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/mdkeygen/MDKeyGenStep.java ---
@@ -314,7 +314,7 @@ private boolean setStepConfiguration() {
 wrapperColumnSchema = CarbonUtil
 .getColumnSchemaList(carbonTable.getDimensionByTableName(tableName),
 carbonTable.getMeasureByTableName(tableName));
-blocksize = carbonTable.getBlocksize();
+blocksize = carbonTable.getBlocksizeInMB();
--- End diff --

should be `getBlockSizeInMB`


> block size info should be shown in Desc Formatted and executor log
> -
>
> Key: CARBONDATA-306
> URL: https://issues.apache.org/jira/browse/CARBONDATA-306
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jay
>Priority: Minor
>
> When the desc formatted command is run, the table block size should be shown, 
> as well as in the executor log when the load command is run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-288) In HDFS bad record logger is failing in writing the bad records

2016-10-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574856#comment-15574856
 ] 

ASF GitHub Bot commented on CARBONDATA-288:
---

Github user mohammadshahidkhan commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/218#discussion_r83391717
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java ---
@@ -458,9 +462,11 @@ public boolean processRow(StepMetaInterface smi, 
StepDataInterface sdi) throws K
   break;
 case REDIRECT:
   badRecordsLogRedirect = true;
+  badRecordConvertNullDisable= true;
--- End diff --

Fixed


> In HDFS bad record logger is failing in writing the bad records
> 
>
> Key: CARBONDATA-288
> URL: https://issues.apache.org/jira/browse/CARBONDATA-288
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 0.2.0-incubating
>Reporter: Mohammad Shahid Khan
>Assignee: Mohammad Shahid Khan
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> For the HDFS file system:
> CarbonFile logFile = FileFactory.getCarbonFile(filePath, FileType.HDFS);
> If filePath does not exist, then
> calling CarbonFile.getPath() throws a NullPointerException.
> Solution:
> If the file does not exist, it must be created before it is accessed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-288) In HDFS bad record logger is failing in writing the bad records

2016-10-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574854#comment-15574854
 ] 

ASF GitHub Bot commented on CARBONDATA-288:
---

Github user mohammadshahidkhan commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/218#discussion_r83391617
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/BadRecordslogger.java ---
@@ -69,9 +68,13 @@
   private BufferedWriter bufferedCSVWriter;
   private DataOutputStream outCSVStream;
   /**
-   *
+   * bad record log file path
+   */
+  private String logFilePath;
+  /**
+   * csv file path
*/
-  private CarbonFile logFile;
+  private String csvFilePath;
--- End diff --

The log file will contain the bad record rows with the detailed reason for the 
failure; the csv file will have only the bad record rows.


> In HDFS bad record logger is failing in writing the bad records
> 
>
> Key: CARBONDATA-288
> URL: https://issues.apache.org/jira/browse/CARBONDATA-288
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 0.2.0-incubating
>Reporter: Mohammad Shahid Khan
>Assignee: Mohammad Shahid Khan
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> For the HDFS file system:
> CarbonFile logFile = FileFactory.getCarbonFile(filePath, FileType.HDFS);
> If filePath does not exist, then
> calling CarbonFile.getPath() throws a NullPointerException.
> Solution:
> If the file does not exist, it must be created before it is accessed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-288) In HDFS bad record logger is failing in writing the bad records

2016-10-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574850#comment-15574850
 ] 

ASF GitHub Bot commented on CARBONDATA-288:
---

Github user mohammadshahidkhan commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/218#discussion_r83391382
  
--- Diff: integration/spark/src/main/java/org/apache/carbondata/spark/load/CarbonLoadModel.java ---
@@ -117,9 +117,9 @@
   private String badRecordsLoggerEnable;
 
   /**
-   * defines the option to specify the bad record log redirect to raw csv
+   * defines the option to specify the bad record logger action
*/
-  private String badRecordsLoggerRedirect;
+  private String badRecordsLoggerAction;
--- End diff --

yes corrected


> In HDFS bad record logger is failing in writing the bad records
> 
>
> Key: CARBONDATA-288
> URL: https://issues.apache.org/jira/browse/CARBONDATA-288
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 0.2.0-incubating
>Reporter: Mohammad Shahid Khan
>Assignee: Mohammad Shahid Khan
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> For the HDFS file system:
> CarbonFile logFile = FileFactory.getCarbonFile(filePath, FileType.HDFS);
> If filePath does not exist, then
> calling CarbonFile.getPath() throws a NullPointerException.
> Solution:
> If the file does not exist, it must be created before it is accessed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.

2016-10-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574804#comment-15574804
 ] 

ASF GitHub Bot commented on CARBONDATA-297:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/229


> 2. Add interfaces for data loading.
> ---
>
> Key: CARBONDATA-297
> URL: https://issues.apache.org/jira/browse/CARBONDATA-297
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add the major interface classes for data loading so that the following JIRAs 
> can use these interfaces for their implementations.
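
A minimal Java sketch of the step-chaining idea these interfaces enable,
mirroring the getIterator pattern discussed in the review comments; the names
here are illustrative, not the committed API:

    import java.util.Iterator;

    interface DataLoadStep {
      Iterator<Object[]> execute(Iterator<Object[]> childIter);
    }

    // Each step wraps its child's iterator and applies its own processing
    // (parsing, conversion, sorting) row by row.
    class PassThroughStep implements DataLoadStep {
      public Iterator<Object[]> execute(final Iterator<Object[]> childIter) {
        return new Iterator<Object[]>() {
          public boolean hasNext() { return childIter.hasNext(); }
          public Object[] next() {
            Object[] row = childIter.next();
            // step-specific transformation would happen here
            return row;
          }
          public void remove() { throw new UnsupportedOperationException(); }
        };
      }
    }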



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-296) 1.Add CSVInputFormat to read csv files.

2016-10-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574786#comment-15574786
 ] 

ASF GitHub Bot commented on CARBONDATA-296:
---

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/233#discussion_r83387415
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java 
---
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.hadoop.mapreduce;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.Reader;
+
+import org.apache.carbondata.hadoop.io.BoundedInputStream;
+import org.apache.carbondata.hadoop.io.StringArrayWritable;
+import org.apache.carbondata.hadoop.util.CSVInputFormatUtil;
+
+import com.univocity.parsers.csv.CsvParser;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.Seekable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.io.compress.CodecPool;
+import org.apache.hadoop.io.compress.CompressionCodec;
+import org.apache.hadoop.io.compress.CompressionCodecFactory;
+import org.apache.hadoop.io.compress.CompressionInputStream;
+import org.apache.hadoop.io.compress.Decompressor;
+import org.apache.hadoop.io.compress.SplitCompressionInputStream;
+import org.apache.hadoop.io.compress.SplittableCompressionCodec;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+import org.apache.hadoop.mapreduce.lib.input.FileSplit;
+import org.apache.hadoop.util.LineReader;
+
+/**
+ * An {@link org.apache.hadoop.mapreduce.InputFormat} for csv files.  Files are broken into lines.
+ * Values are the lines of csv files.
+ */
+public class CSVInputFormat extends FileInputFormat<NullWritable, StringArrayWritable> {
+
+  @Override
+  public RecordReader<NullWritable, StringArrayWritable> createRecordReader(InputSplit inputSplit,
+  TaskAttemptContext context) throws IOException, InterruptedException {
+return new NewCSVRecordReader();
+  }
+
+  /**
+   * Treats value as line in file. Key is null.
+   */
+  public static class NewCSVRecordReader extends RecordReader<NullWritable, StringArrayWritable> {
+
+private long start;
+private long end;
+private BoundedInputStream boundedInputStream;
+private Reader reader;
+private CsvParser csvParser;
+private StringArrayWritable value;
+private String[] columns;
+private Seekable filePosition;
+private boolean isCompressedInput;
+private Decompressor decompressor;
+
+@Override
+public void initialize(InputSplit inputSplit, TaskAttemptContext context)
+throws IOException, InterruptedException {
+  FileSplit split = (FileSplit) inputSplit;
+  this.start = split.getStart();
+  this.end = this.start + split.getLength();
+  Path file = split.getPath();
+  Configuration job = context.getConfiguration();
+  CompressionCodec codec = (new CompressionCodecFactory(job)).getCodec(file);
+  FileSystem fs = file.getFileSystem(job);
+  FSDataInputStream fileIn = fs.open(file);
+  InputStream inputStream = null;
+  if (codec != null) {
+this.isCompressedInput = true;
+this.decompressor = CodecPool.getDecompressor(codec);
+if (codec instanceof SplittableCompressionCodec) {
+  SplitCompressionInputStream scIn = 

[jira] [Commented] (CARBONDATA-296) 1.Add CSVInputFormat to read csv files.

2016-10-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574785#comment-15574785
 ] 

ASF GitHub Bot commented on CARBONDATA-296:
---

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/233#discussion_r83387366
  
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java ---
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.hadoop.mapreduce;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.Reader;
+
+import org.apache.carbondata.hadoop.io.BoundedInputStream;
+import org.apache.carbondata.hadoop.io.StringArrayWritable;
+import org.apache.carbondata.hadoop.util.CSVInputFormatUtil;
+
+import com.univocity.parsers.csv.CsvParser;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.Seekable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.io.compress.CodecPool;
+import org.apache.hadoop.io.compress.CompressionCodec;
+import org.apache.hadoop.io.compress.CompressionCodecFactory;
+import org.apache.hadoop.io.compress.CompressionInputStream;
+import org.apache.hadoop.io.compress.Decompressor;
+import org.apache.hadoop.io.compress.SplitCompressionInputStream;
+import org.apache.hadoop.io.compress.SplittableCompressionCodec;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+import org.apache.hadoop.mapreduce.lib.input.FileSplit;
+import org.apache.hadoop.util.LineReader;
+
+/**
+ * An {@link org.apache.hadoop.mapreduce.InputFormat} for csv files.  Files are broken into lines.
+ * Values are the lines of csv files.
+ */
+public class CSVInputFormat extends FileInputFormat<NullWritable, StringArrayWritable> {
+
+  @Override
+  public RecordReader<NullWritable, StringArrayWritable> createRecordReader(InputSplit inputSplit,
+  TaskAttemptContext context) throws IOException, InterruptedException {
+return new NewCSVRecordReader();
+  }
+
+  /**
+   * Treats value as line in file. Key is null.
+   */
+  public static class NewCSVRecordReader extends RecordReader<NullWritable, StringArrayWritable> {
+
+private long start;
+private long end;
+private BoundedInputStream boundedInputStream;
+private Reader reader;
+private CsvParser csvParser;
+private StringArrayWritable value;
+private String[] columns;
+private Seekable filePosition;
+private boolean isCompressedInput;
+private Decompressor decompressor;
+
+@Override
+public void initialize(InputSplit inputSplit, TaskAttemptContext context)
+throws IOException, InterruptedException {
+  FileSplit split = (FileSplit) inputSplit;
+  this.start = split.getStart();
--- End diff --

fixed


> 1.Add CSVInputFormat to read csv files.
> ---
>
> Key: CARBONDATA-296
> URL: https://issues.apache.org/jira/browse/CARBONDATA-296
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: QiangCai
> Fix For: 0.2.0-incubating
>
>
> Add CSVInputFormat to read csv files; it should use the Univocity parser to 
> read csv files for optimal performance.
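
For reference, this is how the Univocity parser is typically driven line by
line (real com.univocity.parsers API; the settings shown are library defaults,
not CarbonData's actual configuration):

    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;

    public class UnivocityExample {
      public static void main(String[] args) {
        CsvParserSettings settings = new CsvParserSettings();
        settings.getFormat().setLineSeparator("\n");
        CsvParser parser = new CsvParser(settings);
        // Parse a single csv line into its column values.
        String[] columns = parser.parseLine("1,2016-10-14,china,aaa1");
        for (String column : columns) {
          System.out.println(column);
        }
      }
    }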



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-305) Make switching between kettle flow and new data loading flow configurable

2016-10-14 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li updated CARBONDATA-305:

Assignee: Jacky Li

> Make switching between kettle flow and new data loading flow configurable
> -
>
> Key: CARBONDATA-305
> URL: https://issues.apache.org/jira/browse/CARBONDATA-305
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Jacky Li
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> Make switching between the kettle flow and the new data loading flow 
> configurable. This configuration should switch the flow dynamically while 
> loading the data.
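
A hedged sketch of what the dynamic switch could look like; the property key
"carbon.load.use.kettle" is hypothetical, not the name actually committed for
this issue:

    import org.apache.carbondata.core.util.CarbonProperties;

    public class FlowSwitchExample {
      public static void main(String[] args) {
        // Hypothetical key: flip the loader implementation per load.
        CarbonProperties.getInstance()
            .addProperty("carbon.load.use.kettle", "false");
      }
    }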



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CARBONDATA-316) Change BAD_RECORDS_LOGGER_ACTION to BAD_RECORDS_ACTION

2016-10-14 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li resolved CARBONDATA-316.
-
   Resolution: Fixed
Fix Version/s: 0.2.0-incubating

> Change BAD_RECORDS_LOGGER_ACTION to BAD_RECORDS_ACTION
> --
>
> Key: CARBONDATA-316
> URL: https://issues.apache.org/jira/browse/CARBONDATA-316
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Mohammad Shahid Khan
>Assignee: Mohammad Shahid Khan
>Priority: Trivial
> Fix For: 0.2.0-incubating
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CARBONDATA-310) Compilation failed when using spark 1.6.2

2016-10-14 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li resolved CARBONDATA-310.
-
   Resolution: Fixed
Fix Version/s: 0.2.0-incubating

> Compilation failed when using spark 1.6.2
> -
>
> Key: CARBONDATA-310
> URL: https://issues.apache.org/jira/browse/CARBONDATA-310
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Gin-zhj
>Assignee: Gin-zhj
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> Compilation failed when using Spark 1.6.2,
> caused by a class-not-found error: AggregateExpression



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-306) block size info should be shown in Desc Formatted and executor log

2016-10-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574635#comment-15574635
 ] 

ASF GitHub Bot commented on CARBONDATA-306:
---

Github user Jay357089 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/230#discussion_r83376650
  
--- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -1422,6 +1422,7 @@ private[sql] case class DescribeCommandFormatted(
 results ++= Seq(("Table Name : ", 
relation.tableMeta.carbonTableIdentifier.getTableName, ""))
 results ++= Seq(("CARBON Store Path : ", relation.tableMeta.storePath, 
""))
 val carbonTable = relation.tableMeta.carbonTable
+results ++= Seq(("Table Block Size : ", carbonTable.getBlocksize + " 
MB", ""))
--- End diff --

done. CI passed. 
http://136.243.101.176:8080/job/ApacheCarbonManualPRBuilder/429/


> block size info should be shown in Desc Formatted and executor log
> -
>
> Key: CARBONDATA-306
> URL: https://issues.apache.org/jira/browse/CARBONDATA-306
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jay
>Priority: Minor
>
> When the desc formatted command is run, the table block size should be shown, 
> as well as in the executor log when the load command is run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-297) 2. Add interfaces for data loading.

2016-10-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574598#comment-15574598
 ] 

ASF GitHub Bot commented on CARBONDATA-297:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/229#discussion_r83373827
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/AbstractDataLoadProcessorStep.java ---
@@ -73,15 +72,15 @@ public AbstractDataLoadProcessorStep(CarbonDataLoadConfiguration configuration,
* Create the iterator using child iterator.
*
* @param childIter
-   * @return
+   * @return new iterator with step specific processing.
*/
-  protected Iterator getIterator(final Iterator childIter) {
-return new CarbonIterator() {
+  protected Iterator getIterator(final Iterator childIter) {
--- End diff --

ok. Added


> 2. Add interfaces for data loading.
> ---
>
> Key: CARBONDATA-297
> URL: https://issues.apache.org/jira/browse/CARBONDATA-297
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.2.0-incubating
>
>
> Add the major interface classes for data loading so that the following JIRAs 
> can use these interfaces for their implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)