[GitHub] carbondata issue #2876: [CARBONDATA-3054] Fix Dictionary file cannot be read...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2876
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1173/



---


[GitHub] carbondata issue #2877: [CARBONDATA-3061] Add validation for supported forma...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2877
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1172/



---


[GitHub] carbondata issue #2876: [CARBONDATA-3054] Fix Dictionary file cannot be read...

2018-10-30 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2876
  
LGTM


---


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-10-30 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r229563795
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java
 ---
@@ -305,7 +301,7 @@ public void setBlockDataType(DataType blockDataType) {
   }
 
   @Override public CarbonColumnVector getDictionaryVector() {
-return dictionaryVector;
+return null;
--- End diff --

VectorizedCarbonRecordReader is handled in the same way for 
getCurrentKey(). 



---


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-10-30 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r229563694
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java
 ---
@@ -145,9 +158,33 @@ public CarbonTable 
getOrCreateCarbonTable(Configuration configuration) throws IO
   externalTableSegments.add(seg);
 }
   }
-  // do block filtering and get split
-  List splits =
-  getSplits(job, filter, externalTableSegments, null, 
partitionInfo, null);
+  List splits = new ArrayList<>();
+  if (isSDK) {
+for (CarbonFile carbonFile : 
getAllCarbonDataFiles(carbonTable.getTablePath())) {
--- End diff --

ok


---


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-10-30 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r229563660
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1737,4 +1738,89 @@ public void 
testReadNextRowWithProjectionAndRowUtil() {
 }
   }
 
+  @Test
+  public void testVectorReader() {
+String path = "./testWriteFiles";
+try {
+  FileUtils.deleteDirectory(new File(path));
+
+  Field[] fields = new Field[12];
+  fields[0] = new Field("stringField", DataTypes.STRING);
+  fields[1] = new Field("shortField", DataTypes.SHORT);
+  fields[2] = new Field("intField", DataTypes.INT);
+  fields[3] = new Field("longField", DataTypes.LONG);
+  fields[4] = new Field("doubleField", DataTypes.DOUBLE);
+  fields[5] = new Field("boolField", DataTypes.BOOLEAN);
+  fields[6] = new Field("dateField", DataTypes.DATE);
+  fields[7] = new Field("timeField", DataTypes.TIMESTAMP);
+  fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 
2));
+  fields[9] = new Field("varcharField", DataTypes.VARCHAR);
+  fields[10] = new Field("byteField", DataTypes.BYTE);
+  fields[11] = new Field("floatField", DataTypes.FLOAT);
+  Map map = new HashMap<>();
+  map.put("complex_delimiter_level_1", "#");
+  CarbonWriter writer = CarbonWriter.builder()
+  .outputPath(path)
+  .withLoadOptions(map)
+  .withCsvInput(new Schema(fields))
+  .writtenBy("CarbonReaderTest")
+  .build();
+
+  for (int i = 0; i < 10; i++) {
+String[] row2 = new String[]{
+"robot" + (i % 10),
+String.valueOf(i % 1),
+String.valueOf(i),
+String.valueOf(Long.MAX_VALUE - i),
+String.valueOf((double) i / 2),
+String.valueOf(true),
+"2019-03-02",
+"2019-02-12 03:03:34",
+"12.345",
+"varchar",
+String.valueOf(i),
+"1.23"
+};
+writer.write(row2);
+  }
+  writer.close();
+
+  // Read data
+  CarbonReader reader = CarbonReader
+  .builder(path, "_temp")
+  .withVectorReader(true)
+  .build();
+
+  int i = 0;
+  while (reader.hasNext()) {
+Object[] data = (Object[]) reader.readNextRow();
+
+assert (RowUtil.getString(data, 0).equals("robot" + i));
+assertEquals(RowUtil.getShort(data, 4), i);
+assertEquals(RowUtil.getInt(data, 5), i);
+assert (RowUtil.getLong(data, 6) == Long.MAX_VALUE - i);
+assertEquals(RowUtil.getDouble(data, 7), ((double) i) / 2);
+assert (RowUtil.getByte(data, 8).equals(new Byte("1")));
+assertEquals(RowUtil.getInt(data, 1), 17957);
+assertEquals(RowUtil.getLong(data, 2), 154992081400L);
+assert (RowUtil.getDecimal(data, 9).equals("12.35"));
+assert (RowUtil.getString(data, 3).equals("varchar"));
+assertEquals(RowUtil.getByte(data, 10), new 
Byte(String.valueOf(i)));
+assertEquals(RowUtil.getFloat(data, 11), new Float("1.23"));
+i++;
+  }
+  reader.close();
--- End diff --

done



---


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-10-30 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r229563709
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java ---
@@ -138,6 +138,19 @@ public CarbonInputSplit(String segmentId, Path path, 
long start, long length, St
 version = CarbonProperties.getInstance().getFormatVersion();
   }
 
+  public CarbonInputSplit(String segmentId, Path path, long start, long 
length,
--- End diff --

ok


---


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-10-30 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r229563684
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java
 ---
@@ -145,9 +158,33 @@ public CarbonTable 
getOrCreateCarbonTable(Configuration configuration) throws IO
   externalTableSegments.add(seg);
 }
   }
-  // do block filtering and get split
-  List splits =
-  getSplits(job, filter, externalTableSegments, null, 
partitionInfo, null);
+  List splits = new ArrayList<>();
+  if (isSDK) {
--- End diff --

changed


---


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-10-30 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r229563650
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonVectorizedRecordReader.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.hadoop.util;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants;
+import org.apache.carbondata.core.datastore.block.TableBlockInfo;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.DecimalType;
+import org.apache.carbondata.core.metadata.datatype.StructField;
+import org.apache.carbondata.core.scan.executor.QueryExecutor;
+import org.apache.carbondata.core.scan.executor.QueryExecutorFactory;
+import 
org.apache.carbondata.core.scan.executor.exception.QueryExecutionException;
+import org.apache.carbondata.core.scan.model.ProjectionDimension;
+import org.apache.carbondata.core.scan.model.ProjectionMeasure;
+import org.apache.carbondata.core.scan.model.QueryModel;
+import 
org.apache.carbondata.core.scan.result.iterator.AbstractDetailQueryResultIterator;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnarBatch;
+import 
org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+import org.apache.carbondata.core.util.ByteUtil;
+import org.apache.carbondata.hadoop.AbstractRecordReader;
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.log4j.Logger;
+
+/**
+ * A specialized RecordReader that reads into CarbonColumnarBatches 
directly using the
+ * carbondata column APIs and fills the data directly into columns.
+ */
+public class CarbonVectorizedRecordReader extends 
AbstractRecordReader {
+
+  private static final Logger LOGGER =
+  
LogServiceFactory.getLogService(CarbonVectorizedRecordReader.class.getName());
+
+  private CarbonColumnarBatch carbonColumnarBatch;
+
+  private QueryExecutor queryExecutor;
+
+  private int batchIdx = 0;
+
+  private int numBatched = 0;
+
+  private AbstractDetailQueryResultIterator iterator;
+
+  private QueryModel queryModel;
+
+  public CarbonVectorizedRecordReader(QueryModel queryModel) {
+this.queryModel = queryModel;
+  }
+
+  @Override public void initialize(InputSplit inputSplit, 
TaskAttemptContext taskAttemptContext)
+  throws IOException, InterruptedException {
+List splitList;
+if (inputSplit instanceof CarbonInputSplit) {
+  splitList = new ArrayList<>(1);
+  splitList.add((CarbonInputSplit) inputSplit);
+} else {
+  throw new RuntimeException("unsupported input split type: " + 
inputSplit);
+}
+List tableBlockInfoList = 
CarbonInputSplit.createBlocks(splitList);
+queryModel.setTableBlockInfos(tableBlockInfoList);
+queryModel.setVectorReader(true);
+try {
+  queryExecutor =
+  QueryExecutorFactory.getQueryExecutor(queryModel, 
taskAttemptContext.getConfiguration());
+  iterator = (AbstractDetailQueryResultIterator) 
queryExecutor.execute(queryModel);
+} catch (QueryExecutionException e) {
+  LOGGER.error(e);
+  throw new InterruptedException(e.getMessage());
+} catch (Exception e) {
+  LOGGER.error(e);
+  throw e;
+}
+  }
+
+  @Override public boolean nextKeyValue() throws IOException, 
InterruptedException {
+initBatch();
--- End diff --
 

[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-10-30 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r229563643
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonVectorizedRecordReader.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.hadoop.util;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants;
+import org.apache.carbondata.core.datastore.block.TableBlockInfo;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.DecimalType;
+import org.apache.carbondata.core.metadata.datatype.StructField;
+import org.apache.carbondata.core.scan.executor.QueryExecutor;
+import org.apache.carbondata.core.scan.executor.QueryExecutorFactory;
+import 
org.apache.carbondata.core.scan.executor.exception.QueryExecutionException;
+import org.apache.carbondata.core.scan.model.ProjectionDimension;
+import org.apache.carbondata.core.scan.model.ProjectionMeasure;
+import org.apache.carbondata.core.scan.model.QueryModel;
+import 
org.apache.carbondata.core.scan.result.iterator.AbstractDetailQueryResultIterator;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnarBatch;
+import 
org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+import org.apache.carbondata.core.util.ByteUtil;
+import org.apache.carbondata.hadoop.AbstractRecordReader;
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.log4j.Logger;
+
+/**
+ * A specialized RecordReader that reads into CarbonColumnarBatches 
directly using the
+ * carbondata column APIs and fills the data directly into columns.
+ */
+public class CarbonVectorizedRecordReader extends 
AbstractRecordReader {
+
+  private static final Logger LOGGER =
+  
LogServiceFactory.getLogService(CarbonVectorizedRecordReader.class.getName());
+
+  private CarbonColumnarBatch carbonColumnarBatch;
+
+  private QueryExecutor queryExecutor;
+
+  private int batchIdx = 0;
+
+  private int numBatched = 0;
+
+  private AbstractDetailQueryResultIterator iterator;
+
+  private QueryModel queryModel;
+
+  public CarbonVectorizedRecordReader(QueryModel queryModel) {
+this.queryModel = queryModel;
+  }
+
+  @Override public void initialize(InputSplit inputSplit, 
TaskAttemptContext taskAttemptContext)
+  throws IOException, InterruptedException {
+List splitList;
+if (inputSplit instanceof CarbonInputSplit) {
+  splitList = new ArrayList<>(1);
+  splitList.add((CarbonInputSplit) inputSplit);
+} else {
+  throw new RuntimeException("unsupported input split type: " + 
inputSplit);
+}
+List tableBlockInfoList = 
CarbonInputSplit.createBlocks(splitList);
+queryModel.setTableBlockInfos(tableBlockInfoList);
+queryModel.setVectorReader(true);
+try {
+  queryExecutor =
+  QueryExecutorFactory.getQueryExecutor(queryModel, 
taskAttemptContext.getConfiguration());
+  iterator = (AbstractDetailQueryResultIterator) 
queryExecutor.execute(queryModel);
+} catch (QueryExecutionException e) {
+  LOGGER.error(e);
+  throw new InterruptedException(e.getMessage());
+} catch (Exception e) {
+  LOGGER.error(e);
+  throw e;
+}
+  }
+
+  @Override public boolean nextKeyValue() throws IOException, 
InterruptedException {
+initBatch();
+if 

[GitHub] carbondata issue #2807: [CARBONDATA-2997] Support read schema from index fil...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2807
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1384/



---


[GitHub] carbondata issue #2807: [CARBONDATA-2997] Support read schema from index fil...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2807
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9435/



---


[jira] [Created] (CARBONDATA-3063) Support set carbon property in CSDK

2018-10-30 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3063:
---

 Summary: Support set carbon property in CSDK
 Key: CARBONDATA-3063
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3063
 Project: CarbonData
  Issue Type: Sub-task
Affects Versions: 1.5.1
Reporter: xubo245
Assignee: xubo245
 Fix For: 1.5.1


When a user writes or reads CarbonData through the CSDK, the user may need to 
change or add a carbon property to avoid problems such as OOM.
So we should support setting carbon properties in the CSDK.
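
The request can be pictured with a minimal sketch of a property holder that 
lets a caller add or override a setting before reading or writing. The class 
`CsdkPropertySketch`, its methods, and the key `example.working.memory.mb` are 
hypothetical names used only for illustration; they are not the CSDK API.

```java
import java.util.Properties;

class CsdkPropertySketch {

  private final Properties properties = new Properties();

  /** Adds or overrides a property and returns this object so calls can be chained. */
  CsdkPropertySketch addProperty(String key, String value) {
    properties.setProperty(key, value);
    return this;
  }

  /** Returns the stored value, or defaultValue when the key was never set. */
  String getProperty(String key, String defaultValue) {
    return properties.getProperty(key, defaultValue);
  }

  public static void main(String[] args) {
    // Hypothetical key; a real deployment would use a documented carbon property name.
    CsdkPropertySketch props = new CsdkPropertySketch()
        .addProperty("example.working.memory.mb", "1024");
    System.out.println(props.getProperty("example.working.memory.mb", "512"));
    System.out.println(props.getProperty("example.unset.key", "default"));
  }
}
```

Returning a default for unset keys keeps existing behavior unchanged for 
callers that never set anything.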




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #2816: [CARBONDATA-3003] Support read batch row in CSDK

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1383/



---


[GitHub] carbondata issue #2816: [CARBONDATA-3003] Support read batch row in CSDK

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9434/



---


[GitHub] carbondata issue #2807: [CARBONDATA-2997] Support read schema from index fil...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2807
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1171/



---


[GitHub] carbondata issue #2816: [CARBONDATA-3003] Support read batch row in CSDK

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1170/



---


[GitHub] carbondata issue #2869: [CARBONDATA-3057] Changes for improving carbon reade...

2018-10-30 Thread xubo245
Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/2869
  
please fix CI error @kunal642 


---


[GitHub] carbondata issue #2807: [CARBONDATA-2997] Support read schema from index fil...

2018-10-30 Thread xubo245
Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/2807
  
rebase


---


[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...

2018-10-30 Thread xubo245
Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/2804
  
CI passed, please check again.


---


[GitHub] carbondata pull request #2871: [CARBONDATA-3051] Fix bugs in unclosed stream...

2018-10-30 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2871#discussion_r229535208
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java 
---
@@ -169,7 +170,7 @@ public CarbonReaderBuilder withHadoopConf(String key, 
String value) {
   reader.initialize(split, attempt);
   readers.add(reader);
 } catch (Exception e) {
-  reader.close();
+  CarbonUtil.closeStreams(readers.toArray(new RecordReader[0]));
--- End diff --

`CarbonUtil.closeStreams` loops over the readers and closes each one, so 
calling it here saves lines of code.
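
The idea in this comment can be sketched as a single helper that loops over a 
collection of `Closeable` resources and closes each one, so a failure during 
initialization releases every reader opened so far instead of only the current 
one. `ReaderCleanup`, `closeAll`, and `TrackingReader` are hypothetical names 
for illustration; this is not the implementation of `CarbonUtil.closeStreams`.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ReaderCleanup {

  /** Hypothetical stand-in for a record reader; it only remembers whether close() ran. */
  static class TrackingReader implements Closeable {
    boolean closed = false;

    @Override
    public void close() throws IOException {
      closed = true;
    }
  }

  /**
   * Closes every non-null resource and keeps going if one close() fails.
   * Returns the number of resources whose close() threw an IOException.
   */
  static int closeAll(List<? extends Closeable> resources) {
    int failures = 0;
    for (Closeable resource : resources) {
      if (resource == null) {
        continue;
      }
      try {
        resource.close();
      } catch (IOException e) {
        failures++;
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    List<TrackingReader> readers = new ArrayList<>();
    readers.add(new TrackingReader());
    readers.add(new TrackingReader());
    System.out.println("close failures: " + closeAll(readers));
  }
}
```

Counting failures instead of rethrowing the first one is a design choice in 
this sketch: it makes sure one bad reader does not leave the others open.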


---


[GitHub] carbondata pull request #2874: [CARBONDATA-3053][Cli] Fix bugs for carbon-cl...

2018-10-30 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2874#discussion_r229535025
  
--- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/DataFile.java 
---
@@ -500,4 +503,7 @@ private double computePercentage(byte[] data, byte[] 
min, byte[] max, ColumnSche
 }
   }
 
+  public void close() throws IOException {
+this.fileReader.finish();
--- End diff --

this is the `close` method for `fileReader`


---


[GitHub] carbondata issue #2883: [CARBONDATA-3062] Fix Compatibility issue with cache...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2883
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9433/



---


[GitHub] carbondata issue #2883: [CARBONDATA-3062] Fix Compatibility issue with cache...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2883
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1382/



---


[GitHub] carbondata issue #2883: [CARBONDATA-3062] Fix Compatibility issue with cache...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2883
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1169/



---


[GitHub] carbondata issue #2883: [CARBONDATA-3062] Fix Compatibility issue with cache...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2883
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1381/



---


[GitHub] carbondata issue #2883: [CARBONDATA-3062] Fix Compatibility issue with cache...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2883
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9432/



---


[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2868
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1380/



---


[GitHub] carbondata issue #2883: [CARBONDATA-3062] Fix Compatibility issue with cache...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2883
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1168/



---


[GitHub] carbondata issue #2876: [CARBONDATA-3054] Fix Dictionary file cannot be read...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2876
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1378/



---


[GitHub] carbondata issue #2876: [CARBONDATA-3054] Fix Dictionary file cannot be read...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2876
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9429/



---


[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2868
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1167/



---


[GitHub] carbondata issue #2877: [CARBONDATA-3061] Add validation for supported forma...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2877
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9430/



---


[GitHub] carbondata issue #2877: [CARBONDATA-3061] Add validation for supported forma...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2877
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1379/



---


[jira] [Resolved] (CARBONDATA-3000) Provide C++ interface for writing carbon data

2018-10-30 Thread Kunal Kapoor (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-3000.
--
   Resolution: Fixed
Fix Version/s: 1.5.1

> Provide C++ interface for writing carbon data
> -
>
> Key: CARBONDATA-3000
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3000
> Project: CarbonData
>  Issue Type: Sub-task
>Affects Versions: 1.5.0
>Reporter: xubo245
>Assignee: xubo245
>Priority: Major
> Fix For: 1.5.1
>
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Provide C++ interface for writing carbon data



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata pull request #2837: [CARBONDATA-3000] Provide C++ interface for w...

2018-10-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2837


---


[GitHub] carbondata pull request #2883: [CARBONDATA-3062] Fix Compatibility issue wit...

2018-10-30 Thread Indhumathi27
GitHub user Indhumathi27 opened a pull request:

https://github.com/apache/carbondata/pull/2883

[CARBONDATA-3062] Fix Compatibility issue with cache_level as blocklet


**Why is this PR needed?**
A hybrid store can contain both block and blocklet schemas.
Scenario: 
In a hybrid store, some loads come from a legacy store that does not contain 
the blocklet information, so they default to cache_level BLOCK, while other 
loads come from the latest store, contain the BLOCKLET information, and have 
cache_level BLOCKLET. For this type of scenario we need separate task and 
footer schemas. For all loads, with or without blocklet info, maintaining the 
2 variables adds no extra cost.
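
A rough sketch of the scenario, under stated assumptions: each load is mapped 
to a cache level depending on whether it carries blocklet information, and the 
matching schema is then selected from two schemas kept side by side. The class 
`HybridStoreSchemas`, its methods, and the column names are hypothetical; the 
actual task and footer schemas of the PR are not shown in this thread.

```java
import java.util.Arrays;
import java.util.List;

class HybridStoreSchemas {

  enum CacheLevel { BLOCK, BLOCKLET }

  // Hypothetical column lists, for illustration only.
  static final List<String> BLOCK_SCHEMA = Arrays.asList("min", "max", "file_path");
  static final List<String> BLOCKLET_SCHEMA =
      Arrays.asList("min", "max", "file_path", "blocklet_id");

  /** Loads without blocklet information (legacy store) fall back to BLOCK. */
  static CacheLevel cacheLevelFor(boolean hasBlockletInfo) {
    return hasBlockletInfo ? CacheLevel.BLOCKLET : CacheLevel.BLOCK;
  }

  /** Picks the schema that matches the cache level of a single load. */
  static List<String> schemaFor(CacheLevel level) {
    return level == CacheLevel.BLOCKLET ? BLOCKLET_SCHEMA : BLOCK_SCHEMA;
  }

  public static void main(String[] args) {
    boolean[] loadsHaveBlockletInfo = {false, true};
    for (boolean hasInfo : loadsHaveBlockletInfo) {
      CacheLevel level = cacheLevelFor(hasInfo);
      System.out.println(level + " -> " + schemaFor(level));
    }
  }
}
```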

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Indhumathi27/carbondata column_comp

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2883.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2883






---


[jira] [Created] (CARBONDATA-3062) Fix Compatibility issue with cache_level as blocklet

2018-10-30 Thread Indhumathi Muthumurugesh (JIRA)
Indhumathi Muthumurugesh created CARBONDATA-3062:


 Summary: Fix Compatibility issue with cache_level as blocklet
 Key: CARBONDATA-3062
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3062
 Project: CarbonData
  Issue Type: Improvement
Reporter: Indhumathi Muthumurugesh
Assignee: Indhumathi Muthumurugesh






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #2877: [CARBONDATA-3061] Add validation for supported forma...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2877
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1166/



---


[GitHub] carbondata pull request #2868: [CARBONDATA-3052] Improve drop table performa...

2018-10-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2868#discussion_r229363551
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/filesystem/LocalCarbonFile.java
 ---
@@ -141,7 +141,12 @@ public boolean renameTo(String changetoName) {
   }
 
   public boolean delete() {
-return file.delete();
+try {
+  return deleteFile(file.getAbsolutePath(), 
FileFactory.getFileType(file.getAbsolutePath()));
+} catch (IOException e) {
+  LOGGER.error("Exception occurred:" + e.getMessage());
--- End diff --

ok


---


[GitHub] carbondata pull request #2868: [CARBONDATA-3052] Improve drop table performa...

2018-10-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2868#discussion_r229363490
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/filesystem/CarbonFile.java
 ---
@@ -62,6 +62,11 @@
 
   boolean renameForce(String changetoName);
 
+  /**
+   * This method will delete the files recursively from file system
+   *
+   * @return
--- End diff --

ok


---


[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...

2018-10-30 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2868
  
> If the table is on S3, will it behave correctly since it does not have 
"folder" concept?

I have not changed any existing behavior, so it should work fine
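
For context, recursive deletion on a local file system can be sketched as 
below. This only illustrates the general technique with `java.nio.file`; the 
class `RecursiveDelete` and its method are hypothetical and are not the code 
from this PR, and the sketch says nothing about how an object store such as S3 
would be handled.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class RecursiveDelete {

  /** Deletes root and everything below it; returns false if root did not exist. */
  static boolean deleteRecursively(Path root) throws IOException {
    if (!Files.exists(root)) {
      return false;
    }
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(root)) {
      // Reverse order puts children before their parent directories.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path path : paths) {
      Files.delete(path);
    }
    return true;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("drop-table-demo");
    Files.createFile(Files.createDirectory(dir.resolve("segment_0")).resolve("part-0.data"));
    System.out.println("deleted: " + deleteRecursively(dir));
  }
}
```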


---


[jira] [Created] (CARBONDATA-3061) Add validation for supported format version and Encoding type to throw proper exception to the user while reading a file

2018-10-30 Thread Manish Gupta (JIRA)
Manish Gupta created CARBONDATA-3061:


 Summary: Add validation for supported format version and Encoding 
type to throw proper exception to the user while reading a file
 Key: CARBONDATA-3061
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3061
 Project: CarbonData
  Issue Type: Improvement
Reporter: Manish Gupta
Assignee: Manish Gupta


This JIRA is raised to handle forward compatibility. With this PR, if a data 
file is read using a lower version (>=1.5.1), a proper exception will be 
thrown when the columnar format version or any encoding type is not supported 
for reading in that version.
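
The kind of check described here can be sketched as follows: before reading, 
compare the file's format version and encodings against what the reader 
supports, and fail early with a descriptive message. `FormatValidator`, the 
version limit, and the encoding names are assumptions made for illustration 
and do not reflect the actual CarbonData classes or constants.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FormatValidator {

  // Assumed values for illustration only; a real reader would derive these from its release.
  static final int MAX_SUPPORTED_VERSION = 3;
  static final Set<String> SUPPORTED_ENCODINGS =
      new HashSet<>(Arrays.asList("DICTIONARY", "RLE", "DELTA"));

  /** Throws a descriptive exception if the file asks for something this reader lacks. */
  static void validate(int fileVersion, List<String> fileEncodings) {
    if (fileVersion > MAX_SUPPORTED_VERSION) {
      throw new UnsupportedOperationException(
          "Unsupported format version " + fileVersion
              + "; this reader supports versions up to " + MAX_SUPPORTED_VERSION);
    }
    for (String encoding : fileEncodings) {
      if (!SUPPORTED_ENCODINGS.contains(encoding)) {
        throw new UnsupportedOperationException("Unsupported encoding type: " + encoding);
      }
    }
  }

  public static void main(String[] args) {
    validate(3, Arrays.asList("DICTIONARY", "RLE"));
    try {
      validate(4, Arrays.asList("DICTIONARY"));
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Failing before any data is decoded gives the user a clear message instead of 
an obscure error from deep inside the read path.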



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #2876: [CARBONDATA-3054] Fix Dictionary file cannot be read...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2876
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1165/



---


[GitHub] carbondata issue #2861: [CARBONDATA-3025]handle passing spark appname for pa...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2861
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9428/



---


[GitHub] carbondata issue #2869: [CARBONDATA-3057] Changes for improving carbon reade...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2869
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9427/



---


[GitHub] carbondata issue #2861: [CARBONDATA-3025]handle passing spark appname for pa...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2861
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1164/



---


[GitHub] carbondata issue #2869: [CARBONDATA-3057] Changes for improving carbon reade...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2869
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1163/



---


[GitHub] carbondata issue #2882: [CARBONDATA-3060]Improve the command for cli and fix...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2882
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1162/



---


[GitHub] carbondata issue #2882: [CARBONDATA-3060]Improve the command for cli and fix...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2882
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1377/



---


[GitHub] carbondata issue #2861: [CARBONDATA-3025]handle passing spark appname for pa...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2861
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1376/



---


[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2804
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9425/



---


[GitHub] carbondata issue #2869: [CARBONDATA-3057] Changes for improving carbon reade...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2869
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1375/



---


[GitHub] carbondata issue #2873: [WIP] Fix partition load issue when custom location ...

2018-10-30 Thread kumarvishal09
Github user kumarvishal09 commented on the issue:

https://github.com/apache/carbondata/pull/2873
  
LGTM


---


[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2804
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1161/



---


[GitHub] carbondata issue #2837: [CARBONDATA-3000] Provide C++ interface for writing ...

2018-10-30 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2837
  
LGTM


---


[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2804
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1374/



---


[GitHub] carbondata issue #2870: [HOTFIX-compatibility] Handle Lazy loading with inve...

2018-10-30 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2870
  
LGTM


---


[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1160/



---


[GitHub] carbondata issue #2867: [HOTFIX] Fixed data loading failure with safe column...

2018-10-30 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2867
  
LGTM


---


[GitHub] carbondata issue #2873: [WIP] Fix partition load issue when custom location ...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2873
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9423/



---


[GitHub] carbondata issue #2879: [CARBONDATA-3058] Fix some exception coding in data ...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2879
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1159/



---


[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9424/



---


[GitHub] carbondata issue #2882: [CARBONDATA-3060]Improve the command for cli and fix...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2882
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9421/



---


[GitHub] carbondata issue #2873: [WIP] Fix partition load issue when custom location ...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2873
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1372/



---


[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1373/



---


[GitHub] carbondata issue #2879: [CARBONDATA-3058] Fix some exception coding in data ...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2879
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9422/



---


[GitHub] carbondata issue #2879: [CARBONDATA-3058] Fix some exception coding in data ...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2879
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1371/



---


[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

2018-10-30 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2882#discussion_r229300837
  
--- Diff: 
tools/cli/src/main/java/org/apache/carbondata/tool/DataSummary.java ---
@@ -314,23 +312,26 @@ private void printColumnStats(String columnName) 
throws IOException, MemoryExcep
   minPercent = String.format("%.1f", 
blocklet.getColumnChunk().getMinPercentage() * 100);
   maxPercent = String.format("%.1f", 
blocklet.getColumnChunk().getMaxPercentage() * 100);
   DataFile.ColumnChunk columnChunk = blocklet.columnChunk;
-  if (columnChunk.column.isDimensionColumn() && DataTypeUtil
+  if (columnChunk.column.hasEncoding(Encoding.DICTIONARY) || 
blocklet
+  .getColumnChunk().column.getColumnName().contains(".val") || 
blocklet
--- End diff --

This will be for a no-dictionary complex column. For complex columns, min and 
max can be shown as NA; that will be OK, right?


---


[GitHub] carbondata pull request #2850: [CARBONDATA-3056] Added concurrent reading th...

2018-10-30 Thread NamanRastogi
Github user NamanRastogi commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2850#discussion_r229299196
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -114,6 +117,43 @@ public static CarbonReaderBuilder builder(String 
tablePath) {
 return builder(tablePath, tableName);
   }
 
+  /**
+   * Return a new list of {@link CarbonReader} objects
+   *
--- End diff --

Done!


---


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Changes for improving carbo...

2018-10-30 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r229298501
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1737,4 +1738,89 @@ public void 
testReadNextRowWithProjectionAndRowUtil() {
 }
   }
 
+  @Test
+  public void testVectorReader() {
+String path = "./testWriteFiles";
+try {
+  FileUtils.deleteDirectory(new File(path));
+
+  Field[] fields = new Field[12];
+  fields[0] = new Field("stringField", DataTypes.STRING);
+  fields[1] = new Field("shortField", DataTypes.SHORT);
+  fields[2] = new Field("intField", DataTypes.INT);
+  fields[3] = new Field("longField", DataTypes.LONG);
+  fields[4] = new Field("doubleField", DataTypes.DOUBLE);
+  fields[5] = new Field("boolField", DataTypes.BOOLEAN);
+  fields[6] = new Field("dateField", DataTypes.DATE);
+  fields[7] = new Field("timeField", DataTypes.TIMESTAMP);
+  fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 
2));
+  fields[9] = new Field("varcharField", DataTypes.VARCHAR);
+  fields[10] = new Field("byteField", DataTypes.BYTE);
+  fields[11] = new Field("floatField", DataTypes.FLOAT);
+  Map<String, String> map = new HashMap<>();
+  map.put("complex_delimiter_level_1", "#");
+  CarbonWriter writer = CarbonWriter.builder()
+  .outputPath(path)
+  .withLoadOptions(map)
+  .withCsvInput(new Schema(fields))
+  .writtenBy("CarbonReaderTest")
+  .build();
+
+  for (int i = 0; i < 10; i++) {
+String[] row2 = new String[]{
+"robot" + (i % 10),
+String.valueOf(i % 1),
+String.valueOf(i),
+String.valueOf(Long.MAX_VALUE - i),
+String.valueOf((double) i / 2),
+String.valueOf(true),
+"2019-03-02",
+"2019-02-12 03:03:34",
+"12.345",
+"varchar",
+String.valueOf(i),
+"1.23"
+};
+writer.write(row2);
+  }
+  writer.close();
+
+  // Read data
+  CarbonReader reader = CarbonReader
+  .builder(path, "_temp")
+  .withVectorReader(true)
+  .build();
+
+  int i = 0;
+  while (reader.hasNext()) {
+Object[] data = (Object[]) reader.readNextRow();
+
+assert (RowUtil.getString(data, 0).equals("robot" + i));
+assertEquals(RowUtil.getShort(data, 4), i);
+assertEquals(RowUtil.getInt(data, 5), i);
+assert (RowUtil.getLong(data, 6) == Long.MAX_VALUE - i);
+assertEquals(RowUtil.getDouble(data, 7), ((double) i) / 2);
+assert (RowUtil.getByte(data, 8).equals(new Byte("1")));
+assertEquals(RowUtil.getInt(data, 1), 17957);
+assertEquals(RowUtil.getLong(data, 2), 154992081400L);
+assert (RowUtil.getDecimal(data, 9).equals("12.35"));
+assert (RowUtil.getString(data, 3).equals("varchar"));
+assertEquals(RowUtil.getByte(data, 10), new 
Byte(String.valueOf(i)));
+assertEquals(RowUtil.getFloat(data, 11), new Float("1.23"));
+i++;
+  }
+  reader.close();
--- End diff --

Add validation for total number of rows read.  
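
A minimal sketch of such a validation, assuming the rows can be exposed as a plain `Iterator`; the class `RowCountCheck` and its methods are hypothetical helpers, not part of the SDK. In the quoted test, 10 rows are written, so the equivalent check there would be that `i` equals 10 after the read loop.

```java
import java.util.Iterator;

final class RowCountCheck {
  // Drains the iterator and returns how many rows it produced.
  static int countRows(Iterator<Object[]> rows) {
    int count = 0;
    while (rows.hasNext()) {
      rows.next();
      count++;
    }
    return count;
  }

  // Throws if the number of rows read differs from the number written.
  static void assertRowCount(Iterator<Object[]> rows, int expected) {
    int actual = countRows(rows);
    if (actual != expected) {
      throw new AssertionError("expected " + expected + " rows but read " + actual);
    }
  }
}
```

Without a check like this, a reader that returns zero rows would let the test pass, because none of the per-row assertions would ever run.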


---


[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

2018-10-30 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2882#discussion_r229297954
  
--- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/DataFile.java 
---
@@ -443,7 +444,8 @@ void computePercentage(byte[] shardMin, byte[] 
shardMax) {
  * @return result
  */
 private double computePercentage(byte[] data, byte[] min, byte[] max, 
ColumnSchema column) {
-  if (column.getDataType() == DataTypes.STRING) {
+  if (column.getDataType() == DataTypes.STRING || column.getDataType() 
== DataTypes.BOOLEAN
+  || column.hasEncoding(Encoding.DICTIONARY)) {
--- End diff --

Yes, but min and max will be surrogate keys, right? Showing min and max as 
dictionary values is not useful, right?


---


[GitHub] carbondata pull request #2861: [CARBONDATA-3025]handle passing spark appname...

2018-10-30 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2861#discussion_r229294323
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala
 ---
@@ -175,6 +172,11 @@ with Serializable {
   dataSchema: StructType,
   context: TaskAttemptContext): OutputWriter = {
 val model = 
CarbonTableOutputFormat.getLoadModel(context.getConfiguration)
+val appName = 
context.getConfiguration.get(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME)
+if (null != appName) {
--- End diff --

Actually, the appName will always be set, since Spark always sets it. This 
check was added for one of the test cases; I will remove it.


---


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Changes for improving carbo...

2018-10-30 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r229288967
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/SafeVariableLengthDimensionDataChunkStore.java
 ---
@@ -169,7 +169,7 @@ public void fillRow(int rowId, CarbonColumnVector 
vector, int vectorRow) {
 length)) {
   vector.putNull(vectorRow);
--- End diff --

added check


---


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Changes for improving carbo...

2018-10-30 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r229289006
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/filesystem/LocalCarbonFile.java
 ---
@@ -178,6 +178,25 @@ public boolean delete() {
 return carbonFiles;
   }
 
+  @Override public List listFiles(Boolean recursive, 
CarbonFileFilter fileFilter)
--- End diff --

fixed


---


[GitHub] carbondata pull request #2879: [CARBONDATA-3058] Fix some exception coding i...

2018-10-30 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2879#discussion_r229287888
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/steps/CarbonRowDataWriterProcessorStepImpl.java
 ---
@@ -308,7 +312,7 @@ private void processBatch(CarbonRowBatch batch, 
CarbonFactHandler dataHandler, i
   }
   writeCounter[iteratorIndex] += batch.getSize();
 } catch (Exception e) {
-  throw new CarbonDataLoadingException("unable to generate the mdkey", 
e);
+  throw new CarbonDataLoadingException(e);
--- End diff --

KeyGenException extends Exception, so it needs to be wrapped in 
CarbonDataLoadingException (a RuntimeException) and rethrown. 
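
A minimal sketch of the wrapping being described. `SketchKeyGenException` and `SketchLoadingException` are hypothetical stand-ins for the checked and unchecked exception types named above, so the example compiles on its own.

```java
// Hypothetical stand-in for a checked exception such as KeyGenException.
final class SketchKeyGenException extends Exception {
  SketchKeyGenException(String message) {
    super(message);
  }
}

// Hypothetical stand-in for an unchecked wrapper such as CarbonDataLoadingException.
final class SketchLoadingException extends RuntimeException {
  SketchLoadingException(Throwable cause) {
    super(cause);
  }
}

final class WrapCheckedException {
  // Simulates a step that can fail with a checked exception.
  static void generateKey(boolean fail) throws SketchKeyGenException {
    if (fail) {
      throw new SketchKeyGenException("unable to generate the mdkey");
    }
  }

  // Declares no checked exceptions, so the checked failure has to be wrapped.
  static void processBatch(boolean fail) {
    try {
      generateKey(fail);
    } catch (SketchKeyGenException e) {
      // Passing e as the cause keeps the original stack trace and message reachable.
      throw new SketchLoadingException(e);
    }
  }
}
```

Removing the try-catch would only compile if the enclosing method declared the checked exception itself.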


---


[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2804
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9420/



---


[GitHub] carbondata issue #2869: [CARBONDATA-3057] Changes for improving carbon reade...

2018-10-30 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2869
  
Can you change the PR title to be more specific?


---


[GitHub] carbondata pull request #2861: [CARBONDATA-3025]handle passing spark appname...

2018-10-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2861#discussion_r229285778
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala
 ---
@@ -175,6 +172,11 @@ with Serializable {
   dataSchema: StructType,
   context: TaskAttemptContext): OutputWriter = {
 val model = 
CarbonTableOutputFormat.getLoadModel(context.getConfiguration)
+val appName = 
context.getConfiguration.get(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME)
+if (null != appName) {
--- End diff --

If there is no appName, I think we should construct one; the appName should 
always be written into the file.
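
A minimal sketch of that fallback, under the assumption that the caller can supply some default name. The class `AppNameFallback` and the method `resolveAppName` are hypothetical and not part of the PR.

```java
final class AppNameFallback {
  // Returns the configured name, or the supplied fallback when none was set.
  static String resolveAppName(String configured, String fallback) {
    if (configured == null || configured.trim().isEmpty()) {
      return fallback;
    }
    return configured;
  }
}
```

With a helper like this, the writer always has a non-empty value to record, so the null check in the diff would no longer skip the assignment.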


---


[GitHub] carbondata pull request #2861: [CARBONDATA-3025]handle passing spark appname...

2018-10-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2861#discussion_r229285529
  
--- Diff: 
integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
 ---
@@ -121,15 +121,13 @@ class SparkCarbonFileFormat extends FileFormat
   dataSchema: StructType): OutputWriterFactory = {
 
 val conf = job.getConfiguration
-
+conf
+  .set(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME,
--- End diff --

move to previous line


---


[GitHub] carbondata pull request #2861: [CARBONDATA-3025]handle passing spark appname...

2018-10-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2861#discussion_r229285477
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonRDD.scala
 ---
@@ -37,6 +37,11 @@ abstract class CarbonRDD[T: ClassTag](
 @transient private val ss: SparkSession,
 @transient private var deps: Seq[Dependency[_]]) extends 
RDD[T](ss.sparkContext, deps) {
 
+  @transient val sparkAppName: String = ss.sparkContext.appName
+  CarbonProperties.getInstance()
+.addProperty(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME,
+  sparkAppName)
--- End diff --

move to previous line


---


[GitHub] carbondata issue #2861: [CARBONDATA-3025]handle passing spark appname for pa...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2861
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1367/



---


[GitHub] carbondata pull request #2879: [CARBONDATA-3058] Fix some exception coding i...

2018-10-30 Thread kevinjmh
Github user kevinjmh commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2879#discussion_r229285339
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/steps/CarbonRowDataWriterProcessorStepImpl.java
 ---
@@ -212,7 +212,11 @@ private void finish(CarbonFactHandler dataHandler, int 
iteratorIndex) {
 try {
   processingComplete(dataHandler);
 } catch (CarbonDataLoadingException e) {
-  exception = new CarbonDataWriterException(e.getMessage(), e);
+  // only assign when exception is null
+  // else it will erase original root cause
+  if (null == exception) {
--- End diff --

Not for the statistics. It is better to read the whole method. It has two 
stages: finishing the handler and closing the handler. The exception could be 
assigned in either stage.
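
A minimal sketch of the two-stage pattern, assuming both stages can be modelled as `Runnable`s. The class `FirstFailureWins` is hypothetical, and attaching the later failure with `addSuppressed` is an addition for illustration that the PR itself does not make.

```java
final class FirstFailureWins {
  // Runs both stages; the failure from the earlier stage is the one reported.
  static void finishAndClose(Runnable finish, Runnable close) {
    RuntimeException exception = null;
    try {
      finish.run();
    } catch (RuntimeException e) {
      exception = e;
    }
    try {
      close.run(); // still attempted, so resources are released
    } catch (RuntimeException e) {
      if (exception == null) {
        exception = e; // only assign when nothing failed earlier
      } else {
        exception.addSuppressed(e); // keep the later failure for diagnostics
      }
    }
    if (exception != null) {
      throw exception;
    }
  }
}
```

If the second stage overwrote the variable unconditionally, the caller would see only the close failure and the original root cause would be lost.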


---


[GitHub] carbondata pull request #2842: [CARBONDATA-3032] Remove carbon.blocklet.size...

2018-10-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2842#discussion_r229285157
  
--- Diff: docs/sdk-guide.md ---
@@ -24,7 +24,8 @@ CarbonData provides SDK to facilitate
 
 # SDK Writer
 
-In the carbon jars package, there exist a 
carbondata-store-sdk-x.x.x-SNAPSHOT.jar, including SDK writer and reader.
+In the carbon jars package, there exist a 
carbondata-store-sdk-x.x.x-SNAPSHOT.jar, including SDK writer and reader. 
+If you want to use SDK, it needs other carbon jar or you can use 
carbondata-sdk.jar.
--- End diff --

`it needs other carbon jar`
This sentence is not very clear


---


[GitHub] carbondata pull request #2836: [CARBONDATA-3027] Increase unsafe working mem...

2018-10-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2836#discussion_r229284826
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -1234,7 +1234,7 @@
 
   @CarbonProperty
   public static final String UNSAFE_WORKING_MEMORY_IN_MB = 
"carbon.unsafe.working.memory.in.mb";
-  public static final String UNSAFE_WORKING_MEMORY_IN_MB_DEFAULT = "512";
+  public static final String UNSAFE_WORKING_MEMORY_IN_MB_DEFAULT = "1024";
--- End diff --

You can change the configuration in your application; there is no need to 
change the default value of this parameter.
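
A minimal sketch of overriding the value on the application side while the default stays at 512. It uses `java.util.Properties` as a stand-in so that it runs on its own, and the class `WorkingMemoryConfig` is hypothetical. In CarbonData itself, another diff in this thread sets a property through `CarbonProperties.getInstance().addProperty(...)`.

```java
import java.util.Properties;

final class WorkingMemoryConfig {
  static final String KEY = "carbon.unsafe.working.memory.in.mb";

  // Library-side defaults stay untouched at 512 MB.
  static Properties defaults() {
    Properties defaults = new Properties();
    defaults.setProperty(KEY, "512");
    return defaults;
  }

  // Application-side settings layered on top of the defaults.
  static Properties applicationSettings(String workingMemoryMb) {
    Properties settings = new Properties(defaults());
    if (workingMemoryMb != null) {
      settings.setProperty(KEY, workingMemoryMb);
    }
    return settings;
  }
}
```

Applications that need more working memory opt in explicitly, and everyone else keeps the existing behaviour.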


---


[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2804
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1158/



---


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Changes for improving carbo...

2018-10-30 Thread ajantha-bhat
Github user ajantha-bhat commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r229284011
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/util/CarbonVectorizedRecordReader.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.hadoop.util;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants;
+import org.apache.carbondata.core.datastore.block.TableBlockInfo;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.metadata.datatype.DecimalType;
+import org.apache.carbondata.core.metadata.datatype.StructField;
+import org.apache.carbondata.core.scan.executor.QueryExecutor;
+import org.apache.carbondata.core.scan.executor.QueryExecutorFactory;
+import 
org.apache.carbondata.core.scan.executor.exception.QueryExecutionException;
+import org.apache.carbondata.core.scan.model.ProjectionDimension;
+import org.apache.carbondata.core.scan.model.ProjectionMeasure;
+import org.apache.carbondata.core.scan.model.QueryModel;
+import 
org.apache.carbondata.core.scan.result.iterator.AbstractDetailQueryResultIterator;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnarBatch;
+import 
org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
+import org.apache.carbondata.core.util.ByteUtil;
+import org.apache.carbondata.hadoop.AbstractRecordReader;
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.log4j.Logger;
+
+/**
+ * A specialized RecordReader that reads into CarbonColumnarBatches 
directly using the
+ * carbondata column APIs and fills the data directly into columns.
+ */
+public class CarbonVectorizedRecordReader extends 
AbstractRecordReader {
+
+  private static final Logger LOGGER =
+  
LogServiceFactory.getLogService(CarbonVectorizedRecordReader.class.getName());
+
+  private CarbonColumnarBatch carbonColumnarBatch;
+
+  private QueryExecutor queryExecutor;
+
+  private int batchIdx = 0;
+
+  private int numBatched = 0;
+
+  private AbstractDetailQueryResultIterator iterator;
+
+  private QueryModel queryModel;
+
+  public CarbonVectorizedRecordReader(QueryModel queryModel) {
+this.queryModel = queryModel;
+  }
+
+  @Override public void initialize(InputSplit inputSplit, 
TaskAttemptContext taskAttemptContext)
+  throws IOException, InterruptedException {
+List<CarbonInputSplit> splitList;
+if (inputSplit instanceof CarbonInputSplit) {
+  splitList = new ArrayList<>(1);
+  splitList.add((CarbonInputSplit) inputSplit);
+} else {
+  throw new RuntimeException("unsupported input split type: " + 
inputSplit);
+}
+List<TableBlockInfo> tableBlockInfoList = 
CarbonInputSplit.createBlocks(splitList);
+queryModel.setTableBlockInfos(tableBlockInfoList);
+queryModel.setVectorReader(true);
+try {
+  queryExecutor =
+  QueryExecutorFactory.getQueryExecutor(queryModel, 
taskAttemptContext.getConfiguration());
+  iterator = (AbstractDetailQueryResultIterator) 
queryExecutor.execute(queryModel);
+} catch (QueryExecutionException e) {
+  LOGGER.error(e);
+  throw new InterruptedException(e.getMessage());
+} catch (Exception e) {
+  LOGGER.error(e);
+  throw e;
+}
+  }
+
+  @Override public boolean nextKeyValue() throws IOException, 
InterruptedException {
+initBatch();
+if 

[GitHub] carbondata pull request #2868: [CARBONDATA-3052] Improve drop table performa...

2018-10-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2868#discussion_r229283191
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/filesystem/LocalCarbonFile.java
 ---
@@ -141,7 +141,12 @@ public boolean renameTo(String changetoName) {
   }
 
   public boolean delete() {
-return file.delete();
+try {
+  return deleteFile(file.getAbsolutePath(), 
FileFactory.getFileType(file.getAbsolutePath()));
+} catch (IOException e) {
+  LOGGER.error("Exception occurred:" + e.getMessage());
--- End diff --

include the exception in the error log
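
A minimal sketch of the difference, using `java.util.logging` so that it runs without extra dependencies (the code under review uses a log4j `Logger`). The class `LogWithCause` and its always-failing `deleteFile` helper are hypothetical.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

final class LogWithCause {
  static final Logger LOGGER = Logger.getLogger(LogWithCause.class.getName());

  // Hypothetical delete helper that always fails, to exercise the catch block.
  static boolean deleteFile(String path) throws IOException {
    throw new IOException("cannot delete " + path);
  }

  static boolean delete(String path) {
    try {
      return deleteFile(path);
    } catch (IOException e) {
      // Passing e as the last argument records the stack trace, not only the message.
      LOGGER.log(Level.SEVERE, "Exception occurred while deleting " + path, e);
      return false;
    }
  }
}
```

Logging only `e.getMessage()` drops the stack trace and any nested cause, which makes a failed delete much harder to diagnose.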


---


[GitHub] carbondata pull request #2868: [CARBONDATA-3052] Improve drop table performa...

2018-10-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2868#discussion_r229283056
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/filesystem/CarbonFile.java
 ---
@@ -62,6 +62,11 @@
 
   boolean renameForce(String changetoName);
 
+  /**
+   * This method will delete the files recursively from file system
+   *
+   * @return
--- End diff --

complete the comment


---


[GitHub] carbondata issue #2868: [CARBONDATA-3052] Improve drop table performance by ...

2018-10-30 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2868
  
If the table is on S3, will it behave correctly, since S3 does not have a 
"folder" concept?


---


[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2816
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1157/



---


[GitHub] carbondata issue #2882: [CARBONDATA-3060]Improve the command for cli and fix...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2882
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1368/



---


[GitHub] carbondata pull request #2871: [CARBONDATA-3051] Fix bugs in unclosed stream...

2018-10-30 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2871#discussion_r229282082
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java 
---
@@ -169,7 +170,7 @@ public CarbonReaderBuilder withHadoopConf(String key, 
String value) {
   reader.initialize(split, attempt);
   readers.add(reader);
 } catch (Exception e) {
-  reader.close();
+  CarbonUtil.closeStreams(readers.toArray(new RecordReader[0]));
--- End diff --

Why not loop and close each one in the `readers`?
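
A minimal sketch of such a loop, assuming the readers implement `java.io.Closeable`. The class `CloseAll` is hypothetical, and collecting later failures with `addSuppressed` is an illustrative choice rather than what the PR does.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

final class CloseAll {
  // Closes every resource even if some fail; rethrows the first failure at the end.
  static void closeEach(List<? extends Closeable> resources) throws IOException {
    IOException first = null;
    for (Closeable resource : resources) {
      try {
        resource.close();
      } catch (IOException e) {
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e);
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }
}
```

Catching inside the loop matters: a plain loop without the inner try would stop at the first reader that fails to close and leak the rest.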


---


[GitHub] carbondata issue #2861: [CARBONDATA-3025]handle passing spark appname for pa...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2861
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1156/



---


[GitHub] carbondata issue #2861: [CARBONDATA-3025]handle passing spark appname for pa...

2018-10-30 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2861
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9419/



---


[GitHub] carbondata pull request #2879: [CARBONDATA-3058] Fix some exception coding i...

2018-10-30 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2879#discussion_r229280016
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/steps/CarbonRowDataWriterProcessorStepImpl.java
 ---
@@ -212,7 +212,11 @@ private void finish(CarbonFactHandler dataHandler, int 
iteratorIndex) {
 try {
   processingComplete(dataHandler);
 } catch (CarbonDataLoadingException e) {
-  exception = new CarbonDataWriterException(e.getMessage(), e);
+  // only assign when exception is null
+  // else it will erase original root cause
+  if (null == exception) {
--- End diff --

Why should we keep this exception? If we only want to do some statistics, 
we can add that code in the finally block, and you can just throw the 
exception in the catch block.


---


[GitHub] carbondata pull request #2879: [CARBONDATA-3058] Fix some exception coding i...

2018-10-30 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2879#discussion_r229280235
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/steps/CarbonRowDataWriterProcessorStepImpl.java
 ---
@@ -308,7 +312,7 @@ private void processBatch(CarbonRowBatch batch, 
CarbonFactHandler dataHandler, i
   }
   writeCounter[iteratorIndex] += batch.getSize();
 } catch (Exception e) {
-  throw new CarbonDataLoadingException("unable to generate the mdkey", 
e);
+  throw new CarbonDataLoadingException(e);
--- End diff --

I think there is no need to wrap the exception here, just remove the 
try-catch code.


---


[GitHub] carbondata pull request #2881: [HOTFIX] Remove unuse javax.servlet jar from ...

2018-10-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2881


---


[GitHub] carbondata pull request #2879: [CARBONDATA-3058] Fix some exception coding i...

2018-10-30 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2879#discussion_r229280564
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/steps/DataWriterProcessorStepImpl.java
 ---
@@ -259,7 +259,7 @@ public void processRow(CarbonRow row, CarbonFactHandler 
dataHandler) throws KeyG
   readCounter++;
   dataHandler.addDataToStore(row);
 } catch (Exception e) {
-  throw new CarbonDataLoadingException("unable to generate the mdkey", 
e);
--- End diff --

no need to wrap the exception, just remove the try-catch code


---


[GitHub] carbondata pull request #2879: [CARBONDATA-3058] Fix some exception coding i...

2018-10-30 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2879#discussion_r229280426
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/steps/DataWriterBatchProcessorStepImpl.java
 ---
@@ -141,7 +141,9 @@ private void finish(String tableName, CarbonFactHandler 
dataHandler) {
 try {
   processingComplete(dataHandler);
 } catch (Exception e) {
-  exception = new CarbonDataWriterException(e.getMessage(), e);
+  if (null == exception) {
--- End diff --

No need to keep the exception here. You can do the statistics in the finally 
block.


---

