[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-10 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246826911
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -609,6 +609,14 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_SIZE_FIRST;
 } else {
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_NUM_FIRST;
+  // fall back to BLOCK_NUM_FIRST strategy need to reset
+  // the average expected size for each node
+  if (blockInfos.size() > 0) {
+sizePerNode = blockInfos.size() / numOfNodes;
--- End diff --

This logic can be simplified like below:
sizePerNode = blockInfos.size() / numOfNodes;
sizePerNode = sizePerNode == 0 ? 1 : sizePerNode;

Also, please avoid the divide-by-zero (an ArithmeticException in Java) if 
numOfNodes is zero in a faulty-cluster case.
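
A minimal sketch of the suggested form, with a hypothetical guard for the 
zero-node case (whether to default to one node or fail fast is a design 
choice for this PR):

    // illustrative guard: a faulty cluster may report zero active nodes
    if (numOfNodes == 0) {
      numOfNodes = 1;
    }
    long sizePerNode = blockInfos.size() / numOfNodes;
    sizePerNode = sizePerNode == 0 ? 1 : sizePerNode;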


---


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-09 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246317841
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -609,6 +613,9 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_SIZE_FIRST;
 } else {
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_NUM_FIRST;
+  // fall back to BLOCK_NUM_FIRST strategy need to reset
+  // the average expected size for each node
+  sizePerNode = numberOfBlocksPerNode;
--- End diff --

assignLeftOverBlocks also needs similar if/else self-checks. I think it's OK; 
you can take a call on whether to refactor now or later.


---


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-09 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246311819
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -575,19 +575,23 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
 }
 
 // calculate the average expected size for each node
-long sizePerNode = 0;
+long numberOfBlocksPerNode = 0;
+if (blockInfos.size() > 0) {
+  numberOfBlocksPerNode = blockInfos.size() / numOfNodes;
+}
+numberOfBlocksPerNode = numberOfBlocksPerNode <= 0 ? 1 : 
numberOfBlocksPerNode;
+long dataSizePerNode = 0;
 long totalFileSize = 0;
+for (Distributable blockInfo : uniqueBlocks) {
+  totalFileSize += ((TableBlockInfo) blockInfo).getBlockLength();
+}
+dataSizePerNode = totalFileSize / numOfNodes;
+long sizePerNode = 0;
 if (BlockAssignmentStrategy.BLOCK_NUM_FIRST == 
blockAssignmentStrategy) {
-  if (blockInfos.size() > 0) {
-sizePerNode = blockInfos.size() / numOfNodes;
-  }
-  sizePerNode = sizePerNode <= 0 ? 1 : sizePerNode;
+  sizePerNode = numberOfBlocksPerNode;
--- End diff --

This if/else can be completely avoided by using the correct variable in the 
method call for block allocation.


---


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-09 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246311168
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -609,6 +613,9 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_SIZE_FIRST;
 } else {
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_NUM_FIRST;
+  // fall back to BLOCK_NUM_FIRST strategy need to reset
+  // the average expected size for each node
+  sizePerNode = numberOfBlocksPerNode;
--- End diff --

Instead of reassigning the same variable, can assignBlocksByDataLocality() 
use numberOfBlocksPerNode directly?
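
A possible shape of the refactor (the call signature is illustrative, and 
numOfNodes is assumed validated as non-zero):

    // compute both averages once, then pass the one the strategy needs
    long numberOfBlocksPerNode = Math.max(1, blockInfos.size() / numOfNodes);
    long dataSizePerNode = totalFileSize / numOfNodes;
    long expectedSizePerNode =
        BlockAssignmentStrategy.BLOCK_NUM_FIRST == blockAssignmentStrategy
            ? numberOfBlocksPerNode : dataSizePerNode;
    // hypothetical call shape; the real parameter list may differ
    assignBlocksByDataLocality(uniqueBlocks, expectedSizePerNode);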


---


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-09 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246309331
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -575,19 +575,23 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
 }
 
 // calculate the average expected size for each node
-long sizePerNode = 0;
+long numberOfBlocksPerNode = 0;
+if (blockInfos.size() > 0) {
+  numberOfBlocksPerNode = blockInfos.size() / numOfNodes;
+}
+numberOfBlocksPerNode = numberOfBlocksPerNode <= 0 ? 1 : 
numberOfBlocksPerNode;
+long dataSizePerNode = 0;
 long totalFileSize = 0;
+for (Distributable blockInfo : uniqueBlocks) {
+  totalFileSize += ((TableBlockInfo) blockInfo).getBlockLength();
+}
+dataSizePerNode = totalFileSize / numOfNodes;
+long sizePerNode = 0;
 if (BlockAssignmentStrategy.BLOCK_NUM_FIRST == 
blockAssignmentStrategy) {
-  if (blockInfos.size() > 0) {
-sizePerNode = blockInfos.size() / numOfNodes;
-  }
-  sizePerNode = sizePerNode <= 0 ? 1 : sizePerNode;
+  sizePerNode = numberOfBlocksPerNode;
--- End diff --

Please don't change the sizePerNode variable.


---


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-09 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246286723
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -609,6 +597,10 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_SIZE_FIRST;
 } else {
   blockAssignmentStrategy = 
BlockAssignmentStrategy.BLOCK_NUM_FIRST;
+  // fall back to BLOCK_NUM_FIRST strategy need to recalculate
+  // the average expected size for each node
+  sizePerNode = calcAvgLoadSizePerNode(blockInfos,uniqueBlocks,
--- End diff --

Avoid calculating the same values multiple times. Calculate once and reuse 
based on the strategy.


---


[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...

2019-01-08 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3059#discussion_r246286475
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -1164,4 +1156,35 @@ private static void deleteFiles(List<String> filesToBeDeleted) throws IOException
   FileFactory.deleteFile(filePath, FileFactory.getFileType(filePath));
 }
   }
+
+  /**
+   * This method will calculate the average expected size for each node
+   *
+   * @param blockInfos blocks
+   * @param uniqueBlocks unique blocks
+   * @param numOfNodes if number of nodes has to be decided
+   *   based on block location information
+   * @param blockAssignmentStrategy strategy used to assign blocks
+   * @return the average expected size for each node
+   */
+  private static long calcAvgLoadSizePerNode(List<Distributable> blockInfos,
--- End diff --

Please separate the code for computing numberOfBlocksPerNode and 
dataSizePerNode. Use the required variable based on the 
BlockAssignmentStrategy. That way this function is also not required, I think.

Using the same name for both purposes is confusing. 


---


[GitHub] carbondata issue #3048: [CARBONDATA-3224] Support SDK/CSDK validate the impr...

2019-01-07 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/3048
  
LGTM


---


[GitHub] carbondata pull request #3048: [CARBONDATA-3224] Support SDK/CSDK validate t...

2019-01-06 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3048#discussion_r245532720
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java 
---
@@ -179,7 +181,8 @@ public CarbonWriterBuilder uniqueIdentifier(long 
timestamp) {
*
* @return updated CarbonWriterBuilder
*/
-  public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
+  public CarbonWriterBuilder withLoadOptions(Map<String, String> options)
+  throws IllegalArgumentException {
--- End diff --

IllegalArgumentException is a RuntimeException; in general, RuntimeExceptions 
are not listed in the throws clause of a method. Instead, it is documented in 
the method's javadoc with @throws IllegalArgumentException.
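
A sketch of the javadoc-only approach with the original signature kept (the 
body shown is illustrative, not the SDK's actual validation):

    /**
     * @param options load options as key-value pairs
     * @return updated CarbonWriterBuilder
     * @throws IllegalArgumentException if an unsupported option is passed
     */
    public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
      // unchecked exceptions need no throws clause; validation can still throw
      if (options == null) {
        throw new IllegalArgumentException("Load options should not be null");
      }
      this.options = options; // illustrative field assignment
      return this;
    }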


---


[GitHub] carbondata pull request #3048: [CARBONDATA-3224] Support SDK/CSDK validate t...

2019-01-04 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3048#discussion_r245328731
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java 
---
@@ -179,7 +182,8 @@ public CarbonWriterBuilder uniqueIdentifier(long 
timestamp) {
*
* @return updated CarbonWriterBuilder
*/
-  public CarbonWriterBuilder withLoadOptions(Map<String, String> options) {
+  public CarbonWriterBuilder withLoadOptions(Map<String, String> options)
--- End diff --

IllegalArgumentException is already used for some validations below. Better 
not to change the signature of the existing method; it causes compilation 
errors for existing user code.


---


[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...

2019-01-03 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3047#discussion_r244980360
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala
 ---
@@ -101,14 +102,23 @@ object CarbonStore {
   val (dataSize, indexSize) = if (load.getFileFormat == 
FileFormat.ROW_V1) {
 // for streaming segment, we should get the actual size from 
the index file
 // since it is continuously inserting data
-val segmentDir = CarbonTablePath.getSegmentPath(tablePath, 
load.getLoadName)
+val segmentDir = CarbonTablePath
+  .getSegmentPath(carbonTable.getTablePath, load.getLoadName)
 val indexPath = 
CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir)
 val indices = StreamSegment.readIndexFile(indexPath, 
FileFactory.getFileType(indexPath))
 (indices.asScala.map(_.getFile_size).sum, 
FileFactory.getCarbonFile(indexPath).getSize)
   } else {
 // for batch segment, we can get the data size from table 
status file directly
-(if (load.getDataSize == null) 0L else load.getDataSize.toLong,
-  if (load.getIndexSize == null) 0L else 
load.getIndexSize.toLong)
+if (null == load.getDataSize || null == load.getIndexSize) {
+  // If either of datasize or indexsize comes to be null the 
we calculate the correct
+  // size and assign
+  val dataIndexSize = 
CarbonUtil.calculateDataIndexSize(carbonTable, true)
--- End diff --

Show segments is a read-only query. I think we should not perform a write 
operation in a query. So, I feel it's better to calculate it every time and 
show it, OR just display it as not available.


---


[GitHub] carbondata pull request #3035: [CARBONDATA-3216] Fix some bugs in CSDK

2019-01-02 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3035#discussion_r244908015
  
--- Diff: store/CSDK/test/main.cpp ---
@@ -709,6 +709,7 @@ bool testWithTableProperty(JNIEnv *env, char *path, int 
argc, char **argv) {
 writer.outputPath(path);
 writer.withCsvInput(jsonSchema);
 writer.withTableProperty("sort_columns", "shortField");
+writer.enableLocalDictionary(false);
--- End diff --

Can you test whether there is any possibility of passing NULL in C, and how 
it is then sent to the Java layer: as true or false?


---


[GitHub] carbondata pull request #3035: [CARBONDATA-3216] Fix some bugs in CSDK

2019-01-02 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3035#discussion_r244907927
  
--- Diff: store/CSDK/test/main.cpp ---
@@ -546,7 +546,7 @@ bool testWriteData(JNIEnv *env, char *path, int argc, 
char *argv[]) {
 writer.withCsvInput(jsonSchema);
 writer.withLoadOption("complex_delimiter_level_1", "#");
 writer.writtenBy("CSDK");
-writer.taskNo(185);
+writer.taskNo(15541554.81);
--- End diff --

This change is not related to the fix.


---


[GitHub] carbondata pull request #3035: [CARBONDATA-3216] Fix some bugs in CSDK

2019-01-02 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3035#discussion_r244907282
  
--- Diff: store/CSDK/src/CarbonWriter.cpp ---
@@ -291,9 +291,6 @@ void CarbonWriter::localDictionaryThreshold(int 
localDictionaryThreshold) {
 }
 
 void CarbonWriter::enableLocalDictionary(bool enableLocalDictionary) {
-if (enableLocalDictionary == NULL) {
--- End diff --

Can you give some description in the PR of how removing this solves the 
issue?


---


[GitHub] carbondata pull request #3035: [CARBONDATA-3216] Fix some bugs in CSDK

2019-01-02 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3035#discussion_r244907183
  
--- Diff: store/CSDK/test/main.cpp ---
@@ -853,7 +854,7 @@ int main(int argc, char *argv[]) {
 } else {
 int batch = 32000;
 int printNum = 32000;
-
+//
--- End diff --

Remove the unwanted changes.


---


[GitHub] carbondata pull request #2991: [CARBONDATA-3043] Add build script and add te...

2018-12-20 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2991#discussion_r243361889
  
--- Diff: store/CSDK/test/main.cpp ---
@@ -665,6 +699,208 @@ bool readFromS3(JNIEnv *env, char *path, char 
*argv[]) {
 printResult(env, reader);
 }
 
+TEST(CSDKTest,tryCatchException) {
+bool gotExp=tryCatchException(env);
+EXPECT_TRUE(gotExp);
+}
+
+
+TEST(CSDKTest,tryCarbonRowException) {
+char *smallFilePath = "../../../../resources/carbondata";
+try {
+bool result = tryCarbonRowException(env, smallFilePath);;
+EXPECT_TRUE(result) << "Expected Exception as No Index File" << 
result;
+} catch (runtime_error e) {
+EXPECT_TRUE(true);
+}
+}
+
+TEST(CSDKTest,testCarbonProperties) {
+try {
+bool result = testCarbonProperties(env);
+EXPECT_TRUE(result) << "Carbon set properties not working ";
+} catch (runtime_error e) {
+EXPECT_TRUE(false) << " Exception is not expected while setting 
carbon properties ";
+}
+}
+
+TEST(CSDKTest,testWriteData) {
+try {
+bool result =testWriteData(env, "./data", my_argc, my_argv);
+result = result && testWriteData(env, "./data", my_argc, my_argv);
+EXPECT_TRUE(result) << "Either Data Loading Or Carbon Reader  is 
failed";
+} catch (runtime_error e) {
+EXPECT_TRUE(false) << " Exception is not expected while data 
loading";
+}
+}
+
+TEST(CSDKTest,readFromLocalWithoutProjection) {
+try {
+char *smallFilePath = "./data_withoutpro";
+bool result =testWriteData(env, smallFilePath, my_argc, my_argv);
+if(result){
+bool proj_result = readFromLocalWithoutProjection(env, 
smallFilePath);
+EXPECT_TRUE(proj_result) << "Without Projection is failed";
+} else {
+EXPECT_TRUE(result) << "Either Data Loading Or Carbon Reader  
is failed";
+}
+} catch (runtime_error e) {
+EXPECT_TRUE(false) << " Exception is not expected ,During without 
projection selection";
+}
+}
+
+TEST(CSDKTest,readFromLocalWithProjection) {
+try {
+char *smallFilePath = "./data_pro";
+bool result =testWriteData(env, smallFilePath, my_argc, my_argv);
+if(result){
+bool proj_result = readFromLocalWithProjection(env, 
smallFilePath);
+EXPECT_TRUE(proj_result) << "With Projection is failed";
+} else {
+EXPECT_TRUE(result) << "Either Data Loading Or Carbon Reader  
is failed";
+}
+} catch (runtime_error e) {
+EXPECT_TRUE(false) << " Exception is not expected ,During With 
projection selection";
+}
+}
+
+
+TEST(CSDKTest,readSchemaWithoutValidation) {
+try {
+char *path = "./data_readPath";
+bool result =testWriteData(env, path, my_argc, my_argv);
+if(result){
+bool schema_result = readSchema(env, path, false);
+EXPECT_TRUE(schema_result) << "Not Able to read readSchema 
from given path";
+} else {
+EXPECT_TRUE(result) << "Either Data Loading Or Carbon Reader  
is failed";
+}
+} catch (runtime_error e) {
+EXPECT_TRUE(false) << " Exception is not expected ,During Read 
Schema";
+}
+}
+
+
+TEST(CSDKTest,readSchemaWithValidation) {
+try {
+char *path = "./data_readPathWithValidation";
+bool result =testWriteData(env, path, my_argc, my_argv);
+if(result){
+bool schema_result = readSchema(env, path, true);
+EXPECT_TRUE(schema_result) << "Not Able to read readSchema Or 
validate from given path";
+} else {
+EXPECT_TRUE(result) << "Either Data Loading Or Carbon Reader  
is failed";
+}
+} catch (runtime_error e) {
+EXPECT_TRUE(false) << " Exception is not expected ,During Read 
Schema";
+}
+}
+
+TEST(CSDKTest,testReadNextRowWthtVector) {
+try {
+int printNum = 32000;
+char *path = "./data_forVector";
+bool result =testWriteData(env, path, my_argc, my_argv);
+if(result){
+bool readresultWithVector= testReadNextRow(env, pa

---

[GitHub] carbondata pull request #2992: [CARBONDATA-3176] Optimize quick-start-guide ...

2018-12-19 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2992#discussion_r243168227
  
--- Diff: docs/quick-start-guide.md ---
@@ -80,43 +80,49 @@ import org.apache.spark.sql.CarbonSession._
 * Create a CarbonSession :
 
 ```
-val carbon = SparkSession.builder().config(sc.getConf)
- .getOrCreateCarbonSession("<carbon_store_path>")
+val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("<carbon_store_path>")
 ```
-**NOTE**: By default metastore location points to `../carbon.metastore`, user can provide own metastore location to CarbonSession like `SparkSession.builder().config(sc.getConf)
-.getOrCreateCarbonSession("<carbon_store_path>", "<local_metastore_path>")`
+**NOTE** 
+ - By default metastore location points to `../carbon.metastore`, user can provide own metastore location to CarbonSession like 
+   `SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("<carbon_store_path>", "<local_metastore_path>")`.
+ - Data storage location can be specified by `<carbon_store_path>`, like `/carbon/data/store`, `hdfs://localhost:9000/carbon/data/store` or `s3a://carbon/data/store`.
 
  Executing Queries
 
 ## Creating a Table
 
 ```
-scala>carbon.sql("CREATE TABLE
-IF NOT EXISTS test_table(
-id string,
-name string,
-city string,
-age Int)
-  STORED AS carbondata")
+carbon.sql(
--- End diff --

This is Scala code format. I think examples need not follow the Scala code 
format; the examples are wrapped to multiple lines for readability (even if 
the document is converted to PDF).

@sraghunandan, @sgururajshetty please help to confirm the standard 
convention followed.


---


[GitHub] carbondata pull request #2919: [CARBONDATA-3097] Support folder path in getV...

2018-12-18 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2919#discussion_r242588569
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java 
---
@@ -241,12 +241,25 @@ private static Schema readSchemaFromIndexFile(String 
indexFilePath) throws IOExc
 
   /**
* This method return the version details in formatted string by reading 
from carbondata file
+   * default won't validate the version details between different 
carbondata files.
*
-   * @param dataFilePath
-   * @return
+   * @param path carbondata file path or folder path
+   * @return string with information of who has written this file
+   * in which carbondata project version
* @throws IOException
*/
-  public static String getVersionDetails(String dataFilePath) throws 
IOException {
+  public static String getVersionDetails(String path) throws IOException {
+if (path.endsWith(INDEX_FILE_EXT)) {
+  throw new RuntimeException("Can't get version details from 
carbonindex file.");
--- End diff --

A RuntimeException is thrown although IOException is mentioned in the 
signature; this should throw IOException. Also, some places in this class 
throw CarbonDataLoadingException; if that was not intentional, we can change 
them to IOException as per the signature instead of a subclass of 
RuntimeException.
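
A sketch of the aligned form (same message text as the diff):

    if (path.endsWith(INDEX_FILE_EXT)) {
      // IOException matches the declared `throws IOException` signature
      throw new IOException("Can't get version details from carbonindex file.");
    }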


---


[GitHub] carbondata pull request #2919: [CARBONDATA-3097] Support folder path in getV...

2018-12-18 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2919#discussion_r242578856
  
--- Diff: docs/sdk-guide.md ---
@@ -816,13 +816,15 @@ Find example code at 
[CarbonReaderExample](https://github.com/apache/carbondata/
* This method return the version details in formatted string by reading 
from carbondata file
* If application name is SDK_1.0.0 and this has written the carbondata 
file in carbondata 1.6 project version,
* then this API returns the String "SDK_1.0.0 in version: 
1.6.0-SNAPSHOT"
-   * @param dataFilePath complete path including carbondata file name
+   * Default value of validate is false, it won't validate the version 
details between different carbondata files.
--- End diff --

" Default value of validate is false,"
- This statement is not valid as we don't have validate and no validate 
flows differently.


---


[GitHub] carbondata pull request #2919: [CARBONDATA-3097] Support folder path in getV...

2018-12-18 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2919#discussion_r242577513
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonSchemaReaderTest.java
 ---
@@ -236,6 +243,113 @@ public void testReadSchemaWithDifferentSchema() {
 }
   }
 
+  @Test
+  public void testGetVersionDetailsAndValidate() {
+try {
+  String versionDetails = CarbonSchemaReader
+  .getVersionDetails(path);
+  Assert.assertTrue(versionDetails.contains("CarbonSchemaReaderTest in 
version: "));
+} catch (Throwable e) {
+  e.printStackTrace();
+  Assert.fail();
+}
+  }
+
+  @Test
+  public void testGetVersionDetailsWithoutValidate() {
+try {
+  String versionDetails = CarbonSchemaReader
+  .getVersionDetails(path);
+  Assert.assertTrue(versionDetails.contains("CarbonSchemaReaderTest in 
version: "));
+} catch (Throwable e) {
+  e.printStackTrace();
+  Assert.fail();
+}
+  }
+
+  @Test
+  public void testGetVersionDetailsWithCarbonDataFile() {
+try {
+  File[] dataFiles = new File(path).listFiles(new FilenameFilter() {
+@Override
+public boolean accept(File dir, String name) {
+  return name.endsWith(CARBON_DATA_EXT);
+}
+  });
+  String versionDetails = 
CarbonSchemaReader.getVersionDetails(dataFiles[0].getAbsolutePath());
+  Assert.assertTrue(versionDetails.contains("CarbonSchemaReaderTest in 
version: "));
+} catch (Throwable e) {
+  e.printStackTrace();
+  Assert.fail();
+}
+  }
+
+  @Test
+  public void testGetVersionDetailsWithCarbonIndexFile() {
+try {
+  File[] indexFiles = new File(path).listFiles(new FilenameFilter() {
+@Override
+public boolean accept(File dir, String name) {
+  return name.endsWith(INDEX_FILE_EXT);
+}
+  });
+  
CarbonSchemaReader.getVersionDetails(indexFiles[0].getAbsolutePath());
+  Assert.fail();
+} catch (Throwable e) {
+  Assert.assertTrue(e.getMessage()
+  .equalsIgnoreCase("Can't get version details from carbonindex 
file."));
+}
+  }
+
+  public void testGetVersionDetailsWithTheSameSchema() {
+try {
+  writeData();
+  try {
+String versionDetails = CarbonSchemaReader
+.getVersionDetails(path);
+Assert.assertTrue(versionDetails
+.contains("CarbonSchemaReaderTest in version: "));
+  } catch (Exception e) {
+Assert.fail();
+  }
+} catch (Throwable e) {
+  e.printStackTrace();
+  Assert.fail();
+}
+  }
+
+  @Test
+  public void testGetVersionDetailsWithDifferentSchema() {
+try {
+  int num = 10;
+  Field[] fields = new Field[2];
+  fields[0] = new Field("name", DataTypes.STRING);
+  fields[1] = new Field("age", DataTypes.INT);
+  CarbonWriter writer = CarbonWriter
+  .builder()
+  .outputPath(path)
+  .withCsvInput(new Schema(fields))
+  .writtenBy("testReadSchemaWithDifferentSchema")
+  .build();
+
+  for (int i = 0; i < num; i++) {
+writer.write(new String[]{"robot" + (i % 10), String.valueOf(i)});
+  }
+  writer.close();
+  try {
+CarbonSchemaReader
+.getVersionDetails(path);
+  } catch (Exception e) {
+Assert.assertTrue(e.getMessage()
+.equalsIgnoreCase("Version details is different between 
different files."));
+Assert.fail();
--- End diff --

What is the purpose of this test case? We never get the "Version details is 
different between different files" exception now, right? If there is no 
separate validation required, we can remove this test.



---


[GitHub] carbondata pull request #2919: [CARBONDATA-3097] Support folder path in getV...

2018-12-18 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2919#discussion_r242577599
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java 
---
@@ -241,12 +241,25 @@ private static Schema readSchemaFromIndexFile(String 
indexFilePath) throws IOExc
 
   /**
* This method return the version details in formatted string by reading 
from carbondata file
+   * default won't validate the version details between different 
carbondata files.
*
-   * @param dataFilePath
-   * @return
+   * @param path carbondata file path or folder path
+   * @return string with information of who has written this file
+   * in which carbondata project version
* @throws IOException
*/
-  public static String getVersionDetails(String dataFilePath) throws 
IOException {
+  public static String getVersionDetails(String path) throws IOException {
+if (path.endsWith(INDEX_FILE_EXT)) {
+  throw new RuntimeException("Can't get version details from 
carbonindex file.");
+} else if (path.endsWith(CARBON_DATA_EXT)) {
+  return getVersionDetailsFromDataFile(path);
+} else {
+  String indexFilePath = getCarbonFile(path, 
CARBON_DATA_EXT)[0].getAbsolutePath();
--- End diff --

This is a data file, not an index file, so the variable name indexFilePath is 
misleading.


---


[GitHub] carbondata issue #2931: [CARBONDATA-2999] support read schema from S3

2018-12-17 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2931
  
LGTM


---


[GitHub] carbondata pull request #2919: [CARBONDATA-3097] Support folder path in getV...

2018-12-16 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2919#discussion_r242035462
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java 
---
@@ -241,12 +241,52 @@ private static Schema readSchemaFromIndexFile(String 
indexFilePath) throws IOExc
 
   /**
* This method return the version details in formatted string by reading 
from carbondata file
+   * If validate is true, it will check the version details between 
different carbondata files.
+   * And if version details are not the same, it will throw exception
*
-   * @param dataFilePath
-   * @return
+   * @param path carbondata file path or folder path
+   * @param validate whether validate the version details between 
different carbondata files.
+   * @return string with information of who has written this file
+   * in which carbondata project version
* @throws IOException
*/
-  public static String getVersionDetails(String dataFilePath) throws 
IOException {
+  public static String getVersionDetails(String path, boolean validate) 
throws IOException {
--- End diff --

I think it is not correct to validate readability through version details; in 
general, new-version jars can read all old-version files.

Please remove this method.




---


[GitHub] carbondata pull request #2931: [CARBONDATA-2999] support read schema from S3

2018-12-16 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2931#discussion_r242034641
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java 
---
@@ -147,34 +170,55 @@ public static Schema readSchema(String path, boolean 
validateSchema) throws IOEx
 throw new CarbonDataLoadingException("No carbonindex file in this 
path.");
   }
 } else {
-  String indexFilePath = getCarbonFile(path, 
INDEX_FILE_EXT)[0].getAbsolutePath();
-  return readSchemaFromIndexFile(indexFilePath);
+  String indexFilePath = getCarbonFile(path, INDEX_FILE_EXT, 
conf)[0].getAbsolutePath();
+  return readSchemaFromIndexFile(indexFilePath, conf);
 }
   }
 
+  /**
+   * read schema from path,
+   * path can be folder path, carbonindex file path, and carbondata file 
path
+   * and user can decide whether check all files schema
+   *
+   * @param path   file/folder path
+   * @param validateSchema whether check all files schema
+   * @return schema
+   * @throws IOException
+   */
+  public static Schema readSchema(String path, boolean validateSchema) 
throws IOException {
+Configuration conf = new Configuration();
+return readSchema(path, validateSchema, conf);
+  }
+
   /**
* Read carbondata file and return the schema
* This interface will be removed,
* please use readSchema instead of this interface
*
* @param dataFilePath carbondata file store path
+   * @param conf hadoop configuration support, can set s3a AK,SK,
+   * end point and other conf with this
* @return Schema object
* @throws IOException
*/
   @Deprecated
-  public static Schema readSchemaInDataFile(String dataFilePath) throws 
IOException {
-return readSchema(dataFilePath, false);
+  public static Schema readSchemaInDataFile(String dataFilePath, 
Configuration conf)
--- End diff --

No need to change the deprecated old method; keep it as it is.


---


[GitHub] carbondata issue #2899: [CARBONDATA-3073][CARBONDATA-3044] Support configure...

2018-12-16 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2899
  
LGTM


---


[GitHub] carbondata pull request #2899: [CARBONDATA-3073][CARBONDATA-3044] Support co...

2018-12-14 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2899#discussion_r241731133
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java 
---
@@ -77,8 +77,9 @@
 
   /**
* Sets the output path of the writer builder
+   *
--- End diff --

Please avoid the whitespace-only changes in multiple places if they are not 
intentional.


---


[GitHub] carbondata pull request #2899: [CARBONDATA-3073][CARBONDATA-3044] Support co...

2018-12-14 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2899#discussion_r241730265
  
--- Diff: store/CSDK/src/CarbonWriter.cpp ---
@@ -98,6 +127,158 @@ void CarbonWriter::withHadoopConf(char *key, char 
*value) {
 carbonWriterBuilderObject = 
jniEnv->CallObjectMethodA(carbonWriterBuilderObject, methodID, args);
 }
 
+void CarbonWriter::withTableProperty(char *key, char *value) {
+if (key == NULL) {
+throw std::runtime_error("key parameter can't be NULL.");
+}
+if (value == NULL) {
+throw std::runtime_error("value parameter can't be NULL.");
+}
+checkBuilder();
+jclass carbonWriterBuilderClass = 
jniEnv->GetObjectClass(carbonWriterBuilderObject);
+jmethodID methodID = jniEnv->GetMethodID(carbonWriterBuilderClass, 
"withTableProperty",
+
"(Ljava/lang/String;Ljava/lang/String;)Lorg/apache/carbondata/sdk/file/CarbonWriterBuilder;");
+if (methodID == NULL) {
+throw std::runtime_error("Can't find the method in java: 
withTableProperty");
+}
+jvalue args[2];
+args[0].l = jniEnv->NewStringUTF(key);
+args[1].l = jniEnv->NewStringUTF(value);
+carbonWriterBuilderObject = 
jniEnv->CallObjectMethodA(carbonWriterBuilderObject, methodID, args);
--- End diff --

The Java code may throw IllegalArgumentException for an unsupported property. 
We should add an exception check after the method call.


---


[GitHub] carbondata issue #2986: [CARBONDATA-3166]Updated Document and added Column C...

2018-12-13 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2986
  
LGTM


---


[GitHub] carbondata pull request #2986: [CARBONDATA-3166]Updated Document and added C...

2018-12-13 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2986#discussion_r241477189
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala
 ---
@@ -92,7 +92,9 @@ private[sql] case class CarbonDescribeFormattedCommand(
 Strings.formatSize(
   
tblProps.getOrElse(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB,
 
CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB_DEFAULT).toFloat), ""),
-
+  ("Carbon Column Compressor ", tblProps
--- End diff --

OK


---


[GitHub] carbondata pull request #2986: [CARBONDATA-3166]Updated Document and added C...

2018-12-13 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2986#discussion_r241363901
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala
 ---
@@ -92,7 +92,9 @@ private[sql] case class CarbonDescribeFormattedCommand(
 Strings.formatSize(
   
tblProps.getOrElse(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB,
 
CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB_DEFAULT).toFloat), ""),
-
+  ("Carbon Column Compressor ", tblProps
--- End diff --

By default we should display empty instead of the system default.


---


[GitHub] carbondata pull request #2986: [CARBONDATA-3166]Updated Document and added C...

2018-12-13 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2986#discussion_r241338497
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala
 ---
@@ -92,7 +92,9 @@ private[sql] case class CarbonDescribeFormattedCommand(
 Strings.formatSize(
   
tblProps.getOrElse(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB,
 
CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB_DEFAULT).toFloat), ""),
-
+  ("Carbon Column Compressor ", tblProps
--- End diff --

The same compressor is used for column and row data files, so please change 
it to "Data File Compressor".


---


[GitHub] carbondata issue #2847: [CARBONDATA-3005]Support Gzip as column compressor

2018-12-11 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2847
  
LGTM


---


[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

2018-12-10 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r240208519
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+/**
+ * Codec Class for performing Gzip Compression
+ */
+public class GzipCompressor extends AbstractCompressor {
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /**
+   * This method takes the Byte Array data and Compresses in gzip format
+   *
+   * @param data Data Byte Array passed for compression
+   * @return Compressed Byte Array
+   */
+  private byte[] compressData(byte[] data) {
+ByteArrayOutputStream byteArrayOutputStream = new 
ByteArrayOutputStream();
--- End diff --

ByteArrayOutputStream initializes with 32 bytes and copies the data to a new 
byte[] on expansion. Can you use a better initial size to limit the number of 
copies during expansion? Snappy has a utility (maxCompressedLength) to 
calculate the same; please check whether any gzip library has a similar 
method. If not, we can use a value based on a test with the maximum possible 
compression ratio.
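
A sketch of one such bound, based on deflate's worst case of stored 
(uncompressed) blocks; maxGzipOutputLength is a hypothetical helper, not an 
existing gzip API:

    // worst case: deflate emits stored blocks (5 bytes of overhead per
    // 64 KB block), and the gzip wrapper adds a 10-byte header plus an
    // 8-byte trailer
    private static int maxGzipOutputLength(int inputLength) {
      return inputLength + 5 * (inputLength / 65535 + 1) + 18;
    }

    // usage in compressData:
    ByteArrayOutputStream byteArrayOutputStream =
        new ByteArrayOutputStream(maxGzipOutputLength(data.length));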


---


[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

2018-12-10 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r240206699
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+/**
+ * Codec Class for performing Gzip Compression
+ */
+public class GzipCompressor extends AbstractCompressor {
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /**
+   * This method takes the Byte Array data and Compresses in gzip format
+   *
+   * @param data Data Byte Array passed for compression
+   * @return Compressed Byte Array
+   */
+  private byte[] compressData(byte[] data) {
+ByteArrayOutputStream byteArrayOutputStream = new 
ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzipCompressorOutputStream =
+  new GzipCompressorOutputStream(byteArrayOutputStream);
+  try {
+/**
+ * Below api will write bytes from specified byte array to the 
gzipCompressorOutputStream
+ * The output stream will compress the given byte array.
+ */
+gzipCompressorOutputStream.write(data);
+  } catch (IOException e) {
+throw new RuntimeException("Error during Compression writing step 
", e);
+  } finally {
+gzipCompressorOutputStream.close();
+  }
+} catch (IOException e) {
+  throw new RuntimeException("Error during Compression step ", e);
+}
+return byteArrayOutputStream.toByteArray();
+  }
+
+  /**
+   * This method takes the Byte Array data and Decompresses in gzip format
+   *
+   * @param data   Data Byte Array for Compression
+   * @param offset Start value of Data Byte Array
+   * @param length Size of Byte Array
+   * @return
+   */
+  private byte[] decompressData(byte[] data, int offset, int length) {
+ByteArrayInputStream byteArrayOutputStream = new 
ByteArrayInputStream(data, offset, length);
+ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream();
+try {
+  GzipCompressorInputStream gzipCompressorInputStream =
+  new GzipCompressorInputStream(byteArrayOutputStream);
+  byte[] buffer = new byte[1024];
--- End diff --

Instead of a fixed 1024, can you check what block size (in bytes) gzip 
operates on and use that value?
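
One candidate: deflate, which gzip wraps, uses a 32 KB sliding window, so a 
32 KB read buffer is a natural fit (a sketch against the decompress loop in 
this diff):

    // 32 KB matches deflate's sliding-window size
    byte[] buffer = new byte[32 * 1024];
    int len;
    while ((len = gzipCompressorInputStream.read(buffer)) != -1) {
      byteOutputStream.write(buffer, 0, len);
    }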


---


[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

2018-12-10 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r240211334
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala
 ---
@@ -252,50 +253,94 @@ class TestLoadDataWithCompression extends QueryTest 
with BeforeAndAfterEach with
""".stripMargin)
   }
 
-  test("test data loading with snappy compressor and offheap") {
+  test("test data loading with different compressors and offheap") {
+for(comp <- compressors){
+  
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT,
 "true")
--- End diff --

Should we have a UT for enable.unsafe.in.query.processing true and false?


---


[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...

2018-12-10 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r240147384
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+/**
+ * Codec Class for performing Gzip Compression
+ */
+public class GzipCompressor extends AbstractCompressor {
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /**
+   * This method takes the Byte Array data and Compresses in gzip format
+   *
+   * @param data Data Byte Array passed for compression
+   * @return Compressed Byte Array
+   */
+  private byte[] compressData(byte[] data) {
+ByteArrayOutputStream byteArrayOutputStream = new 
ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzipCompressorOutputStream =
+  new GzipCompressorOutputStream(byteArrayOutputStream);
+  try {
+/**
+ * Below api will write bytes from specified byte array to the 
gzipCompressorOutputStream
+ * The output stream will compress the given byte array.
+ */
+gzipCompressorOutputStream.write(data);
+  } catch (IOException e) {
+throw new RuntimeException("Error during Compression step " + 
e.getMessage());
--- End diff --

Don't skip the actual exception. Add the original exception as the cause to 
the RuntimeException.
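
For reference, RuntimeException has a (String, Throwable) constructor that 
preserves the original exception as the cause:

    throw new RuntimeException("Error during Compression step ", e);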


---


[GitHub] carbondata issue #2915: [CARBONDATA-3095] Optimize the documentation of SDK/...

2018-11-27 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2915
  
LGTM


---


[GitHub] carbondata pull request #2915: [CARBONDATA-3095] Optimize the documentation ...

2018-11-22 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2915#discussion_r235648435
  
--- Diff: docs/sdk-guide.md ---
@@ -674,6 +693,16 @@ Find example code at 
[CarbonReaderExample](https://github.com/apache/carbondata/
   public CarbonReaderBuilder filter(Expression filterExpression);
 ```
 
+```
+  /**
+   * set read batch size before build
--- End diff --

Sets the batch size of records


---


[GitHub] carbondata pull request #2915: [CARBONDATA-3095] Optimize the documentation ...

2018-11-22 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2915#discussion_r235645252
  
--- Diff: docs/sdk-guide.md ---
@@ -684,6 +713,17 @@ Find example code at 
[CarbonReaderExample](https://github.com/apache/carbondata/
  public CarbonReaderBuilder withHadoopConf(Configuration conf);
 ```
 
+```
+  /**
+   * configure hadoop configuration with key value
--- End diff --

Change to "Updates the hadoop configuration with the given key value"


---


[GitHub] carbondata issue #2929: [CARBONDATA-3108][CARBONDATA-3044] Fix the error of ...

2018-11-20 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2929
  
LGTM


---


[GitHub] carbondata pull request #2929: [CARBONDATA-3108][CARBONDATA-3044] Fix the er...

2018-11-20 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2929#discussion_r235266810
  
--- Diff: store/CSDK/src/CarbonRow.cpp ---
@@ -170,6 +170,9 @@ char *CarbonRow::getString(int ordinal) {
 args[0].l = carbonRow;
 args[1].i = ordinal;
 jobject data = jniEnv->CallStaticObjectMethodA(rowUtilClass, 
getStringId, args);
+if (jniEnv->ExceptionCheck()) {
--- End diff --

ok


---


[GitHub] carbondata pull request #2929: [CARBONDATA-3108][CARBONDATA-3044] Fix the er...

2018-11-20 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2929#discussion_r235261186
  
--- Diff: store/CSDK/src/CarbonRow.cpp ---
@@ -170,6 +170,9 @@ char *CarbonRow::getString(int ordinal) {
 args[0].l = carbonRow;
 args[1].i = ordinal;
 jobject data = jniEnv->CallStaticObjectMethodA(rowUtilClass, 
getStringId, args);
+if (jniEnv->ExceptionCheck()) {
--- End diff --

Exception checks should be done for all the other method calls too, like 
getBoolean, getFloat, etc.

An ArrayIndexOutOfBoundsException on a wrong index and cast exceptions on 
wrong method usage are possible.


---


[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...

2018-11-06 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2804
  
LGTM


---


[GitHub] carbondata pull request #2804: [CARBONDATA-2996] CarbonSchemaReader support ...

2018-11-06 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2804#discussion_r231061128
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java 
---
@@ -61,14 +65,122 @@ public static Schema readSchemaInSchemaFile(String 
schemaFilePath) throws IOExce
 return new Schema(schemaList);
   }
 
+  /**
+   * get carbondata/carbonindex file in path
+   *
+   * @param path carbon file path
+   * @return CarbonFile array
+   */
+  private static CarbonFile[] getCarbonFile(String path, final String 
extension)
+  throws IOException {
+String dataFilePath = path;
+if (!(dataFilePath.contains(extension))) {
+  CarbonFile[] carbonFiles = FileFactory
+  .getCarbonFile(path)
+  .listFiles(new CarbonFileFilter() {
+@Override
+public boolean accept(CarbonFile file) {
+  if (file == null) {
+return false;
+  }
+  return file.getName().endsWith(extension);
+}
+  });
+  if (carbonFiles == null || carbonFiles.length < 1) {
+throw new IOException("Carbon file not exists.");
+  }
+  return carbonFiles;
+}
+return null;
+  }
+
+  /**
+   * read schema from path,
+   * path can be folder path, carbonindex file path, and carbondata file 
path
+   * and will not check all files schema
+   *
+   * @param path file/folder path
+   * @return schema
+   * @throws IOException
+   */
+  public static Schema readSchema(String path) throws IOException {
+return readSchema(path, false);
+  }
+
+  /**
+   * read schema from path,
+   * path can be folder path, carbonindex file path, and carbondata file 
path
+   * and user can decide whether check all files schema
+   *
+   * @param path file/folder path
+   * @param validateSchema whether check all files schema
+   * @return schema
+   * @throws IOException
+   */
+  public static Schema readSchema(String path, boolean validateSchema) 
throws IOException {
+if (path.endsWith(INDEX_FILE_EXT)) {
+  return readSchemaFromIndexFile(path);
+} else if (path.endsWith(CARBON_DATA_EXT)) {
+  return readSchemaFromDataFile(path);
+} else if (validateSchema) {
+  CarbonFile[] carbonIndexFiles = getCarbonFile(path, INDEX_FILE_EXT);
+  Schema schema;
+  if (carbonIndexFiles != null && carbonIndexFiles.length != 0) {
+schema = 
readSchemaFromIndexFile(carbonIndexFiles[0].getAbsolutePath());
+for (int i = 1; i < carbonIndexFiles.length; i++) {
+  Schema schema2 = 
readSchemaFromIndexFile(carbonIndexFiles[i].getAbsolutePath());
+  if (schema != schema2) {
+throw new CarbonDataLoadingException("Schema is different 
between different files.");
+  }
+}
+CarbonFile[] carbonDataFiles = getCarbonFile(path, 
CARBON_DATA_EXT);
+for (int i = 0; i < carbonDataFiles.length; i++) {
+  Schema schema2 = 
readSchemaFromDataFile(carbonDataFiles[i].getAbsolutePath());
+  if (!schema.equals(schema2)) {
+throw new CarbonDataLoadingException("Schema is different 
between different files.");
+  }
+}
+return schema;
+  } else {
+throw new CarbonDataLoadingException("No carbonindex file in this 
path.");
+  }
+} else {
+  String indexFilePath = getCarbonFile(path, 
INDEX_FILE_EXT)[0].getAbsolutePath();
+  if (indexFilePath != null) {
--- End diff --

I think this null check is not required. Is there any chance the absolute 
path can be null?


---


[GitHub] carbondata pull request #2804: [CARBONDATA-2996] CarbonSchemaReader support ...

2018-11-06 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2804#discussion_r231060332
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java 
---
@@ -144,4 +246,28 @@ public static Schema readSchemaInIndexFile(String 
indexFilePath) throws IOExcept
 }
   }
 
+  /**
+   * This method return the version details in formatted string by reading 
from carbondata file
+   *
+   * @param dataFilePath
+   * @return
+   * @throws IOException
+   */
+  public static String getVersionDetails(String dataFilePath) throws 
IOException {
--- End diff --

This complete method is displayed as removed and added again. Is it possible 
to avoid that?


---


[GitHub] carbondata pull request #2804: [CARBONDATA-2996] CarbonSchemaReader support ...

2018-11-06 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2804#discussion_r231059908
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java 
---
@@ -61,14 +65,122 @@ public static Schema readSchemaInSchemaFile(String 
schemaFilePath) throws IOExce
 return new Schema(schemaList);
   }
 
+  /**
+   * get carbondata/carbonindex file in path
+   *
+   * @param path carbon file path
+   * @return CarbonFile array
+   */
+  private static CarbonFile[] getCarbonFile(String path, final String 
extension)
+  throws IOException {
+String dataFilePath = path;
+if (!(dataFilePath.contains(extension))) {
+  CarbonFile[] carbonFiles = FileFactory
+  .getCarbonFile(path)
+  .listFiles(new CarbonFileFilter() {
+@Override
+public boolean accept(CarbonFile file) {
+  if (file == null) {
+return false;
+  }
+  return file.getName().endsWith(extension);
+}
+  });
+  if (carbonFiles == null || carbonFiles.length < 1) {
+throw new IOException("Carbon file not exists.");
+  }
+  return carbonFiles;
+}
+return null;
--- End diff --

We can stick to one contract for the method: either return the list or throw 
an exception. Generally, listing APIs should not return null; if this case is 
not expected, we can throw an exception.
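
A sketch of the throw-only contract (the message text is illustrative):

    private static CarbonFile[] getCarbonFile(String path, final String extension)
        throws IOException {
      CarbonFile[] carbonFiles = FileFactory.getCarbonFile(path)
          .listFiles(new CarbonFileFilter() {
            @Override
            public boolean accept(CarbonFile file) {
              return file != null && file.getName().endsWith(extension);
            }
          });
      if (carbonFiles == null || carbonFiles.length < 1) {
        throw new IOException("No carbon file found with extension " + extension);
      }
      return carbonFiles;
    }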


---


[GitHub] carbondata pull request #2804: [CARBONDATA-2996] CarbonSchemaReader support ...

2018-11-06 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2804#discussion_r231058670
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java 
---
@@ -61,14 +65,122 @@ public static Schema readSchemaInSchemaFile(String 
schemaFilePath) throws IOExce
 return new Schema(schemaList);
   }
 
+  /**
+   * get carbondata/carbonindex file in path
+   *
+   * @param path carbon file path
+   * @return CarbonFile array
+   */
+  private static CarbonFile[] getCarbonFile(String path, final String 
extension)
+  throws IOException {
+String dataFilePath = path;
+if (!(dataFilePath.contains(extension))) {
+  CarbonFile[] carbonFiles = FileFactory
+  .getCarbonFile(path)
+  .listFiles(new CarbonFileFilter() {
+@Override
+public boolean accept(CarbonFile file) {
+  if (file == null) {
+return false;
+  }
+  return file.getName().endsWith(extension);
+}
+  });
+  if (carbonFiles == null || carbonFiles.length < 1) {
+throw new IOException("Carbon file not exists.");
+  }
+  return carbonFiles;
+}
+return null;
+  }
+
+  /**
+   * read schema from path,
+   * path can be folder path, carbonindex file path, and carbondata file 
path
+   * and will not check all files schema
+   *
+   * @param path file/folder path
+   * @return schema
+   * @throws IOException
+   */
+  public static Schema readSchema(String path) throws IOException {
+return readSchema(path, false);
+  }
+
+  /**
+   * read schema from path,
+   * path can be folder path, carbonindex file path, and carbondata file 
path
+   * and user can decide whether check all files schema
+   *
+   * @param path file/folder path
+   * @param validateSchema whether check all files schema
+   * @return schema
+   * @throws IOException
+   */
+  public static Schema readSchema(String path, boolean validateSchema) 
throws IOException {
+if (path.endsWith(INDEX_FILE_EXT)) {
+  return readSchemaFromIndexFile(path);
+} else if (path.endsWith(CARBON_DATA_EXT)) {
+  return readSchemaFromDataFile(path);
+} else if (validateSchema) {
+  CarbonFile[] carbonIndexFiles = getCarbonFile(path, INDEX_FILE_EXT);
+  Schema schema;
+  if (carbonIndexFiles != null && carbonIndexFiles.length != 0) {
+schema = 
readSchemaFromIndexFile(carbonIndexFiles[0].getAbsolutePath());
+for (int i = 1; i < carbonIndexFiles.length; i++) {
+  Schema schema2 = 
readSchemaFromIndexFile(carbonIndexFiles[i].getAbsolutePath());
+  if (schema != schema2) {
+throw new CarbonDataLoadingException("Schema is different 
between different files.");
+  }
+}
+CarbonFile[] carbonDataFiles = getCarbonFile(path, 
CARBON_DATA_EXT);
+for (int i = 0; i < carbonDataFiles.length; i++) {
+  Schema schema2 = 
readSchemaFromDataFile(carbonDataFiles[i].getAbsolutePath());
+  if (!schema.equals(schema2)) {
+throw new CarbonDataLoadingException("Schema is different 
between different files.");
+  }
+}
+return schema;
+  } else {
+throw new CarbonDataLoadingException("No carbonindex file in this 
path.");
+  }
+} else {
+  String indexFilePath = getCarbonFile(path, 
INDEX_FILE_EXT)[0].getAbsolutePath();
+  if (indexFilePath != null) {
+return readSchemaFromIndexFile(indexFilePath);
+  } else {
+String dataFilePath = getCarbonFile(path, 
CARBON_DATA_EXT)[0].getAbsolutePath();
--- End diff --

As per the getCarbonFile(...) implementation, it throws an exception when no 
index file is found, so there is no need for this else case, right?


---


[GitHub] carbondata pull request #2804: [CARBONDATA-2996] CarbonSchemaReader support ...

2018-11-06 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2804#discussion_r231057030
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java 
---
@@ -61,14 +65,122 @@ public static Schema readSchemaInSchemaFile(String 
schemaFilePath) throws IOExce
 return new Schema(schemaList);
   }
 
+  /**
+   * get carbondata/carbonindex file in path
+   *
+   * @param path carbon file path
+   * @return CarbonFile array
+   */
+  private static CarbonFile[] getCarbonFile(String path, final String 
extension)
+  throws IOException {
+String dataFilePath = path;
+if (!(dataFilePath.contains(extension))) {
+  CarbonFile[] carbonFiles = FileFactory
+  .getCarbonFile(path)
+  .listFiles(new CarbonFileFilter() {
+@Override
+public boolean accept(CarbonFile file) {
+  if (file == null) {
+return false;
+  }
+  return file.getName().endsWith(extension);
+}
+  });
+  if (carbonFiles == null || carbonFiles.length < 1) {
+throw new IOException("Carbon file not exists.");
+  }
+  return carbonFiles;
+}
+return null;
+  }
+
+  /**
+   * read schema from path,
+   * path can be folder path, carbonindex file path, and carbondata file 
path
+   * and will not check all files schema
+   *
+   * @param path file/folder path
+   * @return schema
+   * @throws IOException
+   */
+  public static Schema readSchema(String path) throws IOException {
+return readSchema(path, false);
+  }
+
+  /**
+   * read schema from path,
+   * path can be folder path, carbonindex file path, and carbondata file 
path
+   * and user can decide whether check all files schema
+   *
+   * @param path file/folder path
+   * @param validateSchema whether check all files schema
+   * @return schema
+   * @throws IOException
+   */
+  public static Schema readSchema(String path, boolean validateSchema) 
throws IOException {
+if (path.endsWith(INDEX_FILE_EXT)) {
+  return readSchemaFromIndexFile(path);
+} else if (path.endsWith(CARBON_DATA_EXT)) {
+  return readSchemaFromDataFile(path);
+} else if (validateSchema) {
+  CarbonFile[] carbonIndexFiles = getCarbonFile(path, INDEX_FILE_EXT);
+  Schema schema;
+  if (carbonIndexFiles != null && carbonIndexFiles.length != 0) {
+schema = 
readSchemaFromIndexFile(carbonIndexFiles[0].getAbsolutePath());
+for (int i = 1; i < carbonIndexFiles.length; i++) {
+  Schema schema2 = 
readSchemaFromIndexFile(carbonIndexFiles[i].getAbsolutePath());
+  if (schema != schema2) {
--- End diff --

Use equals, i.e. schema.equals(schema2); the != operator compares object 
references, not the schema contents.


---


[GitHub] carbondata pull request #2804: [CARBONDATA-2996] CarbonSchemaReader support ...

2018-11-05 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2804#discussion_r230802487
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java 
---
@@ -61,14 +65,121 @@ public static Schema readSchemaInSchemaFile(String 
schemaFilePath) throws IOExce
 return new Schema(schemaList);
   }
 
+  /**
+   * get carbondata/carbonindex file in path
+   *
+   * @param path carbon file path
+   * @return CarbonFile array
+   */
+  private static CarbonFile[] getCarbonFile(String path, final String 
extension) {
+String dataFilePath = path;
+if (!(dataFilePath.contains(extension))) {
+  CarbonFile[] carbonFiles = FileFactory
+  .getCarbonFile(path)
+  .listFiles(new CarbonFileFilter() {
+@Override
+public boolean accept(CarbonFile file) {
+  if (file == null) {
+return false;
+  }
+  return file.getName().endsWith(extension);
+}
+  });
+  if (carbonFiles == null || carbonFiles.length < 1) {
+throw new RuntimeException("Carbon file not exists.");
--- End diff --

Why RuntimeException? IO-related failures should throw IOException.


---


[GitHub] carbondata pull request #2804: [CARBONDATA-2996] CarbonSchemaReader support ...

2018-11-05 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2804#discussion_r230801853
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java 
---
@@ -61,14 +65,121 @@ public static Schema readSchemaInSchemaFile(String 
schemaFilePath) throws IOExce
 return new Schema(schemaList);
   }
 
+  /**
+   * get carbondata/carbonindex file in path
+   *
+   * @param path carbon file path
+   * @return CarbonFile array
+   */
+  private static CarbonFile[] getCarbonFile(String path, final String 
extension) {
+String dataFilePath = path;
+if (!(dataFilePath.contains(extension))) {
+  CarbonFile[] carbonFiles = FileFactory
+  .getCarbonFile(path)
+  .listFiles(new CarbonFileFilter() {
+@Override
+public boolean accept(CarbonFile file) {
+  if (file == null) {
+return false;
+  }
+  return file.getName().endsWith(extension);
+}
+  });
+  if (carbonFiles == null || carbonFiles.length < 1) {
+throw new RuntimeException("Carbon file not exists.");
+  }
+  return carbonFiles;
+}
+return null;
+  }
+
+  /**
+   * read schema from path,
+   * path can be folder path, carbonindex file path, and carbondata file 
path
+   * and will not check all files schema
+   *
+   * @param path file/folder path
+   * @return schema
+   * @throws IOException
+   */
+  public static Schema readSchema(String path) throws IOException {
+return readSchema(path, false);
+  }
+
+  /**
+   * read schema from path,
+   * path can be folder path, carbonindex file path, and carbondata file 
path
+   * and user can decide whether check all files schema
+   *
+   * @param path file/folder path
+   * @param checkFilesSchema whether check all files schema
+   * @return schema
+   * @throws IOException
+   */
+  public static Schema readSchema(String path, boolean checkFilesSchema) 
throws IOException {
--- End diff --

readSchema(String path, boolean checkFilesSchema) 
-- Is this schema-validation variant really required? If there is no use 
case, we can skip it; the schema is validated during query execution anyway.


---


[GitHub] carbondata pull request #2804: [CARBONDATA-2996] CarbonSchemaReader support ...

2018-11-05 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2804#discussion_r230799912
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonSchemaReaderTest.java
 ---
@@ -101,18 +104,30 @@ public boolean accept(CarbonFile file) {
   String dataFilePath = carbonFiles[0].getAbsolutePath();
 
   Schema schema = CarbonSchemaReader
-  .readSchemaInDataFile(dataFilePath)
+  .readSchema(dataFilePath)
   .asOriginOrder();
 
   assertEquals(schema.getFieldsLength(), 12);
   checkSchema(schema);
+} catch (Throwable e) {
+  e.printStackTrace();
--- End diff --

The test should fail here instead of only printing the stack trace; see the 
sketch below.
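
For example, the catch block could delegate to JUnit instead of swallowing 
the error (a sketch; assumes org.junit.Assert is imported):

    } catch (Throwable e) {
      e.printStackTrace();
      // fail the test instead of silently ignoring the exception
      Assert.fail(e.getMessage());
    }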


---


[GitHub] carbondata pull request #2804: [CARBONDATA-2996] CarbonSchemaReader support ...

2018-11-05 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2804#discussion_r230788337
  
--- Diff: docs/sdk-guide.md ---
@@ -685,6 +685,31 @@ Find example code at 
[CarbonReaderExample](https://github.com/apache/carbondata/
*/
   public static Schema readSchemaInIndexFile(String indexFilePath);
 ```
+```
+  /**
+   * read schema from path,
+   * path can be folder path,carbonindex file path, and carbondata file 
path
+   * and will not check all files schema
+   *
+   * @param path file/folder path
+   * @return schema
+   * @throws IOException
+   */
+  public static Schema readSchema(String path);
+```
+```
+  /**
+   * read schema from path,
+   * path can be folder path,carbonindex file path, and carbondata file 
path
+   * and user can decide whether check all files schema
+   *
+   * @param path file/folder path
+   * @param checkFilesSchema whether check all files schema
+   * @return schema
+   * @throws IOException
+   */
+  public static Schema readSchema(String path, boolean checkFilesSchema);
--- End diff --

checkFilesSchema should be renamed to validateSchema.


---


[GitHub] carbondata issue #2816: [CARBONDATA-3003] Suppor read batch row in CSDK

2018-11-05 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2816
  
LGTM


---


[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...

2018-11-02 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2869#discussion_r230274806
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java 
---
@@ -51,6 +54,7 @@
   private Expression filterExpression;
   private String tableName;
   private Configuration hadoopConf;
+  private boolean useVectorReader;
--- End diff --

I suggest we make the vectorized reader the default, since we see a 
performance benefit. withVectorReader() can then be deprecated and kept only 
to disable the vector reader in case of issues, until it is stabilized. 
@ravipesala, is that fine?
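
A sketch of the suggested builder shape (the disable method's name is 
hypothetical, only to illustrate the deprecation idea):

    private boolean useVectorReader = true;  // vectorized reading on by default

    /**
     * Kept only as an escape hatch until the vector reader is stabilized.
     */
    @Deprecated
    public CarbonReaderBuilder disableVectorReader() {
      this.useVectorReader = false;
      return this;
    }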


---


[GitHub] carbondata issue #2884: [CARBONDATA-3063] Support set and get carbon propert...

2018-11-01 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2884
  
LGTM


---


[GitHub] carbondata issue #2807: [CARBONDATA-2997] Support read schema from index fil...

2018-11-01 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2807
  
LGTM


---


[GitHub] carbondata issue #2807: [CARBONDATA-2997] Support read schema from index fil...

2018-11-01 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2807
  
Ok Xubo, fine.


---


[GitHub] carbondata issue #2807: [CARBONDATA-2997] Support read schema from index fil...

2018-10-31 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2807
  
I suggest we hold this PR until #2804 is concluded and merged, to avoid 
duplicate work.


---


[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...

2018-10-31 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2804
  
I think we can enhance the existing APIs themselves to take a folder path 
and then list from the first available data file or index file. Adding many 
APIs to the SDK will cause confusion for the same functionality. I think we 
can unify them into a single method like getSchemaFromPath() that returns 
the schema irrespective of whether the path holds an index file, a data 
file, or multiple subfolders.


---


[GitHub] carbondata issue #2836: [CARBONDATA-3027] Increase unsafe working memory def...

2018-10-31 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2836
  
I think this PR may not be required any more after #2841. @xubo245, our 
intention is to make the unsafe configuration flexible; even with the 
default values we should be able to continue with heap memory.


---


[GitHub] carbondata issue #2877: [CARBONDATA-3061] Add validation for supported forma...

2018-10-31 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2877
  
LGTM


---


[GitHub] carbondata pull request #2877: [CARBONDATA-3061] Add validation for supporte...

2018-10-31 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2877#discussion_r229579618
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/metadata/encoder/Encoding.java ---
@@ -57,10 +62,55 @@ public static Encoding valueOf(int ordinal) {
   return ADAPTIVE_DELTA_INTEGRAL;
 } else if (ordinal == RLE_INTEGRAL.ordinal()) {
   return RLE_INTEGRAL;
+} else if (ordinal == DIRECT_STRING.ordinal()) {
--- End diff --

The DIRECT_STRING ordinal in the thrift file is 10, but in this enum it is 
11. Does that cause problems for old stores if anyone uses this method to 
resolve an encoding?
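
A small illustration of the hazard (a sketch only): Encoding.valueOf(int) 
resolves by the Java enum's ordinal, so an ordinal written per the thrift 
definition would map to whichever enum constant happens to sit at that 
position:

    int thriftOrdinal = 10;  // DIRECT_STRING as written by an old store
    Encoding resolved = Encoding.valueOf(thriftOrdinal);
    // resolved is whatever constant sits at enum ordinal 10,
    // which per this enum is not DIRECT_STRING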


---


[GitHub] carbondata pull request #2877: [CARBONDATA-3061] Add validation for supporte...

2018-10-31 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2877#discussion_r229578569
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java ---
@@ -696,9 +697,6 @@ private void validateCarbonDataFileVersion() {
 CarbonCommonConstants.CARBON_DATA_FILE_DEFAULT_VERSION);
   }
 }
-LOGGER.info("Carbon Current data file version: " + carbonProperties
--- End diff --

Is the log removal intentional? Is there no need to log the current version?


---


[GitHub] carbondata pull request #2877: [CARBONDATA-3061] Add validation for supporte...

2018-10-31 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2877#discussion_r229578345
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/metadata/encoder/Encoding.java ---
@@ -57,10 +62,55 @@ public static Encoding valueOf(int ordinal) {
   return ADAPTIVE_DELTA_INTEGRAL;
 } else if (ordinal == RLE_INTEGRAL.ordinal()) {
   return RLE_INTEGRAL;
+} else if (ordinal == DIRECT_STRING.ordinal()) {
+  return DIRECT_STRING;
+} else if (ordinal == ADAPTIVE_FLOATING.ordinal()) {
+  return ADAPTIVE_FLOATING;
+} else if (ordinal == BOOL_BYTE.ordinal()) {
+  return BOOL_BYTE;
+} else if (ordinal == ADAPTIVE_DELTA_FLOATING.ordinal()) {
+  return ADAPTIVE_DELTA_FLOATING;
 } else if (ordinal == DIRECT_COMPRESS_VARCHAR.ordinal()) {
   return DIRECT_COMPRESS_VARCHAR;
 } else {
   throw new RuntimeException("create Encoding with invalid ordinal: " 
+ ordinal);
 }
   }
+
+  /**
+   * Method to validate for supported encoding types that can be read 
using the current version
+   *
+   * @param encodings
+   * @return
+   */
+  public static boolean assertTrueForEncodingTypes(
--- End diff --

1) This method's return value is never used, so please change the return 
type to void.
2) Rename the method to validateEncodingTypes.


---


[GitHub] carbondata pull request #2877: [CARBONDATA-3061] Add validation for supporte...

2018-10-31 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2877#discussion_r229578078
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/metadata/encoder/Encoding.java ---
@@ -57,10 +62,55 @@ public static Encoding valueOf(int ordinal) {
   return ADAPTIVE_DELTA_INTEGRAL;
 } else if (ordinal == RLE_INTEGRAL.ordinal()) {
   return RLE_INTEGRAL;
+} else if (ordinal == DIRECT_STRING.ordinal()) {
+  return DIRECT_STRING;
+} else if (ordinal == ADAPTIVE_FLOATING.ordinal()) {
+  return ADAPTIVE_FLOATING;
+} else if (ordinal == BOOL_BYTE.ordinal()) {
+  return BOOL_BYTE;
+} else if (ordinal == ADAPTIVE_DELTA_FLOATING.ordinal()) {
+  return ADAPTIVE_DELTA_FLOATING;
 } else if (ordinal == DIRECT_COMPRESS_VARCHAR.ordinal()) {
   return DIRECT_COMPRESS_VARCHAR;
 } else {
   throw new RuntimeException("create Encoding with invalid ordinal: " 
+ ordinal);
 }
   }
+
+  /**
+   * Method to validate for supported encoding types that can be read 
using the current version
+   *
+   * @param encodings
+   * @return
+   */
+  public static boolean assertTrueForEncodingTypes(
+  List<org.apache.carbondata.format.Encoding> encodings) {
+if (null == encodings || encodings.isEmpty()) {
+  return true;
+}
+boolean supportedEncoding = true;
+for (org.apache.carbondata.format.Encoding encoder : encodings) {
+  try {
+// if case is handle unsupported encoding type. An encoding not 
supported for read will be
+// added as null by thrift during deserialization
+if (null == encoder) {
+  supportedEncoding = false;
+} else {
+  // if given encoding name is not supported exception will be 
thrown. This case can come
+  // if there is any change done in the ordinal of supported 
encoding
+  Encoding.valueOf(encoder.name());
+}
+  } catch (IllegalArgumentException ex) {
+supportedEncoding = false;
--- End diff --

Log the exception to identify which encoding name failed. 
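
Combined with the rename suggested in the earlier comment, one possible 
shape is the sketch below (LOGGER is assumed to be the class's logger, and 
throwing on an unsupported encoding is an assumption here, since the 
original boolean result was unused):

    public static void validateEncodingTypes(
        List<org.apache.carbondata.format.Encoding> encodings) {
      if (null == encodings || encodings.isEmpty()) {
        return;
      }
      for (org.apache.carbondata.format.Encoding encoder : encodings) {
        // an encoding not supported for read is deserialized as null by thrift
        if (null == encoder) {
          throw new UnsupportedOperationException("found unsupported (null) encoding");
        }
        try {
          Encoding.valueOf(encoder.name());
        } catch (IllegalArgumentException ex) {
          // log which encoding name failed before propagating
          LOGGER.error("unsupported encoding: " + encoder.name());
          throw new UnsupportedOperationException("unsupported encoding: " + encoder.name(), ex);
        }
      }
    }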


---


[GitHub] carbondata issue #2837: [CARBONDATA-3000] Provide C++ interface for writing ...

2018-10-30 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2837
  
LGTM


---


[GitHub] carbondata issue #2792: [CARBONDATA-2981] Support read primitive data type i...

2018-10-24 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2792
  
LGTM apart from those nits in guide & ut.


---


[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...

2018-10-24 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2792#discussion_r227839131
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1522,4 +1522,204 @@ public boolean accept(File dir, String name) {
   e.printStackTrace();
 }
   }
+
+   @Test
+  public void testReadNextRowWithRowUtil() {
+String path = "./carbondata";
+try {
+  FileUtils.deleteDirectory(new File(path));
+
+  Field[] fields = new Field[12];
+  fields[0] = new Field("stringField", DataTypes.STRING);
+  fields[1] = new Field("shortField", DataTypes.SHORT);
+  fields[2] = new Field("intField", DataTypes.INT);
+  fields[3] = new Field("longField", DataTypes.LONG);
+  fields[4] = new Field("doubleField", DataTypes.DOUBLE);
+  fields[5] = new Field("boolField", DataTypes.BOOLEAN);
+  fields[6] = new Field("dateField", DataTypes.DATE);
+  fields[7] = new Field("timeField", DataTypes.TIMESTAMP);
+  fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 
2));
+  fields[9] = new Field("varcharField", DataTypes.VARCHAR);
+  fields[10] = new Field("arrayField", 
DataTypes.createArrayType(DataTypes.STRING));
+  fields[11] = new Field("floatField", DataTypes.FLOAT);
+  Map<String, String> map = new HashMap<>();
+  map.put("complex_delimiter_level_1", "#");
+  CarbonWriter writer = CarbonWriter.builder()
+  .outputPath(path)
+  .withLoadOptions(map)
+  .withCsvInput(new Schema(fields)).build();
+
+  for (int i = 0; i < 10; i++) {
+String[] row2 = new String[]{
+"robot" + (i % 10),
+String.valueOf(i % 1),
+String.valueOf(i),
+String.valueOf(Long.MAX_VALUE - i),
+String.valueOf((double) i / 2),
+String.valueOf(true),
+"2019-03-02",
+"2019-02-12 03:03:34",
+"12.345",
+"varchar",
+"Hello#World#From#Carbon",
+"1.23"
+};
+writer.write(row2);
+  }
+  writer.close();
+
+  File[] dataFiles = new File(path).listFiles(new FilenameFilter() {
+@Override
+public boolean accept(File dir, String name) {
+  if (name == null) {
+return false;
+  }
+  return name.endsWith("carbonindex");
+}
+  });
+  if (dataFiles == null || dataFiles.length < 1) {
+throw new RuntimeException("Carbon index file not exists.");
+  }
+  Schema schema = CarbonSchemaReader
+  .readSchemaInIndexFile(dataFiles[0].getAbsolutePath())
+  .asOriginOrder();
+  // Transform the schema
+  int count = 0;
+  for (int i = 0; i < schema.getFields().length; i++) {
+if (!((schema.getFields())[i].getFieldName().contains("."))) {
+  count++;
+}
+  }
+  String[] strings = new String[count];
+  int index = 0;
+  for (int i = 0; i < schema.getFields().length; i++) {
+if (!((schema.getFields())[i].getFieldName().contains("."))) {
+  strings[index] = (schema.getFields())[i].getFieldName();
+  index++;
+}
+  }
+  // Read data
+  CarbonReader reader = CarbonReader
+  .builder(path, "_temp")
+  .projection(strings)
+  .build();
+
+  int i = 0;
+  while (reader.hasNext()) {
+Object[] data = (Object[]) reader.readNextRow();
+
+assert (RowUtil.getString(data, 0).equals("robot" + i));
+assertEquals(RowUtil.getShort(data, 1), i);
+assertEquals(RowUtil.getInt(data, 2), i);
+assertEquals(RowUtil.getLong(data, 3), Long.MAX_VALUE - i);
+assertEquals(RowUtil.getDouble(data, 4), ((double) i) / 2);
+assert (RowUtil.getBoolean(data, 5));
+assertEquals(RowUtil.getInt(data, 6), 17957);
+assert (RowUtil.getDecimal(data, 8).equals("12.35"));
+assert (RowUtil.getVarchar(data, 9).equals("varchar"));
+
+Object[] arr = RowUtil.getArray(data, 10);
+assert (arr[0].equals("Hello"));
+assert (arr[1].equals("World"));
+assert (arr[2].equals("From"));
+assert (arr[3].equals("Carbon"));
+
+assertEquals(RowUtil.getFloat(data, 11), (float) 1.23);
+i++;
+  }
+  reader.close();
+} catch (Throwable e) {
+  e.printStackTrace();
--- End diff --

I think we should make the test fail on any exception; it should not be 
ignored.


---


[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...

2018-10-24 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2792#discussion_r227836644
  
--- Diff: docs/csdk-guide.md ---
@@ -106,20 +128,41 @@ bool readFromS3(JNIEnv *env, char *argv[]) {
 // "your endPoint"
 args[2] = argv[3];
 
-reader.builder(env, "s3a://sdk/WriterOutput", "test");
-reader.withHadoopConf(3, args);
+reader.builder(env, "s3a://sdk/WriterOutput/carbondata/", "test");
+reader.withHadoopConf("fs.s3a.access.key", argv[1]);
+reader.withHadoopConf("fs.s3a.secret.key", argv[2]);
+reader.withHadoopConf("fs.s3a.endpoint", argv[3]);
 reader.build();
 printf("\nRead data from S3:\n");
+CarbonRow carbonRow(env);
 while (reader.hasNext()) {
-jobjectArray row = reader.readNextRow();
-jsize length = env->GetArrayLength(row);
-
+jobject row = reader.readNextCarbonRow();
--- End diff --

readNextCarbonRow has been removed from this PR, so this doc snippet needs 
to be updated.


---


[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...

2018-10-24 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2792#discussion_r227836440
  
--- Diff: docs/csdk-guide.md ---
@@ -68,20 +68,42 @@ JNIEnv *initJVM() {
 bool readFromLocalWithoutProjection(JNIEnv *env) {
 
 CarbonReader carbonReaderClass;
-carbonReaderClass.builder(env, "../resources/carbondata", "test");
+carbonReaderClass.builder(env, "../resources/carbondata");
 carbonReaderClass.build();
 
+printf("\nRead data from local without projection:\n");
+
+CarbonRow carbonRow(env);
 while (carbonReaderClass.hasNext()) {
-jobjectArray row = carbonReaderClass.readNextRow();
-jsize length = env->GetArrayLength(row);
+jobject row = carbonReaderClass.readNextCarbonRow();
--- End diff --

Needs correction: readNextCarbonRow is no longer present.


---


[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...

2018-10-24 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2792#discussion_r227741794
  
--- Diff: store/CSDK/CarbonRow.cpp ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+#include 
+#include "CarbonRow.h"
+
+CarbonRow::CarbonRow(JNIEnv *env) {
+this->rowUtilClass = 
env->FindClass("org/apache/carbondata/sdk/file/RowUtil");
+this->jniEnv = env;
+}
+
+void CarbonRow::setCarbonRow(jobject data) {
+this->carbonRow = data;
+}
+
+short CarbonRow::getShort(int ordinal) {
+jmethodID buildID = jniEnv->GetStaticMethodID(rowUtilClass, "getShort",
+"([Ljava/lang/Object;I)S");
+jvalue args[2];
+args[0].l = carbonRow;
+args[1].i = ordinal;
+return jniEnv->CallStaticShortMethodA(rowUtilClass, buildID, args);
+}
+
+int CarbonRow::getInt(int ordinal) {
+jmethodID buildID = jniEnv->GetStaticMethodID(rowUtilClass, "getInt",
+"([Ljava/lang/Object;I)I");
+jvalue args[2];
+args[0].l = carbonRow;
+args[1].i = ordinal;
+return jniEnv->CallStaticIntMethodA(rowUtilClass, buildID, args);
+}
+
+long CarbonRow::getLong(int ordinal) {
+jmethodID buildID = jniEnv->GetStaticMethodID(rowUtilClass, "getLong",
+"([Ljava/lang/Object;I)J");
+jvalue args[2];
+args[0].l = carbonRow;
+args[1].i = ordinal;
+return jniEnv->CallStaticLongMethodA(rowUtilClass, buildID, args);
+}
+
+double CarbonRow::getDouble(int ordinal) {
+jmethodID buildID = jniEnv->GetStaticMethodID(rowUtilClass, 
"getDouble",
+"([Ljava/lang/Object;I)D");
+jvalue args[2];
+args[0].l = carbonRow;
+args[1].i = ordinal;
+return jniEnv->CallStaticDoubleMethodA(rowUtilClass, buildID, args);
+}
+
+
+float CarbonRow::getFloat(int ordinal) {
+jmethodID buildID = jniEnv->GetStaticMethodID(rowUtilClass, "getFloat",
+"([Ljava/lang/Object;I)F");
+jvalue args[2];
+args[0].l = carbonRow;
+args[1].i = ordinal;
+return jniEnv->CallStaticFloatMethodA(rowUtilClass, buildID, args);
+}
+
+jboolean CarbonRow::getBoolean(int ordinal) {
+jmethodID buildID = jniEnv->GetStaticMethodID(rowUtilClass, 
"getBoolean",
+"([Ljava/lang/Object;I)Z");
+jvalue args[2];
+args[0].l = carbonRow;
+args[1].i = ordinal;
+return jniEnv->CallStaticBooleanMethodA(rowUtilClass, buildID, args);
+}
+
+char *CarbonRow::getString(int ordinal) {
+jmethodID buildID = jniEnv->GetStaticMethodID(rowUtilClass, 
"getString",
+"([Ljava/lang/Object;I)Ljava/lang/String;");
+jvalue args[2];
+args[0].l = carbonRow;
+args[1].i = ordinal;
+jobject data = jniEnv->CallStaticObjectMethodA(rowUtilClass, buildID, 
args);
+
+char *str = (char *) jniEnv->GetStringUTFChars((jstring) data, 
JNI_FALSE);
+return str;
+}
+
+char *CarbonRow::getDecimal(int ordinal) {
+jmethodID buildID = jniEnv->GetStaticMethodID(rowUtilClass, 
"getDecimal",
--- End diff --

jmethodID buildID = jniEnv->GetStaticMethodID(rowUtilClass, "getDecimal",
 "([Ljava/lang/Object;I)Ljava/lang/String;");
jvalue args[2];

Looking up the static method ID and initializing the argument array is done 
for every row read. Creating them once and reusing them may improve 
performance. Can you please try? A sketch of the idea follows.
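
The same look-up-once pattern, sketched in Java with reflection for brevity 
(in the C++ class the analogous change is resolving each jmethodID once in 
the CarbonRow constructor and storing it in a member field; the class name 
here is hypothetical):

    import java.lang.reflect.Method;
    import org.apache.carbondata.sdk.file.RowUtil;

    public class CachedDecimalAccessor {
      private final Method getDecimal;  // resolved once, not per row

      public CachedDecimalAccessor() throws NoSuchMethodException {
        this.getDecimal = RowUtil.class.getMethod("getDecimal", Object[].class, int.class);
      }

      public String get(Object[] row, int ordinal) throws Exception {
        return (String) getDecimal.invoke(null, row, ordinal);  // static, no receiver
      }
    }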



---


[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...

2018-10-04 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2792#discussion_r222691292
  
--- Diff: store/CSDK/CarbonReader.cpp ---
@@ -89,10 +89,18 @@ jboolean CarbonReader::hasNext() {
 return hasNext;
 }
 
+jobject CarbonReader::readNextCarbonRow() {
+jclass carbonReader = jniEnv->GetObjectClass(carbonReaderObject);
+jmethodID readNextCarbonRowID = jniEnv->GetMethodID(carbonReader, 
"readNextCarbonRow",
+"()Lorg/apache/carbondata/core/datastore/row/CarbonRow;");
+jobject carbonRow = (jobject) 
jniEnv->CallObjectMethod(carbonReaderObject, readNextCarbonRowID);
+return carbonRow;
+}
+
 jobjectArray CarbonReader::readNextRow() {
 jclass carbonReader = jniEnv->GetObjectClass(carbonReaderObject);
-jmethodID readNextRow2ID = jniEnv->GetMethodID(carbonReader, 
"readNextStringRow", "()[Ljava/lang/Object;");
-jobjectArray row = (jobjectArray) 
jniEnv->CallObjectMethod(carbonReaderObject, readNextRow2ID);
+jmethodID readNextStringRowID = jniEnv->GetMethodID(carbonReader, 
"readNextStringRow", "()[Ljava/lang/Object;");
--- End diff --

We can remove "readNextStringRow" and add a utility method in JNI to 
achieve the same.


---


[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...

2018-10-04 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2792#discussion_r222691023
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/row/CarbonRow.java ---
@@ -18,8 +18,11 @@
 package org.apache.carbondata.core.datastore.row;
 
 import java.io.Serializable;
+import java.math.BigDecimal;
--- End diff --

CarbonRow has various fields like data, rawData, rangeID, etc., so it does 
not seem intended for an end-user API. I think we can add a simple Row class 
scoped to the SDK; see the sketch below.
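
A minimal sketch of such an SDK-scoped row type (the name is hypothetical; 
the RowUtil helpers this PR eventually adds follow the same idea with static 
methods):

    // exposes only typed getters over the Object[] a reader produces,
    // hiding CarbonRow internals such as rawData and rangeID
    public class Row {
      private final Object[] data;

      public Row(Object[] data) {
        this.data = data;
      }

      public int getInt(int ordinal) {
        return (int) data[ordinal];
      }

      public String getString(int ordinal) {
        return (String) data[ordinal];
      }
      // ... one getter per supported data type
    }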


---


[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...

2018-10-04 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2792#discussion_r222690123
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/readsupport/impl/DictionaryDecodeReadSupport.java
 ---
@@ -81,7 +82,24 @@
 data[i] = dictionaries[i].getDictionaryValueForKey((int) data[i]);
   }
 }
-return (T)data;
+return (T) data;
+  }
+
+  /**
+   * get carbonRow, including data and datatpes
+   *
+   * @param data row data
+   * @return CarbonRow Object
+   */
+  public T readCarbonRow(Object[] data) {
--- End diff --

Instead of changing the DictionaryDecodeReadSupport and other class 
hierarchies, I suggest using a new Row class as a utility and providing only 
the required methods, to avoid impact on the base code.


---


[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...

2018-10-04 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2792#discussion_r222689040
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonRecordReader.java ---
@@ -116,6 +117,25 @@ public void initialize(InputSplit inputSplit, 
TaskAttemptContext context)
 return readSupport.readRow(carbonIterator.next());
   }
 
+  /**
+   * get CarbonRow data, including data and datatypes
+   *
+   * @return carbonRow object or data array or T
+   * @throws IOException
+   * @throws InterruptedException
+   */
+  public T getCarbonRow() throws IOException, InterruptedException {
--- End diff --

I think that instead of the confusing generic T, we can declare the return 
type as CarbonRow itself.


---


[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...

2018-10-04 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2792#discussion_r222686566
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/row/CarbonRow.java ---
@@ -57,6 +74,154 @@ public String getString(int ordinal) {
 return (String) data[ordinal];
   }
 
+  /**
+   * get short type data by ordinal
+   *
+   * @param ordinal the data index of carbonRow
+   * @return
+   */
+  public short getShort(int ordinal) {
+return (short) data[ordinal];
+  }
+
+  /**
+   * get int data type data by ordinal
+   *
+   * @param ordinal the data index of carbonRow
+   * @return
+   */
+  public int getInt(int ordinal) {
+return (Integer) data[ordinal];
+  }
+
+  /**
+   * get long data type data by ordinal
+   *
+   * @param ordinal the data index of carbonRow
+   * @return
+   */
+  public long getLong(int ordinal) {
+return (long) data[ordinal];
+  }
+
+  /**
+   * get array data type data by ordinal
+   *
+   * @param ordinal the data index of carbonRow
+   * @return
+   */
+  public Object[] getArray(int ordinal) {
+return (Object[]) data[ordinal];
+  }
+
+  /**
+   * get double data type data by ordinal
+   *
+   * @param ordinal the data index of carbonRow
+   * @return
+   */
+  public double getDouble(int ordinal) {
+return (double) data[ordinal];
+  }
+
+  /**
+   * get boolean data type data by ordinal
+   *
+   * @param ordinal the data index of carbonRow
+   * @return
+   */
+  public boolean getBoolean(int ordinal) {
+return (boolean) data[ordinal];
+  }
+
+  /**
+   * get byte data type data by ordinal
+   *
+   * @param ordinal the data index of carbonRow
+   * @return
+   */
+  public Byte getByte(int ordinal) {
+return (Byte) data[ordinal];
+  }
+
+  /**
+   * get float data type data by ordinal
+   *
+   * @param ordinal the data index of carbonRow
+   * @return
+   */
+  public float getFloat(int ordinal) {
+return (float) data[ordinal];
+  }
+
+  /**
+   * get varchar data type data by ordinal
+   * This is for CSDK
+   * JNI don't support varchar, so carbon convert decimal to string
+   *
+   * @param ordinal the data index of carbonRow
+   * @return
+   */
+  public String getVarchar(int ordinal) {
+return (String) data[ordinal];
+  }
+
+  /**
+   * get decimal data type data by ordinal
+   * This is for CSDK
+   * JNI don't support Decimal, so carbon convert decimal to string
+   *
+   * @param ordinal the data index of carbonRow
+   * @return
+   */
+  public String getDecimal(int ordinal) {
+return ((BigDecimal) data[ordinal]).toString();
+  }
+
+  /**
+   * get data type by ordinal
+   *
+   * @param ordinal the data index of carbonRow
+   * @return
+   */
+  public DataType getDataType(int ordinal) {
+return dataTypes[ordinal];
+  }
+
+  /**
+   * get data type name by ordinal
+   *
+   * @param ordinal the data index of carbonRow
+   * @return
+   */
+  public String getDataTypeName(int ordinal) {
+return dataTypes[ordinal].getName();
+  }
+
+  /**
+   * get element type name by ordinal
+   * child schema data type name
+   * for example: return STRING if it's Array in java
+   *
+   * @param ordinal the data index of carbonRow
+   * @return element type name
+   */
+  public String getElementTypeName(int ordinal) {
--- End diff --

If this method only works for arrays, we can rename it to 
getArrayElementTypeName and throw an exception if the column is not an array 
type; returning null causes integration errors for unsupported data types. 
A sketch follows.
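
A sketch of the renamed method (assuming the DataTypes.isArrayType check and 
an ArrayType.getElementType() accessor are available):

    public String getArrayElementTypeName(int ordinal) {
      DataType type = dataTypes[ordinal];
      if (!DataTypes.isArrayType(type)) {
        // fail fast: returning null here causes integration errors downstream
        throw new UnsupportedOperationException(
            "getArrayElementTypeName called on non-array type: " + type.getName());
      }
      return ((ArrayType) type).getElementType().getName();
    }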


---


[GitHub] carbondata pull request #2791: [HOTFIX]correct the exception handling in loo...

2018-10-04 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2791#discussion_r222594726
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
 ---
@@ -208,7 +209,10 @@ class CarbonFileMetastore extends CarbonMetaStore {
 try {
   lookupRelation(tableIdentifier)(sparkSession)
 } catch {
-  case _: Exception =>
+  case ex: Exception =>
+if (ex.getCause.isInstanceOf[HiveException]) {
+  throw ex
+}
--- End diff --

Exceptions from different hierarchies can be raised in the table-not-found 
case; not all of them are HiveException. For example, NoSuchTableException 
extends AnalysisException, while InvalidTableException extends 
HiveException. So I think we should refer to the Spark code for the 
exceptions related to a table not existing or not being found, and return 
false only for those exception cases. All other exceptions we can throw 
back to the caller.


---


[GitHub] carbondata pull request #2794: [CARBONDATA-2985]Fix issues in Table level co...

2018-10-03 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2794#discussion_r222330949
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
 ---
@@ -744,6 +744,7 @@ private static long getSizeOfSegment(String tablePath, 
String segId) {
 if (size >= 2) {
   level1Size = noOfSegmentLevelsCount[0];
   level2Size = noOfSegmentLevelsCount[1];
+  level2Size = level2Size == 1 ? 0 : level2Size;
--- End diff --

Please add a comment explaining why this case needs to be handled.


---


[GitHub] carbondata pull request #2791: [HOTFIX]correct the exception handling in loo...

2018-10-03 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2791#discussion_r52487
  
--- Diff: 
integration/spark2/src/main/spark2.1/org/apache/spark/sql/hive/CarbonSessionState.scala
 ---
@@ -141,7 +141,13 @@ class CarbonHiveSessionCatalog(
*/
   override def lookupRelation(name: TableIdentifier,
   alias: Option[String]): LogicalPlan = {
-val rtnRelation = super.lookupRelation(name, alias)
+val rtnRelation =
+  try {
+super.lookupRelation(name, alias)
+  } catch {
+case ex: Exception =>
--- End diff --

This change is not required; it just catches and rethrows the same 
exception.


---


[GitHub] carbondata issue #2780: [CARBONDATA-2982] CarbonSchemaReader support array

2018-10-03 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2780
  
LGTM


---


[GitHub] carbondata pull request #2780: [CARBONDATA-2982] CarbonSchemaReader support ...

2018-09-28 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2780#discussion_r221246385
  
--- Diff: 
store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java ---
@@ -1435,5 +1435,99 @@ public void 
testReadWithFilterOfnonTransactionalwithsubfolders() throws IOExcept
 FileUtils.deleteDirectory(new File("./testWriteFiles"));
   }
 
+  @Test
+  public void testReadSchemaFromDataFileArrayString() {
+String path = "./testWriteFiles";
+try {
+  FileUtils.deleteDirectory(new File(path));
+
+  Field[] fields = new Field[11];
+  fields[0] = new Field("stringField", DataTypes.STRING);
+  fields[1] = new Field("shortField", DataTypes.SHORT);
+  fields[2] = new Field("intField", DataTypes.INT);
+  fields[3] = new Field("longField", DataTypes.LONG);
+  fields[4] = new Field("doubleField", DataTypes.DOUBLE);
+  fields[5] = new Field("boolField", DataTypes.BOOLEAN);
+  fields[6] = new Field("dateField", DataTypes.DATE);
+  fields[7] = new Field("timeField", DataTypes.TIMESTAMP);
+  fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 
2));
+  fields[9] = new Field("varcharField", DataTypes.VARCHAR);
+  fields[10] = new Field("arrayField", 
DataTypes.createArrayType(DataTypes.STRING));
+  Map<String, String> map = new HashMap<>();
+  map.put("complex_delimiter_level_1", "#");
+  CarbonWriter writer = CarbonWriter.builder()
+  .outputPath(path)
+  .withLoadOptions(map)
+  .withCsvInput(new Schema(fields)).build();
+
+  for (int i = 0; i < 10; i++) {
+String[] row2 = new String[]{
+"robot" + (i % 10),
+String.valueOf(i % 1),
+String.valueOf(i),
+String.valueOf(Long.MAX_VALUE - i),
+String.valueOf((double) i / 2),
+String.valueOf(true),
+"2019-03-02",
+"2019-02-12 03:03:34",
+"12.345",
+"varchar",
+"Hello#World#From#Carbon"
+};
+writer.write(row2);
+  }
+  writer.close();
 
+  File[] dataFiles = new File(path).listFiles(new FilenameFilter() {
+@Override
+public boolean accept(File dir, String name) {
+  if (name == null) {
+return false;
+  }
+  return name.endsWith("carbondata");
+}
+  });
+  if (dataFiles == null || dataFiles.length < 1) {
+throw new RuntimeException("Carbon index file not exists.");
+  }
+  Schema schema = CarbonSchemaReader
+  .readSchemaInDataFile(dataFiles[0].getAbsolutePath())
+  .asOriginOrder();
+  // Transform the schema
+  String[] strings = new String[schema.getFields().length];
+  for (int i = 0; i < schema.getFields().length; i++) {
+strings[i] = (schema.getFields())[i].getFieldName();
+  }
+
+  // Read data
+  CarbonReader reader = CarbonReader
+  .builder(path, "_temp")
+  .projection(strings)
+  .build();
+
+  System.out.println("\nData:");
+  long day = 24L * 3600 * 1000;
+  int i = 0;
+  while (reader.hasNext()) {
+Object[] row = (Object[]) reader.readNextRow();
+
System.out.println(String.format("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t",
+i, row[0], row[1], row[2], row[3], row[4], row[5],
+new Date((day * ((int) row[6]))), new Timestamp((long) row[7] 
/ 1000),
+row[8], row[9]
+));
+Object[] arr = (Object[]) row[10];
+for (int j = 0; j < arr.length; j++) {
+  System.out.print(arr[j] + " ");
+}
+System.out.println();
+i++;
+  }
+  System.out.println("\nFinished");
--- End diff --

Please remove the System.out content.


---


[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...

2018-09-28 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2738#discussion_r221218207
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -90,6 +91,33 @@ public T readNextRow() throws IOException, 
InterruptedException {
 return currentReader.getCurrentValue();
   }
 
+  /**
+   * Read and return next string row object
+   */
+  public Object[] readNextStringRow() throws IOException, 
InterruptedException {
--- End diff --

OK. Please document this limitation (only single-dimension arrays are 
supported) on the method signature, and remove it after enhancing with the 
CarbonRow-based API in the other PR.


---


[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...

2018-09-28 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2738#discussion_r221149564
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -90,6 +91,33 @@ public T readNextRow() throws IOException, 
InterruptedException {
 return currentReader.getCurrentValue();
   }
 
+  /**
+   * Read and return next string row object
+   */
+  public Object[] readNextStringRow() throws IOException, 
InterruptedException {
--- End diff --

readNextStringRow is not a standard API, I think. It cannot work with 
nested complex data types such as Array[Array[...]] or Struct[Array]. 

I suggest adding a better API, such as a CarbonRow wrapping the Object[], 
and giving the reader utility methods like getInt and getString on CarbonRow.

If that can't be finished within this PR's scope, you can temporarily move 
this string-making logic to the JNI layer code so that it can be improved 
later. **Adding it in the Java SDK layer will only add more confusion for 
users.**


---


[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...

2018-09-27 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2738#discussion_r220802829
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java 
---
@@ -101,6 +101,20 @@ public CarbonReaderBuilder 
withHadoopConf(Configuration conf) {
 return this;
   }
 
+
+  public CarbonReaderBuilder withHadoopConf(String[] args) {
+Configuration configuration = new Configuration();
--- End diff --

My comment is the same: we should not replace the entire conf object every 
time. Change this method to withHadoopConf(String key, String value) and 
set the property on a class-level hadoopConfiguration variable, as sketched 
below.
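
A sketch of that shape on CarbonReaderBuilder (hadoopConf being the existing 
class-level field):

    public CarbonReaderBuilder withHadoopConf(String key, String value) {
      if (this.hadoopConf == null) {
        this.hadoopConf = new Configuration();  // created once, on first use
      }
      this.hadoopConf.set(key, value);          // later calls only update values
      return this;
    }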


---


[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...

2018-09-27 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2738#discussion_r220802212
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java 
---
@@ -101,6 +101,20 @@ public CarbonReaderBuilder 
withHadoopConf(Configuration conf) {
 return this;
   }
 
+
+  public CarbonReaderBuilder withHadoopConf(String[] args) {
--- End diff --

Yes, we can add withHadoopConf(String key, String value) to the Java API as 
well.


---


[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...

2018-09-26 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2738#discussion_r220566235
  
--- Diff: store/CSDK/CarbonReader.cpp ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "CarbonReader.h"
+#include 
+
+jobject CarbonReader::builder(JNIEnv *env, char *path, char *tableName) {
+
+jniEnv = env;
+jclass carbonReaderClass = 
env->FindClass("org/apache/carbondata/sdk/file/CarbonReader");
+jmethodID carbonReaderBuilderID = 
env->GetStaticMethodID(carbonReaderClass, "builder",
+
"(Ljava/lang/String;Ljava/lang/String;)Lorg/apache/carbondata/sdk/file/CarbonReaderBuilder;");
+jstring jpath = env->NewStringUTF(path);
+jstring jtableName = env->NewStringUTF(tableName);
+jvalue args[2];
+args[0].l = jpath;
+args[1].l = jtableName;
+carbonReaderBuilderObject = 
env->CallStaticObjectMethodA(carbonReaderClass, carbonReaderBuilderID, args);
+return carbonReaderBuilderObject;
+}
+
+jobject CarbonReader::projection(int argc, char *argv[]) {
+jclass carbonReaderBuilderClass = 
jniEnv->GetObjectClass(carbonReaderBuilderObject);
+jmethodID buildID = jniEnv->GetMethodID(carbonReaderBuilderClass, 
"projection",
+
"([Ljava/lang/String;)Lorg/apache/carbondata/sdk/file/CarbonReaderBuilder;");
+jclass objectArrayClass = jniEnv->FindClass("Ljava/lang/String;");
+jobjectArray array = jniEnv->NewObjectArray(argc, objectArrayClass, 
NULL);
+for (int i = 0; i < argc; ++i) {
+jstring value = jniEnv->NewStringUTF(argv[i]);
+jniEnv->SetObjectArrayElement(array, i, value);
+}
+
+jvalue args[1];
+args[0].l = array;
+carbonReaderBuilderObject = 
jniEnv->CallObjectMethodA(carbonReaderBuilderObject, buildID, args);
+return carbonReaderBuilderObject;
+}
+
+jobject CarbonReader::withHadoopConf(int argc, char *argv[]) {
--- End diff --

As suggested in another comment, let's add a simple API that accepts a key 
and a value.


---


[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...

2018-09-26 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2738#discussion_r220565596
  
--- Diff: store/sdk/pom.xml ---
@@ -39,6 +39,16 @@
      <artifactId>hadoop-aws</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
+    <dependency>
+      <groupId>org.apache.httpcomponents</groupId>
+      <artifactId>httpclient</artifactId>
+      <version>4.2</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-common</artifactId>
--- End diff --

carbondata-hadoop already depends on Hadoop. Why add it explicitly here?


---


[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...

2018-09-26 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2738#discussion_r220562265
  
--- Diff: store/sdk/pom.xml ---
@@ -39,6 +39,16 @@
      <artifactId>hadoop-aws</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
+    <dependency>
+      <groupId>org.apache.httpcomponents</groupId>
+      <artifactId>httpclient</artifactId>
--- End diff --

The httpclient version needed may differ across Hadoop versions, so it 
should be pulled in via the carbondata parent POM. Can you verify and avoid 
declaring it here?


---


[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...

2018-09-26 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2738#discussion_r220560348
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -90,6 +91,33 @@ public T readNextRow() throws IOException, 
InterruptedException {
 return currentReader.getCurrentValue();
   }
 
+  /**
+   * Read and return next string row object
+   */
+  public Object[] readNextStringRow() throws IOException, 
InterruptedException {
--- End diff --

Converting the objects to String is not recommended; the representation of 
values such as double and date may change. So please use an adapter in the 
JNI layer from each Java data type to the corresponding C data type.



---


[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...

2018-09-26 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2738#discussion_r220559298
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java 
---
@@ -101,6 +101,20 @@ public CarbonReaderBuilder 
withHadoopConf(Configuration conf) {
 return this;
   }
 
+
+  public CarbonReaderBuilder withHadoopConf(String[] args) {
+Configuration configuration = new Configuration();
--- End diff --

We should not create a new Configuration every time. It should be created 
once, on the first call, and then only updated with the new value on all 
further calls to this method.


---


[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...

2018-09-26 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2738#discussion_r220558655
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java 
---
@@ -101,6 +101,20 @@ public CarbonReaderBuilder 
withHadoopConf(Configuration conf) {
 return this;
   }
 
+
+  public CarbonReaderBuilder withHadoopConf(String[] args) {
--- End diff --

I think this API is not clear. Instead, we can support a method like withHadoopConf(String key, String value); then we don't have to parse anything in the SDK Java layer, and the C JNI layer can invoke it for every property the user wants to overwrite.
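
A minimal sketch of the suggested method (assuming the builder keeps a cached Configuration field named hadoopConf, which is a hypothetical name):

public CarbonReaderBuilder withHadoopConf(String key, String value) {
  if (this.hadoopConf == null) {
    this.hadoopConf = new Configuration();
  }
  this.hadoopConf.set(key, value);
  return this;
}

The C JNI layer can then invoke this once per property instead of packing everything into a String[] that the Java side has to parse.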




---


[GitHub] carbondata issue #2697: [HOTFIX] support "carbon.load.directWriteHdfs.enable...

2018-09-25 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2697
  
retest this please


---


[GitHub] carbondata pull request #2715: [CARBONDATA-2930] Support customize column co...

2018-09-14 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2715#discussion_r217682375
  
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java ---
@@ -62,15 +67,54 @@ public Compressor getCompressor() {
   }
 
   private CompressorFactory() {
-    for (SupportedCompressor supportedCompressor : SupportedCompressor.values()) {
-      compressors.put(supportedCompressor.getName(), supportedCompressor);
+    for (NativeSupportedCompressor nativeSupportedCompressor : NativeSupportedCompressor.values()) {
+      allSupportedCompressors.put(nativeSupportedCompressor.getName(),
+          nativeSupportedCompressor.getCompressor());
     }
   }
 
   public static CompressorFactory getInstance() {
     return COMPRESSOR_FACTORY;
   }
 
+  /**
+   * register the compressor using reflection.
+   * If the class name of the compressor has already been registered before, it will return false;
+   * If the reflection fails to work or the compressor name has problem, it will throw
+   * RunTimeException; If it is registered successfully, it will return true.
+   *
+   * @param compressorClassName full class name of the compressor
+   * @return true if register successfully, false if failed.
+   */
+  private Compressor registerColumnCompressor(String compressorClassName) {
+    if (allSupportedCompressors.containsKey(compressorClassName)) {
+      return allSupportedCompressors.get(compressorClassName);
+    }
+
+    Class clazz;
+    try {
+      clazz = Class.forName(compressorClassName);
+      Object instance = clazz.newInstance();
+      if (instance instanceof Compressor) {
+        if (!((Compressor) instance).getName().equals(compressorClassName)) {
+          throw new RuntimeException(String.format("For not carbondata native supported compressor,"
+              + " the result of method getName() should be the full class name. Expected '%s',"
+              + " found '%s'", compressorClassName, ((Compressor) instance).getName()));
+        }
+        allSupportedCompressors.put(compressorClassName, (Compressor) instance);
--- End diff --

Please add an info log when a new compressor is registered.
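
For example (the placement and logger name are assumptions; LOGGER is taken to be the class's existing log field):

allSupportedCompressors.put(compressorClassName, (Compressor) instance);
LOGGER.info("Successfully registered compressor " + compressorClassName + " by reflection");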



---


[GitHub] carbondata issue #2719: [HOTFIX] changed AbstractDFSFileSystem to use existi...

2018-09-14 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2719
  
LGTM


---


[GitHub] carbondata issue #2704: [HOTFIX] Old stores cannot read with new table infer...

2018-09-12 Thread KanakaKumar
Github user KanakaKumar commented on the issue:

https://github.com/apache/carbondata/pull/2704
  
LGTM


---


[GitHub] carbondata pull request #2704: [HOTFIX] Old stores cannot read with new tabl...

2018-09-12 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2704#discussion_r217021955
  
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/util/RestructureUtil.java ---
@@ -165,14 +165,15 @@ private static boolean isColumnMatches(boolean isTransactionalTable,
     // column ID but can have same column name
     if (tableColumn.getDataType().isComplexType() && !(tableColumn.getDataType().getId()
         == DataTypes.ARRAY_TYPE_ID)) {
-      if (tableColumn.getColumnId().equals(queryColumn.getColumnId())) {
+      if (tableColumn.getColumnId().equalsIgnoreCase(queryColumn.getColumnId())) {
         return true;
       } else {
         return isColumnMatchesStruct(tableColumn, queryColumn);
       }
     } else {
-      return (tableColumn.getColumnId().equals(queryColumn.getColumnId()) || (!isTransactionalTable
-          && tableColumn.getColName().equals(queryColumn.getColName())));
+      return (tableColumn.getColumnId().equalsIgnoreCase(queryColumn.getColumnId()) || (
--- End diff --


---


[GitHub] carbondata pull request #2704: [HOTFIX] Old stores cannot read with new tabl...

2018-09-12 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2704#discussion_r216917965
  
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/util/RestructureUtil.java ---
@@ -165,14 +165,15 @@ private static boolean isColumnMatches(boolean isTransactionalTable,
     // column ID but can have same column name
     if (tableColumn.getDataType().isComplexType() && !(tableColumn.getDataType().getId()
         == DataTypes.ARRAY_TYPE_ID)) {
-      if (tableColumn.getColumnId().equals(queryColumn.getColumnId())) {
+      if (tableColumn.getColumnId().equalsIgnoreCase(queryColumn.getColumnId())) {
         return true;
       } else {
         return isColumnMatchesStruct(tableColumn, queryColumn);
       }
     } else {
-      return (tableColumn.getColumnId().equals(queryColumn.getColumnId()) || (!isTransactionalTable
-          && tableColumn.getColName().equals(queryColumn.getColName())));
+      return (tableColumn.getColumnId().equalsIgnoreCase(queryColumn.getColumnId()) || (
--- End diff --

I think we should have a stricter comparison in the restructure case.
If a column with the same name is added again with a different data type, it will be a problem.
Let's use the data type in the comparison as well, as sketched below.
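
A hedged sketch of that stricter fallback, building on the names visible in the diff (the exact condition in the final patch may differ):

return tableColumn.getColumnId().equalsIgnoreCase(queryColumn.getColumnId())
    || (!isTransactionalTable
        && tableColumn.getColName().equals(queryColumn.getColName())
        && tableColumn.getDataType().equals(queryColumn.getDataType()));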


---


[GitHub] carbondata pull request #2704: [HOTFIX] Old stores cannot read with new tabl...

2018-09-12 Thread KanakaKumar
Github user KanakaKumar commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2704#discussion_r216918150
  
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/util/RestructureUtil.java ---
@@ -165,14 +165,15 @@ private static boolean isColumnMatches(boolean isTransactionalTable,
     // column ID but can have same column name
     if (tableColumn.getDataType().isComplexType() && !(tableColumn.getDataType().getId()
         == DataTypes.ARRAY_TYPE_ID)) {
-      if (tableColumn.getColumnId().equals(queryColumn.getColumnId())) {
+      if (tableColumn.getColumnId().equalsIgnoreCase(queryColumn.getColumnId())) {
--- End diff --

The same comment given below is applicable here also.


---

