[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3059#discussion_r246826911 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java --- @@ -609,6 +609,14 @@ public static Dictionary getDictionary(AbsoluteTableIdentifier absoluteTableIden blockAssignmentStrategy = BlockAssignmentStrategy.BLOCK_SIZE_FIRST; } else { blockAssignmentStrategy = BlockAssignmentStrategy.BLOCK_NUM_FIRST; + // fall back to BLOCK_NUM_FIRST strategy need to reset + // the average expected size for each node + if (blockInfos.size() > 0) { +sizePerNode = blockInfos.size() / numOfNodes; --- End diff -- This logic can be simplified like below: sizePerNode = blockInfos.size() / numOfNodes; sizePerNode = sizePerNode == 0 ? 1 : sizePerNode; Also, please guard against division by zero (an ArithmeticException) when numOfNodes is zero, as can happen in a faulty cluster. ---
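The simplification suggested above, together with the zero-node guard, could look like the sketch below. This is a minimal illustration, not the actual CarbonLoaderUtil code: `computeSizePerNode` is a hypothetical helper name, and falling back to the full block count when `numOfNodes` is zero is an assumed recovery choice.

```java
// Sketch of the suggested simplification: integer division with a floor of 1,
// plus a guard against numOfNodes == 0 in a faulty cluster case.
// computeSizePerNode is a hypothetical helper, not CarbonData code; returning
// blockCount when there are no nodes is an assumed fallback.
public class SizePerNodeSketch {
    static long computeSizePerNode(int blockCount, int numOfNodes) {
        if (numOfNodes == 0) {
            // avoid ArithmeticException (integer division by zero)
            return blockCount;
        }
        long sizePerNode = blockCount / numOfNodes;
        // expect at least one block per node
        return sizePerNode == 0 ? 1 : sizePerNode;
    }

    public static void main(String[] args) {
        System.out.println(computeSizePerNode(10, 3)); // 3
        System.out.println(computeSizePerNode(2, 5));  // floored to 1
        System.out.println(computeSizePerNode(8, 0));  // guard path, 8
    }
}
```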
[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3059#discussion_r246317841 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java --- @@ -609,6 +613,9 @@ public static Dictionary getDictionary(AbsoluteTableIdentifier absoluteTableIden blockAssignmentStrategy = BlockAssignmentStrategy.BLOCK_SIZE_FIRST; } else { blockAssignmentStrategy = BlockAssignmentStrategy.BLOCK_NUM_FIRST; + // fall back to BLOCK_NUM_FIRST strategy need to reset + // the average expected size for each node + sizePerNode = numberOfBlocksPerNode; --- End diff -- assignLeftOverBlocks also needs similar if-else self checks. I think it's OK; you can take a call on whether to refactor now or later. ---
[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3059#discussion_r246311819 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java --- @@ -575,19 +575,23 @@ public static Dictionary getDictionary(AbsoluteTableIdentifier absoluteTableIden } // calculate the average expected size for each node -long sizePerNode = 0; +long numberOfBlocksPerNode = 0; +if (blockInfos.size() > 0) { + numberOfBlocksPerNode = blockInfos.size() / numOfNodes; +} +numberOfBlocksPerNode = numberOfBlocksPerNode <= 0 ? 1 : numberOfBlocksPerNode; +long dataSizePerNode = 0; long totalFileSize = 0; +for (Distributable blockInfo : uniqueBlocks) { + totalFileSize += ((TableBlockInfo) blockInfo).getBlockLength(); +} +dataSizePerNode = totalFileSize / numOfNodes; +long sizePerNode = 0; if (BlockAssignmentStrategy.BLOCK_NUM_FIRST == blockAssignmentStrategy) { - if (blockInfos.size() > 0) { -sizePerNode = blockInfos.size() / numOfNodes; - } - sizePerNode = sizePerNode <= 0 ? 1 : sizePerNode; + sizePerNode = numberOfBlocksPerNode; --- End diff -- This if-else can be completely avoided by using the correct variable in the method call for block allocation. ---
[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3059#discussion_r246311168 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java --- @@ -609,6 +613,9 @@ public static Dictionary getDictionary(AbsoluteTableIdentifier absoluteTableIden blockAssignmentStrategy = BlockAssignmentStrategy.BLOCK_SIZE_FIRST; } else { blockAssignmentStrategy = BlockAssignmentStrategy.BLOCK_NUM_FIRST; + // fall back to BLOCK_NUM_FIRST strategy need to reset + // the average expected size for each node + sizePerNode = numberOfBlocksPerNode; --- End diff -- Instead of reassigning the same variable, can assignBlocksByDataLocality() use numberOfBlocksPerNode directly? ---
[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3059#discussion_r246309331 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java --- @@ -575,19 +575,23 @@ public static Dictionary getDictionary(AbsoluteTableIdentifier absoluteTableIden } // calculate the average expected size for each node -long sizePerNode = 0; +long numberOfBlocksPerNode = 0; +if (blockInfos.size() > 0) { + numberOfBlocksPerNode = blockInfos.size() / numOfNodes; +} +numberOfBlocksPerNode = numberOfBlocksPerNode <= 0 ? 1 : numberOfBlocksPerNode; +long dataSizePerNode = 0; long totalFileSize = 0; +for (Distributable blockInfo : uniqueBlocks) { + totalFileSize += ((TableBlockInfo) blockInfo).getBlockLength(); +} +dataSizePerNode = totalFileSize / numOfNodes; +long sizePerNode = 0; if (BlockAssignmentStrategy.BLOCK_NUM_FIRST == blockAssignmentStrategy) { - if (blockInfos.size() > 0) { -sizePerNode = blockInfos.size() / numOfNodes; - } - sizePerNode = sizePerNode <= 0 ? 1 : sizePerNode; + sizePerNode = numberOfBlocksPerNode; --- End diff -- Please don't change the sizePerNode variable. ---
[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3059#discussion_r246286723 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java --- @@ -609,6 +597,10 @@ public static Dictionary getDictionary(AbsoluteTableIdentifier absoluteTableIden blockAssignmentStrategy = BlockAssignmentStrategy.BLOCK_SIZE_FIRST; } else { blockAssignmentStrategy = BlockAssignmentStrategy.BLOCK_NUM_FIRST; + // fall back to BLOCK_NUM_FIRST strategy need to recalculate + // the average expected size for each node + sizePerNode = calcAvgLoadSizePerNode(blockInfos, uniqueBlocks, --- End diff -- Avoid calculating the same values multiple times. Calculate once and reuse based on the strategy. ---
[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3059#discussion_r246286475 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java --- @@ -1164,4 +1156,35 @@ private static void deleteFiles(List<String> filesToBeDeleted) throws IOException FileFactory.deleteFile(filePath, FileFactory.getFileType(filePath)); } } + + /** + * This method will calculate the average expected size for each node + * + * @param blockInfos blocks + * @param uniqueBlocks unique blocks + * @param numOfNodes if number of nodes has to be decided + * based on block location information + * @param blockAssignmentStrategy strategy used to assign blocks + * @return the average expected size for each node + */ + private static long calcAvgLoadSizePerNode(List<Distributable> blockInfos, --- End diff -- Please separate the code for identifying numberOfBlocksPerNode and dataSizePerNode, and use the required variable based on the BlockAssignmentStrategy. That way, this function is also not required, I think. Using the same name for both purposes is confusing. ---
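The separation the reviewer asks for could be sketched as below. This is a hedged illustration under simplified assumptions: the class, enum, and helper names are hypothetical stand-ins for CarbonLoaderUtil's BlockAssignmentStrategy handling, which actually operates on Distributable/TableBlockInfo lists.

```java
// Sketch: compute the two per-node averages separately, then pick the
// required one based on the strategy. All names here are simplified
// stand-ins for the CarbonData code, not the real API.
public class PerNodeAverages {
    enum Strategy { BLOCK_NUM_FIRST, BLOCK_SIZE_FIRST }

    // average number of blocks per node, floored to at least 1
    static long blocksPerNode(int blockCount, int numOfNodes) {
        long avg = numOfNodes > 0 ? blockCount / numOfNodes : 0;
        return avg <= 0 ? 1 : avg;
    }

    // average data size per node, floored to at least 1 byte
    static long dataSizePerNode(long totalFileSize, int numOfNodes) {
        long avg = numOfNodes > 0 ? totalFileSize / numOfNodes : 0;
        return avg <= 0 ? 1 : avg;
    }

    // choose the expected per-node size once, based on the strategy,
    // instead of reassigning one shared variable for both meanings
    static long expectedSizePerNode(Strategy s, int blockCount,
                                    long totalFileSize, int numOfNodes) {
        return s == Strategy.BLOCK_NUM_FIRST
            ? blocksPerNode(blockCount, numOfNodes)
            : dataSizePerNode(totalFileSize, numOfNodes);
    }

    public static void main(String[] args) {
        System.out.println(expectedSizePerNode(Strategy.BLOCK_NUM_FIRST, 10, 5000L, 3));  // 3
        System.out.println(expectedSizePerNode(Strategy.BLOCK_SIZE_FIRST, 10, 5000L, 3)); // 1666
    }
}
```

Keeping the two averages in separately named variables makes the block-allocation call sites self-describing, which is the point of the review comment.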
[GitHub] carbondata issue #3048: [CARBONDATA-3224] Support SDK/CSDK validate the impr...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/3048 LGTM ---
[GitHub] carbondata pull request #3048: [CARBONDATA-3224] Support SDK/CSDK validate t...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3048#discussion_r245532720 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java --- @@ -179,7 +181,8 @@ public CarbonWriterBuilder uniqueIdentifier(long timestamp) { * * @return updated CarbonWriterBuilder */ - public CarbonWriterBuilder withLoadOptions(Map<String, String> options) { + public CarbonWriterBuilder withLoadOptions(Map<String, String> options) + throws IllegalArgumentException { --- End diff -- IllegalArgumentException is a RuntimeException; in general, RuntimeExceptions are not listed in the throws clause of a method. They should instead be documented in the method's javadoc, e.g. @throws IllegalArgumentException. ---
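The convention described in the comment above could look like the following sketch. The builder class and its option validation are simplified, hypothetical stand-ins for CarbonWriterBuilder, not the real SDK code; the point is only where the unchecked exception is documented.

```java
import java.util.Map;

// Sketch: document the unchecked exception with @throws in javadoc
// instead of adding it to the method's throws clause. The class name,
// the whitelist, and the option key below are hypothetical.
public class WriterBuilderSketch {
    /**
     * Sets load options on the builder.
     *
     * @param options load options as key-value pairs
     * @return updated builder
     * @throws IllegalArgumentException if an option key is not supported
     */
    public WriterBuilderSketch withLoadOptions(Map<String, String> options) {
        for (String key : options.keySet()) {
            // hypothetical single-entry whitelist for illustration
            if (!"bad_record_path".equalsIgnoreCase(key)) {
                throw new IllegalArgumentException("Unsupported option: " + key);
            }
        }
        return this;
    }

    public static void main(String[] args) {
        WriterBuilderSketch b = new WriterBuilderSketch()
            .withLoadOptions(Map.of("bad_record_path", "/tmp/bad"));
        System.out.println(b != null); // builder chain still works
    }
}
```

Because the throws clause is unchanged, existing callers keep compiling, which also addresses the signature-compatibility concern raised in the next comment.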
[GitHub] carbondata pull request #3048: [CARBONDATA-3224] Support SDK/CSDK validate t...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3048#discussion_r245328731 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java --- @@ -179,7 +182,8 @@ public CarbonWriterBuilder uniqueIdentifier(long timestamp) { * * @return updated CarbonWriterBuilder */ - public CarbonWriterBuilder withLoadOptions(Map<String, String> options) { + public CarbonWriterBuilder withLoadOptions(Map<String, String> options) --- End diff -- IllegalArgumentException is already used for some validations below. Better not to change the signature of the existing method; it can cause compilation errors for existing user code. ---
[GitHub] carbondata pull request #3047: [CARBONDATA-3223] Fixed Wrong Datasize and In...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3047#discussion_r244980360 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala --- @@ -101,14 +102,23 @@ object CarbonStore { val (dataSize, indexSize) = if (load.getFileFormat == FileFormat.ROW_V1) { // for streaming segment, we should get the actual size from the index file // since it is continuously inserting data -val segmentDir = CarbonTablePath.getSegmentPath(tablePath, load.getLoadName) +val segmentDir = CarbonTablePath + .getSegmentPath(carbonTable.getTablePath, load.getLoadName) val indexPath = CarbonTablePath.getCarbonStreamIndexFilePath(segmentDir) val indices = StreamSegment.readIndexFile(indexPath, FileFactory.getFileType(indexPath)) (indices.asScala.map(_.getFile_size).sum, FileFactory.getCarbonFile(indexPath).getSize) } else { // for batch segment, we can get the data size from table status file directly -(if (load.getDataSize == null) 0L else load.getDataSize.toLong, - if (load.getIndexSize == null) 0L else load.getIndexSize.toLong) +if (null == load.getDataSize || null == load.getIndexSize) { + // If either of datasize or indexsize comes to be null then we calculate the correct + // size and assign + val dataIndexSize = CarbonUtil.calculateDataIndexSize(carbonTable, true) --- End diff -- Show segments is a read-only query, and I think we should not perform a write operation in a query. So I feel it's better to either calculate every time and show, or just display it as not available. ---
[GitHub] carbondata pull request #3035: [CARBONDATA-3216] Fix some bugs in CSDK
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3035#discussion_r244908015 --- Diff: store/CSDK/test/main.cpp --- @@ -709,6 +709,7 @@ bool testWithTableProperty(JNIEnv *env, char *path, int argc, char **argv) { writer.outputPath(path); writer.withCsvInput(jsonSchema); writer.withTableProperty("sort_columns", "shortField"); +writer.enableLocalDictionary(false); --- End diff -- Can you test whether it is possible to pass NULL in C, and check how it is sent to the Java layer? As true or false? ---
[GitHub] carbondata pull request #3035: [CARBONDATA-3216] Fix some bugs in CSDK
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3035#discussion_r244907927 --- Diff: store/CSDK/test/main.cpp --- @@ -546,7 +546,7 @@ bool testWriteData(JNIEnv *env, char *path, int argc, char *argv[]) { writer.withCsvInput(jsonSchema); writer.withLoadOption("complex_delimiter_level_1", "#"); writer.writtenBy("CSDK"); -writer.taskNo(185); +writer.taskNo(15541554.81); --- End diff -- This change is not related to the fix. ---
[GitHub] carbondata pull request #3035: [CARBONDATA-3216] Fix some bugs in CSDK
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3035#discussion_r244907282 --- Diff: store/CSDK/src/CarbonWriter.cpp --- @@ -291,9 +291,6 @@ void CarbonWriter::localDictionaryThreshold(int localDictionaryThreshold) { } void CarbonWriter::enableLocalDictionary(bool enableLocalDictionary) { -if (enableLocalDictionary == NULL) { --- End diff -- Can you add some description to the PR explaining how removing this solves the issue? ---
[GitHub] carbondata pull request #3035: [CARBONDATA-3216] Fix some bugs in CSDK
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3035#discussion_r244907183 --- Diff: store/CSDK/test/main.cpp --- @@ -853,7 +854,7 @@ int main(int argc, char *argv[]) { } else { int batch = 32000; int printNum = 32000; - +// --- End diff -- Please remove unwanted changes. ---
[GitHub] carbondata pull request #2991: [CARBONDATA-3043] Add build script and add te...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2991#discussion_r243361889 --- Diff: store/CSDK/test/main.cpp --- @@ -665,6 +699,208 @@ bool readFromS3(JNIEnv *env, char *path, char *argv[]) { printResult(env, reader); } +TEST(CSDKTest,tryCatchException) { +bool gotExp=tryCatchException(env); +EXPECT_TRUE(gotExp); +} + + +TEST(CSDKTest,tryCarbonRowException) { +char *smallFilePath = "../../../../resources/carbondata"; +try { +bool result = tryCarbonRowException(env, smallFilePath);; +EXPECT_TRUE(result) << "Expected Exception as No Index File" << result; +} catch (runtime_error e) { +EXPECT_TRUE(true); +} +} + +TEST(CSDKTest,testCarbonProperties) { +try { +bool result = testCarbonProperties(env); +EXPECT_TRUE(result) << "Carbon set properties not working "; +} catch (runtime_error e) { +EXPECT_TRUE(false) << " Exception is not expected while setting carbon properties "; +} +} + +TEST(CSDKTest,testWriteData) { +try { +bool result =testWriteData(env, "./data", my_argc, my_argv); +result = result && testWriteData(env, "./data", my_argc, my_argv); +EXPECT_TRUE(result) << "Either Data Loading Or Carbon Reader is failed"; +} catch (runtime_error e) { +EXPECT_TRUE(false) << " Exception is not expected while data loading"; +} +} + +TEST(CSDKTest,readFromLocalWithoutProjection) { +try { +char *smallFilePath = "./data_withoutpro"; +bool result =testWriteData(env, smallFilePath, my_argc, my_argv); +if(result){ +bool proj_result = readFromLocalWithoutProjection(env, smallFilePath); +EXPECT_TRUE(proj_result) << "Without Projection is failed"; +} else { +EXPECT_TRUE(result) << "Either Data Loading Or Carbon Reader is failed"; +} +} catch (runtime_error e) { +EXPECT_TRUE(false) << " Exception is not expected ,During without projection selection"; +} +} + +TEST(CSDKTest,readFromLocalWithProjection) { +try { +char *smallFilePath = "./data_pro"; +bool result =testWriteData(env, smallFilePath, my_argc, my_argv); +if(result){ 
+bool proj_result = readFromLocalWithProjection(env, smallFilePath); +EXPECT_TRUE(proj_result) << "With Projection is failed"; +} else { +EXPECT_TRUE(result) << "Either Data Loading Or Carbon Reader is failed"; +} +} catch (runtime_error e) { +EXPECT_TRUE(false) << " Exception is not expected ,During With projection selection"; +} +} + + +TEST(CSDKTest,readSchemaWithoutValidation) { +try { +char *path = "./data_readPath"; +bool result =testWriteData(env, path, my_argc, my_argv); +if(result){ +bool schema_result = readSchema(env, path, false); +EXPECT_TRUE(schema_result) << "Not Able to read readSchema from given path"; +} else { +EXPECT_TRUE(result) << "Either Data Loading Or Carbon Reader is failed"; +} +} catch (runtime_error e) { +EXPECT_TRUE(false) << " Exception is not expected ,During Read Schema"; +} +} + + +TEST(CSDKTest,readSchemaWithValidation) { +try { +char *path = "./data_readPathWithValidation"; +bool result =testWriteData(env, path, my_argc, my_argv); +if(result){ +bool schema_result = readSchema(env, path, true); +EXPECT_TRUE(schema_result) << "Not Able to read readSchema Or validate from given path"; +} else { +EXPECT_TRUE(result) << "Either Data Loading Or Carbon Reader is failed"; +} +} catch (runtime_error e) { +EXPECT_TRUE(false) << " Exception is not expected ,During Read Schema"; +} +} + +TEST(CSDKTest,testReadNextRowWthtVector) { +try { +int printNum = 32000; +char *path = "./data_forVector"; +bool result =testWriteData(env, path, my_argc, my_argv); +if(result){ +bool readresultWithVector= testReadNextRow(env, pa
[GitHub] carbondata pull request #2991: [CARBONDATA-3043] Add build script and add te...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2991#discussion_r243360358 --- Diff: store/CSDK/test/main.cpp --- @@ -665,6 +699,208 @@ bool readFromS3(JNIEnv *env, char *path, char *argv[]) { printResult(env, reader); } +TEST(CSDKTest,tryCatchException) { +bool gotExp=tryCatchException(env); +EXPECT_TRUE(gotExp); +} + + +TEST(CSDKTest,tryCarbonRowException) { +char *smallFilePath = "../../../../resources/carbondata"; +try { +bool result = tryCarbonRowException(env, smallFilePath);; +EXPECT_TRUE(result) << "Expected Exception as No Index File" << result; +} catch (runtime_error e) { +EXPECT_TRUE(true); +} +} + +TEST(CSDKTest,testCarbonProperties) { +try { +bool result = testCarbonProperties(env); +EXPECT_TRUE(result) << "Carbon set properties not working "; +} catch (runtime_error e) { +EXPECT_TRUE(false) << " Exception is not expected while setting carbon properties "; +} +} + +TEST(CSDKTest,testWriteData) { +try { +bool result =testWriteData(env, "./data", my_argc, my_argv); +result = result && testWriteData(env, "./data", my_argc, my_argv); +EXPECT_TRUE(result) << "Either Data Loading Or Carbon Reader is failed"; +} catch (runtime_error e) { +EXPECT_TRUE(false) << " Exception is not expected while data loading"; +} +} + +TEST(CSDKTest,readFromLocalWithoutProjection) { +try { +char *smallFilePath = "./data_withoutpro"; +bool result =testWriteData(env, smallFilePath, my_argc, my_argv); +if(result){ +bool proj_result = readFromLocalWithoutProjection(env, smallFilePath); +EXPECT_TRUE(proj_result) << "Without Projection is failed"; +} else { +EXPECT_TRUE(result) << "Either Data Loading Or Carbon Reader is failed"; +} +} catch (runtime_error e) { +EXPECT_TRUE(false) << " Exception is not expected ,During without projection selection"; +} +} + +TEST(CSDKTest,readFromLocalWithProjection) { +try { +char *smallFilePath = "./data_pro"; +bool result =testWriteData(env, smallFilePath, my_argc, my_argv); +if(result){ 
+bool proj_result = readFromLocalWithProjection(env, smallFilePath); +EXPECT_TRUE(proj_result) << "With Projection is failed"; +} else { +EXPECT_TRUE(result) << "Either Data Loading Or Carbon Reader is failed"; +} +} catch (runtime_error e) { +EXPECT_TRUE(false) << " Exception is not expected ,During With projection selection"; +} +} + + +TEST(CSDKTest,readSchemaWithoutValidation) { +try { +char *path = "./data_readPath"; +bool result =testWriteData(env, path, my_argc, my_argv); +if(result){ +bool schema_result = readSchema(env, path, false); +EXPECT_TRUE(schema_result) << "Not Able to read readSchema from given path"; +} else { +EXPECT_TRUE(result) << "Either Data Loading Or Carbon Reader is failed"; +} +} catch (runtime_error e) { +EXPECT_TRUE(false) << " Exception is not expected ,During Read Schema"; +} +} + + +TEST(CSDKTest,readSchemaWithValidation) { +try { +char *path = "./data_readPathWithValidation"; +bool result =testWriteData(env, path, my_argc, my_argv); +if(result){ +bool schema_result = readSchema(env, path, true); +EXPECT_TRUE(schema_result) << "Not Able to read readSchema Or validate from given path"; +} else { +EXPECT_TRUE(result) << "Either Data Loading Or Carbon Reader is failed"; +} +} catch (runtime_error e) { +EXPECT_TRUE(false) << " Exception is not expected ,During Read Schema"; +} +} + +TEST(CSDKTest,testReadNextRowWthtVector) { +try { +int printNum = 32000; +char *path = "./data_forVector"; +bool result =testWriteData(env, path, my_argc, my_argv); +if(result){ +bool readresultWithVector= testReadNextRow(env, pa
[GitHub] carbondata pull request #2992: [CARBONDATA-3176] Optimize quick-start-guide ...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2992#discussion_r243168227 --- Diff: docs/quick-start-guide.md --- @@ -80,43 +80,49 @@ import org.apache.spark.sql.CarbonSession._ * Create a CarbonSession : ``` -val carbon = SparkSession.builder().config(sc.getConf) - .getOrCreateCarbonSession("") +val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("") ``` -**NOTE**: By default metastore location points to `../carbon.metastore`, user can provide own metastore location to CarbonSession like `SparkSession.builder().config(sc.getConf) .getOrCreateCarbonSession("", "")` +**NOTE** + - By default metastore location points to `../carbon.metastore`, user can provide own metastore location to CarbonSession like + `SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("", "")`. + - Data storage location can be specified by ``, like `/carbon/data/store`, `hdfs://localhost:9000/carbon/data/store` or `s3a://carbon/data/store`. Executing Queries ## Creating a Table ``` -scala>carbon.sql("CREATE TABLE -IF NOT EXISTS test_table( -id string, -name string, -city string, -age Int) - STORED AS carbondata") +carbon.sql( --- End diff -- This is Scala code format. I think examples need not follow Scala code format; the examples are wrapped to multiple lines for readability (even if the document is converted to PDF). @sraghunandan, @sgururajshetty please help to confirm the standard convention followed. ---
[GitHub] carbondata pull request #2919: [CARBONDATA-3097] Support folder path in getV...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2919#discussion_r242588569 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java --- @@ -241,12 +241,25 @@ private static Schema readSchemaFromIndexFile(String indexFilePath) throws IOExc /** * This method return the version details in formatted string by reading from carbondata file + * default won't validate the version details between different carbondata files. * - * @param dataFilePath - * @return + * @param path carbondata file path or folder path + * @return string with information of who has written this file + * in which carbondata project version * @throws IOException */ - public static String getVersionDetails(String dataFilePath) throws IOException { + public static String getVersionDetails(String path) throws IOException { +if (path.endsWith(INDEX_FILE_EXT)) { + throw new RuntimeException("Can't get version details from carbonindex file."); --- End diff -- The signature declares IOException, so this should throw IOException instead of RuntimeException. Also, some places in this class throw CarbonDataLoadingException; if that was not intentional, we can change them to IOException as per the signature, instead of a subclass of RuntimeException. ---
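The fix suggested above amounts to throwing the checked exception the signature already declares. A minimal sketch follows; the path check mirrors the quoted code, but the class name and method body are simplified, hypothetical stand-ins for CarbonSchemaReader.getVersionDetails, and the returned string is only a placeholder.

```java
import java.io.IOException;

// Sketch: throw IOException (already declared in the signature) instead
// of RuntimeException when the path points at an index file. This is a
// simplified stand-in, not the real CarbonSchemaReader code.
public class VersionDetailsSketch {
    static final String INDEX_FILE_EXT = ".carbonindex";

    public static String getVersionDetails(String path) throws IOException {
        if (path.endsWith(INDEX_FILE_EXT)) {
            // checked exception, matching the declared throws clause
            throw new IOException("Can't get version details from carbonindex file.");
        }
        // the real code would read the written-by info from the file footer;
        // a placeholder result is returned here for illustration
        return "SDK_1.0.0 in version: 1.6.0-SNAPSHOT";
    }

    public static void main(String[] args) throws IOException {
        System.out.println(getVersionDetails("/tmp/part-0.carbondata"));
    }
}
```

Callers already handle IOException because of the throws clause, so this change keeps the API contract honest without forcing new catch blocks.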
[GitHub] carbondata pull request #2919: [CARBONDATA-3097] Support folder path in getV...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2919#discussion_r242578856 --- Diff: docs/sdk-guide.md --- @@ -816,13 +816,15 @@ Find example code at [CarbonReaderExample](https://github.com/apache/carbondata/ * This method return the version details in formatted string by reading from carbondata file * If application name is SDK_1.0.0 and this has written the carbondata file in carbondata 1.6 project version, * then this API returns the String "SDK_1.0.0 in version: 1.6.0-SNAPSHOT" - * @param dataFilePath complete path including carbondata file name + * Default value of validate is false, it won't validate the version details between different carbondata files. --- End diff -- "Default value of validate is false" - this statement is not valid, as we don't have separate validate and no-validate flows. ---
[GitHub] carbondata pull request #2919: [CARBONDATA-3097] Support folder path in getV...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2919#discussion_r242577513 --- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonSchemaReaderTest.java --- @@ -236,6 +243,113 @@ public void testReadSchemaWithDifferentSchema() { } } + @Test + public void testGetVersionDetailsAndValidate() { +try { + String versionDetails = CarbonSchemaReader + .getVersionDetails(path); + Assert.assertTrue(versionDetails.contains("CarbonSchemaReaderTest in version: ")); +} catch (Throwable e) { + e.printStackTrace(); + Assert.fail(); +} + } + + @Test + public void testGetVersionDetailsWithoutValidate() { +try { + String versionDetails = CarbonSchemaReader + .getVersionDetails(path); + Assert.assertTrue(versionDetails.contains("CarbonSchemaReaderTest in version: ")); +} catch (Throwable e) { + e.printStackTrace(); + Assert.fail(); +} + } + + @Test + public void testGetVersionDetailsWithCarbonDataFile() { +try { + File[] dataFiles = new File(path).listFiles(new FilenameFilter() { +@Override +public boolean accept(File dir, String name) { + return name.endsWith(CARBON_DATA_EXT); +} + }); + String versionDetails = CarbonSchemaReader.getVersionDetails(dataFiles[0].getAbsolutePath()); + Assert.assertTrue(versionDetails.contains("CarbonSchemaReaderTest in version: ")); +} catch (Throwable e) { + e.printStackTrace(); + Assert.fail(); +} + } + + @Test + public void testGetVersionDetailsWithCarbonIndexFile() { +try { + File[] indexFiles = new File(path).listFiles(new FilenameFilter() { +@Override +public boolean accept(File dir, String name) { + return name.endsWith(INDEX_FILE_EXT); +} + }); + CarbonSchemaReader.getVersionDetails(indexFiles[0].getAbsolutePath()); + Assert.fail(); +} catch (Throwable e) { + Assert.assertTrue(e.getMessage() + .equalsIgnoreCase("Can't get version details from carbonindex file.")); +} + } + + public void testGetVersionDetailsWithTheSameSchema() { +try { + writeData(); + try { +String 
versionDetails = CarbonSchemaReader +.getVersionDetails(path); +Assert.assertTrue(versionDetails +.contains("CarbonSchemaReaderTest in version: ")); + } catch (Exception e) { +Assert.fail(); + } +} catch (Throwable e) { + e.printStackTrace(); + Assert.fail(); +} + } + + @Test + public void testGetVersionDetailsWithDifferentSchema() { +try { + int num = 10; + Field[] fields = new Field[2]; + fields[0] = new Field("name", DataTypes.STRING); + fields[1] = new Field("age", DataTypes.INT); + CarbonWriter writer = CarbonWriter + .builder() + .outputPath(path) + .withCsvInput(new Schema(fields)) + .writtenBy("testReadSchemaWithDifferentSchema") + .build(); + + for (int i = 0; i < num; i++) { +writer.write(new String[]{"robot" + (i % 10), String.valueOf(i)}); + } + writer.close(); + try { +CarbonSchemaReader +.getVersionDetails(path); + } catch (Exception e) { +Assert.assertTrue(e.getMessage() +.equalsIgnoreCase("Version details is different between different files.")); +Assert.fail(); --- End diff -- What is the purpose of this test case? We never get the "Version details is different between different files" exception now, right? If there is no separate validation required, we can remove this test. ---
[GitHub] carbondata pull request #2919: [CARBONDATA-3097] Support folder path in getV...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2919#discussion_r242577599 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java --- @@ -241,12 +241,25 @@ private static Schema readSchemaFromIndexFile(String indexFilePath) throws IOExc /** * This method return the version details in formatted string by reading from carbondata file + * default won't validate the version details between different carbondata files. * - * @param dataFilePath - * @return + * @param path carbondata file path or folder path + * @return string with information of who has written this file + * in which carbondata project version * @throws IOException */ - public static String getVersionDetails(String dataFilePath) throws IOException { + public static String getVersionDetails(String path) throws IOException { +if (path.endsWith(INDEX_FILE_EXT)) { + throw new RuntimeException("Can't get version details from carbonindex file."); +} else if (path.endsWith(CARBON_DATA_EXT)) { + return getVersionDetailsFromDataFile(path); +} else { + String indexFilePath = getCarbonFile(path, CARBON_DATA_EXT)[0].getAbsolutePath(); --- End diff -- This variable actually holds a data file (it is filtered by CARBON_DATA_EXT), so the name indexFilePath is misleading. ---
[GitHub] carbondata issue #2931: [CARBONDATA-2999] support read schema from S3
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2931 LGTM ---
[GitHub] carbondata pull request #2919: [CARBONDATA-3097] Support folder path in getV...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2919#discussion_r242035462 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java --- @@ -241,12 +241,52 @@ private static Schema readSchemaFromIndexFile(String indexFilePath) throws IOExc /** * This method return the version details in formatted string by reading from carbondata file + * If validate is true, it will check the version details between different carbondata files. + * And if version details are not the same, it will throw exception * - * @param dataFilePath - * @return + * @param path carbondata file path or folder path + * @param validate whether validate the version details between different carbondata files. + * @return string with information of who has written this file + * in which carbondata project version * @throws IOException */ - public static String getVersionDetails(String dataFilePath) throws IOException { + public static String getVersionDetails(String path, boolean validate) throws IOException { --- End diff -- I think it is not correct to validate readability through version details. In general, new version jars can read all old version files. Please remove this method. ---
[GitHub] carbondata pull request #2931: [CARBONDATA-2999] support read schema from S3
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2931#discussion_r242034641 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java --- @@ -147,34 +170,55 @@ public static Schema readSchema(String path, boolean validateSchema) throws IOEx throw new CarbonDataLoadingException("No carbonindex file in this path."); } } else { - String indexFilePath = getCarbonFile(path, INDEX_FILE_EXT)[0].getAbsolutePath(); - return readSchemaFromIndexFile(indexFilePath); + String indexFilePath = getCarbonFile(path, INDEX_FILE_EXT, conf)[0].getAbsolutePath(); + return readSchemaFromIndexFile(indexFilePath, conf); } } + /** + * read schema from path, + * path can be folder path, carbonindex file path, and carbondata file path + * and user can decide whether check all files schema + * + * @param path file/folder path + * @param validateSchema whether check all files schema + * @return schema + * @throws IOException + */ + public static Schema readSchema(String path, boolean validateSchema) throws IOException { +Configuration conf = new Configuration(); +return readSchema(path, validateSchema, conf); + } + /** * Read carbondata file and return the schema * This interface will be removed, * please use readSchema instead of this interface * * @param dataFilePath carbondata file store path + * @param conf hadoop configuration support, can set s3a AK,SK, + * end point and other conf with this * @return Schema object * @throws IOException */ @Deprecated - public static Schema readSchemaInDataFile(String dataFilePath) throws IOException { -return readSchema(dataFilePath, false); + public static Schema readSchemaInDataFile(String dataFilePath, Configuration conf) --- End diff -- No need to change the deprecated old method. Keep as it is. ---
[GitHub] carbondata issue #2899: [CARBONDATA-3073][CARBONDATA-3044] Support configure...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2899 LGTM ---
[GitHub] carbondata pull request #2899: [CARBONDATA-3073][CARBONDATA-3044] Support co...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2899#discussion_r241731133 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java --- @@ -77,8 +77,9 @@ /** * Sets the output path of the writer builder + * --- End diff -- Can avoid the space changes in multiple places if not intentional. ---
[GitHub] carbondata pull request #2899: [CARBONDATA-3073][CARBONDATA-3044] Support co...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2899#discussion_r241730265 --- Diff: store/CSDK/src/CarbonWriter.cpp --- @@ -98,6 +127,158 @@ void CarbonWriter::withHadoopConf(char *key, char *value) { carbonWriterBuilderObject = jniEnv->CallObjectMethodA(carbonWriterBuilderObject, methodID, args); } +void CarbonWriter::withTableProperty(char *key, char *value) { +if (key == NULL) { +throw std::runtime_error("key parameter can't be NULL."); +} +if (value == NULL) { +throw std::runtime_error("value parameter can't be NULL."); +} +checkBuilder(); +jclass carbonWriterBuilderClass = jniEnv->GetObjectClass(carbonWriterBuilderObject); +jmethodID methodID = jniEnv->GetMethodID(carbonWriterBuilderClass, "withTableProperty", + "(Ljava/lang/String;Ljava/lang/String;)Lorg/apache/carbondata/sdk/file/CarbonWriterBuilder;"); +if (methodID == NULL) { +throw std::runtime_error("Can't find the method in java: withTableProperty"); +} +jvalue args[2]; +args[0].l = jniEnv->NewStringUTF(key); +args[1].l = jniEnv->NewStringUTF(value); +carbonWriterBuilderObject = jniEnv->CallObjectMethodA(carbonWriterBuilderObject, methodID, args); --- End diff -- Java code may throw IllegalArgumentException for unsuported property. We should add exception check after the method call ---
[GitHub] carbondata issue #2986: [CARBONDATA-3166]Updated Document and added Column C...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2986 LGTM ---
[GitHub] carbondata pull request #2986: [CARBONDATA-3166]Updated Document and added C...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2986#discussion_r241477189 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala --- @@ -92,7 +92,9 @@ private[sql] case class CarbonDescribeFormattedCommand( Strings.formatSize( tblProps.getOrElse(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB, CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB_DEFAULT).toFloat), ""), - + ("Carbon Column Compressor ", tblProps --- End diff -- OK ---
[GitHub] carbondata pull request #2986: [CARBONDATA-3166]Updated Document and added C...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2986#discussion_r241363901 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala --- @@ -92,7 +92,9 @@ private[sql] case class CarbonDescribeFormattedCommand( Strings.formatSize( tblProps.getOrElse(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB, CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB_DEFAULT).toFloat), ""), - + ("Carbon Column Compressor ", tblProps --- End diff -- By default we should display empty instead of system default. ---
[GitHub] carbondata pull request #2986: [CARBONDATA-3166]Updated Document and added C...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2986#discussion_r241338497 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala --- @@ -92,7 +92,9 @@ private[sql] case class CarbonDescribeFormattedCommand( Strings.formatSize( tblProps.getOrElse(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB, CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB_DEFAULT).toFloat), ""), - + ("Carbon Column Compressor ", tblProps --- End diff -- Same compressor is used for column and row data files. So, please change it to "Data File Compressor" ---
[GitHub] carbondata issue #2847: [CARBONDATA-3005]Support Gzip as column compressor
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2847 LGTM ---
[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r240208519 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +/** + * Codec Class for performing Gzip Compression + */ +public class GzipCompressor extends AbstractCompressor { + + @Override public String getName() { +return "gzip"; + } + + /** + * This method takes the Byte Array data and Compresses in gzip format + * + * @param data Data Byte Array passed for compression + * @return Compressed Byte Array + */ + private byte[] compressData(byte[] data) { +ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(); --- End diff -- ByteArrayOutputStream initializes with 32 and copies the data to new byte[] on expansion. 
Can you use a better initial size to limit the number of copies during expansion? Snappy has a utility (maxCompressedLength) to calculate this; you can check whether any gzip library has a similar method. If not, we can use a value based on a test with the maximum possible compression ratio. ---
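As a sketch of this suggestion, a gzip compressor can pre-size its output buffer with a worst-case bound. The bound formula below is modeled on zlib's compressBound() and is an assumption, not an API from any gzip library; the JDK's java.util.zip is used instead of commons-compress to keep the example self-contained:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class GzipSizing {

    // Conservative worst-case bound, modeled on zlib's compressBound():
    // srcLen + srcLen/1000 + 12 bytes of deflate overhead, plus ~18 bytes of
    // gzip header/trailer. ASSUMPTION: not an existing gzip-library method.
    static int maxCompressedLength(int srcLength) {
        return srcLength + (srcLength / 1000) + 12 + 18;
    }

    static byte[] compress(byte[] data) {
        // Pre-sizing avoids the repeated copy-on-grow that ByteArrayOutputStream
        // performs when it starts from its default 32-byte buffer.
        ByteArrayOutputStream out = new ByteArrayOutputStream(maxCompressedLength(data.length));
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new RuntimeException("Error during Compression step", e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] compressed = compress(new byte[4096]); // highly compressible input
        System.out.println(compressed.length < 4096); // prints "true"
    }
}
```

For incompressible input the output lands within the bound, so the buffer never needs to grow.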
[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r240206699 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +/** + * Codec Class for performing Gzip Compression + */ +public class GzipCompressor extends AbstractCompressor { + + @Override public String getName() { +return "gzip"; + } + + /** + * This method takes the Byte Array data and Compresses in gzip format + * + * @param data Data Byte Array passed for compression + * @return Compressed Byte Array + */ + private byte[] compressData(byte[] data) { +ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(); +try { + GzipCompressorOutputStream gzipCompressorOutputStream = + new GzipCompressorOutputStream(byteArrayOutputStream); + try { +/** + * Below api will write bytes from specified byte array to the gzipCompressorOutputStream + * The output stream will compress the given byte array. 
+ */ +gzipCompressorOutputStream.write(data); + } catch (IOException e) { +throw new RuntimeException("Error during Compression writing step ", e); + } finally { +gzipCompressorOutputStream.close(); + } +} catch (IOException e) { + throw new RuntimeException("Error during Compression step ", e); +} +return byteArrayOutputStream.toByteArray(); + } + + /** + * This method takes the Byte Array data and Decompresses in gzip format + * + * @param data Data Byte Array for Compression + * @param offset Start value of Data Byte Array + * @param length Size of Byte Array + * @return + */ + private byte[] decompressData(byte[] data, int offset, int length) { +ByteArrayInputStream byteArrayOutputStream = new ByteArrayInputStream(data, offset, length); +ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream(); +try { + GzipCompressorInputStream gzipCompressorInputStream = + new GzipCompressorInputStream(byteArrayOutputStream); + byte[] buffer = new byte[1024]; --- End diff -- Instead of fixed 1024, can you observe what is the blocksize (bytes size) gzip operates and use the same value ? ---
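One defensible choice for the buffer size the reviewer questions: gzip wraps DEFLATE, which uses a 32 KB sliding window, so a 32 KB read buffer matches the format's granularity better than a fixed 1 KB. A JDK-only sketch (the PR itself uses commons-compress streams with the same read loop):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipDecompress {

    // gzip wraps DEFLATE, whose sliding window is 32 KB; using that as the
    // read-buffer size is an informed choice, not a documented requirement.
    static final int BUFFER_SIZE = 32 * 1024;

    static byte[] decompress(byte[] data, int offset, int length) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(BUFFER_SIZE);
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(data, offset, length), BUFFER_SIZE)) {
            byte[] buffer = new byte[BUFFER_SIZE];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new RuntimeException("Error during Decompression step", e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "carbondata".getBytes();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write(original);
        }
        byte[] bytes = compressed.toByteArray();
        System.out.println(new String(decompress(bytes, 0, bytes.length))); // prints "carbondata"
    }
}
```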
[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r240211334 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala --- @@ -252,50 +253,94 @@ class TestLoadDataWithCompression extends QueryTest with BeforeAndAfterEach with """.stripMargin) } - test("test data loading with snappy compressor and offheap") { + test("test data loading with different compressors and offheap") { +for(comp <- compressors){ + CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true") --- End diff -- Should we have a UT for enable.unsafe.in.query.processing true and false? ---
[GitHub] carbondata pull request #2847: [CARBONDATA-3005]Support Gzip as column compr...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r240147384 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +/** + * Codec Class for performing Gzip Compression + */ +public class GzipCompressor extends AbstractCompressor { + + @Override public String getName() { +return "gzip"; + } + + /** + * This method takes the Byte Array data and Compresses in gzip format + * + * @param data Data Byte Array passed for compression + * @return Compressed Byte Array + */ + private byte[] compressData(byte[] data) { +ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(); +try { + GzipCompressorOutputStream gzipCompressorOutputStream = + new GzipCompressorOutputStream(byteArrayOutputStream); + try { +/** + * Below api will write bytes from specified byte array to the gzipCompressorOutputStream + * The output stream will compress the given byte array. + */ +gzipCompressorOutputStream.write(data); + } catch (IOException e) { +throw new RuntimeException("Error during Compression step " + e.getMessage()); --- End diff -- Don't skip the actual exception. Add original exception also as the cause to RunTimeException ---
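The difference between appending e.getMessage() and chaining the cause, as the reviewer asks, in a minimal sketch (the IOException here is simulated):

```java
import java.io.IOException;

public class ExceptionChaining {

    static byte[] compressData(byte[] data) {
        try {
            // Simulated I/O failure, standing in for gzipCompressorOutputStream.write(data).
            throw new IOException("disk full");
        } catch (IOException e) {
            // Chain `e` as the cause instead of only appending e.getMessage():
            // the original stack trace survives in logs and via getCause().
            throw new RuntimeException("Error during Compression step", e);
        }
    }

    public static void main(String[] args) {
        try {
            compressData(new byte[0]);
        } catch (RuntimeException e) {
            System.out.println(e.getCause() instanceof IOException); // prints "true"
        }
    }
}
```

With only the message appended, the root-cause stack trace is lost and failures deep in the I/O layer become much harder to diagnose.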
[GitHub] carbondata issue #2915: [CARBONDATA-3095] Optimize the documentation of SDK/...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2915 LGTM ---
[GitHub] carbondata pull request #2915: [CARBONDATA-3095] Optimize the documentation ...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2915#discussion_r235648435 --- Diff: docs/sdk-guide.md --- @@ -674,6 +693,16 @@ Find example code at [CarbonReaderExample](https://github.com/apache/carbondata/ public CarbonReaderBuilder filter(Expression filterExpression); ``` +``` + /** + * set read batch size before build --- End diff -- Sets the batch size of records ---
[GitHub] carbondata pull request #2915: [CARBONDATA-3095] Optimize the documentation ...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2915#discussion_r235645252 --- Diff: docs/sdk-guide.md --- @@ -684,6 +713,17 @@ Find example code at [CarbonReaderExample](https://github.com/apache/carbondata/ public CarbonReaderBuilder withHadoopConf(Configuration conf); ``` +``` + /** + * configure hadoop configuration with key value --- End diff -- Change to "Updates the hadoop configuration with the given key value" ---
[GitHub] carbondata issue #2929: [CARBONDATA-3108][CARBONDATA-3044] Fix the error of ...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2929 LGTM ---
[GitHub] carbondata pull request #2929: [CARBONDATA-3108][CARBONDATA-3044] Fix the er...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2929#discussion_r235266810 --- Diff: store/CSDK/src/CarbonRow.cpp --- @@ -170,6 +170,9 @@ char *CarbonRow::getString(int ordinal) { args[0].l = carbonRow; args[1].i = ordinal; jobject data = jniEnv->CallStaticObjectMethodA(rowUtilClass, getStringId, args); +if (jniEnv->ExceptionCheck()) { --- End diff -- ok ---
[GitHub] carbondata pull request #2929: [CARBONDATA-3108][CARBONDATA-3044] Fix the er...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2929#discussion_r235261186 --- Diff: store/CSDK/src/CarbonRow.cpp --- @@ -170,6 +170,9 @@ char *CarbonRow::getString(int ordinal) { args[0].l = carbonRow; args[1].i = ordinal; jobject data = jniEnv->CallStaticObjectMethodA(rowUtilClass, getStringId, args); +if (jniEnv->ExceptionCheck()) { --- End diff -- The exception check should be done for all the other method calls too, like getBoolean, getFloat, etc. An ArrayIndexOutOfBoundsException on a wrong index and cast exceptions on wrong method usage are possible. ---
[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2804 LGTM ---
[GitHub] carbondata pull request #2804: [CARBONDATA-2996] CarbonSchemaReader support ...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2804#discussion_r231061128 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java --- @@ -61,14 +65,122 @@ public static Schema readSchemaInSchemaFile(String schemaFilePath) throws IOExce return new Schema(schemaList); } + /** + * get carbondata/carbonindex file in path + * + * @param path carbon file path + * @return CarbonFile array + */ + private static CarbonFile[] getCarbonFile(String path, final String extension) + throws IOException { +String dataFilePath = path; +if (!(dataFilePath.contains(extension))) { + CarbonFile[] carbonFiles = FileFactory + .getCarbonFile(path) + .listFiles(new CarbonFileFilter() { +@Override +public boolean accept(CarbonFile file) { + if (file == null) { +return false; + } + return file.getName().endsWith(extension); +} + }); + if (carbonFiles == null || carbonFiles.length < 1) { +throw new IOException("Carbon file not exists."); + } + return carbonFiles; +} +return null; + } + + /** + * read schema from path, + * path can be folder path, carbonindex file path, and carbondata file path + * and will not check all files schema + * + * @param path file/folder path + * @return schema + * @throws IOException + */ + public static Schema readSchema(String path) throws IOException { +return readSchema(path, false); + } + + /** + * read schema from path, + * path can be folder path, carbonindex file path, and carbondata file path + * and user can decide whether check all files schema + * + * @param path file/folder path + * @param validateSchema whether check all files schema + * @return schema + * @throws IOException + */ + public static Schema readSchema(String path, boolean validateSchema) throws IOException { +if (path.endsWith(INDEX_FILE_EXT)) { + return readSchemaFromIndexFile(path); +} else if (path.endsWith(CARBON_DATA_EXT)) { + return readSchemaFromDataFile(path); +} else if (validateSchema) { 
+ CarbonFile[] carbonIndexFiles = getCarbonFile(path, INDEX_FILE_EXT); + Schema schema; + if (carbonIndexFiles != null && carbonIndexFiles.length != 0) { +schema = readSchemaFromIndexFile(carbonIndexFiles[0].getAbsolutePath()); +for (int i = 1; i < carbonIndexFiles.length; i++) { + Schema schema2 = readSchemaFromIndexFile(carbonIndexFiles[i].getAbsolutePath()); + if (schema != schema2) { +throw new CarbonDataLoadingException("Schema is different between different files."); + } +} +CarbonFile[] carbonDataFiles = getCarbonFile(path, CARBON_DATA_EXT); +for (int i = 0; i < carbonDataFiles.length; i++) { + Schema schema2 = readSchemaFromDataFile(carbonDataFiles[i].getAbsolutePath()); + if (!schema.equals(schema2)) { +throw new CarbonDataLoadingException("Schema is different between different files."); + } +} +return schema; + } else { +throw new CarbonDataLoadingException("No carbonindex file in this path."); + } +} else { + String indexFilePath = getCarbonFile(path, INDEX_FILE_EXT)[0].getAbsolutePath(); + if (indexFilePath != null) { --- End diff -- I think this null check is not required. Is there any chance the absolute path can be null ? ---
[GitHub] carbondata pull request #2804: [CARBONDATA-2996] CarbonSchemaReader support ...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2804#discussion_r231060332 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java --- @@ -144,4 +246,28 @@ public static Schema readSchemaInIndexFile(String indexFilePath) throws IOExcept } } + /** + * This method return the version details in formatted string by reading from carbondata file + * + * @param dataFilePath + * @return + * @throws IOException + */ + public static String getVersionDetails(String dataFilePath) throws IOException { --- End diff -- This complete method is displayed as removed and added again. Is it possible to avoid? ---
[GitHub] carbondata pull request #2804: [CARBONDATA-2996] CarbonSchemaReader support ...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2804#discussion_r231059908 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java --- @@ -61,14 +65,122 @@ public static Schema readSchemaInSchemaFile(String schemaFilePath) throws IOExce return new Schema(schemaList); } + /** + * get carbondata/carbonindex file in path + * + * @param path carbon file path + * @return CarbonFile array + */ + private static CarbonFile[] getCarbonFile(String path, final String extension) + throws IOException { +String dataFilePath = path; +if (!(dataFilePath.contains(extension))) { + CarbonFile[] carbonFiles = FileFactory + .getCarbonFile(path) + .listFiles(new CarbonFileFilter() { +@Override +public boolean accept(CarbonFile file) { + if (file == null) { +return false; + } + return file.getName().endsWith(extension); +} + }); + if (carbonFiles == null || carbonFiles.length < 1) { +throw new IOException("Carbon file not exists."); + } + return carbonFiles; +} +return null; --- End diff -- We can stick to one contract from the method. Either return the list or throw exception. Generally listing APIs should not return null, if this case is not expected, we can throw exception. ---
[GitHub] carbondata pull request #2804: [CARBONDATA-2996] CarbonSchemaReader support ...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2804#discussion_r231058670 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java --- @@ -61,14 +65,122 @@ public static Schema readSchemaInSchemaFile(String schemaFilePath) throws IOExce return new Schema(schemaList); } + /** + * get carbondata/carbonindex file in path + * + * @param path carbon file path + * @return CarbonFile array + */ + private static CarbonFile[] getCarbonFile(String path, final String extension) + throws IOException { +String dataFilePath = path; +if (!(dataFilePath.contains(extension))) { + CarbonFile[] carbonFiles = FileFactory + .getCarbonFile(path) + .listFiles(new CarbonFileFilter() { +@Override +public boolean accept(CarbonFile file) { + if (file == null) { +return false; + } + return file.getName().endsWith(extension); +} + }); + if (carbonFiles == null || carbonFiles.length < 1) { +throw new IOException("Carbon file not exists."); + } + return carbonFiles; +} +return null; + } + + /** + * read schema from path, + * path can be folder path, carbonindex file path, and carbondata file path + * and will not check all files schema + * + * @param path file/folder path + * @return schema + * @throws IOException + */ + public static Schema readSchema(String path) throws IOException { +return readSchema(path, false); + } + + /** + * read schema from path, + * path can be folder path, carbonindex file path, and carbondata file path + * and user can decide whether check all files schema + * + * @param path file/folder path + * @param validateSchema whether check all files schema + * @return schema + * @throws IOException + */ + public static Schema readSchema(String path, boolean validateSchema) throws IOException { +if (path.endsWith(INDEX_FILE_EXT)) { + return readSchemaFromIndexFile(path); +} else if (path.endsWith(CARBON_DATA_EXT)) { + return readSchemaFromDataFile(path); +} else if (validateSchema) { 
+ CarbonFile[] carbonIndexFiles = getCarbonFile(path, INDEX_FILE_EXT); + Schema schema; + if (carbonIndexFiles != null && carbonIndexFiles.length != 0) { +schema = readSchemaFromIndexFile(carbonIndexFiles[0].getAbsolutePath()); +for (int i = 1; i < carbonIndexFiles.length; i++) { + Schema schema2 = readSchemaFromIndexFile(carbonIndexFiles[i].getAbsolutePath()); + if (schema != schema2) { +throw new CarbonDataLoadingException("Schema is different between different files."); + } +} +CarbonFile[] carbonDataFiles = getCarbonFile(path, CARBON_DATA_EXT); +for (int i = 0; i < carbonDataFiles.length; i++) { + Schema schema2 = readSchemaFromDataFile(carbonDataFiles[i].getAbsolutePath()); + if (!schema.equals(schema2)) { +throw new CarbonDataLoadingException("Schema is different between different files."); + } +} +return schema; + } else { +throw new CarbonDataLoadingException("No carbonindex file in this path."); + } +} else { + String indexFilePath = getCarbonFile(path, INDEX_FILE_EXT)[0].getAbsolutePath(); + if (indexFilePath != null) { +return readSchemaFromIndexFile(indexFilePath); + } else { +String dataFilePath = getCarbonFile(path, CARBON_DATA_EXT)[0].getAbsolutePath(); --- End diff -- As per getCarbonFile(...) implementation, if there is no INDEX file found, it throws exception. So, there is no need of this else case ? ---
[GitHub] carbondata pull request #2804: [CARBONDATA-2996] CarbonSchemaReader support ...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2804#discussion_r231057030 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java --- @@ -61,14 +65,122 @@ public static Schema readSchemaInSchemaFile(String schemaFilePath) throws IOExce return new Schema(schemaList); } + /** + * get carbondata/carbonindex file in path + * + * @param path carbon file path + * @return CarbonFile array + */ + private static CarbonFile[] getCarbonFile(String path, final String extension) + throws IOException { +String dataFilePath = path; +if (!(dataFilePath.contains(extension))) { + CarbonFile[] carbonFiles = FileFactory + .getCarbonFile(path) + .listFiles(new CarbonFileFilter() { +@Override +public boolean accept(CarbonFile file) { + if (file == null) { +return false; + } + return file.getName().endsWith(extension); +} + }); + if (carbonFiles == null || carbonFiles.length < 1) { +throw new IOException("Carbon file not exists."); + } + return carbonFiles; +} +return null; + } + + /** + * read schema from path, + * path can be folder path, carbonindex file path, and carbondata file path + * and will not check all files schema + * + * @param path file/folder path + * @return schema + * @throws IOException + */ + public static Schema readSchema(String path) throws IOException { +return readSchema(path, false); + } + + /** + * read schema from path, + * path can be folder path, carbonindex file path, and carbondata file path + * and user can decide whether check all files schema + * + * @param path file/folder path + * @param validateSchema whether check all files schema + * @return schema + * @throws IOException + */ + public static Schema readSchema(String path, boolean validateSchema) throws IOException { +if (path.endsWith(INDEX_FILE_EXT)) { + return readSchemaFromIndexFile(path); +} else if (path.endsWith(CARBON_DATA_EXT)) { + return readSchemaFromDataFile(path); +} else if (validateSchema) { 
+ CarbonFile[] carbonIndexFiles = getCarbonFile(path, INDEX_FILE_EXT); + Schema schema; + if (carbonIndexFiles != null && carbonIndexFiles.length != 0) { +schema = readSchemaFromIndexFile(carbonIndexFiles[0].getAbsolutePath()); +for (int i = 1; i < carbonIndexFiles.length; i++) { + Schema schema2 = readSchemaFromIndexFile(carbonIndexFiles[i].getAbsolutePath()); + if (schema != schema2) { --- End diff -- use equals .. schema.equals(schema2) ---
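The bug behind this comment is reference comparison: `!=` compares object identity, so two schemas read from different files never compare equal even when their contents match. A JDK-only illustration using lists in place of CarbonData's Schema:

```java
import java.util.Arrays;
import java.util.List;

public class EqualsVsReference {
    public static void main(String[] args) {
        // Two schemas read from two different index files are distinct objects
        // even when their contents are identical.
        List<String> schemaA = Arrays.asList("intField", "stringField");
        List<String> schemaB = Arrays.asList("intField", "stringField");

        // Reference comparison: true for any two distinct objects, so
        // `if (schema != schema2) throw ...` rejects identical schemas.
        System.out.println(schemaA != schemaB);      // prints "true"

        // Value comparison: what the validation actually needs.
        System.out.println(schemaA.equals(schemaB)); // prints "true"
    }
}
```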
[GitHub] carbondata pull request #2804: [CARBONDATA-2996] CarbonSchemaReader support ...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2804#discussion_r230802487 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java --- @@ -61,14 +65,121 @@ public static Schema readSchemaInSchemaFile(String schemaFilePath) throws IOExce return new Schema(schemaList); } + /** + * get carbondata/carbonindex file in path + * + * @param path carbon file path + * @return CarbonFile array + */ + private static CarbonFile[] getCarbonFile(String path, final String extension) { +String dataFilePath = path; +if (!(dataFilePath.contains(extension))) { + CarbonFile[] carbonFiles = FileFactory + .getCarbonFile(path) + .listFiles(new CarbonFileFilter() { +@Override +public boolean accept(CarbonFile file) { + if (file == null) { +return false; + } + return file.getName().endsWith(extension); +} + }); + if (carbonFiles == null || carbonFiles.length < 1) { +throw new RuntimeException("Carbon file not exists."); --- End diff -- Why RunTimeException, IO related failures should throw IOException ---
[GitHub] carbondata pull request #2804: [CARBONDATA-2996] CarbonSchemaReader support ...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2804#discussion_r230801853 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java --- @@ -61,14 +65,121 @@ public static Schema readSchemaInSchemaFile(String schemaFilePath) throws IOExce return new Schema(schemaList); } + /** + * get carbondata/carbonindex file in path + * + * @param path carbon file path + * @return CarbonFile array + */ + private static CarbonFile[] getCarbonFile(String path, final String extension) { +String dataFilePath = path; +if (!(dataFilePath.contains(extension))) { + CarbonFile[] carbonFiles = FileFactory + .getCarbonFile(path) + .listFiles(new CarbonFileFilter() { +@Override +public boolean accept(CarbonFile file) { + if (file == null) { +return false; + } + return file.getName().endsWith(extension); +} + }); + if (carbonFiles == null || carbonFiles.length < 1) { +throw new RuntimeException("Carbon file not exists."); + } + return carbonFiles; +} +return null; + } + + /** + * read schema from path, + * path can be folder path, carbonindex file path, and carbondata file path + * and will not check all files schema + * + * @param path file/folder path + * @return schema + * @throws IOException + */ + public static Schema readSchema(String path) throws IOException { +return readSchema(path, false); + } + + /** + * read schema from path, + * path can be folder path, carbonindex file path, and carbondata file path + * and user can decide whether check all files schema + * + * @param path file/folder path + * @param checkFilesSchema whether check all files schema + * @return schema + * @throws IOException + */ + public static Schema readSchema(String path, boolean checkFilesSchema) throws IOException { --- End diff -- readSchema(String path, boolean checkFilesSchema) -- Is this schema validation method is required ? If no use case we can skip this.. during query execution anyways schema is validated. ---
[GitHub] carbondata pull request #2804: [CARBONDATA-2996] CarbonSchemaReader support ...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2804#discussion_r230799912 --- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonSchemaReaderTest.java --- @@ -101,18 +104,30 @@ public boolean accept(CarbonFile file) { String dataFilePath = carbonFiles[0].getAbsolutePath(); Schema schema = CarbonSchemaReader - .readSchemaInDataFile(dataFilePath) + .readSchema(dataFilePath) .asOriginOrder(); assertEquals(schema.getFieldsLength(), 12); checkSchema(schema); +} catch (Throwable e) { + e.printStackTrace(); --- End diff -- should fail ---
[GitHub] carbondata pull request #2804: [CARBONDATA-2996] CarbonSchemaReader support ...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2804#discussion_r230788337 --- Diff: docs/sdk-guide.md --- @@ -685,6 +685,31 @@ Find example code at [CarbonReaderExample](https://github.com/apache/carbondata/ */ public static Schema readSchemaInIndexFile(String indexFilePath); ``` +``` + /** + * read schema from path, + * path can be folder path,carbonindex file path, and carbondata file path + * and will not check all files schema + * + * @param path file/folder path + * @return schema + * @throws IOException + */ + public static Schema readSchema(String path); +``` +``` + /** + * read schema from path, + * path can be folder path,carbonindex file path, and carbondata file path + * and user can decide whether check all files schema + * + * @param path file/folder path + * @param checkFilesSchema whether check all files schema + * @return schema + * @throws IOException + */ + public static Schema readSchema(String path, boolean checkFilesSchema); --- End diff -- checkFilesSchema should be validateSchema ---
[GitHub] carbondata issue #2816: [CARBONDATA-3003] Support read batch row in CSDK
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2816 LGTM ---
[GitHub] carbondata pull request #2869: [CARBONDATA-3057] Implement VectorizedReader ...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2869#discussion_r230274806 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java --- @@ -51,6 +54,7 @@ private Expression filterExpression; private String tableName; private Configuration hadoopConf; + private boolean useVectorReader; --- End diff -- I suggest we make the vector reader the default, since we see a performance benefit. withVectorReader() can be deprecated and kept only to disable it in case of issues, until the vector reader is stabilized. @ravipesala, is this fine? ---
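The default-on suggestion above can be sketched as a builder flag that starts true, with an explicit escape hatch kept (deprecated) while the vector reader stabilizes. The method names below are illustrative stand-ins, not the actual CarbonReaderBuilder API:

```java
// Sketch of a vector-by-default builder: reads are vectorized unless
// explicitly disabled via a deprecated opt-out.
class ReaderBuilder {
    private boolean useVectorReader = true; // default on, per the suggestion

    // Escape hatch kept only until the vector reader is stabilized.
    @Deprecated
    ReaderBuilder disableVectorReader() {
        this.useVectorReader = false;
        return this;
    }

    boolean usesVectorReader() {
        return useVectorReader;
    }
}
```

This keeps existing callers working while making the fast path the one nobody has to opt into.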
[GitHub] carbondata issue #2884: [CARBONDATA-3063] Support set and get carbon propert...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2884 LGTM ---
[GitHub] carbondata issue #2807: [CARBONDATA-2997] Support read schema from index fil...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2807 LGTM ---
[GitHub] carbondata issue #2807: [CARBONDATA-2997] Support read schema from index fil...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2807 Ok Xubo, fine. ---
[GitHub] carbondata issue #2807: [CARBONDATA-2997] Support read schema from index fil...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2807 I suggest we hold this PR until 2804 is concluded and merged, to avoid duplicate work. ---
[GitHub] carbondata issue #2804: [CARBONDATA-2996] CarbonSchemaReader support read sc...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2804 I think we can enhance the existing APIs themselves to take a folder path and then list from the first available data file or index file. Adding many APIs to the SDK will cause confusion for the same functionality. I think we can unify this into a single method like getSchemaFromPath() and return the schema irrespective of whether the path points to an index file, a data file, or multiple subfolders. ---
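The unified getSchemaFromPath() idea reduces to classifying the input path once and dispatching. A hedged sketch assuming an extension-based rule; the class, method, and dispatch rule are illustrative only:

```java
// Sketch of a single entry point: classify the path, then read the schema
// from whichever source it resolves to.
class SchemaPathResolver {
    enum PathKind { INDEX_FILE, DATA_FILE, FOLDER }

    static PathKind resolve(String path) {
        if (path.endsWith(".carbonindex")) {
            return PathKind.INDEX_FILE; // read schema from the index file
        }
        if (path.endsWith(".carbondata")) {
            return PathKind.DATA_FILE;  // read schema from the data file footer
        }
        return PathKind.FOLDER;         // list the folder, pick the first file
    }
}
```

A caller then needs only one API, and the folder case can recurse into subfolders until it finds the first readable file.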
[GitHub] carbondata issue #2836: [CARBONDATA-3027] Increase unsafe working memory def...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2836 I think this PR may not be required any more after #2841. @xubo245, our intention is to make the unsafe configuration flexible; even with default values we should be able to continue with heap memory. ---
[GitHub] carbondata issue #2877: [CARBONDATA-3061] Add validation for supported forma...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2877 LGTM ---
[GitHub] carbondata pull request #2877: [CARBONDATA-3061] Add validation for supporte...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2877#discussion_r229579618 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/encoder/Encoding.java --- @@ -57,10 +62,55 @@ public static Encoding valueOf(int ordinal) { return ADAPTIVE_DELTA_INTEGRAL; } else if (ordinal == RLE_INTEGRAL.ordinal()) { return RLE_INTEGRAL; +} else if (ordinal == DIRECT_STRING.ordinal()) { --- End diff -- The DIRECT_STRING ordinal in the thrift file is 10; in this enum it is 11. Does this cause a problem for old stores if anyone uses this method to get the encoding? ---
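The ordinal concern above can be shown with a toy model: once the on-disk (thrift) ordering and the Java enum's declaration order diverge, decoding by ordinal silently resolves to the wrong constant. The orderings below are deliberately shortened and illustrative, not the real Encoding lists:

```java
// Toy demonstration: decoding by ordinal breaks when the persisted order
// and the in-memory declaration order differ.
class OrdinalMismatchDemo {
    // Hypothetical on-disk order: DIRECT_STRING written with ordinal 2.
    static final String[] THRIFT_ORDER = { "DICTIONARY", "RLE", "DIRECT_STRING" };
    // Hypothetical enum order with an extra constant inserted before it,
    // shifting DIRECT_STRING to ordinal 3.
    static final String[] ENUM_ORDER =
        { "DICTIONARY", "RLE", "RLE_INTEGRAL", "DIRECT_STRING" };

    static String decodeByOrdinal(int ordinalOnDisk) {
        return ENUM_ORDER[ordinalOnDisk]; // wrong whenever the orders diverge
    }
}
```

This is why ordinal-based lookups over persisted data need the two orderings pinned to each other (or a lookup by name instead).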
[GitHub] carbondata pull request #2877: [CARBONDATA-3061] Add validation for supporte...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2877#discussion_r229578569 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java --- @@ -696,9 +697,6 @@ private void validateCarbonDataFileVersion() { CarbonCommonConstants.CARBON_DATA_FILE_DEFAULT_VERSION); } } -LOGGER.info("Carbon Current data file version: " + carbonProperties --- End diff -- Is the log removal intentional? Is there no need to log the current version? ---
[GitHub] carbondata pull request #2877: [CARBONDATA-3061] Add validation for supporte...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2877#discussion_r229578345 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/encoder/Encoding.java --- @@ -57,10 +62,55 @@ public static Encoding valueOf(int ordinal) { return ADAPTIVE_DELTA_INTEGRAL; } else if (ordinal == RLE_INTEGRAL.ordinal()) { return RLE_INTEGRAL; +} else if (ordinal == DIRECT_STRING.ordinal()) { + return DIRECT_STRING; +} else if (ordinal == ADAPTIVE_FLOATING.ordinal()) { + return ADAPTIVE_FLOATING; +} else if (ordinal == BOOL_BYTE.ordinal()) { + return BOOL_BYTE; +} else if (ordinal == ADAPTIVE_DELTA_FLOATING.ordinal()) { + return ADAPTIVE_DELTA_FLOATING; } else if (ordinal == DIRECT_COMPRESS_VARCHAR.ordinal()) { return DIRECT_COMPRESS_VARCHAR; } else { throw new RuntimeException("create Encoding with invalid ordinal: " + ordinal); } } + + /** + * Method to validate for supported encoding types that can be read using the current version + * + * @param encodings + * @return + */ + public static boolean assertTrueForEncodingTypes( --- End diff -- 1) This method's return value is never used, so please remove the return type. 2) Change the method name to validateEncodingTypes. ---
[GitHub] carbondata pull request #2877: [CARBONDATA-3061] Add validation for supporte...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2877#discussion_r229578078 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/encoder/Encoding.java --- @@ -57,10 +62,55 @@ public static Encoding valueOf(int ordinal) { return ADAPTIVE_DELTA_INTEGRAL; } else if (ordinal == RLE_INTEGRAL.ordinal()) { return RLE_INTEGRAL; +} else if (ordinal == DIRECT_STRING.ordinal()) { + return DIRECT_STRING; +} else if (ordinal == ADAPTIVE_FLOATING.ordinal()) { + return ADAPTIVE_FLOATING; +} else if (ordinal == BOOL_BYTE.ordinal()) { + return BOOL_BYTE; +} else if (ordinal == ADAPTIVE_DELTA_FLOATING.ordinal()) { + return ADAPTIVE_DELTA_FLOATING; } else if (ordinal == DIRECT_COMPRESS_VARCHAR.ordinal()) { return DIRECT_COMPRESS_VARCHAR; } else { throw new RuntimeException("create Encoding with invalid ordinal: " + ordinal); } } + + /** + * Method to validate for supported encoding types that can be read using the current version + * + * @param encodings + * @return + */ + public static boolean assertTrueForEncodingTypes( + List encodings) { +if (null == encodings || encodings.isEmpty()) { + return true; +} +boolean supportedEncoding = true; +for (org.apache.carbondata.format.Encoding encoder : encodings) { + try { +// if case is handle unsupported encoding type. An encoding not supported for read will be +// added as null by thrift during deserialization +if (null == encoder) { + supportedEncoding = false; +} else { + // if given encoding name is not supported exception will be thrown. This case can come + // if there is any change done in the ordinal of supported encoding + Encoding.valueOf(encoder.name()); +} + } catch (IllegalArgumentException ex) { +supportedEncoding = false; --- End diff -- Log the exception to identify which encoding name failed. ---
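Following the comment above, extracting a small helper that returns the first unreadable encoding name makes it trivial to log exactly which encoding failed. A sketch under assumed names; the supported list here is a shortened stand-in, not the real Encoding enum:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the requested refactor: report which encoding name is not
// readable by the current version, so the caller can log it before failing.
class EncodingCheck {
    static final List<String> SUPPORTED =
        Arrays.asList("DICTIONARY", "RLE", "DIRECT_STRING", "BOOL_BYTE");

    // Returns the first unreadable encoding name, or null if all are
    // readable. A null entry models an encoding that thrift deserialized
    // as null because it is unknown to this version.
    static String firstUnsupported(String[] encodings) {
        if (encodings == null) {
            return null;
        }
        for (String name : encodings) {
            if (name == null) {
                return "<null (unknown encoding)>";
            }
            if (!SUPPORTED.contains(name)) {
                return name;
            }
        }
        return null;
    }
}
```

The validator itself can then be void: if firstUnsupported() returns non-null, log that name and throw.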
[GitHub] carbondata issue #2837: [CARBONDATA-3000] Provide C++ interface for writing ...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2837 LGTM ---
[GitHub] carbondata issue #2792: [CARBONDATA-2981] Support read primitive data type i...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2792 LGTM apart from those nits in guide & ut. ---
[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2792#discussion_r227839509 --- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java --- @@ -1522,4 +1522,204 @@ public boolean accept(File dir, String name) { e.printStackTrace(); } } + + @Test + public void testReadNextRowWithRowUtil() { +String path = "./carbondata"; +try { + FileUtils.deleteDirectory(new File(path)); + + Field[] fields = new Field[12]; + fields[0] = new Field("stringField", DataTypes.STRING); + fields[1] = new Field("shortField", DataTypes.SHORT); + fields[2] = new Field("intField", DataTypes.INT); + fields[3] = new Field("longField", DataTypes.LONG); + fields[4] = new Field("doubleField", DataTypes.DOUBLE); + fields[5] = new Field("boolField", DataTypes.BOOLEAN); + fields[6] = new Field("dateField", DataTypes.DATE); + fields[7] = new Field("timeField", DataTypes.TIMESTAMP); + fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 2)); + fields[9] = new Field("varcharField", DataTypes.VARCHAR); + fields[10] = new Field("arrayField", DataTypes.createArrayType(DataTypes.STRING)); + fields[11] = new Field("floatField", DataTypes.FLOAT); + Map map = new HashMap<>(); + map.put("complex_delimiter_level_1", "#"); + CarbonWriter writer = CarbonWriter.builder() + .outputPath(path) + .withLoadOptions(map) + .withCsvInput(new Schema(fields)).build(); + + for (int i = 0; i < 10; i++) { +String[] row2 = new String[]{ +"robot" + (i % 10), +String.valueOf(i % 1), +String.valueOf(i), +String.valueOf(Long.MAX_VALUE - i), +String.valueOf((double) i / 2), +String.valueOf(true), +"2019-03-02", +"2019-02-12 03:03:34", +"12.345", +"varchar", +"Hello#World#From#Carbon", +"1.23" +}; +writer.write(row2); + } + writer.close(); + + File[] dataFiles = new File(path).listFiles(new FilenameFilter() { +@Override +public boolean accept(File dir, String name) { + if (name == null) { +return false; + } + return 
name.endsWith("carbonindex"); +} + }); + if (dataFiles == null || dataFiles.length < 1) { +throw new RuntimeException("Carbon index file not exists."); + } + Schema schema = CarbonSchemaReader + .readSchemaInIndexFile(dataFiles[0].getAbsolutePath()) + .asOriginOrder(); + // Transform the schema + int count = 0; + for (int i = 0; i < schema.getFields().length; i++) { +if (!((schema.getFields())[i].getFieldName().contains("."))) { + count++; +} + } + String[] strings = new String[count]; + int index = 0; + for (int i = 0; i < schema.getFields().length; i++) { +if (!((schema.getFields())[i].getFieldName().contains("."))) { + strings[index] = (schema.getFields())[i].getFieldName(); + index++; +} + } + // Read data + CarbonReader reader = CarbonReader + .builder(path, "_temp") + .projection(strings) + .build(); + + int i = 0; + while (reader.hasNext()) { +Object[] data = (Object[]) reader.readNextRow(); + +assert (RowUtil.getString(data, 0).equals("robot" + i)); +assertEquals(RowUtil.getShort(data, 1), i); +assertEquals(RowUtil.getInt(data, 2), i); +assertEquals(RowUtil.getLong(data, 3), Long.MAX_VALUE - i); +assertEquals(RowUtil.getDouble(data, 4), ((double) i) / 2); +assert (RowUtil.getBoolean(data, 5)); +assertEquals(RowUtil.getInt(data, 6), 17957); +assert (RowUtil.getDecimal(data, 8).equals("12.35")); +assert (RowUtil.getVarchar(data, 9).equals("varchar")); + +Object[] arr = RowUtil.getArray(data, 10); +assert (arr[0].equals("Hello")); +assert (arr[1].equals("World")); +assert (arr[2].equals("From")); +
[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2792#discussion_r227839131 --- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java --- @@ -1522,4 +1522,204 @@ public boolean accept(File dir, String name) { e.printStackTrace(); } } + + @Test + public void testReadNextRowWithRowUtil() { +String path = "./carbondata"; +try { + FileUtils.deleteDirectory(new File(path)); + + Field[] fields = new Field[12]; + fields[0] = new Field("stringField", DataTypes.STRING); + fields[1] = new Field("shortField", DataTypes.SHORT); + fields[2] = new Field("intField", DataTypes.INT); + fields[3] = new Field("longField", DataTypes.LONG); + fields[4] = new Field("doubleField", DataTypes.DOUBLE); + fields[5] = new Field("boolField", DataTypes.BOOLEAN); + fields[6] = new Field("dateField", DataTypes.DATE); + fields[7] = new Field("timeField", DataTypes.TIMESTAMP); + fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 2)); + fields[9] = new Field("varcharField", DataTypes.VARCHAR); + fields[10] = new Field("arrayField", DataTypes.createArrayType(DataTypes.STRING)); + fields[11] = new Field("floatField", DataTypes.FLOAT); + Map map = new HashMap<>(); + map.put("complex_delimiter_level_1", "#"); + CarbonWriter writer = CarbonWriter.builder() + .outputPath(path) + .withLoadOptions(map) + .withCsvInput(new Schema(fields)).build(); + + for (int i = 0; i < 10; i++) { +String[] row2 = new String[]{ +"robot" + (i % 10), +String.valueOf(i % 1), +String.valueOf(i), +String.valueOf(Long.MAX_VALUE - i), +String.valueOf((double) i / 2), +String.valueOf(true), +"2019-03-02", +"2019-02-12 03:03:34", +"12.345", +"varchar", +"Hello#World#From#Carbon", +"1.23" +}; +writer.write(row2); + } + writer.close(); + + File[] dataFiles = new File(path).listFiles(new FilenameFilter() { +@Override +public boolean accept(File dir, String name) { + if (name == null) { +return false; + } + return 
name.endsWith("carbonindex"); +} + }); + if (dataFiles == null || dataFiles.length < 1) { +throw new RuntimeException("Carbon index file not exists."); + } + Schema schema = CarbonSchemaReader + .readSchemaInIndexFile(dataFiles[0].getAbsolutePath()) + .asOriginOrder(); + // Transform the schema + int count = 0; + for (int i = 0; i < schema.getFields().length; i++) { +if (!((schema.getFields())[i].getFieldName().contains("."))) { + count++; +} + } + String[] strings = new String[count]; + int index = 0; + for (int i = 0; i < schema.getFields().length; i++) { +if (!((schema.getFields())[i].getFieldName().contains("."))) { + strings[index] = (schema.getFields())[i].getFieldName(); + index++; +} + } + // Read data + CarbonReader reader = CarbonReader + .builder(path, "_temp") + .projection(strings) + .build(); + + int i = 0; + while (reader.hasNext()) { +Object[] data = (Object[]) reader.readNextRow(); + +assert (RowUtil.getString(data, 0).equals("robot" + i)); +assertEquals(RowUtil.getShort(data, 1), i); +assertEquals(RowUtil.getInt(data, 2), i); +assertEquals(RowUtil.getLong(data, 3), Long.MAX_VALUE - i); +assertEquals(RowUtil.getDouble(data, 4), ((double) i) / 2); +assert (RowUtil.getBoolean(data, 5)); +assertEquals(RowUtil.getInt(data, 6), 17957); +assert (RowUtil.getDecimal(data, 8).equals("12.35")); +assert (RowUtil.getVarchar(data, 9).equals("varchar")); + +Object[] arr = RowUtil.getArray(data, 10); +assert (arr[0].equals("Hello")); +assert (arr[1].equals("World")); +assert (arr[2].equals("From")); +assert (arr[3].equals("Carbon")); + +assertEquals(RowUtil.getFloat(data, 11), (float) 1.23); +i++; + } + reader.close(); +} catch (Throwable e) { + e.printStackTrace(); --- End diff -- I think we should make test fail for any exception. Should not ignore ---
[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2792#discussion_r227836644 --- Diff: docs/csdk-guide.md --- @@ -106,20 +128,41 @@ bool readFromS3(JNIEnv *env, char *argv[]) { // "your endPoint" args[2] = argv[3]; -reader.builder(env, "s3a://sdk/WriterOutput", "test"); -reader.withHadoopConf(3, args); +reader.builder(env, "s3a://sdk/WriterOutput/carbondata/", "test"); +reader.withHadoopConf("fs.s3a.access.key", argv[1]); +reader.withHadoopConf("fs.s3a.secret.key", argv[2]); +reader.withHadoopConf("fs.s3a.endpoint", argv[3]); reader.build(); printf("\nRead data from S3:\n"); +CarbonRow carbonRow(env); while (reader.hasNext()) { -jobjectArray row = reader.readNextRow(); -jsize length = env->GetArrayLength(row); - +jobject row = reader.readNextCarbonRow(); --- End diff -- readNextCarbonRow is removed from this PR ---
[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2792#discussion_r227836440 --- Diff: docs/csdk-guide.md --- @@ -68,20 +68,42 @@ JNIEnv *initJVM() { bool readFromLocalWithoutProjection(JNIEnv *env) { CarbonReader carbonReaderClass; -carbonReaderClass.builder(env, "../resources/carbondata", "test"); +carbonReaderClass.builder(env, "../resources/carbondata"); carbonReaderClass.build(); +printf("\nRead data from local without projection:\n"); + +CarbonRow carbonRow(env); while (carbonReaderClass.hasNext()) { -jobjectArray row = carbonReaderClass.readNextRow(); -jsize length = env->GetArrayLength(row); +jobject row = carbonReaderClass.readNextCarbonRow(); --- End diff -- This needs correction: readNextCarbonRow is no longer present. ---
[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2792#discussion_r227741794 --- Diff: store/CSDK/CarbonRow.cpp --- @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include +#include "CarbonRow.h" + +CarbonRow::CarbonRow(JNIEnv *env) { +this->rowUtilClass = env->FindClass("org/apache/carbondata/sdk/file/RowUtil"); +this->jniEnv = env; +} + +void CarbonRow::setCarbonRow(jobject data) { +this->carbonRow = data; +} + +short CarbonRow::getShort(int ordinal) { +jmethodID buildID = jniEnv->GetStaticMethodID(rowUtilClass, "getShort", +"([Ljava/lang/Object;I)S"); +jvalue args[2]; +args[0].l = carbonRow; +args[1].i = ordinal; +return jniEnv->CallStaticShortMethodA(rowUtilClass, buildID, args); +} + +int CarbonRow::getInt(int ordinal) { +jmethodID buildID = jniEnv->GetStaticMethodID(rowUtilClass, "getInt", +"([Ljava/lang/Object;I)I"); +jvalue args[2]; +args[0].l = carbonRow; +args[1].i = ordinal; +return jniEnv->CallStaticIntMethodA(rowUtilClass, buildID, args); +} + +long CarbonRow::getLong(int ordinal) { +jmethodID buildID = jniEnv->GetStaticMethodID(rowUtilClass, "getLong", +"([Ljava/lang/Object;I)J"); +jvalue args[2]; +args[0].l = 
carbonRow; +args[1].i = ordinal; +return jniEnv->CallStaticLongMethodA(rowUtilClass, buildID, args); +} + +double CarbonRow::getDouble(int ordinal) { +jmethodID buildID = jniEnv->GetStaticMethodID(rowUtilClass, "getDouble", +"([Ljava/lang/Object;I)D"); +jvalue args[2]; +args[0].l = carbonRow; +args[1].i = ordinal; +return jniEnv->CallStaticDoubleMethodA(rowUtilClass, buildID, args); +} + + +float CarbonRow::getFloat(int ordinal) { +jmethodID buildID = jniEnv->GetStaticMethodID(rowUtilClass, "getFloat", +"([Ljava/lang/Object;I)F"); +jvalue args[2]; +args[0].l = carbonRow; +args[1].i = ordinal; +return jniEnv->CallStaticFloatMethodA(rowUtilClass, buildID, args); +} + +jboolean CarbonRow::getBoolean(int ordinal) { +jmethodID buildID = jniEnv->GetStaticMethodID(rowUtilClass, "getBoolean", +"([Ljava/lang/Object;I)Z"); +jvalue args[2]; +args[0].l = carbonRow; +args[1].i = ordinal; +return jniEnv->CallStaticBooleanMethodA(rowUtilClass, buildID, args); +} + +char *CarbonRow::getString(int ordinal) { +jmethodID buildID = jniEnv->GetStaticMethodID(rowUtilClass, "getString", +"([Ljava/lang/Object;I)Ljava/lang/String;"); +jvalue args[2]; +args[0].l = carbonRow; +args[1].i = ordinal; +jobject data = jniEnv->CallStaticObjectMethodA(rowUtilClass, buildID, args); + +char *str = (char *) jniEnv->GetStringUTFChars((jstring) data, JNI_FALSE); +return str; +} + +char *CarbonRow::getDecimal(int ordinal) { +jmethodID buildID = jniEnv->GetStaticMethodID(rowUtilClass, "getDecimal", --- End diff -- jmethodID buildID = jniEnv->GetStaticMethodID(rowUtilClass, "getDecimal", "([Ljava/lang/Object;I)Ljava/lang/String;"); jvalue args[2]; Accessing the static method and initializing the array is done for every row reading. Create once and reuse may improve performance. Can you please try? ---
[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2792#discussion_r222691292 --- Diff: store/CSDK/CarbonReader.cpp --- @@ -89,10 +89,18 @@ jboolean CarbonReader::hasNext() { return hasNext; } +jobject CarbonReader::readNextCarbonRow() { +jclass carbonReader = jniEnv->GetObjectClass(carbonReaderObject); +jmethodID readNextCarbonRowID = jniEnv->GetMethodID(carbonReader, "readNextCarbonRow", +"()Lorg/apache/carbondata/core/datastore/row/CarbonRow;"); +jobject carbonRow = (jobject) jniEnv->CallObjectMethod(carbonReaderObject, readNextCarbonRowID); +return carbonRow; +} + jobjectArray CarbonReader::readNextRow() { jclass carbonReader = jniEnv->GetObjectClass(carbonReaderObject); -jmethodID readNextRow2ID = jniEnv->GetMethodID(carbonReader, "readNextStringRow", "()[Ljava/lang/Object;"); -jobjectArray row = (jobjectArray) jniEnv->CallObjectMethod(carbonReaderObject, readNextRow2ID); +jmethodID readNextStringRowID = jniEnv->GetMethodID(carbonReader, "readNextStringRow", "()[Ljava/lang/Object;"); --- End diff -- We can remove "readNextStringRow" and add a utility method in JNI to achieve the same. ---
[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2792#discussion_r222691023 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/row/CarbonRow.java --- @@ -18,8 +18,11 @@ package org.apache.carbondata.core.datastore.row; import java.io.Serializable; +import java.math.BigDecimal; --- End diff -- CarbonRow has various fields like data, rawData, rangeID, etc. It does not seem intended as an end-user API. I think we can add a simple Row class scoped to the SDK. ---
[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2792#discussion_r222690123 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/readsupport/impl/DictionaryDecodeReadSupport.java --- @@ -81,7 +82,24 @@ data[i] = dictionaries[i].getDictionaryValueForKey((int) data[i]); } } -return (T)data; +return (T) data; + } + + /** + * get carbonRow, including data and datatpes + * + * @param data row data + * @return CarbonRow Object + */ + public T readCarbonRow(Object[] data) { --- End diff -- Instead of changing the DictionaryDecodeReadSupport and other class hierarchies, I suggest using a new Row class as a utility and providing just the required methods, to avoid impact on the base code. ---
[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2792#discussion_r222689040 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonRecordReader.java --- @@ -116,6 +117,25 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext context) return readSupport.readRow(carbonIterator.next()); } + /** + * get CarbonRow data, including data and datatypes + * + * @return carbonRow object or data array or T + * @throws IOException + * @throws InterruptedException + */ + public T getCarbonRow() throws IOException, InterruptedException { --- End diff -- I think that instead of the confusing T, we can define the return type as CarbonRow itself. ---
[GitHub] carbondata pull request #2792: [CARBONDATA-2981] Support read primitive data...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2792#discussion_r222686566 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/row/CarbonRow.java --- @@ -57,6 +74,154 @@ public String getString(int ordinal) { return (String) data[ordinal]; } + /** + * get short type data by ordinal + * + * @param ordinal the data index of carbonRow + * @return + */ + public short getShort(int ordinal) { +return (short) data[ordinal]; + } + + /** + * get int data type data by ordinal + * + * @param ordinal the data index of carbonRow + * @return + */ + public int getInt(int ordinal) { +return (Integer) data[ordinal]; + } + + /** + * get long data type data by ordinal + * + * @param ordinal the data index of carbonRow + * @return + */ + public long getLong(int ordinal) { +return (long) data[ordinal]; + } + + /** + * get array data type data by ordinal + * + * @param ordinal the data index of carbonRow + * @return + */ + public Object[] getArray(int ordinal) { +return (Object[]) data[ordinal]; + } + + /** + * get double data type data by ordinal + * + * @param ordinal the data index of carbonRow + * @return + */ + public double getDouble(int ordinal) { +return (double) data[ordinal]; + } + + /** + * get boolean data type data by ordinal + * + * @param ordinal the data index of carbonRow + * @return + */ + public boolean getBoolean(int ordinal) { +return (boolean) data[ordinal]; + } + + /** + * get byte data type data by ordinal + * + * @param ordinal the data index of carbonRow + * @return + */ + public Byte getByte(int ordinal) { +return (Byte) data[ordinal]; + } + + /** + * get float data type data by ordinal + * + * @param ordinal the data index of carbonRow + * @return + */ + public float getFloat(int ordinal) { +return (float) data[ordinal]; + } + + /** + * get varchar data type data by ordinal + * This is for CSDK + * JNI don't support varchar, so carbon convert decimal to string + * + * @param ordinal the 
data index of carbonRow + * @return + */ + public String getVarchar(int ordinal) { +return (String) data[ordinal]; + } + + /** + * get decimal data type data by ordinal + * This is for CSDK + * JNI don't support Decimal, so carbon convert decimal to string + * + * @param ordinal the data index of carbonRow + * @return + */ + public String getDecimal(int ordinal) { +return ((BigDecimal) data[ordinal]).toString(); + } + + /** + * get data type by ordinal + * + * @param ordinal the data index of carbonRow + * @return + */ + public DataType getDataType(int ordinal) { +return dataTypes[ordinal]; + } + + /** + * get data type name by ordinal + * + * @param ordinal the data index of carbonRow + * @return + */ + public String getDataTypeName(int ordinal) { +return dataTypes[ordinal].getName(); + } + + /** + * get element type name by ordinal + * child schema data type name + * for example: return STRING if it's Array in java + * + * @param ordinal the data index of carbonRow + * @return element type name + */ + public String getElementTypeName(int ordinal) { --- End diff -- If this method works only for Array, we can rename it to getArrayElementTypeName and throw an exception if the type is not an array. Returning null causes integration errors for unsupported data types. ---
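The simple SDK-scoped Row class suggested in this thread could look roughly like the following. The class name and the runtime-class-based element-type lookup are illustrative assumptions, not the actual implementation:

```java
// Minimal sketch of a standalone Row for the SDK: typed accessors over the
// Object[] produced by the reader, independent of the internal CarbonRow.
class SdkRow {
    private final Object[] data;

    SdkRow(Object[] data) {
        this.data = data;
    }

    int getInt(int ordinal) { return (Integer) data[ordinal]; }
    long getLong(int ordinal) { return (Long) data[ordinal]; }
    String getString(int ordinal) { return (String) data[ordinal]; }
    Object[] getArray(int ordinal) { return (Object[]) data[ordinal]; }

    // Only meaningful for array columns; throws instead of returning null so
    // integration errors surface early (per the review comment above).
    String getArrayElementTypeName(int ordinal) {
        if (!(data[ordinal] instanceof Object[])) {
            throw new IllegalArgumentException(
                "ordinal " + ordinal + " is not an array column");
        }
        Object[] arr = (Object[]) data[ordinal];
        return arr.length > 0 ? arr[0].getClass().getSimpleName() : "unknown";
    }
}
```

Keeping this class out of core means DictionaryDecodeReadSupport and CarbonRecordReader stay untouched, which is exactly the point of the review.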
[GitHub] carbondata pull request #2791: [HOTFIX]correct the exception handling in loo...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2791#discussion_r222594726 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala --- @@ -208,7 +209,10 @@ class CarbonFileMetastore extends CarbonMetaStore { try { lookupRelation(tableIdentifier)(sparkSession) } catch { - case _: Exception => + case ex: Exception => +if (ex.getCause.isInstanceOf[HiveException]) { + throw ex +} --- End diff -- Table-not-found exceptions come from different hierarchies; not all are HiveException. For example, NoSuchTableException extends AnalysisException, while InvalidTableException extends HiveException. So I think we should refer to the Spark exceptions related to a table not existing or not being found, and return false only for those exception cases. All other exceptions we can throw back to the caller. ---
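The classification asked for above — return false only for table-not-found exceptions and rethrow everything else — can be sketched by walking the cause chain. The exception classes below are local stand-ins for Spark's NoSuchTableException and InvalidTableException, not the real classes:

```java
// Sketch: classify an exception (including its causes) as "table not found"
// versus everything else, so only the former maps to a negative lookup.
class LookupErrors {
    static class NoSuchTableException extends RuntimeException {}
    static class InvalidTableException extends RuntimeException {}

    static boolean isTableNotFound(Throwable ex) {
        for (Throwable t = ex; t != null; t = t.getCause()) {
            if (t instanceof NoSuchTableException
                || t instanceof InvalidTableException) {
                return true;
            }
        }
        return false;
    }
}
```

The caller would then do: if isTableNotFound(ex) return false; otherwise rethrow ex, rather than keying the decision on HiveException alone.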
[GitHub] carbondata pull request #2794: [CARBONDATA-2985]Fix issues in Table level co...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2794#discussion_r222330949 --- Diff: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java --- @@ -744,6 +744,7 @@ private static long getSizeOfSegment(String tablePath, String segId) { if (size >= 2) { level1Size = noOfSegmentLevelsCount[0]; level2Size = noOfSegmentLevelsCount[1]; + level2Size = level2Size == 1 ? 0 : level2Size; --- End diff -- Please add a comment explaining why this case is handled. ---
[GitHub] carbondata pull request #2791: [HOTFIX]correct the exception handling in loo...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2791#discussion_r52487 --- Diff: integration/spark2/src/main/spark2.1/org/apache/spark/sql/hive/CarbonSessionState.scala --- @@ -141,7 +141,13 @@ class CarbonHiveSessionCatalog( */ override def lookupRelation(name: TableIdentifier, alias: Option[String]): LogicalPlan = { -val rtnRelation = super.lookupRelation(name, alias) +val rtnRelation = + try { +super.lookupRelation(name, alias) + } catch { +case ex: Exception => --- End diff -- This change is not required; it catches and rethrows the same exception. ---
[GitHub] carbondata issue #2780: [CARBONDATA-2982] CarbonSchemaReader support array
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2780 LGTM ---
[GitHub] carbondata pull request #2780: [CARBONDATA-2982] CarbonSchemaReader support ...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2780#discussion_r221246385 --- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java --- @@ -1435,5 +1435,99 @@ public void testReadWithFilterOfnonTransactionalwithsubfolders() throws IOExcept FileUtils.deleteDirectory(new File("./testWriteFiles")); } + @Test + public void testReadSchemaFromDataFileArrayString() { +String path = "./testWriteFiles"; +try { + FileUtils.deleteDirectory(new File(path)); + + Field[] fields = new Field[11]; + fields[0] = new Field("stringField", DataTypes.STRING); + fields[1] = new Field("shortField", DataTypes.SHORT); + fields[2] = new Field("intField", DataTypes.INT); + fields[3] = new Field("longField", DataTypes.LONG); + fields[4] = new Field("doubleField", DataTypes.DOUBLE); + fields[5] = new Field("boolField", DataTypes.BOOLEAN); + fields[6] = new Field("dateField", DataTypes.DATE); + fields[7] = new Field("timeField", DataTypes.TIMESTAMP); + fields[8] = new Field("decimalField", DataTypes.createDecimalType(8, 2)); + fields[9] = new Field("varcharField", DataTypes.VARCHAR); + fields[10] = new Field("arrayField", DataTypes.createArrayType(DataTypes.STRING)); + Map map = new HashMap<>(); + map.put("complex_delimiter_level_1", "#"); + CarbonWriter writer = CarbonWriter.builder() + .outputPath(path) + .withLoadOptions(map) + .withCsvInput(new Schema(fields)).build(); + + for (int i = 0; i < 10; i++) { +String[] row2 = new String[]{ +"robot" + (i % 10), +String.valueOf(i % 1), +String.valueOf(i), +String.valueOf(Long.MAX_VALUE - i), +String.valueOf((double) i / 2), +String.valueOf(true), +"2019-03-02", +"2019-02-12 03:03:34", +"12.345", +"varchar", +"Hello#World#From#Carbon" +}; +writer.write(row2); + } + writer.close(); + File[] dataFiles = new File(path).listFiles(new FilenameFilter() { +@Override +public boolean accept(File dir, String name) { + if (name == null) { +return false; + } + return 
name.endsWith("carbondata"); +} + }); + if (dataFiles == null || dataFiles.length < 1) { +throw new RuntimeException("Carbon index file not exists."); + } + Schema schema = CarbonSchemaReader + .readSchemaInDataFile(dataFiles[0].getAbsolutePath()) + .asOriginOrder(); + // Transform the schema + String[] strings = new String[schema.getFields().length]; + for (int i = 0; i < schema.getFields().length; i++) { +strings[i] = (schema.getFields())[i].getFieldName(); + } + + // Read data + CarbonReader reader = CarbonReader + .builder(path, "_temp") + .projection(strings) + .build(); + + System.out.println("\nData:"); + long day = 24L * 3600 * 1000; + int i = 0; + while (reader.hasNext()) { +Object[] row = (Object[]) reader.readNextRow(); + System.out.println(String.format("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t", +i, row[0], row[1], row[2], row[3], row[4], row[5], +new Date((day * ((int) row[6]))), new Timestamp((long) row[7] / 1000), +row[8], row[9] +)); +Object[] arr = (Object[]) row[10]; +for (int j = 0; j < arr.length; j++) { + System.out.print(arr[j] + " "); +} +System.out.println(); +i++; + } + System.out.println("\nFinished"); --- End diff -- Please remove the System.out output from the test. ---
[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2738#discussion_r221218207 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java --- @@ -90,6 +91,33 @@ public T readNextRow() throws IOException, InterruptedException { return currentReader.getCurrentValue(); } + /** + * Read and return next string row object + */ + public Object[] readNextStringRow() throws IOException, InterruptedException { --- End diff -- OK. Please document in the method signature the limitation that only single-dimension arrays are supported, and remove this method after it is enhanced with the CarbonRow-based API in the other PR. ---
[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2738#discussion_r221149564 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java --- @@ -90,6 +91,33 @@ public T readNextRow() throws IOException, InterruptedException { return currentReader.getCurrentValue(); } + /** + * Read and return next string row object + */ + public Object[] readNextStringRow() throws IOException, InterruptedException { --- End diff -- I don't think readNextStringRow is a standard API. It cannot work with nested complex data types such as Array[Array[...]] or Struct[Array]. I suggest adding a better API: a CarbonRow wrapper around Object[] that gives the reader utility methods like getInt and getString. If that can't be finished within this PR's scope, temporarily move this string-building logic into the JNI layer code so it can be improved later. **Adding it in the Java SDK layer will only add confusion for users** ---
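The CarbonRow wrapper proposed in the comment above could look like the following minimal sketch. All names here (CarbonRow, getInt, getString, getRow) are assumptions for illustration, not CarbonData's actual API; the real class would need null handling and more accessors:

```java
// Hypothetical sketch of the suggested CarbonRow wrapper around Object[].
// Class and method names are assumptions, not CarbonData's real API.
public class CarbonRow {
    private final Object[] data;

    public CarbonRow(Object[] data) {
        this.data = data;
    }

    public int getInt(int ordinal) {
        return (int) data[ordinal];
    }

    public String getString(int ordinal) {
        return (String) data[ordinal];
    }

    // Nested complex types (Array[Array[...]], Struct[Array]) stay as
    // Object[] and can be wrapped again, which a flat String conversion
    // cannot represent.
    public CarbonRow getRow(int ordinal) {
        return new CarbonRow((Object[]) data[ordinal]);
    }

    public static void main(String[] args) {
        CarbonRow row = new CarbonRow(new Object[]{42, "robot", new Object[]{"a", "b"}});
        if (row.getInt(0) != 42) throw new AssertionError("getInt failed");
        if (!"robot".equals(row.getString(1))) throw new AssertionError("getString failed");
        if (!"b".equals(row.getRow(2).getString(1))) throw new AssertionError("nested access failed");
        System.out.println("CarbonRow sketch OK");
    }
}
```

This keeps type information intact for the JNI layer instead of collapsing every value into a string.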
[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2738#discussion_r220802829 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java --- @@ -101,6 +101,20 @@ public CarbonReaderBuilder withHadoopConf(Configuration conf) { return this; } + + public CarbonReaderBuilder withHadoopConf(String[] args) { +Configuration configuration = new Configuration(); --- End diff -- My comment is the same: we should not replace the entire conf object every time. Change this method to withHadoopConf(String key, String value) and update the property on a class-level hadoopConfiguration variable. ---
[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2738#discussion_r220802212 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java --- @@ -101,6 +101,20 @@ public CarbonReaderBuilder withHadoopConf(Configuration conf) { return this; } + + public CarbonReaderBuilder withHadoopConf(String[] args) { --- End diff -- Yes, we can add withHadoopConf(String key, String value) in the Java API as well. ---
[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2738#discussion_r220566235 --- Diff: store/CSDK/CarbonReader.cpp --- @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include "CarbonReader.h" +#include + +jobject CarbonReader::builder(JNIEnv *env, char *path, char *tableName) { + +jniEnv = env; +jclass carbonReaderClass = env->FindClass("org/apache/carbondata/sdk/file/CarbonReader"); +jmethodID carbonReaderBuilderID = env->GetStaticMethodID(carbonReaderClass, "builder", + "(Ljava/lang/String;Ljava/lang/String;)Lorg/apache/carbondata/sdk/file/CarbonReaderBuilder;"); +jstring jpath = env->NewStringUTF(path); +jstring jtableName = env->NewStringUTF(tableName); +jvalue args[2]; +args[0].l = jpath; +args[1].l = jtableName; +carbonReaderBuilderObject = env->CallStaticObjectMethodA(carbonReaderClass, carbonReaderBuilderID, args); +return carbonReaderBuilderObject; +} + +jobject CarbonReader::projection(int argc, char *argv[]) { +jclass carbonReaderBuilderClass = jniEnv->GetObjectClass(carbonReaderBuilderObject); +jmethodID buildID = jniEnv->GetMethodID(carbonReaderBuilderClass, "projection", + 
"([Ljava/lang/String;)Lorg/apache/carbondata/sdk/file/CarbonReaderBuilder;"); +jclass objectArrayClass = jniEnv->FindClass("Ljava/lang/String;"); +jobjectArray array = jniEnv->NewObjectArray(argc, objectArrayClass, NULL); +for (int i = 0; i < argc; ++i) { +jstring value = jniEnv->NewStringUTF(argv[i]); +jniEnv->SetObjectArrayElement(array, i, value); +} + +jvalue args[1]; +args[0].l = array; +carbonReaderBuilderObject = jniEnv->CallObjectMethodA(carbonReaderBuilderObject, buildID, args); +return carbonReaderBuilderObject; +} + +jobject CarbonReader::withHadoopConf(int argc, char *argv[]) { --- End diff -- As suggested in another comment, let's add a simple API that accepts a key and value. ---
[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2738#discussion_r220565596 --- Diff: store/sdk/pom.xml --- @@ -39,6 +39,16 @@ hadoop-aws ${hadoop.version} + + org.apache.httpcomponents + httpclient + 4.2 + + + org.apache.hadoop + hadoop-common --- End diff -- carbondata-hadoop already depends on hadoop. Why add it explicitly here? ---
[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2738#discussion_r220562265 --- Diff: store/sdk/pom.xml --- @@ -39,6 +39,16 @@ hadoop-aws ${hadoop.version} + + org.apache.httpcomponents + httpclient --- End diff -- The required httpclient version may differ across Hadoop versions, so its version should come from the carbondata parent pom. Can you verify and avoid declaring it here? ---
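The suggestion above is the standard Maven pattern: declare the version once in the parent pom's dependencyManagement, so child modules inherit a consistent version instead of hard-coding one. A sketch of what that could look like (the property name and location are illustrative, not CarbonData's actual pom contents):

```xml
<!-- Illustrative fragment for the parent pom: manage the httpclient
     version centrally so modules like store/sdk omit <version>. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
      <!-- hypothetical property; align it with the Hadoop profile in use -->
      <version>${httpclient.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

A child module then declares only the groupId and artifactId, and the version resolves from the parent.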
[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2738#discussion_r220560348 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java --- @@ -90,6 +91,33 @@ public T readNextRow() throws IOException, InterruptedException { return currentReader.getCurrentValue(); } + /** + * Read and return next string row object + */ + public Object[] readNextStringRow() throws IOException, InterruptedException { --- End diff -- Converting objects to String is not recommended: the string representation may change for values such as double and date. Please use an adapter in the JNI layer that maps each Java data type to the corresponding C data type. ---
[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2738#discussion_r220559298 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java --- @@ -101,6 +101,20 @@ public CarbonReaderBuilder withHadoopConf(Configuration conf) { return this; } + + public CarbonReaderBuilder withHadoopConf(String[] args) { +Configuration configuration = new Configuration(); --- End diff -- We should not create a new Configuration on every call. It should be created once, on the first call, and then updated with the new values on all further calls to this method. ---
[GitHub] carbondata pull request #2738: [CARBONDATA-2952] Provide c++ interface for S...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2738#discussion_r220558655 --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java --- @@ -101,6 +101,20 @@ public CarbonReaderBuilder withHadoopConf(Configuration conf) { return this; } + + public CarbonReaderBuilder withHadoopConf(String[] args) { --- End diff -- I think this API is unclear. Instead we can support a method like withHadoopConf(String key, String value); then we don't have to parse the arguments in the SDK Java layer, and the C JNI layer can invoke it for every property the user wants to overwrite. ---
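The builder change suggested in these comments (create the configuration once, then only update it on later calls) can be sketched as below. A plain Map stands in for org.apache.hadoop.conf.Configuration so the sketch is self-contained; the class name and get method are illustrative assumptions:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the suggested withHadoopConf(String key, String value) builder
// method. A HashMap stands in for Hadoop's Configuration; the real builder
// would lazily create one Configuration and reuse it, never replacing it.
public class ReaderBuilderSketch {
    private Map<String, String> hadoopConf;

    public ReaderBuilderSketch withHadoopConf(String key, String value) {
        if (hadoopConf == null) {
            // created once, on the first call
            hadoopConf = new HashMap<>();
        }
        // subsequent calls only update the existing configuration
        hadoopConf.put(key, value);
        return this;
    }

    public String get(String key) {
        return hadoopConf == null ? null : hadoopConf.get(key);
    }

    public static void main(String[] args) {
        ReaderBuilderSketch b = new ReaderBuilderSketch()
            .withHadoopConf("fs.s3a.access.key", "AK")
            .withHadoopConf("fs.s3a.secret.key", "SK");
        if (!"AK".equals(b.get("fs.s3a.access.key"))) throw new AssertionError();
        if (!"SK".equals(b.get("fs.s3a.secret.key"))) throw new AssertionError();
        System.out.println("builder sketch OK");
    }
}
```

Because each call returns `this`, the JNI layer can invoke the method once per property without any string-array parsing in the Java SDK.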
[GitHub] carbondata issue #2697: [HOTFIX] support "carbon.load.directWriteHdfs.enable...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2697 retest this please ---
[GitHub] carbondata pull request #2715: [CARBONDATA-2930] Support customize column co...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2715#discussion_r217682375 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java --- @@ -62,15 +67,54 @@ public Compressor getCompressor() { } private CompressorFactory() { -for (SupportedCompressor supportedCompressor : SupportedCompressor.values()) { - compressors.put(supportedCompressor.getName(), supportedCompressor); +for (NativeSupportedCompressor nativeSupportedCompressor : NativeSupportedCompressor.values()) { + allSupportedCompressors.put(nativeSupportedCompressor.getName(), + nativeSupportedCompressor.getCompressor()); } } public static CompressorFactory getInstance() { return COMPRESSOR_FACTORY; } + /** + * register the compressor using reflection. + * If the class name of the compressor has already been registered before, it will return false; + * If the reflection fails to work or the compressor name has problem, it will throw + * RunTimeException; If it is registered successfully, it will return true. + * + * @param compressorClassName full class name of the compressor + * @return true if register successfully, false if failed. + */ + private Compressor registerColumnCompressor(String compressorClassName) { +if (allSupportedCompressors.containsKey(compressorClassName)) { + return allSupportedCompressors.get(compressorClassName); +} + +Class clazz; +try { + clazz = Class.forName(compressorClassName); + Object instance = clazz.newInstance(); + if (instance instanceof Compressor) { +if (!((Compressor) instance).getName().equals(compressorClassName)) { + throw new RuntimeException(String.format("For not carbondata native supported compressor," + + " the result of method getName() should be the full class name. 
Expected '%s'," + + " found '%s'", compressorClassName, ((Compressor) instance).getName())); +} +allSupportedCompressors.put(compressorClassName, (Compressor) instance); --- End diff -- Please add an info log when a new compressor is registered. ---
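The reflection-based registration being reviewed, including the info log the comment asks for, can be sketched as follows. The interface, class names, and log statement are simplified stand-ins, not CarbonData's actual types:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for the Compressor contract; names are assumptions.
interface Compressor {
    String getName();
}

// A custom compressor whose getName() returns its full class name,
// as the review thread requires (default package, so no prefix).
class CustomCompressor implements Compressor {
    public String getName() {
        return "CustomCompressor";
    }
}

public class CompressorRegistrySketch {
    private static final Map<String, Compressor> REGISTERED = new HashMap<>();

    static Compressor register(String compressorClassName) throws Exception {
        Compressor cached = REGISTERED.get(compressorClassName);
        if (cached != null) {
            return cached; // already registered, skip reflection
        }
        Object instance = Class.forName(compressorClassName)
            .getDeclaredConstructor().newInstance();
        if (!(instance instanceof Compressor)) {
            throw new RuntimeException(compressorClassName + " is not a Compressor");
        }
        Compressor compressor = (Compressor) instance;
        if (!compressor.getName().equals(compressorClassName)) {
            throw new RuntimeException("getName() must return the full class name");
        }
        // the info log requested in the review comment
        System.out.println("INFO: registered new compressor " + compressorClassName);
        REGISTERED.put(compressorClassName, compressor);
        return compressor;
    }

    public static void main(String[] args) throws Exception {
        Compressor c = register("CustomCompressor");
        if (!"CustomCompressor".equals(c.getName())) throw new AssertionError();
        // the second call must return the cached instance without re-registering
        if (register("CustomCompressor") != c) throw new AssertionError();
        System.out.println("registry sketch OK");
    }
}
```

Logging only on the cache-miss path ensures each compressor is reported once, not on every lookup.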
[GitHub] carbondata issue #2719: [HOTFIX] changed AbstractDFSFileSystem to use existi...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2719 LGTM ---
[GitHub] carbondata issue #2704: [HOTFIX] Old stores cannot read with new table infer...
Github user KanakaKumar commented on the issue: https://github.com/apache/carbondata/pull/2704 LGTM ---
[GitHub] carbondata pull request #2704: [HOTFIX] Old stores cannot read with new tabl...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2704#discussion_r217021955 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/util/RestructureUtil.java --- @@ -165,14 +165,15 @@ private static boolean isColumnMatches(boolean isTransactionalTable, // column ID but can have same column name if (tableColumn.getDataType().isComplexType() && !(tableColumn.getDataType().getId() == DataTypes.ARRAY_TYPE_ID)) { - if (tableColumn.getColumnId().equals(queryColumn.getColumnId())) { + if (tableColumn.getColumnId().equalsIgnoreCase(queryColumn.getColumnId())) { return true; } else { return isColumnMatchesStruct(tableColumn, queryColumn); } } else { - return (tableColumn.getColumnId().equals(queryColumn.getColumnId()) || (!isTransactionalTable - && tableColumn.getColName().equals(queryColumn.getColName(; + return (tableColumn.getColumnId().equalsIgnoreCase(queryColumn.getColumnId()) || ( --- End diff -- OK ---
[GitHub] carbondata pull request #2704: [HOTFIX] Old stores cannot read with new tabl...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2704#discussion_r216917965 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/util/RestructureUtil.java --- @@ -165,14 +165,15 @@ private static boolean isColumnMatches(boolean isTransactionalTable, // column ID but can have same column name if (tableColumn.getDataType().isComplexType() && !(tableColumn.getDataType().getId() == DataTypes.ARRAY_TYPE_ID)) { - if (tableColumn.getColumnId().equals(queryColumn.getColumnId())) { + if (tableColumn.getColumnId().equalsIgnoreCase(queryColumn.getColumnId())) { return true; } else { return isColumnMatchesStruct(tableColumn, queryColumn); } } else { - return (tableColumn.getColumnId().equals(queryColumn.getColumnId()) || (!isTransactionalTable - && tableColumn.getColName().equals(queryColumn.getColName(; + return (tableColumn.getColumnId().equalsIgnoreCase(queryColumn.getColumnId()) || ( --- End diff -- I think we should have a stricter comparison in the restructure case. If a column with the same name is re-added with a different data type, it will be a problem. Let's use the data type in the comparison as well. ---
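The stricter check the reviewer proposes could look like this sketch: for non-transactional tables, fall back to a name match only when the data type also agrees, so a dropped-and-re-added column with the same name but a different type is not treated as the same column. Field and class names are simplified assumptions, not CarbonData's real API:

```java
// Sketch of the stricter column comparison suggested in the review.
// Column is a simplified stand-in for CarbonColumn; dataType is modeled
// as a String for illustration.
public class ColumnMatchSketch {
    static final class Column {
        final String id;
        final String name;
        final String dataType;
        Column(String id, String name, String dataType) {
            this.id = id;
            this.name = name;
            this.dataType = dataType;
        }
    }

    static boolean isColumnMatches(boolean isTransactionalTable,
                                   Column tableColumn, Column queryColumn) {
        // a case-insensitive column-ID match is always authoritative
        if (tableColumn.id.equalsIgnoreCase(queryColumn.id)) {
            return true;
        }
        // for non-transactional tables, name alone is not enough:
        // the data type must also agree
        return !isTransactionalTable
            && tableColumn.name.equals(queryColumn.name)
            && tableColumn.dataType.equals(queryColumn.dataType);
    }

    public static void main(String[] args) {
        Column intCol = new Column("id1", "age", "INT");
        Column stringCol = new Column("id2", "age", "STRING");
        if (isColumnMatches(false, intCol, stringCol))
            throw new AssertionError("same name, different type must not match");
        if (!isColumnMatches(false, intCol, new Column("id3", "age", "INT")))
            throw new AssertionError("same name and type should match");
        System.out.println("column match sketch OK");
    }
}
```

The extra data-type term is what prevents a restructured column from silently matching stale query metadata.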
[GitHub] carbondata pull request #2704: [HOTFIX] Old stores cannot read with new tabl...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2704#discussion_r216918150 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/util/RestructureUtil.java --- @@ -165,14 +165,15 @@ private static boolean isColumnMatches(boolean isTransactionalTable, // column ID but can have same column name if (tableColumn.getDataType().isComplexType() && !(tableColumn.getDataType().getId() == DataTypes.ARRAY_TYPE_ID)) { - if (tableColumn.getColumnId().equals(queryColumn.getColumnId())) { + if (tableColumn.getColumnId().equalsIgnoreCase(queryColumn.getColumnId())) { --- End diff -- The same comment given below applies here as well. ---