[GitHub] [carbondata] nihal0107 removed a comment on pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.
nihal0107 removed a comment on pull request #3905: URL: https://github.com/apache/carbondata/pull/3905#issuecomment-684215300
We are getting this exception in case of more than 0.1 million records. We can't load that amount of data for the test case.
[GitHub] [carbondata] nihal0107 commented on pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.
nihal0107 commented on pull request #3905: URL: https://github.com/apache/carbondata/pull/3905#issuecomment-684216370
> @nihal0107 , can a test case be added for your fix?

We are getting this exception in case of more than 0.1 million records. We can't load that amount of data for the test case.
[GitHub] [carbondata] nihal0107 commented on pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.
nihal0107 commented on pull request #3905: URL: https://github.com/apache/carbondata/pull/3905#issuecomment-684215300
We are getting this exception in case of more than 0.1 million records. We can't load that amount of data for the test case.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3906: [CARBONDATA-3968] Added test cases for hive read complex types and handled other issues
CarbonDataQA1 commented on pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#issuecomment-684059041
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3938/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3906: [CARBONDATA-3968] Added test cases for hive read complex types and handled other issues
CarbonDataQA1 commented on pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#issuecomment-684057932
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2198/
[jira] [Updated] (CARBONDATA-3968) Hive read complex types issues
[ https://issues.apache.org/jira/browse/CARBONDATA-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akshay updated CARBONDATA-3968:
Description:
# Issues in reading array/map/struct of byte, varchar and decimal types.
# Map of primitive type with only one row inserted has issues.

was:
# Issues in reading of byte, varchar and decimal types.
# Map of primitive type with only one row inserted has issues.

> Hive read complex types issues
> Key: CARBONDATA-3968
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3968
> Project: CarbonData
> Issue Type: Bug
> Components: hive-integration
> Reporter: Akshay
> Priority: Major
>
> # Issues in reading array/map/struct of byte, varchar and decimal types.
> # Map of primitive type with only one row inserted has issues.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.
CarbonDataQA1 commented on pull request #3834: URL: https://github.com/apache/carbondata/pull/3834#issuecomment-684031807
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2197/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.
CarbonDataQA1 commented on pull request #3834: URL: https://github.com/apache/carbondata/pull/3834#issuecomment-684022546
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3937/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-684015193
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3936/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-684012482
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2195/
[jira] [Created] (CARBONDATA-3968) Hive read complex types issues
Akshay created CARBONDATA-3968:
Summary: Hive read complex types issues
Key: CARBONDATA-3968
URL: https://issues.apache.org/jira/browse/CARBONDATA-3968
Project: CarbonData
Issue Type: Bug
Components: hive-integration
Reporter: Akshay

# Issues in reading of byte, varchar and decimal types.
# Map of primitive type with only one row inserted has issues.
[GitHub] [carbondata] akkio-97 commented on a change in pull request #3906: Added test cases for hive read complex types and handled other issues
akkio-97 commented on a change in pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#discussion_r480358731

## File path: integration/hive/src/test/java/org/apache/carbondata/hive/HiveTestUtils.java
@@ -65,7 +74,12 @@ public boolean checkAnswer(ResultSet actual, ResultSet expected) throws SQLExcep
   Assert.assertTrue(numOfColumnsExpected > 0);
   Assert.assertEquals(actual.getMetaData().getColumnCount(), numOfColumnsExpected);
   for (int i = 1; i <= numOfColumnsExpected; i++) {
-    Assert.assertEquals(actual.getString(i), actual.getString(i));
+    if (actual.getString(i).contains(":")) {
+      Assert.assertTrue(checkMapPairsIgnoringOrder(actual.getString(i), expected.getString(i)));
+    } else {
+      Assert.assertEquals(actual.getString(i), expected.getString(i));
+    }
+    // System.out.println(actual.getString(i));

Review comment: done

## File path: integration/hive/src/main/java/org/apache/carbondata/hive/util/DataTypeUtil.java
@@ -64,13 +69,23 @@ public static DataType convertHiveTypeToCarbon(String type) throws SQLException
   return DataTypes.createArrayType(convertHiveTypeToCarbon(subType));
 } else if (type.startsWith("map<")) {
   String[] subType = (type.substring(type.indexOf("<") + 1, type.indexOf(">"))).split(",");
+  for (int i = 0; i < subType.length; i++) {
+    if (subType[i].startsWith("decimal")) {
+      subType[i] += ',' + subType[++i];

Review comment: done

## File path: integration/hive/src/main/java/org/apache/carbondata/hive/util/DataTypeUtil.java
@@ -64,13 +69,23 @@ public static DataType convertHiveTypeToCarbon(String type) throws SQLException
   return DataTypes.createArrayType(convertHiveTypeToCarbon(subType));
 } else if (type.startsWith("map<")) {
   String[] subType = (type.substring(type.indexOf("<") + 1, type.indexOf(">"))).split(",");
+  for (int i = 0; i < subType.length; i++) {
+    if (subType[i].startsWith("decimal")) {
+      subType[i] += ',' + subType[++i];
+      subType = (String[]) ArrayUtils.removeElement(subType, subType[i]);
+    }
+  }
   return DataTypes
       .createMapType(convertHiveTypeToCarbon(subType[0]), convertHiveTypeToCarbon(subType[1]));
 } else if (type.startsWith("struct<")) {
   String[] subTypes = (type.substring(type.indexOf("<") + 1, type.indexOf(">"))).split(",");
   List structFieldList = new ArrayList<>();
-  for (String subType : subTypes) {
+  for (int i = 0; i < subTypes.length; i++) {
+    String subType = subTypes[i];
+    if (subType.startsWith("decimal")) {
+      subType += ',' + subTypes[++i];

Review comment: done
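The review thread above turns on how `convertHiveTypeToCarbon` splits a Hive type string on commas: for `map<string,decimal(10,2)>`, a plain `split(",")` tears `decimal(10,2)` into `decimal(10` and `2)`, because the precision/scale comma is indistinguishable from the key/value separator. A minimal standalone sketch of the rejoin logic the patch applies, assuming commons-lang3's `ArrayUtils` as the diff does:

```java
import java.util.Arrays;

import org.apache.commons.lang3.ArrayUtils;

public class DecimalSubTypeDemo {
  public static void main(String[] args) {
    String type = "map<string,decimal(10,2)>";
    String inner = type.substring(type.indexOf("<") + 1, type.indexOf(">"));
    // A plain split yields ["string", "decimal(10", "2)"].
    String[] subType = inner.split(",");
    for (int i = 0; i < subType.length; i++) {
      if (subType[i].startsWith("decimal")) {
        // Stitch the spilled scale fragment back onto the decimal token
        // and drop the now-redundant element, as the patch does.
        subType[i] += ',' + subType[i + 1];
        subType = ArrayUtils.removeElement(subType, subType[i + 1]);
      }
    }
    System.out.println(Arrays.toString(subType)); // [string, decimal(10,2)]
  }
}
```

As the reviewers note below, a shared constant such as CarbonCommonConstants.COMMA is preferable to the bare ',' literal.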
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning
CarbonDataQA1 commented on pull request #3908: URL: https://github.com/apache/carbondata/pull/3908#issuecomment-683969864
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3933/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3909: [WIP] Date/timestamp compatibility between hive and carbon
CarbonDataQA1 commented on pull request #3909: URL: https://github.com/apache/carbondata/pull/3909#issuecomment-683969333
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2194/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3909: [WIP] Date/timestamp compatibility between hive and carbon
CarbonDataQA1 commented on pull request #3909: URL: https://github.com/apache/carbondata/pull/3909#issuecomment-683967436
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3935/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning
CarbonDataQA1 commented on pull request #3908: URL: https://github.com/apache/carbondata/pull/3908#issuecomment-683966337
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2193/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
CarbonDataQA1 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-683964722
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2192/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
CarbonDataQA1 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-683964482
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3934/
[GitHub] [carbondata] Karan-c980 commented on pull request #3876: TestingCI
Karan-c980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-683945124
retest this please
[GitHub] [carbondata] kunal642 commented on a change in pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
kunal642 commented on a change in pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#discussion_r480289433

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/optimizer/CarbonSecondaryIndexOptimizer.scala
@@ -58,6 +58,16 @@ object NodeType extends Enumeration {
  */
 class CarbonSecondaryIndexOptimizer(sparkSession: SparkSession) {
+ // to store the sort node per query
+ var sortNodeForPushDown: Sort = _
+
+ // to store the limit literal per query
+ var limitLiteral : Literal = _
+
+ // by default do not push down notNull filter,
+ // but for orderby limit push down, push down notNull filter also. Else we get wrong results.
+ var pushDownNotNullFilter : Boolean = _

Review comment: Why not keep these as local variables in transformFilterToJoin and pass them to rewritePlanForSecondaryIndex()?
[GitHub] [carbondata] kunal642 commented on a change in pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
kunal642 commented on a change in pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#discussion_r480286695

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/optimizer/CarbonSecondaryIndexOptimizer.scala
@@ -824,6 +904,57 @@ class CarbonSecondaryIndexOptimizer(sparkSession: SparkSession) {
   }
 }
+ private def checkIfPushDownOrderByLimitAndNotNullFilter(literal: Literal, sort: Sort,
+     filter: Filter): Unit = {
+   // 1. check all the filter columns present in SI
+   val originalFilterAttributes = filter.condition collect {
+     case attr: AttributeReference =>
+       attr.name.toLowerCase
+   }
+   val filterAttributes = filter.condition collect {
+     case attr: AttributeReference => attr.name.toLowerCase
+   }
+   val indexTableRelation = MatchIndexableRelation.unapply(filter.child).get
+   val matchingIndexTables = CarbonCostBasedOptimizer.identifyRequiredTables(
+     filterAttributes.toSet.asJava,
+     CarbonIndexUtil.getSecondaryIndexes(indexTableRelation).mapValues(_.toList.asJava).asJava)
+     .asScala
+   val databaseName = filter.child.asInstanceOf[LogicalRelation].relation
+     .asInstanceOf[CarbonDatasourceHadoopRelation].carbonRelation.databaseName
+   // filter out all the index tables which are disabled
+   val enabledMatchingIndexTables = matchingIndexTables
+     .filter(table => {
+       sparkSession.sessionState.catalog
+         .getTableMetadata(TableIdentifier(table,
+           Some(databaseName))).storage
+         .properties
+         .getOrElse("isSITableEnabled", "true").equalsIgnoreCase("true")
+     })
+   // 2. check if only one SI matches for the filter columns
+   if (enabledMatchingIndexTables.nonEmpty && enabledMatchingIndexTables.size == 1 &&
+       filterAttributes.intersect(originalFilterAttributes).size ==
+       originalFilterAttributes.size) {
+     // 3. check if all the sort columns is in SI
+     val sortColumns = sort
+       .order
+       .map(_.child.asInstanceOf[AttributeReference].name.toLowerCase())
+       .toSet
+     val indexCarbonTable = CarbonEnv
+       .getCarbonTable(Some(databaseName), enabledMatchingIndexTables.head)(sparkSession)
+     var allColumnsFound = true

Review comment: Use forall to check whether all columns exist or not.
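kunal642's suggestion refers to Scala's `forall`; the same idea — replacing a mutable `allColumnsFound` flag flipped inside a loop with a single quantified check — has a direct Java analogue in `Stream.allMatch`, sketched below with hypothetical column names standing in for the ones in the PR:

```java
import java.util.List;
import java.util.Set;

public class AllMatchDemo {
  public static void main(String[] args) {
    // Hypothetical sort columns and secondary-index table columns.
    List<String> sortColumns = List.of("name", "city");
    Set<String> indexTableColumns = Set.of("name", "city", "positionid");

    // One quantified check instead of a flag initialized to true
    // and flipped inside a loop.
    boolean allColumnsFound = sortColumns.stream().allMatch(indexTableColumns::contains);
    System.out.println(allColumnsFound); // true
  }
}
```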
[GitHub] [carbondata] kunal642 commented on a change in pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
kunal642 commented on a change in pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#discussion_r480285525

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/optimizer/CarbonSecondaryIndexOptimizer.scala
@@ -824,6 +904,57 @@ class CarbonSecondaryIndexOptimizer(sparkSession: SparkSession) {
   }
 }
+ private def checkIfPushDownOrderByLimitAndNotNullFilter(literal: Literal, sort: Sort,
+     filter: Filter): Unit = {
+   // 1. check all the filter columns present in SI
+   val originalFilterAttributes = filter.condition collect {
+     case attr: AttributeReference =>
+       attr.name.toLowerCase
+   }
+   val filterAttributes = filter.condition collect {
+     case attr: AttributeReference => attr.name.toLowerCase
+   }
+   val indexTableRelation = MatchIndexableRelation.unapply(filter.child).get
+   val matchingIndexTables = CarbonCostBasedOptimizer.identifyRequiredTables(
+     filterAttributes.toSet.asJava,
+     CarbonIndexUtil.getSecondaryIndexes(indexTableRelation).mapValues(_.toList.asJava).asJava)
+     .asScala
+   val databaseName = filter.child.asInstanceOf[LogicalRelation].relation
+     .asInstanceOf[CarbonDatasourceHadoopRelation].carbonRelation.databaseName
+   // filter out all the index tables which are disabled
+   val enabledMatchingIndexTables = matchingIndexTables
+     .filter(table => {
+       sparkSession.sessionState.catalog
+         .getTableMetadata(TableIdentifier(table,
+           Some(databaseName))).storage
+         .properties
+         .getOrElse("isSITableEnabled", "true").equalsIgnoreCase("true")
+     })
+   // 2. check if only one SI matches for the filter columns
+   if (enabledMatchingIndexTables.nonEmpty && enabledMatchingIndexTables.size == 1 &&
+       filterAttributes.intersect(originalFilterAttributes).size ==
+       originalFilterAttributes.size) {
+     // 3. check if all the sort columns is in SI
+     val sortColumns = sort
+       .order
+       .map(_.child.asInstanceOf[AttributeReference].name.toLowerCase())
+       .toSet
+     val indexCarbonTable = CarbonEnv
+       .getCarbonTable(Some(databaseName), enabledMatchingIndexTables.head)(sparkSession)

Review comment: Use indexTableRelation.carbonTable to get indexCarbonTable.
[GitHub] [carbondata] kunal642 commented on a change in pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
kunal642 commented on a change in pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#discussion_r480276719

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/optimizer/CarbonSecondaryIndexOptimizer.scala
@@ -824,6 +904,57 @@ class CarbonSecondaryIndexOptimizer(sparkSession: SparkSession) {
   }
 }
+ private def checkIfPushDownOrderByLimitAndNotNullFilter(literal: Literal, sort: Sort,
+     filter: Filter): Unit = {
+   // 1. check all the filter columns present in SI
+   val originalFilterAttributes = filter.condition collect {
+     case attr: AttributeReference =>
+       attr.name.toLowerCase
+   }
+   val filterAttributes = filter.condition collect {
+     case attr: AttributeReference => attr.name.toLowerCase
+   }
+   val indexTableRelation = MatchIndexableRelation.unapply(filter.child).get
+   val matchingIndexTables = CarbonCostBasedOptimizer.identifyRequiredTables(
+     filterAttributes.toSet.asJava,
+     CarbonIndexUtil.getSecondaryIndexes(indexTableRelation).mapValues(_.toList.asJava).asJava)
+     .asScala
+   val databaseName = filter.child.asInstanceOf[LogicalRelation].relation

Review comment: Why not use `indexTableRelation.carbonRelation.databaseName`?
[GitHub] [carbondata] kunal642 commented on a change in pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
kunal642 commented on a change in pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#discussion_r480273615

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/optimizer/CarbonSecondaryIndexOptimizer.scala
@@ -824,6 +904,57 @@ class CarbonSecondaryIndexOptimizer(sparkSession: SparkSession) {
   }
 }
+ private def checkIfPushDownOrderByLimitAndNotNullFilter(literal: Literal, sort: Sort,
+     filter: Filter): Unit = {
+   // 1. check all the filter columns present in SI
+   val originalFilterAttributes = filter.condition collect {
+     case attr: AttributeReference =>
+       attr.name.toLowerCase
+   }
+   val filterAttributes = filter.condition collect {

Review comment: Is filterAttributes the same as originalFilterAttributes? The code looks identical.
[GitHub] [carbondata] ShreelekhyaG opened a new pull request #3909: [WIP] Date/timestamp compatibility between hive and carbon
ShreelekhyaG opened a new pull request #3909: URL: https://github.com/apache/carbondata/pull/3909

### Why is this PR needed?
To ensure that date/timestamp values supported by hive are also supported by carbon. Ex: -01-01 is accepted by hive as a valid record and converted to 0001-01-01.

### What changes were proposed in this PR?
Changed the min value of date which is used for validation. When the setlenient flag is set to true, carbon can convert and support such years.

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes
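The PR's setlenient behavior is CarbonData-specific, but the underlying idea mirrors lenient parsing in the JDK: out-of-range calendar fields are rolled over instead of rejected. A minimal illustration with `java.text.SimpleDateFormat` (not the PR's actual code path):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class LenientDateDemo {
  public static void main(String[] args) throws ParseException {
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");

    // Lenient mode rolls out-of-range fields forward: month 13 of 2020
    // parses as January 2021 instead of failing.
    format.setLenient(true);
    System.out.println(format.parse("2020-13-01"));

    // Strict mode rejects the same record outright.
    format.setLenient(false);
    try {
      format.parse("2020-13-01");
    } catch (ParseException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```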
[jira] [Updated] (CARBONDATA-3961) Reorder filter according to the column storage ordinal to improve reading
[ https://issues.apache.org/jira/browse/CARBONDATA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kunal Kapoor updated CARBONDATA-3961:
Issue Type: Improvement (was: Bug)

> Reorder filter according to the column storage ordinal to improve reading
> Key: CARBONDATA-3961
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3961
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Kunal Kapoor
> Assignee: Kunal Kapoor
> Priority: Major
> Time Spent: 2h 40m
> Remaining Estimate: 0h
[GitHub] [carbondata] kunal642 opened a new pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning
kunal642 opened a new pull request #3908: URL: https://github.com/apache/carbondata/pull/3908

### Why is this PR needed?

### What changes were proposed in this PR?

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes
[jira] [Created] (CARBONDATA-3967) Cache partitions to improve partition pruning performance
Kunal Kapoor created CARBONDATA-3967:
Summary: Cache partitions to improve partition pruning performance
Key: CARBONDATA-3967
URL: https://issues.apache.org/jira/browse/CARBONDATA-3967
Project: CarbonData
Issue Type: Improvement
Reporter: Kunal Kapoor
Assignee: Kunal Kapoor
[GitHub] [carbondata] ajantha-bhat commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
ajantha-bhat commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-683903456
> @ajantha-bhat , can test cases be added to check no filter pushdown in not equal to case?

@vikramahuja1001 : Done, added.
[GitHub] [carbondata] ajantha-bhat removed a comment on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
ajantha-bhat removed a comment on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-683873006
> @ajantha-bhat , can test cases be added to check no filter pushdown in not equal to case?

A test case related to notNull pushdown was already failing in `TestNIQueryWithIndex`; I will check it and add notEquals anyway.
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3906: Added test cases for hive read complex types and handled other issues
vikramahuja1001 commented on a change in pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#discussion_r480233177

## File path: integration/hive/src/main/java/org/apache/carbondata/hive/util/DataTypeUtil.java
@@ -64,13 +69,23 @@ public static DataType convertHiveTypeToCarbon(String type) throws SQLException
   return DataTypes.createArrayType(convertHiveTypeToCarbon(subType));
 } else if (type.startsWith("map<")) {
   String[] subType = (type.substring(type.indexOf("<") + 1, type.indexOf(">"))).split(",");
+  for (int i = 0; i < subType.length; i++) {
+    if (subType[i].startsWith("decimal")) {
+      subType[i] += ',' + subType[++i];

Review comment: Use CarbonCommonConstants.COMMA instead of ','

## File path: integration/hive/src/main/java/org/apache/carbondata/hive/util/DataTypeUtil.java
@@ -64,13 +69,23 @@ public static DataType convertHiveTypeToCarbon(String type) throws SQLException
   return DataTypes.createArrayType(convertHiveTypeToCarbon(subType));
 } else if (type.startsWith("map<")) {
   String[] subType = (type.substring(type.indexOf("<") + 1, type.indexOf(">"))).split(",");
+  for (int i = 0; i < subType.length; i++) {
+    if (subType[i].startsWith("decimal")) {
+      subType[i] += ',' + subType[++i];
+      subType = (String[]) ArrayUtils.removeElement(subType, subType[i]);
+    }
+  }
   return DataTypes
       .createMapType(convertHiveTypeToCarbon(subType[0]), convertHiveTypeToCarbon(subType[1]));
 } else if (type.startsWith("struct<")) {
   String[] subTypes = (type.substring(type.indexOf("<") + 1, type.indexOf(">"))).split(",");
   List structFieldList = new ArrayList<>();
-  for (String subType : subTypes) {
+  for (int i = 0; i < subTypes.length; i++) {
+    String subType = subTypes[i];
+    if (subType.startsWith("decimal")) {
+      subType += ',' + subTypes[++i];

Review comment: Use CarbonCommonConstants.COMMA
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3906: Added test cases for hive read complex types and handled other issues
vikramahuja1001 commented on a change in pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#discussion_r480230761

## File path: integration/hive/src/test/java/org/apache/carbondata/hive/HiveTestUtils.java
@@ -65,7 +74,12 @@ public boolean checkAnswer(ResultSet actual, ResultSet expected) throws SQLExcep
   Assert.assertTrue(numOfColumnsExpected > 0);
   Assert.assertEquals(actual.getMetaData().getColumnCount(), numOfColumnsExpected);
   for (int i = 1; i <= numOfColumnsExpected; i++) {
-    Assert.assertEquals(actual.getString(i), actual.getString(i));
+    if (actual.getString(i).contains(":")) {
+      Assert.assertTrue(checkMapPairsIgnoringOrder(actual.getString(i), expected.getString(i)));
+    } else {
+      Assert.assertEquals(actual.getString(i), expected.getString(i));
+    }
+    // System.out.println(actual.getString(i));

Review comment: Remove this comment
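The patch compares map values as sets of key:value pairs because the entry order of a map printed by Hive is not guaranteed. A hypothetical stand-in for the `checkMapPairsIgnoringOrder` helper referenced above (its real implementation is not shown in the diff):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MapCompareDemo {
  // Two map string representations are considered equal when they contain
  // the same key:value pairs, regardless of the order they are printed in.
  static boolean mapPairsEqualIgnoringOrder(String actual, String expected) {
    Set<String> actualPairs =
        new HashSet<>(Arrays.asList(actual.replaceAll("[{}]", "").split(",")));
    Set<String> expectedPairs =
        new HashSet<>(Arrays.asList(expected.replaceAll("[{}]", "").split(",")));
    return actualPairs.equals(expectedPairs);
  }

  public static void main(String[] args) {
    System.out.println(mapPairsEqualIgnoringOrder("{1:a,2:b}", "{2:b,1:a}")); // true
    System.out.println(mapPairsEqualIgnoringOrder("{1:a,2:b}", "{1:a,2:c}")); // false
  }
}
```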
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3906: Added test cases for hive read complex types and handled other issues
vikramahuja1001 commented on pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#issuecomment-683873583
Add jira ID
[GitHub] [carbondata] ajantha-bhat commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
ajantha-bhat commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-683873006
> @ajantha-bhat , can test cases be added to check no filter pushdown in not equal to case?

A test case related to notNull pushdown was already failing in `TestNIQueryWithIndex`; I will check it and add notEquals anyway.
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table
vikramahuja1001 commented on pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#issuecomment-683866796
@akashrn5 , @kunal642 please check
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.
vikramahuja1001 commented on pull request #3905: URL: https://github.com/apache/carbondata/pull/3905#issuecomment-683869471
@nihal0107 , can a test case be added for your fix?
[GitHub] [carbondata] vikramahuja1001 closed pull request #3895: [WIP]SI fix for not equal to filter
vikramahuja1001 closed pull request #3895: URL: https://github.com/apache/carbondata/pull/3895
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
vikramahuja1001 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-683871061
@ajantha-bhat , can test cases be added to check no filter pushdown in not equal to case?
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3895: [WIP]SI fix for not equal to filter
vikramahuja1001 commented on pull request #3895: URL: https://github.com/apache/carbondata/pull/3895#issuecomment-683868709
@ajantha-bhat, I checked that PR; maybe you can add the test cases for not-equal and check SI pushdown.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3890: [CARBONDATA-3952] After reset query not hitting MV
CarbonDataQA1 commented on pull request #3890: URL: https://github.com/apache/carbondata/pull/3890#issuecomment-683841817
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2191/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3890: [CARBONDATA-3952] After reset query not hitting MV
CarbonDataQA1 commented on pull request #3890: URL: https://github.com/apache/carbondata/pull/3890#issuecomment-683840584
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3932/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-683731944
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3931/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-683728958
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2190/
[GitHub] [carbondata] kunal642 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal
kunal642 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683729131
@akashrn5 @kumarvishal09 @QiangCai @ravipesala @ajantha-bhat Please review
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table
CarbonDataQA1 commented on pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#issuecomment-683685077
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3930/
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r480021706

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CSVCarbonWriter.java
@@ -72,6 +90,72 @@ public void write(Object object) throws IOException {
   }
 }
+ public static CsvParser buildCsvParser(Configuration conf) {

Review comment: done
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r480021578

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
@@ -594,6 +608,227 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) {
   return this;
 }
+ /**
+  * to build a {@link CarbonWriter}, which accepts loading CSV files.
+  *
+  * @param filePath absolute path under which files should be loaded.
+  * @return CarbonWriterBuilder
+  */
+ public CarbonWriterBuilder withCsvPath(String filePath) throws IOException {
+   this.validateFilePath(filePath);
+   this.filePath = filePath;
+   this.setIsDirectory(filePath);
+   this.withCsvInput();
+   this.dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION);
+   return this;
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts CSV files directory and
+  * list of file which has to be loaded.
+  *
+  * @param filePath directory where the CSV file exists.
+  * @param fileList list of files which has to be loaded.
+  * @return CarbonWriterBuilder
+  */
+ public CarbonWriterBuilder withCsvPath(String filePath, List fileList)
+     throws IOException {
+   this.fileList = fileList;
+   this.withCsvPath(filePath);
+   return this;
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts loading JSON files.
+  *
+  * @param filePath absolute path under which files should be loaded.
+  * @return CarbonWriterBuilder
+  */
+ public CarbonWriterBuilder withJsonPath(String filePath) throws IOException {
+   this.validateFilePath(filePath);
+   this.filePath = filePath;
+   this.setIsDirectory(filePath);
+   this.withJsonInput();
+   this.dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION);
+   return this;
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts JSON file directory and
+  * list of file which has to be loaded.
+  *
+  * @param filePath directory where the json file exists.
+  * @param fileList list of files which has to be loaded.
+  * @return CarbonWriterBuilder
+  * @throws IOException
+  */
+ public CarbonWriterBuilder withJsonPath(String filePath, List fileList)
+     throws IOException {
+   this.fileList = fileList;
+   this.withJsonPath(filePath);
+   return this;
+ }
+
+ private void validateFilePath(String filePath) {
+   if (StringUtils.isEmpty(filePath)) {
+     throw new IllegalArgumentException("filePath can not be empty");
+   }
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+  *
+  * @param filePath absolute path under which files should be loaded.
+  * @return CarbonWriterBuilder
+  */
+ public CarbonWriterBuilder withParquetPath(String filePath) throws IOException {
+   this.validateFilePath(filePath);
+   this.filePath = filePath;
+   this.setIsDirectory(filePath);
+   this.writerType = WRITER_TYPE.PARQUET;
+   CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.PARQUET_FILE_EXT);
+   org.apache.avro.Schema parquetSchema = ParquetCarbonWriter
+       .extractParquetSchema(dataFiles[0], this.hadoopConf);
+   this.dataFiles = dataFiles;
+   this.avroSchema = parquetSchema;
+   this.schema = AvroCarbonWriter.getCarbonSchemaFromAvroSchema(this.avroSchema);
+   return this;
+ }
+
+ private void setIsDirectory(String filePath) {
+   if (this.hadoopConf == null) {
+     this.hadoopConf = new Configuration(FileFactory.getConfiguration());
+   }
+   CarbonFile carbonFile = FileFactory.getCarbonFile(filePath, hadoopConf);
+   this.isDirectory = carbonFile.isDirectory();
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts parquet files directory and
+  * list of file which has to be loaded.
+  *
+  * @param filePath directory where the parquet file exists.
+  * @param fileList list of files which has to be loaded.
+  * @return CarbonWriterBuilder
+  * @throws IOException
+  */
+ public CarbonWriterBuilder withParquetPath(String filePath, List fileList)
+     throws IOException {
+   this.fileList = fileList;
+   this.withParquetPath(filePath);
+   return this;
+ }
+
+ private CarbonFile[] extractDataFiles(String suf) {
+   List dataFiles;
+   if (this.isDirectory) {
+     if (CollectionUtils.isEmpty(this.fileList)) {
+       dataFiles = SDKUtil.extractFilesFromFolder(this.filePath, suf, this.hadoopConf);
+     } else {
+       dataFiles = this.appendFileListWithPath();
+     }
+   } else {
+     dataFiles = new ArrayList<>();
+     dataFiles.add(FileFactory.getCarbonFile(this.filePath, this.hadoopConf));
+   }
+   if (CollectionUtils.isEmpty(dataFiles)) {
+     throw new RuntimeException("Data files can't be empty.");
+   }
+   return dataFiles.toArray(new CarbonFile[0]);
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts loading ORC files.
+  *
+  * @param
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3907: [CARBONDATA-3966] Fix NullPointerException issue in case of reliability testing of load and compaction
CarbonDataQA1 commented on pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#issuecomment-683683411
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2188/
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r480021451

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
@@ -660,13 +895,39 @@ public CarbonWriter build() throws IOException, InvalidLoadOptionException {
 // removed from the load. LoadWithoutConverter flag is going to point to the Loader Builder
 // which will skip Conversion Step.
 loadModel.setLoadWithoutConverterStep(true);
- return new AvroCarbonWriter(loadModel, hadoopConf, this.avroSchema);
+ AvroCarbonWriter avroCarbonWriter = new AvroCarbonWriter(loadModel,

Review comment: done
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table
CarbonDataQA1 commented on pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#issuecomment-683681865
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2189/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3907: [CARBONDATA-3966] Fix NullPointerException issue in case of reliability testing of load and compaction
CarbonDataQA1 commented on pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#issuecomment-683675465
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3929/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-683663848
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3928/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-683661702
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2187/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-683645409
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2186/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-683645642
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3927/
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479973113

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
@@ -660,13 +895,39 @@ public CarbonWriter build() throws IOException, InvalidLoadOptionException {
 // removed from the load. LoadWithoutConverter flag is going to point to the Loader Builder
 // which will skip Conversion Step.
 loadModel.setLoadWithoutConverterStep(true);
- return new AvroCarbonWriter(loadModel, hadoopConf, this.avroSchema);
+ AvroCarbonWriter avroCarbonWriter = new AvroCarbonWriter(loadModel,

Review comment: We have some code duplications for each type of writer. Suggest to refactor it. Something like this -
```suggestion
CarbonWriter carbonWriter;
if (this.writerType == WRITER_TYPE.AVRO) {
  // AVRO records are pushed to Carbon as Object not as Strings. This was done in order to
  // handle multi level complex type support. As there are no conversion converter step is
  // removed from the load. LoadWithoutConverter flag is going to point to the Loader Builder
  // which will skip Conversion Step.
  loadModel.setLoadWithoutConverterStep(true);
  carbonWriter = new AvroCarbonWriter(loadModel, hadoopConf, this.avroSchema);
} else if (this.writerType == WRITER_TYPE.JSON) {
  loadModel.setJsonFileLoad(true);
  carbonWriter = new JsonCarbonWriter(loadModel, hadoopConf);
} else if (this.writerType == WRITER_TYPE.PARQUET) {
  loadModel.setLoadWithoutConverterStep(true);
  carbonWriter = new ParquetCarbonWriter(loadModel, hadoopConf, this.avroSchema);
} else if (this.writerType == WRITER_TYPE.ORC) {
  carbonWriter = new ORCCarbonWriter(loadModel, hadoopConf);
} else {
  // CSV
  CSVCarbonWriter csvCarbonWriter = new CSVCarbonWriter(loadModel, hadoopConf);
  if (!this.options.containsKey(CarbonCommonConstants.FILE_HEADER)) {
    csvCarbonWriter.setSkipHeader(true);
  }
  carbonWriter = csvCarbonWriter;
}
if (!StringUtils.isEmpty(filePath)) {
  carbonWriter.validateAndSetDataFiles(this.dataFiles);
}
return carbonWriter;
```
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479962896

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
@@ -594,6 +608,227 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) {
   return this;
 }
+ /**
+  * to build a {@link CarbonWriter}, which accepts loading CSV files.
+  *
+  * @param filePath absolute path under which files should be loaded.
+  * @return CarbonWriterBuilder
+  */
+ public CarbonWriterBuilder withCsvPath(String filePath) throws IOException {
+   this.validateFilePath(filePath);
+   this.filePath = filePath;
+   this.setIsDirectory(filePath);
+   this.withCsvInput();
+   this.dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION);
+   return this;
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts CSV files directory and
+  * list of file which has to be loaded.
+  *
+  * @param filePath directory where the CSV file exists.
+  * @param fileList list of files which has to be loaded.
+  * @return CarbonWriterBuilder
+  */
+ public CarbonWriterBuilder withCsvPath(String filePath, List fileList)
+     throws IOException {
+   this.fileList = fileList;
+   this.withCsvPath(filePath);
+   return this;
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts loading JSON files.
+  *
+  * @param filePath absolute path under which files should be loaded.
+  * @return CarbonWriterBuilder
+  */
+ public CarbonWriterBuilder withJsonPath(String filePath) throws IOException {
+   this.validateFilePath(filePath);
+   this.filePath = filePath;
+   this.setIsDirectory(filePath);
+   this.withJsonInput();
+   this.dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION);
+   return this;
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts JSON file directory and
+  * list of file which has to be loaded.
+  *
+  * @param filePath directory where the json file exists.
+  * @param fileList list of files which has to be loaded.
+  * @return CarbonWriterBuilder
+  * @throws IOException
+  */
+ public CarbonWriterBuilder withJsonPath(String filePath, List fileList)
+     throws IOException {
+   this.fileList = fileList;
+   this.withJsonPath(filePath);
+   return this;
+ }
+
+ private void validateFilePath(String filePath) {
+   if (StringUtils.isEmpty(filePath)) {
+     throw new IllegalArgumentException("filePath can not be empty");
+   }
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+  *
+  * @param filePath absolute path under which files should be loaded.
+  * @return CarbonWriterBuilder
+  */
+ public CarbonWriterBuilder withParquetPath(String filePath) throws IOException {
+   this.validateFilePath(filePath);
+   this.filePath = filePath;
+   this.setIsDirectory(filePath);
+   this.writerType = WRITER_TYPE.PARQUET;
+   CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.PARQUET_FILE_EXT);
+   org.apache.avro.Schema parquetSchema = ParquetCarbonWriter
+       .extractParquetSchema(dataFiles[0], this.hadoopConf);
+   this.dataFiles = dataFiles;
+   this.avroSchema = parquetSchema;
+   this.schema = AvroCarbonWriter.getCarbonSchemaFromAvroSchema(this.avroSchema);
+   return this;
+ }
+
+ private void setIsDirectory(String filePath) {
+   if (this.hadoopConf == null) {
+     this.hadoopConf = new Configuration(FileFactory.getConfiguration());
+   }
+   CarbonFile carbonFile = FileFactory.getCarbonFile(filePath, hadoopConf);
+   this.isDirectory = carbonFile.isDirectory();
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts parquet files directory and
+  * list of file which has to be loaded.
+  *
+  * @param filePath directory where the parquet file exists.
+  * @param fileList list of files which has to be loaded.
+  * @return CarbonWriterBuilder
+  * @throws IOException
+  */
+ public CarbonWriterBuilder withParquetPath(String filePath, List fileList)
+     throws IOException {
+   this.fileList = fileList;
+   this.withParquetPath(filePath);
+   return this;
+ }
+
+ private CarbonFile[] extractDataFiles(String suf) {
+   List dataFiles;
+   if (this.isDirectory) {
+     if (CollectionUtils.isEmpty(this.fileList)) {
+       dataFiles = SDKUtil.extractFilesFromFolder(this.filePath, suf, this.hadoopConf);
+     } else {
+       dataFiles = this.appendFileListWithPath();
+     }
+   } else {
+     dataFiles = new ArrayList<>();
+     dataFiles.add(FileFactory.getCarbonFile(this.filePath, this.hadoopConf));
+   }
+   if (CollectionUtils.isEmpty(dataFiles)) {
+     throw new RuntimeException("Data files can't be empty.");
+   }
+   return dataFiles.toArray(new CarbonFile[0]);
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts loading ORC files.
+  *
+  * @param
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479961626

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CSVCarbonWriter.java
@@ -72,6 +90,72 @@ public void write(Object object) throws IOException {
   }
 }
+ public static CsvParser buildCsvParser(Configuration conf) {

Review comment: Can it be a private, non-static method?
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
vikramahuja1001 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r479960111 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command.index + +import java.util + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.execution.command.DataCommand +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.sql.index.CarbonIndexUtil + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.metadata.index.IndexType +import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager} +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel} + +/** + * Show indexes on the table + */ +case class IndexRepairCommand(indexname: Option[String], tableNameOp: TableIdentifier, + dbName: String, + segments: Option[List[String]]) extends DataCommand{ + + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName) + + def processData(sparkSession: SparkSession): Seq[Row] = { +if (dbName == null) { + // table level and index level + val databaseName = if (tableNameOp.database.isEmpty) { +SparkSession.getActiveSession.get.catalog.currentDatabase + } else { +tableNameOp.database.get.toString + } + triggerRepair(tableNameOp.table, databaseName, indexname.isEmpty, indexname, segments) +} else { + // for all tables in the db +sparkSession.sessionState.catalog.listTables(dbName).foreach { + tableIdent => +triggerRepair(tableIdent.table, dbName, indexname.isEmpty, indexname, segments) +} +} +Seq.empty + } + + def triggerRepair(tableNameOp: String, databaseName: String, allIndex: Boolean, +indexName: Option[String], segments: Option[List[String]]): Unit = { +val sparkSession = SparkSession.getActiveSession.get +// when Si creation and load to main table are parallel, get the carbonTable from the +// metastore which will have the latest index Info +val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore +val carbonTable = metaStore + .lookupRelation(Some(databaseName), tableNameOp)(sparkSession) + .asInstanceOf[CarbonRelation].carbonTable + +val carbonLoadModel = new CarbonLoadModel +carbonLoadModel.setDatabaseName(databaseName) +carbonLoadModel.setTableName(tableNameOp) +carbonLoadModel.setTablePath(carbonTable.getTablePath) +val tableStatusFilePath = 
CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath) +carbonLoadModel.setLoadMetadataDetails(SegmentStatusManager Review comment: added This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
vikramahuja1001 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r479959497 ## File path: docs/index/secondary-index-guide.md ## @@ -188,4 +188,25 @@ where we have old stores. Syntax ``` REGISTER INDEX TABLE index_name ON [TABLE] [db_name.]table_name - ``` \ No newline at end of file + ``` + +### Reindex Command +This command is used to reload segments in the SI table in case there is a mismatch in the number +of segments with the main table. + +Syntax + +Reindex on all the secondary indexes on the main table + ``` + REINDEX ON TABLE [db_name.]main_table_name [WHERE SEGMENT.ID IN(0,1)] + ``` +Reindex on index table level Review comment: done ## File path: docs/index/secondary-index-guide.md ## @@ -188,4 +188,25 @@ where we have old stores. Syntax ``` REGISTER INDEX TABLE index_name ON [TABLE] [db_name.]table_name - ``` \ No newline at end of file + ``` + +### Reindex Command +This command is used to reload segments in the SI table in case there is a mismatch in the number +of segments with the main table. + +Syntax + +Reindex on all the secondary indexes on the main table Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
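To illustrate the documented syntax, a hedged sketch of issuing REINDEX through Spark SQL (the database and table names are made up, and the extensions class used to enable CarbonData is an assumption about session setup):

```java
import org.apache.spark.sql.SparkSession;

public class ReindexExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("ReindexExample")
        .master("local[*]")
        .config("spark.sql.extensions", "org.apache.spark.sql.CarbonExtensions") // assumed setup
        .getOrCreate();
    // Repair all secondary indexes of db1.maintable, restricted to segments 0 and 1,
    // per the table-level syntax documented above.
    spark.sql("REINDEX ON TABLE db1.maintable WHERE SEGMENT.ID IN(0,1)");
    spark.stop();
  }
}
```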
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
vikramahuja1001 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r479959112 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command.index + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.execution.command.DataCommand +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.sql.index.CarbonIndexUtil + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.metadata.index.IndexType +import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager} +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel} + +/** + * Repair logic for reindex command on maintable/indextable + */ +case class IndexRepairCommand(indexnameOp: Option[String], tableIdentifier: TableIdentifier, Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akashrn5 opened a new pull request #3907: [CARBONDATA-3966]Fix nullPointerException issue in case of reliability testing of load and compaction
akashrn5 opened a new pull request #3907: URL: https://github.com/apache/carbondata/pull/3907 ### Why is this PR needed? During reliability and concurrency testing of carbondata load, compaction and query, a NullPointerException is sometimes thrown. This is because, in `TableSegmentRefresher`, we get the last modified timestamp of the segment file to decide whether to refresh the cache; under concurrency the segment file can get deleted, or during an update the file may not be there, and getting the last modified time then throws a NullPointerException. ### What changes were proposed in this PR? Before getting the last modified time, always check that the file exists, since it can be deleted in the meantime due to concurrency; if it is not present, initialize the timestamp to zero. ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - No (verified with concurrency over 1000s of segments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
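A minimal sketch of the described fix, under assumed names (this is not the actual `TableSegmentRefresher` code; the `FileFactory`/`CarbonFile` calls mirror ones visible elsewhere in this thread):

```java
import java.io.IOException;

import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
import org.apache.carbondata.core.datastore.impl.FileFactory;

class SegmentTimestampSketch {
  // Return 0 when the segment file has vanished due to a concurrent delete or
  // update, instead of dereferencing a missing file and hitting an NPE.
  static long safeLastModifiedTime(String segmentFilePath) throws IOException {
    if (!FileFactory.isFileExist(segmentFilePath)) {
      return 0L;
    }
    CarbonFile segmentFile = FileFactory.getCarbonFile(segmentFilePath);
    return segmentFile.getLastModifiedTime();
  }
}
```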
[jira] [Created] (CARBONDATA-3966) NullPointerException is thrown in case of reliability testing of load, compaction and query
Akash R Nilugal created CARBONDATA-3966: --- Summary: NullPointerException is thrown in case of reliability testing of load, compaction and query Key: CARBONDATA-3966 URL: https://issues.apache.org/jira/browse/CARBONDATA-3966 Project: CarbonData Issue Type: Bug Reporter: Akash R Nilugal Assignee: Akash R Nilugal Sometimes NullPointerException is thrown in case of reliability testing of load, compaction and query -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-683604932 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479933175 ## File path: sdk/sdk/src/test/java/org/apache/carbondata/sdk/file/ParquetCarbonWriterTest.java ## @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.File; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.Objects; + +import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException; +import org.apache.commons.io.FileUtils; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +/** + * Test suite for {@link ParquetCarbonWriter} + */ +public class ParquetCarbonWriterTest { + String DATA_PATH = "./src/test/resources/file/"; + String outputPath = "./testWriteFiles"; + + @Before + @After + public void cleanTestData() { +try { + FileUtils.deleteDirectory(new File(outputPath)); +} catch (Exception e) { + e.printStackTrace(); + Assert.fail(e.getMessage()); Review comment: Removed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
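Presumably the cleanup ends up roughly like this after removing the catch/`Assert.fail` (a sketch, assuming the `IOException` is simply allowed to propagate and fail the test on its own):

```java
import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.junit.After;
import org.junit.Before;

public class CleanTestDataSketch {
  String outputPath = "./testWriteFiles";

  @Before
  @After
  public void cleanTestData() throws IOException {
    // No catch block: a failed cleanup now fails the test directly.
    FileUtils.deleteDirectory(new File(outputPath));
  }
}
```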
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479933103 ## File path: sdk/sdk/src/test/java/org/apache/carbondata/sdk/file/ORCCarbonWriterTest.java ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.File; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.Objects; + +import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException; +import org.apache.commons.io.FileUtils; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +/** + * Test suite for {@link ORCCarbonWriter} + */ +public class ORCCarbonWriterTest { + + String DATA_PATH = "./src/test/resources/file/"; + String outputPath = "./testloadORCFiles"; + + @Before + @After + public void cleanTestData() { +try { + FileUtils.deleteDirectory(new File(outputPath)); +} catch (Exception e) { + e.printStackTrace(); + Assert.fail(e.getMessage()); Review comment: Removed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479932734 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/utils/SDKUtil.java ## @@ -79,4 +98,75 @@ public static ArrayList listFiles(String sourceImageFolder, return (Object[]) input[i]; } + public static List<CarbonFile> extractFilesFromFolder(String path, + String suf, Configuration hadoopConf) { +List<Object> dataFiles = listFiles(path, suf, hadoopConf); +List<CarbonFile> carbonFiles = new ArrayList<>(); +for (Object dataFile: dataFiles) { + carbonFiles.add(FileFactory.getCarbonFile(dataFile.toString(), hadoopConf)); +} +if (CollectionUtils.isEmpty(dataFiles)) { + throw new RuntimeException("No file found at given location. Please provide " + "the correct folder location."); +} +return carbonFiles; + } + + public static DataFileStream<GenericRecord> buildAvroReader(CarbonFile carbonFile, Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479932585 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.IOException; +import java.util.*; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.sdk.file.utils.SDKUtil; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.ql.io.orc.OrcStruct; +import org.apache.hadoop.hive.ql.io.orc.Reader; +import org.apache.hadoop.hive.ql.io.orc.RecordReader; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.Text; + + +/** + * Implementation to write ORC rows in CSV format to carbondata file. + */ +public class ORCCarbonWriter extends CSVCarbonWriter { Review comment: Removed composition. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479933031 ## File path: sdk/sdk/src/test/java/org/apache/carbondata/sdk/file/AvroCarbonWriterTest.java ## @@ -603,4 +605,161 @@ public void testWriteBasicForFloat() throws IOException { } } + @Test + public void testAvroFileLoadWithPrimitiveSchema() throws IOException { Review comment: Removed this test case and also refactored all the testcases for all the files. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479932500 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ParquetCarbonWriter.java ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.IOException; +import java.util.Arrays; +import java.util.Comparator; + +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.sdk.file.utils.SDKUtil; + +import org.apache.avro.generic.GenericRecord; +import org.apache.hadoop.conf.Configuration; +import org.apache.parquet.hadoop.ParquetReader; + +/** + * Implementation to write parquet rows in avro format to carbondata file. + */ +public class ParquetCarbonWriter extends AvroCarbonWriter { Review comment: Removed composition. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479932276 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) { return this; } + private void validateCsvFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION); +if (CollectionUtils.isEmpty(Arrays.asList(dataFiles))) { + throw new RuntimeException("CSV files can't be empty."); +} +for (CarbonFile dataFile : dataFiles) { + try { +CsvParser csvParser = SDKUtil.buildCsvParser(this.hadoopConf); + csvParser.beginParsing(FileFactory.getDataInputStream(dataFile.getPath(), +-1, this.hadoopConf)); + } catch (IllegalArgumentException ex) { +if (ex.getCause() instanceof FileNotFoundException) { + throw new FileNotFoundException("File " + dataFile + + " not found to build carbon writer."); +} +throw ex; + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading CSV files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withCsvInput(); +this.validateCsvFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts CSV files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the CSV file exists. + * @param fileList list of files which has to be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList) + throws IOException { +this.fileList = fileList; +this.withCsvPath(filePath); +return this; + } + + private void validateJsonFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION); +for (CarbonFile dataFile : dataFiles) { + try { +new JSONParser().parse(SDKUtil.buildJsonReader(dataFile, this.hadoopConf)); + } catch (FileNotFoundException ex) { +throw new FileNotFoundException("File " + dataFile + " not found to build carbon writer."); + } catch (ParseException ex) { +throw new RuntimeException("File " + dataFile + " is not in json format."); + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading JSON files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withJsonPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withJsonInput(); +this.validateJsonFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts JSON file directory and + * list of file which has to be loaded. + * + * @param filePath directory where the json file exists. + * @param fileList list of files which has to be loaded. 
+ * @return CarbonWriterBuilder + * @throws IOException + */ + public CarbonWriterBuilder withJsonPath(String filePath, List<String> fileList) + throws IOException { +this.fileList = fileList; +this.withJsonPath(filePath); +return this; + } + + private void validateFilePath(String filePath) { +if (StringUtils.isEmpty(filePath)) { + throw new IllegalArgumentException("filePath can not be empty"); +} + } + + /** + * to build a {@link CarbonWriter}, which accepts loading Parquet files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withParquetPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.writerType = WRITER_TYPE.PARQUET; +this.validateParquetFiles(); +return this; + } + + private void setIsDirectory(String filePath) { +if (this.hadoopConf == null) { + this.hadoopConf = new Configuration(FileFactory.getConfiguration()); +} +CarbonFile carbonFile = FileFactory.getCarbonFile(filePath, hadoopConf); +this.isDirectory = carbonFile.isDirectory(); + } + + /** + * to build a {@link CarbonWriter}, which accepts parquet files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the parquet file exists. + * @param fileList list of files which has to be
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479931321 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.IOException; +import java.util.*; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.sdk.file.utils.SDKUtil; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.ql.io.orc.OrcStruct; +import org.apache.hadoop.hive.ql.io.orc.Reader; +import org.apache.hadoop.hive.ql.io.orc.RecordReader; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.Text; + + +/** + * Implementation to write ORC rows in CSV format to carbondata file. + */ +public class ORCCarbonWriter extends CSVCarbonWriter { + private Configuration configuration; + private CSVCarbonWriter csvCarbonWriter = null; + private Reader orcReader = null; + private CarbonFile[] dataFiles; + + ORCCarbonWriter(CSVCarbonWriter csvCarbonWriter, Configuration configuration) { +this.csvCarbonWriter = csvCarbonWriter; +this.configuration = configuration; + } + + @Override + public void setDataFiles(CarbonFile[] dataFiles) { +this.dataFiles = dataFiles; + } + + /** + * Load ORC file in iterative way. + */ + @Override + public void write() throws IOException { +if (this.dataFiles == null || this.dataFiles.length == 0) { + throw new RuntimeException("'withOrcPath()' must be called to support loading ORC files"); +} +if (this.csvCarbonWriter == null) { + throw new RuntimeException("csv carbon writer can not be null"); +} +Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath)); +for (CarbonFile dataFile : this.dataFiles) { + this.loadSingleFile(dataFile); +} + } + + private void loadSingleFile(CarbonFile file) throws IOException { +orcReader = SDKUtil.buildOrcReader(file.getPath(), this.configuration); +ObjectInspector objectInspector = orcReader.getObjectInspector(); +RecordReader recordReader = orcReader.rows(); +if (objectInspector instanceof StructObjectInspector) { + StructObjectInspector structObjectInspector = + (StructObjectInspector) orcReader.getObjectInspector(); + while (recordReader.hasNext()) { +Object record = recordReader.next(null); // to remove duplicacy. 
+List<Object> valueList = structObjectInspector.getStructFieldsDataAsList(record); +for (int i = 0; i < valueList.size(); i++) { + valueList.set(i, parseOrcObject(valueList.get(i), 0)); +} +this.csvCarbonWriter.write(valueList.toArray()); + } +} else { + while (recordReader.hasNext()) { Review comment: This case will happen when the ORC schema is not an instance of StructObjectInspector. Added the test case for the same now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479930818 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.IOException; +import java.util.*; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.sdk.file.utils.SDKUtil; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.ql.io.orc.OrcStruct; +import org.apache.hadoop.hive.ql.io.orc.Reader; +import org.apache.hadoop.hive.ql.io.orc.RecordReader; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.Text; + + +/** + * Implementation to write ORC rows in CSV format to carbondata file. + */ +public class ORCCarbonWriter extends CSVCarbonWriter { + private Configuration configuration; + private CSVCarbonWriter csvCarbonWriter = null; + private Reader orcReader = null; + private CarbonFile[] dataFiles; + + ORCCarbonWriter(CSVCarbonWriter csvCarbonWriter, Configuration configuration) { +this.csvCarbonWriter = csvCarbonWriter; +this.configuration = configuration; + } + + @Override + public void setDataFiles(CarbonFile[] dataFiles) { +this.dataFiles = dataFiles; + } + + /** + * Load ORC file in iterative way. + */ + @Override + public void write() throws IOException { +if (this.dataFiles == null || this.dataFiles.length == 0) { + throw new RuntimeException("'withOrcPath()' must be called to support loading ORC files"); +} +if (this.csvCarbonWriter == null) { + throw new RuntimeException("csv carbon writer can not be null"); +} +Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath)); +for (CarbonFile dataFile : this.dataFiles) { + this.loadSingleFile(dataFile); +} + } + + private void loadSingleFile(CarbonFile file) throws IOException { +orcReader = SDKUtil.buildOrcReader(file.getPath(), this.configuration); +ObjectInspector objectInspector = orcReader.getObjectInspector(); +RecordReader recordReader = orcReader.rows(); +if (objectInspector instanceof StructObjectInspector) { + StructObjectInspector structObjectInspector = + (StructObjectInspector) orcReader.getObjectInspector(); + while (recordReader.hasNext()) { +Object record = recordReader.next(null); // to remove duplicacy. Review comment: moved record above while loop and now passing the record as the previous object in the argument. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
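The object-reuse pattern described in that reply, as a hedged standalone sketch (the row-processing step is a placeholder; only the `next(previous)` recycling idiom is the point):

```java
import java.io.IOException;

import org.apache.hadoop.hive.ql.io.orc.RecordReader;

class OrcRowReuseSketch {
  // Pass the previous row back into next() so ORC can recycle the object
  // instead of allocating a fresh row on every iteration.
  static void readAll(RecordReader recordReader) throws IOException {
    Object reusableRow = null;
    while (recordReader.hasNext()) {
      reusableRow = recordReader.next(reusableRow);
      // ... convert and write the row here (placeholder) ...
    }
  }
}
```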
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479930367 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.IOException; +import java.util.*; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.sdk.file.utils.SDKUtil; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.ql.io.orc.OrcStruct; +import org.apache.hadoop.hive.ql.io.orc.Reader; +import org.apache.hadoop.hive.ql.io.orc.RecordReader; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.Text; + + +/** + * Implementation to write ORC rows in CSV format to carbondata file. + */ +public class ORCCarbonWriter extends CSVCarbonWriter { + private Configuration configuration; + private CSVCarbonWriter csvCarbonWriter = null; + private Reader orcReader = null; + private CarbonFile[] dataFiles; + + ORCCarbonWriter(CSVCarbonWriter csvCarbonWriter, Configuration configuration) { +this.csvCarbonWriter = csvCarbonWriter; +this.configuration = configuration; + } + + @Override + public void setDataFiles(CarbonFile[] dataFiles) { +this.dataFiles = dataFiles; + } + + /** + * Load ORC file in iterative way. + */ + @Override + public void write() throws IOException { +if (this.dataFiles == null || this.dataFiles.length == 0) { + throw new RuntimeException("'withOrcPath()' must be called to support loading ORC files"); +} +if (this.csvCarbonWriter == null) { Review comment: removed from here as this is never possible. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479930110 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/JsonCarbonWriter.java ## @@ -91,4 +106,44 @@ public void close() throws IOException { throw new IOException(e); } } + + private void loadSingleFile(CarbonFile file) throws IOException { +Reader reader = null; +try { + reader = SDKUtil.buildJsonReader(file, configuration); + JSONParser jsonParser = new JSONParser(); + Object jsonRecord = jsonParser.parse(reader); + if (jsonRecord instanceof JSONArray) { +JSONArray jsonArray = (JSONArray) jsonRecord; +for (Object record : jsonArray) { + this.write(record.toString()); +} + } else { +this.write(jsonRecord.toString()); + } +} catch (Exception e) { Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479929826 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) { return this; } + private void validateCsvFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION); +if (CollectionUtils.isEmpty(Arrays.asList(dataFiles))) { + throw new RuntimeException("CSV files can't be empty."); +} +for (CarbonFile dataFile : dataFiles) { + try { +CsvParser csvParser = SDKUtil.buildCsvParser(this.hadoopConf); + csvParser.beginParsing(FileFactory.getDataInputStream(dataFile.getPath(), +-1, this.hadoopConf)); + } catch (IllegalArgumentException ex) { +if (ex.getCause() instanceof FileNotFoundException) { + throw new FileNotFoundException("File " + dataFile + + " not found to build carbon writer."); +} +throw ex; + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading CSV files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withCsvInput(); +this.validateCsvFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts CSV files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the CSV file exists. + * @param fileList list of files which has to be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList) + throws IOException { +this.fileList = fileList; +this.withCsvPath(filePath); +return this; + } + + private void validateJsonFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION); +for (CarbonFile dataFile : dataFiles) { + try { +new JSONParser().parse(SDKUtil.buildJsonReader(dataFile, this.hadoopConf)); + } catch (FileNotFoundException ex) { +throw new FileNotFoundException("File " + dataFile + " not found to build carbon writer."); + } catch (ParseException ex) { +throw new RuntimeException("File " + dataFile + " is not in json format."); + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading JSON files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withJsonPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withJsonInput(); +this.validateJsonFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts JSON file directory and + * list of file which has to be loaded. + * + * @param filePath directory where the json file exists. + * @param fileList list of files which has to be loaded. 
+ * @return CarbonWriterBuilder + * @throws IOException + */ + public CarbonWriterBuilder withJsonPath(String filePath, List<String> fileList) + throws IOException { +this.fileList = fileList; +this.withJsonPath(filePath); +return this; + } + + private void validateFilePath(String filePath) { +if (StringUtils.isEmpty(filePath)) { + throw new IllegalArgumentException("filePath can not be empty"); +} + } + + /** + * to build a {@link CarbonWriter}, which accepts loading Parquet files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withParquetPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.writerType = WRITER_TYPE.PARQUET; +this.validateParquetFiles(); +return this; + } + + private void setIsDirectory(String filePath) { +if (this.hadoopConf == null) { + this.hadoopConf = new Configuration(FileFactory.getConfiguration()); +} +CarbonFile carbonFile = FileFactory.getCarbonFile(filePath, hadoopConf); +this.isDirectory = carbonFile.isDirectory(); + } + + /** + * to build a {@link CarbonWriter}, which accepts parquet files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the parquet file exists. + * @param fileList list of files which has to be
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479930022 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) { return this; } + private void validateCsvFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION); +if (CollectionUtils.isEmpty(Arrays.asList(dataFiles))) { + throw new RuntimeException("CSV files can't be empty."); +} +for (CarbonFile dataFile : dataFiles) { + try { +CsvParser csvParser = SDKUtil.buildCsvParser(this.hadoopConf); + csvParser.beginParsing(FileFactory.getDataInputStream(dataFile.getPath(), +-1, this.hadoopConf)); + } catch (IllegalArgumentException ex) { +if (ex.getCause() instanceof FileNotFoundException) { + throw new FileNotFoundException("File " + dataFile + + " not found to build carbon writer."); +} +throw ex; + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading CSV files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withCsvInput(); +this.validateCsvFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts CSV files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the CSV file exists. + * @param fileList list of files which has to be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList) + throws IOException { +this.fileList = fileList; +this.withCsvPath(filePath); +return this; + } + + private void validateJsonFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION); +for (CarbonFile dataFile : dataFiles) { + try { +new JSONParser().parse(SDKUtil.buildJsonReader(dataFile, this.hadoopConf)); + } catch (FileNotFoundException ex) { +throw new FileNotFoundException("File " + dataFile + " not found to build carbon writer."); + } catch (ParseException ex) { +throw new RuntimeException("File " + dataFile + " is not in json format."); + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading JSON files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withJsonPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withJsonInput(); +this.validateJsonFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts JSON file directory and + * list of file which has to be loaded. + * + * @param filePath directory where the json file exists. + * @param fileList list of files which has to be loaded. 
+ * @return CarbonWriterBuilder + * @throws IOException + */ + public CarbonWriterBuilder withJsonPath(String filePath, List<String> fileList) + throws IOException { +this.fileList = fileList; +this.withJsonPath(filePath); +return this; + } + + private void validateFilePath(String filePath) { +if (StringUtils.isEmpty(filePath)) { + throw new IllegalArgumentException("filePath can not be empty"); +} + } + + /** + * to build a {@link CarbonWriter}, which accepts loading Parquet files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withParquetPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.writerType = WRITER_TYPE.PARQUET; +this.validateParquetFiles(); +return this; + } + + private void setIsDirectory(String filePath) { +if (this.hadoopConf == null) { + this.hadoopConf = new Configuration(FileFactory.getConfiguration()); +} +CarbonFile carbonFile = FileFactory.getCarbonFile(filePath, hadoopConf); +this.isDirectory = carbonFile.isDirectory(); + } + + /** + * to build a {@link CarbonWriter}, which accepts parquet files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the parquet file exists. + * @param fileList list of files which has to be
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479929509 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -660,13 +1113,42 @@ public CarbonWriter build() throws IOException, InvalidLoadOptionException { // removed from the load. LoadWithoutConverter flag is going to point to the Loader Builder // which will skip Conversion Step. loadModel.setLoadWithoutConverterStep(true); - return new AvroCarbonWriter(loadModel, hadoopConf, this.avroSchema); + AvroCarbonWriter avroCarbonWriter = new AvroCarbonWriter(loadModel, + hadoopConf, this.avroSchema); + if (!StringUtils.isEmpty(filePath)) { +avroCarbonWriter.setDataFiles(this.dataFiles); + } + return avroCarbonWriter; } else if (this.writerType == WRITER_TYPE.JSON) { loadModel.setJsonFileLoad(true); - return new JsonCarbonWriter(loadModel, hadoopConf); + JsonCarbonWriter jsonCarbonWriter = new JsonCarbonWriter(loadModel, hadoopConf); + if (!StringUtils.isEmpty(filePath)) { +jsonCarbonWriter.setDataFiles(this.dataFiles); + } + return jsonCarbonWriter; +} else if (this.writerType == WRITER_TYPE.PARQUET) { + loadModel.setLoadWithoutConverterStep(true); + AvroCarbonWriter avroCarbonWriter = new AvroCarbonWriter(loadModel, + hadoopConf, this.avroSchema); + ParquetCarbonWriter parquetCarbonWriter = new + ParquetCarbonWriter(avroCarbonWriter, hadoopConf); Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479929400 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) { return this; } + private void validateCsvFiles() throws IOException { Review comment: Moved all the validation methods to the respective writers; they are now invoked from the build() method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.
CarbonDataQA1 commented on pull request #3834: URL: https://github.com/apache/carbondata/pull/3834#issuecomment-683594434 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3926/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479927294 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ## @@ -823,6 +829,31 @@ public void write(Object object) throws IOException { } } + /** + * Load data of all avro files at given location iteratively. + * + * @throws IOException + */ + @Override + public void write() throws IOException { +if (this.dataFiles == null || this.dataFiles.length == 0) { + throw new RuntimeException("'withAvroPath()' must be called to support loading avro files"); +} +Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath)); +for (CarbonFile dataFile : this.dataFiles) { + this.loadSingleFile(dataFile); +} + } + + private void loadSingleFile(CarbonFile file) throws IOException { +DataFileStream<GenericRecord> avroReader = SDKUtil +.buildAvroReader(file, this.configuration); Review comment: Closed all the stream readers now. ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CSVCarbonWriter.java ## @@ -72,6 +93,36 @@ public void write(Object object) throws IOException { } } + /** + * Load data of all or selected csv files at given location iteratively. + * + * @throws IOException + */ + @Override + public void write() throws IOException { +if (this.dataFiles == null || this.dataFiles.length == 0) { + throw new RuntimeException("'withCsvPath()' must be called to support load files"); +} +this.csvParser = SDKUtil.buildCsvParser(this.configuration); +Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath)); +for (CarbonFile dataFile : this.dataFiles) { + this.loadSingleFile(dataFile); +} + } + + private void loadSingleFile(CarbonFile file) throws IOException { +this.csvParser.beginParsing(FileFactory.getDataInputStream(file.getPath(), -1, configuration)); Review comment: closed. ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -17,25 +17,19 @@ package org.apache.carbondata.sdk.file; +import java.io.FileNotFoundException; import java.io.IOException; -import java.util.ArrayList; -import java.util.Arrays; -import java.util.HashMap; -import java.util.HashSet; -import java.util.List; -import java.util.Map; -import java.util.Objects; -import java.util.Set; -import java.util.TreeMap; -import java.util.UUID; +import java.util.*; Review comment: done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
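The "closed all the stream readers" fix presumably takes the usual try/finally shape; a hedged sketch against the quoted helper (the method context is illustrative, and `SDKUtil.buildAvroReader`'s exact signature is partly assumed from the diff):

```java
import java.io.IOException;

import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericRecord;
import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
import org.apache.carbondata.sdk.file.AvroCarbonWriter;
import org.apache.carbondata.sdk.file.utils.SDKUtil;
import org.apache.hadoop.conf.Configuration;

class AvroCloseSketch {
  static void loadSingleFile(AvroCarbonWriter writer, CarbonFile file, Configuration conf)
      throws IOException {
    DataFileStream<GenericRecord> avroReader = SDKUtil.buildAvroReader(file, conf);
    try {
      while (avroReader.hasNext()) {
        writer.write(avroReader.next());
      }
    } finally {
      avroReader.close(); // release the stream even if write() fails
    }
  }
}
```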
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.
CarbonDataQA1 commented on pull request #3834: URL: https://github.com/apache/carbondata/pull/3834#issuecomment-683593554 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2185/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479925943 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ## @@ -25,17 +25,12 @@ import java.math.BigDecimal; import java.math.BigInteger; import java.nio.ByteBuffer; -import java.util.ArrayList; -import java.util.HashMap; -import java.util.Iterator; -import java.util.List; -import java.util.Map; -import java.util.Random; -import java.util.UUID; +import java.util.*; Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479926586 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ## @@ -823,6 +829,31 @@ public void write(Object object) throws IOException { } } + /** + * Load data of all avro files at given location iteratively. + * + * @throws IOException + */ + @Override + public void write() throws IOException { +if (this.dataFiles == null || this.dataFiles.length == 0) { + throw new RuntimeException("'withAvroPath()' must be called to support loading avro files"); +} +Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath)); Review comment: Removed sort from all the files. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r479925640

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
## @@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) {
     return this;
   }

+  private void validateCsvFiles() throws IOException {
+    CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION);
+    if (CollectionUtils.isEmpty(Arrays.asList(dataFiles))) {
+      throw new RuntimeException("CSV files can't be empty.");
+    }
+    for (CarbonFile dataFile : dataFiles) {
+      try {
+        CsvParser csvParser = SDKUtil.buildCsvParser(this.hadoopConf);
+        csvParser.beginParsing(FileFactory.getDataInputStream(dataFile.getPath(),
+            -1, this.hadoopConf));
+      } catch (IllegalArgumentException ex) {
+        if (ex.getCause() instanceof FileNotFoundException) {
+          throw new FileNotFoundException("File " + dataFile
+              + " not found to build carbon writer.");
+        }
+        throw ex;
+      }
+    }
+    this.dataFiles = dataFiles;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading CSV files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath) throws IOException {
+    this.validateFilePath(filePath);
+    this.filePath = filePath;
+    this.setIsDirectory(filePath);
+    this.withCsvInput();
+    this.validateCsvFiles();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts CSV files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the CSV file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList)
+      throws IOException {
+    this.fileList = fileList;
+    this.withCsvPath(filePath);
+    return this;
+  }
+
+  private void validateJsonFiles() throws IOException {
+    CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION);
+    for (CarbonFile dataFile : dataFiles) {
+      try {
+        new JSONParser().parse(SDKUtil.buildJsonReader(dataFile, this.hadoopConf));
+      } catch (FileNotFoundException ex) {
+        throw new FileNotFoundException("File " + dataFile + " not found to build carbon writer.");
+      } catch (ParseException ex) {
+        throw new RuntimeException("File " + dataFile + " is not in json format.");
+      }
+    }
+    this.dataFiles = dataFiles;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading JSON files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withJsonPath(String filePath) throws IOException {
+    this.validateFilePath(filePath);
+    this.filePath = filePath;
+    this.setIsDirectory(filePath);
+    this.withJsonInput();
+    this.validateJsonFiles();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts JSON file directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the json file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   * @throws IOException
+   */
+  public CarbonWriterBuilder withJsonPath(String filePath, List<String> fileList)
+      throws IOException {
+    this.fileList = fileList;
+    this.withJsonPath(filePath);
+    return this;
+  }
+
+  private void validateFilePath(String filePath) {
+    if (StringUtils.isEmpty(filePath)) {
+      throw new IllegalArgumentException("filePath can not be empty");
+    }
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath) throws IOException {
+    this.validateFilePath(filePath);
+    this.filePath = filePath;
+    this.setIsDirectory(filePath);
+    this.writerType = WRITER_TYPE.PARQUET;
+    this.validateParquetFiles();
+    return this;
+  }
+
+  private void setIsDirectory(String filePath) {
+    if (this.hadoopConf == null) {
+      this.hadoopConf = new Configuration(FileFactory.getConfiguration());

Review comment: I have checked the build() method of the same file, and its hadoopConf is built the same way. Please let me know if this is not the correct way.

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/utils/SDKUtil.java
## @@ -79,4 +98,75 @@ public static ArrayList listFiles(String sourceImageFolder,
     return (Object[]) input[i];
   }

+
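The two-argument overloads above accept a directory plus an explicit list of file names to load from it. A hypothetical usage sketch of the JSON variant, under the API shown in the diff; the store path and file names are invented for illustration:

    import java.util.Arrays;
    import org.apache.carbondata.sdk.file.CarbonWriter;

    public class JsonFileListLoad {
      public static void main(String[] args) throws Exception {
        CarbonWriter writer = CarbonWriter.builder()
            .outputPath("/tmp/carbon-store")    // placeholder target store path
            // load only the named files; other files in the directory are ignored
            .withJsonPath("/tmp/input-json", Arrays.asList("part-0.json", "part-1.json"))
            .writtenBy("JsonFileListLoad")
            .build();
        writer.write();
        writer.close();
      }
    }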
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r479925025

## File path: examples/spark/pom.xml
## @@ -38,6 +38,12 @@
     <groupId>org.apache.carbondata</groupId>
     <artifactId>carbondata-spark_${spark.binary.version}</artifactId>
     <version>${project.version}</version>
+

Review comment: Without the exclusions, these dependencies were being pulled in transitively, and the build was failing because of that.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
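The general Maven pattern being described looks like this: an exclusions block inside the dependency keeps the listed artifacts from being pulled in transitively. The excluded artifact below is illustrative only, not taken from the PR:

    <dependency>
      <groupId>org.apache.carbondata</groupId>
      <artifactId>carbondata-spark_${spark.binary.version}</artifactId>
      <version>${project.version}</version>
      <exclusions>
        <!-- illustrative exclusion; the PR excludes different artifacts -->
        <exclusion>
          <groupId>org.apache.hive</groupId>
          <artifactId>hive-exec</artifactId>
        </exclusion>
      </exclusions>
    </dependency>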
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r479924597

## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
## @@ -2456,4 +2471,24 @@ private CarbonCommonConstants() {
   /**
    * property which defines the insert stage flow
    */
   public static final String IS_INSERT_STAGE = "is_insert_stage";
+
+  /**
+   * the level 1 complex delimiter default value
+   */
+  @CarbonProperty

Review comment: Removed the @CarbonProperty annotation.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
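The distinction behind this review appears to be that @CarbonProperty marks user-configurable property keys, while a fixed default value is just a plain constant. A hypothetical sketch of the resulting shape; the constant name and value are illustrative, not copied from the PR:

    public final class DelimiterConstantsSketch {
      private DelimiterConstantsSketch() { }

      /**
       * the level 1 complex delimiter default value; a plain constant with
       * no @CarbonProperty annotation, since it is not a user-settable key
       */
      public static final String COMPLEX_DELIMITER_LEVEL_1_DEFAULT = "\001";
    }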
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-683585404 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3925/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-683582558 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2184/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org