[GitHub] [carbondata] nihal0107 removed a comment on pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.
nihal0107 removed a comment on pull request #3905: URL: https://github.com/apache/carbondata/pull/3905#issuecomment-684215300
We are getting this exception in case of more than 0.1 million records. We can't load that amount of data for the test case.
[GitHub] [carbondata] nihal0107 commented on pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.
nihal0107 commented on pull request #3905: URL: https://github.com/apache/carbondata/pull/3905#issuecomment-684216370
> @nihal0107 , can a test case be added for your fix?

We are getting this exception in case of more than 0.1 million records. We can't load that amount of data for the test case.
[GitHub] [carbondata] nihal0107 commented on pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.
nihal0107 commented on pull request #3905: URL: https://github.com/apache/carbondata/pull/3905#issuecomment-684215300
We are getting this exception in case of more than 0.1 million records. We can't load that amount of data for the test case.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3906: [CARBONDATA-3968] Added test cases for hive read complex types and handled other issues
CarbonDataQA1 commented on pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#issuecomment-684059041
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3938/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3906: [CARBONDATA-3968] Added test cases for hive read complex types and handled other issues
CarbonDataQA1 commented on pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#issuecomment-684057932
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2198/
[jira] [Updated] (CARBONDATA-3968) Hive read complex types issues
[ https://issues.apache.org/jira/browse/CARBONDATA-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akshay updated CARBONDATA-3968:
Description:
# Issues in reading array/map/struct of byte, varchar and decimal types.
# Map of primitive type with only one row inserted has issues.

was:
# Issues in reading of byte, varchar and decimal types.
# Map of primitive type with only one row inserted has issues.

> Hive read complex types issues
> Key: CARBONDATA-3968
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3968
> Project: CarbonData
> Issue Type: Bug
> Components: hive-integration
> Reporter: Akshay
> Priority: Major
>
> # Issues in reading array/map/struct of byte, varchar and decimal types.
> # Map of primitive type with only one row inserted has issues.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.
CarbonDataQA1 commented on pull request #3834: URL: https://github.com/apache/carbondata/pull/3834#issuecomment-684031807
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2197/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.
CarbonDataQA1 commented on pull request #3834: URL: https://github.com/apache/carbondata/pull/3834#issuecomment-684022546
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3937/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-684015193
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3936/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-684012482
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2195/
[jira] [Created] (CARBONDATA-3968) Hive read complex types issues
Akshay created CARBONDATA-3968:
Summary: Hive read complex types issues
Key: CARBONDATA-3968
URL: https://issues.apache.org/jira/browse/CARBONDATA-3968
Project: CarbonData
Issue Type: Bug
Components: hive-integration
Reporter: Akshay

# Issues in reading of byte, varchar and decimal types.
# Map of primitive type with only one row inserted has issues.
[GitHub] [carbondata] akkio-97 commented on a change in pull request #3906: Added test cases for hive read complex types and handled other issues
akkio-97 commented on a change in pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#discussion_r480358731

## File path: integration/hive/src/test/java/org/apache/carbondata/hive/HiveTestUtils.java
@@ -65,7 +74,12 @@ public boolean checkAnswer(ResultSet actual, ResultSet expected) throws SQLExcep
   Assert.assertTrue(numOfColumnsExpected > 0);
   Assert.assertEquals(actual.getMetaData().getColumnCount(), numOfColumnsExpected);
   for (int i = 1; i <= numOfColumnsExpected; i++) {
-    Assert.assertEquals(actual.getString(i), actual.getString(i));
+    if (actual.getString(i).contains(":")) {
+      Assert.assertTrue(checkMapPairsIgnoringOrder(actual.getString(i), expected.getString(i)));
+    } else {
+      Assert.assertEquals(actual.getString(i), expected.getString(i));
+    }
+    // System.out.println(actual.getString(i));

Review comment: done

## File path: integration/hive/src/main/java/org/apache/carbondata/hive/util/DataTypeUtil.java
@@ -64,13 +69,23 @@ public static DataType convertHiveTypeToCarbon(String type) throws SQLException
   return DataTypes.createArrayType(convertHiveTypeToCarbon(subType));
 } else if (type.startsWith("map<")) {
   String[] subType = (type.substring(type.indexOf("<") + 1, type.indexOf(">"))).split(",");
+  for (int i = 0; i < subType.length; i++) {
+    if (subType[i].startsWith("decimal")) {
+      subType[i] += ',' + subType[++i];

Review comment: done

## File path: integration/hive/src/main/java/org/apache/carbondata/hive/util/DataTypeUtil.java
@@ -64,13 +69,23 @@ public static DataType convertHiveTypeToCarbon(String type) throws SQLException
   return DataTypes.createArrayType(convertHiveTypeToCarbon(subType));
 } else if (type.startsWith("map<")) {
   String[] subType = (type.substring(type.indexOf("<") + 1, type.indexOf(">"))).split(",");
+  for (int i = 0; i < subType.length; i++) {
+    if (subType[i].startsWith("decimal")) {
+      subType[i] += ',' + subType[++i];
+      subType = (String[]) ArrayUtils.removeElement(subType, subType[i]);
+    }
+  }
   return DataTypes
       .createMapType(convertHiveTypeToCarbon(subType[0]), convertHiveTypeToCarbon(subType[1]));
 } else if (type.startsWith("struct<")) {
   String[] subTypes = (type.substring(type.indexOf("<") + 1, type.indexOf(">"))).split(",");
   List structFieldList = new ArrayList<>();
-  for (String subType : subTypes) {
+  for (int i = 0; i < subTypes.length; i++) {
+    String subType = subTypes[i];
+    if (subType.startsWith("decimal")) {
+      subType += ',' + subTypes[++i];

Review comment: done
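The review thread above turns on how `convertHiveTypeToCarbon` splits a Hive type string on commas: for `map<string,decimal(10,2)>`, a plain `split(",")` tears `decimal(10,2)` into `decimal(10` and `2)`, because the precision/scale comma is indistinguishable from the key/value separator. A minimal standalone sketch of the rejoin logic the patch applies, assuming commons-lang3's `ArrayUtils` as the diff does:

```java
import java.util.Arrays;

import org.apache.commons.lang3.ArrayUtils;

public class DecimalSubTypeDemo {
  public static void main(String[] args) {
    String type = "map<string,decimal(10,2)>";
    String inner = type.substring(type.indexOf("<") + 1, type.indexOf(">"));
    // A plain split yields ["string", "decimal(10", "2)"].
    String[] subType = inner.split(",");
    for (int i = 0; i < subType.length; i++) {
      if (subType[i].startsWith("decimal")) {
        // Stitch the spilled scale fragment back onto the decimal token
        // and drop the now-redundant element, as the patch does.
        subType[i] += ',' + subType[i + 1];
        subType = ArrayUtils.removeElement(subType, subType[i + 1]);
      }
    }
    System.out.println(Arrays.toString(subType)); // [string, decimal(10,2)]
  }
}
```

As the reviewers note below, a shared constant such as CarbonCommonConstants.COMMA is preferable to the bare ',' literal.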
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning
CarbonDataQA1 commented on pull request #3908: URL: https://github.com/apache/carbondata/pull/3908#issuecomment-683969864
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3933/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3909: [WIP] Date/timestamp compatibility between hive and carbon
CarbonDataQA1 commented on pull request #3909: URL: https://github.com/apache/carbondata/pull/3909#issuecomment-683969333
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2194/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3909: [WIP] Date/timestamp compatibility between hive and carbon
CarbonDataQA1 commented on pull request #3909: URL: https://github.com/apache/carbondata/pull/3909#issuecomment-683967436
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3935/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning
CarbonDataQA1 commented on pull request #3908: URL: https://github.com/apache/carbondata/pull/3908#issuecomment-683966337
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2193/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
CarbonDataQA1 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-683964722
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2192/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
CarbonDataQA1 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-683964482
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3934/
[GitHub] [carbondata] Karan-c980 commented on pull request #3876: TestingCI
Karan-c980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-683945124
retest this please
[GitHub] [carbondata] kunal642 commented on a change in pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
kunal642 commented on a change in pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#discussion_r480289433

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/optimizer/CarbonSecondaryIndexOptimizer.scala
@@ -58,6 +58,16 @@ object NodeType extends Enumeration {
  */
 class CarbonSecondaryIndexOptimizer(sparkSession: SparkSession) {
+ // to store the sort node per query
+ var sortNodeForPushDown: Sort = _
+
+ // to store the limit literal per query
+ var limitLiteral : Literal = _
+
+ // by default do not push down notNull filter,
+ // but for orderby limit push down, push down notNull filter also. Else we get wrong results.
+ var pushDownNotNullFilter : Boolean = _

Review comment: Why not keep these as local variables in transformFilterToJoin and pass them to rewritePlanForSecondaryIndex()?
[GitHub] [carbondata] kunal642 commented on a change in pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
kunal642 commented on a change in pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#discussion_r480286695

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/optimizer/CarbonSecondaryIndexOptimizer.scala
@@ -824,6 +904,57 @@ class CarbonSecondaryIndexOptimizer(sparkSession: SparkSession) {
   }
 }
+ private def checkIfPushDownOrderByLimitAndNotNullFilter(literal: Literal, sort: Sort,
+     filter: Filter): Unit = {
+   // 1. check all the filter columns present in SI
+   val originalFilterAttributes = filter.condition collect {
+     case attr: AttributeReference =>
+       attr.name.toLowerCase
+   }
+   val filterAttributes = filter.condition collect {
+     case attr: AttributeReference => attr.name.toLowerCase
+   }
+   val indexTableRelation = MatchIndexableRelation.unapply(filter.child).get
+   val matchingIndexTables = CarbonCostBasedOptimizer.identifyRequiredTables(
+     filterAttributes.toSet.asJava,
+     CarbonIndexUtil.getSecondaryIndexes(indexTableRelation).mapValues(_.toList.asJava).asJava)
+     .asScala
+   val databaseName = filter.child.asInstanceOf[LogicalRelation].relation
+     .asInstanceOf[CarbonDatasourceHadoopRelation].carbonRelation.databaseName
+   // filter out all the index tables which are disabled
+   val enabledMatchingIndexTables = matchingIndexTables
+     .filter(table => {
+       sparkSession.sessionState.catalog
+         .getTableMetadata(TableIdentifier(table,
+           Some(databaseName))).storage
+         .properties
+         .getOrElse("isSITableEnabled", "true").equalsIgnoreCase("true")
+     })
+   // 2. check if only one SI matches for the filter columns
+   if (enabledMatchingIndexTables.nonEmpty && enabledMatchingIndexTables.size == 1 &&
+       filterAttributes.intersect(originalFilterAttributes).size ==
+       originalFilterAttributes.size) {
+     // 3. check if all the sort columns is in SI
+     val sortColumns = sort
+       .order
+       .map(_.child.asInstanceOf[AttributeReference].name.toLowerCase())
+       .toSet
+     val indexCarbonTable = CarbonEnv
+       .getCarbonTable(Some(databaseName), enabledMatchingIndexTables.head)(sparkSession)
+     var allColumnsFound = true

Review comment: Use forall to check whether all columns exist or not.
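kunal642's suggestion refers to Scala's `forall`; the same idea — replacing a mutable `allColumnsFound` flag flipped inside a loop with a single quantified check — has a direct Java analogue in `Stream.allMatch`, sketched below with hypothetical column names standing in for the ones in the PR:

```java
import java.util.List;
import java.util.Set;

public class AllMatchDemo {
  public static void main(String[] args) {
    // Hypothetical sort columns and secondary-index table columns.
    List<String> sortColumns = List.of("name", "city");
    Set<String> indexTableColumns = Set.of("name", "city", "positionid");

    // One quantified check instead of a flag initialized to true
    // and flipped inside a loop.
    boolean allColumnsFound = sortColumns.stream().allMatch(indexTableColumns::contains);
    System.out.println(allColumnsFound); // true
  }
}
```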
[GitHub] [carbondata] kunal642 commented on a change in pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
kunal642 commented on a change in pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#discussion_r480285525

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/optimizer/CarbonSecondaryIndexOptimizer.scala
@@ -824,6 +904,57 @@ class CarbonSecondaryIndexOptimizer(sparkSession: SparkSession) {
   }
 }
+ private def checkIfPushDownOrderByLimitAndNotNullFilter(literal: Literal, sort: Sort,
+     filter: Filter): Unit = {
+   // 1. check all the filter columns present in SI
+   val originalFilterAttributes = filter.condition collect {
+     case attr: AttributeReference =>
+       attr.name.toLowerCase
+   }
+   val filterAttributes = filter.condition collect {
+     case attr: AttributeReference => attr.name.toLowerCase
+   }
+   val indexTableRelation = MatchIndexableRelation.unapply(filter.child).get
+   val matchingIndexTables = CarbonCostBasedOptimizer.identifyRequiredTables(
+     filterAttributes.toSet.asJava,
+     CarbonIndexUtil.getSecondaryIndexes(indexTableRelation).mapValues(_.toList.asJava).asJava)
+     .asScala
+   val databaseName = filter.child.asInstanceOf[LogicalRelation].relation
+     .asInstanceOf[CarbonDatasourceHadoopRelation].carbonRelation.databaseName
+   // filter out all the index tables which are disabled
+   val enabledMatchingIndexTables = matchingIndexTables
+     .filter(table => {
+       sparkSession.sessionState.catalog
+         .getTableMetadata(TableIdentifier(table,
+           Some(databaseName))).storage
+         .properties
+         .getOrElse("isSITableEnabled", "true").equalsIgnoreCase("true")
+     })
+   // 2. check if only one SI matches for the filter columns
+   if (enabledMatchingIndexTables.nonEmpty && enabledMatchingIndexTables.size == 1 &&
+       filterAttributes.intersect(originalFilterAttributes).size ==
+       originalFilterAttributes.size) {
+     // 3. check if all the sort columns is in SI
+     val sortColumns = sort
+       .order
+       .map(_.child.asInstanceOf[AttributeReference].name.toLowerCase())
+       .toSet
+     val indexCarbonTable = CarbonEnv
+       .getCarbonTable(Some(databaseName), enabledMatchingIndexTables.head)(sparkSession)

Review comment: Use indexTableRelation.carbonTable to get indexCarbonTable.
[GitHub] [carbondata] kunal642 commented on a change in pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
kunal642 commented on a change in pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#discussion_r480276719

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/optimizer/CarbonSecondaryIndexOptimizer.scala
@@ -824,6 +904,57 @@ class CarbonSecondaryIndexOptimizer(sparkSession: SparkSession) {
   }
 }
+ private def checkIfPushDownOrderByLimitAndNotNullFilter(literal: Literal, sort: Sort,
+     filter: Filter): Unit = {
+   // 1. check all the filter columns present in SI
+   val originalFilterAttributes = filter.condition collect {
+     case attr: AttributeReference =>
+       attr.name.toLowerCase
+   }
+   val filterAttributes = filter.condition collect {
+     case attr: AttributeReference => attr.name.toLowerCase
+   }
+   val indexTableRelation = MatchIndexableRelation.unapply(filter.child).get
+   val matchingIndexTables = CarbonCostBasedOptimizer.identifyRequiredTables(
+     filterAttributes.toSet.asJava,
+     CarbonIndexUtil.getSecondaryIndexes(indexTableRelation).mapValues(_.toList.asJava).asJava)
+     .asScala
+   val databaseName = filter.child.asInstanceOf[LogicalRelation].relation

Review comment: Why not use `indexTableRelation.carbonRelation.databaseName`?
[GitHub] [carbondata] kunal642 commented on a change in pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
kunal642 commented on a change in pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#discussion_r480273615

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/optimizer/CarbonSecondaryIndexOptimizer.scala
@@ -824,6 +904,57 @@ class CarbonSecondaryIndexOptimizer(sparkSession: SparkSession) {
   }
 }
+ private def checkIfPushDownOrderByLimitAndNotNullFilter(literal: Literal, sort: Sort,
+     filter: Filter): Unit = {
+   // 1. check all the filter columns present in SI
+   val originalFilterAttributes = filter.condition collect {
+     case attr: AttributeReference =>
+       attr.name.toLowerCase
+   }
+   val filterAttributes = filter.condition collect {

Review comment: Is filterAttributes the same as originalFilterAttributes? The code looks identical.
[GitHub] [carbondata] ShreelekhyaG opened a new pull request #3909: [WIP] Date/timestamp compatibility between hive and carbon
ShreelekhyaG opened a new pull request #3909: URL: https://github.com/apache/carbondata/pull/3909

### Why is this PR needed?
To ensure that date/timestamp values supported by hive are also supported by carbon. Ex: -01-01 is accepted by hive as a valid record and converted to 0001-01-01.

### What changes were proposed in this PR?
Changed the min value of date which is used for validation. When the setlenient flag is set to true, carbon can convert and support such years.

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes
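The PR's setlenient behavior is CarbonData-specific, but the underlying idea mirrors lenient parsing in the JDK: out-of-range calendar fields are rolled over instead of rejected. A minimal illustration with `java.text.SimpleDateFormat` (not the PR's actual code path):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class LenientDateDemo {
  public static void main(String[] args) throws ParseException {
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");

    // Lenient mode rolls out-of-range fields forward: month 13 of 2020
    // parses as January 2021 instead of failing.
    format.setLenient(true);
    System.out.println(format.parse("2020-13-01"));

    // Strict mode rejects the same record outright.
    format.setLenient(false);
    try {
      format.parse("2020-13-01");
    } catch (ParseException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```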
[jira] [Updated] (CARBONDATA-3961) Reorder filter according to the column storage ordinal to improve reading
[ https://issues.apache.org/jira/browse/CARBONDATA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kunal Kapoor updated CARBONDATA-3961:
Issue Type: Improvement (was: Bug)

> Reorder filter according to the column storage ordinal to improve reading
> Key: CARBONDATA-3961
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3961
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Kunal Kapoor
> Assignee: Kunal Kapoor
> Priority: Major
> Time Spent: 2h 40m
> Remaining Estimate: 0h
[GitHub] [carbondata] kunal642 opened a new pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning
kunal642 opened a new pull request #3908: URL: https://github.com/apache/carbondata/pull/3908

### Why is this PR needed?

### What changes were proposed in this PR?

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes
[jira] [Created] (CARBONDATA-3967) Cache partitions to improve partition pruning performance
Kunal Kapoor created CARBONDATA-3967:
Summary: Cache partitions to improve partition pruning performance
Key: CARBONDATA-3967
URL: https://issues.apache.org/jira/browse/CARBONDATA-3967
Project: CarbonData
Issue Type: Improvement
Reporter: Kunal Kapoor
Assignee: Kunal Kapoor
[GitHub] [carbondata] ajantha-bhat commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
ajantha-bhat commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-683903456
> @ajantha-bhat , can test cases be added to check no filter pushdown in not equal to case?

@vikramahuja1001 : Done, added.
[GitHub] [carbondata] ajantha-bhat removed a comment on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
ajantha-bhat removed a comment on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-683873006
> @ajantha-bhat , can test cases be added to check no filter pushdown in not equal to case?

A test case related to notNull pushdown was already failing in `TestNIQueryWithIndex`; I will check it and add notEquals anyway.
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3906: Added test cases for hive read complex types and handled other issues
vikramahuja1001 commented on a change in pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#discussion_r480233177

## File path: integration/hive/src/main/java/org/apache/carbondata/hive/util/DataTypeUtil.java
@@ -64,13 +69,23 @@ public static DataType convertHiveTypeToCarbon(String type) throws SQLException
   return DataTypes.createArrayType(convertHiveTypeToCarbon(subType));
 } else if (type.startsWith("map<")) {
   String[] subType = (type.substring(type.indexOf("<") + 1, type.indexOf(">"))).split(",");
+  for (int i = 0; i < subType.length; i++) {
+    if (subType[i].startsWith("decimal")) {
+      subType[i] += ',' + subType[++i];

Review comment: Use CarbonCommonConstants.COMMA instead of ','

## File path: integration/hive/src/main/java/org/apache/carbondata/hive/util/DataTypeUtil.java
@@ -64,13 +69,23 @@ public static DataType convertHiveTypeToCarbon(String type) throws SQLException
   return DataTypes.createArrayType(convertHiveTypeToCarbon(subType));
 } else if (type.startsWith("map<")) {
   String[] subType = (type.substring(type.indexOf("<") + 1, type.indexOf(">"))).split(",");
+  for (int i = 0; i < subType.length; i++) {
+    if (subType[i].startsWith("decimal")) {
+      subType[i] += ',' + subType[++i];
+      subType = (String[]) ArrayUtils.removeElement(subType, subType[i]);
+    }
+  }
   return DataTypes
       .createMapType(convertHiveTypeToCarbon(subType[0]), convertHiveTypeToCarbon(subType[1]));
 } else if (type.startsWith("struct<")) {
   String[] subTypes = (type.substring(type.indexOf("<") + 1, type.indexOf(">"))).split(",");
   List structFieldList = new ArrayList<>();
-  for (String subType : subTypes) {
+  for (int i = 0; i < subTypes.length; i++) {
+    String subType = subTypes[i];
+    if (subType.startsWith("decimal")) {
+      subType += ',' + subTypes[++i];

Review comment: Use CarbonCommonConstants.COMMA
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3906: Added test cases for hive read complex types and handled other issues
vikramahuja1001 commented on a change in pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#discussion_r480230761

## File path: integration/hive/src/test/java/org/apache/carbondata/hive/HiveTestUtils.java
@@ -65,7 +74,12 @@ public boolean checkAnswer(ResultSet actual, ResultSet expected) throws SQLExcep
   Assert.assertTrue(numOfColumnsExpected > 0);
   Assert.assertEquals(actual.getMetaData().getColumnCount(), numOfColumnsExpected);
   for (int i = 1; i <= numOfColumnsExpected; i++) {
-    Assert.assertEquals(actual.getString(i), actual.getString(i));
+    if (actual.getString(i).contains(":")) {
+      Assert.assertTrue(checkMapPairsIgnoringOrder(actual.getString(i), expected.getString(i)));
+    } else {
+      Assert.assertEquals(actual.getString(i), expected.getString(i));
+    }
+    // System.out.println(actual.getString(i));

Review comment: Remove this comment
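The patch compares map values as sets of key:value pairs because the entry order of a map printed by Hive is not guaranteed. A hypothetical stand-in for the `checkMapPairsIgnoringOrder` helper referenced above (its real implementation is not shown in the diff):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MapCompareDemo {
  // Two map string representations are considered equal when they contain
  // the same key:value pairs, regardless of the order they are printed in.
  static boolean mapPairsEqualIgnoringOrder(String actual, String expected) {
    Set<String> actualPairs =
        new HashSet<>(Arrays.asList(actual.replaceAll("[{}]", "").split(",")));
    Set<String> expectedPairs =
        new HashSet<>(Arrays.asList(expected.replaceAll("[{}]", "").split(",")));
    return actualPairs.equals(expectedPairs);
  }

  public static void main(String[] args) {
    System.out.println(mapPairsEqualIgnoringOrder("{1:a,2:b}", "{2:b,1:a}")); // true
    System.out.println(mapPairsEqualIgnoringOrder("{1:a,2:b}", "{1:a,2:c}")); // false
  }
}
```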
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3906: Added test cases for hive read complex types and handled other issues
vikramahuja1001 commented on pull request #3906: URL: https://github.com/apache/carbondata/pull/3906#issuecomment-683873583
Add jira ID
[GitHub] [carbondata] ajantha-bhat commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
ajantha-bhat commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-683873006
> @ajantha-bhat , can test cases be added to check no filter pushdown in not equal to case?

A test case related to notNull pushdown was already failing in `TestNIQueryWithIndex`; I will check it and add notEquals anyway.
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table
vikramahuja1001 commented on pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#issuecomment-683866796
@akashrn5 , @kunal642 please check
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.
vikramahuja1001 commented on pull request #3905: URL: https://github.com/apache/carbondata/pull/3905#issuecomment-683869471
@nihal0107 , can a test case be added for your fix?
[GitHub] [carbondata] vikramahuja1001 closed pull request #3895: [WIP]SI fix for not equal to filter
vikramahuja1001 closed pull request #3895: URL: https://github.com/apache/carbondata/pull/3895
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
vikramahuja1001 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-683871061
@ajantha-bhat , can test cases be added to check no filter pushdown in not equal to case?
[GitHub] [carbondata] vikramahuja1001 commented on pull request #3895: [WIP]SI fix for not equal to filter
vikramahuja1001 commented on pull request #3895: URL: https://github.com/apache/carbondata/pull/3895#issuecomment-683868709
@ajantha-bhat, I checked that PR; maybe you can add the test cases for not-equal and check SI pushdown.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3890: [CARBONDATA-3952] After reset query not hitting MV
CarbonDataQA1 commented on pull request #3890: URL: https://github.com/apache/carbondata/pull/3890#issuecomment-683841817
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2191/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3890: [CARBONDATA-3952] After reset query not hitting MV
CarbonDataQA1 commented on pull request #3890: URL: https://github.com/apache/carbondata/pull/3890#issuecomment-683840584
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3932/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-683731944
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3931/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-683728958
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2190/
[GitHub] [carbondata] kunal642 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal
kunal642 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-683729131
@akashrn5 @kumarvishal09 @QiangCai @ravipesala @ajantha-bhat Please review
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table
CarbonDataQA1 commented on pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#issuecomment-683685077
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3930/
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r480021706

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CSVCarbonWriter.java
@@ -72,6 +90,72 @@ public void write(Object object) throws IOException {
   }
 }
+ public static CsvParser buildCsvParser(Configuration conf) {

Review comment: done
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r480021578

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
@@ -594,6 +608,227 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) {
   return this;
 }
+ /**
+  * to build a {@link CarbonWriter}, which accepts loading CSV files.
+  *
+  * @param filePath absolute path under which files should be loaded.
+  * @return CarbonWriterBuilder
+  */
+ public CarbonWriterBuilder withCsvPath(String filePath) throws IOException {
+   this.validateFilePath(filePath);
+   this.filePath = filePath;
+   this.setIsDirectory(filePath);
+   this.withCsvInput();
+   this.dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION);
+   return this;
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts CSV files directory and
+  * list of file which has to be loaded.
+  *
+  * @param filePath directory where the CSV file exists.
+  * @param fileList list of files which has to be loaded.
+  * @return CarbonWriterBuilder
+  */
+ public CarbonWriterBuilder withCsvPath(String filePath, List fileList)
+     throws IOException {
+   this.fileList = fileList;
+   this.withCsvPath(filePath);
+   return this;
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts loading JSON files.
+  *
+  * @param filePath absolute path under which files should be loaded.
+  * @return CarbonWriterBuilder
+  */
+ public CarbonWriterBuilder withJsonPath(String filePath) throws IOException {
+   this.validateFilePath(filePath);
+   this.filePath = filePath;
+   this.setIsDirectory(filePath);
+   this.withJsonInput();
+   this.dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION);
+   return this;
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts JSON file directory and
+  * list of file which has to be loaded.
+  *
+  * @param filePath directory where the json file exists.
+  * @param fileList list of files which has to be loaded.
+  * @return CarbonWriterBuilder
+  * @throws IOException
+  */
+ public CarbonWriterBuilder withJsonPath(String filePath, List fileList)
+     throws IOException {
+   this.fileList = fileList;
+   this.withJsonPath(filePath);
+   return this;
+ }
+
+ private void validateFilePath(String filePath) {
+   if (StringUtils.isEmpty(filePath)) {
+     throw new IllegalArgumentException("filePath can not be empty");
+   }
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+  *
+  * @param filePath absolute path under which files should be loaded.
+  * @return CarbonWriterBuilder
+  */
+ public CarbonWriterBuilder withParquetPath(String filePath) throws IOException {
+   this.validateFilePath(filePath);
+   this.filePath = filePath;
+   this.setIsDirectory(filePath);
+   this.writerType = WRITER_TYPE.PARQUET;
+   CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.PARQUET_FILE_EXT);
+   org.apache.avro.Schema parquetSchema = ParquetCarbonWriter
+       .extractParquetSchema(dataFiles[0], this.hadoopConf);
+   this.dataFiles = dataFiles;
+   this.avroSchema = parquetSchema;
+   this.schema = AvroCarbonWriter.getCarbonSchemaFromAvroSchema(this.avroSchema);
+   return this;
+ }
+
+ private void setIsDirectory(String filePath) {
+   if (this.hadoopConf == null) {
+     this.hadoopConf = new Configuration(FileFactory.getConfiguration());
+   }
+   CarbonFile carbonFile = FileFactory.getCarbonFile(filePath, hadoopConf);
+   this.isDirectory = carbonFile.isDirectory();
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts parquet files directory and
+  * list of file which has to be loaded.
+  *
+  * @param filePath directory where the parquet file exists.
+  * @param fileList list of files which has to be loaded.
+  * @return CarbonWriterBuilder
+  * @throws IOException
+  */
+ public CarbonWriterBuilder withParquetPath(String filePath, List fileList)
+     throws IOException {
+   this.fileList = fileList;
+   this.withParquetPath(filePath);
+   return this;
+ }
+
+ private CarbonFile[] extractDataFiles(String suf) {
+   List dataFiles;
+   if (this.isDirectory) {
+     if (CollectionUtils.isEmpty(this.fileList)) {
+       dataFiles = SDKUtil.extractFilesFromFolder(this.filePath, suf, this.hadoopConf);
+     } else {
+       dataFiles = this.appendFileListWithPath();
+     }
+   } else {
+     dataFiles = new ArrayList<>();
+     dataFiles.add(FileFactory.getCarbonFile(this.filePath, this.hadoopConf));
+   }
+   if (CollectionUtils.isEmpty(dataFiles)) {
+     throw new RuntimeException("Data files can't be empty.");
+   }
+   return dataFiles.toArray(new CarbonFile[0]);
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts loading ORC files.
+  *
+  * @param
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3907: [CARBONDATA-3966] Fix NullPointerException issue in case of reliability testing of load and compaction
CarbonDataQA1 commented on pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#issuecomment-683683411
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2188/
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r480021451

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
@@ -660,13 +895,39 @@ public CarbonWriter build() throws IOException, InvalidLoadOptionException {
 // removed from the load. LoadWithoutConverter flag is going to point to the Loader Builder
 // which will skip Conversion Step.
 loadModel.setLoadWithoutConverterStep(true);
- return new AvroCarbonWriter(loadModel, hadoopConf, this.avroSchema);
+ AvroCarbonWriter avroCarbonWriter = new AvroCarbonWriter(loadModel,

Review comment: done
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table
CarbonDataQA1 commented on pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#issuecomment-683681865
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2189/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3907: [CARBONDATA-3966] Fix NullPointerException issue in case of reliability testing of load and compaction
CarbonDataQA1 commented on pull request #3907: URL: https://github.com/apache/carbondata/pull/3907#issuecomment-683675465
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3929/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-683663848
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3928/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-683661702
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2187/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-683645409
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2186/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-683645642
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3927/
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479973113

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
@@ -660,13 +895,39 @@ public CarbonWriter build() throws IOException, InvalidLoadOptionException {
 // removed from the load. LoadWithoutConverter flag is going to point to the Loader Builder
 // which will skip Conversion Step.
 loadModel.setLoadWithoutConverterStep(true);
- return new AvroCarbonWriter(loadModel, hadoopConf, this.avroSchema);
+ AvroCarbonWriter avroCarbonWriter = new AvroCarbonWriter(loadModel,

Review comment: We have some code duplications for each type of writer. Suggest to refactor it. Something like this -
```suggestion
CarbonWriter carbonWriter;
if (this.writerType == WRITER_TYPE.AVRO) {
  // AVRO records are pushed to Carbon as Object not as Strings. This was done in order to
  // handle multi level complex type support. As there are no conversion converter step is
  // removed from the load. LoadWithoutConverter flag is going to point to the Loader Builder
  // which will skip Conversion Step.
  loadModel.setLoadWithoutConverterStep(true);
  carbonWriter = new AvroCarbonWriter(loadModel, hadoopConf, this.avroSchema);
} else if (this.writerType == WRITER_TYPE.JSON) {
  loadModel.setJsonFileLoad(true);
  carbonWriter = new JsonCarbonWriter(loadModel, hadoopConf);
} else if (this.writerType == WRITER_TYPE.PARQUET) {
  loadModel.setLoadWithoutConverterStep(true);
  carbonWriter = new ParquetCarbonWriter(loadModel, hadoopConf, this.avroSchema);
} else if (this.writerType == WRITER_TYPE.ORC) {
  carbonWriter = new ORCCarbonWriter(loadModel, hadoopConf);
} else {
  // CSV
  CSVCarbonWriter csvCarbonWriter = new CSVCarbonWriter(loadModel, hadoopConf);
  if (!this.options.containsKey(CarbonCommonConstants.FILE_HEADER)) {
    csvCarbonWriter.setSkipHeader(true);
  }
  carbonWriter = csvCarbonWriter;
}
if (!StringUtils.isEmpty(filePath)) {
  carbonWriter.validateAndSetDataFiles(this.dataFiles);
}
return carbonWriter;
```
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479962896

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
@@ -594,6 +608,227 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) {
   return this;
 }
+ /**
+  * to build a {@link CarbonWriter}, which accepts loading CSV files.
+  *
+  * @param filePath absolute path under which files should be loaded.
+  * @return CarbonWriterBuilder
+  */
+ public CarbonWriterBuilder withCsvPath(String filePath) throws IOException {
+   this.validateFilePath(filePath);
+   this.filePath = filePath;
+   this.setIsDirectory(filePath);
+   this.withCsvInput();
+   this.dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION);
+   return this;
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts CSV files directory and
+  * list of file which has to be loaded.
+  *
+  * @param filePath directory where the CSV file exists.
+  * @param fileList list of files which has to be loaded.
+  * @return CarbonWriterBuilder
+  */
+ public CarbonWriterBuilder withCsvPath(String filePath, List fileList)
+     throws IOException {
+   this.fileList = fileList;
+   this.withCsvPath(filePath);
+   return this;
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts loading JSON files.
+  *
+  * @param filePath absolute path under which files should be loaded.
+  * @return CarbonWriterBuilder
+  */
+ public CarbonWriterBuilder withJsonPath(String filePath) throws IOException {
+   this.validateFilePath(filePath);
+   this.filePath = filePath;
+   this.setIsDirectory(filePath);
+   this.withJsonInput();
+   this.dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION);
+   return this;
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts JSON file directory and
+  * list of file which has to be loaded.
+  *
+  * @param filePath directory where the json file exists.
+  * @param fileList list of files which has to be loaded.
+  * @return CarbonWriterBuilder
+  * @throws IOException
+  */
+ public CarbonWriterBuilder withJsonPath(String filePath, List fileList)
+     throws IOException {
+   this.fileList = fileList;
+   this.withJsonPath(filePath);
+   return this;
+ }
+
+ private void validateFilePath(String filePath) {
+   if (StringUtils.isEmpty(filePath)) {
+     throw new IllegalArgumentException("filePath can not be empty");
+   }
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+  *
+  * @param filePath absolute path under which files should be loaded.
+  * @return CarbonWriterBuilder
+  */
+ public CarbonWriterBuilder withParquetPath(String filePath) throws IOException {
+   this.validateFilePath(filePath);
+   this.filePath = filePath;
+   this.setIsDirectory(filePath);
+   this.writerType = WRITER_TYPE.PARQUET;
+   CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.PARQUET_FILE_EXT);
+   org.apache.avro.Schema parquetSchema = ParquetCarbonWriter
+       .extractParquetSchema(dataFiles[0], this.hadoopConf);
+   this.dataFiles = dataFiles;
+   this.avroSchema = parquetSchema;
+   this.schema = AvroCarbonWriter.getCarbonSchemaFromAvroSchema(this.avroSchema);
+   return this;
+ }
+
+ private void setIsDirectory(String filePath) {
+   if (this.hadoopConf == null) {
+     this.hadoopConf = new Configuration(FileFactory.getConfiguration());
+   }
+   CarbonFile carbonFile = FileFactory.getCarbonFile(filePath, hadoopConf);
+   this.isDirectory = carbonFile.isDirectory();
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts parquet files directory and
+  * list of file which has to be loaded.
+  *
+  * @param filePath directory where the parquet file exists.
+  * @param fileList list of files which has to be loaded.
+  * @return CarbonWriterBuilder
+  * @throws IOException
+  */
+ public CarbonWriterBuilder withParquetPath(String filePath, List fileList)
+     throws IOException {
+   this.fileList = fileList;
+   this.withParquetPath(filePath);
+   return this;
+ }
+
+ private CarbonFile[] extractDataFiles(String suf) {
+   List dataFiles;
+   if (this.isDirectory) {
+     if (CollectionUtils.isEmpty(this.fileList)) {
+       dataFiles = SDKUtil.extractFilesFromFolder(this.filePath, suf, this.hadoopConf);
+     } else {
+       dataFiles = this.appendFileListWithPath();
+     }
+   } else {
+     dataFiles = new ArrayList<>();
+     dataFiles.add(FileFactory.getCarbonFile(this.filePath, this.hadoopConf));
+   }
+   if (CollectionUtils.isEmpty(dataFiles)) {
+     throw new RuntimeException("Data files can't be empty.");
+   }
+   return dataFiles.toArray(new CarbonFile[0]);
+ }
+
+ /**
+  * to build a {@link CarbonWriter}, which accepts loading ORC files.
+  *
+  * @param
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479961626

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CSVCarbonWriter.java
@@ -72,6 +90,72 @@ public void write(Object object) throws IOException {
   }
 }
+ public static CsvParser buildCsvParser(Configuration conf) {

Review comment: Can it be a private, non-static method?
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
vikramahuja1001 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r479960111 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command.index + +import java.util + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.execution.command.DataCommand +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.sql.index.CarbonIndexUtil + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.metadata.index.IndexType +import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager} +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel} + +/** + * Show indexes on the table + */ +case class IndexRepairCommand(indexname: Option[String], tableNameOp: TableIdentifier, + dbName: String, + segments: Option[List[String]]) extends DataCommand{ + + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName) + + def processData(sparkSession: SparkSession): Seq[Row] = { +if (dbName == null) { + // table level and index level + val databaseName = if (tableNameOp.database.isEmpty) { +SparkSession.getActiveSession.get.catalog.currentDatabase + } else { +tableNameOp.database.get.toString + } + triggerRepair(tableNameOp.table, databaseName, indexname.isEmpty, indexname, segments) +} else { + // for all tables in the db +sparkSession.sessionState.catalog.listTables(dbName).foreach { + tableIdent => +triggerRepair(tableIdent.table, dbName, indexname.isEmpty, indexname, segments) +} +} +Seq.empty + } + + def triggerRepair(tableNameOp: String, databaseName: String, allIndex: Boolean, +indexName: Option[String], segments: Option[List[String]]): Unit = { +val sparkSession = SparkSession.getActiveSession.get +// when Si creation and load to main table are parallel, get the carbonTable from the +// metastore which will have the latest index Info +val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore +val carbonTable = metaStore + .lookupRelation(Some(databaseName), tableNameOp)(sparkSession) + .asInstanceOf[CarbonRelation].carbonTable + +val carbonLoadModel = new CarbonLoadModel +carbonLoadModel.setDatabaseName(databaseName) +carbonLoadModel.setTableName(tableNameOp) +carbonLoadModel.setTablePath(carbonTable.getTablePath) +val tableStatusFilePath = 
CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath) +carbonLoadModel.setLoadMetadataDetails(SegmentStatusManager Review comment: added This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
vikramahuja1001 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r479959497 ## File path: docs/index/secondary-index-guide.md ## @@ -188,4 +188,25 @@ where we have old stores. Syntax ``` REGISTER INDEX TABLE index_name ON [TABLE] [db_name.]table_name - ``` \ No newline at end of file + ``` + +### Reindex Command +This command is used to reload segments in the SI table in case there is a mismatch in the number +of segments with the main table. + +Syntax + +Reindex on all the secondary indexes on the main table + ``` + REINDEX ON TABLE [db_name.]main_table_name [WHERE SEGMENT.ID IN(0,1)] + ``` +Reindex on index table level Review comment: done ## File path: docs/index/secondary-index-guide.md ## @@ -188,4 +188,25 @@ where we have old stores. Syntax ``` REGISTER INDEX TABLE index_name ON [TABLE] [db_name.]table_name - ``` \ No newline at end of file + ``` + +### Reindex Command +This command is used to reload segments in the SI table in case there is a mismatch in the number +of segments with the main table. + +Syntax + +Reindex on all the secondary indexes on the main table Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
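To illustrate the documented syntax, a hedged sketch of issuing REINDEX through Spark SQL (the database and table names are made up, and the extensions class used to enable CarbonData is an assumption about session setup):

```java
import org.apache.spark.sql.SparkSession;

public class ReindexExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("ReindexExample")
        .master("local[*]")
        .config("spark.sql.extensions", "org.apache.spark.sql.CarbonExtensions") // assumed setup
        .getOrCreate();
    // Repair all secondary indexes of db1.maintable, restricted to segments 0 and 1,
    // per the table-level syntax documented above.
    spark.sql("REINDEX ON TABLE db1.maintable WHERE SEGMENT.ID IN(0,1)");
    spark.stop();
  }
}
```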
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
vikramahuja1001 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r479959112 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command.index + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.execution.command.DataCommand +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.sql.index.CarbonIndexUtil + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.metadata.index.IndexType +import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager} +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel} + +/** + * Repair logic for reindex command on maintable/indextable + */ +case class IndexRepairCommand(indexnameOp: Option[String], tableIdentifier: TableIdentifier, Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akashrn5 opened a new pull request #3907: [CARBONDATA-3966]Fix nullPointerException issue in case of reliability testing of load and compaction
akashrn5 opened a new pull request #3907: URL: https://github.com/apache/carbondata/pull/3907 ### Why is this PR needed? During reliability and concurrency testing of carbondata load, compaction and query, a NullPointerException is sometimes thrown. This is because, in `TableSegmentRefresher`, we get the last modified timestamp of the segment file to decide whether to refresh the cache; under concurrency the segment file can get deleted, or during an update the file may not be there, and getting the last modified time then throws a NullPointerException. ### What changes were proposed in this PR? Before getting the last modified time, always check that the file exists, since it can be deleted in the meantime due to concurrency; if it is not present, initialize the timestamp to zero. ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - No (verified with concurrency over 1000s of segments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
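A minimal sketch of the described fix, under assumed names (this is not the actual `TableSegmentRefresher` code; the `FileFactory`/`CarbonFile` calls mirror ones visible elsewhere in this thread):

```java
import java.io.IOException;

import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
import org.apache.carbondata.core.datastore.impl.FileFactory;

class SegmentTimestampSketch {
  // Return 0 when the segment file has vanished due to a concurrent delete or
  // update, instead of dereferencing a missing file and hitting an NPE.
  static long safeLastModifiedTime(String segmentFilePath) throws IOException {
    if (!FileFactory.isFileExist(segmentFilePath)) {
      return 0L;
    }
    CarbonFile segmentFile = FileFactory.getCarbonFile(segmentFilePath);
    return segmentFile.getLastModifiedTime();
  }
}
```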
[jira] [Created] (CARBONDATA-3966) NullPointerException is thrown in case of reliability testing of load, compaction and query
Akash R Nilugal created CARBONDATA-3966: --- Summary: NullPointerException is thrown in case of reliability testing of load, compaction and query Key: CARBONDATA-3966 URL: https://issues.apache.org/jira/browse/CARBONDATA-3966 Project: CarbonData Issue Type: Bug Reporter: Akash R Nilugal Assignee: Akash R Nilugal Sometimes NullPointerException is thrown in case of reliability testing of load, compaction and query -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-683604932 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479933175 ## File path: sdk/sdk/src/test/java/org/apache/carbondata/sdk/file/ParquetCarbonWriterTest.java ## @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.File; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.Objects; + +import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException; +import org.apache.commons.io.FileUtils; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +/** + * Test suite for {@link ParquetCarbonWriter} + */ +public class ParquetCarbonWriterTest { + String DATA_PATH = "./src/test/resources/file/"; + String outputPath = "./testWriteFiles"; + + @Before + @After + public void cleanTestData() { +try { + FileUtils.deleteDirectory(new File(outputPath)); +} catch (Exception e) { + e.printStackTrace(); + Assert.fail(e.getMessage()); Review comment: Removed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
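Presumably the cleanup ends up roughly like this after removing the catch/`Assert.fail` (a sketch, assuming the `IOException` is simply allowed to propagate and fail the test on its own):

```java
import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.junit.After;
import org.junit.Before;

public class CleanTestDataSketch {
  String outputPath = "./testWriteFiles";

  @Before
  @After
  public void cleanTestData() throws IOException {
    // No catch block: a failed cleanup now fails the test directly.
    FileUtils.deleteDirectory(new File(outputPath));
  }
}
```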
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479933103 ## File path: sdk/sdk/src/test/java/org/apache/carbondata/sdk/file/ORCCarbonWriterTest.java ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.File; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.Objects; + +import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException; +import org.apache.commons.io.FileUtils; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +/** + * Test suite for {@link ORCCarbonWriter} + */ +public class ORCCarbonWriterTest { + + String DATA_PATH = "./src/test/resources/file/"; + String outputPath = "./testloadORCFiles"; + + @Before + @After + public void cleanTestData() { +try { + FileUtils.deleteDirectory(new File(outputPath)); +} catch (Exception e) { + e.printStackTrace(); + Assert.fail(e.getMessage()); Review comment: Removed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479932734 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/utils/SDKUtil.java ## @@ -79,4 +98,75 @@ public static ArrayList listFiles(String sourceImageFolder, return (Object[]) input[i]; } + public static List<CarbonFile> extractFilesFromFolder(String path, + String suf, Configuration hadoopConf) { +List<Object> dataFiles = listFiles(path, suf, hadoopConf); +List<CarbonFile> carbonFiles = new ArrayList<>(); +for (Object dataFile: dataFiles) { + carbonFiles.add(FileFactory.getCarbonFile(dataFile.toString(), hadoopConf)); +} +if (CollectionUtils.isEmpty(dataFiles)) { + throw new RuntimeException("No file found at given location. Please provide " + "the correct folder location."); +} +return carbonFiles; + } + + public static DataFileStream<GenericRecord> buildAvroReader(CarbonFile carbonFile, Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479932585 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.IOException; +import java.util.*; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.sdk.file.utils.SDKUtil; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.ql.io.orc.OrcStruct; +import org.apache.hadoop.hive.ql.io.orc.Reader; +import org.apache.hadoop.hive.ql.io.orc.RecordReader; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.Text; + + +/** + * Implementation to write ORC rows in CSV format to carbondata file. + */ +public class ORCCarbonWriter extends CSVCarbonWriter { Review comment: Removed composition. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479933031 ## File path: sdk/sdk/src/test/java/org/apache/carbondata/sdk/file/AvroCarbonWriterTest.java ## @@ -603,4 +605,161 @@ public void testWriteBasicForFloat() throws IOException { } } + @Test + public void testAvroFileLoadWithPrimitiveSchema() throws IOException { Review comment: Removed this test case and also refactored all the testcases for all the files. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479932500 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ParquetCarbonWriter.java ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.IOException; +import java.util.Arrays; +import java.util.Comparator; + +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.sdk.file.utils.SDKUtil; + +import org.apache.avro.generic.GenericRecord; +import org.apache.hadoop.conf.Configuration; +import org.apache.parquet.hadoop.ParquetReader; + +/** + * Implementation to write parquet rows in avro format to carbondata file. + */ +public class ParquetCarbonWriter extends AvroCarbonWriter { Review comment: Removed composition. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479932276 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) { return this; } + private void validateCsvFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION); +if (CollectionUtils.isEmpty(Arrays.asList(dataFiles))) { + throw new RuntimeException("CSV files can't be empty."); +} +for (CarbonFile dataFile : dataFiles) { + try { +CsvParser csvParser = SDKUtil.buildCsvParser(this.hadoopConf); + csvParser.beginParsing(FileFactory.getDataInputStream(dataFile.getPath(), +-1, this.hadoopConf)); + } catch (IllegalArgumentException ex) { +if (ex.getCause() instanceof FileNotFoundException) { + throw new FileNotFoundException("File " + dataFile + + " not found to build carbon writer."); +} +throw ex; + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading CSV files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withCsvInput(); +this.validateCsvFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts CSV files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the CSV file exists. + * @param fileList list of files which has to be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList) + throws IOException { +this.fileList = fileList; +this.withCsvPath(filePath); +return this; + } + + private void validateJsonFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION); +for (CarbonFile dataFile : dataFiles) { + try { +new JSONParser().parse(SDKUtil.buildJsonReader(dataFile, this.hadoopConf)); + } catch (FileNotFoundException ex) { +throw new FileNotFoundException("File " + dataFile + " not found to build carbon writer."); + } catch (ParseException ex) { +throw new RuntimeException("File " + dataFile + " is not in json format."); + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading JSON files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withJsonPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withJsonInput(); +this.validateJsonFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts JSON file directory and + * list of file which has to be loaded. + * + * @param filePath directory where the json file exists. + * @param fileList list of files which has to be loaded. 
+ * @return CarbonWriterBuilder + * @throws IOException + */ + public CarbonWriterBuilder withJsonPath(String filePath, List<String> fileList) + throws IOException { +this.fileList = fileList; +this.withJsonPath(filePath); +return this; + } + + private void validateFilePath(String filePath) { +if (StringUtils.isEmpty(filePath)) { + throw new IllegalArgumentException("filePath can not be empty"); +} + } + + /** + * to build a {@link CarbonWriter}, which accepts loading Parquet files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withParquetPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.writerType = WRITER_TYPE.PARQUET; +this.validateParquetFiles(); +return this; + } + + private void setIsDirectory(String filePath) { +if (this.hadoopConf == null) { + this.hadoopConf = new Configuration(FileFactory.getConfiguration()); +} +CarbonFile carbonFile = FileFactory.getCarbonFile(filePath, hadoopConf); +this.isDirectory = carbonFile.isDirectory(); + } + + /** + * to build a {@link CarbonWriter}, which accepts parquet files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the parquet file exists. + * @param fileList list of files which has to be
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479931321 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.IOException; +import java.util.*; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.sdk.file.utils.SDKUtil; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.ql.io.orc.OrcStruct; +import org.apache.hadoop.hive.ql.io.orc.Reader; +import org.apache.hadoop.hive.ql.io.orc.RecordReader; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.Text; + + +/** + * Implementation to write ORC rows in CSV format to carbondata file. + */ +public class ORCCarbonWriter extends CSVCarbonWriter { + private Configuration configuration; + private CSVCarbonWriter csvCarbonWriter = null; + private Reader orcReader = null; + private CarbonFile[] dataFiles; + + ORCCarbonWriter(CSVCarbonWriter csvCarbonWriter, Configuration configuration) { +this.csvCarbonWriter = csvCarbonWriter; +this.configuration = configuration; + } + + @Override + public void setDataFiles(CarbonFile[] dataFiles) { +this.dataFiles = dataFiles; + } + + /** + * Load ORC file in iterative way. + */ + @Override + public void write() throws IOException { +if (this.dataFiles == null || this.dataFiles.length == 0) { + throw new RuntimeException("'withOrcPath()' must be called to support loading ORC files"); +} +if (this.csvCarbonWriter == null) { + throw new RuntimeException("csv carbon writer can not be null"); +} +Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath)); +for (CarbonFile dataFile : this.dataFiles) { + this.loadSingleFile(dataFile); +} + } + + private void loadSingleFile(CarbonFile file) throws IOException { +orcReader = SDKUtil.buildOrcReader(file.getPath(), this.configuration); +ObjectInspector objectInspector = orcReader.getObjectInspector(); +RecordReader recordReader = orcReader.rows(); +if (objectInspector instanceof StructObjectInspector) { + StructObjectInspector structObjectInspector = + (StructObjectInspector) orcReader.getObjectInspector(); + while (recordReader.hasNext()) { +Object record = recordReader.next(null); // to remove duplicacy. 
+List<Object> valueList = structObjectInspector.getStructFieldsDataAsList(record); +for (int i = 0; i < valueList.size(); i++) { + valueList.set(i, parseOrcObject(valueList.get(i), 0)); +} +this.csvCarbonWriter.write(valueList.toArray()); + } +} else { + while (recordReader.hasNext()) { Review comment: This case will happen when the ORC schema is not an instance of StructObjectInspector. Added the test case for the same now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479930818 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.IOException; +import java.util.*; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.sdk.file.utils.SDKUtil; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.ql.io.orc.OrcStruct; +import org.apache.hadoop.hive.ql.io.orc.Reader; +import org.apache.hadoop.hive.ql.io.orc.RecordReader; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.Text; + + +/** + * Implementation to write ORC rows in CSV format to carbondata file. + */ +public class ORCCarbonWriter extends CSVCarbonWriter { + private Configuration configuration; + private CSVCarbonWriter csvCarbonWriter = null; + private Reader orcReader = null; + private CarbonFile[] dataFiles; + + ORCCarbonWriter(CSVCarbonWriter csvCarbonWriter, Configuration configuration) { +this.csvCarbonWriter = csvCarbonWriter; +this.configuration = configuration; + } + + @Override + public void setDataFiles(CarbonFile[] dataFiles) { +this.dataFiles = dataFiles; + } + + /** + * Load ORC file in iterative way. + */ + @Override + public void write() throws IOException { +if (this.dataFiles == null || this.dataFiles.length == 0) { + throw new RuntimeException("'withOrcPath()' must be called to support loading ORC files"); +} +if (this.csvCarbonWriter == null) { + throw new RuntimeException("csv carbon writer can not be null"); +} +Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath)); +for (CarbonFile dataFile : this.dataFiles) { + this.loadSingleFile(dataFile); +} + } + + private void loadSingleFile(CarbonFile file) throws IOException { +orcReader = SDKUtil.buildOrcReader(file.getPath(), this.configuration); +ObjectInspector objectInspector = orcReader.getObjectInspector(); +RecordReader recordReader = orcReader.rows(); +if (objectInspector instanceof StructObjectInspector) { + StructObjectInspector structObjectInspector = + (StructObjectInspector) orcReader.getObjectInspector(); + while (recordReader.hasNext()) { +Object record = recordReader.next(null); // to remove duplicacy. Review comment: moved record above while loop and now passing the record as the previous object in the argument. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
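The object-reuse pattern described in that reply, as a hedged standalone sketch (the row-processing step is a placeholder; only the `next(previous)` recycling idiom is the point):

```java
import java.io.IOException;

import org.apache.hadoop.hive.ql.io.orc.RecordReader;

class OrcRowReuseSketch {
  // Pass the previous row back into next() so ORC can recycle the object
  // instead of allocating a fresh row on every iteration.
  static void readAll(RecordReader recordReader) throws IOException {
    Object reusableRow = null;
    while (recordReader.hasNext()) {
      reusableRow = recordReader.next(reusableRow);
      // ... convert and write the row here (placeholder) ...
    }
  }
}
```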
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479930367 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.sdk.file; + +import java.io.IOException; +import java.util.*; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.sdk.file.utils.SDKUtil; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.ql.io.orc.OrcStruct; +import org.apache.hadoop.hive.ql.io.orc.Reader; +import org.apache.hadoop.hive.ql.io.orc.RecordReader; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.Text; + + +/** + * Implementation to write ORC rows in CSV format to carbondata file. + */ +public class ORCCarbonWriter extends CSVCarbonWriter { + private Configuration configuration; + private CSVCarbonWriter csvCarbonWriter = null; + private Reader orcReader = null; + private CarbonFile[] dataFiles; + + ORCCarbonWriter(CSVCarbonWriter csvCarbonWriter, Configuration configuration) { +this.csvCarbonWriter = csvCarbonWriter; +this.configuration = configuration; + } + + @Override + public void setDataFiles(CarbonFile[] dataFiles) { +this.dataFiles = dataFiles; + } + + /** + * Load ORC file in iterative way. + */ + @Override + public void write() throws IOException { +if (this.dataFiles == null || this.dataFiles.length == 0) { + throw new RuntimeException("'withOrcPath()' must be called to support loading ORC files"); +} +if (this.csvCarbonWriter == null) { Review comment: removed from here as this is never possible. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479930110 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/JsonCarbonWriter.java ## @@ -91,4 +106,44 @@ public void close() throws IOException { throw new IOException(e); } } + + private void loadSingleFile(CarbonFile file) throws IOException { +Reader reader = null; +try { + reader = SDKUtil.buildJsonReader(file, configuration); + JSONParser jsonParser = new JSONParser(); + Object jsonRecord = jsonParser.parse(reader); + if (jsonRecord instanceof JSONArray) { +JSONArray jsonArray = (JSONArray) jsonRecord; +for (Object record : jsonArray) { + this.write(record.toString()); +} + } else { +this.write(jsonRecord.toString()); + } +} catch (Exception e) { Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479929826 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) { return this; } + private void validateCsvFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION); +if (CollectionUtils.isEmpty(Arrays.asList(dataFiles))) { + throw new RuntimeException("CSV files can't be empty."); +} +for (CarbonFile dataFile : dataFiles) { + try { +CsvParser csvParser = SDKUtil.buildCsvParser(this.hadoopConf); + csvParser.beginParsing(FileFactory.getDataInputStream(dataFile.getPath(), +-1, this.hadoopConf)); + } catch (IllegalArgumentException ex) { +if (ex.getCause() instanceof FileNotFoundException) { + throw new FileNotFoundException("File " + dataFile + + " not found to build carbon writer."); +} +throw ex; + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading CSV files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withCsvInput(); +this.validateCsvFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts CSV files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the CSV file exists. + * @param fileList list of files which has to be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList) + throws IOException { +this.fileList = fileList; +this.withCsvPath(filePath); +return this; + } + + private void validateJsonFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION); +for (CarbonFile dataFile : dataFiles) { + try { +new JSONParser().parse(SDKUtil.buildJsonReader(dataFile, this.hadoopConf)); + } catch (FileNotFoundException ex) { +throw new FileNotFoundException("File " + dataFile + " not found to build carbon writer."); + } catch (ParseException ex) { +throw new RuntimeException("File " + dataFile + " is not in json format."); + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading JSON files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withJsonPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withJsonInput(); +this.validateJsonFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts JSON file directory and + * list of file which has to be loaded. + * + * @param filePath directory where the json file exists. + * @param fileList list of files which has to be loaded. 
+ * @return CarbonWriterBuilder + * @throws IOException + */ + public CarbonWriterBuilder withJsonPath(String filePath, List<String> fileList) + throws IOException { +this.fileList = fileList; +this.withJsonPath(filePath); +return this; + } + + private void validateFilePath(String filePath) { +if (StringUtils.isEmpty(filePath)) { + throw new IllegalArgumentException("filePath can not be empty"); +} + } + + /** + * to build a {@link CarbonWriter}, which accepts loading Parquet files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withParquetPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.writerType = WRITER_TYPE.PARQUET; +this.validateParquetFiles(); +return this; + } + + private void setIsDirectory(String filePath) { +if (this.hadoopConf == null) { + this.hadoopConf = new Configuration(FileFactory.getConfiguration()); +} +CarbonFile carbonFile = FileFactory.getCarbonFile(filePath, hadoopConf); +this.isDirectory = carbonFile.isDirectory(); + } + + /** + * to build a {@link CarbonWriter}, which accepts parquet files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the parquet file exists. + * @param fileList list of files which has to be
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479930022 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) { return this; } + private void validateCsvFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION); +if (CollectionUtils.isEmpty(Arrays.asList(dataFiles))) { + throw new RuntimeException("CSV files can't be empty."); +} +for (CarbonFile dataFile : dataFiles) { + try { +CsvParser csvParser = SDKUtil.buildCsvParser(this.hadoopConf); + csvParser.beginParsing(FileFactory.getDataInputStream(dataFile.getPath(), +-1, this.hadoopConf)); + } catch (IllegalArgumentException ex) { +if (ex.getCause() instanceof FileNotFoundException) { + throw new FileNotFoundException("File " + dataFile + + " not found to build carbon writer."); +} +throw ex; + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading CSV files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withCsvInput(); +this.validateCsvFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts CSV files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the CSV file exists. + * @param fileList list of files which has to be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList) + throws IOException { +this.fileList = fileList; +this.withCsvPath(filePath); +return this; + } + + private void validateJsonFiles() throws IOException { +CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION); +for (CarbonFile dataFile : dataFiles) { + try { +new JSONParser().parse(SDKUtil.buildJsonReader(dataFile, this.hadoopConf)); + } catch (FileNotFoundException ex) { +throw new FileNotFoundException("File " + dataFile + " not found to build carbon writer."); + } catch (ParseException ex) { +throw new RuntimeException("File " + dataFile + " is not in json format."); + } +} +this.dataFiles = dataFiles; + } + + /** + * to build a {@link CarbonWriter}, which accepts loading JSON files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withJsonPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.withJsonInput(); +this.validateJsonFiles(); +return this; + } + + /** + * to build a {@link CarbonWriter}, which accepts JSON file directory and + * list of file which has to be loaded. + * + * @param filePath directory where the json file exists. + * @param fileList list of files which has to be loaded. 
+ * @return CarbonWriterBuilder + * @throws IOException + */ + public CarbonWriterBuilder withJsonPath(String filePath, List<String> fileList) + throws IOException { +this.fileList = fileList; +this.withJsonPath(filePath); +return this; + } + + private void validateFilePath(String filePath) { +if (StringUtils.isEmpty(filePath)) { + throw new IllegalArgumentException("filePath can not be empty"); +} + } + + /** + * to build a {@link CarbonWriter}, which accepts loading Parquet files. + * + * @param filePath absolute path under which files should be loaded. + * @return CarbonWriterBuilder + */ + public CarbonWriterBuilder withParquetPath(String filePath) throws IOException { +this.validateFilePath(filePath); +this.filePath = filePath; +this.setIsDirectory(filePath); +this.writerType = WRITER_TYPE.PARQUET; +this.validateParquetFiles(); +return this; + } + + private void setIsDirectory(String filePath) { +if (this.hadoopConf == null) { + this.hadoopConf = new Configuration(FileFactory.getConfiguration()); +} +CarbonFile carbonFile = FileFactory.getCarbonFile(filePath, hadoopConf); +this.isDirectory = carbonFile.isDirectory(); + } + + /** + * to build a {@link CarbonWriter}, which accepts parquet files directory and + * list of file which has to be loaded. + * + * @param filePath directory where the parquet file exists. + * @param fileList list of files which has to be
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479929509 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -660,13 +1113,42 @@ public CarbonWriter build() throws IOException, InvalidLoadOptionException { // removed from the load. LoadWithoutConverter flag is going to point to the Loader Builder // which will skip Conversion Step. loadModel.setLoadWithoutConverterStep(true); - return new AvroCarbonWriter(loadModel, hadoopConf, this.avroSchema); + AvroCarbonWriter avroCarbonWriter = new AvroCarbonWriter(loadModel, + hadoopConf, this.avroSchema); + if (!StringUtils.isEmpty(filePath)) { +avroCarbonWriter.setDataFiles(this.dataFiles); + } + return avroCarbonWriter; } else if (this.writerType == WRITER_TYPE.JSON) { loadModel.setJsonFileLoad(true); - return new JsonCarbonWriter(loadModel, hadoopConf); + JsonCarbonWriter jsonCarbonWriter = new JsonCarbonWriter(loadModel, hadoopConf); + if (!StringUtils.isEmpty(filePath)) { +jsonCarbonWriter.setDataFiles(this.dataFiles); + } + return jsonCarbonWriter; +} else if (this.writerType == WRITER_TYPE.PARQUET) { + loadModel.setLoadWithoutConverterStep(true); + AvroCarbonWriter avroCarbonWriter = new AvroCarbonWriter(loadModel, + hadoopConf, this.avroSchema); + ParquetCarbonWriter parquetCarbonWriter = new + ParquetCarbonWriter(avroCarbonWriter, hadoopConf); Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479929400 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) { return this; } + private void validateCsvFiles() throws IOException { Review comment: Moved all the validation methods to the respective writers; they are now invoked from the build() method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.
CarbonDataQA1 commented on pull request #3834: URL: https://github.com/apache/carbondata/pull/3834#issuecomment-683594434 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3926/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479927294 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ## @@ -823,6 +829,31 @@ public void write(Object object) throws IOException { } } + /** + * Load data of all avro files at given location iteratively. + * + * @throws IOException + */ + @Override + public void write() throws IOException { +if (this.dataFiles == null || this.dataFiles.length == 0) { + throw new RuntimeException("'withAvroPath()' must be called to support loading avro files"); +} +Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath)); +for (CarbonFile dataFile : this.dataFiles) { + this.loadSingleFile(dataFile); +} + } + + private void loadSingleFile(CarbonFile file) throws IOException { +DataFileStream<GenericRecord> avroReader = SDKUtil +.buildAvroReader(file, this.configuration); Review comment: Closed all the stream readers now. ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CSVCarbonWriter.java ## @@ -72,6 +93,36 @@ public void write(Object object) throws IOException { } } + /** + * Load data of all or selected csv files at given location iteratively. + * + * @throws IOException + */ + @Override + public void write() throws IOException { +if (this.dataFiles == null || this.dataFiles.length == 0) { + throw new RuntimeException("'withCsvPath()' must be called to support load files"); +} +this.csvParser = SDKUtil.buildCsvParser(this.configuration); +Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath)); +for (CarbonFile dataFile : this.dataFiles) { + this.loadSingleFile(dataFile); +} + } + + private void loadSingleFile(CarbonFile file) throws IOException { +this.csvParser.beginParsing(FileFactory.getDataInputStream(file.getPath(), -1, configuration)); Review comment: closed. ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java ## @@ -17,25 +17,19 @@ package org.apache.carbondata.sdk.file; +import java.io.FileNotFoundException; import java.io.IOException; -import java.util.ArrayList; -import java.util.Arrays; -import java.util.HashMap; -import java.util.HashSet; -import java.util.List; -import java.util.Map; -import java.util.Objects; -import java.util.Set; -import java.util.TreeMap; -import java.util.UUID; +import java.util.*; Review comment: done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
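The "closed all the stream readers" fix presumably takes the usual try/finally shape; a hedged sketch against the quoted helper (the method context is illustrative, and `SDKUtil.buildAvroReader`'s exact signature is partly assumed from the diff):

```java
import java.io.IOException;

import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericRecord;
import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
import org.apache.carbondata.sdk.file.AvroCarbonWriter;
import org.apache.carbondata.sdk.file.utils.SDKUtil;
import org.apache.hadoop.conf.Configuration;

class AvroCloseSketch {
  static void loadSingleFile(AvroCarbonWriter writer, CarbonFile file, Configuration conf)
      throws IOException {
    DataFileStream<GenericRecord> avroReader = SDKUtil.buildAvroReader(file, conf);
    try {
      while (avroReader.hasNext()) {
        writer.write(avroReader.next());
      }
    } finally {
      avroReader.close(); // release the stream even if write() fails
    }
  }
}
```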
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.
CarbonDataQA1 commented on pull request #3834: URL: https://github.com/apache/carbondata/pull/3834#issuecomment-683593554 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2185/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479925943 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ## @@ -25,17 +25,12 @@ import java.math.BigDecimal; import java.math.BigInteger; import java.nio.ByteBuffer; -import java.util.ArrayList; -import java.util.HashMap; -import java.util.Iterator; -import java.util.List; -import java.util.Map; -import java.util.Random; -import java.util.UUID; +import java.util.*; Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#discussion_r479926586 ## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ## @@ -823,6 +829,31 @@ public void write(Object object) throws IOException { } } + /** + * Load data of all avro files at given location iteratively. + * + * @throws IOException + */ + @Override + public void write() throws IOException { +if (this.dataFiles == null || this.dataFiles.length == 0) { + throw new RuntimeException("'withAvroPath()' must be called to support loading avro files"); +} +Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath)); Review comment: Removed sort from all the files. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r479925640

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
## @@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) {
     return this;
   }

+  private void validateCsvFiles() throws IOException {
+    CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION);
+    if (CollectionUtils.isEmpty(Arrays.asList(dataFiles))) {
+      throw new RuntimeException("CSV files can't be empty.");
+    }
+    for (CarbonFile dataFile : dataFiles) {
+      try {
+        CsvParser csvParser = SDKUtil.buildCsvParser(this.hadoopConf);
+        csvParser.beginParsing(FileFactory.getDataInputStream(dataFile.getPath(),
+            -1, this.hadoopConf));
+      } catch (IllegalArgumentException ex) {
+        if (ex.getCause() instanceof FileNotFoundException) {
+          throw new FileNotFoundException("File " + dataFile
+              + " not found to build carbon writer.");
+        }
+        throw ex;
+      }
+    }
+    this.dataFiles = dataFiles;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading CSV files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath) throws IOException {
+    this.validateFilePath(filePath);
+    this.filePath = filePath;
+    this.setIsDirectory(filePath);
+    this.withCsvInput();
+    this.validateCsvFiles();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts CSV files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the CSV file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList)
+      throws IOException {
+    this.fileList = fileList;
+    this.withCsvPath(filePath);
+    return this;
+  }
+
+  private void validateJsonFiles() throws IOException {
+    CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION);
+    for (CarbonFile dataFile : dataFiles) {
+      try {
+        new JSONParser().parse(SDKUtil.buildJsonReader(dataFile, this.hadoopConf));
+      } catch (FileNotFoundException ex) {
+        throw new FileNotFoundException("File " + dataFile + " not found to build carbon writer.");
+      } catch (ParseException ex) {
+        throw new RuntimeException("File " + dataFile + " is not in json format.");
+      }
+    }
+    this.dataFiles = dataFiles;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading JSON files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withJsonPath(String filePath) throws IOException {
+    this.validateFilePath(filePath);
+    this.filePath = filePath;
+    this.setIsDirectory(filePath);
+    this.withJsonInput();
+    this.validateJsonFiles();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts JSON file directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the json file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   * @throws IOException
+   */
+  public CarbonWriterBuilder withJsonPath(String filePath, List<String> fileList)
+      throws IOException {
+    this.fileList = fileList;
+    this.withJsonPath(filePath);
+    return this;
+  }
+
+  private void validateFilePath(String filePath) {
+    if (StringUtils.isEmpty(filePath)) {
+      throw new IllegalArgumentException("filePath can not be empty");
+    }
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath) throws IOException {
+    this.validateFilePath(filePath);
+    this.filePath = filePath;
+    this.setIsDirectory(filePath);
+    this.writerType = WRITER_TYPE.PARQUET;
+    this.validateParquetFiles();
+    return this;
+  }
+
+  private void setIsDirectory(String filePath) {
+    if (this.hadoopConf == null) {
+      this.hadoopConf = new Configuration(FileFactory.getConfiguration());

Review comment: I have checked the build() method of the same file, and its hadoopConf is built the same way. Please let me know if this is not the correct way.

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/utils/SDKUtil.java
## @@ -79,4 +98,75 @@ public static ArrayList listFiles(String sourceImageFolder,
     return (Object[]) input[i];
   }

+
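The two-argument overloads above accept a directory plus an explicit list of file names to load from it. A hypothetical usage sketch of the JSON variant, under the API shown in the diff; the store path and file names are invented for illustration:

    import java.util.Arrays;
    import org.apache.carbondata.sdk.file.CarbonWriter;

    public class JsonFileListLoad {
      public static void main(String[] args) throws Exception {
        CarbonWriter writer = CarbonWriter.builder()
            .outputPath("/tmp/carbon-store")    // placeholder target store path
            // load only the named files; other files in the directory are ignored
            .withJsonPath("/tmp/input-json", Arrays.asList("part-0.json", "part-1.json"))
            .writtenBy("JsonFileListLoad")
            .build();
        writer.write();
        writer.close();
      }
    }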
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r479925025

## File path: examples/spark/pom.xml
## @@ -38,6 +38,12 @@
     <groupId>org.apache.carbondata</groupId>
     <artifactId>carbondata-spark_${spark.binary.version}</artifactId>
     <version>${project.version}</version>
+

Review comment: Without the exclusions, these dependencies were being pulled in transitively, and the build was failing because of that.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
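The general Maven pattern being described looks like this: an exclusions block inside the dependency keeps the listed artifacts from being pulled in transitively. The excluded artifact below is illustrative only, not taken from the PR:

    <dependency>
      <groupId>org.apache.carbondata</groupId>
      <artifactId>carbondata-spark_${spark.binary.version}</artifactId>
      <version>${project.version}</version>
      <exclusions>
        <!-- illustrative exclusion; the PR excludes different artifacts -->
        <exclusion>
          <groupId>org.apache.hive</groupId>
          <artifactId>hive-exec</artifactId>
        </exclusion>
      </exclusions>
    </dependency>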
[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
nihal0107 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r479924597

## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
## @@ -2456,4 +2471,24 @@ private CarbonCommonConstants() {
   /**
    * property which defines the insert stage flow
    */
   public static final String IS_INSERT_STAGE = "is_insert_stage";
+
+  /**
+   * the level 1 complex delimiter default value
+   */
+  @CarbonProperty

Review comment: Removed the @CarbonProperty annotation.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
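The distinction behind this review appears to be that @CarbonProperty marks user-configurable property keys, while a fixed default value is just a plain constant. A hypothetical sketch of the resulting shape; the constant name and value are illustrative, not copied from the PR:

    public final class DelimiterConstantsSketch {
      private DelimiterConstantsSketch() { }

      /**
       * the level 1 complex delimiter default value; a plain constant with
       * no @CarbonProperty annotation, since it is not a user-settable key
       */
      public static final String COMPLEX_DELIMITER_LEVEL_1_DEFAULT = "\001";
    }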
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-683585404 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3925/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-683582558 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2184/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org