[GitHub] [carbondata] akkio-97 commented on a change in pull request #3906: [CARBONDATA-3968]Added test cases for hive read complex types and handled other issues

2020-09-08 Thread GitBox


akkio-97 commented on a change in pull request #3906:
URL: https://github.com/apache/carbondata/pull/3906#discussion_r484698276



##
File path: 
integration/hive/src/main/java/org/apache/carbondata/hive/util/DataTypeUtil.java
##
@@ -21,25 +21,31 @@
 import java.util.ArrayList;
 import java.util.List;
 
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
 import org.apache.carbondata.core.metadata.datatype.DataType;
 import org.apache.carbondata.core.metadata.datatype.DataTypes;
 import org.apache.carbondata.core.metadata.datatype.StructField;
 
+import org.apache.commons.lang.ArrayUtils;
+
 public class DataTypeUtil {
 
   public static DataType convertHiveTypeToCarbon(String type) throws 
SQLException {
 if ("string".equalsIgnoreCase(type) || type.startsWith("char")) {
   return DataTypes.STRING;
-} else if ("varchar".equalsIgnoreCase(type)) {
+} else if (!type.startsWith("map<") && !type.startsWith("array<") && 
!type.startsWith("struct<")

Review comment:
   made required changes, they are not required





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3905: [CARBONDATA-3964] Fixed null pointer excption for select * and select count(*) without filter.

2020-09-08 Thread GitBox


ShreelekhyaG commented on a change in pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#discussion_r484709973



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##
@@ -318,48 +318,47 @@ class TestLoadDataWithDiffTimestampFormat extends 
QueryTest with BeforeAndAfterA
   test("test load, update data with setlenient carbon property for daylight " +
"saving time from different timezone") {
 
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE,
 "true")
-TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
 sql("DROP TABLE IF EXISTS test_time")
 sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time 
Timestamp) STORED AS carbondata " +
 "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 
'timestampformat'='yyyy-MM-dd HH:mm:ss') ")
 sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' 
into table test_time")
-sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
-sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
-checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+sql(s"insert into test_time select 11, '2016-7-24', '2019-3-10 02:00:00' ")

Review comment:
   ok done









[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3905: [CARBONDATA-3964] Fixed null pointer excption for select * and select count(*) without filter.

2020-09-08 Thread GitBox


ShreelekhyaG commented on a change in pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#discussion_r484710316



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##
@@ -318,48 +318,53 @@ class TestLoadDataWithDiffTimestampFormat extends 
QueryTest with BeforeAndAfterA
   test("test load, update data with setlenient carbon property for daylight " +
"saving time from different timezone") {
 
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE,
 "true")
-TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
 sql("DROP TABLE IF EXISTS test_time")
 sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time 
Timestamp) STORED AS carbondata " +
 "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 
'timestampformat'='yyyy-MM-dd HH:mm:ss') ")
 sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' 
into table test_time")
-sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
-sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
-checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+sql(s"insert into test_time select 11, '2016-7-24', '2019-3-10 02:00:00' ")
+sql("update test_time set (time) = ('2019-3-10 02:00:00') where ID='2'")
+// Using America/Los_Angeles timezone (timezone is fixed to 
America/Los_Angeles for all tests)
+// Here, 2019-3-10 02:00:00 is invalid data in America/Los_Angeles zone, 
as DST is observed and
+// clocks were turned forward 1 hour to 2019-3-10 03:00:00. With lenience 
property enabled, can parse the time according to DST.
+checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), 
Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))

Review comment:
   added
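
The DST gap described in the new test comments can be reproduced with plain JDK classes. The following is a minimal sketch, not the CarbonData implementation: the class name DstGapDemo and its helper are hypothetical, and it assumes the setlenient property ultimately controls something like SimpleDateFormat.setLenient, which this thread does not confirm. It parses a wall-clock time that does not exist in America/Los_Angeles and formats it back in the same zone.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical demo class; not part of CarbonData.
final class DstGapDemo {

  // Parses `text` as a wall-clock time in America/Los_Angeles and formats the
  // result back in the same zone. Returns null if the parser rejects the value.
  static String parseInLosAngeles(String text, boolean lenient) {
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    format.setTimeZone(TimeZone.getTimeZone("America/Los_Angeles"));
    format.setLenient(lenient);
    try {
      Date parsed = format.parse(text);
      return format.format(parsed);
    } catch (ParseException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    // 02:00:00 on 2019-03-10 falls inside the spring-forward gap.
    // Lenient parsing moves it forward to 03:00:00.
    System.out.println("lenient: " + parseInLosAngeles("2019-3-10 02:00:00", true));
    // Strict parsing is expected to reject the value (printed as null here).
    System.out.println("strict:  " + parseInLosAngeles("2019-3-10 02:00:00", false));
  }
}
```

Because the time zone is set on the formatter itself, the sketch does not depend on the JVM default zone, which is the dependency the removed TimeZone.setDefault call used to paper over.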









[GitHub] [carbondata] kunal642 commented on a change in pull request #3906: [CARBONDATA-3968]Added test cases for hive read complex types and handled other issues

2020-09-08 Thread GitBox


kunal642 commented on a change in pull request #3906:
URL: https://github.com/apache/carbondata/pull/3906#discussion_r484718004



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/loading/parser/impl/MapParserImpl.java
##
@@ -73,9 +73,12 @@ public ArrayObject parse(Object data) {
 
   @Override
   public ArrayObject parseRaw(Object data) {
-Object keyArray = ((Object[]) data)[0];
-Object valueArray = ((Object[]) data)[1];
-return new ArrayObject(new Object[]{child.parseRaw(keyArray), 
child.parseRaw(valueArray)});
+Object[] keyValuePairs = ((Object[]) data);

Review comment:
   okay









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-688694994


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2257/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-688698823


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3997/
   







[jira] [Created] (CARBONDATA-3976) CarbonData Update operation enhancement

2020-09-08 Thread TangLin (Jira)
TangLin created CARBONDATA-3976:
---

 Summary: CarbonData Update operation enhancement
 Key: CARBONDATA-3976
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3976
 Project: CarbonData
  Issue Type: Improvement
  Components: data-load
Reporter: TangLin


*Background*
The update operation cleans up delta files before updating (see
cleanUpDeltaFiles(carbonTable, false)). It traverses the metadata path
and the segment path many times. When there are many files, this overhead
grows and the update takes longer.

*Motivation & Goal*
During the update process, reduce the repeated traversal, or move
cleanUpDeltaFiles into a separate method.

*Modification*
Some possible solutions follow.

Solution 1:

cleanUpDeltaFiles makes several near-identical calls to fetch files, such as
updateStatusManager.getUpdateDeltaFilesList(segment,
false, CarbonCommonConstants.UPDATE_DELTA_FILE_EXT, true,
allSegmentFiles, true) and
updateStatusManager.getUpdateDeltaFilesList(segment,
false, CarbonCommonConstants.UPDATE_INDEX_FILE_EXT, true,
allSegmentFiles, true). They differ only in file type, yet together they
traverse the segment path twice. We can merge them.

Solution 2:

Building on Solution 1, use Spark or MapReduce to hand the tasks over to other nodes.

Solution 3:

Submit cleanUpDeltaFiles as a separate task, and run it in the early morning
or when the cluster is not busy.

Solution 4:

Establish a garbage collection bin, which provides some interfaces for our
program to determine when files enter the garbage collection bin and how to
deal with them.

Please vote on these solutions.
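
Solution 1 can be sketched as a single pass that buckets files by extension, instead of one traversal per file type. This is only an illustration under stated assumptions: the class DeltaFileScan, the method groupByExtension, and the extensions ".delta" and ".index" are hypothetical placeholders, not the CarbonData API or the real values of UPDATE_DELTA_FILE_EXT and UPDATE_INDEX_FILE_EXT.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of CarbonData.
final class DeltaFileScan {

  // Walks the file list once and groups each name under the first extension
  // it ends with. Names that match no extension are skipped.
  static Map<String, List<String>> groupByExtension(
      List<String> fileNames, List<String> extensions) {
    Map<String, List<String>> grouped = new LinkedHashMap<>();
    for (String extension : extensions) {
      grouped.put(extension, new ArrayList<>());
    }
    for (String name : fileNames) {
      for (String extension : extensions) {
        if (name.endsWith(extension)) {
          grouped.get(extension).add(name);
          break;
        }
      }
    }
    return grouped;
  }
}
```

With this shape, the listing of a segment directory is read once, and each file type is then looked up from the returned map rather than triggering another listing.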

Best Regards,
LinWood



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3906: [CARBONDATA-3968]Added test cases for hive read complex types and handled other issues

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3906:
URL: https://github.com/apache/carbondata/pull/3906#issuecomment-688701556


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3998/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3906: [CARBONDATA-3968]Added test cases for hive read complex types and handled other issues

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3906:
URL: https://github.com/apache/carbondata/pull/3906#issuecomment-688703629


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2258/
   







[GitHub] [carbondata] akashrn5 commented on pull request #3909: [CARBONDATA-3972] Date/timestamp compatability between hive and carbon

2020-09-08 Thread GitBox


akashrn5 commented on pull request #3909:
URL: https://github.com/apache/carbondata/pull/3909#issuecomment-688713873


   @ShreelekhyaG please add test cases.







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3905: [CARBONDATA-3964] Fixed null pointer excption for select * and select count(*) without filter.

2020-09-08 Thread GitBox


akashrn5 commented on a change in pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#discussion_r484752262



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##
@@ -318,48 +318,65 @@ class TestLoadDataWithDiffTimestampFormat extends 
QueryTest with BeforeAndAfterA
   test("test load, update data with setlenient carbon property for daylight " +
"saving time from different timezone") {
 
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE,
 "true")
-TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
 sql("DROP TABLE IF EXISTS test_time")
+sql("DROP TABLE IF EXISTS testhivetable")
+// Create test_time and hive table
 sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time 
Timestamp) STORED AS carbondata " +
 "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 
'timestampformat'='yyyy-MM-dd HH:mm:ss') ")
-sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' 
into table test_time")
-sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
-sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
-checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+sql("CREATE TABLE testhivetable (ID Int, date Date, time TIMESTAMP) row 
format delimited fields terminated by ',' ")
+// load data into test_time and hive table and validate query result
+sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' 
into table test_time options('fileheader'='ID,date,time')")
+sql(s"LOAD DATA local inpath '$resourcesPath/differentZoneTimeStamp.csv' 
overwrite INTO table testhivetable")
+checkAnswer(sql("select * from test_time"), sql("select * from 
testhivetable"))
+sql(s"insert into test_time select 11, '2016-7-24', '2019-3-10 02:00:00' ")
+sql("update test_time set (time) = ('2019-3-10 02:00:00') where ID='2'")
+// Using America/Los_Angeles timezone (timezone is fixed to 
America/Los_Angeles for all tests)
+// Here, 2019-3-10 02:00:00 is invalid data in America/Los_Angeles zone, 
as DST is observed and
+// clocks were turned forward 1 hour to 2019-3-10 03:00:00. With lenience 
property enabled, can parse the time according to DST.
+checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), 
Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))
+checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), 
Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))
+checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), 
Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))
 sql("DROP TABLE test_time")
 
CarbonProperties.getInstance().removeProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE)
   }
 
   test("test load, update data with setlenient session level property for 
daylight " +
"saving time from different timezone") {
 sql("set carbon.load.dateformat.setlenient.enable = true")
-TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
 sql("DROP TABLE IF EXISTS test_time")
-sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time 
Timestamp) STORED AS carbondata " +
+sql("DROP TABLE IF EXISTS testhivetable")
+// Create test_time and hive table
+sql("CREATE TABLE test_time (ID Int, date Date, time Timestamp) STORED AS 
carbondata " +
 "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 
'timestampformat'='yyyy-MM-dd HH:mm:ss') ")
-sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' 
into table test_time")
-sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
-sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
-checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+sql("CREATE TABLE testhivetable (ID Int, date Date, time TIMESTAMP) row 
format delimited fields terminated by ',' ")
+// load data into test_time and hive table and validate query result
+sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' 
into table test_time options('fileheader'='ID,date,time')")
+sql(s"LOAD DATA local inpath '$resourcesPath/differentZoneTimeStamp.csv' 
overwrite INTO table testhivetable")
+checkAnswer(sql("select * from

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3905: [CARBONDATA-3964] Fixed null pointer excption for select * and select count(*) without filter.

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#issuecomment-688747092


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2259/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3905: [CARBONDATA-3964] Fixed null pointer excption for select * and select count(*) without filter.

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#issuecomment-688748633


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3999/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-688760004


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4000/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-688761061


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2260/
   







[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3905: [CARBONDATA-3964] Fixed null pointer excption for select * and select count(*) without filter.

2020-09-08 Thread GitBox


ShreelekhyaG commented on a change in pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#discussion_r484802588



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##
@@ -318,48 +318,53 @@ class TestLoadDataWithDiffTimestampFormat extends 
QueryTest with BeforeAndAfterA
   test("test load, update data with setlenient carbon property for daylight " +
"saving time from different timezone") {
 
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE,
 "true")
-TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
 sql("DROP TABLE IF EXISTS test_time")
 sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time 
Timestamp) STORED AS carbondata " +
 "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 
'timestampformat'='yyyy-MM-dd HH:mm:ss') ")
 sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' 
into table test_time")
-sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
-sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
-checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+sql(s"insert into test_time select 11, '2016-7-24', '2019-3-10 02:00:00' ")
+sql("update test_time set (time) = ('2019-3-10 02:00:00') where ID='2'")
+// Using America/Los_Angeles timezone (timezone is fixed to 
America/Los_Angeles for all tests)
+// Here, 2019-3-10 02:00:00 is invalid data in America/Los_Angeles zone, 
as DST is observed and
+// clocks were turned forward 1 hour to 2019-3-10 03:00:00. With lenience 
property enabled, can parse the time according to DST.
+checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), 
Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))
+checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), 
Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))
+checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), 
Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))
 sql("DROP TABLE test_time")
 
CarbonProperties.getInstance().removeProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE)
   }
 
   test("test load, update data with setlenient session level property for 
daylight " +
"saving time from different timezone") {
 sql("set carbon.load.dateformat.setlenient.enable = true")
-TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
 sql("DROP TABLE IF EXISTS test_time")
 sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time 
Timestamp) STORED AS carbondata " +
 "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 
'timestampformat'='yyyy-MM-dd HH:mm:ss') ")
 sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' 
into table test_time")
-sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
-sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
-checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+sql(s"insert into test_time select 11, '2016-7-24', '2019-3-10 02:00:00' ")
+sql("update test_time set (time) = ('2019-3-10 02:00:00') where ID='2'")
+// Using America/Los_Angeles timezone (timezone is fixed to 
America/Los_Angeles for all tests)
+// Here, 2019-3-10 02:00:00 is invalid data in America/Los_Angeles zone, 
as DST is observed and
+// clocks were turned forward 1 hour to 2019-3-10 03:00:00. With lenience 
property enabled, can parse the time according to DST.
+checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), 
Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))
+checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), 
Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))
+checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), 
Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))
 sql("DROP TABLE test_time")
 defaultConfig()
   }
 
   def generateCSVFile(): Unit = {
 val rows = new ListBuffer[Array[String]]
 rows += Array("ID", "date", "time")
-rows += Array("1", "1941-3-15", "1941-3-15 00:00:00")
+rows += Array("1", "1941-3-15", "2019-3-10 02:00:00")
 rows += Array("2", "2016-7-24", "2016-7-24 01:02:30")
 BadRecordUtil.createCSV(rows, csvPath)
   }
 
   override def afterAll {

[GitHub] [carbondata] kunal642 commented on a change in pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-09-08 Thread GitBox


kunal642 commented on a change in pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#discussion_r484804408



##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -2512,4 +2512,10 @@ private CarbonCommonConstants() {
* property which defines the presto query default value
*/
   public static final String IS_QUERY_FROM_PRESTO_DEFAULT = "false";
+
+  @CarbonProperty(dynamicConfigurable = true)

Review comment:
   done









[GitHub] [carbondata] akashrn5 commented on pull request #3905: [CARBONDATA-3964] Fixed null pointer excption for select * and select count(*) without filter.

2020-09-08 Thread GitBox


akashrn5 commented on pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#issuecomment-688769415


   @nihal0107 this PR also contains some test case fixes; please add those 
changes to the PR description and title. You can shorten the title; there is no 
need to keep it so long.







[GitHub] [carbondata] kunal642 commented on pull request #3906: [CARBONDATA-3968]Added test cases for hive read complex types and handled other issues

2020-09-08 Thread GitBox


kunal642 commented on pull request #3906:
URL: https://github.com/apache/carbondata/pull/3906#issuecomment-688795296


   LGTM







[GitHub] [carbondata] asfgit closed pull request #3906: [CARBONDATA-3968]Added test cases for hive read complex types and handled other issues

2020-09-08 Thread GitBox


asfgit closed pull request #3906:
URL: https://github.com/apache/carbondata/pull/3906


   







[jira] [Resolved] (CARBONDATA-3968) Hive read complex types issues

2020-09-08 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-3968.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Hive read complex types issues
> --
>
> Key: CARBONDATA-3968
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3968
> Project: CarbonData
>  Issue Type: Bug
>  Components: hive-integration
>Reporter: Akshay
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> # Issues in reading array/map/struct of byte, varchar and decimal types.
>  # Map of primitive type with only one row inserted has issues.





[GitHub] [carbondata] kunal642 commented on a change in pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning

2020-09-08 Thread GitBox


kunal642 commented on a change in pull request #3908:
URL: https://github.com/apache/carbondata/pull/3908#discussion_r484838229



##
File path: 
integration/spark/src/main/scala/org/apache/spark/util/PartitionCacheManger.scala
##
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util
+
+import java.net.URI
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.log4j.Logger
+import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, 
CatalogTablePartition}
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.cache.{Cache, Cacheable, CarbonLRUCache}
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.metadata.SegmentFileStore
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager
+import org.apache.carbondata.core.util.path.CarbonTablePath
+
+object  PartitionCacheManager extends Cache[PartitionCacheKey, 
CacheablePartitionSpec] {
+
+  private val CACHE = new CarbonLRUCache(
+CarbonCommonConstants.CARBON_PARTITION_MAX_DRIVER_LRU_CACHE_SIZE,
+CarbonCommonConstants.CARBON_MAX_LRU_CACHE_SIZE_DEFAULT)
+
+  val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getName)
+
+  def get(identifier: PartitionCacheKey): CacheablePartitionSpec = {
+val cacheablePartitionSpec =
+  CACHE.get(identifier.tableId).asInstanceOf[CacheablePartitionSpec]
+val tableStatusModifiedTime = FileFactory
+  
.getCarbonFile(CarbonTablePath.getTableStatusFilePath(identifier.tablePath))
+  .getLastModifiedTime
+if (cacheablePartitionSpec != null) {
+  if (tableStatusModifiedTime > cacheablePartitionSpec.timestamp) {
+readPartitions(identifier, tableStatusModifiedTime)
+  } else {
+cacheablePartitionSpec
+  }
+} else {
+  readPartitions(identifier, tableStatusModifiedTime)
+}
+  }
+
+  override def getAll(keys: util.List[PartitionCacheKey]):
+  util.List[CacheablePartitionSpec] = {
+keys.asScala.map(get).toList.asJava
+  }
+
+  override def getIfPresent(key: PartitionCacheKey): CacheablePartitionSpec = {
+CACHE.get(key.tableId).asInstanceOf[CacheablePartitionSpec]
+  }
+
+  override def invalidate(partitionCacheKey: PartitionCacheKey): Unit = {
+CACHE.remove(partitionCacheKey.tableId)
+  }
+
+  private def readPartitions(identifier: PartitionCacheKey, 
tableStatusModifiedTime: Long) = {

Review comment:
   Added a per-segment modification check; now only the updated or new 
segments are loaded.
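
The lookup in get() above follows a common pattern: serve the cached entry unless its source has been modified since the entry was loaded. A minimal sketch of that pattern, assuming a plain in-memory map rather than CarbonLRUCache, is given here. The class TimestampCheckedCache and all of its members are hypothetical, and the sketch mirrors only the table-level timestamp check visible in the diff, not the per-segment check mentioned in the review comment.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache; not the PartitionCacheManager from the pull request.
final class TimestampCheckedCache<V> {

  // A cached value together with the source modification time it was loaded at.
  private static final class Entry<T> {
    final T value;
    final long timestamp;

    Entry(T value, long timestamp) {
      this.value = value;
      this.timestamp = timestamp;
    }
  }

  private final Map<String, Entry<V>> entries = new HashMap<>();
  private int loads = 0;

  // Returns the cached value when it is at least as new as the source;
  // otherwise reloads through `loader` and records the new timestamp.
  V get(String key, long sourceModifiedTime, Function<String, V> loader) {
    Entry<V> cached = entries.get(key);
    if (cached != null && sourceModifiedTime <= cached.timestamp) {
      return cached.value;
    }
    V fresh = loader.apply(key);
    entries.put(key, new Entry<>(fresh, sourceModifiedTime));
    loads++;
    return fresh;
  }

  void invalidate(String key) {
    entries.remove(key);
  }

  // Number of times the loader ran; useful for checking that hits are served from cache.
  int loadCount() {
    return loads;
  }
}
```

A per-segment variant would presumably keep one timestamp per segment inside the cached value and reload only the segments whose timestamp changed, but that part of the change is not shown in this thread.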









[GitHub] [carbondata] nihal0107 commented on pull request #3905: [CARBONDATA-3964] Fixed, null pointer excption for select query and time zone dependent test failures.

2020-09-08 Thread GitBox


nihal0107 commented on pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#issuecomment-688802174


   > @nihal0107 this PR also contains some test case fixes. Please add those
changes to the PR description and title. You can shorten the title; there is no need to keep it so long.
   
   Updated the PR description and title.







[GitHub] [carbondata] marchpure commented on pull request #3913: [CARBONDATA-3974] Improve partition purning performance in presto carbon integration

2020-09-08 Thread GitBox


marchpure commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-688802921


   retest this please







[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3913: [CARBONDATA-3974] Improve partition purning performance in presto carbon integration

2020-09-08 Thread GitBox


Indhumathi27 commented on a change in pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#discussion_r484845416



##
File path: 
integration/presto/src/main/prestodb/org/apache/carbondata/presto/CarbondataSplitManager.java
##
@@ -117,6 +122,16 @@ public ConnectorSplitSource getSplits(ConnectorTransactionHandle transactionHand
   // file metastore case tablePath can be null, so get from location
   location = table.getStorage().getLocation();
 }
+List filteredPartitions = new ArrayList<>();

Review comment:
   Can you add a test case with a partition filter?









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3913: [CARBONDATA-3974] Improve partition purning performance in presto carbon integration

2020-09-08 Thread GitBox


ajantha-bhat commented on a change in pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#discussion_r484845973



##
File path: 
integration/presto/src/main/prestodb/org/apache/carbondata/presto/CarbondataSplitManager.java
##
@@ -117,6 +122,16 @@ public ConnectorSplitSource getSplits(ConnectorTransactionHandle transactionHand
   // file metastore case tablePath can be null, so get from location
   location = table.getStorage().getLocation();
 }
+List filteredPartitions = new ArrayList<>();

Review comment:
   Please read the description; I have mentioned why a UT cannot be added now.









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3913: [CARBONDATA-3974] Improve partition purning performance in presto carbon integration

2020-09-08 Thread GitBox


ajantha-bhat commented on a change in pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#discussion_r484845973



##
File path: 
integration/presto/src/main/prestodb/org/apache/carbondata/presto/CarbondataSplitManager.java
##
@@ -117,6 +122,16 @@ public ConnectorSplitSource getSplits(ConnectorTransactionHandle transactionHand
   // file metastore case tablePath can be null, so get from location
   location = table.getStorage().getLocation();
 }
+List filteredPartitions = new ArrayList<>();

Review comment:
   Please read the description; I have already mentioned why a UT cannot be
added now.









[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3913: [CARBONDATA-3974] Improve partition purning performance in presto carbon integration

2020-09-08 Thread GitBox


Indhumathi27 commented on a change in pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#discussion_r484849645



##
File path: 
integration/presto/src/main/prestosql/org/apache/carbondata/presto/impl/CarbonTableReader.java
##
@@ -245,16 +242,14 @@ private CarbonTableCacheModel getValidCacheBySchemaTableName(SchemaTableName sch
*
* @param tableCacheModel cached table
* @param filters carbonData filters
-   * @param constraints presto filters
+   * @param filteredPartitions matched partitionSpec for the filter
* @param config hadoop conf
* @return list of multiblock split
* @throws IOException
*/
-  public List getInputSplits(
-  CarbonTableCacheModel tableCacheModel,
-  Expression filters,
-  TupleDomain constraints,
-  Configuration config) throws IOException {
+  public List getInputSplits(CarbonTableCacheModel tableCacheModel,
+  Expression filters, List filteredPartitions, Configuration config)

Review comment:
   Can you revert to the old style (one parameter per line)?









[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3913: [CARBONDATA-3974] Improve partition purning performance in presto carbon integration

2020-09-08 Thread GitBox


Indhumathi27 commented on a change in pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#discussion_r484851607



##
File path: 
integration/presto/src/test/prestodb/org/apache/carbondata/presto/server/PrestoTestUtil.scala
##
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.presto.server
+
+import com.facebook.presto.jdbc.PrestoArray
+

Review comment:
   Remove the extra blank lines.









[GitHub] [carbondata] Indhumathi27 commented on pull request #3910: [CARBONDATA-3969] Fix Deserialization issue with DataType class

2020-09-08 Thread GitBox


Indhumathi27 commented on pull request #3910:
URL: https://github.com/apache/carbondata/pull/3910#issuecomment-688813601


   retest this please







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3905: [CARBONDATA-3964] Fixed, null pointer excption for select query and time zone dependent test failures.

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#issuecomment-688819393


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4001/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3905: [CARBONDATA-3964] Fixed, null pointer excption for select query and time zone dependent test failures.

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#issuecomment-688820071


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2261/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3908:
URL: https://github.com/apache/carbondata/pull/3908#issuecomment-688848438


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4002/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3908:
URL: https://github.com/apache/carbondata/pull/3908#issuecomment-688852427


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2262/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3913: [CARBONDATA-3974] Improve partition purning performance in presto carbon integration

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-688858320


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4003/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3913: [CARBONDATA-3974] Improve partition purning performance in presto carbon integration

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-688859734


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2263/
   







[GitHub] [carbondata] akashrn5 commented on pull request #3905: [CARBONDATA-3964] Fixed, null pointer excption for select query and time zone dependent test failures.

2020-09-08 Thread GitBox


akashrn5 commented on pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#issuecomment-688866461


   LGTM







[GitHub] [carbondata] asfgit closed pull request #3905: [CARBONDATA-3964] Fixed, null pointer excption for select query and time zone dependent test failures.

2020-09-08 Thread GitBox


asfgit closed pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905


   







[jira] [Resolved] (CARBONDATA-3964) Select * from table or select count(*) without filter is throwing null pointer exception.

2020-09-08 Thread Akash R Nilugal (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akash R Nilugal resolved CARBONDATA-3964.
-
Fix Version/s: 2.1.0
   Resolution: Fixed

> Select * from table or select count(*) without filter is throwing null 
> pointer exception.
> -
>
> Key: CARBONDATA-3964
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3964
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> 1. Create a table.
> 2. Load around 500 segments and more than 1 million records.
> 3. Run select * or select count(*) without a filter; the query throws a null
> pointer exception.
> File: TableIndex.java
> Method: pruneWithMultiThread
> Line: 447
> Reason: filter.getresolver() is null.
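
For illustration only, a null guard of the kind the reason above points to could be sketched as follows. The fix actually merged for this issue is not shown in this thread; `PruneGuardSketch`, `Filter` and `Resolver` are hypothetical stand-ins, not the CarbonData classes, and the block list is simplified to strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the code in TableIndex.java.
final class PruneGuardSketch {
  interface Resolver {
    boolean matches(String block);
  }

  static final class Filter {
    private final Resolver resolver;

    Filter(Resolver resolver) {
      this.resolver = resolver;
    }

    Resolver getResolver() {
      return resolver;
    }
  }

  /** Returns the blocks kept after pruning; without a filter every block is kept. */
  static List<String> prune(List<String> blocks, Filter filter) {
    // Queries without a WHERE clause have no resolver, so skip pruning
    // instead of dereferencing null.
    if (filter == null || filter.getResolver() == null) {
      return new ArrayList<>(blocks);
    }
    List<String> kept = new ArrayList<>();
    for (String block : blocks) {
      if (filter.getResolver().matches(block)) {
        kept.add(block);
      }
    }
    return kept;
  }
}
```

The design choice in this sketch is to treat a missing resolver as "match everything", which corresponds to select * and select count(*) without a filter.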



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3910: [CARBONDATA-3969] Fix Deserialization issue with DataType class

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3910:
URL: https://github.com/apache/carbondata/pull/3910#issuecomment-65143


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2265/
   







[GitHub] [carbondata] kunal642 commented on a change in pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning

2020-09-08 Thread GitBox


kunal642 commented on a change in pull request #3908:
URL: https://github.com/apache/carbondata/pull/3908#discussion_r484941895



##
File path: 
integration/spark/src/main/scala/org/apache/spark/util/PartitionCacheManger.scala
##
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util
+
+import java.net.URI
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.log4j.Logger
+import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTablePartition}
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.cache.{Cache, Cacheable, CarbonLRUCache}
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.metadata.SegmentFileStore
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager
+import org.apache.carbondata.core.util.path.CarbonTablePath
+
+object PartitionCacheManager extends Cache[PartitionCacheKey, CacheablePartitionSpec] {
+
+  private val CACHE = new CarbonLRUCache(
+CarbonCommonConstants.CARBON_PARTITION_MAX_DRIVER_LRU_CACHE_SIZE,
+CarbonCommonConstants.CARBON_MAX_LRU_CACHE_SIZE_DEFAULT)
+
+  val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getName)
+
+  def get(identifier: PartitionCacheKey): CacheablePartitionSpec = {
+val cacheablePartitionSpec =
+  CACHE.get(identifier.tableId).asInstanceOf[CacheablePartitionSpec]
+val tableStatusModifiedTime = FileFactory
+  .getCarbonFile(CarbonTablePath.getTableStatusFilePath(identifier.tablePath))
+  .getLastModifiedTime
+if (cacheablePartitionSpec != null) {
+  if (tableStatusModifiedTime > cacheablePartitionSpec.timestamp) {
+readPartitions(identifier, tableStatusModifiedTime)
+  } else {
+cacheablePartitionSpec
+  }
+} else {
+  readPartitions(identifier, tableStatusModifiedTime)
+}
+  }
+
+  override def getAll(keys: util.List[PartitionCacheKey]):
+  util.List[CacheablePartitionSpec] = {
+keys.asScala.map(get).toList.asJava
+  }
+
+  override def getIfPresent(key: PartitionCacheKey): CacheablePartitionSpec = {
+CACHE.get(key.tableId).asInstanceOf[CacheablePartitionSpec]
+  }
+
+  override def invalidate(partitionCacheKey: PartitionCacheKey): Unit = {
+CACHE.remove(partitionCacheKey.tableId)
+  }
+
+  private def readPartitions(identifier: PartitionCacheKey, tableStatusModifiedTime: Long) = {

Review comment:
   @QiangCai each load or query now loads the segments that are already in
success state, so this should solve the problem you mentioned.
   









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3910: [CARBONDATA-3969] Fix Deserialization issue with DataType class

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3910:
URL: https://github.com/apache/carbondata/pull/3910#issuecomment-688898175


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4005/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3912: [WIP] Global sort partitions should be determined dynamically

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3912:
URL: https://github.com/apache/carbondata/pull/3912#issuecomment-688912501


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4004/
   







[GitHub] [carbondata] Indhumathi27 commented on pull request #3910: [CARBONDATA-3969] Fix Deserialization issue with DataType class

2020-09-08 Thread GitBox


Indhumathi27 commented on pull request #3910:
URL: https://github.com/apache/carbondata/pull/3910#issuecomment-688914579


   retest this please







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3912: [WIP] Global sort partitions should be determined dynamically

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3912:
URL: https://github.com/apache/carbondata/pull/3912#issuecomment-688914915


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2264/
   







[GitHub] [carbondata] akashrn5 opened a new pull request #3916: [CARBONDATA-3935]Support partition table transactional write in presto

2020-09-08 Thread GitBox


akashrn5 opened a new pull request #3916:
URL: https://github.com/apache/carbondata/pull/3916


### Why is this PR needed?
Currently, Presto can only read tables that were created in Spark.
This is a bottleneck: transactional writes are needed in Presto so that
tables can be both written and read through Presto.

### What changes were proposed in this PR?
This PR is on top of #3875.
This PR supports writing partitioned transactional data in Presto, including
tables with multiple partition columns.
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
 - Yes
   
   
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3910: [CARBONDATA-3969] Fix Deserialization issue with DataType class

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3910:
URL: https://github.com/apache/carbondata/pull/3910#issuecomment-688991758


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4006/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3910: [CARBONDATA-3969] Fix Deserialization issue with DataType class

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3910:
URL: https://github.com/apache/carbondata/pull/3910#issuecomment-688993910


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2266/
   







[GitHub] [carbondata] marchpure commented on pull request #3913: [CARBONDATA-3974] Improve partition purning performance in presto carbon integration

2020-09-08 Thread GitBox


marchpure commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-689003409


   I just tested with this PR. Querying a non-partitioned table returns an EMPTY RESULT.
Querying a partitioned table works well.







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-689021438


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4007/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-689024935


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2267/
   







[GitHub] [carbondata] Karan-c980 commented on a change in pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.

2020-09-08 Thread GitBox


Karan-c980 commented on a change in pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#discussion_r485093341



##
File path: README.md
##
@@ -100,3 +100,4 @@ To get involved in CarbonData:
 ## About
 Apache CarbonData is an open source project of The Apache Software Foundation (ASF).
 
+## PR

Review comment:
   Removed









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#issuecomment-689036860


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2268/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#issuecomment-689040369


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4008/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3787: [WIP][CARBONDATA-3923] support global sort for SI

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#issuecomment-689041516


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4009/
   







[jira] [Created] (CARBONDATA-3977) Global sort partitions should be determined dynamically

2020-09-08 Thread Mahesh Raju Somalaraju (Jira)
Mahesh Raju Somalaraju created CARBONDATA-3977:
--

 Summary: Global sort partitions should be determined dynamically
 Key: CARBONDATA-3977
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3977
 Project: CarbonData
  Issue Type: New Feature
Reporter: Mahesh Raju Somalaraju


Global sort:

If the user does not specify the number of partitions in the table properties
and has not configured the property "carbon.load.global.sort.partitions", the
number of partitions needs to be calculated dynamically based on the dataframe size:

 number of partitions = dataframeSizeInMB / partition size
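
As a rough sketch of the formula above: the issue only states the division, so rounding up and using at least one partition are assumptions made here, and `GlobalSortPartitionSketch` and its parameter names are hypothetical rather than part of CarbonData.

```java
// Hypothetical illustration only; not the implementation proposed in the PR.
final class GlobalSortPartitionSketch {
  /**
   * number of partitions = dataSizeInMB / partitionSizeInMB,
   * rounded up (assumption) and never less than 1 (assumption).
   */
  static int partitions(long dataSizeInMB, long partitionSizeInMB) {
    if (partitionSizeInMB <= 0) {
      throw new IllegalArgumentException("partition size must be positive");
    }
    // Ceiling division so that a partial partition's worth of data still gets a partition.
    long n = (dataSizeInMB + partitionSizeInMB - 1) / partitionSizeInMB;
    return (int) Math.max(1L, Math.min(n, (long) Integer.MAX_VALUE));
  }
}
```

For example, with these assumptions 1024 MB of data and a 256 MB partition size give 4 partitions, and 1000 MB also gives 4 because of the rounding up.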





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3916: [CARBONDATA-3935]Support partition table transactional write in presto

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3916:
URL: https://github.com/apache/carbondata/pull/3916#issuecomment-689046280


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2270/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3787: [WIP][CARBONDATA-3923] support global sort for SI

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#issuecomment-689047351


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2269/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3916: [CARBONDATA-3935]Support partition table transactional write in presto

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3916:
URL: https://github.com/apache/carbondata/pull/3916#issuecomment-689055659


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4010/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-689092982


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4011/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-689096325


   Build Success with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2272/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3912: [CARBONDATA-3977] Global sort partitions should be determined dynamically

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3912:
URL: https://github.com/apache/carbondata/pull/3912#issuecomment-689101595


   Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4012/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3909: [CARBONDATA-3972] Date/timestamp compatability between hive and carbon

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3909:
URL: https://github.com/apache/carbondata/pull/3909#issuecomment-689102959


   Build Failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4013/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3909: [CARBONDATA-3972] Date/timestamp compatability between hive and carbon

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3909:
URL: https://github.com/apache/carbondata/pull/3909#issuecomment-689103877


   Build Failed with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2273/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3912: [CARBONDATA-3977] Global sort partitions should be determined dynamically

2020-09-08 Thread GitBox


CarbonDataQA1 commented on pull request #3912:
URL: https://github.com/apache/carbondata/pull/3912#issuecomment-689105224


   Build Failed with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2274/
   







[GitHub] [carbondata] kunal642 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.

2020-09-08 Thread GitBox


kunal642 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-689111283


   LGTM







[GitHub] [carbondata] kunal642 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.

2020-09-08 Thread GitBox


kunal642 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-689111838


   @Karan-c980 Please rebase; don't use merge to pull in the new code.







[GitHub] [carbondata] kunal642 edited a comment on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.

2020-09-08 Thread GitBox


kunal642 edited a comment on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-689111838


   @Karan-c980 Please rebase; don't use merge to pull in the new code.
   There should be no merge commit.







[GitHub] [carbondata] marchpure removed a comment on pull request #3913: [CARBONDATA-3974] Improve partition purning performance in presto carbon integration

2020-09-08 Thread GitBox


marchpure removed a comment on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-689003409


   I just tested this PR. Querying a non-partitioned table returns an EMPTY RESULT;
   querying a partitioned table works well.







[GitHub] [carbondata] Karan-c980 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.

2020-09-08 Thread GitBox


Karan-c980 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-689330508


   done







[GitHub] [carbondata] Karan-c980 commented on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.

2020-09-08 Thread GitBox


Karan-c980 commented on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-689330741


   > @Karan-c980 Please rebase; don't use merge to pull in the new code.
   > There should be no merge commit.
   
   Done







[GitHub] [carbondata] Karan-c980 removed a comment on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.

2020-09-08 Thread GitBox


Karan-c980 removed a comment on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-689330508


   done







[GitHub] [carbondata] kunal642 removed a comment on pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.

2020-09-08 Thread GitBox


kunal642 removed a comment on pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#issuecomment-689111283


   LGTM







[GitHub] [carbondata] kunal642 commented on a change in pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.

2020-09-08 Thread GitBox


kunal642 commented on a change in pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#discussion_r485374393



##
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonIUD.java
##
@@ -0,0 +1,376 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.sdk.file;
+
+import java.io.File;
+import java.io.FilenameFilter;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.Field;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.LiteralExpression;
+import org.apache.carbondata.core.scan.expression.conditional.EqualToExpression;
+import org.apache.carbondata.core.scan.expression.logical.AndExpression;
+import org.apache.carbondata.core.scan.expression.logical.OrExpression;
+import org.apache.carbondata.hadoop.api.CarbonTableOutputFormat;
+import org.apache.carbondata.hadoop.internal.ObjectArrayWritable;
+
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapreduce.RecordWriter;
+
+public class CarbonIUD {
+
+  private final Map<String, Map<String, Set<String>>> filterColumnToValueMappingForDelete;
+  private final Map<String, Map<String, Set<String>>> filterColumnToValueMappingForUpdate;
+  private final Map<String, Map<String, String>> updateColumnToValueMapping;
+
+  private CarbonIUD() {
+filterColumnToValueMappingForDelete = new HashMap<>();
+filterColumnToValueMappingForUpdate = new HashMap<>();
+updateColumnToValueMapping = new HashMap<>();
+  }
+
+  /**
+   * @return CarbonIUD object
+   */
+  public static CarbonIUD getInstance() {
+return new CarbonIUD();
+  }
+
+  /**
+   * @param path   is the table path on which delete is performed
+   * @param column is the columnName on which records have to be deleted
+   * @param value  of column on which the records have to be deleted
+   * @return CarbonIUD object
+   */
+  public CarbonIUD delete(String path, String column, String value) {
+prepareDelete(path, column, value, filterColumnToValueMappingForDelete);
+return this;
+  }
+
+  /**
+   * This method deletes the rows at given path by applying the filterExpression
+   *
+   * @param path is the table path on which delete is performed
+   * @param filterExpression is the expression to delete the records
+   * @throws IOException
+   * @throws InterruptedException
+   */
+  public void delete(String path, Expression filterExpression)
+  throws IOException, InterruptedException {
+CarbonReader reader = CarbonReader.builder(path)
+.projection(new String[] { CarbonCommonConstants.CARBON_IMPLICIT_COLUMN_TUPLEID })
+.filter(filterExpression).build();
+
+RecordWriter<NullWritable, ObjectArrayWritable> deleteDeltaWriter =
+CarbonTableOutputFormat.getDeleteDeltaRecordWriter(path);
+ObjectArrayWritable writable = new ObjectArrayWritable();
+while (reader.hasNext()) {
+  Object[] row = (Object[]) reader.readNextRow();
+  writable.set(row);
+  deleteDeltaWriter.write(NullWritable.get(), writable);
+}
+deleteDeltaWriter.close(null);
+reader.close();
+  }
+
+  /**
+   * Calling this method will start the execution of delete process
+   *
+   * @throws IOException
+   * @throws InterruptedException
+   */
+  public void closeDelete() throws IOException, InterruptedException {
+for (Map.Entry<String, Map<String, Set<String>>> path : this.filterColumnToValueMappingForDelete
+.entrySet()) {
+  deleteExecution(path.getKey());
+}
+  }
+
+  /**
+   * @param path  is the table path on which update is performed
+   * @param column is the columnName on which records have to be updated
+   * @param value of column on which the records have to be updated
+   * @param updColumn is the name of updatedColumn
+   * @param updValue  is the value of updatedCo

[GitHub] [carbondata] kunal642 commented on a change in pull request #3834: [CARBONDATA-3865] Implementation of delete/update feature in carbondata SDK.

2020-09-08 Thread GitBox


kunal642 commented on a change in pull request #3834:
URL: https://github.com/apache/carbondata/pull/3834#discussion_r48538



##
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonIUD.java
##
@@ -0,0 +1,376 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.sdk.file;
+
+import java.io.File;
+import java.io.FilenameFilter;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.Field;
+import org.apache.carbondata.core.scan.expression.ColumnExpression;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.expression.LiteralExpression;
+import org.apache.carbondata.core.scan.expression.conditional.EqualToExpression;
+import org.apache.carbondata.core.scan.expression.logical.AndExpression;
+import org.apache.carbondata.core.scan.expression.logical.OrExpression;
+import org.apache.carbondata.hadoop.api.CarbonTableOutputFormat;
+import org.apache.carbondata.hadoop.internal.ObjectArrayWritable;
+
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapreduce.RecordWriter;
+
+public class CarbonIUD {
+
+  private final Map<String, Map<String, Set<String>>> filterColumnToValueMappingForDelete;
+  private final Map<String, Map<String, Set<String>>> filterColumnToValueMappingForUpdate;
+  private final Map<String, Map<String, String>> updateColumnToValueMapping;
+
+  private CarbonIUD() {
+filterColumnToValueMappingForDelete = new HashMap<>();
+filterColumnToValueMappingForUpdate = new HashMap<>();
+updateColumnToValueMapping = new HashMap<>();
+  }
+
+  /**
+   * @return CarbonIUD object
+   */
+  public static CarbonIUD getInstance() {

Review comment:
   Take a Hadoop Configuration object here so that IUD also works on S3.
   Pass the configuration internally to the writer and reader APIs.
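
   A minimal sketch of how this suggestion might look. It assumes a stand-in
   class in place of Hadoop's org.apache.hadoop.conf.Configuration so that the
   example compiles on its own. CarbonIUDSketch, SketchConfiguration and
   getConfiguration() are hypothetical names and are not part of the PR; in the
   real class the stored configuration would be handed to the reader and writer
   builders.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch only: a factory that accepts a configuration object.
 * All names here are hypothetical and not taken from the PR.
 */
public class CarbonIUDSketch {

  /** Stand-in for org.apache.hadoop.conf.Configuration (assumption, keeps the sketch dependency-free). */
  public static final class SketchConfiguration {
    private final Map<String, String> values = new HashMap<>();

    public void set(String key, String value) {
      values.put(key, value);
    }

    public String get(String key) {
      return values.get(key);
    }
  }

  // Stored once so that every internal read and write call can reuse it.
  private final SketchConfiguration configuration;

  private CarbonIUDSketch(SketchConfiguration configuration) {
    this.configuration = configuration;
  }

  /** Factory that takes the configuration instead of being parameterless. */
  public static CarbonIUDSketch getInstance(SketchConfiguration configuration) {
    if (configuration == null) {
      throw new IllegalArgumentException("configuration must not be null");
    }
    return new CarbonIUDSketch(configuration);
  }

  /** Internal reader and writer construction would read its settings from here. */
  public SketchConfiguration getConfiguration() {
    return configuration;
  }

  public static void main(String[] args) {
    SketchConfiguration conf = new SketchConfiguration();
    // fs.s3a.endpoint is a Hadoop S3A setting; the value is a placeholder.
    conf.set("fs.s3a.endpoint", "s3.example.com");
    CarbonIUDSketch iud = CarbonIUDSketch.getInstance(conf);
    System.out.println(iud.getConfiguration().get("fs.s3a.endpoint"));
  }
}
```

   Keeping the configuration as a final field means a single instance can serve
   several delete and update calls against the same store without each call
   needing its own configuration argument.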




