[GitHub] [carbondata] akashrn5 commented on a change in pull request #3905: [CARBONDATA-3964] Fixed null pointer excption for select * and select count(*) without filter.

2020-09-08 Thread GitBox


akashrn5 commented on a change in pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#discussion_r484752262



##
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##
@@ -318,48 +318,65 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA
   test("test load, update data with setlenient carbon property for daylight " +
        "saving time from different timezone") {
     CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE,
       "true")
-    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
     sql("DROP TABLE IF EXISTS test_time")
+    sql("DROP TABLE IF EXISTS testhivetable")
+    // Create test_time and hive table
     sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time Timestamp) STORED AS carbondata " +
         "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 'timestampformat'='yyyy-MM-dd HH:mm:ss') ")
-    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
-    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
-    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
-    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+    sql("CREATE TABLE testhivetable (ID Int, date Date, time TIMESTAMP) row format delimited fields terminated by ',' ")
+    // load data into test_time and hive table and validate query result
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time options('fileheader'='ID,date,time')")
+    sql(s"LOAD DATA local inpath '$resourcesPath/differentZoneTimeStamp.csv' overwrite INTO table testhivetable")
+    checkAnswer(sql("select * from test_time"), sql("select * from testhivetable"))
+    sql(s"insert into test_time select 11, '2016-7-24', '2019-3-10 02:00:00' ")
+    sql("update test_time set (time) = ('2019-3-10 02:00:00') where ID='2'")
+    // Using America/Los_Angeles timezone (timezone is fixed to America/Los_Angeles for all tests)
+    // Here, 2019-3-10 02:00:00 is invalid data in America/Los_Angeles zone, as DST is observed and
+    // clocks were turned forward 1 hour to 2019-3-10 03:00:00. With lenience property enabled, can parse the time according to DST.
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))
     sql("DROP TABLE test_time")
     CarbonProperties.getInstance().removeProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE)
   }
 
   test("test load, update data with setlenient session level property for daylight " +
        "saving time from different timezone") {
     sql("set carbon.load.dateformat.setlenient.enable = true")
-    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
     sql("DROP TABLE IF EXISTS test_time")
-    sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time Timestamp) STORED AS carbondata " +
+    sql("DROP TABLE IF EXISTS testhivetable")
+    // Create test_time and hive table
+    sql("CREATE TABLE test_time (ID Int, date Date, time Timestamp) STORED AS carbondata " +
         "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 'timestampformat'='yyyy-MM-dd HH:mm:ss') ")
-    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
-    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
-    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
-    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+    sql("CREATE TABLE testhivetable (ID Int, date Date, time TIMESTAMP) row format delimited fields terminated by ',' ")
+    // load data into test_time and hive table and validate query result
+    sql(s"LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time options('fileheader'='ID,date,time')")
+    sql(s"LOAD DATA local inpath '$resourcesPath/differentZoneTimeStamp.csv' overwrite INTO table testhivetable")
+    checkAnswer(sql("select * from

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3905: [CARBONDATA-3964] Fixed null pointer excption for select * and select count(*) without filter.

2020-09-07 Thread GitBox


akashrn5 commented on a change in pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#discussion_r484649115



##
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##
@@ -318,48 +318,53 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA
   test("test load, update data with setlenient carbon property for daylight " +
        "saving time from different timezone") {
     CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE,
       "true")
-    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
     sql("DROP TABLE IF EXISTS test_time")
     sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time Timestamp) STORED AS carbondata " +
         "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 'timestampformat'='yyyy-MM-dd HH:mm:ss') ")
     sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
-    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
-    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
-    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+    sql(s"insert into test_time select 11, '2016-7-24', '2019-3-10 02:00:00' ")
+    sql("update test_time set (time) = ('2019-3-10 02:00:00') where ID='2'")
+    // Using America/Los_Angeles timezone (timezone is fixed to America/Los_Angeles for all tests)
+    // Here, 2019-3-10 02:00:00 is invalid data in America/Los_Angeles zone, as DST is observed and
+    // clocks were turned forward 1 hour to 2019-3-10 03:00:00. With lenience property enabled, can parse the time according to DST.
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))

Review comment:
   I think this test case could also create a Hive table and compare the
results against it, so that the two stay in sync.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3905: [CARBONDATA-3964] Fixed null pointer excption for select * and select count(*) without filter.

2020-09-07 Thread GitBox


akashrn5 commented on a change in pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#discussion_r484521356



##
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##
@@ -318,48 +318,53 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA
   test("test load, update data with setlenient carbon property for daylight " +
        "saving time from different timezone") {
     CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE,
       "true")
-    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
     sql("DROP TABLE IF EXISTS test_time")
     sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time Timestamp) STORED AS carbondata " +
         "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 'timestampformat'='yyyy-MM-dd HH:mm:ss') ")
     sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
-    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
-    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
-    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+    sql(s"insert into test_time select 11, '2016-7-24', '2019-3-10 02:00:00' ")
+    sql("update test_time set (time) = ('2019-3-10 02:00:00') where ID='2'")
+    // Using America/Los_Angeles timezone (timezone is fixed to America/Los_Angeles for all tests)
+    // Here, 2019-3-10 02:00:00 is invalid data in America/Los_Angeles zone, as DST is observed and
+    // clocks were turned forward 1 hour to 2019-3-10 03:00:00. With lenience property enabled, can parse the time according to DST.
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))
     sql("DROP TABLE test_time")
     CarbonProperties.getInstance().removeProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE)
   }
 
   test("test load, update data with setlenient session level property for daylight " +
        "saving time from different timezone") {
     sql("set carbon.load.dateformat.setlenient.enable = true")
-    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
     sql("DROP TABLE IF EXISTS test_time")
     sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time Timestamp) STORED AS carbondata " +
         "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 'timestampformat'='yyyy-MM-dd HH:mm:ss') ")
     sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
-    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
-    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
-    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+    sql(s"insert into test_time select 11, '2016-7-24', '2019-3-10 02:00:00' ")
+    sql("update test_time set (time) = ('2019-3-10 02:00:00') where ID='2'")
+    // Using America/Los_Angeles timezone (timezone is fixed to America/Los_Angeles for all tests)
+    // Here, 2019-3-10 02:00:00 is invalid data in America/Los_Angeles zone, as DST is observed and
+    // clocks were turned forward 1 hour to 2019-3-10 03:00:00. With lenience property enabled, can parse the time according to DST.
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), Seq(Row(Timestamp.valueOf("2019-3-10 03:00:00"))))
     sql("DROP TABLE test_time")
     defaultConfig()
   }
 
   def generateCSVFile(): Unit = {
     val rows = new ListBuffer[Array[String]]
     rows += Array("ID", "date", "time")
-    rows += Array("1", "1941-3-15", "1941-3-15 00:00:00")
+    rows += Array("1", "1941-3-15", "2019-3-10 02:00:00")
     rows += Array("2", "2016-7-24", "2016-7-24 01:02:30")
     BadRecordUtil.createCSV(rows, csvPath)
   }
 
   override def afterAll {
     sql

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3905: [CARBONDATA-3964] Fixed null pointer excption for select * and select count(*) without filter.

2020-09-07 Thread GitBox


akashrn5 commented on a change in pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#discussion_r484445743



##
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##
@@ -318,48 +318,47 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA
   test("test load, update data with setlenient carbon property for daylight " +
        "saving time from different timezone") {
     CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE,
       "true")
-    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
     sql("DROP TABLE IF EXISTS test_time")
     sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time Timestamp) STORED AS carbondata " +
         "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 'timestampformat'='yyyy-MM-dd HH:mm:ss') ")
     sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
-    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
-    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
-    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
-    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+    sql(s"insert into test_time select 11, '2016-7-24', '2019-3-10 02:00:00' ")

Review comment:
   Please add a comment in both test cases noting which timezone is in use
and how DST applies, as a reference for other developers.
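
The DST gap that such a comment would describe can be reproduced with plain JDK classes. The sketch below is an illustration only, not CarbonData's loading code: the class `DstGapDemo` and the helper `normalize` are hypothetical names, and it assumes the lenient property behaves like `SimpleDateFormat.setLenient`. It parses a wall-clock time in a given zone and formats it back, returning null when strict parsing rejects the input.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class DstGapDemo {
    private static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // Hypothetical helper: parse `text` as a wall-clock time in `zoneId`,
    // then format it back in the same zone. Returns null if parsing fails.
    static String normalize(String text, String zoneId, boolean lenient) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN);
        format.setTimeZone(TimeZone.getTimeZone(zoneId));
        format.setLenient(lenient);
        try {
            return format.format(format.parse(text));
        } catch (ParseException e) {
            // Strict parsing rejects wall-clock times that fall inside a DST gap.
            return null;
        }
    }

    public static void main(String[] args) {
        String zone = "America/Los_Angeles";
        // 02:00:00 on 2019-03-10 does not exist in this zone: clocks jump to 03:00:00.
        System.out.println(normalize("2019-03-10 02:00:00", zone, true));
        System.out.println(normalize("2019-03-10 02:00:00", zone, false));
    }
}
```

With lenient parsing the nonexistent time is shifted forward past the gap, which matches the 03:00:00 value the tests assert; with strict parsing the same input is rejected.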









[GitHub] [carbondata] akashrn5 commented on a change in pull request #3905: [CARBONDATA-3964] Fixed null pointer excption for select * and select count(*) without filter.

2020-09-07 Thread GitBox


akashrn5 commented on a change in pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#discussion_r484445218



##
File path: core/src/main/java/org/apache/carbondata/core/index/TableIndex.java
##
@@ -153,7 +153,7 @@ public CarbonTable getTable() {
     int carbonDriverPruningMultiThreadEnableFilesCount =
         CarbonProperties.getDriverPruningMultiThreadEnableFilesCount();
     if (numOfThreadsForPruning == 1 || indexesCount < numOfThreadsForPruning || totalFiles
-        < carbonDriverPruningMultiThreadEnableFilesCount) {
+        < carbonDriverPruningMultiThreadEnableFilesCount || !isFilterPresent) {

Review comment:
   Please add a comment here explaining that when the query has no filter,
all blocklets must be returned, so multi-threaded pruning is unnecessary.
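
As a rough sketch of how the condition and the requested comment might read together, the snippet below isolates the check in a standalone method. The class `PruningModeSketch`, the method `pruneInSingleThread`, and the numeric values in `main` are hypothetical and chosen only for illustration; the actual `TableIndex` code keeps this logic inline.

```java
public class PruningModeSketch {

    // Hypothetical helper mirroring the condition in the diff above.
    static boolean pruneInSingleThread(int numOfThreadsForPruning, int indexesCount,
            int totalFiles, int multiThreadEnableFilesCount, boolean isFilterPresent) {
        // When the query has no filter, every blocklet has to be returned,
        // so there is nothing to gain from pruning in multiple threads.
        return numOfThreadsForPruning == 1
            || indexesCount < numOfThreadsForPruning
            || totalFiles < multiThreadEnableFilesCount
            || !isFilterPresent;
    }

    public static void main(String[] args) {
        // Illustrative values: 4 threads, 100 indexes, 200000 files, threshold 100000.
        System.out.println(pruneInSingleThread(4, 100, 200000, 100000, false)); // true
        System.out.println(pruneInSingleThread(4, 100, 200000, 100000, true));  // false
    }
}
```

With these inputs, only the presence of a filter decides the outcome: the unfiltered query takes the single-threaded path even though the thread, index, and file counts would otherwise allow multi-threaded pruning.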




