[jira] [Created] (CARBONDATA-4135) Insert into a partition table should fail from the Presto side, but insert into select * is passing on a single-column partition table from the Presto side

2021-02-22 Thread Chetan Bhat (Jira)
Chetan Bhat created CARBONDATA-4135:
---

 Summary: Insert into a partition table should fail from the Presto side, but 
insert into select * is passing on a single-column partition table from the 
Presto side
 Key: CARBONDATA-4135
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4135
 Project: CarbonData
  Issue Type: Bug
  Components: presto-integration
Affects Versions: 2.1.0
 Environment: Spark 2.4.5, Presto 316
Reporter: Chetan Bhat


Presto version 316 is used.

*Steps:*

From Spark beeline, execute the queries:

0: jdbc:hive2://10.20.254.208:23040/default> drop table uniqdata_Partition_single;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.454 seconds)
0: jdbc:hive2://10.20.254.208:23040/default> CREATE TABLE uniqdata_Partition_single (CUST_ID int,CUST_NAME String, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,36),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) partitioned by (ACTIVE_EMUI_VERSION string) stored as carbondata tblproperties('COLUMN_META_CACHE'='CUST_ID,CUST_NAME,DECIMAL_COLUMN2,DOJ,Double_COLUMN2,BIGINT_COLUMN2','local_dictionary_enable'='true','local_dictionary_threshold'='1000','local_dictionary_include'='ACTIVE_EMUI_VERSION');
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.202 seconds)
0: jdbc:hive2://10.20.254.208:23040/default> LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData_partition.csv' into table uniqdata_Partition_single OPTIONS('FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE');
+---------+
| Result  |
+---------+
+---------+
No rows selected (3.471 seconds)
0: jdbc:hive2://10.20.254.208:23040/default>

 

From the Presto CLI, the query is executed:

presto:ranjan> insert into uniqdata_Partition_single select * from uniqdata_Partition_single;

 

*Issue: Insert into a partition table should fail from the Presto side, but insert into select * is passing on a single-column partition table from the Presto side.*

presto:ranjan> insert into uniqdata_Partition_single select * from uniqdata_Partition_single;
INSERT: 2002 rows

Query 20210223_044320_0_ggkxh, FINISHED, 1 node
Splits: 45 total, 45 done (100.00%)
0:05 [2K rows, 206KB] [431 rows/s, 44.4KB/s]

presto:ranjan> desc uniqdata_Partition_single;
        Column        |      Type      |     Extra     | Comment
----------------------+----------------+---------------+---------
 cust_id              | integer        |               |
 cust_name            | varchar        |               |
 dob                  | timestamp      |               |
 doj                  | timestamp      |               |
 bigint_column1       | bigint         |               |
 bigint_column2       | bigint         |               |
 decimal_column1      | decimal(30,10) |               |
 decimal_column2      | decimal(36,36) |               |
 double_column1       | double         |               |
 double_column2       | double         |               |
 integer_column1      | integer        |               |
 active_emui_version  | varchar        | partition key |
(12 rows)

Query 20210223_044344_1_ggkxh, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:00 [12 rows, 1.07KB] [50 rows/s, 4.53KB/s]

presto:ranjan>
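
A hypothetical follow-up check (not part of the original report): re-count the rows from the Spark side to see whether the Presto-side insert really landed, for example from spark-shell:

// Hypothetical verification from spark-shell. If the Presto-side insert of 2002 rows
// really succeeded, the row count should now be roughly double what the initial CSV
// load produced, and the partitions should still be keyed on ACTIVE_EMUI_VERSION.
spark.sql("SELECT count(*) FROM uniqdata_Partition_single").show()
spark.sql("SHOW PARTITIONS uniqdata_Partition_single").show(false)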



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4097: [WIP] Refactor CarbonLateDecodeStrategy and CarbonDataSourceScan

2021-02-22 Thread GitBox


CarbonDataQA2 commented on pull request #4097:
URL: https://github.com/apache/carbondata/pull/4097#issuecomment-783406803


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3738/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4097: [WIP] Refactor CarbonLateDecodeStrategy and CarbonDataSourceScan

2021-02-22 Thread GitBox


CarbonDataQA2 commented on pull request #4097:
URL: https://github.com/apache/carbondata/pull/4097#issuecomment-783401216


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5503/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai opened a new pull request #4097: [WIP] Refactor CarbonLateDecodeStrategy and CarbonDataSourceScan

2021-02-22 Thread GitBox


QiangCai opened a new pull request #4097:
URL: https://github.com/apache/carbondata/pull/4097


### Why is this PR needed?
1. In Spark version 3, org.apache.spark.sql.sources.Filter is sealed.
2. The CarbonLateDecodeStrategy class's name is incorrect.
3. CarbonDataSourceScan should be the same for Spark 2.3 and 2.4.
   
### What changes were proposed in this PR?
   1. Translate Spark Expression to Carbon Expression directly, skipping the Spark Filter step, and remove all use of Spark Filter in Carbon code (see the sketch after this description).
   2. Separate CarbonLateDecodeStrategy into CarbonSourceStrategy and DMLStrategy, and simplify the code of CarbonSourceStrategy.
   3. Move CarbonDataSourceScan back to the source folder.
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
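
A minimal sketch of point 1 under "What changes were proposed" (illustrative only: CarbonExpr and its case classes below are placeholders, not the real CarbonData expression classes): translate Spark Catalyst expressions straight into a Carbon-side expression tree instead of first lowering them to org.apache.spark.sql.sources.Filter, which is sealed in Spark 3.

import org.apache.spark.sql.catalyst.expressions.{And, Attribute, EqualTo, Expression, GreaterThan, Literal}

// Placeholder Carbon-side expression tree, for illustration only.
sealed trait CarbonExpr
case class CarbonEqualTo(column: String, value: Any) extends CarbonExpr
case class CarbonGreaterThan(column: String, value: Any) extends CarbonExpr
case class CarbonAnd(left: CarbonExpr, right: CarbonExpr) extends CarbonExpr

// Returns None when an expression cannot be pushed down; such expressions would
// simply remain in the Spark plan as an ordinary Filter node.
def translate(expr: Expression): Option[CarbonExpr] = expr match {
  case EqualTo(a: Attribute, Literal(v, _)) => Some(CarbonEqualTo(a.name, v))
  case GreaterThan(a: Attribute, Literal(v, _)) => Some(CarbonGreaterThan(a.name, v))
  case And(left, right) =>
    for (l <- translate(left); r <- translate(right)) yield CarbonAnd(l, r)
  case _ => None
}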
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-22 Thread GitBox


CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-783297309


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3737/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-22 Thread GitBox


CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-783296467


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5502/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-22 Thread GitBox


VenuReddy2103 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-783290929


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-22 Thread GitBox


ajantha-bhat commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r580151473



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala
##
@@ -41,6 +43,26 @@ case class CarbonCleanFilesCommand(
   extends DataCommand {
 
   val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+  val isDryRun: Boolean = options.getOrElse("dryrun", "false").toBoolean
+  var showStats: Boolean = options.getOrElse("statistics", "true").toBoolean
+  if (isInternalCleanCall) {
+    showStats = false
+  }
+
+  override def output: Seq[AttributeReference] = {
+    if (isDryRun) {
+      // dry run operation
+      Seq(
+        AttributeReference("Size Freed", StringType, nullable = false)(),

Review comment:
   For dry run,
   a) we don't actually free up the space, so change it to 'Size to be freed'.
   b) And the trash data remaining is only the data inside the trash, right? Change it to 'Trash folder size'.
   
   For clean files, why are we not printing the trash size after cleaning? @vikramahuja1001, @akashrn5
   
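   If the renames suggested above were applied, the dry-run output schema might look like the
   sketch below (hedged: the column names here are only the reviewer's suggestions, not what
   the PR currently emits; built the same way as the AttributeReference calls in the diff).
   
   import org.apache.spark.sql.catalyst.expressions.AttributeReference
   import org.apache.spark.sql.types.StringType
   
   // Possible dry-run output schema using the suggested column names.
   def dryRunOutput: Seq[AttributeReference] = Seq(
     AttributeReference("Size to be freed", StringType, nullable = false)(),
     AttributeReference("Trash folder size", StringType, nullable = false)())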






This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-22 Thread GitBox


vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r580105024



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala
##
@@ -41,6 +43,26 @@ case class CarbonCleanFilesCommand(
   extends DataCommand {
 
   val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+  val isDryRun: Boolean = options.getOrElse("dryrun", "false").toBoolean
+  var showStats: Boolean = options.getOrElse("statistics", "true").toBoolean
+  if (isInternalCleanCall) {
+    showStats = false
+  }
+
+  override def output: Seq[AttributeReference] = {
+    if (isDryRun) {
+      // dry run operation
+      Seq(
+        AttributeReference("Size Freed", StringType, nullable = false)(),

Review comment:
   @ajantha-bhat
   Clean files with stats:
   +----------+
   |Size Freed|
   +----------+
   |      7 KB|
   +----------+
   
   Dry Run:
   
   +----------+--------------------+
   |Size Freed|Trash Data Remaining|
   +----------+--------------------+
   |      7 KB|              0 Byte|
   +----------+--------------------+
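   
   For reference, a sketch of how these outputs could be produced from a Spark session,
   using the 'dryrun'/'force' option names that appear in this PR's test code (hedged
   example; the table name cleantest is the one used in the tests):
   
   // Clean files with stats (the default), then a dry run, via spark.sql from a session.
   spark.sql("CLEAN FILES FOR TABLE cleantest").show(false)
   spark.sql("CLEAN FILES FOR TABLE cleantest OPTIONS('dryrun'='true')").show(false)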





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-22 Thread GitBox


ajantha-bhat commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r580086955



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala
##
@@ -41,6 +43,26 @@ case class CarbonCleanFilesCommand(
   extends DataCommand {
 
   val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+  val isDryRun: Boolean = options.getOrElse("dryrun", "false").toBoolean
+  var showStats: Boolean = options.getOrElse("statistics", "true").toBoolean
+  if (isInternalCleanCall) {
+    showStats = false
+  }
+
+  override def output: Seq[AttributeReference] = {
+    if (isDryRun) {
+      // dry run operation
+      Seq(
+        AttributeReference("Size Freed", StringType, nullable = false)(),

Review comment:
   @vikramahuja1001: can you paste here one output of clean files and dry run now?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-22 Thread GitBox


CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-783213957


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3736/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-22 Thread GitBox


CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-783213702


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5501/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-22 Thread GitBox


vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r580051926



##
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/cleanfiles/TestCleanFileCommand.scala
##
@@ -466,6 +485,39 @@ class TestCleanFileCommand extends QueryTest with BeforeAndAfterAll {
     CarbonCommonConstants.CARBON_CLEAN_FILES_FORCE_ALLOWED_DEFAULT)
   }
 
+  test("Test clean files after delete command") {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_CLEAN_FILES_FORCE_ALLOWED, "true")
+    sql("drop table if exists cleantest")
+    sql(
+      """
+        | CREATE TABLE cleantest (empname String, designation String, doj Timestamp,
+        |  workgroupcategory int, workgroupcategoryname String, deptno int, deptname String,
+        |  projectcode int, projectjoindate Timestamp, projectenddate Date, attendance int,
+        |  utilization int, salary int, empno int)
+        | STORED AS carbondata
+      """.stripMargin)
+    sql(
+      s"""LOAD DATA local inpath '$resourcesPath/data.csv' INTO TABLE cleantest OPTIONS
+         |('DELIMITER'= ',', 'QUOTECHAR'= '"')""".stripMargin)
+    val table = CarbonEnv.getCarbonTable(None, "cleantest")(sqlContext.sparkSession)
+    sql("delete from cleantest where deptno='10'")
+    sql(s"""Delete from table cleantest where segment.id in(0)""")
+
+    var dryRun = sql(s"CLEAN FILES FOR TABLE cleantest OPTIONS('dryrun'='true')").collect()
+    var cleanFiles = sql(s"CLEAN FILES FOR TABLE cleantest").collect()
+    assert(cleanFiles(0).get(0) == dryRun(0).get(0))
+    dryRun = sql(s"CLEAN FILES FOR TABLE cleantest OPTIONS('dryrun'='true','force'='true')")
+      .collect()
+    cleanFiles = sql(s"CLEAN FILES FOR TABLE cleantest OPTIONS('force'='true')").collect()
+    assert(cleanFiles(0).get(0) == dryRun(0).get(0))

Review comment:
   Done, added.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org