[jira] [Created] (CARBONDATA-4135) Insert into partition table should fail from the Presto side, but "insert into ... select *" is passing on a single-column partition table from the Presto side
Chetan Bhat created CARBONDATA-4135:
------------------------------------

             Summary: Insert into partition table should fail from the Presto side, but "insert into ... select *" is passing on a single-column partition table from the Presto side
                 Key: CARBONDATA-4135
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4135
             Project: CarbonData
          Issue Type: Bug
          Components: presto-integration
    Affects Versions: 2.1.0
         Environment: Spark 2.4.5, Presto 316
            Reporter: Chetan Bhat

Presto 316 version used.

*Steps:*

From Spark beeline, execute the queries:

0: jdbc:hive2://10.20.254.208:23040/default> drop table uniqdata_Partition_single;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.454 seconds)

0: jdbc:hive2://10.20.254.208:23040/default> CREATE TABLE uniqdata_Partition_single (CUST_ID int, CUST_NAME String, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,36), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) partitioned by (ACTIVE_EMUI_VERSION string) stored as carbondata tblproperties('COLUMN_META_CACHE'='CUST_ID,CUST_NAME,DECIMAL_COLUMN2,DOJ,Double_COLUMN2,BIGINT_COLUMN2','local_dictionary_enable'='true','local_dictionary_threshold'='1000','local_dictionary_include'='ACTIVE_EMUI_VERSION');
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.202 seconds)

0: jdbc:hive2://10.20.254.208:23040/default> LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData_partition.csv' into table uniqdata_Partition_single OPTIONS('FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE');
+---------+
| Result  |
+---------+
+---------+
No rows selected (3.471 seconds)

From the Presto CLI, the query is executed:

presto:ranjan> insert into uniqdata_Partition_single select * from uniqdata_Partition_single;

*Issue: insert into the partition table should fail from the Presto side, but "insert into ... select *" is passing on a single-column partition table from the Presto side.*

presto:ranjan> insert into uniqdata_Partition_single select * from uniqdata_Partition_single;
INSERT: 2002 rows

Query 20210223_044320_0_ggkxh, FINISHED, 1 node
Splits: 45 total, 45 done (100.00%)
0:05 [2K rows, 206KB] [431 rows/s, 44.4KB/s]

presto:ranjan> desc uniqdata_Partition_single;
       Column        |      Type      |     Extra     | Comment
---------------------+----------------+---------------+---------
 cust_id             | integer        |               |
 cust_name           | varchar        |               |
 dob                 | timestamp      |               |
 doj                 | timestamp      |               |
 bigint_column1      | bigint         |               |
 bigint_column2      | bigint         |               |
 decimal_column1     | decimal(30,10) |               |
 decimal_column2     | decimal(36,36) |               |
 double_column1      | double         |               |
 double_column2      | double         |               |
 integer_column1     | integer        |               |
 active_emui_version | varchar        | partition key |
(12 rows)

Query 20210223_044344_1_ggkxh, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:00 [12 rows, 1.07KB] [50 rows/s, 4.53KB/s]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4097: [WIP] Refactor CarbonLateDecodeStrategy and CarbonDataSourceScan
CarbonDataQA2 commented on pull request #4097:
URL: https://github.com/apache/carbondata/pull/4097#issuecomment-783406803

Build Failed with Spark 2.4.5, please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3738/

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4097: [WIP] Refactor CarbonLateDecodeStrategy and CarbonDataSourceScan
CarbonDataQA2 commented on pull request #4097:
URL: https://github.com/apache/carbondata/pull/4097#issuecomment-783401216

Build Failed with Spark 2.3.4, please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5503/
[GitHub] [carbondata] QiangCai opened a new pull request #4097: [WIP] Refactor CarbonLateDecodeStrategy and CarbonDataSourceScan
QiangCai opened a new pull request #4097:
URL: https://github.com/apache/carbondata/pull/4097

### Why is this PR needed?
1. In Spark 3, org.apache.spark.sql.sources.Filter is sealed, so it can no longer be extended from Carbon code.
2. The name of the CarbonLateDecodeStrategy class no longer matches what it does.
3. CarbonDataSourceScan should be the same for Spark 2.3 and 2.4.

### What changes were proposed in this PR?
1. Translate Spark Expression to Carbon Expression directly, skipping the Spark Filter step, and remove all uses of Spark Filter from the Carbon code.
2. Split CarbonLateDecodeStrategy into CarbonSourceStrategy and DMLStrategy, and simplify the code of CarbonSourceStrategy.
3. Move CarbonDataSourceScan back to the source folder.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No
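The direct Expression-to-Expression translation described in change 1 above might be sketched as follows. This is a simplified, hypothetical illustration: `SparkExpr`, `CarbonExpr`, and `ExpressionTranslator` are illustrative stand-ins, not the actual Catalyst or CarbonData classes.

```scala
// Hypothetical sketch: map a Spark-style expression tree directly to a
// Carbon-style expression tree, with no intermediate Filter representation.
// All type and object names here are illustrative only.
sealed trait SparkExpr
case class SparkAttr(name: String) extends SparkExpr
case class SparkLiteral(value: Any) extends SparkExpr
case class SparkEqualTo(left: SparkExpr, right: SparkExpr) extends SparkExpr
case class SparkAnd(left: SparkExpr, right: SparkExpr) extends SparkExpr

sealed trait CarbonExpr
case class CarbonColumn(name: String) extends CarbonExpr
case class CarbonLiteral(value: Any) extends CarbonExpr
case class CarbonEquals(left: CarbonExpr, right: CarbonExpr) extends CarbonExpr
case class CarbonAnd(left: CarbonExpr, right: CarbonExpr) extends CarbonExpr

object ExpressionTranslator {
  // Recursively map each Spark node to its Carbon counterpart.
  // An unsupported subtree would yield None so the caller could fall back
  // to evaluating that predicate on the Spark side.
  def translate(expr: SparkExpr): Option[CarbonExpr] = expr match {
    case SparkAttr(n)    => Some(CarbonColumn(n))
    case SparkLiteral(v) => Some(CarbonLiteral(v))
    case SparkEqualTo(l, r) =>
      for (cl <- translate(l); cr <- translate(r)) yield CarbonEquals(cl, cr)
    case SparkAnd(l, r) =>
      for (cl <- translate(l); cr <- translate(r)) yield CarbonAnd(cl, cr)
  }
}
```

With this shape, no `org.apache.spark.sql.sources.Filter` instance is ever constructed, which sidesteps the sealed-class problem in Spark 3; supporting a new expression type only means adding a case to `translate`.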
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-783297309

Build Failed with Spark 2.4.5, please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3737/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-783296467

Build Success with Spark 2.3.4, please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5502/
[GitHub] [carbondata] VenuReddy2103 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
VenuReddy2103 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-783290929

LGTM
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
ajantha-bhat commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r580151473

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala

## @@ -41,6 +43,26 @@ case class CarbonCleanFilesCommand(
   extends DataCommand {

   val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName)

+  val isDryRun: Boolean = options.getOrElse("dryrun", "false").toBoolean
+  var showStats: Boolean = options.getOrElse("statistics", "true").toBoolean
+  if (isInternalCleanCall) {
+    showStats = false
+  }
+
+  override def output: Seq[AttributeReference] = {
+    if (isDryRun) {
+      // dry run operation
+      Seq(
+        AttributeReference("Size Freed", StringType, nullable = false)(),

Review comment:
For dry run:
a) We don't actually free up the space, so change the column name to `Size to be freed`.
b) "Trash data remaining" is only the data inside the trash folder, right? Change it to `Trash folder size`.

For clean files, why are we not printing the trash size after cleaning? @vikramahuja1001, @akashrn5
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r580105024

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala

## @@ -41,6 +43,26 @@ case class CarbonCleanFilesCommand(
   extends DataCommand {

   val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName)

+  val isDryRun: Boolean = options.getOrElse("dryrun", "false").toBoolean
+  var showStats: Boolean = options.getOrElse("statistics", "true").toBoolean
+  if (isInternalCleanCall) {
+    showStats = false
+  }
+
+  override def output: Seq[AttributeReference] = {
+    if (isDryRun) {
+      // dry run operation
+      Seq(
+        AttributeReference("Size Freed", StringType, nullable = false)(),

Review comment:
@ajantha-bhat

Clean files with stats:

+----------+
|Size Freed|
+----------+
|      7 KB|
+----------+

Dry Run:

+----------+--------------------+
|Size Freed|Trash Data Remaining|
+----------+--------------------+
|      7 KB|              0 Byte|
+----------+--------------------+
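The sizes in the tables above ("7 KB", "0 Byte") are human-readable strings rather than raw byte counts. A minimal sketch of a formatter producing that style of output, assuming a simple base-1024 scale (illustrative only, not CarbonData's actual utility):

```scala
// Hypothetical sketch of a byte-count formatter producing strings like
// "0 Byte", "7 KB", "1.5 MB" for clean-files statistics output.
// Illustrative only; not CarbonData's actual implementation.
object SizeFormatter {
  private val units = Seq("Byte", "KB", "MB", "GB", "TB")

  def format(bytes: Long): String = {
    if (bytes <= 0) return "0 Byte"
    var size = bytes.toDouble
    var i = 0
    // Scale down by 1024 until the value fits the current unit.
    while (size >= 1024 && i < units.length - 1) {
      size /= 1024
      i += 1
    }
    // Drop the fractional part when it is zero (e.g. "7 KB", not "7.0 KB").
    if (size == size.floor) f"${size.toLong} ${units(i)}"
    else f"$size%.1f ${units(i)}"
  }
}
```

Values below 1 KB keep the `Byte` unit, matching the `0 Byte` cell in the dry-run table above.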
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
ajantha-bhat commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r580086955

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala

## @@ -41,6 +43,26 @@ case class CarbonCleanFilesCommand(
   extends DataCommand {

   val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName)

+  val isDryRun: Boolean = options.getOrElse("dryrun", "false").toBoolean
+  var showStats: Boolean = options.getOrElse("statistics", "true").toBoolean
+  if (isInternalCleanCall) {
+    showStats = false
+  }
+
+  override def output: Seq[AttributeReference] = {
+    if (isDryRun) {
+      // dry run operation
+      Seq(
+        AttributeReference("Size Freed", StringType, nullable = false)(),

Review comment:
@vikramahuja1001: can you paste here one output of clean files and dry run now?
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-783213957

Build Failed with Spark 2.4.5, please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3736/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-783213702

Build Success with Spark 2.3.4, please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5501/
[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
vikramahuja1001 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r580051926

## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/cleanfiles/TestCleanFileCommand.scala

## @@ -466,6 +485,39 @@ class TestCleanFileCommand extends QueryTest with BeforeAndAfterAll {
       CarbonCommonConstants.CARBON_CLEAN_FILES_FORCE_ALLOWED_DEFAULT)
   }

+  test("Test clean files after delete command") {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_CLEAN_FILES_FORCE_ALLOWED, "true")
+    sql("drop table if exists cleantest")
+    sql(
+      """
+        | CREATE TABLE cleantest (empname String, designation String, doj Timestamp,
+        | workgroupcategory int, workgroupcategoryname String, deptno int, deptname String,
+        | projectcode int, projectjoindate Timestamp, projectenddate Date, attendance int,
+        | utilization int, salary int, empno int)
+        | STORED AS carbondata
+      """.stripMargin)
+    sql(
+      s"""LOAD DATA local inpath '$resourcesPath/data.csv' INTO TABLE cleantest OPTIONS
+         |('DELIMITER'= ',', 'QUOTECHAR'= '"')""".stripMargin)
+    val table = CarbonEnv.getCarbonTable(None, "cleantest")(sqlContext.sparkSession)
+    sql("delete from cleantest where deptno='10'")
+    sql(s"""Delete from table cleantest where segment.id in(0)""")
+
+    var dryRun = sql(s"CLEAN FILES FOR TABLE cleantest OPTIONS('dryrun'='true')").collect()
+    var cleanFiles = sql(s"CLEAN FILES FOR TABLE cleantest").collect()
+    assert(cleanFiles(0).get(0) == dryRun(0).get(0))
+    dryRun = sql(s"CLEAN FILES FOR TABLE cleantest OPTIONS('dryrun'='true','force'='true')")
+      .collect()
+    cleanFiles = sql(s"CLEAN FILES FOR TABLE cleantest OPTIONS('force'='true')").collect()
+    assert(cleanFiles(0).get(0) == dryRun(0).get(0))

Review comment:
Done, added.