[GitHub] [carbondata] Indhumathi27 closed pull request #4094: [TEST] test
Indhumathi27 closed pull request #4094: URL: https://github.com/apache/carbondata/pull/4094 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (CARBONDATA-3962) Empty fact dirs are present in case of flat folder, which are unnecessary
[ https://issues.apache.org/jira/browse/CARBONDATA-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286278#comment-17286278 ]

Indhumathi Muthu Murugesh edited comment on CARBONDATA-3962 at 2/18/21, 5:16 AM:
-
Reverted PR-3904. [Refer https://github.com/apache/carbondata/pull/4095|https://github.com/apache/carbondata/pull/4095]

was (Author: indhumuthumurugesh):
[Refer https://github.com/apache/carbondata/pull/4095|https://github.com/apache/carbondata/pull/4095]

> Empty fact dirs are present in case of flat folder, which are unnecessary
> -
>
> Key: CARBONDATA-3962
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3962
> Project: CarbonData
> Issue Type: Bug
> Reporter: Akash R Nilugal
> Assignee: Akash R Nilugal
> Priority: Minor
> Fix For: 2.1.0
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Empty fact dirs are present in case of flat folder, which are unnecessary

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CARBONDATA-3962) Empty fact dirs are present in case of flat folder, which are unnecessary
[ https://issues.apache.org/jira/browse/CARBONDATA-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286278#comment-17286278 ]

Indhumathi Muthu Murugesh commented on CARBONDATA-3962:
---
https://github.com/apache/carbondata/pull/4095

> Empty fact dirs are present in case of flat folder, which are unnecessary
> -
>
> Key: CARBONDATA-3962
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3962
> Project: CarbonData
> Issue Type: Bug
> Reporter: Akash R Nilugal
> Assignee: Akash R Nilugal
> Priority: Minor
> Fix For: 2.1.0
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Empty fact dirs are present in case of flat folder, which are unnecessary
[jira] [Comment Edited] (CARBONDATA-3962) Empty fact dirs are present in case of flat folder, which are unnecessary
[ https://issues.apache.org/jira/browse/CARBONDATA-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286278#comment-17286278 ]

Indhumathi Muthu Murugesh edited comment on CARBONDATA-3962 at 2/18/21, 5:15 AM:
-
[Refer https://github.com/apache/carbondata/pull/4095|https://github.com/apache/carbondata/pull/4095]

was (Author: indhumuthumurugesh):
https://github.com/apache/carbondata/pull/4095

> Empty fact dirs are present in case of flat folder, which are unnecessary
> -
>
> Key: CARBONDATA-3962
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3962
> Project: CarbonData
> Issue Type: Bug
> Reporter: Akash R Nilugal
> Assignee: Akash R Nilugal
> Priority: Minor
> Fix For: 2.1.0
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Empty fact dirs are present in case of flat folder, which are unnecessary
[GitHub] [carbondata] nihal0107 commented on pull request #4095: [CARBONDATA-3962] Fixed concurrent load failure with flat folder structure.
nihal0107 commented on pull request #4095: URL: https://github.com/apache/carbondata/pull/4095#issuecomment-781060145 merged to master.
[GitHub] [carbondata] nihal0107 closed pull request #4095: [CARBONDATA-3962] Fixed concurrent load failure with flat folder structure.
nihal0107 closed pull request #4095: URL: https://github.com/apache/carbondata/pull/4095
[GitHub] [carbondata] Indhumathi27 commented on pull request #4095: [CARBONDATA-3962] Fixed concurrent load failure with flat folder structure.
Indhumathi27 commented on pull request #4095: URL: https://github.com/apache/carbondata/pull/4095#issuecomment-781054526 LGTM
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
CarbonDataQA2 commented on pull request #4072: URL: https://github.com/apache/carbondata/pull/4072#issuecomment-780826470 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3726/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
CarbonDataQA2 commented on pull request #4072: URL: https://github.com/apache/carbondata/pull/4072#issuecomment-780823505 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5489/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4088: [CARBONDATA-4121] Prepriming is not working in Index Server.
CarbonDataQA2 commented on pull request #4088: URL: https://github.com/apache/carbondata/pull/4088#issuecomment-780817533 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3725/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4088: [CARBONDATA-4121] Prepriming is not working in Index Server.
CarbonDataQA2 commented on pull request #4088: URL: https://github.com/apache/carbondata/pull/4088#issuecomment-780816628 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5488/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4095: [CARBONDATA-3962] Fixed concurrent load failure with flat folder structure.
CarbonDataQA2 commented on pull request #4095: URL: https://github.com/apache/carbondata/pull/4095#issuecomment-780786063 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3722/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4095: [CARBONDATA-3962] Fixed concurrent load failure with flat folder structure.
CarbonDataQA2 commented on pull request #4095: URL: https://github.com/apache/carbondata/pull/4095#issuecomment-780780682 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5485/
[GitHub] [carbondata] Karan980 commented on a change in pull request #4088: [CARBONDATA-4121] Prepriming is not working in Index Server.
Karan980 commented on a change in pull request #4088:
URL: https://github.com/apache/carbondata/pull/4088#discussion_r577845412

## File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
@@ -128,6 +128,10 @@ object IndexServer extends ServerInterface {
   def getCount(request: IndexInputFormat): LongWritable = {
     doAs {
       val sparkSession = SparkSQLUtil.getSparkSession
+      var currentUser: String = null
+      if (!request.isFallbackJob && Server.getRemoteUser != null) {
+        currentUser = Server.getRemoteUser.getShortUserName

Review comment:
Null check for Server.getRemoteUser removed
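The hunk above guards a possibly-null remote user before reading its short name. As a minimal sketch only (not the code in the PR), the same guard can be written with `Option`. `RemoteUser` and `resolveCurrentUser` are hypothetical names introduced for illustration and stand in for whatever `Server.getRemoteUser` returns.

```scala
// Hypothetical stand-in for the caller identity object; this is NOT the Hadoop API.
final case class RemoteUser(shortUserName: String)

object CurrentUserSketch {
  // remoteUser may be null, mirroring a Java-style getter.
  // Returns None for fallback jobs or when no remote user is available.
  def resolveCurrentUser(isFallbackJob: Boolean, remoteUser: RemoteUser): Option[String] =
    if (isFallbackJob) None
    else Option(remoteUser).map(_.shortUserName) // Option(null) is None
}
```

Returning `Option[String]` instead of a nullable `var` pushes the "no user" case onto the caller explicitly; whether that fits the surrounding IndexServer code is an assumption.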
[GitHub] [carbondata] kunal642 removed a comment on pull request #4093: [CARBONDATA-4126] Concurrent compaction failed with load on table.
kunal642 removed a comment on pull request #4093: URL: https://github.com/apache/carbondata/pull/4093#issuecomment-780735856 LGTM
[GitHub] [carbondata] kunal642 commented on pull request #4093: [CARBONDATA-4126] Concurrent compaction failed with load on table.
kunal642 commented on pull request #4093: URL: https://github.com/apache/carbondata/pull/4093#issuecomment-780735856 LGTM
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4093: [CARBONDATA-4126] Concurrent compaction failed with load on table.
CarbonDataQA2 commented on pull request #4093: URL: https://github.com/apache/carbondata/pull/4093#issuecomment-780735611 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5484/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4093: [CARBONDATA-4126] Concurrent compaction failed with load on table.
CarbonDataQA2 commented on pull request #4093: URL: https://github.com/apache/carbondata/pull/4093#issuecomment-780730471 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3720/
[GitHub] [carbondata] nihal0107 commented on pull request #4095: [CARBONDATA-3962] Fixed concurrent load failure with flat folder structure.
nihal0107 commented on pull request #4095: URL: https://github.com/apache/carbondata/pull/4095#issuecomment-780721152 Retest this please
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4095: [CARBONDATA-3962] Fixed concurrent load failure with flat folder structure.
CarbonDataQA2 commented on pull request #4095: URL: https://github.com/apache/carbondata/pull/4095#issuecomment-780687519 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3719/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4095: [CARBONDATA-3962] Fixed concurrent load failure with flat folder structure.
CarbonDataQA2 commented on pull request #4095: URL: https://github.com/apache/carbondata/pull/4095#issuecomment-780681729 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5483/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #1409: [WIP][CARBODNATA-1377] support hive partition
CarbonDataQA2 commented on pull request #1409: URL: https://github.com/apache/carbondata/pull/1409#issuecomment-780675686 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3721/
[jira] [Resolved] (CARBONDATA-4123) Bloom index query with Index server giving incorrect results
[ https://issues.apache.org/jira/browse/CARBONDATA-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kunal Kapoor resolved CARBONDATA-4123.
--
Fix Version/s: 2.1.1
Resolution: Fixed

> Bloom index query with Index server giving incorrect results
>
> Key: CARBONDATA-4123
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4123
> Project: CarbonData
> Issue Type: Bug
> Reporter: SHREELEKHYA GAMPA
> Priority: Minor
> Fix For: 2.1.1
>
> Queries: create table and load data so that it can create >1 blocklet.
>
> spark-sql> select count(*) from test_rcd where city = 'city40';
> 2021-02-04 22:13:29,759 | WARN | pool-24-thread-1 | It is not recommended to set off-heap working memory size less than 512MB, so setting default value to 512 | org.apache.carbondata.core.memory.UnsafeMemoryManager.(UnsafeMemoryManager.java:83)
> 10
> Time taken: 2.417 seconds, Fetched 1 row(s)
> spark-sql> CREATE INDEX dm_rcd ON TABLE test_rcd (city) AS 'bloomfilter' properties ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
> 2021-02-04 22:13:58,683 | AUDIT | main | \{"time":"February 4, 2021 10:13:58 PM CST","username":"carbon","opName":"CREATE INDEX","opId":"15148202700230273","opStatus":"START"} | carbon.audit.logOperationStart(Auditor.java:74)
> 2021-02-04 22:13:58,759 | WARN | main | Bloom compress is not configured for index dm_rcd, use default value true | org.apache.carbondata.index.bloom.BloomCoarseGrainIndexFactory.validateAndGetBloomCompress(BloomCoarseGrainIndexFactory.java:202)
> 2021-02-04 22:13:59,292 | WARN | Executor task launch worker for task 2 | Bloom compress is not configured for index dm_rcd, use default value true | org.apache.carbondata.index.bloom.BloomCoarseGrainIndexFactory.validateAndGetBloomCompress(BloomCoarseGrainIndexFactory.java:202)
> 2021-02-04 22:13:59,629 | WARN | main | Bloom compress is not configured for index dm_rcd, use default value true | org.apache.carbondata.index.bloom.BloomCoarseGrainIndexFactory.validateAndGetBloomCompress(BloomCoarseGrainIndexFactory.java:202)
> 2021-02-04 22:14:00,331 | AUDIT | main | \{"time":"February 4, 2021 10:14:00 PM CST","username":"carbon","opName":"CREATE INDEX","opId":"15148202700230273","opStatus":"SUCCESS","opTime":"1648 ms","table":"default.test_rcd","extraInfo":{"provider":"bloomfilter","indexName":"dm_rcd","bloom_size":"64","bloom_fpp":"0.1"}} | carbon.audit.logOperationEnd(Auditor.java:97)
> Time taken: 1.818 seconds
> spark-sql> select count(*) from test_rcd where city = 'city40';
> 30
> Time taken: 0.556 seconds, Fetched 1 row(s)
> spark-sql>
[jira] [Resolved] (CARBONDATA-4117) Test cg index query with Index server fails with NPE
[ https://issues.apache.org/jira/browse/CARBONDATA-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kunal Kapoor resolved CARBONDATA-4117.
--
Fix Version/s: 2.1.1
Resolution: Fixed

> Test cg index query with Index server fails with NPE
>
> Key: CARBONDATA-4117
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4117
> Project: CarbonData
> Issue Type: Bug
> Reporter: SHREELEKHYA GAMPA
> Priority: Minor
> Fix For: 2.1.1
>
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> Test queries to execute:
> spark-sql> CREATE TABLE index_test_cg(id INT, name STRING, city STRING, age INT) STORED AS carbondata TBLPROPERTIES('SORT_COLUMNS'='city,name', 'SORT_SCOPE'='LOCAL_SORT');
> spark-sql> create index cgindex on table index_test_cg (name) as 'org.apache.carbondata.spark.testsuite.index.CGIndexFactory';
> LOAD DATA LOCAL INPATH '$file2' INTO TABLE index_test_cg OPTIONS('header'='false')
> spark-sql> select * from index_test_cg where name='n502670';
> 2021-01-29 15:09:25,881 | ERROR | main | Exception occurred while getting splits using index server. Initiating Fallback to embedded mode | org.apache.carbondata.hadoop.api.CarbonInputFormat.getDistributedSplit(CarbonInputFormat.java:454)
> java.lang.reflect.UndeclaredThrowableException
>   at com.sun.proxy.$Proxy69.getSplits(Unknown Source)
>   at org.apache.carbondata.indexserver.DistributedIndexJob$$anonfun$1.apply(IndexJobs.scala:85)
>   at org.apache.carbondata.indexserver.DistributedIndexJob$$anonfun$1.apply(IndexJobs.scala:59)
>   at org.apache.carbondata.spark.util.CarbonScalaUtil$.logTime(CarbonScalaUtil.scala:769)
>   at org.apache.carbondata.indexserver.DistributedIndexJob.execute(IndexJobs.scala:58)
>   at org.apache.carbondata.core.index.IndexUtil.executeIndexJob(IndexUtil.java:307)
>   at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDistributedSplit(CarbonInputFormat.java:443)
>   at org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:555)
>   at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:500)
>   at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:357)
>   at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:205)
>   at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:159)
>   at org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:68)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2299)
>   at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:989)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:384)
>   at org.apache.spark.rdd.RDD.collect(RDD.scala:988)
>   at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:345)
>   at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:372)
>   at org.apache.spark.sql.execution.QueryExecution.hiveResultString(QueryExecution.scala:127)
>   at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver$$anonfun$run$1.apply(SparkSQLDriver.scala:66)
>   at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver$$anonfun$run$1.apply(SparkSQLDriver.scala:66)
>   at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:95)
>   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:144)
>   at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:86)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:789)
>   at
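The log line in the report above ("Exception occurred while getting splits using index server. Initiating Fallback to embedded mode") describes a try-remote-then-fall-back pattern. A minimal sketch of that pattern follows, assuming two caller-supplied functions. `splitsWithFallback`, `distributed` and `embedded` are hypothetical names for illustration, not CarbonData APIs.

```scala
import scala.util.{Failure, Success, Try}

object SplitFallbackSketch {
  // Tries the distributed (index server) path first. On any non-fatal
  // exception, logs it and runs the embedded path instead.
  def splitsWithFallback[A](distributed: () => A, embedded: () => A): A =
    Try(distributed()) match {
      case Success(splits) => splits
      case Failure(e) =>
        println(s"Exception occurred while getting splits using index server: $e. " +
          "Falling back to embedded mode")
        embedded()
    }
}
```

Note that `Try` only captures non-fatal throwables, so errors such as `OutOfMemoryError` would still propagate in this sketch.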
[GitHub] [carbondata] asfgit closed pull request #4089: [CARBONDATA-4117][[CARBONDATA-4123] cg index and bloom index query issue with Index server
asfgit closed pull request #4089: URL: https://github.com/apache/carbondata/pull/4089
[GitHub] [carbondata] kunal642 commented on pull request #4093: [CARBONDATA-4126] Concurrent compaction failed with load on table.
kunal642 commented on pull request #4093: URL: https://github.com/apache/carbondata/pull/4093#issuecomment-780665646 retest this please
[GitHub] [carbondata] kunal642 commented on a change in pull request #4093: [CARBONDATA-4126] Concurrent compaction failed with load on table.
kunal642 commented on a change in pull request #4093:
URL: https://github.com/apache/carbondata/pull/4093#discussion_r577741017

## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
@@ -92,19 +92,22 @@ class CarbonTableCompactor(
     val lastSegment = sortedSegments.get(sortedSegments.size() - 1)
     val compactedLoad = CarbonDataMergerUtil.getMergedLoadName(loadsToMerge)
     var segmentLocks: ListBuffer[ICarbonLock] = ListBuffer.empty
+    val validSegments = new java.util.ArrayList[LoadMetadataDetails]
     loadsToMerge.asScala.foreach { segmentId =>
       val segmentLock = CarbonLockFactory
         .getCarbonLockObj(carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
           .getAbsoluteTableIdentifier,
           CarbonTablePath.addSegmentPrefix(segmentId.getLoadName) + LockUsage.LOCK)
-      if (!segmentLock.lockWithRetries()) {
-        throw new Exception(s"Failed to acquire lock on segment ${segmentId.getLoadName}," +
-          s" during compaction of table ${compactionModel.carbonTable.getQualifiedName}")
+      if (segmentLock.lockWithRetries()) {
+        validSegments.add(segmentId)
+        segmentLocks += segmentLock
+      } else {
+        LOGGER.warn(s"Failed to acquire lock on segment ${segmentId.getLoadName}, " +
+          s"during compaction of table ${compactionModel.carbonTable.getQualifiedName}")
       }
-      segmentLocks += segmentLock
     }
     try {
-      scanSegmentsAndSubmitJob(loadsToMerge, compactedSegments, compactedLoad)
+      scanSegmentsAndSubmitJob(validSegments, compactedSegments, compactedLoad)

Review comment:
ok
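The change above replaces "fail the whole compaction if any segment lock is unavailable" with "compact only the segments whose locks were acquired". A self-contained sketch of that idea follows. The `SegmentLock` trait and the `lockAvailableSegments` and `compactLocked` helpers are hypothetical and only mirror the shape of the diff; they are not the CarbonData lock API.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical minimal lock interface, introduced only for this sketch.
trait SegmentLock {
  def lockWithRetries(): Boolean
  def unlock(): Boolean
}

object CompactionLockSketch {
  // Returns the segments whose locks were acquired, plus the held locks.
  // Segments that cannot be locked are skipped with a warning, not an exception.
  def lockAvailableSegments[S](
      segments: Seq[S],
      lockFor: S => SegmentLock): (Seq[S], Seq[SegmentLock]) = {
    val valid = ListBuffer.empty[S]
    val held = ListBuffer.empty[SegmentLock]
    segments.foreach { segment =>
      val lock = lockFor(segment)
      if (lock.lockWithRetries()) {
        valid += segment
        held += lock
      } else {
        println(s"Failed to acquire lock on segment $segment, skipping it")
      }
    }
    (valid.toList, held.toList)
  }

  // Runs the compaction on the locked segments only and always releases
  // the locks that were actually acquired.
  def compactLocked[S](segments: Seq[S], lockFor: S => SegmentLock)(
      compact: Seq[S] => Unit): Seq[S] = {
    val (valid, held) = lockAvailableSegments(segments, lockFor)
    try {
      if (valid.nonEmpty) compact(valid)
      valid
    } finally {
      held.foreach(_.unlock())
    }
  }
}
```

Only locks that were acquired are tracked for release, which matches the diff moving `segmentLocks += segmentLock` inside the success branch.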
[GitHub] [carbondata] kunal642 commented on pull request #4089: [CARBONDATA-4117][[CARBONDATA-4123] cg index and bloom index query issue with Index server
kunal642 commented on pull request #4089: URL: https://github.com/apache/carbondata/pull/4089#issuecomment-780665000 LGTM
[jira] [Resolved] (CARBONDATA-4124) Refresh MV which does not exist is not throwing proper message
[ https://issues.apache.org/jira/browse/CARBONDATA-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akash R Nilugal resolved CARBONDATA-4124.
-
Fix Version/s: 2.2.0
Assignee: Indhumathi Muthu Murugesh
Resolution: Fixed

> Refresh MV which does not exist is not throwing proper message
> --
>
> Key: CARBONDATA-4124
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4124
> Project: CarbonData
> Issue Type: Bug
> Reporter: Indhumathi Muthu Murugesh
> Assignee: Indhumathi Muthu Murugesh
> Priority: Minor
> Fix For: 2.2.0
>
> Time Spent: 3h
> Remaining Estimate: 0h
[GitHub] [carbondata] asfgit closed pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message
asfgit closed pull request #4091: URL: https://github.com/apache/carbondata/pull/4091
[GitHub] [carbondata] akashrn5 commented on pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message
akashrn5 commented on pull request #4091: URL: https://github.com/apache/carbondata/pull/4091#issuecomment-780642185 LGTM
[GitHub] [carbondata] akashrn5 removed a comment on pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message
akashrn5 removed a comment on pull request #4091: URL: https://github.com/apache/carbondata/pull/4091#issuecomment-779619403 LGTM
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message
CarbonDataQA2 commented on pull request #4091: URL: https://github.com/apache/carbondata/pull/4091#issuecomment-780630691 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5482/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message
CarbonDataQA2 commented on pull request #4091: URL: https://github.com/apache/carbondata/pull/4091#issuecomment-780624830 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3718/
[GitHub] [carbondata] Indhumathi27 commented on pull request #4095: [CARBONDATA-3962] Fixed concurrent load failure with flat folder structure.
Indhumathi27 commented on pull request #4095: URL: https://github.com/apache/carbondata/pull/4095#issuecomment-780616594 retest this please
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4089: [CARBONDATA-4117][[CARBONDATA-4123] cg index and bloom index query issue with Index server
CarbonDataQA2 commented on pull request #4089: URL: https://github.com/apache/carbondata/pull/4089#issuecomment-780605031 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3717/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4089: [CARBONDATA-4117][[CARBONDATA-4123] cg index and bloom index query issue with Index server
CarbonDataQA2 commented on pull request #4089: URL: https://github.com/apache/carbondata/pull/4089#issuecomment-780600848 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5481/
[GitHub] [carbondata] Indhumathi27 commented on pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message
Indhumathi27 commented on pull request #4091: URL: https://github.com/apache/carbondata/pull/4091#issuecomment-780563734 retest this please
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message
CarbonDataQA2 commented on pull request #4091: URL: https://github.com/apache/carbondata/pull/4091#issuecomment-780543770
[GitHub] [carbondata] ShreelekhyaG commented on pull request #4089: [CARBONDATA-4117][[CARBONDATA-4123] cg index and bloom index query issue with Index server
ShreelekhyaG commented on pull request #4089: URL: https://github.com/apache/carbondata/pull/4089#issuecomment-780542693 retest this please
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4089: [CARBONDATA-4117][[CARBONDATA-4123] cg index and bloom index query issue with Index server
CarbonDataQA2 commented on pull request #4089: URL: https://github.com/apache/carbondata/pull/4089#issuecomment-780536006 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3715/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4089: [CARBONDATA-4117][[CARBONDATA-4123] cg index and bloom index query issue with Index server
CarbonDataQA2 commented on pull request #4089: URL: https://github.com/apache/carbondata/pull/4089#issuecomment-780532287 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5479/
[GitHub] [carbondata] Indhumathi27 commented on pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message
Indhumathi27 commented on pull request #4091: URL: https://github.com/apache/carbondata/pull/4091#issuecomment-780497167 retest this please
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
akashrn5 commented on a change in pull request #4072: URL: https://github.com/apache/carbondata/pull/4072#discussion_r577533482 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala ## @@ -37,11 +38,24 @@ case class CarbonCleanFilesCommand( databaseNameOp: Option[String], tableName: String, options: Map[String, String] = Map.empty, +dryRun: Boolean, isInternalCleanCall: Boolean = false) extends DataCommand { val LOGGER: Logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName) + override def output: Seq[AttributeReference] = { +if (dryRun) { + Seq( +AttributeReference("Size Freed", LongType, nullable = false)(), +AttributeReference("Trash Data Remaining", LongType, nullable = false)()) +} else { + Seq( +AttributeReference("Size Freed", LongType, nullable = false)(), +AttributeReference("Trash Data Remaining", LongType, nullable = false)()) +} Review comment: if else both blocks are same? i think better to give these rows only in case of dry run ## File path: integration/spark/src/main/scala/org/apache/carbondata/trash/DataTrashManager.scala ## @@ -87,13 +101,28 @@ object DataTrashManager { } } - private def checkAndCleanTrashFolder(carbonTable: CarbonTable, isForceDelete: Boolean): Unit = { + def cleanFilesDryRunOperation ( + carbonTable: CarbonTable, + isForceDelete: Boolean, + cleanStaleInProgress: Boolean, + partitionSpecs: Option[Seq[PartitionSpec]] = None): Seq[Long] = { +// get size freed from the trash folder +val trashFolderSizeStats = checkAndCleanTrashFolder(carbonTable, isForceDelete, isDryRun = true) +// get size that will be deleted (MFD, COmpacted, Inprogress segments) +val expiredSegmentsSizeStats = dryRunOnExpiredSegments(carbonTable, isForceDelete, + cleanStaleInProgress, partitionSpecs) +Seq(trashFolderSizeStats.head + expiredSegmentsSizeStats.head, trashFolderSizeStats(1) + +expiredSegmentsSizeStats(1)) + } + + private def checkAndCleanTrashFolder(carbonTable: 
CarbonTable, isForceDelete: Boolean, + isDryRun: Boolean): Seq[Long] = { Review comment: I think we are mixing the dry run option with force delete, which makes the code and the handling of option combinations complex. When the user asks for a dry run, it should be clear that no other options are taken: we only report how much and what would be cleaned, and we do not delete or clear any files. That would be easier to code and cleaner, maybe as a new class or a new method in the clean files command class. What do you think, @ajantha-bhat @QiangCai?
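The separation suggested in the review comment above could look roughly like the following. This is only a sketch under stated assumptions, not the code from PR #4072: `CleanFilesDryRunSketch` and `ExpiredSegment` are hypothetical names, segment sizes are passed in rather than read from the file system, and the returned pair simply mirrors the `{sizeFreed, trashSizeRemaining}` array used in the diff.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: a dry run that only reports sizes and never deletes anything. */
final class CleanFilesDryRunSketch {

  /** Illustrative stand-in for an expired segment; not a CarbonData class. */
  static final class ExpiredSegment {
    final String name;
    final long sizeInBytes;
    // false means the segment stays in the trash until its retention time is over
    final boolean deletableNow;

    ExpiredSegment(String name, long sizeInBytes, boolean deletableNow) {
      this.name = name;
      this.sizeInBytes = sizeInBytes;
      this.deletableNow = deletableNow;
    }
  }

  /**
   * Takes no force-delete or other option and performs no deletion.
   * Returns {sizeFreed, trashSizeRemaining}, both in bytes.
   */
  static long[] dryRun(List<ExpiredSegment> segments) {
    long sizeFreed = 0L;
    long trashSizeRemaining = 0L;
    for (ExpiredSegment segment : segments) {
      if (segment.deletableNow) {
        sizeFreed += segment.sizeInBytes;
      } else {
        trashSizeRemaining += segment.sizeInBytes;
      }
    }
    return new long[] {sizeFreed, trashSizeRemaining};
  }

  public static void main(String[] args) {
    long[] stats = dryRun(Arrays.asList(
        new ExpiredSegment("0", 100L, true),
        new ExpiredSegment("1", 40L, false)));
    System.out.println("Size Freed=" + stats[0] + ", Trash Data Remaining=" + stats[1]);
  }
}
```

Because the dry run path has no delete call at all, there is no option combination that could remove files by accident, which appears to be the point of the comment.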
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message
CarbonDataQA2 commented on pull request #4091: URL: https://github.com/apache/carbondata/pull/4091#issuecomment-780491628
[jira] [Resolved] (CARBONDATA-4131) Concurrent load on table with flat folder structure fails with FileNotFound
[ https://issues.apache.org/jira/browse/CARBONDATA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nihal kumar ojha resolved CARBONDATA-4131. -- Resolution: Duplicate > Concurrent load on table with flat folder structure fails with FileNotFound > --- > > Key: CARBONDATA-4131 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4131 > Project: CarbonData > Issue Type: Bug >Reporter: Nihal kumar ojha >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h >
[jira] [Commented] (CARBONDATA-4131) Concurrent load on table with flat folder structure fails with FileNotFound
[ https://issues.apache.org/jira/browse/CARBONDATA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285791#comment-17285791 ] Nihal kumar ojha commented on CARBONDATA-4131: -- Duplicate of CARBONDATA-3962 > Concurrent load on table with flat folder structure fails with FileNotFound > --- > > Key: CARBONDATA-4131 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4131 > Project: CarbonData > Issue Type: Bug >Reporter: Nihal kumar ojha >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h >
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
akashrn5 commented on a change in pull request #4072: URL: https://github.com/apache/carbondata/pull/4072#discussion_r577527003 ## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ## @@ -1297,4 +1359,37 @@ public static TableStatusReturnTuple separateVisibleAndInvisibleSegments( return new HashMap<>(0); } } + + public static long partitionTableSegmentSize(CarbonTable carbonTable, LoadMetadataDetails Review comment: Yes, better not to mix the dry run size calculation logic with the actual clean files logic. Keep them separate, so that the user knows for sure that a dry run might take some time because it calculates sizes.
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
akashrn5 commented on a change in pull request #4072: URL: https://github.com/apache/carbondata/pull/4072#discussion_r577523404 ## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ## @@ -1072,7 +1097,22 @@ public static void deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean isUpdateRequired(isForceDeletion, carbonTable, identifier, details, cleanStaleInprogress); if (!tuple2.isUpdateRequired) { - return; + try { +for (LoadMetadataDetails oneLoad : details) { + if (isExpiredSegment(oneLoad, carbonTable.getAbsoluteTableIdentifier())) { +if (!carbonTable.isHivePartitionTable()) { + trashSizeRemaining += FileFactory.getDirectorySize(CarbonTablePath +.getSegmentPath(carbonTable.getTablePath(), oneLoad.getLoadName())); +} else { + trashSizeRemaining += partitionTableSegmentSize(carbonTable, oneLoad, +details, partitionSpecs); +} + } +} + } catch (Exception e) { +LOG.error("Unable to calculate size of garbage data", e); + } + return new long[]{sizeFreed, trashSizeRemaining}; Review comment: Yes, I agree with @ajantha-bhat, I meant the same.
[GitHub] [carbondata] ShreelekhyaG commented on pull request #4089: [CARBONDATA-4117][[CARBONDATA-4123] cg index and bloom index query issue with Index server
ShreelekhyaG commented on pull request #4089: URL: https://github.com/apache/carbondata/pull/4089#issuecomment-780480217 retest this please
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
ajantha-bhat commented on a change in pull request #4072: URL: https://github.com/apache/carbondata/pull/4072#discussion_r577500588 ## File path: integration/spark/src/main/scala/org/apache/carbondata/events/CleanFilesEvents.scala ## @@ -26,5 +26,6 @@ case class CleanFilesPreEvent(carbonTable: CarbonTable, sparkSession: SparkSessi case class CleanFilesPostEvent( carbonTable: CarbonTable, sparkSession: SparkSession, -options: Map[String, String]) +options: Map[String, String], +dryRun: Boolean) Review comment: why not sending this in options itself ? ## File path: integration/spark/src/main/scala/org/apache/carbondata/trash/DataTrashManager.scala ## @@ -112,13 +141,91 @@ object DataTrashManager { carbonTable: CarbonTable, isForceDelete: Boolean, cleanStaleInProgress: Boolean, - partitionSpecsOption: Option[Seq[PartitionSpec]]): Unit = { + partitionSpecsOption: Option[Seq[PartitionSpec]]): Seq[Long] = { val partitionSpecs = partitionSpecsOption.map(_.asJava).orNull -SegmentStatusManager.deleteLoadsAndUpdateMetadata(carbonTable, +val sizeStatistics = SegmentStatusManager.deleteLoadsAndUpdateMetadata(carbonTable, isForceDelete, partitionSpecs, cleanStaleInProgress, true) if (carbonTable.isHivePartitionTable && partitionSpecsOption.isDefined) { SegmentFileStore.cleanSegments(carbonTable, partitionSpecs, isForceDelete) } +sizeStatistics + } + + private def dryRunOnExpiredSegments( + carbonTable: CarbonTable, + isForceDelete: Boolean, + cleanStaleInProgress: Boolean, + partitionSpecsOption: Option[Seq[PartitionSpec]]): Seq[Long] = { +var sizeFreed: Long = 0 +var trashSizeRemaining: Long = 0 +val loadMetadataDetails = SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath) +if (SegmentStatusManager.isLoadDeletionRequired(loadMetadataDetails)) { + loadMetadataDetails.foreach { oneLoad => +val segmentFilePath = CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath, + oneLoad.getSegmentFile) +if (DeleteLoadFolders.canDeleteThisLoad(oneLoad, isForceDelete, 
cleanStaleInProgress)) { + // No need to consider physical data for external segments, only consider metadata. + if (oneLoad.getPath() == null || oneLoad.getPath().equalsIgnoreCase("NA")) { +if (!carbonTable.isHivePartitionTable) { + sizeFreed += FileFactory.getDirectorySize(CarbonTablePath.getSegmentPath(carbonTable + .getTablePath, oneLoad.getLoadName)) +} else { + sizeFreed += partitionTableSegmentSize(carbonTable, oneLoad, loadMetadataDetails, +partitionSpecsOption) +} + } + sizeFreed += FileFactory.getCarbonFile(segmentFilePath).getSize +} else { + if (SegmentStatusManager.isExpiredSegment(oneLoad, carbonTable + .getAbsoluteTableIdentifier)) { +if (!carbonTable.isHivePartitionTable) { + trashSizeRemaining += FileFactory.getDirectorySize(CarbonTablePath.getSegmentPath( Review comment: I see that size calculation code is duplicate in dryrun flow and in clean up flow, can we extract a common method and use it ? ## File path: integration/spark/src/main/scala/org/apache/carbondata/trash/DataTrashManager.scala ## @@ -112,13 +141,91 @@ object DataTrashManager { carbonTable: CarbonTable, isForceDelete: Boolean, cleanStaleInProgress: Boolean, - partitionSpecsOption: Option[Seq[PartitionSpec]]): Unit = { + partitionSpecsOption: Option[Seq[PartitionSpec]]): Seq[Long] = { val partitionSpecs = partitionSpecsOption.map(_.asJava).orNull -SegmentStatusManager.deleteLoadsAndUpdateMetadata(carbonTable, +val sizeStatistics = SegmentStatusManager.deleteLoadsAndUpdateMetadata(carbonTable, isForceDelete, partitionSpecs, cleanStaleInProgress, true) if (carbonTable.isHivePartitionTable && partitionSpecsOption.isDefined) { SegmentFileStore.cleanSegments(carbonTable, partitionSpecs, isForceDelete) } +sizeStatistics + } + + private def dryRunOnExpiredSegments( + carbonTable: CarbonTable, + isForceDelete: Boolean, + cleanStaleInProgress: Boolean, + partitionSpecsOption: Option[Seq[PartitionSpec]]): Seq[Long] = { +var sizeFreed: Long = 0 +var trashSizeRemaining: Long = 0 +val 
loadMetadataDetails = SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath) +if (SegmentStatusManager.isLoadDeletionRequired(loadMetadataDetails)) { + loadMetadataDetails.foreach { oneLoad => +val segmentFilePath = CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath, + oneLoad.getSegmentFile) +if (DeleteLoadFolders.canDeleteThisLoad(oneLoad, isForceDelete, cleanStaleInProgress)) { + // No need to
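The review comment above asks for the duplicated size calculation to be pulled into one common method. A minimal sketch of that refactoring follows, assuming the two size lookups are passed in as functions: `SegmentSizeSketch`, `directorySize` and `partitionSegmentSize` are hypothetical names standing in for the `FileFactory.getDirectorySize` and `partitionTableSegmentSize` calls seen in the diff, so the example runs without CarbonData on the classpath.

```java
import java.util.function.ToLongFunction;

/** Hypothetical sketch: one size helper shared by the dry run flow and the clean up flow. */
final class SegmentSizeSketch {

  /**
   * Returns the size of one segment in bytes.
   * The two lookup functions are assumptions made for this sketch, not CarbonData APIs.
   */
  static long segmentSize(
      boolean isHivePartitionTable,
      String loadName,
      ToLongFunction<String> directorySize,
      ToLongFunction<String> partitionSegmentSize) {
    return isHivePartitionTable
        ? partitionSegmentSize.applyAsLong(loadName)
        : directorySize.applyAsLong(loadName);
  }

  public static void main(String[] args) {
    // Both flows would call the same helper instead of repeating the if/else branch.
    long sizeFreed = segmentSize(false, "0", name -> 2048L, name -> 512L);
    long trashSizeRemaining = segmentSize(true, "1", name -> 2048L, name -> 512L);
    System.out.println(sizeFreed + " " + trashSizeRemaining);
  }
}
```

With a helper like this, the `sizeFreed` and `trashSizeRemaining` accumulations differ only in which counter they add to, not in how the size is obtained.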
[jira] [Commented] (CARBONDATA-4120) select queries against carbondata tables getting stuck when fired through Apache Hive
[ https://issues.apache.org/jira/browse/CARBONDATA-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285745#comment-17285745 ] Indhumathi Muthu Murugesh commented on CARBONDATA-4120: --- Can I know the schema of your table? > select queries against carbondata tables getting stuck when fired through Apache Hive > - > > Key: CARBONDATA-4120 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4120 > Project: CarbonData > Issue Type: Improvement > Components: core >Affects Versions: 2.0.1 > Environment: Apache Hive 3.1.2, Apache carbondata 2.0.1 >Reporter: suyash yadav >Priority: Critical > > Hi Team, we need some more help. We have created a table which has around 172 million records and we have connected this table through Apache Hive, but whenever we run select count(*) on this table through Hive, the query gets stuck. We can run the query successfully through spark shell, but through Hive it always gets stuck. One more observation: whenever we run any query which contains a join, the query gets stuck. Also, a query with a where clause executes against a smaller table, but when we run it against the bigger table, it also gets stuck. So could you guys guide us on how we can run all these queries successfully without any issue?
[jira] [Commented] (CARBONDATA-4108) How to connect carbondata with Hive
[ https://issues.apache.org/jira/browse/CARBONDATA-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285744#comment-17285744 ] Indhumathi Muthu Murugesh commented on CARBONDATA-4108: --- Hi, please follow the link below to set up carbondata with Hive. [https://github.com/apache/carbondata/blob/master/docs/hive-guide.md] Let us know if this hive-carbon setup meets your requirement. > How to connect carbondata with Hive > --- > > Key: CARBONDATA-4108 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4108 > Project: CarbonData > Issue Type: Improvement > Components: core >Affects Versions: 2.0.1 > Environment: apache carbondata 2.0.1, spark 2.4.5, Hive 2.0 >Reporter: suyash yadav >Priority: Major > Fix For: 2.0.1 > > > Hi Team, > We would like to know how to connect Hive with carbondata. We are doing a POC in which we need to access a carbondata table through Hive, but we need this configuration with a username and password. So our Hive connection should have some username and password configuration to connect to carbondata tables. > > Could you guys please review the above requirement and suggest steps to achieve it?
[GitHub] [carbondata] akashrn5 commented on pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message
akashrn5 commented on pull request #4091: URL: https://github.com/apache/carbondata/pull/4091#issuecomment-780437251 retest this please
[jira] [Resolved] (CARBONDATA-4125) SI compatability issue fix
[ https://issues.apache.org/jira/browse/CARBONDATA-4125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal resolved CARBONDATA-4125. - Fix Version/s: 2.2.0 Assignee: Indhumathi Muthu Murugesh Resolution: Fixed > SI compatability issue fix > -- > > Key: CARBONDATA-4125 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4125 > Project: CarbonData > Issue Type: Bug >Reporter: Indhumathi Muthu Murugesh >Assignee: Indhumathi Muthu Murugesh >Priority: Major > Fix For: 2.2.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Refer to [http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Bug-SI-Compatibility-Issue-td105485.html] for this issue
[GitHub] [carbondata] asfgit closed pull request #4087: [CARBONDATA-4125] SI compatability issue fix
asfgit closed pull request #4087: URL: https://github.com/apache/carbondata/pull/4087
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
ajantha-bhat commented on a change in pull request #4072: URL: https://github.com/apache/carbondata/pull/4072#discussion_r577458113 ## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ## @@ -1297,4 +1359,37 @@ public static TableStatusReturnTuple separateVisibleAndInvisibleSegments( return new HashMap<>(0); } } + + public static long partitionTableSegmentSize(CarbonTable carbonTable, LoadMetadataDetails Review comment: I think all clean files operations will now become slower because of this size calculation code, which needs to interact with the file system. We can keep the size calculation as the default, but if the user wants clean files to be faster, can we have an option such as `summary = false`, which skips the size calculation and cleans the files faster? @akashrn5, @QiangCai, what do you think?
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
ajantha-bhat commented on a change in pull request #4072: URL: https://github.com/apache/carbondata/pull/4072#discussion_r577458113 ## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ## @@ -1297,4 +1359,37 @@ public static TableStatusReturnTuple separateVisibleAndInvisibleSegments( return new HashMap<>(0); } } + + public static long partitionTableSegmentSize(CarbonTable carbonTable, LoadMetadataDetails Review comment: I think all clean files operations will now become slower because of this size calculation code, which needs to interact with the file system. So can we have an option such as `summary = false`, which skips the size calculation and cleans the files faster? @akashrn5, @QiangCai, what do you think?
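The `summary = false` idea from the comment above can be sketched as follows. This is an illustration under assumptions, not CarbonData code: the option key `summary` is taken from the comment, while `CleanFilesSummarySketch`, `isSummaryEnabled` and `clean` are hypothetical names, and the size calculation and the deletion are passed in as callbacks so the example stays self-contained.

```java
import java.util.Collections;
import java.util.Map;
import java.util.OptionalLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch: skip the file system size scan when the user sets summary = false. */
final class CleanFilesSummarySketch {

  // Option key taken from the review comment; its final name in CarbonData is not confirmed here.
  static final String SUMMARY_OPTION = "summary";

  /** Size calculation stays on by default, as proposed in the comment. */
  static boolean isSummaryEnabled(Map<String, String> options) {
    return Boolean.parseBoolean(options.getOrDefault(SUMMARY_OPTION, "true"));
  }

  /**
   * Runs the deletion and returns the freed size only if a summary was requested.
   * The size must be read before the deletion, because the files are gone afterwards.
   */
  static OptionalLong clean(
      Map<String, String> options, LongSupplier sizeCalculation, Runnable deletion) {
    OptionalLong sizeFreed = isSummaryEnabled(options)
        ? OptionalLong.of(sizeCalculation.getAsLong())
        : OptionalLong.empty();
    deletion.run();
    return sizeFreed;
  }

  public static void main(String[] args) {
    Map<String, String> fast = Collections.singletonMap(SUMMARY_OPTION, "false");
    OptionalLong result = clean(fast, () -> 4096L, () -> System.out.println("deleting"));
    System.out.println("summary present: " + result.isPresent());
  }
}
```

Returning an empty `OptionalLong` rather than 0 keeps "nothing was freed" distinguishable from "size was not calculated", which is a design choice of this sketch and not something the thread settles.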
[GitHub] [carbondata] akashrn5 commented on pull request #4087: [CARBONDATA-4125] SI compatability issue fix
akashrn5 commented on pull request #4087: URL: https://github.com/apache/carbondata/pull/4087#issuecomment-780424725 LGTM
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
ajantha-bhat commented on a change in pull request #4072: URL: https://github.com/apache/carbondata/pull/4072#discussion_r577452054 ## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ## @@ -1125,13 +1165,32 @@ public static void deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean CarbonLockUtil.fileUnlock(carbonTableStatusLock, LockUsage.TABLE_STATUS_LOCK); } if (updateCompletionStatus) { -DeleteLoadFolders +long[] cleanFileSizeFreed = DeleteLoadFolders .physicalFactAndMeasureMetadataDeletion(carbonTable, newAddedLoadHistoryList, isForceDeletion, partitionSpecs, cleanStaleInprogress); +sizeFreed += cleanFileSizeFreed[0]; +trashSizeRemaining += cleanFileSizeFreed[1]; + } +} + } else { +try { + for (LoadMetadataDetails oneLoad : metadataDetails) { +if (isExpiredSegment(oneLoad, carbonTable.getAbsoluteTableIdentifier())) { + if (!carbonTable.isHivePartitionTable()) { +trashSizeRemaining += FileFactory.getDirectorySize(CarbonTablePath.getSegmentPath( + carbonTable.getTablePath(), oneLoad.getLoadName())); + } else { +trashSizeRemaining += partitionTableSegmentSize(carbonTable, oneLoad, + metadataDetails, partitionSpecs); + } +} } +} catch (Exception e) { + LOG.error("Unable to calculate size of garbage data", e); } } } +return new long[]{sizeFreed, trashSizeRemaining}; Review comment: When nothing is freed by clean files, returning 0 is OK. But when an exception happens in a dry run, it may be better to throw the exception.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
ajantha-bhat commented on a change in pull request #4072: URL: https://github.com/apache/carbondata/pull/4072#discussion_r577450283 ## File path: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ## @@ -1072,7 +1097,22 @@ public static void deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean isUpdateRequired(isForceDeletion, carbonTable, identifier, details, cleanStaleInprogress); if (!tuple2.isUpdateRequired) { - return; + try { +for (LoadMetadataDetails oneLoad : details) { + if (isExpiredSegment(oneLoad, carbonTable.getAbsoluteTableIdentifier())) { +if (!carbonTable.isHivePartitionTable()) { + trashSizeRemaining += FileFactory.getDirectorySize(CarbonTablePath +.getSegmentPath(carbonTable.getTablePath(), oneLoad.getLoadName())); +} else { + trashSizeRemaining += partitionTableSegmentSize(carbonTable, oneLoad, +details, partitionSpecs); +} + } +} + } catch (Exception e) { +LOG.error("Unable to calculate size of garbage data", e); + } + return new long[]{sizeFreed, trashSizeRemaining}; Review comment: When nothing is freed by clean files, returning 0 is OK. But when an exception happens in a dry run, it may be better to throw the exception.
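The point raised in the two comments above (a dry run should fail loudly instead of logging and reporting 0) might be sketched like this. All names here are hypothetical (`GarbageSizeSketch`, `SizeReader`, `trashSizeRemaining`), the `isDryRun` flag is an assumption made for illustration, and `java.util.logging` stands in for the project's own logger.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: propagate size calculation failures in a dry run, tolerate them otherwise. */
final class GarbageSizeSketch {

  private static final Logger LOG = Logger.getLogger(GarbageSizeSketch.class.getName());

  /** Stand-in for a file system size lookup that may fail. */
  interface SizeReader {
    long sizeOf(String segmentPath) throws IOException;
  }

  static long trashSizeRemaining(
      List<String> expiredSegmentPaths, SizeReader reader, boolean isDryRun) throws IOException {
    long total = 0L;
    for (String path : expiredSegmentPaths) {
      try {
        total += reader.sizeOf(path);
      } catch (IOException e) {
        if (isDryRun) {
          // The numbers are the whole purpose of a dry run, so a silent 0 would mislead the user.
          throw new IOException("Unable to calculate size of garbage data for " + path, e);
        }
        // During a real clean the statistics are best effort: log and skip this segment.
        LOG.log(Level.WARNING, "Unable to calculate size of garbage data for " + path, e);
      }
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    long size = trashSizeRemaining(Arrays.asList("Segment_0", "Segment_1"), path -> 100L, true);
    System.out.println("Trash Data Remaining=" + size);
  }
}
```

In the diff the `try` wraps the whole loop, so one failure stops the calculation; catching per segment, as done here, is a variation chosen for the sketch.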
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
CarbonDataQA2 commented on pull request #4072: URL: https://github.com/apache/carbondata/pull/4072#issuecomment-780388512 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3713/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4089: [CARBONDATA-4117][[CARBONDATA-4123] cg index and bloom index query issue with Index server
CarbonDataQA2 commented on pull request #4089: URL: https://github.com/apache/carbondata/pull/4089#issuecomment-780381020 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5476/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation
CarbonDataQA2 commented on pull request #4072: URL: https://github.com/apache/carbondata/pull/4072#issuecomment-780379984 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5477/