[GitHub] [carbondata] Indhumathi27 closed pull request #4094: [TEST] test

2021-02-17 Thread GitBox


Indhumathi27 closed pull request #4094:
URL: https://github.com/apache/carbondata/pull/4094


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (CARBONDATA-3962) Empty fact dirs are present in case of flat folder, which are unnecessary

2021-02-17 Thread Indhumathi Muthu Murugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286278#comment-17286278
 ] 

Indhumathi Muthu Murugesh edited comment on CARBONDATA-3962 at 2/18/21, 5:16 AM:
-

Reverted PR-3904. Refer to https://github.com/apache/carbondata/pull/4095


was (Author: indhumuthumurugesh):
Refer to https://github.com/apache/carbondata/pull/4095

> Empty fact dirs are present in case of flat folder, which are unnecessary
> -
>
> Key: CARBONDATA-3962
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3962
> Project: CarbonData
> Issue Type: Bug
> Reporter: Akash R Nilugal
> Assignee: Akash R Nilugal
> Priority: Minor
> Fix For: 2.1.0
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Empty fact dirs are present in case of flat folder, which are unnecessary



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-3962) Empty fact dirs are present in case of flat folder, which are unnecessary

2021-02-17 Thread Indhumathi Muthu Murugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286278#comment-17286278
 ] 

Indhumathi Muthu Murugesh commented on CARBONDATA-3962:
---

https://github.com/apache/carbondata/pull/4095

> Empty fact dirs are present in case of flat folder, which are unnecessary
> -
>
> Key: CARBONDATA-3962
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3962
> Project: CarbonData
> Issue Type: Bug
> Reporter: Akash R Nilugal
> Assignee: Akash R Nilugal
> Priority: Minor
> Fix For: 2.1.0
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Empty fact dirs are present in case of flat folder, which are unnecessary





[jira] [Comment Edited] (CARBONDATA-3962) Empty fact dirs are present in case of flat folder, which are unnecessary

2021-02-17 Thread Indhumathi Muthu Murugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286278#comment-17286278
 ] 

Indhumathi Muthu Murugesh edited comment on CARBONDATA-3962 at 2/18/21, 5:15 AM:
-

Refer to https://github.com/apache/carbondata/pull/4095


was (Author: indhumuthumurugesh):
https://github.com/apache/carbondata/pull/4095

> Empty fact dirs are present in case of flat folder, which are unnecessary
> -
>
> Key: CARBONDATA-3962
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3962
> Project: CarbonData
> Issue Type: Bug
> Reporter: Akash R Nilugal
> Assignee: Akash R Nilugal
> Priority: Minor
> Fix For: 2.1.0
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Empty fact dirs are present in case of flat folder, which are unnecessary





[GitHub] [carbondata] nihal0107 commented on pull request #4095: [CARBONDATA-3962] Fixed concurrent load failure with flat folder structure.

2021-02-17 Thread GitBox


nihal0107 commented on pull request #4095:
URL: https://github.com/apache/carbondata/pull/4095#issuecomment-781060145


   merged to master.







[GitHub] [carbondata] nihal0107 closed pull request #4095: [CARBONDATA-3962] Fixed concurrent load failure with flat folder structure.

2021-02-17 Thread GitBox


nihal0107 closed pull request #4095:
URL: https://github.com/apache/carbondata/pull/4095


   







[GitHub] [carbondata] Indhumathi27 commented on pull request #4095: [CARBONDATA-3962] Fixed concurrent load failure with flat folder structure.

2021-02-17 Thread GitBox


Indhumathi27 commented on pull request #4095:
URL: https://github.com/apache/carbondata/pull/4095#issuecomment-781054526


   LGTM







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-780826470


   Build Failed with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3726/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-780823505


   Build Failed with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5489/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4088: [CARBONDATA-4121] Prepriming is not working in Index Server.

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4088:
URL: https://github.com/apache/carbondata/pull/4088#issuecomment-780817533


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3725/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4088: [CARBONDATA-4121] Prepriming is not working in Index Server.

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4088:
URL: https://github.com/apache/carbondata/pull/4088#issuecomment-780816628


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5488/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4095: [CARBONDATA-3962] Fixed concurrent load failure with flat folder structure.

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4095:
URL: https://github.com/apache/carbondata/pull/4095#issuecomment-780786063


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3722/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4095: [CARBONDATA-3962] Fixed concurrent load failure with flat folder structure.

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4095:
URL: https://github.com/apache/carbondata/pull/4095#issuecomment-780780682


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5485/
   







[GitHub] [carbondata] Karan980 commented on a change in pull request #4088: [CARBONDATA-4121] Prepriming is not working in Index Server.

2021-02-17 Thread GitBox


Karan980 commented on a change in pull request #4088:
URL: https://github.com/apache/carbondata/pull/4088#discussion_r577845412



##
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##
@@ -128,6 +128,10 @@ object IndexServer extends ServerInterface {
   def getCount(request: IndexInputFormat): LongWritable = {
     doAs {
       val sparkSession = SparkSQLUtil.getSparkSession
+      var currentUser: String = null
+      if (!request.isFallbackJob && Server.getRemoteUser != null) {
+        currentUser = Server.getRemoteUser.getShortUserName

Review comment:
   Null check for Server.getRemoteUser removed
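
For readers following the hunk quoted above: it guards a nullable lookup before reading the short user name. A minimal sketch of the same guard written with `Option` follows. `RemoteUser`, `Request` and `resolveUser` are hypothetical stand-ins invented for illustration, not the Hadoop `Server` API or the code that was finally merged, and the review comment indicates the null check was later dropped from the PR.

```scala
// Hypothetical stand-ins for illustration only; not the Hadoop or CarbonData types.
final case class RemoteUser(shortUserName: String)
final case class Request(isFallbackJob: Boolean)

object CurrentUserSketch {
  // remoteUser may be null, mirroring a Java-style nullable lookup.
  // Fallback jobs never resolve a user; otherwise Option(...) absorbs the null.
  def resolveUser(request: Request, remoteUser: RemoteUser): Option[String] =
    if (request.isFallbackJob) None
    else Option(remoteUser).map(_.shortUserName)
}
```

Returning `Option[String]` instead of a `var` initialised to `null` keeps the "no user" case explicit for the caller.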









[GitHub] [carbondata] kunal642 removed a comment on pull request #4093: [CARBONDATA-4126] Concurrent compaction failed with load on table.

2021-02-17 Thread GitBox


kunal642 removed a comment on pull request #4093:
URL: https://github.com/apache/carbondata/pull/4093#issuecomment-780735856


   LGTM







[GitHub] [carbondata] kunal642 commented on pull request #4093: [CARBONDATA-4126] Concurrent compaction failed with load on table.

2021-02-17 Thread GitBox


kunal642 commented on pull request #4093:
URL: https://github.com/apache/carbondata/pull/4093#issuecomment-780735856


   LGTM







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4093: [CARBONDATA-4126] Concurrent compaction failed with load on table.

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4093:
URL: https://github.com/apache/carbondata/pull/4093#issuecomment-780735611


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5484/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4093: [CARBONDATA-4126] Concurrent compaction failed with load on table.

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4093:
URL: https://github.com/apache/carbondata/pull/4093#issuecomment-780730471


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3720/
   







[GitHub] [carbondata] nihal0107 commented on pull request #4095: [CARBONDATA-3962] Fixed concurrent load failure with flat folder structure.

2021-02-17 Thread GitBox


nihal0107 commented on pull request #4095:
URL: https://github.com/apache/carbondata/pull/4095#issuecomment-780721152


   Retest this please







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4095: [CARBONDATA-3962] Fixed concurrent load failure with flat folder structure.

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4095:
URL: https://github.com/apache/carbondata/pull/4095#issuecomment-780687519


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3719/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4095: [CARBONDATA-3962] Fixed concurrent load failure with flat folder structure.

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4095:
URL: https://github.com/apache/carbondata/pull/4095#issuecomment-780681729


   Build Failed with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5483/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #1409: [WIP][CARBODNATA-1377] support hive partition

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #1409:
URL: https://github.com/apache/carbondata/pull/1409#issuecomment-780675686


   Build Failed with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3721/
   







[jira] [Resolved] (CARBONDATA-4123) Bloom index query with Index server giving incorrect results

2021-02-17 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4123.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Bloom index query with Index server giving incorrect results
> 
>
> Key: CARBONDATA-4123
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4123
> Project: CarbonData
> Issue Type: Bug
> Reporter: SHREELEKHYA GAMPA
> Priority: Minor
> Fix For: 2.1.1
>
>
> Queries: create table and load data so that it can create >1 blocklet.
>  
> spark-sql> select count(*) from test_rcd where city = 'city40';
> 2021-02-04 22:13:29,759 | WARN | pool-24-thread-1 | It is not recommended to set off-heap working memory size less than 512MB, so setting default value to 512 | org.apache.carbondata.core.memory.UnsafeMemoryManager.(UnsafeMemoryManager.java:83)
> 10
> Time taken: 2.417 seconds, Fetched 1 row(s)
> spark-sql> CREATE INDEX dm_rcd ON TABLE test_rcd (city) AS 'bloomfilter' properties ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
> 2021-02-04 22:13:58,683 | AUDIT | main | {"time":"February 4, 2021 10:13:58 PM CST","username":"carbon","opName":"CREATE INDEX","opId":"15148202700230273","opStatus":"START"} | carbon.audit.logOperationStart(Auditor.java:74)
> 2021-02-04 22:13:58,759 | WARN | main | Bloom compress is not configured for index dm_rcd, use default value true | org.apache.carbondata.index.bloom.BloomCoarseGrainIndexFactory.validateAndGetBloomCompress(BloomCoarseGrainIndexFactory.java:202)
> 2021-02-04 22:13:59,292 | WARN | Executor task launch worker for task 2 | Bloom compress is not configured for index dm_rcd, use default value true | org.apache.carbondata.index.bloom.BloomCoarseGrainIndexFactory.validateAndGetBloomCompress(BloomCoarseGrainIndexFactory.java:202)
> 2021-02-04 22:13:59,629 | WARN | main | Bloom compress is not configured for index dm_rcd, use default value true | org.apache.carbondata.index.bloom.BloomCoarseGrainIndexFactory.validateAndGetBloomCompress(BloomCoarseGrainIndexFactory.java:202)
> 2021-02-04 22:14:00,331 | AUDIT | main | {"time":"February 4, 2021 10:14:00 PM CST","username":"carbon","opName":"CREATE INDEX","opId":"15148202700230273","opStatus":"SUCCESS","opTime":"1648 ms","table":"default.test_rcd","extraInfo":{"provider":"bloomfilter","indexName":"dm_rcd","bloom_size":"64","bloom_fpp":"0.1"}} | carbon.audit.logOperationEnd(Auditor.java:97)
> Time taken: 1.818 seconds
> spark-sql> select count(*) from test_rcd where city = 'city40';
> 30
> Time taken: 0.556 seconds, Fetched 1 row(s)
> spark-sql>





[jira] [Resolved] (CARBONDATA-4117) Test cg index query with Index server fails with NPE

2021-02-17 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4117.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Test cg index query with Index server fails with NPE
> 
>
> Key: CARBONDATA-4117
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4117
> Project: CarbonData
> Issue Type: Bug
> Reporter: SHREELEKHYA GAMPA
> Priority: Minor
> Fix For: 2.1.1
>
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> Test queries to execute:
> spark-sql> CREATE TABLE index_test_cg(id INT, name STRING, city STRING, age INT) STORED AS carbondata TBLPROPERTIES('SORT_COLUMNS'='city,name', 'SORT_SCOPE'='LOCAL_SORT');
> spark-sql> create index cgindex on table index_test_cg (name) as 'org.apache.carbondata.spark.testsuite.index.CGIndexFactory';
> LOAD DATA LOCAL INPATH '$file2' INTO TABLE index_test_cg OPTIONS('header'='false')
> spark-sql> select * from index_test_cg where name='n502670';
> 2021-01-29 15:09:25,881 | ERROR | main | Exception occurred while getting splits using index server. Initiating Fallback to embedded mode | org.apache.carbondata.hadoop.api.CarbonInputFormat.getDistributedSplit(CarbonInputFormat.java:454)
> java.lang.reflect.UndeclaredThrowableException
> at com.sun.proxy.$Proxy69.getSplits(Unknown Source)
> at org.apache.carbondata.indexserver.DistributedIndexJob$$anonfun$1.apply(IndexJobs.scala:85)
> at org.apache.carbondata.indexserver.DistributedIndexJob$$anonfun$1.apply(IndexJobs.scala:59)
> at org.apache.carbondata.spark.util.CarbonScalaUtil$.logTime(CarbonScalaUtil.scala:769)
> at org.apache.carbondata.indexserver.DistributedIndexJob.execute(IndexJobs.scala:58)
> at org.apache.carbondata.core.index.IndexUtil.executeIndexJob(IndexUtil.java:307)
> at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDistributedSplit(CarbonInputFormat.java:443)
> at org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:555)
> at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:500)
> at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:357)
> at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:205)
> at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:159)
> at org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:68)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2299)
> at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:989)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:384)
> at org.apache.spark.rdd.RDD.collect(RDD.scala:988)
> at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:345)
> at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:372)
> at org.apache.spark.sql.execution.QueryExecution.hiveResultString(QueryExecution.scala:127)
> at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver$$anonfun$run$1.apply(SparkSQLDriver.scala:66)
> at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver$$anonfun$run$1.apply(SparkSQLDriver.scala:66)
> at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:95)
> at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:144)
> at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:86)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:789)
> at 
> 

[GitHub] [carbondata] asfgit closed pull request #4089: [CARBONDATA-4117][[CARBONDATA-4123] cg index and bloom index query issue with Index server

2021-02-17 Thread GitBox


asfgit closed pull request #4089:
URL: https://github.com/apache/carbondata/pull/4089


   







[GitHub] [carbondata] kunal642 commented on pull request #4093: [CARBONDATA-4126] Concurrent compaction failed with load on table.

2021-02-17 Thread GitBox


kunal642 commented on pull request #4093:
URL: https://github.com/apache/carbondata/pull/4093#issuecomment-780665646


   retest this please







[GitHub] [carbondata] kunal642 commented on a change in pull request #4093: [CARBONDATA-4126] Concurrent compaction failed with load on table.

2021-02-17 Thread GitBox


kunal642 commented on a change in pull request #4093:
URL: https://github.com/apache/carbondata/pull/4093#discussion_r577741017



##
File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
##
@@ -92,19 +92,22 @@ class CarbonTableCompactor(
       val lastSegment = sortedSegments.get(sortedSegments.size() - 1)
       val compactedLoad = CarbonDataMergerUtil.getMergedLoadName(loadsToMerge)
       var segmentLocks: ListBuffer[ICarbonLock] = ListBuffer.empty
+      val validSegments = new java.util.ArrayList[LoadMetadataDetails]
       loadsToMerge.asScala.foreach { segmentId =>
         val segmentLock = CarbonLockFactory
           .getCarbonLockObj(carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
             .getAbsoluteTableIdentifier,
             CarbonTablePath.addSegmentPrefix(segmentId.getLoadName) + LockUsage.LOCK)
-        if (!segmentLock.lockWithRetries()) {
-          throw new Exception(s"Failed to acquire lock on segment ${segmentId.getLoadName}," +
-            s" during compaction of table ${compactionModel.carbonTable.getQualifiedName}")
+        if (segmentLock.lockWithRetries()) {
+          validSegments.add(segmentId)
+          segmentLocks += segmentLock
+        } else {
+          LOGGER.warn(s"Failed to acquire lock on segment ${segmentId.getLoadName}, " +
+            s"during compaction of table ${compactionModel.carbonTable.getQualifiedName}")
         }
-        segmentLocks += segmentLock
       }
       try {
-        scanSegmentsAndSubmitJob(loadsToMerge, compactedSegments, compactedLoad)
+        scanSegmentsAndSubmitJob(validSegments, compactedSegments, compactedLoad)

Review comment:
   ok
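
The hunk above changes the behaviour from "fail the compaction when any segment lock cannot be taken" to "skip that segment and compact the rest". A self-contained sketch of that pattern is shown below. `SegmentLock`, `InMemoryLock` and `lockAvailableSegments` are hypothetical names made up for this illustration and are not CarbonData's `ICarbonLock` or `CarbonLockFactory`; only the control flow is meant to mirror the diff.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for a per-segment lock; not CarbonData's ICarbonLock.
trait SegmentLock {
  def lockWithRetries(): Boolean
  def unlock(): Boolean
}

// In-memory lock used only for this illustration: it can be acquired
// only when constructed with available = true.
final class InMemoryLock(available: Boolean) extends SegmentLock {
  private var held = false
  def lockWithRetries(): Boolean = { held = available; held }
  def unlock(): Boolean = { val wasHeld = held; held = false; wasHeld }
}

object CompactionLockSketch {
  // Returns the segments whose lock was acquired, together with the locks
  // that must be released once the compaction job has finished.
  def lockAvailableSegments(
      segments: Seq[String],
      lockFor: String => SegmentLock): (Seq[String], Seq[SegmentLock]) = {
    val validSegments = ListBuffer.empty[String]
    val heldLocks = ListBuffer.empty[SegmentLock]
    segments.foreach { segment =>
      val lock = lockFor(segment)
      if (lock.lockWithRetries()) {
        validSegments += segment
        heldLocks += lock
      } else {
        // Skip the segment instead of failing the whole compaction.
        println(s"WARN: failed to acquire lock on segment $segment, skipping it")
      }
    }
    (validSegments.toList, heldLocks.toList)
  }

  def main(args: Array[String]): Unit = {
    val busy = Set("Segment_2") // pretend a concurrent load holds this one
    val (valid, locks) = lockAvailableSegments(
      Seq("Segment_0", "Segment_1", "Segment_2"),
      name => new InMemoryLock(!busy.contains(name)))
    try {
      println(s"Compacting: ${valid.mkString(", ")}")
    } finally {
      locks.foreach(_.unlock()) // release only the locks actually held
    }
  }
}
```

Collecting a lock only after it has been acquired means the cleanup step never tries to release a lock the compactor does not hold, which appears to be the point of moving `segmentLocks += segmentLock` inside the success branch.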









[GitHub] [carbondata] kunal642 commented on pull request #4089: [CARBONDATA-4117][[CARBONDATA-4123] cg index and bloom index query issue with Index server

2021-02-17 Thread GitBox


kunal642 commented on pull request #4089:
URL: https://github.com/apache/carbondata/pull/4089#issuecomment-780665000


   LGTM







[jira] [Resolved] (CARBONDATA-4124) Refresh MV which does not exist is not throwing proper message

2021-02-17 Thread Akash R Nilugal (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akash R Nilugal resolved CARBONDATA-4124.
-
Fix Version/s: 2.2.0
 Assignee: Indhumathi Muthu Murugesh
   Resolution: Fixed

> Refresh MV which does not exist is not throwing proper message
> --
>
> Key: CARBONDATA-4124
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4124
> Project: CarbonData
> Issue Type: Bug
> Reporter: Indhumathi Muthu Murugesh
> Assignee: Indhumathi Muthu Murugesh
> Priority: Minor
> Fix For: 2.2.0
>
> Time Spent: 3h
> Remaining Estimate: 0h
>






[GitHub] [carbondata] asfgit closed pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message

2021-02-17 Thread GitBox


asfgit closed pull request #4091:
URL: https://github.com/apache/carbondata/pull/4091


   







[GitHub] [carbondata] akashrn5 commented on pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message

2021-02-17 Thread GitBox


akashrn5 commented on pull request #4091:
URL: https://github.com/apache/carbondata/pull/4091#issuecomment-780642185


   LGTM







[GitHub] [carbondata] akashrn5 removed a comment on pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message

2021-02-17 Thread GitBox


akashrn5 removed a comment on pull request #4091:
URL: https://github.com/apache/carbondata/pull/4091#issuecomment-779619403


   LGTM







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4091:
URL: https://github.com/apache/carbondata/pull/4091#issuecomment-780630691


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5482/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4091:
URL: https://github.com/apache/carbondata/pull/4091#issuecomment-780624830


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3718/
   







[GitHub] [carbondata] Indhumathi27 commented on pull request #4095: [CARBONDATA-3962] Fixed concurrent load failure with flat folder structure.

2021-02-17 Thread GitBox


Indhumathi27 commented on pull request #4095:
URL: https://github.com/apache/carbondata/pull/4095#issuecomment-780616594


   retest this please







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4089: [CARBONDATA-4117][[CARBONDATA-4123] cg index and bloom index query issue with Index server

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4089:
URL: https://github.com/apache/carbondata/pull/4089#issuecomment-780605031


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3717/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4089: [CARBONDATA-4117][[CARBONDATA-4123] cg index and bloom index query issue with Index server

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4089:
URL: https://github.com/apache/carbondata/pull/4089#issuecomment-780600848


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5481/
   







[GitHub] [carbondata] Indhumathi27 commented on pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message

2021-02-17 Thread GitBox


Indhumathi27 commented on pull request #4091:
URL: https://github.com/apache/carbondata/pull/4091#issuecomment-780563734


   retest this please







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4091:
URL: https://github.com/apache/carbondata/pull/4091#issuecomment-780543770











[GitHub] [carbondata] ShreelekhyaG commented on pull request #4089: [CARBONDATA-4117][[CARBONDATA-4123] cg index and bloom index query issue with Index server

2021-02-17 Thread GitBox


ShreelekhyaG commented on pull request #4089:
URL: https://github.com/apache/carbondata/pull/4089#issuecomment-780542693


   retest this please







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4089: [CARBONDATA-4117][[CARBONDATA-4123] cg index and bloom index query issue with Index server

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4089:
URL: https://github.com/apache/carbondata/pull/4089#issuecomment-780536006


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3715/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4089: [CARBONDATA-4117][[CARBONDATA-4123] cg index and bloom index query issue with Index server

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4089:
URL: https://github.com/apache/carbondata/pull/4089#issuecomment-780532287


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5479/
   







[GitHub] [carbondata] Indhumathi27 commented on pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message

2021-02-17 Thread GitBox


Indhumathi27 commented on pull request #4091:
URL: https://github.com/apache/carbondata/pull/4091#issuecomment-780497167


   retest this please







[GitHub] [carbondata] akashrn5 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-17 Thread GitBox


akashrn5 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r577533482



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala
##
@@ -37,11 +38,24 @@ case class CarbonCleanFilesCommand(
 databaseNameOp: Option[String],
 tableName: String,
 options: Map[String, String] = Map.empty,
+dryRun: Boolean,
 isInternalCleanCall: Boolean = false)
   extends DataCommand {
 
   val LOGGER: Logger = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
 
+  override def output: Seq[AttributeReference] = {
+if (dryRun) {
+  Seq(
+AttributeReference("Size Freed", LongType, nullable = false)(),
+AttributeReference("Trash Data Remaining", LongType, nullable = 
false)())
+} else {
+  Seq(
+AttributeReference("Size Freed", LongType, nullable = false)(),
+AttributeReference("Trash Data Remaining", LongType, nullable = 
false)())
+}

Review comment:
   The if and else blocks are identical. I think it is better to return these 
rows only in the case of a dry run.

##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/trash/DataTrashManager.scala
##
@@ -87,13 +101,28 @@ object DataTrashManager {
 }
   }
 
-  private def checkAndCleanTrashFolder(carbonTable: CarbonTable, 
isForceDelete: Boolean): Unit = {
+  def cleanFilesDryRunOperation (
+  carbonTable: CarbonTable,
+  isForceDelete: Boolean,
+  cleanStaleInProgress: Boolean,
+  partitionSpecs: Option[Seq[PartitionSpec]] = None): Seq[Long] = {
+// get size freed from the trash folder
+val trashFolderSizeStats = checkAndCleanTrashFolder(carbonTable, 
isForceDelete, isDryRun = true)
+// get size that will be deleted (MFD, COmpacted, Inprogress segments)
+val expiredSegmentsSizeStats = dryRunOnExpiredSegments(carbonTable, 
isForceDelete,
+  cleanStaleInProgress, partitionSpecs)
+Seq(trashFolderSizeStats.head + expiredSegmentsSizeStats.head, 
trashFolderSizeStats(1) +
+expiredSegmentsSizeStats(1))
+  }
+
+  private def checkAndCleanTrashFolder(carbonTable: CarbonTable, 
isForceDelete: Boolean,
+  isDryRun: Boolean): Seq[Long] = {

Review comment:
   I think we are mixing the dry run option with force delete, which makes the 
code and the handling of option combinations complex. When the user asks for a 
dry run, it should be clear that no other options are considered: we only 
report how much data, and which data, would be cleaned. No files are deleted or 
cleared during a dry run. That would be easier to code and cleaner, maybe as a 
new class or a new method in the clean files command class. What do you think, 
@ajantha-bhat @QiangCai?









[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4091:
URL: https://github.com/apache/carbondata/pull/4091#issuecomment-780491628







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (CARBONDATA-4131) Concurrent load on table with flat folder structure fails with FileNotFound

2021-02-17 Thread Nihal kumar ojha (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nihal kumar ojha resolved CARBONDATA-4131.
--
Resolution: Duplicate

> Concurrent load on table with flat folder structure fails with FileNotFound
> ---
>
> Key: CARBONDATA-4131
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4131
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Commented] (CARBONDATA-4131) Concurrent load on table with flat folder structure fails with FileNotFound

2021-02-17 Thread Nihal kumar ojha (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285791#comment-17285791
 ] 

Nihal kumar ojha commented on CARBONDATA-4131:
--

Duplicate of CARBONDATA-3962

> Concurrent load on table with flat folder structure fails with FileNotFound
> ---
>
> Key: CARBONDATA-4131
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4131
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[GitHub] [carbondata] akashrn5 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-17 Thread GitBox


akashrn5 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r577527003



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1297,4 +1359,37 @@ public static TableStatusReturnTuple 
separateVisibleAndInvisibleSegments(
   return new HashMap<>(0);
 }
   }
+
+  public static long partitionTableSegmentSize(CarbonTable carbonTable, 
LoadMetadataDetails

Review comment:
   Yes, it is better not to mix the dry run size calculation with the actual 
clean files logic. Keep them separate, so that the user knows for sure that a 
dry run might take some time, because it calculates sizes.









[GitHub] [carbondata] akashrn5 commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-17 Thread GitBox


akashrn5 commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r577523404



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1072,7 +1097,22 @@ public static void 
deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean
 isUpdateRequired(isForceDeletion, carbonTable,
 identifier, details, cleanStaleInprogress);
 if (!tuple2.isUpdateRequired) {
-  return;
+  try {
+for (LoadMetadataDetails oneLoad : details) {
+  if (isExpiredSegment(oneLoad, 
carbonTable.getAbsoluteTableIdentifier())) {
+if (!carbonTable.isHivePartitionTable()) {
+  trashSizeRemaining += 
FileFactory.getDirectorySize(CarbonTablePath
+.getSegmentPath(carbonTable.getTablePath(), 
oneLoad.getLoadName()));
+} else {
+  trashSizeRemaining += 
partitionTableSegmentSize(carbonTable, oneLoad,
+details, partitionSpecs);
+}
+  }
+}
+  } catch (Exception e) {
+LOG.error("Unable to calculate size of garbage data", e);
+  }
+  return new long[]{sizeFreed, trashSizeRemaining};

Review comment:
   Yes, I agree with @ajantha-bhat; I meant the same.









[GitHub] [carbondata] ShreelekhyaG commented on pull request #4089: [CARBONDATA-4117][[CARBONDATA-4123] cg index and bloom index query issue with Index server

2021-02-17 Thread GitBox


ShreelekhyaG commented on pull request #4089:
URL: https://github.com/apache/carbondata/pull/4089#issuecomment-780480217


   retest this please







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-17 Thread GitBox


ajantha-bhat commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r577500588



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/events/CleanFilesEvents.scala
##
@@ -26,5 +26,6 @@ case class CleanFilesPreEvent(carbonTable: CarbonTable, 
sparkSession: SparkSessi
 case class CleanFilesPostEvent(
 carbonTable: CarbonTable,
 sparkSession: SparkSession,
-options: Map[String, String])
+options: Map[String, String],
+dryRun: Boolean)

Review comment:
   Why not send this in the options map itself?

##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/trash/DataTrashManager.scala
##
@@ -112,13 +141,91 @@ object DataTrashManager {
   carbonTable: CarbonTable,
   isForceDelete: Boolean,
   cleanStaleInProgress: Boolean,
-  partitionSpecsOption: Option[Seq[PartitionSpec]]): Unit = {
+  partitionSpecsOption: Option[Seq[PartitionSpec]]): Seq[Long] = {
 val partitionSpecs = partitionSpecsOption.map(_.asJava).orNull
-SegmentStatusManager.deleteLoadsAndUpdateMetadata(carbonTable,
+val sizeStatistics = 
SegmentStatusManager.deleteLoadsAndUpdateMetadata(carbonTable,
   isForceDelete, partitionSpecs, cleanStaleInProgress, true)
 if (carbonTable.isHivePartitionTable && partitionSpecsOption.isDefined) {
   SegmentFileStore.cleanSegments(carbonTable, partitionSpecs, 
isForceDelete)
 }
+sizeStatistics
+  }
+
+  private def dryRunOnExpiredSegments(
+  carbonTable: CarbonTable,
+  isForceDelete: Boolean,
+  cleanStaleInProgress: Boolean,
+  partitionSpecsOption: Option[Seq[PartitionSpec]]): Seq[Long] = {
+var sizeFreed: Long = 0
+var trashSizeRemaining: Long = 0
+val loadMetadataDetails = 
SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+if (SegmentStatusManager.isLoadDeletionRequired(loadMetadataDetails)) {
+  loadMetadataDetails.foreach { oneLoad =>
+val segmentFilePath = 
CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath,
+  oneLoad.getSegmentFile)
+if (DeleteLoadFolders.canDeleteThisLoad(oneLoad, isForceDelete, 
cleanStaleInProgress)) {
+  // No need to consider physical data for external segments, only 
consider metadata.
+  if (oneLoad.getPath() == null || 
oneLoad.getPath().equalsIgnoreCase("NA")) {
+if (!carbonTable.isHivePartitionTable) {
+  sizeFreed += 
FileFactory.getDirectorySize(CarbonTablePath.getSegmentPath(carbonTable
+  .getTablePath, oneLoad.getLoadName))
+} else {
+  sizeFreed += partitionTableSegmentSize(carbonTable, oneLoad, 
loadMetadataDetails,
+partitionSpecsOption)
+}
+  }
+  sizeFreed += FileFactory.getCarbonFile(segmentFilePath).getSize
+} else {
+  if (SegmentStatusManager.isExpiredSegment(oneLoad, carbonTable
+  .getAbsoluteTableIdentifier)) {
+if (!carbonTable.isHivePartitionTable) {
+  trashSizeRemaining += 
FileFactory.getDirectorySize(CarbonTablePath.getSegmentPath(

Review comment:
   I see that the size calculation code is duplicated in the dry run flow and 
in the clean up flow. Can we extract a common method and use it in both?
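
   A minimal sketch of the kind of shared helper this suggests, in plain Java. 
Everything here is hypothetical: `Segment`, `sizeStats` and the `sizeOf` 
callback are illustration-only names, not CarbonData APIs. The callback stands 
in for whichever lookup applies (for example `FileFactory.getDirectorySize` or 
`partitionTableSegmentSize` from the diff above).

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;

final class SegmentSizeSketch {

  /** Hypothetical stand-in for a segment entry; not a CarbonData class. */
  static final class Segment {
    final String name;
    final boolean deletable; // stands in for the "can delete this load" check
    final boolean expired;   // stands in for the "is expired segment" check
    final long bytes;

    Segment(String name, boolean deletable, boolean expired, long bytes) {
      this.name = name;
      this.deletable = deletable;
      this.expired = expired;
      this.bytes = bytes;
    }
  }

  /**
   * Single place for the size accounting, callable from both the dry run
   * flow and the clean up flow. Returns {sizeFreed, trashSizeRemaining}.
   */
  static long[] sizeStats(List<Segment> segments, ToLongFunction<Segment> sizeOf) {
    long sizeFreed = 0L;
    long trashSizeRemaining = 0L;
    for (Segment segment : segments) {
      if (segment.deletable) {
        // Data that this run would delete (or has deleted).
        sizeFreed += sizeOf.applyAsLong(segment);
      } else if (segment.expired) {
        // Expired data that is not yet eligible for deletion.
        trashSizeRemaining += sizeOf.applyAsLong(segment);
      }
    }
    return new long[] {sizeFreed, trashSizeRemaining};
  }

  public static void main(String[] args) {
    List<Segment> segments = Arrays.asList(
        new Segment("0", true, false, 10L),
        new Segment("1", false, true, 5L),
        new Segment("2", false, false, 7L));
    long[] stats = sizeStats(segments, s -> s.bytes);
    System.out.println("size freed = " + stats[0] + ", trash remaining = " + stats[1]);
  }
}
```

   Passing the size lookup in as a function keeps the partition and 
non-partition cases out of the accounting loop, so both flows can share it.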

##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/trash/DataTrashManager.scala
##
@@ -112,13 +141,91 @@ object DataTrashManager {
   carbonTable: CarbonTable,
   isForceDelete: Boolean,
   cleanStaleInProgress: Boolean,
-  partitionSpecsOption: Option[Seq[PartitionSpec]]): Unit = {
+  partitionSpecsOption: Option[Seq[PartitionSpec]]): Seq[Long] = {
 val partitionSpecs = partitionSpecsOption.map(_.asJava).orNull
-SegmentStatusManager.deleteLoadsAndUpdateMetadata(carbonTable,
+val sizeStatistics = 
SegmentStatusManager.deleteLoadsAndUpdateMetadata(carbonTable,
   isForceDelete, partitionSpecs, cleanStaleInProgress, true)
 if (carbonTable.isHivePartitionTable && partitionSpecsOption.isDefined) {
   SegmentFileStore.cleanSegments(carbonTable, partitionSpecs, 
isForceDelete)
 }
+sizeStatistics
+  }
+
+  private def dryRunOnExpiredSegments(
+  carbonTable: CarbonTable,
+  isForceDelete: Boolean,
+  cleanStaleInProgress: Boolean,
+  partitionSpecsOption: Option[Seq[PartitionSpec]]): Seq[Long] = {
+var sizeFreed: Long = 0
+var trashSizeRemaining: Long = 0
+val loadMetadataDetails = 
SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+if (SegmentStatusManager.isLoadDeletionRequired(loadMetadataDetails)) {
+  loadMetadataDetails.foreach { oneLoad =>
+val segmentFilePath = 
CarbonTablePath.getSegmentFilePath(carbonTable.getTablePath,
+  oneLoad.getSegmentFile)
+if (DeleteLoadFolders.canDeleteThisLoad(oneLoad, isForceDelete, 
cleanStaleInProgress)) {
+  // No need to 

[jira] [Commented] (CARBONDATA-4120) select queries against carbondata tables getting stuck when fired through Apache Hive

2021-02-17 Thread Indhumathi Muthu Murugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285745#comment-17285745
 ] 

Indhumathi Muthu Murugesh commented on CARBONDATA-4120:
---

Can I know the schema of your table?

> select queries against carbondata tables getting stuck when fired through 
> Apache Hive
> -
>
> Key: CARBONDATA-4120
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4120
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.1
> Environment: Apache Hive 3.1.2, Apache carbondata 2.0.1
>Reporter: suyash yadav
>Priority: Critical
>
> Hi Team, we need one more piece of help. We have created a table which has 
> around 172 million records and we have connected to this table through Apache 
> Hive, but whenever we run select count(*) on this table through Hive, the 
> query gets stuck. We can run the query successfully through spark shell, but 
> through Hive it always gets stuck. One more observation: whenever we run any 
> query which contains a join, the query gets stuck. Also, a query with a where 
> clause executes against a smaller table, but when we run it against the 
> bigger table, it also gets stuck. So could you guys guide us on how we can 
> run all these queries successfully without any issue?





[jira] [Commented] (CARBONDATA-4108) How to connect carbondata with Hive

2021-02-17 Thread Indhumathi Muthu Murugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285744#comment-17285744
 ] 

Indhumathi Muthu Murugesh commented on CARBONDATA-4108:
---

Hi,

Please follow the link below to set up carbondata with hive.

[https://github.com/apache/carbondata/blob/master/docs/hive-guide.md]

Let us know if your requirement is solved by this hive-carbon setup.

> How to connect carbondata with Hive
> ---
>
> Key: CARBONDATA-4108
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4108
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.1
> Environment: apache carbondata 2.0.1, spark 2.4.5, Hive 2.0
>Reporter: suyash yadav
>Priority: Major
> Fix For: 2.0.1
>
>
> Hi Team,
> We would like to know how to connect hive with carbondata. We are doing a POC 
> wherein we need to access a carbondata table through hive, but we need this 
> configuration with username and password. So our hive connection should have 
> some username and password configuration to connect to carbondata tables.
>  
> Could you guys please review the above requirement and suggest steps to 
> achieve the same.





[GitHub] [carbondata] akashrn5 commented on pull request #4091: [CARBONDATA-4124] Fix Refresh MV which does not exist error message

2021-02-17 Thread GitBox


akashrn5 commented on pull request #4091:
URL: https://github.com/apache/carbondata/pull/4091#issuecomment-780437251


   retest this please







[jira] [Resolved] (CARBONDATA-4125) SI compatability issue fix

2021-02-17 Thread Akash R Nilugal (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akash R Nilugal resolved CARBONDATA-4125.
-
Fix Version/s: 2.2.0
 Assignee: Indhumathi Muthu Murugesh
   Resolution: Fixed

> SI compatability issue fix
> --
>
> Key: CARBONDATA-4125
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4125
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthu Murugesh
>Assignee: Indhumathi Muthu Murugesh
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Refer 
> [http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Bug-SI-Compatibility-Issue-td105485.html]
>  for this issue





[GitHub] [carbondata] asfgit closed pull request #4087: [CARBONDATA-4125] SI compatability issue fix

2021-02-17 Thread GitBox


asfgit closed pull request #4087:
URL: https://github.com/apache/carbondata/pull/4087


   







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-17 Thread GitBox


ajantha-bhat commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r577458113



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1297,4 +1359,37 @@ public static TableStatusReturnTuple 
separateVisibleAndInvisibleSegments(
   return new HashMap<>(0);
 }
   }
+
+  public static long partitionTableSegmentSize(CarbonTable carbonTable, 
LoadMetadataDetails

Review comment:
   I am thinking that all clean files operations will now become slow because 
of this size calculation code, which needs to interact with the file system.
   
   By default we can have this size calculation, but if the user wants clean 
files to be faster, can we have an option such as `summary = false`, which 
skips the new size calculation and cleans the files faster? @akashrn5, 
@QiangCai, what do you think?
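
   A rough sketch of how such a switch could behave, in plain Java. The names 
`clean`, `sizeOf` and `delete` are hypothetical and only illustrate the idea; 
they are not part of CarbonData, and returning -1 for "no summary" is just an 
assumption made for this example.

```java
import java.util.function.LongSupplier;

final class CleanSummarySketch {

  /**
   * Runs the delete step and, only when summary is true, measures the size
   * first. Returns the measured bytes, or -1 when the summary was skipped.
   */
  static long clean(boolean summary, LongSupplier sizeOf, Runnable delete) {
    // The size has to be read before deleting, otherwise there is nothing
    // left to measure. With summary = false the file system is not touched
    // for size calculation at all.
    long freed = summary ? sizeOf.getAsLong() : -1L;
    delete.run();
    return freed;
  }

  public static void main(String[] args) {
    long freed = clean(true, () -> 128L, () -> System.out.println("deleting"));
    System.out.println("freed = " + freed);
    long skipped = clean(false, () -> 128L, () -> System.out.println("deleting"));
    System.out.println("freed = " + skipped);
  }
}
```

   With the flag off, the size supplier is never invoked, so the clean files 
path costs the same as it did before the statistics were added.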
   









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-17 Thread GitBox


ajantha-bhat commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r577458113



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1297,4 +1359,37 @@ public static TableStatusReturnTuple 
separateVisibleAndInvisibleSegments(
   return new HashMap<>(0);
 }
   }
+
+  public static long partitionTableSegmentSize(CarbonTable carbonTable, 
LoadMetadataDetails

Review comment:
   I am thinking that all clean files operations will now become slow because 
of this size calculation code, which needs to interact with the file system.
   
   So, can we have an option such as `summary = false`, which skips the new 
size calculation and cleans the files faster? @akashrn5, @QiangCai, what do 
you think?
   









[GitHub] [carbondata] akashrn5 commented on pull request #4087: [CARBONDATA-4125] SI compatability issue fix

2021-02-17 Thread GitBox


akashrn5 commented on pull request #4087:
URL: https://github.com/apache/carbondata/pull/4087#issuecomment-780424725


   LGTM







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-17 Thread GitBox


ajantha-bhat commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r577452054



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1125,13 +1165,32 @@ public static void 
deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean
 CarbonLockUtil.fileUnlock(carbonTableStatusLock, 
LockUsage.TABLE_STATUS_LOCK);
   }
   if (updateCompletionStatus) {
-DeleteLoadFolders
+long[] cleanFileSizeFreed = DeleteLoadFolders
 .physicalFactAndMeasureMetadataDeletion(carbonTable, 
newAddedLoadHistoryList,
   isForceDeletion, partitionSpecs, cleanStaleInprogress);
+sizeFreed += cleanFileSizeFreed[0];
+trashSizeRemaining += cleanFileSizeFreed[1];
+  }
+}
+  } else {
+try {
+  for (LoadMetadataDetails oneLoad : metadataDetails) {
+if (isExpiredSegment(oneLoad, 
carbonTable.getAbsoluteTableIdentifier())) {
+  if (!carbonTable.isHivePartitionTable()) {
+trashSizeRemaining += 
FileFactory.getDirectorySize(CarbonTablePath.getSegmentPath(
+  carbonTable.getTablePath(), oneLoad.getLoadName()));
+  } else {
+trashSizeRemaining += partitionTableSegmentSize(carbonTable, 
oneLoad,
+  metadataDetails, partitionSpecs);
+  }
+}
   }
+} catch (Exception e) {
+  LOG.error("Unable to calculate size of garbage data", e);
 }
   }
 }
+return new long[]{sizeFreed, trashSizeRemaining};

Review comment:
   When nothing is freed by clean files, returning 0 is OK. But when an 
exception happens in a dry run, it may be better to throw the exception.
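
   A small sketch of that distinction, in plain Java with hypothetical names 
(`SizeReader` and `trashSize` are made up for illustration and are not 
CarbonData APIs). Wrapping in `UncheckedIOException` is just one possible 
choice for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

final class DryRunErrorSketch {

  /** Hypothetical size lookup that may fail while reading the file system. */
  interface SizeReader {
    long read() throws IOException;
  }

  /**
   * In a dry run the size is the whole result, so a failure is rethrown.
   * In a real clean up the size is only a statistic, so a failure is
   * reported and 0 is returned.
   */
  static long trashSize(SizeReader reader, boolean dryRun) {
    try {
      return reader.read();
    } catch (IOException e) {
      if (dryRun) {
        throw new UncheckedIOException("Unable to calculate size of garbage data", e);
      }
      System.err.println("Unable to calculate size of garbage data: " + e.getMessage());
      return 0L;
    }
  }

  public static void main(String[] args) {
    SizeReader failing = () -> {
      throw new IOException("disk unavailable");
    };
    System.out.println("clean up result = " + trashSize(failing, false));
    try {
      trashSize(failing, true);
    } catch (UncheckedIOException e) {
      System.out.println("dry run failed: " + e.getMessage());
    }
  }
}
```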









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-17 Thread GitBox


ajantha-bhat commented on a change in pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#discussion_r577450283



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1072,7 +1097,22 @@ public static void 
deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean
 isUpdateRequired(isForceDeletion, carbonTable,
 identifier, details, cleanStaleInprogress);
 if (!tuple2.isUpdateRequired) {
-  return;
+  try {
+for (LoadMetadataDetails oneLoad : details) {
+  if (isExpiredSegment(oneLoad, 
carbonTable.getAbsoluteTableIdentifier())) {
+if (!carbonTable.isHivePartitionTable()) {
+  trashSizeRemaining += 
FileFactory.getDirectorySize(CarbonTablePath
+.getSegmentPath(carbonTable.getTablePath(), 
oneLoad.getLoadName()));
+} else {
+  trashSizeRemaining += 
partitionTableSegmentSize(carbonTable, oneLoad,
+details, partitionSpecs);
+}
+  }
+}
+  } catch (Exception e) {
+LOG.error("Unable to calculate size of garbage data", e);
+  }
+  return new long[]{sizeFreed, trashSizeRemaining};

Review comment:
   When nothing is freed by clean files, returning 0 is OK. But when an 
exception happens in a dry run, it may be better to throw the exception.









[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-780388512


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3713/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4089: [CARBONDATA-4117][[CARBONDATA-4123] cg index and bloom index query issue with Index server

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4089:
URL: https://github.com/apache/carbondata/pull/4089#issuecomment-780381020


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5476/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4072: [CARBONDATA-4110] Support clean files dry run operation and show statistics after clean files operation

2021-02-17 Thread GitBox


CarbonDataQA2 commented on pull request #4072:
URL: https://github.com/apache/carbondata/pull/4072#issuecomment-780379984


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5477/
   


