[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#issuecomment-738628213


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3306/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#issuecomment-738625327


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5065/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress seg

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#issuecomment-738625173


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3302/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress seg

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#issuecomment-738624282


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5061/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4039:
URL: https://github.com/apache/carbondata/pull/4039#issuecomment-738619646


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3300/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4039:
URL: https://github.com/apache/carbondata/pull/4039#issuecomment-738618787


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5059/
   







[jira] [Created] (CARBONDATA-4070) Handle the scenario mentioned in description for SI.

2020-12-03 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4070:


 Summary: Handle the scenario mentioned in description for SI.
 Key: CARBONDATA-4070
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4070
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


# SI creation should not be allowed on an SI table.
# An SI table should not be scanned for a LIKE filter on the main table.
# Drop column should not be allowed on an SI table.

Add FTs for all of the above scenarios and the sort-column-related scenarios.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] nihal0107 opened a new pull request #4042: [CARBONDATA-4069] handled set streaming for SI table or table having SI.

2020-12-03 Thread GitBox


nihal0107 opened a new pull request #4042:
URL: https://github.com/apache/carbondata/pull/4042


### Why is this PR needed?
Currently, `alter table set streaming = true` is allowed both on an SI table
and on a main table (MT) that has an SI.

### What changes were proposed in this PR?
   Setting streaming = true on an SI table, or on an MT that has an SI, is now
handled and throws an exception.
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- Yes
   
   
   







[jira] [Created] (CARBONDATA-4069) Alter table set streaming=true should not be allowed on SI table or table having SI.

2020-12-03 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4069:


 Summary: Alter table set streaming=true should not be allowed on 
SI table or table having SI.
 Key: CARBONDATA-4069
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4069
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


# Create a carbon table and an SI.
# Now set streaming = true on either the SI table or the main table.

Neither operation should be allowed, because SI is not supported on
streaming tables.

 

create table maintable2 (a string,b string,c int) STORED AS carbondata;

insert into maintable2 values('k','x',2);

create index m_indextable on table maintable2(b) AS 'carbondata';

ALTER TABLE maintable2 SET TBLPROPERTIES('streaming'='true');  => operation 
should not be allowed.

ALTER TABLE m_indextable SET TBLPROPERTIES('streaming'='true') => operation 
should not be allowed.
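The validation the issue asks for can be sketched as a small pre-check run before the table property is applied. The class and method names below are hypothetical illustrations, not CarbonData's actual API; the real check would live in CarbonData's alter-table handling.

```java
// Hypothetical sketch: reject 'streaming'='true' on an SI table, or on a
// main table that has an SI. Names are illustrative, not CarbonData API.
public class StreamingPropertyValidator {

    // isIndexTable: the target table is itself a secondary index table.
    // hasSecondaryIndex: the target is a main table with at least one SI.
    static void validateSetStreaming(boolean isIndexTable, boolean hasSecondaryIndex) {
        if (isIndexTable) {
            throw new UnsupportedOperationException(
                "streaming=true is not allowed on a secondary index table");
        }
        if (hasSecondaryIndex) {
            throw new UnsupportedOperationException(
                "streaming=true is not allowed on a main table that has a secondary index");
        }
    }

    public static void main(String[] args) {
        // A plain table with no SI passes the check.
        validateSetStreaming(false, false);
        System.out.println("plain table: allowed");
    }
}
```

Both ALTER statements in the example above would hit one of the two branches and fail before any metadata is changed.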





[GitHub] [carbondata] nihal0107 opened a new pull request #4041: [CARBONDATA-4068] handled set long string column on SI table.

2020-12-03 Thread GitBox


nihal0107 opened a new pull request #4041:
URL: https://github.com/apache/carbondata/pull/4041


### Why is this PR needed?
Currently, setting a column as a long string column when an SI is already
created on it is allowed. This operation should not be allowed, because SI
does not support long string columns.

### What changes were proposed in this PR?
   Setting long string on SI columns is now handled and throws an exception.
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- Yes
   
   
   







[jira] [Created] (CARBONDATA-4068) Alter table set long string should not be allowed on SI column.

2020-12-03 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-4068:


 Summary: Alter table set long string should not be allowed on SI column.
 Key: CARBONDATA-4068
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4068
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


# Create a table and create an SI.
# Now try to set the column on which the SI is created to the long string type.

The operation should not be allowed, because SI is not supported on long
string columns.

create table maintable (a string,b string,c int) STORED AS carbondata;

create index indextable on table maintable(b) AS 'carbondata';

insert into maintable values('k','x',2);

ALTER TABLE maintable SET TBLPROPERTIES('long_String_columns'='b');





[GitHub] [carbondata] akkio-97 commented on a change in pull request #4001: [CARBONDATA-4029] [CARBONDATA-3908] Issue while adding segments through alter add segment command

2020-12-03 Thread GitBox


akkio-97 commented on a change in pull request #4001:
URL: https://github.com/apache/carbondata/pull/4001#discussion_r535857150



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
##
@@ -89,8 +89,13 @@ case class CarbonAddLoadCommand(
   throw new ConcurrentOperationException(carbonTable, "insert overwrite", 
"add segment")
 }
 
-val inputPath = options.getOrElse(
+var givenPath = options.getOrElse(
   "path", throw new UnsupportedOperationException("PATH is mandatory"))
+// remove file separator if already present
+if (givenPath.charAt(givenPath.length - 1) == '/') {

Review comment:
   done









[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4040: [WIP][CI TEST]

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4040:
URL: https://github.com/apache/carbondata/pull/4040#issuecomment-738578591


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3298/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4040: [WIP][CI TEST]

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4040:
URL: https://github.com/apache/carbondata/pull/4040#issuecomment-738576741


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5057/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4032: [WIP][CARBONDATA-4065] Support MERGE INTO SQL Command

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#issuecomment-738549058


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3297/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4032: [WIP][CARBONDATA-4065] Support MERGE INTO SQL Command

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#issuecomment-738546905


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5056/
   







[GitHub] [carbondata] Zhangshunyu opened a new pull request #4040: [WIP][CI TEST]

2020-12-03 Thread GitBox


Zhangshunyu opened a new pull request #4040:
URL: https://github.com/apache/carbondata/pull/4040


### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-738528336


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3296/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-738527880


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5055/
   







[jira] [Resolved] (CARBONDATA-4054) Size control of minor compaction

2020-12-03 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4054.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> Size control of minor compaction
> 
>
> Key: CARBONDATA-4054
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4054
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: ZHANGSHUNYU
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> Currently, minor compaction only considers the number of segments, and major
> compaction only considers the total size of segments. Consider a scenario
> where a user wants to run minor compaction by segment count, but does not
> want to merge segments whose data size exceeds a threshold (for example,
> 2 GB), since merging such large segments is unnecessary and time-consuming.
> We therefore need a parameter controlling the size threshold of segments
> included in minor compaction, so that users can exclude segments whose data
> size exceeds the threshold; of course, a default value must be provided.
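A size-controlled candidate selection like the one described could be sketched as a simple filter over the candidate segments. `Segment` and the 2 GB threshold below are illustrative stand-ins, not CarbonData's actual types or property names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of size-controlled minor compaction candidate selection.
// "Segment" and the threshold are hypothetical names, not CarbonData's API.
public class MinorCompactionFilter {
    static class Segment {
        final String id;
        final long dataSizeBytes;
        Segment(String id, long dataSizeBytes) {
            this.id = id;
            this.dataSizeBytes = dataSizeBytes;
        }
    }

    // Keep only segments whose data size is at or below the threshold;
    // larger segments are skipped so minor compaction stays cheap.
    static List<Segment> selectForMinorCompaction(List<Segment> segments, long thresholdBytes) {
        List<Segment> candidates = new ArrayList<>();
        for (Segment s : segments) {
            if (s.dataSizeBytes <= thresholdBytes) {
                candidates.add(s);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<Segment> all = new ArrayList<>();
        all.add(new Segment("0", 512L * 1024 * 1024));       // 512 MB
        all.add(new Segment("1", 3L * 1024 * 1024 * 1024));  // 3 GB, above a 2 GB threshold
        all.add(new Segment("2", 100L * 1024 * 1024));       // 100 MB
        for (Segment s : selectForMinorCompaction(all, 2L * 1024 * 1024 * 1024)) {
            System.out.println(s.id);
        }
    }
}
```

With a 2 GB threshold, only segments 0 and 2 remain candidates; the 3 GB segment is left untouched by minor compaction.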





[GitHub] [carbondata] asfgit closed pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction

2020-12-03 Thread GitBox


asfgit closed pull request #4020:
URL: https://github.com/apache/carbondata/pull/4020


   







[GitHub] [carbondata] Zhangshunyu commented on pull request #4032: [CARBONDATA-4065] Support MERGE INTO SQL Command

2020-12-03 Thread GitBox


Zhangshunyu commented on pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#issuecomment-738502710


   Added support for inserting literals.
   TODO: add docs and more test cases.







[GitHub] [carbondata] shenjiayu17 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

2020-12-03 Thread GitBox


shenjiayu17 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-738483684


   retest this please







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress seg

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#issuecomment-738280044











[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4039:
URL: https://github.com/apache/carbondata/pull/4039#issuecomment-738201557


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3293/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4039:
URL: https://github.com/apache/carbondata/pull/4039#issuecomment-738200415


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5052/
   







[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale

2020-12-03 Thread GitBox


vikramahuja1001 commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535465503



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1095,7 +1099,8 @@ public static void 
deleteLoadsAndUpdateMetadata(CarbonTable carbonTable, boolean
 // if execute command 'clean files' or the number of invisible 
segment info
 // exceeds the value of 'carbon.invisible.segments.preserve.count',
 // it need to append the invisible segment list to 
'tablestatus.history' file.
-if (isForceDeletion || (invisibleSegmentCnt > 
invisibleSegmentPreserveCnt)) {
+if (cleanStaleInprogress || cleanCompactedAndMFD || 
(invisibleSegmentCnt >

Review comment:
   Changed it to isCleanFilesOperation. If we do not keep this, we will change
the behaviour of show history segments, as that file would never be updated:
previously, clean files was always a force operation, so it would always enter
that if condition for clean files.









[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale

2020-12-03 Thread GitBox


vikramahuja1001 commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535287432



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1039,17 +1039,19 @@ private static void 
writeLoadMetadata(AbsoluteTableIdentifier identifier,
 }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, 
CarbonTable carbonTable,
-  AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] 
details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, 
boolean

Review comment:
   i have put it back

##
File path: 
core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##
@@ -173,40 +176,68 @@ public boolean accept(CarbonFile file) {
   }
 
   private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad,
-  boolean isForceDelete) {
-if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() ||
-SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() ||
-SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() ||
-SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS == 
oneLoad.getSegmentStatus())
-&& oneLoad.getVisibility().equalsIgnoreCase("true")) {
-  if (isForceDelete) {
-return true;
-  }
-  long deletionTime = oneLoad.getModificationOrDeletionTimestamp();
-  return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && 
CarbonUpdateUtil
-  .isMaxQueryTimeoutExceeded(deletionTime);
+  boolean cleanStaleInProgress, boolean cleanCompactedAndMFD) {
+if (oneLoad.getVisibility().equalsIgnoreCase("true")) {

Review comment:
   It is handled in the checkIfLoadCanBeDeletedPhysically method; that method
does not check visibility before deleting.
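The eligibility check being refactored in the diff above amounts to: a visible MFD, Compacted, or stale in-progress segment may be deleted if the clean operation forces it, or if the retention timeout has passed. A simplified sketch, with plain types standing in for LoadMetadataDetails, TrashUtil, and CarbonUpdateUtil:

```java
// Simplified sketch of the segment-deletion eligibility check discussed above.
// SegmentStatus values and the retention check are modeled with plain types;
// the real code consults LoadMetadataDetails, TrashUtil and CarbonUpdateUtil.
public class DeletionCheck {
    enum SegmentStatus { SUCCESS, MARKED_FOR_DELETE, COMPACTED, INSERT_IN_PROGRESS }

    static boolean canBeDeleted(SegmentStatus status, boolean visible,
                                boolean forceDelete, boolean retentionExceeded) {
        // Only visible MFD / Compacted / stale in-progress segments are candidates.
        boolean candidate = visible
            && (status == SegmentStatus.MARKED_FOR_DELETE
                || status == SegmentStatus.COMPACTED
                || status == SegmentStatus.INSERT_IN_PROGRESS);
        if (!candidate) {
            return false;
        }
        // Without force, deletion additionally requires the retention
        // timeout (and query timeout, in the real code) to have passed.
        return forceDelete || retentionExceeded;
    }

    public static void main(String[] args) {
        System.out.println(canBeDeleted(SegmentStatus.COMPACTED, true, true, false)); // prints true
    }
}
```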

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala
##
@@ -117,9 +119,9 @@ case class CarbonCleanFilesCommand(
 CleanFilesUtil.cleanStaleSegments(carbonTable)
   }
   if (forceTableClean) {

Review comment:
   it is removed in the PR4013 by David

##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1039,17 +1039,19 @@ private static void 
writeLoadMetadata(AbsoluteTableIdentifier identifier,
 }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, 
CarbonTable carbonTable,
-  AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] 
details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, 
boolean

Review comment:
   updated as per the above discussion

##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1039,17 +1039,19 @@ private static void 
writeLoadMetadata(AbsoluteTableIdentifier identifier,
 }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, 
CarbonTable carbonTable,

Review comment:
   done









[GitHub] [carbondata] Indhumathi27 opened a new pull request #4039: [WIP] Refactor and Fix Insert into partition issue with FileMergeSortComparator

2020-12-03 Thread GitBox


Indhumathi27 opened a new pull request #4039:
URL: https://github.com/apache/carbondata/pull/4039


   
### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress seg

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#issuecomment-738128556


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3291/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogress seg

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#issuecomment-738128162


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5049/
   







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


ajantha-bhat commented on a change in pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#discussion_r535392789



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
##
@@ -90,7 +90,6 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel,
 
 while (loadsToMerge.size() > 1 || needSortSingleSegment(loadsToMerge)) {
   val lastSegment = sortedSegments.get(sortedSegments.size() - 1)
-  deletePartialLoadsInCompaction()

Review comment:
   I think this will go as a stale segment into the trash folder when clean
files is called, right?









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


ajantha-bhat commented on a change in pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#discussion_r535391315



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/DropIndexCommand.scala
##
@@ -210,12 +210,6 @@ private[sql] case class DropIndexCommand(
   logError("Table metadata unlocking is unsuccessful, index table may 
be in stale state")
 }
   }
-  // in case if the the physical folders still exists for the index table
-  // but the carbon and hive info for the index table is removed,
-  // DROP INDEX IF EXISTS should clean up those physical directories
-  if (ifExistsSet && carbonTable.isEmpty) {

Review comment:
   I agree with @akashrn5: if the code is buggy, we need to fix it. If we
remove it, it might bring back the original issues that caused this code
change.









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp

2020-12-03 Thread GitBox


ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535385941



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/cleanfiles/TestCleanFileCommand.scala
##
@@ -346,4 +346,10 @@ class TestCleanFileCommand extends QueryTest with 
BeforeAndAfterAll {
 sql(s"""INSERT INTO CLEANTEST SELECT "abc", 1, "name)
   }
 
+  def removeSegmentEntryFromTableStatusFile(carbonTable: CarbonTable, 
segmentNo: String) : Unit = {

Review comment:
   @vikramahuja1001 : yes, instead of defining in each file, move it to 
some test util.









[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#issuecomment-738119523


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3292/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#issuecomment-738118531


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5050/
   







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4001: [CARBONDATA-4029] [CARBONDATA-3908] Issue while adding segments through alter add segment command

2020-12-03 Thread GitBox


ajantha-bhat commented on a change in pull request #4001:
URL: https://github.com/apache/carbondata/pull/4001#discussion_r535365912



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
##
@@ -89,8 +89,13 @@ case class CarbonAddLoadCommand(
   throw new ConcurrentOperationException(carbonTable, "insert overwrite", 
"add segment")
 }
 
-val inputPath = options.getOrElse(
+var givenPath = options.getOrElse(
   "path", throw new UnsupportedOperationException("PATH is mandatory"))
+// remove file separator if already present
+if (givenPath.charAt(givenPath.length - 1) == '/') {
+  givenPath = givenPath.substring(0, givenPath.length - 1)
+}
+val inputPath = givenPath

Review comment:
   Can you modify the current add segment test case so that one path has a / at the end and the other doesn't? That way the test case will also cover the new code.









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4001: [CARBONDATA-4029] [CARBONDATA-3908] Issue while adding segments through alter add segment command

2020-12-03 Thread GitBox


ajantha-bhat commented on a change in pull request #4001:
URL: https://github.com/apache/carbondata/pull/4001#discussion_r535365097



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
##
@@ -89,8 +89,13 @@ case class CarbonAddLoadCommand(
   throw new ConcurrentOperationException(carbonTable, "insert overwrite", 
"add segment")
 }
 
-val inputPath = options.getOrElse(
+var givenPath = options.getOrElse(
   "path", throw new UnsupportedOperationException("PATH is mandatory"))
+// remove file separator if already present
+if (givenPath.charAt(givenPath.length - 1) == '/') {

Review comment:
   ```suggestion
   if (givenPath.charAt(givenPath.length - 1) == CarbonCommonConstants.FILE_SEPARATOR) {
   ```
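
   The point of the suggestion above can be sketched outside the CarbonData codebase. This is a minimal, illustrative Java version of the trailing-separator stripping being reviewed; `FILE_SEPARATOR` here is a stand-in for the `CarbonCommonConstants.FILE_SEPARATOR` constant, and `PathUtil`/`stripTrailingSeparator` are hypothetical names, not CarbonData API:

```java
// Hedged sketch: strip a single trailing file separator from a given path,
// comparing against a named constant instead of a hard-coded '/'.
public class PathUtil {
    // Stand-in for CarbonCommonConstants.FILE_SEPARATOR
    static final String FILE_SEPARATOR = "/";

    static String stripTrailingSeparator(String path) {
        // keep a bare "/" intact; only drop a redundant trailing separator
        if (path.length() > 1 && path.endsWith(FILE_SEPARATOR)) {
            return path.substring(0, path.length() - 1);
        }
        return path;
    }
}
```

   Using the constant keeps the normalization consistent with whatever separator the rest of the code uses, which is what the reviewer is asking for.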









[GitHub] [carbondata] akashrn5 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogr

2020-12-03 Thread GitBox


akashrn5 commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535359638



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/CleanFilesPostEventListener.scala
##
@@ -53,13 +51,15 @@ class CleanFilesPostEventListener extends 
OperationEventListener with Logging {
 val carbonTable = cleanFilesPostEvent.carbonTable
 val indexTables = CarbonIndexUtil
   .getIndexCarbonTables(carbonTable, cleanFilesPostEvent.sparkSession)
+val force = cleanFilesPostEvent.ifForceDeletion

Review comment:
   ```suggestion
   val isForceDelete = cleanFilesPostEvent.ifForceDeletion
   ```









[GitHub] [carbondata] akashrn5 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogr

2020-12-03 Thread GitBox


akashrn5 commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535359069



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonCleanFilesCommand.scala
##
@@ -58,8 +58,11 @@ case class CarbonCleanFilesCommand(
   var carbonTable: CarbonTable = _
   var cleanFileCommands: List[CarbonCleanFilesCommand] = List.empty
   val optionsMap = options.getOrElse(List.empty[(String, String)]).toMap
-  // forceClean will empty trash
+  // forceClean will clean the MFD and Compacted segments immediately and also 
empty the trash
+  // folder
   val forceClean = optionsMap.getOrElse("force", "false").toBoolean
+  // stale_inprogress will clean the In Progress segments

Review comment:
   Update the comment to say that it cleans based on the retention time, and cleans immediately when force is true.









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp

2020-12-03 Thread GitBox


ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535358582



##
File path: docs/clean-files.md
##
@@ -37,10 +38,23 @@ Clean files command is used to remove the Compacted, Marked 
For Delete ,In Progr
``` 
   Once the timestamp subdirectory is expired as per the configured expiration 
day value, that subdirectory is deleted from the trash folder in the subsequent 
clean files command.
 
-### FORCE DELETE TRASH
-The force option with clean files command deletes all the files and folders 
from the trash folder.
+### FORCE OPTION
+The force option with clean files command deletes all the files and folders from the trash folder and deletes the Marked For Delete and Compacted segments immediately.
 
   ```
   CLEAN FILES FOR TABLE TABLE_NAME options('force'='true')
   ```
-Since Clean Files operation with force option will delete data that can never 
be recovered, the force option by default is disabled. Clean files with force 
option is only allowed when the carbon property 
```carbon.clean.file.force.allowed``` is set to true. The default value of this 
property is false.
\ No newline at end of file
+Since Clean Files operation with force option will delete data that can never 
be recovered, the force option by default is disabled. Clean files with force 
option is only allowed when the carbon property 
```carbon.clean.file.force.allowed``` is set to true. The default value of this 
property is false.

Review comment:
   Move this section above the force section, since users have to enable this property before trying force.









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp

2020-12-03 Thread GitBox


ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535357362



##
File path: docs/clean-files.md
##
@@ -24,6 +24,7 @@ Clean files command is used to remove the Compacted, Marked 
For Delete ,In Progr
```
CLEAN FILES FOR TABLE TABLE_NAME
```
+The above clean files command will clean Marked For Delete and Compacted segments depending on the query time (default 1 hr) and the trash retention timeout (default 7 days). It will also delete the timestamp subdirectories from the trash folder after the expiration period (default 7 days, configurable).

Review comment:
   query time,  trash retention timeout  => use the original carbon 
property names 









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp

2020-12-03 Thread GitBox


ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535356095



##
File path: 
core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##
@@ -173,42 +176,73 @@ public boolean accept(CarbonFile file) {
   }
 
   private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad,
-  boolean isForceDelete) {
-if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() ||
-SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() ||
-SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() ||
-SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS == 
oneLoad.getSegmentStatus())
-&& oneLoad.getVisibility().equalsIgnoreCase("true")) {
-  if (isForceDelete) {
-return true;
-  }
-  long deletionTime = oneLoad.getModificationOrDeletionTimestamp();
-  return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && 
CarbonUpdateUtil
-  .isMaxQueryTimeoutExceeded(deletionTime);
+  boolean cleanStaleInProgress, boolean isForceDeletion) {
+if (oneLoad.getVisibility().equalsIgnoreCase("true")) {
+  return checkLoadDeletionLogic(oneLoad, isForceDeletion, 
cleanStaleInProgress);
 }
-
 return false;
   }
 
   private static boolean checkIfLoadCanBeDeletedPhysically(LoadMetadataDetails 
oneLoad,
-  boolean isForceDelete) {
+  boolean isForceDeletion, boolean cleanStaleInProgress) {
 // Check if the segment is added externally and path is set then do not 
delete it
-if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus()
-|| SegmentStatus.COMPACTED == oneLoad.getSegmentStatus()) && 
(oneLoad.getPath() == null
-|| oneLoad.getPath().equalsIgnoreCase("NA"))) {
-  if (isForceDelete) {
-return true;
-  }
-  long deletionTime = oneLoad.getModificationOrDeletionTimestamp();
-
-  return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && 
CarbonUpdateUtil
-  .isMaxQueryTimeoutExceeded(deletionTime);
-
+if (oneLoad.getPath() == null || oneLoad.getPath().equalsIgnoreCase("NA")) 
{
+  return checkLoadDeletionLogic(oneLoad,  isForceDeletion, 
cleanStaleInProgress);
 }
-
 return false;
   }
 
+  private static Boolean checkLoadDeletionLogic(LoadMetadataDetails oneLoad,
+  boolean isForceDeletion, boolean cleanStaleInProgress) {
+/*
+ * if cleanStaleInProgress == false and  force == false, clean MFD and 
Compacted
+ *  segments will depend on query timeout(1 hr) and 
trashRetentionTimeout(7 days, default).
+ *  For example:
+ *  If trashRetentionTimeout is 7 days and query timeout is 1 hr--> Delete 
after 7 days
+ *  If trashRetentionTimeout is 0 days and query timeout is 1 hr--> Delete 
after 1 hr
+ *
+ * if cleanStaleInProgress == false and  force == true, clean MFD and 
Compacted
+ *  segments immediately(Do not check for any timeout)
+ *
+ * if cleanStaleInProgress == true and  force == false, clean Stale 
Inprogress, MFD and
+ *  compacted segments after 7 days(taking carbon.trash.retention.time 
value)
+ *
+ * if cleanStaleInProgress == true and  force == true, clean MFD, 
Compacted and
+ *  stale inprogress segments immediately.(Do not check for any timeout)
+ */
+if (isForceDeletion) {
+  // immediately delete compacted and MFD
+  if (cleanStaleInProgress) {
+// immediately delete inprogress segments too
+return SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() 
|| SegmentStatus
+.INSERT_OVERWRITE_IN_PROGRESS == oneLoad.getSegmentStatus() || 
SegmentStatus
+.COMPACTED == oneLoad.getSegmentStatus() || 
SegmentStatus.MARKED_FOR_DELETE ==
+oneLoad.getSegmentStatus();
+  }
+  return SegmentStatus.COMPACTED ==
+  oneLoad.getSegmentStatus() || SegmentStatus.MARKED_FOR_DELETE == 
oneLoad
+  .getSegmentStatus();
+}
+long deletionTime = oneLoad.getModificationOrDeletionTimestamp();
+// in case there is no deletion or modification timestamp, take the load 
start time as
+// deleteTime
+if (deletionTime == 0) {
+  deletionTime = oneLoad.getLoadStartTime();
+}
+if (cleanStaleInProgress) {
+  // delete MFD, compacted and Inprogress segments after trash timeout
+  return (SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() 
|| SegmentStatus

Review comment:
   same as above
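
   The force/cleanStaleInProgress decision table documented in the diff's comment block can be sketched as a single Java predicate. This is an illustrative reduction under the assumptions stated in that comment, not the actual CarbonData implementation; `CleanFilesPolicy`, `canDelete`, and the `timeoutsExceeded` flag (standing in for the combined trash-retention and max-query-timeout checks) are hypothetical names:

```java
// Hedged sketch of the clean-files deletion decision:
// - MFD/Compacted segments are always candidates;
// - in-progress segments are candidates only when cleanStaleInProgress is set;
// - without force, candidates additionally wait for the retention/query timeouts.
public class CleanFilesPolicy {
    enum SegmentStatus {
        SUCCESS, MARKED_FOR_DELETE, COMPACTED,
        INSERT_IN_PROGRESS, INSERT_OVERWRITE_IN_PROGRESS
    }

    static boolean canDelete(SegmentStatus s, boolean isForceDeletion,
                             boolean cleanStaleInProgress, boolean timeoutsExceeded) {
        boolean mfdOrCompacted =
            s == SegmentStatus.MARKED_FOR_DELETE || s == SegmentStatus.COMPACTED;
        boolean inProgress =
            s == SegmentStatus.INSERT_IN_PROGRESS
                || s == SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS;
        boolean candidate = mfdOrCompacted || (cleanStaleInProgress && inProgress);
        // force skips the timeout check entirely; otherwise candidates must wait
        return isForceDeletion ? candidate : candidate && timeoutsExceeded;
    }
}
```

   Collapsing the four commented cases into one expression like this also makes the reviewer's point easy to check: in-progress statuses are only ever considered when `cleanStaleInProgress` is true.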









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp

2020-12-03 Thread GitBox


ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535355003



##
File path: 
core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##
@@ -173,42 +176,73 @@ public boolean accept(CarbonFile file) {
   }
 
   private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad,
-  boolean isForceDelete) {
-if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() ||
-SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() ||
-SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() ||
-SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS == 
oneLoad.getSegmentStatus())
-&& oneLoad.getVisibility().equalsIgnoreCase("true")) {
-  if (isForceDelete) {
-return true;
-  }
-  long deletionTime = oneLoad.getModificationOrDeletionTimestamp();
-  return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && 
CarbonUpdateUtil
-  .isMaxQueryTimeoutExceeded(deletionTime);
+  boolean cleanStaleInProgress, boolean isForceDeletion) {
+if (oneLoad.getVisibility().equalsIgnoreCase("true")) {
+  return checkLoadDeletionLogic(oneLoad, isForceDeletion, 
cleanStaleInProgress);
 }
-
 return false;
   }
 
   private static boolean checkIfLoadCanBeDeletedPhysically(LoadMetadataDetails 
oneLoad,
-  boolean isForceDelete) {
+  boolean isForceDeletion, boolean cleanStaleInProgress) {
 // Check if the segment is added externally and path is set then do not 
delete it
-if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus()
-|| SegmentStatus.COMPACTED == oneLoad.getSegmentStatus()) && 
(oneLoad.getPath() == null
-|| oneLoad.getPath().equalsIgnoreCase("NA"))) {
-  if (isForceDelete) {
-return true;
-  }
-  long deletionTime = oneLoad.getModificationOrDeletionTimestamp();
-
-  return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && 
CarbonUpdateUtil
-  .isMaxQueryTimeoutExceeded(deletionTime);
-
+if (oneLoad.getPath() == null || oneLoad.getPath().equalsIgnoreCase("NA")) 
{
+  return checkLoadDeletionLogic(oneLoad,  isForceDeletion, 
cleanStaleInProgress);
 }
-
 return false;
   }
 
+  private static Boolean checkLoadDeletionLogic(LoadMetadataDetails oneLoad,
+  boolean isForceDeletion, boolean cleanStaleInProgress) {
+/*
+ * if cleanStaleInProgress == false and  force == false, clean MFD and 
Compacted
+ *  segments will depend on query timeout(1 hr) and 
trashRetentionTimeout(7 days, default).
+ *  For example:
+ *  If trashRetentionTimeout is 7 days and query timeout is 1 hr--> Delete 
after 7 days
+ *  If trashRetentionTimeout is 0 days and query timeout is 1 hr--> Delete 
after 1 hr
+ *
+ * if cleanStaleInProgress == false and  force == true, clean MFD and 
Compacted
+ *  segments immediately(Do not check for any timeout)
+ *
+ * if cleanStaleInProgress == true and  force == false, clean Stale 
Inprogress, MFD and
+ *  compacted segments after 7 days(taking carbon.trash.retention.time 
value)
+ *
+ * if cleanStaleInProgress == true and  force == true, clean MFD, 
Compacted and
+ *  stale inprogress segments immediately.(Do not check for any timeout)
+ */
+if (isForceDeletion) {
+  // immediately delete compacted and MFD
+  if (cleanStaleInProgress) {
+// immediately delete inprogress segments too
+return SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() 
|| SegmentStatus
+.INSERT_OVERWRITE_IN_PROGRESS == oneLoad.getSegmentStatus() || 
SegmentStatus
+.COMPACTED == oneLoad.getSegmentStatus() || 
SegmentStatus.MARKED_FOR_DELETE ==

Review comment:
   Move the COMPACTED and MARKED_FOR_DELETE checks above and return if true.
   Then, in the in-progress branch, check only INSERT_OVERWRITE_IN_PROGRESS and INSERT_IN_PROGRESS.









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp

2020-12-03 Thread GitBox


ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535348536



##
File path: 
core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##
@@ -173,42 +176,73 @@ public boolean accept(CarbonFile file) {
   }
 
   private static boolean checkIfLoadCanBeDeleted(LoadMetadataDetails oneLoad,
-  boolean isForceDelete) {
-if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus() ||
-SegmentStatus.COMPACTED == oneLoad.getSegmentStatus() ||
-SegmentStatus.INSERT_IN_PROGRESS == oneLoad.getSegmentStatus() ||
-SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS == 
oneLoad.getSegmentStatus())
-&& oneLoad.getVisibility().equalsIgnoreCase("true")) {
-  if (isForceDelete) {
-return true;
-  }
-  long deletionTime = oneLoad.getModificationOrDeletionTimestamp();
-  return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && 
CarbonUpdateUtil
-  .isMaxQueryTimeoutExceeded(deletionTime);
+  boolean cleanStaleInProgress, boolean isForceDeletion) {
+if (oneLoad.getVisibility().equalsIgnoreCase("true")) {
+  return checkLoadDeletionLogic(oneLoad, isForceDeletion, 
cleanStaleInProgress);
 }
-
 return false;
   }
 
   private static boolean checkIfLoadCanBeDeletedPhysically(LoadMetadataDetails 
oneLoad,
-  boolean isForceDelete) {
+  boolean isForceDeletion, boolean cleanStaleInProgress) {
 // Check if the segment is added externally and path is set then do not 
delete it
-if ((SegmentStatus.MARKED_FOR_DELETE == oneLoad.getSegmentStatus()
-|| SegmentStatus.COMPACTED == oneLoad.getSegmentStatus()) && 
(oneLoad.getPath() == null
-|| oneLoad.getPath().equalsIgnoreCase("NA"))) {
-  if (isForceDelete) {
-return true;
-  }
-  long deletionTime = oneLoad.getModificationOrDeletionTimestamp();
-
-  return TrashUtil.isTrashRetentionTimeoutExceeded(deletionTime) && 
CarbonUpdateUtil
-  .isMaxQueryTimeoutExceeded(deletionTime);
-
+if (oneLoad.getPath() == null || oneLoad.getPath().equalsIgnoreCase("NA")) 
{
+  return checkLoadDeletionLogic(oneLoad,  isForceDeletion, 
cleanStaleInProgress);

Review comment:
   ```suggestion
  return canDeleteThisLoad(oneLoad, isForceDeletion, cleanStaleInProgress);
   ```









[GitHub] [carbondata] akashrn5 commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


akashrn5 commented on a change in pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#discussion_r535345986



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/DropIndexCommand.scala
##
@@ -210,12 +210,6 @@ private[sql] case class DropIndexCommand(
   logError("Table metadata unlocking is unsuccessful, index table may 
be in stale state")
 }
   }
-  // in case if the the physical folders still exists for the index table
-  // but the carbon and hive info for the index table is removed,
-  // DROP INDEX IF EXISTS should clean up those physical directories
-  if (ifExistsSet && carbonTable.isEmpty) {

Review comment:
   Are you saying that even though t2 is not an index, if I call drop index on t2 it will be treated as an index and the t2 table folder will be dropped? If so, we should handle that there and keep this code here. I think this was handled specially for SI in negative scenarios.









[GitHub] [carbondata] akashrn5 commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


akashrn5 commented on a change in pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#discussion_r535339696



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
##
@@ -90,7 +90,6 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel,
 
 while (loadsToMerge.size() > 1 || needSortSingleSegment(loadsToMerge)) {
   val lastSegment = sortedSegments.get(sortedSegments.size() - 1)
-  deletePartialLoadsInCompaction()

Review comment:
   But the stale data will always be present inside the segment folder, right? Suppose the segment file is corrupted or deleted; carbon should still pass the query, which it does by listing, but in that case we will either get wrong data or the query will always fail. I think we need some way to clean them up. What do you think @QiangCai @ajantha-bhat?









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp

2020-12-03 Thread GitBox


ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535334869



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1039,17 +999,19 @@ private static void 
writeLoadMetadata(AbsoluteTableIdentifier identifier,
 }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, 
CarbonTable carbonTable,
-  AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] 
details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, 
boolean
+  isForceDeletion, CarbonTable carbonTable, AbsoluteTableIdentifier
+  absoluteTableIdentifier, LoadMetadataDetails[] details) {
 // Delete marked loads
 boolean isUpdateRequired = DeleteLoadFolders
-.deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, 
isForceDeletion, details,
-carbonTable.getMetadataPath());
+.deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, 
cleanStaleInProgress,

Review comment:
   it is always advisable to add a new argument at the end









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp

2020-12-03 Thread GitBox


ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535333805



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1039,17 +999,19 @@ private static void 
writeLoadMetadata(AbsoluteTableIdentifier identifier,
 }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, 
CarbonTable carbonTable,
-  AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] 
details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, 
boolean
+  isForceDeletion, CarbonTable carbonTable, AbsoluteTableIdentifier
+  absoluteTableIdentifier, LoadMetadataDetails[] details) {
 // Delete marked loads
 boolean isUpdateRequired = DeleteLoadFolders
-.deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, 
isForceDeletion, details,
-carbonTable.getMetadataPath());
+.deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, 
cleanStaleInProgress,

Review comment:
   check for all the places and handle









[GitHub] [carbondata] akashrn5 commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


akashrn5 commented on a change in pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#discussion_r535332892



##
File path: 
core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##
@@ -482,176 +482,6 @@ public boolean accept(CarbonFile file) {
 
   }
 
-  /**
-   * Handling of the clean up of old carbondata files, index files , delete 
delta,
-   * update status files.
-   * @param table clean up will be handled on this table.
-   * @param forceDelete if true then max query execution timeout will not be 
considered.
-   */
-  public static void cleanUpDeltaFiles(CarbonTable table, boolean forceDelete) 
throws IOException {

Review comment:
   cleanStaleDeltaFiles will be called only in the exception case here; please see the points below:
   1. In ProjectForDeleteCommand, it is called twice, both in the exception case and in the finally block. That is wrong; it should be only in the finally block.
   2. It is called multiple times in the `DeleteExecution` class, which needs to be checked and avoided.
   3. Since it is handled only in the exception case, how is the application crash or shutdown scenario taken care of?
   4. cleanStaleDeltaFiles had cases handling the aborted update and delete case; how is that handled now that the complete method is removed?









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp

2020-12-03 Thread GitBox


ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535332006



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1039,17 +999,19 @@ private static void 
writeLoadMetadata(AbsoluteTableIdentifier identifier,
 }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, 
CarbonTable carbonTable,
-  AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] 
details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, 
boolean
+  isForceDeletion, CarbonTable carbonTable, AbsoluteTableIdentifier
+  absoluteTableIdentifier, LoadMetadataDetails[] details) {
 // Delete marked loads
 boolean isUpdateRequired = DeleteLoadFolders
-.deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, 
isForceDeletion, details,
-carbonTable.getMetadataPath());
+.deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, 
cleanStaleInProgress,

Review comment:
   keep the original order of arguments (identifier, isForceDeletion, 
cleanStaleInProgress, deatils, path), to avoid sending wrong arguments values 
from callers
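
   The hazard behind this comment is worth spelling out: adjacent boolean parameters compile in either order, so inserting a new flag before an existing one silently flips meaning at call sites. A minimal illustration, with hypothetical names (`clean`, `legacyCaller`) rather than the actual CarbonData signatures:

```java
// Hedged illustration: reordering two boolean parameters is invisible to the
// compiler, so a caller written against the old order silently passes each
// value to the wrong parameter.
public class ArgOrderDemo {
    // Reordered signature: the new flag was placed before the existing one
    static String clean(boolean cleanStaleInProgress, boolean isForceDeletion) {
        return "stale=" + cleanStaleInProgress + ", force=" + isForceDeletion;
    }

    // A caller still written against the old (force-first) order compiles fine,
    // but now requests stale-in-progress cleanup instead of force deletion.
    static String legacyCaller() {
        boolean isForceDeletion = true;
        boolean cleanStaleInProgress = false;
        return clean(isForceDeletion, cleanStaleInProgress); // arguments swapped!
    }
}
```

   Appending new flags at the end of the parameter list, as the reviewer suggests, avoids exactly this class of silent bug.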









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp

2020-12-03 Thread GitBox


ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535321147



##
File path: 
core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##
@@ -90,13 +93,13 @@ public static void 
physicalFactAndMeasureMetadataDeletion(CarbonTable carbonTabl
* Delete the invalid data physically from table.
* @param carbonTable table
* @param loadDetails Load details which need clean up
-   * @param isForceDelete is Force delete requested by user
+   * @param force Force delete Compacted and MFD segments

Review comment:
   force --> isForceDelete
   Also, it will force-clean the trash; mention that in the header description as well.
##
File path: 
core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##
@@ -68,19 +68,22 @@ private static String 
getSegmentPath(AbsoluteTableIdentifier identifier,
 
   public static void physicalFactAndMeasureMetadataDeletion(CarbonTable 
carbonTable,
   LoadMetadataDetails[] newAddedLoadHistoryList,
-  boolean isForceDelete,
+  boolean isForceDeletion,

Review comment:
   keep it same as previous `isForceDelete`









[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4037: [WIP] Handle Alter long string for SI col and added FTs for SI

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4037:
URL: https://github.com/apache/carbondata/pull/4037#issuecomment-738046006


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3289/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4037: [WIP] Handle Alter long string for SI col and added FTs for SI

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4037:
URL: https://github.com/apache/carbondata/pull/4037#issuecomment-738041238


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5047/
   







[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale

2020-12-03 Thread GitBox


vikramahuja1001 commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r534678032



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1039,17 +1039,19 @@ private static void 
writeLoadMetadata(AbsoluteTableIdentifier identifier,
 }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable,
-  AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean

Review comment:
   Instead of a single force option that deletes all MFD, Compacted and stale 
Insert In Progress segments, it is divided into 2 parameters: 
cleanMFDAndCompacted and cleanStaleInProgress. The cleanMFDAndCompacted 
parameter decides whether MFD and Compacted segments can be deleted, and 
cleanStaleInProgress decides whether stale InProgress segments can be deleted. 
With these 2 parameters, force can be removed: if the user wants to do force 
deletion, they can set both cleanMFDAndCompacted and cleanStaleInProgress 
to true.
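
   A minimal sketch of the proposed flag semantics (the class and method names 
below are illustrative only, not the actual CarbonData API): each segment 
status maps to one of the two flags, and the old force=true behaviour is 
recovered by enabling both.

```java
// Hypothetical sketch of the two-flag clean-files decision; names are
// illustrative, not the real CarbonData classes.
public class CleanFlagsSketch {
    // A segment is eligible for deletion when its status matches an enabled flag.
    public static boolean canDelete(String segmentStatus,
                                    boolean cleanMFDAndCompacted,
                                    boolean cleanStaleInProgress) {
        switch (segmentStatus) {
            case "MARKED_FOR_DELETE":
            case "COMPACTED":
                return cleanMFDAndCompacted;
            case "INSERT_IN_PROGRESS":
                return cleanStaleInProgress;
            default:
                return false;  // valid segments are never cleaned
        }
    }

    // The removed force=true option is equivalent to enabling both flags.
    public static boolean forceEquivalent(String segmentStatus) {
        return canDelete(segmentStatus, true, true);
    }
}
```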





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (CARBONDATA-4050) TPC-DS queries performance degraded when compared to older versions due to redundant getFileStatus() invocations

2020-12-03 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4050.
--
Resolution: Fixed

> TPC-DS queries performance degraded when compared to older versions due to 
> redundant getFileStatus() invocations
> 
>
> Key: CARBONDATA-4050
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4050
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Venugopal Reddy K
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> *Issue:*
> In the createCarbonDataFileBlockMetaInfoMapping method, we get the list of 
> carbondata files in the segment, loop through all the carbon files and build 
> the fileNameToMetaInfoMapping map.
>       In that loop, if the file is of AbstractDFSCarbonFile type, we fetch 
> the org.apache.hadoop.fs.FileStatus three times for each file. The method 
> that gets the file status is an RPC call (fileSystem.getFileStatus(path)), 
> which takes ~2ms per call in the cluster, i.e. an overhead of ~6ms per file. 
> As a result, overall driver-side query processing time increases 
> significantly when there are many carbon files, which caused the TPC-DS 
> query performance degradation.
> The methods/calls that get the file status for each carbon file in the loop 
> are shown below:
> {code:java}
> public static Map<String, BlockMetaInfo> createCarbonDataFileBlockMetaInfoMapping(
> String segmentFilePath, Configuration configuration) throws IOException {
>   Map<String, BlockMetaInfo> fileNameToMetaInfoMapping = new TreeMap<>();
>   CarbonFile carbonFile = FileFactory.getCarbonFile(segmentFilePath, 
> configuration);
>   if (carbonFile instanceof AbstractDFSCarbonFile && !(carbonFile instanceof 
> S3CarbonFile)) {
> PathFilter pathFilter = new PathFilter() {
>   @Override
>   public boolean accept(Path path) {
> return CarbonTablePath.isCarbonDataFile(path.getName());
>   }
> };
> CarbonFile[] carbonFiles = carbonFile.locationAwareListFiles(pathFilter);
> for (CarbonFile file : carbonFiles) {
>   String[] location = file.getLocations(); // RPC call - 1
>   long len = file.getSize(); // RPC call - 2
>   BlockMetaInfo blockMetaInfo = new BlockMetaInfo(location, len);
>   fileNameToMetaInfoMapping.put(file.getPath(), blockMetaInfo); // RPC call - 3 in file.getPath() method
> }
>   }
>   return fileNameToMetaInfoMapping;
> }
> {code}
>  
> *Suggestion:*
> I think we currently make an RPC call to get the file status on each 
> invocation because the file status may change over time, and we shouldn't 
> cache the file status in AbstractDFSCarbonFile in general.
>      In the current case, however, just before the loop over carbon files we 
> already get the file status of all the carbon files in the segment with the 
> RPC call shown below. LocatedFileStatus is a child class of FileStatus and 
> carries the BlockLocation along with the file status.
> {code:java}
> RemoteIterator<LocatedFileStatus> iter = fileSystem.listLocatedStatus(path);{code}
>         The intention of getting all the file statuses here is to create 
> BlockMetaInfo instances and maintain the fileNameToMetaInfoMapping map.
> So it is safe to avoid the unnecessary RPC calls that get the file status 
> again in the getLocations(), getSize() and getPath() methods.
>  
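
The suggestion above — reusing the statuses already fetched by 
listLocatedStatus instead of re-fetching them per file — can be sketched with 
stand-in types (FileStatus and CachedFile below are illustrative substitutes 
for the Hadoop and CarbonData classes, not the real API): each file wrapper is 
constructed from an already-listed status, so the later getters are plain 
field reads rather than RPCs.

```java
// Illustrative sketch only: models the proposed fix without Hadoop dependencies.
import java.util.ArrayList;
import java.util.List;

public class CachedFileStatusSketch {
    // Stand-in for org.apache.hadoop.fs.LocatedFileStatus.
    static final class FileStatus {
        final String path; final long len; final String[] locations;
        FileStatus(String path, long len, String[] locations) {
            this.path = path; this.len = len; this.locations = locations;
        }
    }

    // Stand-in for a CarbonFile built from an already-fetched status:
    // path, size and locations are captured once at construction time.
    static final class CachedFile {
        private final FileStatus status;  // cached; no further look-ups needed
        CachedFile(FileStatus status) { this.status = status; }
        String getPath() { return status.path; }
        long getSize() { return status.len; }
        String[] getLocations() { return status.locations; }
    }

    // Mirrors the loop in createCarbonDataFileBlockMetaInfoMapping: one
    // listing call up front, zero per-file status look-ups afterwards.
    static List<CachedFile> wrapListing(List<FileStatus> listing) {
        List<CachedFile> files = new ArrayList<>();
        for (FileStatus s : listing) {
            files.add(new CachedFile(s));
        }
        return files;
    }
}
```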



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] asfgit closed pull request #4010: [CARBONDATA-4050]Avoid redundant RPC calls to get file status when CarbonFile is instantiated with fileStatus constructor

2020-12-03 Thread GitBox


asfgit closed pull request #4010:
URL: https://github.com/apache/carbondata/pull/4010


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (CARBONDATA-4046) Select count(*) fails on partition table.

2020-12-03 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4046.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Select count(*) fails on partition table.
> -
>
> Key: CARBONDATA-4046
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4046
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> 1. Set the property `carbon.read.partition.hive.direct=false`
> 2. Create a table which contains more than one partition column.
> 3. Run a `select count(*)` query.
>  
> It fails with a `Key not found` exception.
>  
> create table partition_cache(a string) partitioned by(b int, c String) stored 
> as carbondata;
> insert into partition_cache select 'k',1,'nihal';
> select count(*) from partition_cache where b = 1;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] asfgit closed pull request #4002: [CARBONDATA-4046] Handled multiple partition columns for partition cache

2020-12-03 Thread GitBox


asfgit closed pull request #4002:
URL: https://github.com/apache/carbondata/pull/4002


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4038: [WIP] Fix MV having Subquery alias used in query projection

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4038:
URL: https://github.com/apache/carbondata/pull/4038#issuecomment-738017364


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3288/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (CARBONDATA-4022) Getting the error - "PathName is not a valid DFS filename." with index server and after adding carbon SDK segments and then doing select/update/delete operations.

2020-12-03 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4022.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Getting the error - "PathName is not a valid DFS filename." with index server 
> and after adding carbon SDK segments and then doing select/update/delete 
> operations.
> --
>
> Key: CARBONDATA-4022
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4022
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Prasanna Ravichandran
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
>  Getting this error - "PathName is not a valid DFS filename." - during 
> update/delete/select queries on an added SDK segment table. The path shown 
> in the error is malformed, which is the cause of the error. This is seen 
> only when the index server is running and disable fallback is true.
> Queries and errors:
> > create table sdk_2level_1(name string, rec1 
> > struct>) stored as carbondata;
> +-+
> | Result |
> +-+
> +-+
> No rows selected (0.425 seconds)
> > alter table sdk_2level_1 add segment 
> > options('path'='hdfs://hacluster/sdkfiles/twolevelnestedrecwitharray','format'='carbondata');
> +-+
> | Result |
> +-+
> +-+
> No rows selected (0.77 seconds)
> > select * from sdk_2level_1;
> INFO : Execution ID: 1855
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 600.0 failed 4 times, most recent failure: Lost task 0.3 in 
> stage 600.0 (TID 21345, linux, executor 16): 
> java.lang.IllegalArgumentException: Pathname 
> /user/hive/warehouse/carbon.store/rps/sdk_2level_1hdfs:/hacluster/sdkfiles/twolevelnestedrecwitharray/part-0-188852617294480_batchno0-0-null-188852332673632.carbondata
>  from 
> hdfs://hacluster/user/hive/warehouse/carbon.store/rps/sdk_2level_1hdfs:/hacluster/sdkfiles/twolevelnestedrecwitharray/part-0-188852617294480_batchno0-0-null-188852332673632.carbondata
>  is not a valid DFS filename.
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:249)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:328)
>  at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340)
>  at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:955)
>  at 
> org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getDataInputStream(AbstractDFSCarbonFile.java:316)
>  at 
> org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getDataInputStream(AbstractDFSCarbonFile.java:293)
>  at 
> org.apache.carbondata.core.datastore.impl.FileFactory.getDataInputStream(FileFactory.java:198)
>  at 
> org.apache.carbondata.core.datastore.impl.FileFactory.getDataInputStream(FileFactory.java:188)
>  at org.apache.carbondata.core.reader.ThriftReader.open(ThriftReader.java:100)
>  at 
> org.apache.carbondata.core.reader.CarbonHeaderReader.readHeader(CarbonHeaderReader.java:60)
>  at 
> org.apache.carbondata.core.util.DataFileFooterConverterV3.readDataFileFooter(DataFileFooterConverterV3.java:65)
>  at 
> org.apache.carbondata.core.util.CarbonUtil.getDataFileFooter(CarbonUtil.java:902)
>  at 
> org.apache.carbondata.core.util.CarbonUtil.readMetadataFile(CarbonUtil.java:874)
>  at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getDataBlocks(AbstractQueryExecutor.java:216)
>  at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:138)
>  at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:382)
>  at 
> org.apache.carbondata.core.scan.executor.impl.DetailQueryExecutor.execute(DetailQueryExecutor.java:47)
>  at 
> org.apache.carbondata.hadoop.CarbonRecordReader.initialize(CarbonRecordReader.java:117)
>  at 
> org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:540)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>  at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:584)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPl
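
The malformed pathname in the stack trace above ("...sdk_2level_1hdfs:/hacluster/...") 
comes from prepending the table path to a segment data path that is already an 
absolute URI. A hypothetical sketch of the guard such a fix needs 
(resolveSegmentPath is an illustrative helper, not the actual CarbonData code):

```java
// Illustrative only: shows the path-resolution guard, not the real fix.
public class PathJoinSketch {
    // Only prepend the table path when the segment data path is relative;
    // an absolute location (hdfs://..., s3a://..., or rooted path) is taken
    // as-is, which avoids the concatenated invalid DFS filename.
    static String resolveSegmentPath(String tablePath, String segmentDataPath) {
        if (segmentDataPath.contains("://") || segmentDataPath.startsWith("/")) {
            return segmentDataPath;                    // external absolute location
        }
        return tablePath + "/" + segmentDataPath;      // regular managed segment
    }
}
```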

[GitHub] [carbondata] asfgit closed pull request #4017: [CARBONDATA-4022] Fix invalid path issue for segment added through alter table add segment query.

2020-12-03 Thread GitBox


asfgit closed pull request #4017:
URL: https://github.com/apache/carbondata/pull/4017


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4038: [WIP] Fix MV having Subquery alias used in query projection

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4038:
URL: https://github.com/apache/carbondata/pull/4038#issuecomment-738012712


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5046/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4018:
URL: https://github.com/apache/carbondata/pull/4018#issuecomment-738008523


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3287/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4018:
URL: https://github.com/apache/carbondata/pull/4018#issuecomment-738006908


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5045/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on pull request #4002: [CARBONDATA-4046] Handled multiple partition columns for partition cache

2020-12-03 Thread GitBox


kunal642 commented on pull request #4002:
URL: https://github.com/apache/carbondata/pull/4002#issuecomment-738002378


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on pull request #4017: [CARBONDATA-4022] Fix invalid path issue for segment added through alter table add segment query.

2020-12-03 Thread GitBox


kunal642 commented on pull request #4017:
URL: https://github.com/apache/carbondata/pull/4017#issuecomment-738002313


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction

2020-12-03 Thread GitBox


akashrn5 commented on pull request #4020:
URL: https://github.com/apache/carbondata/pull/4020#issuecomment-738001983


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4001: [CARBONDATA-4029] [CARBONDATA-3908] Issue while adding segments through alter add segment command

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4001:
URL: https://github.com/apache/carbondata/pull/4001#issuecomment-738000543


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3286/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4001: [CARBONDATA-4029] [CARBONDATA-3908] Issue while adding segments through alter add segment command

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4001:
URL: https://github.com/apache/carbondata/pull/4001#issuecomment-73704


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5044/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


QiangCai commented on a change in pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#discussion_r535224194



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/events/CleanFilesEvents.scala
##
@@ -34,5 +34,6 @@ case class CleanFilesPreEvent(carbonTable: CarbonTable, 
sparkSession: SparkSessi
  * @param carbonTable
  * @param sparkSession
  */
-case class CleanFilesPostEvent(carbonTable: CarbonTable, sparkSession: 
SparkSession)
-  extends Event with CleanFilesEventInfo
+case class CleanFilesPostEvent(carbonTable: CarbonTable,

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


QiangCai commented on a change in pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#discussion_r535228453



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/trash/DataTrashManager.scala
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.trash
+
+import scala.collection.JavaConverters._
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.indexstore.PartitionSpec
+import org.apache.carbondata.core.locks.{CarbonLockUtil, ICarbonLock, 
LockUsage}
+import org.apache.carbondata.core.metadata.SegmentFileStore
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager
+import org.apache.carbondata.core.util.{CarbonProperties, CleanFilesUtil, 
TrashUtil}
+
+/**
+ * This object will manage the following data.
+ * 1. .Trash folder
+ * 2. stale segments without metadata
+ * 3. expired segments (MARKED_FOR_DELETE, Compacted, In Progress)
+ */
+object DataTrashManager {
+  private val LOGGER = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+
+  /**
+   * clean garbage data
+   *  1. clean .Trash folder
+   *  2. clean stale segments without metadata
+   *  3. clean expired segments (MARKED_FOR_DELETE, Compacted, In Progress)
+   *
+   * @param carbonTable : CarbonTable Object
+   * @param partitionSpecs : Hive Partitions  details
+   */
+  def cleanGarbageData(
+  carbonTable: CarbonTable,
+  force: Boolean = false,
+  partitionSpecs: Option[Seq[PartitionSpec]] = None): Unit = {
+var carbonCleanFilesLock: ICarbonLock = null
+val absoluteTableIdentifier = carbonTable.getAbsoluteTableIdentifier
+try {
+  val errorMsg = "Clean files request is failed for " +
+s"${ carbonTable.getQualifiedName }" +
+". Not able to acquire the clean files lock due to another clean files " +
+"operation is running in the background."
+  carbonCleanFilesLock = CarbonLockUtil.getLockObject(absoluteTableIdentifier,
+LockUsage.CLEAN_FILES_LOCK, errorMsg)
+  // step 1: clean trash folder
+  cleanTrashFolder(carbonTable, force)
+  // step 2: clean stale segments which are not exists in metadata
+  cleanStaleSegments(carbonTable)

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


QiangCai commented on a change in pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#discussion_r535228252



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/trash/DataTrashManager.scala
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.trash
+
+import scala.collection.JavaConverters._
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.indexstore.PartitionSpec
+import org.apache.carbondata.core.locks.{CarbonLockUtil, ICarbonLock, 
LockUsage}
+import org.apache.carbondata.core.metadata.SegmentFileStore
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager
+import org.apache.carbondata.core.util.{CarbonProperties, CleanFilesUtil, 
TrashUtil}
+
+/**
+ * This object will manage the following data.
+ * 1. .Trash folder
+ * 2. stale segments without metadata
+ * 3. expired segments (MARKED_FOR_DELETE, Compacted, In Progress)
+ */
+object DataTrashManager {
+  private val LOGGER = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+
+  /**
+   * clean garbage data
+   *  1. clean .Trash folder
+   *  2. clean stale segments without metadata
+   *  3. clean expired segments (MARKED_FOR_DELETE, Compacted, In Progress)
+   *
+   * @param carbonTable : CarbonTable Object
+   * @param partitionSpecs : Hive Partitions  details
+   */
+  def cleanGarbageData(
+  carbonTable: CarbonTable,
+  force: Boolean = false,
+  partitionSpecs: Option[Seq[PartitionSpec]] = None): Unit = {
+var carbonCleanFilesLock: ICarbonLock = null
+val absoluteTableIdentifier = carbonTable.getAbsoluteTableIdentifier
+try {
+  val errorMsg = "Clean files request is failed for " +
+s"${ carbonTable.getQualifiedName }" +
+". Not able to acquire the clean files lock due to another clean files " +
+"operation is running in the background."
+  carbonCleanFilesLock = CarbonLockUtil.getLockObject(absoluteTableIdentifier,
+LockUsage.CLEAN_FILES_LOCK, errorMsg)
+  // step 1: clean trash folder
+  cleanTrashFolder(carbonTable, force)

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


QiangCai commented on a change in pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#discussion_r535224083



##
File path: 
core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##
@@ -482,176 +482,6 @@ public boolean accept(CarbonFile file) {
 
   }
 
-  /**
-   * Handling of the clean up of old carbondata files, index files , delete 
delta,
-   * update status files.
-   * @param table clean up will be handled on this table.
-   * @param forceDelete if true then max query execution timeout will not be 
considered.
-   */
-  public static void cleanUpDeltaFiles(CarbonTable table, boolean forceDelete) 
throws IOException {
-
-SegmentStatusManager ssm = new 
SegmentStatusManager(table.getAbsoluteTableIdentifier());
-
-LoadMetadataDetails[] details =
-SegmentStatusManager.readLoadMetadata(table.getMetadataPath());
-
-SegmentUpdateStatusManager updateStatusManager = new 
SegmentUpdateStatusManager(table);
-SegmentUpdateDetails[] segmentUpdateDetails = 
updateStatusManager.getUpdateStatusDetails();
-// hold all the segments updated so that wen can check the delta files in 
them, ne need to
-// check the others.
-Set updatedSegments = new HashSet<>();
-for (SegmentUpdateDetails updateDetails : segmentUpdateDetails) {
-  updatedSegments.add(updateDetails.getSegmentName());
-}
-
-String validUpdateStatusFile = "";
-
-boolean isAbortedFile = true;
-
-boolean isInvalidFile = false;
-
-// take the update status file name from 0th segment.
-validUpdateStatusFile = ssm.getUpdateStatusFileName(details);
-// scan through each segment.
-for (LoadMetadataDetails segment : details) {
-  // if this segment is valid then only we will go for delta file deletion.
-  // if the segment is mark for delete or compacted then any way it will 
get deleted.
-  if (segment.getSegmentStatus() == SegmentStatus.SUCCESS
-  || segment.getSegmentStatus() == 
SegmentStatus.LOAD_PARTIAL_SUCCESS) {
-// when there is no update operations done on table, then no need to 
go ahead. So
-// just check the update delta start timestamp and proceed if not empty
-if (!segment.getUpdateDeltaStartTimestamp().isEmpty()
-|| updatedSegments.contains(segment.getLoadName())) {
-  // take the list of files from this segment.
-  String segmentPath = CarbonTablePath.getSegmentPath(
-  table.getAbsoluteTableIdentifier().getTablePath(), 
segment.getLoadName());
-  CarbonFile segDir =
-  FileFactory.getCarbonFile(segmentPath);
-  CarbonFile[] allSegmentFiles = segDir.listFiles();
-
-  // now handle all the delete delta files which needs to be deleted.
-  // there are 2 cases here .
-  // 1. if the block is marked as compacted then the corresponding 
delta files
-  //can be deleted if query exec timeout is done.
-  // 2. if the block is in success state then also there can be delete
-  //delta compaction happened and old files can be deleted.
-
-  SegmentUpdateDetails[] updateDetails = 
updateStatusManager.readLoadMetadata();
-  for (SegmentUpdateDetails block : updateDetails) {
-CarbonFile[] completeListOfDeleteDeltaFiles;
-CarbonFile[] invalidDeleteDeltaFiles;
-
-if 
(!block.getSegmentName().equalsIgnoreCase(segment.getLoadName())) {
-  continue;
-}
-
-// aborted scenario.
-invalidDeleteDeltaFiles = updateStatusManager
-.getDeleteDeltaInvalidFilesList(block, false,
-allSegmentFiles, isAbortedFile);
-for (CarbonFile invalidFile : invalidDeleteDeltaFiles) {
-  boolean doForceDelete = true;
-  compareTimestampsAndDelete(invalidFile, doForceDelete, false);
-}
-
-// case 1
-if (CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
-  completeListOfDeleteDeltaFiles = updateStatusManager
-  .getDeleteDeltaInvalidFilesList(block, true,
-  allSegmentFiles, isInvalidFile);
-  for (CarbonFile invalidFile : completeListOfDeleteDeltaFiles) {
-compareTimestampsAndDelete(invalidFile, forceDelete, false);
-  }
-
-} else {
-  invalidDeleteDeltaFiles = updateStatusManager
-  .getDeleteDeltaInvalidFilesList(block, false,
-  allSegmentFiles, isInvalidFile);
-  for (CarbonFile invalidFile : invalidDeleteDeltaFiles) {
-compareTimestampsAndDelete(invalidFile, forceDelete, false);
-  }
-}
-  }
-}
-// handle cleanup of merge index files and data files after small 
files merge happened for
-// SI table
-cleanUpDataFilesAfterSmallFilesMerg

[GitHub] [carbondata] QiangCai commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


QiangCai commented on a change in pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#discussion_r535218031



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/DropIndexCommand.scala
##
@@ -210,12 +210,6 @@ private[sql] case class DropIndexCommand(
   logError("Table metadata unlocking is unsuccessful, index table may 
be in stale state")
 }
   }
-  // in case if the the physical folders still exists for the index table
-  // but the carbon and hive info for the index table is removed,
-  // DROP INDEX IF EXISTS should clean up those physical directories
-  if (ifExistsSet && carbonTable.isEmpty) {

Review comment:
   create table t1...
   create table t2...
   drop index t2 on t1
   
   This DROP INDEX SQL would remove the table folder of t2 by mistake.

##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/trash/DataTrashManager.scala
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.trash
+
+import scala.collection.JavaConverters._
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.indexstore.PartitionSpec
+import org.apache.carbondata.core.locks.{CarbonLockUtil, ICarbonLock, LockUsage}
+import org.apache.carbondata.core.metadata.SegmentFileStore
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager
+import org.apache.carbondata.core.util.{CarbonProperties, CleanFilesUtil, TrashUtil}
+
+/**
+ * This object will manage the following data.
+ * 1. .Trash folder
+ * 2. stale segments without metadata
+ * 3. expired segments (MARKED_FOR_DELETE, Compacted, In Progress)
+ */
+object DataTrashManager {
+  private val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+
+  /**
+   * clean garbage data
+   *  1. clean .Trash folder
+   *  2. clean stale segments without metadata
+   *  3. clean expired segments (MARKED_FOR_DELETE, Compacted, In Progress)
+   *
+   * @param carbonTable : CarbonTable Object
+   * @param partitionSpecs : Hive Partitions  details

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
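The DataTrashManager flow quoted above cleans three kinds of garbage (the .Trash folder, stale segments without metadata, expired segments) under a table-level clean-files lock. The overall shape can be sketched in plain Java; the lock and step names here are illustrative stand-ins, not CarbonData's actual API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of a trash-manager style cleanup: take a lock,
// then run the three cleanup steps in order. Names are hypothetical.
public class TrashManagerSketch {
    private static final ReentrantLock CLEAN_FILES_LOCK = new ReentrantLock();

    static List<String> cleanGarbageData(boolean force, boolean cleanStaleInProgress) {
        List<String> actions = new ArrayList<>();
        if (!CLEAN_FILES_LOCK.tryLock()) {
            throw new IllegalStateException("clean files already in progress");
        }
        try {
            actions.add("clean .Trash folder (force=" + force + ")");
            actions.add("clean stale segments without metadata");
            actions.add("clean expired segments (MFD, Compacted"
                + (cleanStaleInProgress ? ", In Progress)" : ")"));
        } finally {
            CLEAN_FILES_LOCK.unlock();
        }
        return actions;
    }

    public static void main(String[] args) {
        cleanGarbageData(false, true).forEach(System.out::println);
    }
}
```

Holding the lock for the whole run matters because concurrent clean-files invocations on the same table could otherwise race on segment deletion.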




[GitHub] [carbondata] QiangCai commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


QiangCai commented on a change in pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#discussion_r535216366



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
##
@@ -90,7 +90,6 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel,
 
 while (loadsToMerge.size() > 1 || needSortSingleSegment(loadsToMerge)) {
   val lastSegment = sortedSegments.get(sortedSegments.size() - 1)
-  deletePartialLoadsInCompaction()

Review comment:
   In this case, clean files would have no chance to clean the data.
   Keeping the stale data in the folder will not impact compaction.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-737977578


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3285/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


QiangCai commented on a change in pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#discussion_r535212465



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/events/CleanFilesEvents.scala
##
@@ -34,5 +34,6 @@ case class CleanFilesPreEvent(carbonTable: CarbonTable, sparkSession: SparkSessi
  * @param carbonTable
  * @param sparkSession
  */
-case class CleanFilesPostEvent(carbonTable: CarbonTable, sparkSession: SparkSession)
-  extends Event with CleanFilesEventInfo
+case class CleanFilesPostEvent(carbonTable: CarbonTable,

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-737968960


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5043/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Indhumathi27 opened a new pull request #4038: [WIP] Fix MV having Subquery alias used in query projection

2020-12-03 Thread GitBox


Indhumathi27 opened a new pull request #4038:
URL: https://github.com/apache/carbondata/pull/4038


### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akkio-97 commented on a change in pull request #4001: [CARBONDATA-4029] [CARBONDATA-3908] Issue while adding segments through alter add segment command

2020-12-03 Thread GitBox


akkio-97 commented on a change in pull request #4001:
URL: https://github.com/apache/carbondata/pull/4001#discussion_r535160669



##
File path: 
core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -877,6 +877,10 @@ public SegmentFile getSegmentFile() {
 }
 if (entry.getValue().status.equals(SegmentStatus.SUCCESS.getMessage())) {
   String mergeFileName = entry.getValue().getMergeFileName();
+  // remove file separator if already present

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
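The "remove file separator if already present" fix discussed above guards against a doubled separator when the merge file name is appended to a segment path. The idea can be sketched as a small helper (hypothetical, not the actual SegmentFileStore code):

```java
public class PathJoin {
    // Join a directory and a file name without producing "dir//file"
    // when the directory already ends with a separator.
    static String join(String dir, String name) {
        if (dir.endsWith("/")) {
            return dir + name;
        }
        return dir + "/" + name;
    }

    public static void main(String[] args) {
        // Both calls yield the same path, regardless of a trailing separator.
        System.out.println(join("/store/t1/Fact/Part0/Segment_0", "0_merge.carbonindexmerge"));
        System.out.println(join("/store/t1/Fact/Part0/Segment_0/", "0_merge.carbonindexmerge"));
    }
}
```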




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

2020-12-03 Thread GitBox


shenjiayu17 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r535124613



##
File path: geo/src/main/java/org/apache/carbondata/geo/GeoHashUtils.java
##
@@ -0,0 +1,411 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo;
+
+import java.math.BigDecimal;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Objects;
+
+public class GeoHashUtils {
+
+  /**
+   * Get the degree of each grid in the east-west direction.
+   *
+   * @param originLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return Delta X is the degree of each grid in the east-west direction
+   */
+  public static double getDeltaX(double originLatitude, int gridSize) {
+    double mCos = Math.cos(originLatitude * Math.PI / GeoConstants.CONVERT_FACTOR);
+    return (GeoConstants.CONVERT_FACTOR * gridSize) / (Math.PI * GeoConstants.EARTH_RADIUS * mCos);
+  }
+
+  /**
+   * Get the degree of each grid in the north-south direction.
+   *
+   * @param gridSize the grid size
+   * @return Delta Y is the degree of each grid in the north-south direction
+   */
+  public static double getDeltaY(int gridSize) {
+    return (GeoConstants.CONVERT_FACTOR * gridSize) / (Math.PI * GeoConstants.EARTH_RADIUS);
+  }
+
+  /**
+   * Calculate the number of knives cut
+   *
+   * @param gridSize the grid size
+   * @param originLatitude the origin point latitude
+   * @return The number of knives cut
+   */
+  public static int getCutCount(int gridSize, double originLatitude) {
+    double deltaX = getDeltaX(originLatitude, gridSize);
+    int countX = Double.valueOf(
+        Math.ceil(Math.log(2 * GeoConstants.CONVERT_FACTOR / deltaX) / Math.log(2))).intValue();
+    double deltaY = getDeltaY(gridSize);
+    int countY = Double.valueOf(
+        Math.ceil(Math.log(GeoConstants.CONVERT_FACTOR / deltaY) / Math.log(2))).intValue();
+    return Math.max(countX, countY);
+  }
+
+  /**
+   * Convert input longitude and latitude to GeoID
+   *
+   * @param longitude Longitude, the actual longitude and latitude are processed by * coefficient,
+   *                  and the floating-point calculation is converted to integer calculation
+   * @param latitude Latitude, the actual longitude and latitude are processed by * coefficient,
+   *                 and the floating-point calculation is converted to integer calculation.
+   * @param oriLatitude the origin point latitude
+   * @param gridSize the grid size
+   * @return GeoID
+   */
+  public static long lonLat2GeoID(long longitude, long latitude, double oriLatitude, int gridSize) {
+    long longtitudeByRatio = longitude * GeoConstants.CONVERSION_FACTOR_FOR_ACCURACY;
+    long latitudeByRatio = latitude * GeoConstants.CONVERSION_FACTOR_FOR_ACCURACY;
+    int[] ij = lonLat2ColRow(longtitudeByRatio, latitudeByRatio, oriLatitude, gridSize);
+    return colRow2GeoID(ij[0], ij[1]);
+  }
+
+  /**
+   * Calculate geo id through grid index coordinates, the row and column of grid coordinates
+   * can be transformed by latitude and longitude
+   *
+   * @param longitude Longitude, the actual longitude and latitude are processed by * coefficient,
+   * and the floating-point calculation is converted to integer calculation
+   * @param latitude Latitude, the actual longitude and latitude are processed by * coefficient,
+   * and the floating-point calculation is converted to integer calculation
+   * @param oriLatitude the latitude of origin point, which is used to calculate the deltaX and cut
+   * level.
+   * @param gridSize the size of minimal grid after cut
+   * @return Grid ID value [row, column], column starts from 1
+   */
+  public static int[] lonLat2ColRow(long longitude, long latitude, double oriLatitude,
+      int gridSize) {
+    int cutLevel = getCutCount(gridSize, oriLatitude);
+    int column = (int) Math.floor(longitude / getDeltaX(oriLatitude, gridSize) /
+        GeoConstants.CONVERSION_RATIO) + (1 << (cutLevel - 1));
+    int row = (int) Math.floor(latitude / getDeltaY(gridSize) /
+        GeoConstants.CONVERSION_R
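The getCutCount arithmetic in the patch above can be checked with a standalone computation. The constants below (a 180-degree conversion factor and the Earth radius in metres) are plausible assumptions for illustration, not necessarily CarbonData's actual GeoConstants values:

```java
public class CutCountSketch {
    // Assumed constants; CarbonData's GeoConstants may use different values.
    static final double CONVERT_FACTOR = 180.0;
    static final double EARTH_RADIUS = 6371000.0; // metres

    // Degree width of one grid cell in the east-west direction at the origin latitude.
    static double deltaX(double originLatitude, int gridSize) {
        double mCos = Math.cos(originLatitude * Math.PI / CONVERT_FACTOR);
        return (CONVERT_FACTOR * gridSize) / (Math.PI * EARTH_RADIUS * mCos);
    }

    // Degree height of one grid cell in the north-south direction.
    static double deltaY(int gridSize) {
        return (CONVERT_FACTOR * gridSize) / (Math.PI * EARTH_RADIUS);
    }

    // Number of binary cuts so that one grid cell is at most gridSize metres.
    static int cutCount(int gridSize, double originLatitude) {
        int countX = (int) Math.ceil(
            Math.log(2 * CONVERT_FACTOR / deltaX(originLatitude, gridSize)) / Math.log(2));
        int countY = (int) Math.ceil(
            Math.log(CONVERT_FACTOR / deltaY(gridSize)) / Math.log(2));
        return Math.max(countX, countY);
    }

    public static void main(String[] args) {
        // A 50 m grid around latitude ~39.83 N needs 20 cut levels with these constants.
        System.out.println(cutCount(50, 39.832277));
    }
}
```

Doubling the grid size halves the number of cells per axis, so the cut count drops by one level, which matches the log2 form of the formula.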

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inprogr

2020-12-03 Thread GitBox


akashrn5 commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535070890



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier,
 }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable,
-      AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean

Review comment:
   Agree with @ajantha-bhat. Clean files with options should perform its specific functionality, and along with that also clean anything else that has expired, to save time for users. It should be like (1+2), (1+3) and (1+4). I think this will be better.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale

2020-12-03 Thread GitBox


vikramahuja1001 commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535066388



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier,
 }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable,
-      AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean

Review comment:
   @QiangCai @akashrn5, can you clarify whether we should delete the expired MFD and Compacted segments in point 3?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp

2020-12-03 Thread GitBox


ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535062363



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier,
 }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable,
-      AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean

Review comment:
   In point 3), if we don't clean expired MFD, expired Compacted, and expired trash along with expired stale in-progress segments, we have to call clean files at least twice to clean all expired data. I feel that is not good.
   
   @QiangCai @akashrn5: what do you think?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale

2020-12-03 Thread GitBox


vikramahuja1001 commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535054960



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier,
 }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable,
-      AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean

Review comment:
   1. Trash behaviour remains unchanged. Trash will be deleted automatically when the option force = true (it is already in the document). If force = true is not given, trash is deleted according to the timestamp retention property.
   2. in_progress = true should only delete in-progress segments. Expired trash will be deleted anyway.
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
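The retention behaviour described above amounts to comparing a trash entry's creation timestamp against a configurable retention window and deleting only entries older than the window. A minimal sketch of that check, with hypothetical names rather than CarbonData's actual API:

```java
import java.time.Duration;
import java.time.Instant;

public class TrashRetention {
    // An entry is expired once its age exceeds the retention window.
    static boolean isExpired(Instant createdAt, Duration retention, Instant now) {
        return createdAt.plus(retention).isBefore(now);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2020-12-10T00:00:00Z");
        Instant old = Instant.parse("2020-12-01T00:00:00Z");
        // Created 9 days ago, 7-day retention -> expired.
        System.out.println(isExpired(old, Duration.ofDays(7), now)); // true
    }
}
```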




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp

2020-12-03 Thread GitBox


ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535043361



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier,
 }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable,
-      AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean
+      cleanCompactedAndMFD, CarbonTable carbonTable, AbsoluteTableIdentifier
+      absoluteTableIdentifier, LoadMetadataDetails[] details) {
     // Delete marked loads
     boolean isUpdateRequired = DeleteLoadFolders
-        .deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, isForceDeletion, details,
-            carbonTable.getMetadataPath());
+        .deleteLoadFoldersFromFileSystem(absoluteTableIdentifier, cleanStaleInProgress,
+            cleanCompactedAndMFD, details, carbonTable.getMetadataPath());

Review comment:
   The `cleanCompactedAndMFD` flag is not needed, as we need to clean it every time once you fix your point 3.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4035: [CARBONDATA-4067]: Removing force option in clean files command and changing behaviour when MFD, Compacted and stale Inp

2020-12-03 Thread GitBox


ajantha-bhat commented on a change in pull request #4035:
URL: https://github.com/apache/carbondata/pull/4035#discussion_r535042468



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ -1039,17 +1039,19 @@ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier,
 }
   }
 
-  private static ReturnTuple isUpdateRequired(boolean isForceDeletion, CarbonTable carbonTable,
-      AbsoluteTableIdentifier absoluteTableIdentifier, LoadMetadataDetails[] details) {
+  private static ReturnTuple isUpdateRequired(boolean cleanStaleInProgress, boolean

Review comment:
   @vikramahuja1001: please update the clean files documentation.
   
   > 1 ) default clean files behaviour(clean files on table t1): clean MFD and 
Compacted segments will depend on query timeout(1 hr) and 
trashRetentionTimeout(7 days, default). 
   2) clean files on table t1 options('force'='true'): clean MFD and Compacted 
segments immediately(Do not check for any timeout)
   3) clean files on table t1 options('clean_inprgress'='true') : clean stale 
inprogress segments depends on trashRetentionTimeout, after 7 days(default 
behaviour)
   4) clean files on table t1 options('clean_inprgress'='true', 'force'='true') 
: clean MFD, Compacted and stale inprogress segments immediately.(Do not check 
for any timeout)
   
   **My comments on the description:**
   1) You have not mentioned trash in any of these; mention what happens to trash in each.
   2) In your point 3) we should also delete the expired MFD, Compacted, and expired trash folders.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
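The four option combinations quoted in the discussion above can be summarized as a small dispatch sketch. The option semantics follow the proposal in the thread; the code itself is illustrative, not the actual clean-files implementation:

```java
public class CleanFilesOptions {
    // Describe what a clean-files run would delete for a given
    // (force, cleanInProgress) option pair, per the proposed behaviour.
    static String plan(boolean force, boolean cleanInProgress) {
        StringBuilder sb = new StringBuilder();
        if (force) {
            sb.append("MFD+Compacted immediately");
        } else {
            sb.append("MFD+Compacted if expired (query timeout + trash retention)");
        }
        if (cleanInProgress) {
            sb.append(force ? "; stale In Progress immediately"
                            : "; stale In Progress if expired");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(plan(false, false)); // case 1: default
        System.out.println(plan(true, false));  // case 2: force
        System.out.println(plan(false, true));  // case 3: clean in-progress
        System.out.println(plan(true, true));   // case 4: both options
    }
}
```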




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-737786983


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5042/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-737784784


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3284/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction

2020-12-03 Thread GitBox


ajantha-bhat commented on pull request #4020:
URL: https://github.com/apache/carbondata/pull/4020#issuecomment-737782499


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


QiangCai commented on a change in pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#discussion_r534983455



##
File path: 
core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##
@@ -482,176 +482,6 @@ public boolean accept(CarbonFile file) {
 
   }
 
-  /**
-   * Handling of the clean up of old carbondata files, index files, delete delta,
-   * update status files.
-   * @param table clean up will be handled on this table.
-   * @param forceDelete if true then max query execution timeout will not be considered.
-   */
-  public static void cleanUpDeltaFiles(CarbonTable table, boolean forceDelete) throws IOException {

Review comment:
   Only keep cleanStaleDeltaFiles to handle exceptions.
   
   Clean files should work with concurrent updates, so remove this.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


QiangCai commented on a change in pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#discussion_r534982270



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonTruncateCommand.scala
##
@@ -18,39 +18,31 @@
 package org.apache.spark.sql.execution.command.mutation
 
 import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
-import org.apache.spark.sql.catalyst.analysis.NoSuchTableException
-import org.apache.spark.sql.execution.command.{DataCommand, TruncateTableCommand}
-import org.apache.spark.sql.execution.command.management.CarbonCleanFilesCommand
-import org.apache.spark.sql.hive.CarbonRelation
+import org.apache.spark.sql.execution.command.{Checker, DataCommand, TruncateTableCommand}
 
 import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
 import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager
 
 case class CarbonTruncateCommand(child: TruncateTableCommand) extends DataCommand {
   override def processData(sparkSession: SparkSession): Seq[Row] = {
     val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
-    val dbName = CarbonEnv.getDatabaseName(child.tableName.database)(sparkSession)
-    val tableName = child.tableName.table
-    setAuditTable(dbName, tableName)
-    val relation = CarbonEnv.getInstance(sparkSession).carbonMetaStore
-      .lookupRelation(Option(dbName), tableName)(sparkSession).asInstanceOf[CarbonRelation]
-    if (relation == null) {
-      throw new NoSuchTableException(dbName, tableName)
-    }
-    if (null == relation.carbonTable) {
-      LOGGER.error(s"Truncate table failed. table not found: $dbName.$child.tableName.table")
-      throw new NoSuchTableException(dbName, child.tableName.table)
+    Checker.validateTableExists(child.tableName.database, child.tableName.table, sparkSession)
+    val carbonTable = CarbonEnv.getCarbonTable(
+      child.tableName.database, child.tableName.table)(sparkSession)
+    setAuditTable(carbonTable)
+    if (!carbonTable.isTransactionalTable) {
+      LOGGER.error(s"Unsupported truncate non-transactional table")
+      throw new MalformedCarbonCommandException(
+        "Unsupported truncate non-transactional table")
     }
     if (child.partitionSpec.isDefined) {
       throw new MalformedCarbonCommandException(
         "Unsupported truncate table with specified partition")
     }
-    CarbonCleanFilesCommand(
-      databaseNameOp = Option(dbName),
-      tableName = Option(tableName),
-      None,
-      truncateTable = true
-    ).run(sparkSession)
+

Review comment:
   Let clean files remove the data.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4018:
URL: https://github.com/apache/carbondata/pull/4018#issuecomment-737766059


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3283/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #4013: [CARBONDATA-4062] Make clean files as data trash manager

2020-12-03 Thread GitBox


akashrn5 commented on a change in pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#discussion_r534930213



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
##
@@ -90,7 +90,6 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel,
 
 while (loadsToMerge.size() > 1 || needSortSingleSegment(loadsToMerge)) {
   val lastSegment = sortedSegments.get(sortedSegments.size() - 1)
-  deletePartialLoadsInCompaction()

Review comment:
   If you are removing this, how are you handling the case where the segment number is the same on the next compaction retry, given that there is no entry in the table status file during compaction?

##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/trash/DataTrashManager.scala
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.trash
+
+import scala.collection.JavaConverters._
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.indexstore.PartitionSpec
+import org.apache.carbondata.core.locks.{CarbonLockUtil, ICarbonLock, LockUsage}
+import org.apache.carbondata.core.metadata.SegmentFileStore
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+import org.apache.carbondata.core.statusmanager.SegmentStatusManager
+import org.apache.carbondata.core.util.{CarbonProperties, CleanFilesUtil, TrashUtil}
+
+/**
+ * This object will manage the following data.
+ * 1. .Trash folder
+ * 2. stale segments without metadata
+ * 3. expired segments (MARKED_FOR_DELETE, Compacted, In Progress)
+ */
+object DataTrashManager {
+  private val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+
+  /**
+   * clean garbage data
+   *  1. clean .Trash folder
+   *  2. clean stale segments without metadata
+   *  3. clean expired segments (MARKED_FOR_DELETE, Compacted, In Progress)
+   *
+   * @param carbonTable : CarbonTable Object
+   * @param partitionSpecs : Hive Partitions  details

Review comment:
   Please remove the @param tags; the names already make their meaning clear.

##
File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##
@@ -482,176 +482,6 @@ public boolean accept(CarbonFile file) {
 
   }
 
-  /**
-   * Handling of the clean up of old carbondata files, index files, delete delta,
-   * update status files.
-   * @param table clean up will be handled on this table.
-   * @param forceDelete if true then max query execution timeout will not be considered.
-   */
-  public static void cleanUpDeltaFiles(CarbonTable table, boolean forceDelete) throws IOException {
-
-    SegmentStatusManager ssm = new SegmentStatusManager(table.getAbsoluteTableIdentifier());
-
-    LoadMetadataDetails[] details =
-        SegmentStatusManager.readLoadMetadata(table.getMetadataPath());
-
-    SegmentUpdateStatusManager updateStatusManager = new SegmentUpdateStatusManager(table);
-    SegmentUpdateDetails[] segmentUpdateDetails = updateStatusManager.getUpdateStatusDetails();
-    // hold all the segments updated so that we can check the delta files in them, no need to
-    // check the others.
-    Set<String> updatedSegments = new HashSet<>();
-    for (SegmentUpdateDetails updateDetails : segmentUpdateDetails) {
-      updatedSegments.add(updateDetails.getSegmentName());
-    }
-
-    String validUpdateStatusFile = "";
-
-    boolean isAbortedFile = true;
-
-    boolean isInvalidFile = false;
-
-    // take the update status file name from 0th segment.
-    validUpdateStatusFile = ssm.getUpdateStatusFileName(details);
-    // scan through each segment.
-    for (LoadMetadataDetails segment : details) {
-      // if this segment is valid then only we will go for delta file deletion.
-      // if the segment is marked for delete or compacted then any way it will get deleted.
-      if (segment.getSegmentStatus() == SegmentStatus.SUCCESS
-          || segment.getSegmentStatus() == SegmentStatus.LOAD_PARTIAL_SUCCESS) {
-        // when there is no update operations done on table, then n

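The expired-segment cleanup described in the DataTrashManager doc comment above (step 3: clean MARKED_FOR_DELETE, Compacted, and In Progress segments) can be sketched in plain Scala. This is a minimal illustration only, under stated assumptions: `SegmentMeta`, `isExpired`, and `cleanExpiredSegments` are hypothetical names invented for this example, not CarbonData APIs, and the real PR additionally takes a clean-files lock and routes deleted files through the .Trash folder (as its imports of the lock and TrashUtil classes suggest).

```scala
// Illustrative sketch only: SegmentMeta, isExpired and cleanExpiredSegments
// are hypothetical names for this example, not CarbonData APIs.
sealed trait SegmentStatus
case object Success extends SegmentStatus
case object MarkedForDelete extends SegmentStatus
case object Compacted extends SegmentStatus
case object InsertInProgress extends SegmentStatus

final case class SegmentMeta(id: String, status: SegmentStatus, modifiedAtMs: Long)

object TrashSketch {
  // A segment is a cleanup candidate once it is in a "garbage" state
  // (marked for delete or compacted) and older than the retention window.
  def isExpired(s: SegmentMeta, nowMs: Long, retentionMs: Long): Boolean =
    (s.status == MarkedForDelete || s.status == Compacted) &&
      (nowMs - s.modifiedAtMs) > retentionMs

  // Split table status entries into segments to move to the trash folder
  // and segments to keep; the order of the input is preserved.
  def cleanExpiredSegments(all: Seq[SegmentMeta], nowMs: Long,
      retentionMs: Long): (Seq[SegmentMeta], Seq[SegmentMeta]) =
    all.partition(isExpired(_, nowMs, retentionMs))
}
```

For example, with a 100-second retention window, a MARKED_FOR_DELETE segment last modified 200 seconds ago would land in the trash partition, while a successful segment or a freshly compacted one would be kept.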
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and meta entry when there is no update/insert data

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4018:
URL: https://github.com/apache/carbondata/pull/4018#issuecomment-737759882


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5041/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4037: [WIP] Handle Alter long string for SI col and added FTs for SI

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4037:
URL: https://github.com/apache/carbondata/pull/4037#issuecomment-737747052


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3282/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4037: [WIP] Handle Alter long string for SI col and added FTs for SI

2020-12-03 Thread GitBox


CarbonDataQA2 commented on pull request #4037:
URL: https://github.com/apache/carbondata/pull/4037#issuecomment-737741961


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5039/
   






