[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#issuecomment-715144339


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2902/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Indhumathi27 commented on pull request #3984: [CARBONDATA-4035]Fix MV query issue with aggregation on decimal column

2020-10-23 Thread GitBox


Indhumathi27 commented on pull request #3984:
URL: https://github.com/apache/carbondata/pull/3984#issuecomment-715202939


   LGTM







[GitHub] [carbondata] asfgit closed pull request #3984: [CARBONDATA-4035]Fix MV query issue with aggregation on decimal column

2020-10-23 Thread GitBox


asfgit closed pull request #3984:
URL: https://github.com/apache/carbondata/pull/3984


   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3987: [CARBONDATA-4039] Support Local dictionary for Presto complex datatypes

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3987:
URL: https://github.com/apache/carbondata/pull/3987#issuecomment-715270845


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2908/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3972: [CARBONDATA-4042]Launch same number of task as select query for insert into select and ctas cases when target table is of no_sort

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3972:
URL: https://github.com/apache/carbondata/pull/3972#issuecomment-715134807


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4655/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3914: [CARBONDATA-3979] Added Hive local dictionary support example

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3914:
URL: https://github.com/apache/carbondata/pull/3914#issuecomment-715230967


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4662/
   







[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

2020-10-23 Thread GitBox


vikramahuja1001 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r510658251



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/loading/TableProcessingOperations.java
##
@@ -53,12 +52,14 @@
   private static final Logger LOGGER =
   LogServiceFactory.getLogService(CarbonLoaderUtil.class.getName());
 
+  private static List filesInTrashFolder = new 
ArrayList();
+
   /**
* delete folder which metadata no exist in tablestatus
* this method don't check tablestatus history.
*/
   public static void deletePartialLoadDataIfExist(CarbonTable carbonTable,

Review comment:
   This method is called from the CarbonCleanFilesCommand class









[GitHub] [carbondata] QiangCai opened a new pull request #3996: [DOC] Adjust document for partition table

2020-10-23 Thread GitBox


QiangCai opened a new pull request #3996:
URL: https://github.com/apache/carbondata/pull/3996


### Why is this PR needed?
   The function description of the partitioned table is incorrect.

### What changes were proposed in this PR?
   1. Document that splitting a partition is not supported
   2. Change "STANDARD PARTITION" to "PARTITION"
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   
   
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3987: [CARBONDATA-4039] Support Local dictionary for Presto complex datatypes

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3987:
URL: https://github.com/apache/carbondata/pull/3987#issuecomment-715151930


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4658/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#issuecomment-715171177


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2903/
   







[jira] [Resolved] (CARBONDATA-3954) Global sorting with array, if read from ORC format, write to carbon, error; If you use no_sort, success;

2020-10-23 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-3954.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Global sorting with array, if read from ORC format, write to carbon, error; 
> If you use no_sort, success;
> 
>
> Key: CARBONDATA-3954
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3954
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.0.0
>Reporter: xiaohui
>Priority: Major
> Fix For: 2.1.0
>
> Attachments: wx20200818-174...@2x.png, wx20200818-174...@2x.png
>
>
> orc table sql test: 
> create table array_orc(name string, col array<string>, fee int) STORED AS orc;
> insert into array_orc values("xiao3",array('上呼吸道疾病 1','白内障1','胃溃疡1'),2);
> insert into array_orc values("xiao3",array('上呼吸道疾病1 ','白内障1','胃溃疡1'),2);
> insert into array_orc values("xiao3",array('上呼吸道疾病1','白内障 1','胃溃疡1'),2);
> insert into array_orc values("xiao3",array('上呼吸道疾病1','白内障1','胃溃疡 1'),2);
> insert into array_orc values("xiao3",array('上呼吸道疾病1','白内障1','胃溃疡1'),2);
> insert into array_orc values("xiao5",array(null,'白内障1','胃溃疡1'),2);
> insert into array_orc values("xiao5",null,2);
> insert into array_orc values("xiao3",array('j'),2);
> insert into array_orc values("xiao4",array('j','j'),2);
> insert into array_orc values("xiao4",NULL,2);
> 0: jdbc:hive2://localhost:1> use dict;
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> No rows selected (0.391 seconds)
> 0: jdbc:hive2://localhost:1> select * from array_orc;
> ++---+--+--+
> |  name  |  col  | fee  |
> ++---+--+--+
> | xiao3  | ["",null,"j"] | 3|
> | xiao2  | ["上呼吸道疾病1","白内障1","胃溃疡1"] | 2|
> | xiao3  | ["",null,"j"] | 3|
> | xiao1  | ["上呼吸道疾病","白内障","胃溃疡"]| 1|
> | xiao9  | NULL  | 3|
> | xiao9  | NULL  | 3|
> | xiao3  | NULL  | 3|
> | xiao6  | NULL  | 3|
> | xiao2  | ["上呼吸道疾病 1","白内障 1","胃溃疡 1"]  | 2|
> | xiao1  | ["上呼吸道疾病 ","白内障 ","胃溃疡 "] | 1|
> | xiao3  | NULL  | 3|
> | xiao3  | [null]| 3|
> | xiao3  | [""]  | 3|
> ++---+--+--+
> 13 rows selected (0.416 seconds)
> 0: jdbc:hive2://localhost:1> create table array_carbon4(name string, col 
> array<string>, fee int) STORED AS carbondata TBLPROPERTIES 
> ('SORT_COLUMNS'='name',
> 0: jdbc:hive2://localhost:1> 'TABLE_BLOCKSIZE'='128',
> 0: jdbc:hive2://localhost:1> 'TABLE_BLOCKLET_SIZE'='128',
> 0: jdbc:hive2://localhost:1> 'SORT_SCOPE'='no_SORT');
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> No rows selected (1.04 seconds)
> 0: jdbc:hive2://localhost:1> insert overwrite table array_carbon4 select 
> name,col,fee from array_orc;
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> No rows selected (5.065 seconds)
> 0: jdbc:hive2://localhost:1> create table array_carbon5(name string, col 
> array<string>, fee int) STORED AS carbondata TBLPROPERTIES 
> ('SORT_COLUMNS'='name',
> 0: jdbc:hive2://localhost:1> 'TABLE_BLOCKSIZE'='128',
> 0: jdbc:hive2://localhost:1> 'TABLE_BLOCKLET_SIZE'='128',
> 0: jdbc:hive2://localhost:1> 'SORT_SCOPE'='global_SORT');
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> No rows selected (0.098 seconds)
> 0: jdbc:hive2://localhost:1> insert overwrite table array_carbon5 select 
> name,col,fee from array_orc;
> Error: java.lang.Exception: DataLoad failure (state=,code=0)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

2020-10-23 Thread GitBox


Indhumathi27 commented on a change in pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#discussion_r510762119



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/events/MergeIndexEventListener.scala
##
@@ -43,11 +43,6 @@ class MergeIndexEventListener extends OperationEventListener 
with Logging {
   override def onEvent(event: Event, operationContext: OperationContext): Unit 
= {
 event match {
   case preStatusUpdateEvent: LoadTablePreStatusUpdateEvent =>
-// skip merge index in case of insert stage flow
-if (null != 
operationContext.getProperty(CarbonCommonConstants.IS_INSERT_STAGE) &&

Review comment:
   This property also has to be removed from CarbonCommonConstants, as it 
will no longer be used









[GitHub] [carbondata] akashrn5 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

2020-10-23 Thread GitBox


akashrn5 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r510614551



##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -1427,6 +1427,25 @@ private CarbonCommonConstants() {
 
   public static final String BITSET_PIPE_LINE_DEFAULT = "true";
 
+  public static final String MICROSECONDS_IN_A_DAY = "8640";
+
+  /**
+   * this is the user defined time(in days), when a specific timestamp 
subdirectory in
+   * trash folder will expire
+   */
+  @CarbonProperty
+  public static final String TRASH_EXPIRATION_TIME = 
"carbon.trash.expiration.time";

Review comment:
   ```suggestion
 public static final String CARBON_TRASH_EXPIRATION_TIME = 
"carbon.trash.expiration.time";
   ```

##
File path: 
core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -1105,28 +1109,79 @@ public static void cleanSegments(CarbonTable table, 
List partitio
* @throws IOException
*/
   public static void deleteSegment(String tablePath, Segment segment,
-  List<PartitionSpec> partitionSpecs,
-  SegmentUpdateStatusManager updateStatusManager) throws Exception {
+  List<PartitionSpec> partitionSpecs, SegmentUpdateStatusManager updateStatusManager,
+  SegmentStatus segmentStatus, Boolean isPartitionTable, String timeStamp)

Review comment:
   please rename timeStamp, same as above comment

##
File path: 
core/src/main/java/org/apache/carbondata/core/util/DeleteLoadFolders.java
##
@@ -67,22 +69,23 @@ private static String 
getSegmentPath(AbsoluteTableIdentifier identifier,
   }
 
   public static void physicalFactAndMeasureMetadataDeletion(CarbonTable 
carbonTable,
-  LoadMetadataDetails[] newAddedLoadHistoryList,
-  boolean isForceDelete,
-  List<PartitionSpec> specs) {
+  LoadMetadataDetails[] newAddedLoadHistoryList, boolean isForceDelete,
+  List<PartitionSpec> specs, String timeStamp) {

Review comment:
   variable name

##
File path: 
core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##
@@ -2116,6 +2086,20 @@ public int getMaxSIRepairLimit(String dbName, String 
tableName) {
 return Math.abs(Integer.parseInt(thresholdValue));
   }
 
+  /**
+   * The below method returns the microseconds after which the trash folder 
will expire
+   */
+  public long getTrashFolderExpirationTime() {
+String configuredValue = 
getProperty(CarbonCommonConstants.TRASH_EXPIRATION_DAYS,
+CarbonCommonConstants.TRASH_EXPIRATION_DAYS_DEFAULT);
+int result = Integer.parseInt(configuredValue);

Review comment:
   it may throw a NumberFormatException
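The reviewer's point is that `Integer.parseInt` on a user-supplied property can throw on bad input. A defensive variant could look like the sketch below (class and method names are illustrative, not the PR's actual code):

```java
import java.util.concurrent.TimeUnit;

// Sketch of a defensive variant: fall back to the default instead of
// letting a NumberFormatException escape on bad user input.
public final class TrashConfigSketch {
  static final int DEFAULT_DAYS = 3;

  // Returns the trash expiration time in milliseconds for a
  // configured value expressed in days.
  static long trashExpirationMillis(String configuredValue) {
    int days;
    try {
      days = Integer.parseInt(configuredValue);
    } catch (NumberFormatException e) {
      days = DEFAULT_DAYS; // invalid input: use the default
    }
    if (days < 0) {
      days = DEFAULT_DAYS; // a negative expiration makes no sense
    }
    return TimeUnit.DAYS.toMillis(days);
  }
}
```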

##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -1427,6 +1427,25 @@ private CarbonCommonConstants() {
 
   public static final String BITSET_PIPE_LINE_DEFAULT = "true";
 
+  public static final String MICROSECONDS_IN_A_DAY = "8640";
+
+  /**
+   * this is the user defined time(in days), when a specific timestamp 
subdirectory in
+   * trash folder will expire
+   */
+  @CarbonProperty
+  public static final String TRASH_EXPIRATION_TIME = 
"carbon.trash.expiration.time";
+
+  /**
+   * Default expiration time of trash folder is 3 days.
+   */
+  public static final String TRASH_EXPIRATION_TIME_DEFAULT = "3";

Review comment:
   ```suggestion
 public static final String CARBON_TRASH_EXPIRATION_TIME_DEFAULT = "3";
   ```

##
File path: 
core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -1033,7 +1034,7 @@ public static void commitDropPartitions(CarbonTable 
carbonTable, String uniqueId
* @throws IOException
*/
   public static void cleanSegments(CarbonTable table, List<PartitionSpec> partitionSpecs,
-  boolean forceDelete) throws IOException {
+  String timeStamp, boolean forceDelete) throws IOException {

Review comment:
   what is this timestamp? please give a meaningful variable name

##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -1427,6 +1428,25 @@ private CarbonCommonConstants() {
 
   public static final String BITSET_PIPE_LINE_DEFAULT = "true";
 
+  public static final long MILLIS_SECONDS_IN_A_DAY = TimeUnit.DAYS.toMillis(1);
+
+  /**
+   * this is the user defined time(in days), when a specific timestamp 
subdirectory in
+   * trash folder will expire
+   */
+  @CarbonProperty
+  public static final String TRASH_EXPIRATION_DAYS = 
"carbon.trash.expiration.days";
+
+  /**
+   * Default expiration time of trash folder is 3 days.
+   */
+  public static final String TRASH_EXPIRATION_DAYS_DEFAULT = "3";

Review comment:
   ```suggestion
 public static final String CARBON_TRASH_EXPIRATION_DAYS_DEFAULT = "3";
   ```

##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java
##
@@ 

[GitHub] [carbondata] ShreelekhyaG commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing

2020-10-23 Thread GitBox


ShreelekhyaG commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-715268458


   retest this please







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3987: [CARBONDATA-4039] Support Local dictionary for Presto complex datatypes

2020-10-23 Thread GitBox


ajantha-bhat commented on a change in pull request #3987:
URL: https://github.com/apache/carbondata/pull/3987#discussion_r510810713



##
File path: 
integration/presto/src/test/prestosql/org/apache/carbondata/presto/server/PrestoTestUtil.scala
##
@@ -114,4 +114,60 @@ object PrestoTestUtil {
   }
 }
   }
+
+  // this method depends on prestodb jdbc PrestoArray class

Review comment:
   ```suggestion
 // this method depends on prestosql jdbc PrestoArray class
   ```









[GitHub] [carbondata] akkio-97 commented on a change in pull request #3987: [CARBONDATA-4039] Support Local dictionary for Presto complex datatypes

2020-10-23 Thread GitBox


akkio-97 commented on a change in pull request #3987:
URL: https://github.com/apache/carbondata/pull/3987#discussion_r510819971



##
File path: 
integration/presto/src/test/prestosql/org/apache/carbondata/presto/server/PrestoTestUtil.scala
##
@@ -114,4 +114,60 @@ object PrestoTestUtil {
   }
 }
   }
+
+  // this method depends on prestodb jdbc PrestoArray class

Review comment:
   done









[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

2020-10-23 Thread GitBox


vikramahuja1001 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r510659146



##
File path: 
core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -1105,28 +1109,79 @@ public static void cleanSegments(CarbonTable table, 
List partitio
* @throws IOException
*/
   public static void deleteSegment(String tablePath, Segment segment,
-  List<PartitionSpec> partitionSpecs,
-  SegmentUpdateStatusManager updateStatusManager) throws Exception {
+  List<PartitionSpec> partitionSpecs, SegmentUpdateStatusManager updateStatusManager,
+  SegmentStatus segmentStatus, Boolean isPartitionTable, String timeStamp)
+  throws Exception {
 SegmentFileStore fileStore = new SegmentFileStore(tablePath, 
segment.getSegmentFileName());
 List indexOrMergeFiles = 
fileStore.readIndexFiles(SegmentStatus.SUCCESS, true,
 FileFactory.getConfiguration());
+List filesToDelete = new ArrayList<>();
  Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();
  for (Map.Entry<String, List<String>> entry : indexFilesMap.entrySet()) {
-  FileFactory.deleteFile(entry.getKey());
+  // Move the file to the trash folder in case the segment status is 
insert in progress
+  if (segmentStatus == SegmentStatus.INSERT_IN_PROGRESS) {
+if (!isPartitionTable) {
+  TrashUtil.moveDataToTrashFolderByFile(tablePath, entry.getKey(), 
timeStamp +
+  CarbonCommonConstants.FILE_SEPARATOR + 
CarbonCommonConstants.LOAD_FOLDER + segment
+  .getSegmentNo());
+} else {
+  TrashUtil.moveDataToTrashFolderByFile(tablePath, entry.getKey(), 
timeStamp +

Review comment:
   For a normal table the layout is timestamp/Segment_#; there is no use for 
the Fact and Part0 folders in the trash.
   For a partition table the layout is timestamp/Segment_#/partition_folder; 
the segment number is included so that recovery can be done segment-wise.
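The trash layout described in the comment above can be sketched as a small path builder. The `.Trash` folder name and `Segment_` prefix are assumptions based on this discussion, not confirmed constants from the PR:

```java
// Illustrative sketch of the trash-folder layout discussed above.
public final class TrashPathSketch {
  static final String SEP = "/";

  // Normal table: <tablePath>/.Trash/<timestamp>/Segment_<segmentNo>
  static String trashPathForTable(String tablePath, String timestamp, String segmentNo) {
    return tablePath + SEP + ".Trash" + SEP + timestamp + SEP + "Segment_" + segmentNo;
  }

  // Partition table: <tablePath>/.Trash/<timestamp>/Segment_<segmentNo>/<partitionFolder>
  // The segment number stays in the path so recovery can be done segment-wise.
  static String trashPathForPartition(String tablePath, String timestamp,
      String segmentNo, String partitionFolder) {
    return trashPathForTable(tablePath, timestamp, segmentNo) + SEP + partitionFolder;
  }
}
```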









[GitHub] [carbondata] kunal642 commented on pull request #3970: [CARBONDATA-4007] Fix multiple issues in SDK

2020-10-23 Thread GitBox


kunal642 commented on pull request #3970:
URL: https://github.com/apache/carbondata/pull/3970#issuecomment-715121428


   LGTM







[GitHub] [carbondata] kunal642 commented on pull request #3974: [Carbondata-3999] Fix permission issue of indexServerTmp directory

2020-10-23 Thread GitBox


kunal642 commented on pull request #3974:
URL: https://github.com/apache/carbondata/pull/3974#issuecomment-715121915


   LGTM







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3972: [CARBONDATA-4042]Launch same number of task as select query for insert into select and ctas cases when target table is of no_sort

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3972:
URL: https://github.com/apache/carbondata/pull/3972#issuecomment-715138306


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2899/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3979: [Carbondata-3954] Fix insertion from ORC table into carbon table when sort scope is global sort

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3979:
URL: https://github.com/apache/carbondata/pull/3979#issuecomment-715139071


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4657/
   







[GitHub] [carbondata] ajantha-bhat commented on pull request #3979: [Carbondata-3954] Fix insertion from ORC table into carbon table when sort scope is global sort

2020-10-23 Thread GitBox


ajantha-bhat commented on pull request #3979:
URL: https://github.com/apache/carbondata/pull/3979#issuecomment-715211651


   LGTM







[GitHub] [carbondata] Indhumathi27 commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

2020-10-23 Thread GitBox


Indhumathi27 commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-715226050


   @marchpure please update the PR description for MergeIndex changes also







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3987: [CARBONDATA-4039] Support Local dictionary for Presto complex datatypes

2020-10-23 Thread GitBox


ajantha-bhat commented on a change in pull request #3987:
URL: https://github.com/apache/carbondata/pull/3987#discussion_r510813895



##
File path: 
integration/presto/src/main/prestodb/org/apache/carbondata/presto/readers/ComplexTypeStreamReader.java
##
@@ -139,7 +139,7 @@ public void putComplexObject(List offsetVector) {
   Block rowBlock = RowBlock
   .fromFieldBlocks(childBlocks.get(0).getPositionCount(), 
Optional.empty(),
   childBlocks.toArray(new Block[0]));
-  for (int position = 0; position < childBlocks.get(0).getPositionCount(); 
position++) {
+  for (int position = 0; position < offsetVector.size(); position++) {

Review comment:
   please check again. Both prestodb and prestosql have this class
   
   
integration/presto/src/main/prestosql/org/apache/carbondata/presto/readers/ComplexTypeStreamReader.java
   
integration/presto/src/main/prestodb/org/apache/carbondata/presto/readers/ComplexTypeStreamReader.java









[jira] [Created] (CARBONDATA-4043) Fix data load failure issue for columns added in legacy store

2020-10-23 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-4043:


 Summary: Fix data load failure issue for columns added in legacy 
store
 Key: CARBONDATA-4043
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4043
 Project: CarbonData
  Issue Type: Bug
Affects Versions: 2.1.0
Reporter: Indhumathi Muthumurugesh








[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#issuecomment-714983281


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2897/
   







[GitHub] [carbondata] QiangCai commented on pull request #3982: [CARBONDATA-4032] Fix drop partition command clean data issue

2020-10-23 Thread GitBox


QiangCai commented on pull request #3982:
URL: https://github.com/apache/carbondata/pull/3982#issuecomment-715048761


   LGTM







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#issuecomment-715172063


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4659/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3987: [CARBONDATA-4039] Support Local dictionary for Presto complex datatypes

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3987:
URL: https://github.com/apache/carbondata/pull/3987#issuecomment-715194344


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2904/
   







[GitHub] [carbondata] vikramahuja1001 commented on pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

2020-10-23 Thread GitBox


vikramahuja1001 commented on pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#issuecomment-714957473


   retest this please







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

2020-10-23 Thread GitBox


ajantha-bhat commented on a change in pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#discussion_r510671271



##
File path: 
core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -398,27 +398,29 @@ public static void 
mergeIndexAndWriteSegmentFile(CarbonTable carbonTable, String
* @throws IOException
*/
   public static String writeSegmentFile(CarbonTable carbonTable, String 
segmentId, String UUID,

Review comment:
   I have pushed now. Waiting for the build.









[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-23 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510677675



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1138,73 +1126,36 @@ private static Boolean 
checkUpdateDeltaFilesInSeg(Segment seg,
   }
 
   /**
-   * Check is the segment passed qualifies for IUD delete delta compaction or 
not i.e.
-   * if the number of delete delta files present in the segment is more than
-   * numberDeltaFilesThreshold.
+   * Check whether the segment passed qualifies for IUD delete delta 
compaction or not,
+   * i.e., if the number of delete delta files present in the segment is more 
than
+   * numberDeltaFilesThreshold, this segment will be selected.
*
-   * @param seg
-   * @param segmentUpdateStatusManager
-   * @param numberDeltaFilesThreshold
-   * @return
+   * @param seg segment to be qualified
+   * @param segmentUpdateStatusManager segments & blocks details management
+   * @param numberDeltaFilesThreshold threshold of delete delta files
+   * @return block list of the segment
*/
-  private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
+  private static List checkDeleteDeltaFilesInSeg(Segment seg,
   SegmentUpdateStatusManager segmentUpdateStatusManager, int 
numberDeltaFilesThreshold) {
 
+List blockLists = new ArrayList<>();
 Set uniqueBlocks = new HashSet();
 List blockNameList =
 segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
-
-for (final String blockName : blockNameList) {
-
-  CarbonFile[] deleteDeltaFiles =
+for (String blockName : blockNameList) {
+  List deleteDeltaFiles =
   segmentUpdateStatusManager.getDeleteDeltaFilesList(seg, blockName);
-  if (null != deleteDeltaFiles) {
-// The Delete Delta files may have Spill over blocks. Will consider 
multiple spill over
-// blocks as one. Currently DeleteDeltaFiles array contains Delete 
Delta Block name which
-// lies within Delete Delta Start TimeStamp and End TimeStamp. In 
order to eliminate
-// Spill Over Blocks will choose files with unique taskID.
-for (CarbonFile blocks : deleteDeltaFiles) {
-  // Get Task ID and the Timestamp from the Block name for e.g.
-  // part-0-3-1481084721319.carbondata => "3-1481084721319"
-  String task = 
CarbonTablePath.DataFileUtil.getTaskNo(blocks.getName());
-  String timestamp =
-  
CarbonTablePath.DataFileUtil.getTimeStampFromDeleteDeltaFile(blocks.getName());
-  String taskAndTimeStamp = task + "-" + timestamp;
+  if (null != deleteDeltaFiles && deleteDeltaFiles.size() > 
numberDeltaFilesThreshold) {
+for (String file : deleteDeltaFiles) {

Review comment:
   formatted and modified the variable name
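   The qualifying check in this hunk — count each block's delete-delta files once per unique task/timestamp pair (to collapse spill-over files) and select the block only when that count exceeds the threshold — can be sketched in isolation. This is a hedged standalone sketch: the file-name layout and helper names here are illustrative assumptions, not the actual CarbonTablePath/SegmentUpdateStatusManager API.

```java
import java.util.*;

public class DeleteDeltaCheck {

  // Hypothetical helper mirroring CarbonTablePath.DataFileUtil: pulls the
  // "taskNo-timestamp" pair out of a name like "part-0-3-1481084721319.deletedelta".
  static String taskAndTimestamp(String fileName) {
    String[] parts = fileName.replace(".deletedelta", "").split("-");
    return parts[parts.length - 2] + "-" + parts[parts.length - 1];
  }

  // Returns the blocks whose count of unique delete-delta files exceeds the
  // threshold; spill-over files sharing a task/timestamp pair are counted once.
  static List<String> qualifyingBlocks(Map<String, List<String>> deltaFilesByBlock,
      int numberDeltaFilesThreshold) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, List<String>> entry : deltaFilesByBlock.entrySet()) {
      Set<String> unique = new HashSet<>();
      for (String file : entry.getValue()) {
        unique.add(taskAndTimestamp(file));
      }
      if (unique.size() > numberDeltaFilesThreshold) {
        result.add(entry.getKey());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<String>> files = new LinkedHashMap<>();
    files.put("block1", Arrays.asList(
        "part-0-3-1481084721319.deletedelta",
        "part-0-3-1481084721319.deletedelta",   // spill-over duplicate, counted once
        "part-0-4-1481084721999.deletedelta"));
    files.put("block2", Arrays.asList("part-0-5-1481084722000.deletedelta"));
    System.out.println(qualifyingBlocks(files, 1));   // prints [block1]
  }
}
```

   Returning the block list (rather than a boolean per segment, as in the old code) lets the caller build the file list for horizontal compaction in one pass.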









[GitHub] [carbondata] QiangCai commented on a change in pull request #3982: [CARBONDATA-4032] Fix drop partition command clean data issue

2020-10-23 Thread GitBox


QiangCai commented on a change in pull request #3982:
URL: https://github.com/apache/carbondata/pull/3982#discussion_r510680548



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/partition/CarbonAlterTableDropHivePartitionCommand.scala
##
@@ -181,9 +182,11 @@ case class CarbonAlterTableDropHivePartitionCommand(
   OperationListenerBus.getInstance().fireEvent(postStatusEvent, 
operationContext)
 
   
IndexStoreManager.getInstance().clearIndex(table.getAbsoluteTableIdentifier)
+  tobeCleanSegs.addAll(tobeUpdatedSegs)
+  tobeCleanSegs.addAll(tobeDeletedSegs)

Review comment:
   add all twice?









[GitHub] [carbondata] asfgit closed pull request #3982: [CARBONDATA-4032] Fix drop partition command clean data issue

2020-10-23 Thread GitBox


asfgit closed pull request #3982:
URL: https://github.com/apache/carbondata/pull/3982


   







[jira] [Updated] (CARBONDATA-4042) Insert into select and CTAS launches fewer tasks(task count limited to number of nodes in cluster) even when target table is of no_sort

2020-10-23 Thread Venugopal Reddy K (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K updated CARBONDATA-4042:
--
Description: 
*Issue:*

At present, when we do insert into table select from or create table as select
from, we launch a single task per node. Whereas when we do a simple select *
from table query, the tasks launched are equal to the number of carbondata
files (CARBON_TASK_DISTRIBUTION default is CARBON_TASK_DISTRIBUTION_BLOCK).

This slows down the load performance of insert into select and CTAS cases.

Refer [Community discussion regd. task 
launch|http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Query-Regarding-Task-launch-mechanism-for-data-load-operations-tt98711.html]

 

*Suggestion:*

Launch the same number of tasks as in a select query for insert into select and
CTAS cases when the target table is of no-sort.
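The proposed rule above can be reduced to a one-line decision. The following is an illustrative sketch only — the names (`insertTaskCount`, `SortScope`) are assumptions for this example, not CarbonData APIs:

```java
public class TaskCountRule {

  enum SortScope { NO_SORT, LOCAL_SORT, GLOBAL_SORT }

  // Sketch of the proposal: when the target table is no-sort, launch one task
  // per carbondata file (as a plain select query would); otherwise keep the
  // current one-task-per-node behaviour.
  static int insertTaskCount(SortScope scope, int carbondataFileCount, int nodeCount) {
    return scope == SortScope.NO_SORT ? carbondataFileCount : nodeCount;
  }

  public static void main(String[] args) {
    System.out.println(insertTaskCount(SortScope.NO_SORT, 40, 4));     // prints 40
    System.out.println(insertTaskCount(SortScope.GLOBAL_SORT, 40, 4)); // prints 4
  }
}
```

For sorted tables the per-node limit is kept here because sorting benefits from consolidating a node's data into fewer tasks.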

 

SI creation

1. DDL -> Parser -> CarbonCreateSecondaryIndexCommand
   - do all validations (list the important ones)
   - acquireLockForSecondaryIndexCreation(): acquire locks (compact, meta, dele_seg lock)
   - prepare tableInfo for the SI table (prepare column schema, set position reference as sort, inherit local dictionary from the main table) and addIndexInfoToParentTable (create indexInfo and add it to the main table)
   - CreateTablePreExecutionEvent (for ACL work)
   - create the SI table (sparksession.sql(create ...))
   - addIndexTableInfo, refresh the index table, add indexInfo to the Hive metastore as Serde
   - addOrModifyTableProperty (indexTableExists -> true) and refresh the catalog table
2. Try load, LoadDataForSecondaryIndex
   1. prepare the load model for SI
   2. read the table status and set it into the load model
   3. if loadmeta is empty, just return; else start the load to SI
   4. getValidSeg: if there are valid segments go ahead, else return
   5. prepare segmentIdToLoadStartTimeMapping and prepare the secondary index model
   6. create an executor service based on the thread pool size for parallel load of segments to SI
   7. LoadTableSIPreExecutionEvent (ACL load events)
   8. try to get the segment lock for all valid segments; if acquired for all, add to the valid list, else add to the skipped segments
   9. start the load for valid segments, update the SI table status to in progress
   10. if the sort scope is not global sort, CarbonSecondaryIndexRDD:
       - internalGetPartitions: prepareInputFormat and getSplits()
       - internalCompute: sort blocks, prepareTaskBlockMap, prepare CarbonSecondaryIndexExecutor
       - exec.processTableBlocks (prepare the query model, execute the query and return an iterator)
       - SecondaryIndexQueryResultProcessor (prepare segment properties from the query result)
       - SecondaryIndexQueryResultProcessor.processQueryResult: init temp locations, sort data rows, processResult (does the sort on data in iterators), prepareRowObjectForSorting, addRowForSorting and startSorting
       - initializeFinalThreadMergerForMergeSort(), initDataHandler(), readAndLoadDataFromSortTempFiles()
       - write the carbon files to the index table store path, writeSegmentFile
       - get the load result from the futures and build success and failed segment lists; if failedSegList is not empty: if (isCompactionCall || !isLoadToFailedSISegments) fail the SI load, else just mark it marked-for-delete and let the next load take care of it
       Else (global sort): create a projection list including position reference, create a dataframe from the main table, loadDataUsingGlobalSort, writeSegmentFile, get the load result from the futures and build success and failed segment lists, with the same failure handling as above
   11. if (successSISegments.nonEmpty && !isCompactionCall):
       - update the status to in progress (can avoid this)
       - mergeIndexFiles, writeSegmentFile (can be avoided, shreelekya working on it)
       - read the table status file and prepare the load model for merging data files
       - mergeDataFilesSISegments -> scanSegmentsAndSubmitJob -> triggerCompaction -> CarbonSIRebuildRDD (internalGetPartitions: prepareInputFormat and getSplits(); internalCompute: CarbonCompactionExecutor.processTableBlocks(), close (delete old data files))
       - deleteOldIndexOrMergeIndexFiles, writeSegmentFile for each merged segment
       - updateTableStatusFile: read the table status file, writeLoadDetailsIntoFile (write the updated new index and data size into the table status file)
       - mergeIndexFiles for the newly generated index files of merged data files
       - if IndexServer is enabled clear its cache, else clear the driver cache
   12. update the table status to success
   13. if (!isCompactionCall): triggerPrepriming (trigger pre-priming for SI)
   14. if (failedSISegments.nonEmpty && !isCompactionCall): update the table status to marked-for-delete
   15. if (!isCompactionCall): LoadTableSIPostExecutionEvent
   16. if the skipped segments are not empty, set isSITableEnabled to false
   17. deleteLoadsAndUpdateMetadata
   18. release the segment locks
3. if checkMainTableSegEqualToSISeg: set isSITableEnabled to true
4. CreateTablePostExecutionEvent
5. releaseLocks (meta, dele_seg, compact)

Refresh issue
1. calling refresh 3 times — avoid it
2. check if the dummy is required or not
3. it inherits the same sort

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

2020-10-23 Thread GitBox


ajantha-bhat commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r510707186



##
File path: 
core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -1105,28 +1109,79 @@ public static void cleanSegments(CarbonTable table, 
List partitio
* @throws IOException
*/
   public static void deleteSegment(String tablePath, Segment segment,
-  List partitionSpecs,
-  SegmentUpdateStatusManager updateStatusManager) throws Exception {
+  List partitionSpecs, SegmentUpdateStatusManager 
updateStatusManager,
+  SegmentStatus segmentStatus, Boolean isPartitionTable, String timeStamp)
+  throws Exception {
 SegmentFileStore fileStore = new SegmentFileStore(tablePath, 
segment.getSegmentFileName());
 List indexOrMergeFiles = 
fileStore.readIndexFiles(SegmentStatus.SUCCESS, true,
 FileFactory.getConfiguration());
+List filesToDelete = new ArrayList<>();
 Map> indexFilesMap = fileStore.getIndexFilesMap();
 for (Map.Entry> entry : indexFilesMap.entrySet()) {
-  FileFactory.deleteFile(entry.getKey());
+  // Move the file to the trash folder in case the segment status is 
insert in progress
+  if (segmentStatus == SegmentStatus.INSERT_IN_PROGRESS) {
+if (!isPartitionTable) {
+  TrashUtil.copyDataToTrashFolderByFile(tablePath, entry.getKey(), 
timeStamp +

Review comment:
   Why not copy the whole segment? Why copy file by file?
   Multiple interactions with the file system may become a bottleneck for concurrent
queries. Suggest copying the whole segment at once.
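   A whole-segment copy along the lines suggested above could look like the following stdlib-only sketch. The class and directory names are hypothetical; the actual patch under review uses commons-io `FileUtils` and CarbonData's `FileFactory` rather than `java.nio.file`:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.stream.Stream;

public class SegmentTrashCopy {

  // Copies an entire segment directory into the trash folder in a single
  // traversal, instead of issuing one copy call per data file.
  static void copySegmentToTrash(Path segmentDir, Path trashDir) throws IOException {
    Path target = trashDir.resolve(segmentDir.getFileName());
    try (Stream<Path> stream = Files.walk(segmentDir)) {
      for (Path src : (Iterable<Path>) stream::iterator) {
        Path dst = target.resolve(segmentDir.relativize(src).toString());
        if (Files.isDirectory(src)) {
          Files.createDirectories(dst);
        } else {
          Files.createDirectories(dst.getParent());
          Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        }
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path segment = Files.createTempDirectory("Segment_0");
    Files.writeString(segment.resolve("part-0-0.carbondata"), "data");
    Path trash = Files.createTempDirectory("Trash");
    copySegmentToTrash(segment, trash);
    Path copied = trash.resolve(segment.getFileName()).resolve("part-0-0.carbondata");
    System.out.println(Files.exists(copied));   // prints true
  }
}
```

   One directory walk per segment keeps the number of file-system round trips proportional to the segment's contents rather than to the number of copy invocations issued by the caller.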
   

##
File path: 
core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java
##
@@ -47,6 +47,7 @@
   public static final String BATCH_PREFIX = "_batchno";
   private static final String LOCK_DIR = "LockFiles";
 
+  public static final String SEGMENTS_FOLDER = "segments";

Review comment:
   ```suggestion
 public static final String SEGMENTS_METADATA_FOLDER = "segments";
   ```

##
File path: 
core/src/main/java/org/apache/carbondata/core/util/path/TrashUtil.java
##
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.util.path;
+
+import java.io.File;
+import java.io.IOException;
+import java.sql.Timestamp;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.exception.CarbonFileException;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.commons.io.FileUtils;
+
+import org.apache.log4j.Logger;
+
+public final class TrashUtil {
+
+  private static final Logger LOGGER =
+  LogServiceFactory.getLogService(CarbonUtil.class.getName());
+
+  /**
+   * The below method copies a complete file to the trash folder. Provide the
+   * necessary timestamp and the segment number in the suffixToAdd variable, so
+   * that the proper folder is created in the trash folder.
+   */
+  public static void copyDataToTrashFolderByFile(String carbonTablePath, 
String pathOfFileToCopy,
+  String suffixToAdd) {
+String trashFolderPath = CarbonTablePath.getTrashFolder(carbonTablePath) +
+CarbonCommonConstants.FILE_SEPARATOR + suffixToAdd;
+try {
+  if (new File(pathOfFileToCopy).exists()) {
+FileUtils.copyFileToDirectory(new File(pathOfFileToCopy), new 
File(trashFolderPath));
+LOGGER.info("File: " + pathOfFileToCopy + " successfully copied to the 
trash folder: "
++ trashFolderPath);
+  }
+} catch (IOException e) {
+  LOGGER.error("Unable to copy " + pathOfFileToCopy + " to the trash 
folder", e);
+}
+  }
+
+  /**
+   * The below method copies the complete segment folder to the trash folder. 
Provide necessary
+   * timestamp and the segment number in the suffixToAdd  variable, so that 
the proper folder is
+   * created in the trash folder.
+   */
+  public 

[jira] [Resolved] (CARBONDATA-4035) MV table is not hit when sum() is applied on decimal column.

2020-10-23 Thread Indhumathi Muthu Murugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthu Murugesh resolved CARBONDATA-4035.
---
Fix Version/s: 2.1.0
   Resolution: Fixed

> MV table is not hit when sum() is applied on decimal column.
> 
>
> Key: CARBONDATA-4035
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4035
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Akash R Nilugal
>Assignee: Akash R Nilugal
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> MV table is not hit when sum() is applied on decimal column.
> sql("drop table if exists sum_agg_decimal")
> sql("create table sum_agg_decimal(salary1 decimal(7,2),salary2 
> decimal(7,2),salary3 decimal(7,2),salary4 decimal(7,2),empname string) stored 
> as carbondata")
> sql("drop materialized view if exists decimal_mv")
> sql("create materialized view decimal_mv as select empname, sum(salary1 - 
> salary2) from sum_agg_decimal group by empname")
> sql("explain select empname, sum( salary1 - salary2) from sum_agg_decimal 
> group by empname").show(false)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3914: [CARBONDATA-3979] Added Hive local dictionary support example

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3914:
URL: https://github.com/apache/carbondata/pull/3914#issuecomment-715233479


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2907/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3987: [CARBONDATA-4039] Support Local dictionary for Presto complex datatypes

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3987:
URL: https://github.com/apache/carbondata/pull/3987#issuecomment-715270525


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4663/
   







[GitHub] [carbondata] akkio-97 commented on a change in pull request #3987: [CARBONDATA-4039] Support Local dictionary for Presto complex datatypes

2020-10-23 Thread GitBox


akkio-97 commented on a change in pull request #3987:
URL: https://github.com/apache/carbondata/pull/3987#discussion_r510812324



##
File path: 
integration/presto/src/main/prestodb/org/apache/carbondata/presto/readers/ComplexTypeStreamReader.java
##
@@ -139,7 +139,7 @@ public void putComplexObject(List offsetVector) {
   Block rowBlock = RowBlock
   .fromFieldBlocks(childBlocks.get(0).getPositionCount(), 
Optional.empty(),
   childBlocks.toArray(new Block[0]));
-  for (int position = 0; position < childBlocks.get(0).getPositionCount(); 
position++) {
+  for (int position = 0; position < offsetVector.size(); position++) {

Review comment:
   We don't have that file in the prestoDB profile.









[GitHub] [carbondata] akkio-97 commented on a change in pull request #3987: [CARBONDATA-4039] Support Local dictionary for Presto complex datatypes

2020-10-23 Thread GitBox


akkio-97 commented on a change in pull request #3987:
URL: https://github.com/apache/carbondata/pull/3987#discussion_r510819729



##
File path: 
integration/presto/src/main/prestodb/org/apache/carbondata/presto/readers/ComplexTypeStreamReader.java
##
@@ -139,7 +139,7 @@ public void putComplexObject(List offsetVector) {
   Block rowBlock = RowBlock
   .fromFieldBlocks(childBlocks.get(0).getPositionCount(), 
Optional.empty(),
   childBlocks.toArray(new Block[0]));
-  for (int position = 0; position < childBlocks.get(0).getPositionCount(); 
position++) {
+  for (int position = 0; position < offsetVector.size(); position++) {

Review comment:
   It had passed earlier because this is just refactoring. I have anyway
added the changes in the other profile.









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#issuecomment-714943018


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2898/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#issuecomment-714944897


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4654/
   







[jira] [Updated] (CARBONDATA-4042) Insert into select and CTAS launches fewer tasks(task count limited to number of nodes in cluster) even when target table is of no_sort

2020-10-23 Thread Venugopal Reddy K (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K updated CARBONDATA-4042:
--
Summary: Insert into select and CTAS launches fewer tasks(task count 
limited to number of nodes in cluster) even when target table is of no_sort  
(was: Insert into select and CTAS launches fewer tasks(limited to max nodes) 
even when target table is of no_sort)

> Insert into select and CTAS launches fewer tasks(task count limited to number 
> of nodes in cluster) even when target table is of no_sort
> ---
>
> Key: CARBONDATA-4042
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4042
> Project: CarbonData
>  Issue Type: Improvement
>  Components: data-load, spark-integration
>Reporter: Venugopal Reddy K
>Priority: Major
>
> *Issue:*
> At present, when we do insert into table select from or create table as 
> select from, we launch a single task per node. Whereas when we do a simple 
> select * from table query, the tasks launched are equal to the number of 
> carbondata files (CARBON_TASK_DISTRIBUTION default is CARBON_TASK_DISTRIBUTION_BLOCK). 
> This slows down the load performance of insert into select and CTAS cases.
> Refer [Community discussion regd. task 
> launch|http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Query-Regarding-Task-launch-mechanism-for-data-load-operations-tt98711.html]
>  
> *Suggestion:*
> Launch the same number of tasks as in a select query for insert into select 
> and CTAS cases when the target table is of no-sort.





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#issuecomment-715146278


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4656/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-715197754


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2905/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-715196563


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4660/
   







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3987: [CARBONDATA-4039] Support Local dictionary for Presto complex datatypes

2020-10-23 Thread GitBox


ajantha-bhat commented on a change in pull request #3987:
URL: https://github.com/apache/carbondata/pull/3987#discussion_r510810498



##
File path: 
integration/presto/src/main/prestodb/org/apache/carbondata/presto/readers/ComplexTypeStreamReader.java
##
@@ -139,7 +139,7 @@ public void putComplexObject(List offsetVector) {
   Block rowBlock = RowBlock
   .fromFieldBlocks(childBlocks.get(0).getPositionCount(), 
Optional.empty(),
   childBlocks.toArray(new Block[0]));
-  for (int position = 0; position < childBlocks.get(0).getPositionCount(); 
position++) {
+  for (int position = 0; position < offsetVector.size(); position++) {

Review comment:
   Missed handling this in the prestosql file; I wonder how the test case
passed for prestosql.
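   The distinction behind this one-line fix can be shown with a toy offset vector: for a complex (e.g. ARRAY) column, the offset vector holds one entry per top-level row while the child block holds the flattened elements, so looping over child positions over-counts rows whenever any row has more than one element. This is a hedged standalone sketch, not Presto's actual `Block` API:

```java
import java.util.Arrays;
import java.util.List;

public class OffsetVsChildCount {

  // offsetVector.get(i) is the exclusive end of row i's elements inside the
  // flattened child vector: the number of top-level rows is the offset
  // vector's size, not the child element count.
  static int topLevelRowCount(List<Integer> offsetVector) {
    return offsetVector.size();
  }

  static int childElementCount(List<Integer> offsetVector) {
    return offsetVector.isEmpty() ? 0 : offsetVector.get(offsetVector.size() - 1);
  }

  public static void main(String[] args) {
    // Two array rows [a, b] and [c]: the child block holds 3 elements, so a
    // loop bounded by the child position count would emit 3 "rows" instead of 2.
    List<Integer> offsetVector = Arrays.asList(2, 3);
    System.out.println(topLevelRowCount(offsetVector));    // prints 2
    System.out.println(childElementCount(offsetVector));   // prints 3
  }
}
```

   Bounding the loop by `offsetVector.size()` therefore appends exactly one entry per top-level row, which is what `putComplexObject` needs.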









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3875: [CARBONDATA-3934]Support write transactional table with presto.

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#issuecomment-714982722


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4653/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3979: [Carbondata-3954] Fix insertion from ORC table into carbon table when sort scope is global sort

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3979:
URL: https://github.com/apache/carbondata/pull/3979#issuecomment-715183340


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2901/
   







[GitHub] [carbondata] asfgit closed pull request #3979: [Carbondata-3954] Fix insertion from ORC table into carbon table when sort scope is global sort

2020-10-23 Thread GitBox


asfgit closed pull request #3979:
URL: https://github.com/apache/carbondata/pull/3979


   







[GitHub] [carbondata] asfgit closed pull request #3974: [Carbondata-3999] Fix permission issue of indexServerTmp directory

2020-10-23 Thread GitBox


asfgit closed pull request #3974:
URL: https://github.com/apache/carbondata/pull/3974


   







[jira] [Resolved] (CARBONDATA-3999) The permission of IndexServer's temporary directory /tmp/indexservertmp is not 777 after running sometime.

2020-10-23 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-3999.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> The permission of IndexServer's temporary directory /tmp/indexservertmp is 
> not 777 after running sometime.
> --
>
> Key: CARBONDATA-3999
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3999
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.0.0
>Reporter: renhao
>Priority: Critical
>  Labels: IndexServer
> Fix For: 2.1.0
>
> Attachments: 4700942c-3158-424f-8861-3dfcb6fae205.png
>
>
> 1. Start the index server in FI; check that the permission of 
> "/tmp/indexservertmp" in HDFS is 777.
> 2. After running for some time, an error occurred when using the index server, 
> and the permission of "/tmp/indexservertmp" had become 755.





[GitHub] [carbondata] ajantha-bhat commented on pull request #3987: [CARBONDATA-4039] Support Local dictionary for Presto complex datatypes

2020-10-23 Thread GitBox


ajantha-bhat commented on pull request #3987:
URL: https://github.com/apache/carbondata/pull/3987#issuecomment-715333548


   LGTM







[jira] [Resolved] (CARBONDATA-3979) Added Hive local dictionary support example

2020-10-23 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-3979.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Added Hive local dictionary support example
> ---
>
> Key: CARBONDATA-3979
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3979
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
>  To verify local dictionary support in hive for the carbon tables created 
> from spark.





[GitHub] [carbondata] asfgit closed pull request #3914: [CARBONDATA-3979] Added Hive local dictionary support example

2020-10-23 Thread GitBox


asfgit closed pull request #3914:
URL: https://github.com/apache/carbondata/pull/3914


   







[GitHub] [carbondata] marchpure commented on a change in pull request #3981: [CARBONDATA-4031] Incorrect query result after Update/Delete and Inse…

2020-10-23 Thread GitBox


marchpure commented on a change in pull request #3981:
URL: https://github.com/apache/carbondata/pull/3981#discussion_r510863739



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/merge/MergeTestCase.scala
##
@@ -723,6 +723,15 @@ class MergeTestCase extends QueryTest with 
BeforeAndAfterAll {
 assert(getDeleteDeltaFileCount("target", "0") == 0)
 checkAnswer(sql("select count(*) from target"), Seq(Row(3)))
 checkAnswer(sql("select * from target order by key"), Seq(Row("c", "200"), 
Row("d", "3"), Row("e", "100")))
+
+// insert overwrite a partition. make sure the merge executed before still 
works.
+sql(
+  """insert overwrite table target
+| partition (value=3)
+| select * from target where value = 100""".stripMargin)
+checkAnswer(sql("select * from target"), Seq(Row("c", "200"), Row("e", 
"3"), Row("e", "100")))

Review comment:
   I have modified the code according to your suggestion.

##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/iud/UpdateCarbonTableTestCase.scala
##
@@ -69,6 +69,60 @@ class UpdateCarbonTableTestCase extends QueryTest with 
BeforeAndAfterAll {
 sql("""drop table iud.zerorows""")
   }
 
+  test("update and insert overwrite partition") {
+sql("""drop table if exists iud.updateinpartition""")
+sql(
+  """CREATE TABLE iud.updateinpartition (id STRING, sales INT)
+| PARTITIONED BY (dtm STRING)
+| STORED AS carbondata""".stripMargin)
+sql(
+  s"""load data local
+ | inpath '$resourcesPath/IUD/updateinpartition.csv' into table 
updateinpartition""".stripMargin)
+sql(
+  """update iud.updateinpartition u set (u.sales) = (u.sales + 1) where 
id='001'""".stripMargin)
+sql(
+  """update iud.updateinpartition u set (u.sales) = (u.sales + 2) where 
id='011'""".stripMargin)
+
+// delete data from a partition, make sure the update executed before 
still works.
+sql("""delete from updateinpartition where dtm=20200908 and 
id='012'""".stripMargin)
+checkAnswer(
+  sql("""select sales from iud.updateinpartition where 
id='001'""".stripMargin), Seq(Row(1))
+)
+checkAnswer(
+  sql("""select sales from iud.updateinpartition where 
id='011'""".stripMargin), Seq(Row(2))
+)
+checkAnswer(
+  sql("""select sales from iud.updateinpartition where 
id='012'""".stripMargin), Seq()
+)
+
+// insert overwrite a partition. make sure the update executed before 
still works.
+sql(
+  """insert overwrite table iud.updateinpartition
+| partition (dtm=20200908)
+| select * from iud.updateinpartition where dtm = 
20200907""".stripMargin)
+checkAnswer(
+  sql(
+"""select sales from iud.updateinpartition
+  | where dtm=20200908 and id='001'""".stripMargin), Seq(Row(1))
+)
+checkAnswer(
+  sql(
+"""select sales from iud.updateinpartition
+  | where dtm=20200908 and id='001'""".stripMargin), Seq(Row(1))
+)

Review comment:
I have modified the code according to your suggestion.

##
File path: integration/spark/src/test/resources/IUD/updateinpartition.csv
##
@@ -0,0 +1,21 @@
+id,sales,dtm
+001,0,20200907
+002,0,20200907
+003,0,20200907
+004,0,20200907
+005,0,20200907
+006,0,20200907
+007,0,20200907
+008,0,20200907
+009,0,20200907
+010,0,20200907
+011,0,20200908
+012,0,20200908
+013,0,20200908
+014,0,20200908
+015,0,20200908
+016,0,20200908
+017,0,20200908
+018,0,20200908
+019,0,20200908
+020,0,20200908

Review comment:
I have modified the code according to your suggestion.









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3987: [CARBONDATA-4039] Support Local dictionary for Presto complex datatypes

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3987:
URL: https://github.com/apache/carbondata/pull/3987#issuecomment-715331462


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4667/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3995: [CARBONDATA-4043] Fix data load failure issue for columns added in legacy store

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3995:
URL: https://github.com/apache/carbondata/pull/3995#issuecomment-715331700


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2911/
   







[GitHub] [carbondata] akashrn5 commented on pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-23 Thread GitBox


akashrn5 commented on pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#issuecomment-715336355











[jira] [Resolved] (CARBONDATA-4039) Support Local dictionary for presto complex datatypes

2020-10-23 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4039.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Support Local dictionary for presto complex datatypes
> -
>
> Key: CARBONDATA-4039
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4039
> Project: CarbonData
>  Issue Type: New Feature
>  Components: core, presto-integration
>Reporter: Akshay
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Support Local dictionary for presto complex datatypes - 
> Presto complex datatypes - array and struct only.
> [https://github.com/apache/carbondata/pull/3987]





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3981: [CARBONDATA-4031] Incorrect query result after Update/Delete and Inse…

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3981:
URL: https://github.com/apache/carbondata/pull/3981#issuecomment-715371390


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4671/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-715314491


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2909/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-715314385


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4664/
   







[GitHub] [carbondata] asfgit closed pull request #3987: [CARBONDATA-4039] Support Local dictionary for Presto complex datatypes

2020-10-23 Thread GitBox


asfgit closed pull request #3987:
URL: https://github.com/apache/carbondata/pull/3987


   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3914: [CARBONDATA-3979] Added Hive local dictionary support example

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3914:
URL: https://github.com/apache/carbondata/pull/3914#issuecomment-715339301


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4669/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-715375389


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4670/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-715375887


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2914/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3981: [CARBONDATA-4031] Incorrect query result after Update/Delete and Inse…

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3981:
URL: https://github.com/apache/carbondata/pull/3981#issuecomment-715375742


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2915/
   







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3994: [CARBONDATA-4040] Fix data mismatch incase of compaction failure and retry success

2020-10-23 Thread GitBox


ajantha-bhat commented on a change in pull request #3994:
URL: https://github.com/apache/carbondata/pull/3994#discussion_r510987950



##
File path: 
core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -398,27 +398,29 @@ public static void 
mergeIndexAndWriteSegmentFile(CarbonTable carbonTable, String
* @throws IOException
*/
   public static String writeSegmentFile(CarbonTable carbonTable, String 
segmentId, String UUID,

Review comment:
@QiangCai : I have thought about it; I can handle this issue only for the non-update scenario. For update, there is currently no easy way to find out which files are stale and which are not. One option for update is to read the old segment file and treat as valid its contents plus any files whose timestamp is greater than that of the old segment file.
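That option can be sketched as follows (a hypothetical Python helper, not CarbonData code): files listed in the old segment file, plus files written after it, are treated as valid, and anything else in the segment directory is considered stale.

```python
def find_stale_files(current_files, old_segment_entries, old_segment_mtime):
    """Illustrative only. current_files maps file name -> modified time;
    old_segment_entries is the set of file names recorded in the old
    segment file; old_segment_mtime is that segment file's write time."""
    valid = set(old_segment_entries)
    # files newer than the old segment file belong to the retried compaction
    valid |= {name for name, mtime in current_files.items()
              if mtime > old_segment_mtime}
    # everything else is leftover from the failed attempt
    return [name for name in current_files if name not in valid]
```

The names and the directory-listing model here are assumptions for illustration; the real fix has to work against the segment file format and update deltas.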









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3995: [CARBONDATA-4043] Fix data load failure issue for columns added in legacy store

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3995:
URL: https://github.com/apache/carbondata/pull/3995#issuecomment-715332608


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4666/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3987: [CARBONDATA-4039] Support Local dictionary for Presto complex datatypes

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3987:
URL: https://github.com/apache/carbondata/pull/3987#issuecomment-715332125


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2912/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3914: [CARBONDATA-3979] Added Hive local dictionary support example

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3914:
URL: https://github.com/apache/carbondata/pull/3914#issuecomment-715349613


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2913/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3769: [WIP][Perf] Upgrade zstd-jni version to supportReusableBuffer

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3769:
URL: https://github.com/apache/carbondata/pull/3769#issuecomment-715395416


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4672/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-715470071


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4673/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-715472504


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2916/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3981: [CARBONDATA-4031] Incorrect query result after Update/Delete and Inse…

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3981:
URL: https://github.com/apache/carbondata/pull/3981#issuecomment-715532823


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4674/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3981: [CARBONDATA-4031] Incorrect query result after Update/Delete and Inse…

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3981:
URL: https://github.com/apache/carbondata/pull/3981#issuecomment-715535323


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2917/
   







[jira] [Created] (CARBONDATA-4042) Insert into select and CTAS launches fewer tasks(limited to max nodes) even when target table is of no_sort

2020-10-23 Thread Venugopal Reddy K (Jira)
Venugopal Reddy K created CARBONDATA-4042:
-

 Summary: Insert into select and CTAS launches fewer tasks(limited 
to max nodes) even when target table is of no_sort
 Key: CARBONDATA-4042
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4042
 Project: CarbonData
  Issue Type: Improvement
  Components: data-load, spark-integration
Reporter: Venugopal Reddy K


*Issue:*

At present, when we do insert into table select from, or create table as select from, we launch one single task per node. Whereas when we do a simple select * from table query, the tasks launched are equal to the number of carbondata files (CARBON_TASK_DISTRIBUTION default is CARBON_TASK_DISTRIBUTION_BLOCK).

This slows down the load performance of the insert into select and CTAS cases.

Refer [Community discussion regd. task launch|http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Query-Regarding-Task-launch-mechanism-for-data-load-operations-tt98711.html]

 

*Suggestion:*

Launch the same number of tasks as in a select query for insert into select and CTAS cases when the target table is of no-sort.
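To make the gap concrete, here is a small sketch (plain Python with assumed numbers, not CarbonData internals) contrasting the two task-distribution behaviours described above:

```python
def tasks_launched(num_files, num_nodes, distribution):
    # current insert-into-select / CTAS behaviour: one task per node
    if distribution == "node":
        return min(num_nodes, num_files)
    # plain select behaviour (CARBON_TASK_DISTRIBUTION_BLOCK): one task per file
    if distribution == "block":
        return num_files
    raise ValueError("unknown distribution: " + distribution)
```

For example, 200 carbondata files on a 4-node cluster give 4 load-side tasks but 200 query-side tasks, which is the gap the suggestion targets for no-sort tables.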





[GitHub] [carbondata] marchpure commented on a change in pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

2020-10-23 Thread GitBox


marchpure commented on a change in pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#discussion_r510663360



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
##
@@ -430,9 +430,8 @@ class CarbonFileMetastore extends CarbonMetaStore {
 thriftWriter.open(FileWriteOperation.OVERWRITE)
 thriftWriter.write(thriftTableInfo)
 thriftWriter.close()
-val modifiedTime = System.currentTimeMillis()
-FileFactory.getCarbonFile(schemaFilePath).setLastModifiedTime(modifiedTime)
-updateSchemasUpdatedTime(identifier.getCarbonTableIdentifier.getTableId, 
modifiedTime)
+updateSchemasUpdatedTime(identifier.getCarbonTableIdentifier.getTableId,
+  System.currentTimeMillis())

Review comment:
The setLastModifiedTime function in CarbonFileMetastore.scala is related to a different issue: when an external table is deleted, data loss may happen. I will handle that in another PR.









[GitHub] [carbondata] akkio-97 commented on pull request #3987: [CARBONDATA-4039] Support Local dictionary for Presto complex datatypes

2020-10-23 Thread GitBox


akkio-97 commented on pull request #3987:
URL: https://github.com/apache/carbondata/pull/3987#issuecomment-715002753


   > please rebase, compile and push.
   > And I hope you have locally compiled prestodb and prestosql both.
   
   yes







[jira] [Updated] (CARBONDATA-4042) Insert into select and CTAS launches fewer tasks(task count limited to number of nodes in cluster) even when target table is of no_sort

2020-10-23 Thread Venugopal Reddy K (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K updated CARBONDATA-4042:
--
Description: 
*Issue:*

At present, when we do insert into table select from, or create table as select from, we launch one single task per node. Whereas when we do a simple select * from table query, the tasks launched are equal to the number of carbondata files (CARBON_TASK_DISTRIBUTION default is CARBON_TASK_DISTRIBUTION_BLOCK).

This slows down the load performance of the insert into select and CTAS cases.

Refer [Community discussion regd. task launch|http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Query-Regarding-Task-launch-mechanism-for-data-load-operations-tt98711.html]

 

*Suggestion:*

Launch the same number of tasks as in a select query for insert into select and CTAS cases when the target table is of no-sort.

  was:
*Issue:*

At present, When we do insert into table select from or create table as select 
from, we lauch one single task per node. Whereas when we do a simple select * 
from table query, tasks launched are equal to number of carbondata 
files(CARBON_TASK_DISTRIBUTION default is CARBON_TASK_DISTRIBUTION_BLOCK). 

Thus, slows down the load performance of insert into select and ctas cases.

Refer [Community discussion regd. task 
lauch|http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Query-Regarding-Task-launch-mechanism-for-data-load-operations-tt98711.html]

 

*Suggestion:*

Lauch the same number of tasks as in select query for insert into select and 
ctas cases when the target table is of no-sort.

 

SI creation notes:

1. DDL -> Parser -> CarbonCreateSecondaryIndexCommand:
   - do all validations (list the important ones)
   - acquireLockForSecondaryIndexCreation(): acquire locks (compact, meta, delete_segment lock)
   - prepare tableInfo for the SI table (prepare column schema, set position reference as sort, inherit local dictionary from the main table) and addIndexInfoToParentTable (create indexInfo and add it to the main table)
   - CreateTablePreExecutionEvent (for ACL work)
   - create the SI table (sparkSession.sql("create ..."))
   - addIndexTableInfo, refreshTable for the index table, add indexInfo to the hive metastore as Serde
   - addOrModifyTableProperty (indexTableExists -> true) and refresh the catalog table

2. Try load, LoadDataForSecondaryIndex:
   1. prepare the load model for the SI
   2. read the table status and set it into the load model
   3. if the load metadata is empty, just return; else start the load to the SI
   4. get valid segments; if any exist go ahead, else return
   5. prepare segmentIdToLoadStartTimeMapping and the secondary index model
   6. create an executor service based on the thread pool size for parallel load of segments to the SI
   7. LoadTableSIPreExecutionEvent (ACL load events)
   8. try to get the segment lock for all valid segments; segments that acquire the lock are added to valid, the rest to skipped segments
   9. start the load for valid segments, update the SI table status to in progress
   10. if the sort scope is not global sort:
       - CarbonSecondaryIndexRDD: internalGetPartitions prepares the input format and getSplits(); internalCompute sorts blocks, prepares the taskBlockMap and a CarbonSecondaryIndexExecutor
       - exec.processTableBlocks (prepare the query model, execute the query and return an iterator)
       - SecondaryIndexQueryResultProcessor (prepare segment properties from the query result)
       - SecondaryIndexQueryResultProcessor.processQueryResult: init temp locations, sort data rows, processResult (sorts the data in the iterators), prepareRowObjectForSorting, addRowForSorting and startSorting
       - initializeFinalThreadMergerForMergeSort(), initDataHandler(), readAndLoadDataFromSortTempFiles(); write the carbon files to the index table store path
       - writeSegmentFile; get the load result from the future and build success and failed segment lists; if the failed segment list is not empty: if (isCompactionCall || !isLoadToFailedSISegments) fail the SI load, else just mark the segments as marked-for-delete and let the next load take care of them
       else (global sort): create a projection list including the position reference, create a dataframe from the main table, loadDataUsingGlobalSort, writeSegmentFile; get the load result from the future and build success and failed segment lists with the same failure handling as above
   11. if (successSISegments.nonEmpty && !isCompactionCall):
       - update the status to in progress (can avoid this)
       - mergeIndexFiles, writeSegmentFile (can be avoided, shreelekya working on it)
       - read the table status file and prepare the load model for merging data files
       - mergeDataFilesSISegments -> scanSegmentsAndSubmitJob -> triggerCompaction -> CarbonSIRebuildRDD: internalGetPartitions prepares the input format and getSplits(); internalCompute runs CarbonCompactionExecutor.processTableBlocks(); close (delete old data files)
       - deleteOldIndexOrMergeIndexFiles, writeSegmentFile for each merged segment
       - updateTableStatusFile: read the table status file, writeLoadDetailsIntoFile (updated/new index and data size into the table status file)

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3996: [DOC] Adjust document for partition table

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3996:
URL: https://github.com/apache/carbondata/pull/3996#issuecomment-715200172


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2906/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3996: [DOC] Adjust document for partition table

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3996:
URL: https://github.com/apache/carbondata/pull/3996#issuecomment-715200476


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4661/
   







[jira] [Updated] (CARBONDATA-4043) Fix data load failure issue for columns added in legacy store

2020-10-23 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-4043:
-
Description: 
When a dimension is added in older versions like 1.1, by default it becomes a sort column. In the sort step we assume the data comes with the sort columns at the beginning. But the added column will be at the end even though it is a sort column. So, while building the data load configuration, we rearrange the columns (dimensions and data fields) to bring the sort columns to the beginning and the no-sort columns to the end, and revert them back to schema order before the FinalMerge/DataWriter step.

Issue:
 Data loading fails with a cast exception in the data writer step in case of NO_SORT and in the final sort step in case of LOCAL_SORT.
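A minimal sketch of that rearrangement, assuming a simplified (name, is_sort_column) field model rather than the actual DataField classes: sort columns move to the front for the sort step, and an index map reverts the layout to schema order before the FinalMerge/DataWriter step.

```python
def rearrange_for_sort(fields):
    """fields: list of (name, is_sort_column) tuples in schema order.
    Returns (rearranged, order) where order[i] is the schema index of
    rearranged[i]; the stable sort keeps relative order within each group."""
    order = sorted(range(len(fields)), key=lambda i: not fields[i][1])
    return [fields[i] for i in order], order

def revert_to_schema_order(rearranged, order):
    # place each rearranged field back at its recorded schema position
    restored = [None] * len(rearranged)
    for pos, schema_idx in enumerate(order):
        restored[schema_idx] = rearranged[pos]
    return restored
```

A column added later (such as a legacy-store sort column sitting at the end of the schema) lands in the sort-column group here, and the revert step puts it back in schema order, which is the step this fix concerns.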

> Fix data load failure issue for columns added in legacy store
> -
>
> Key: CARBONDATA-4043
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4043
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>
> When a dimension is added in older versions like 1.1, by default it becomes a 
> sort column. In the sort step we assume the data comes with the sort columns at 
> the beginning. But the added column will be at the end even though it is a sort 
> column. So, while building the data load configuration, we rearrange the 
> columns (dimensions and data fields) to bring the sort columns to the beginning 
> and the no-sort columns to the end, and revert them back to schema order before 
> the FinalMerge/DataWriter step.
> Issue:
>  Data loading fails with a cast exception in the data writer step in case of 
> NO_SORT and in the final sort step in case of LOCAL_SORT.





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-715583950


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2918/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3988: [CARBONDATA-4037] Improve the table status and segment file writing

2020-10-23 Thread GitBox


CarbonDataQA1 commented on pull request #3988:
URL: https://github.com/apache/carbondata/pull/3988#issuecomment-715583322


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4675/
   







[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

2020-10-23 Thread GitBox


vikramahuja1001 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r510647379



##
File path: core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java
##
@@ -1105,28 +1109,79 @@ public static void cleanSegments(CarbonTable table, List<PartitionSpec> partitionSpecs,
    * @throws IOException
    */
   public static void deleteSegment(String tablePath, Segment segment,
-      List<PartitionSpec> partitionSpecs,
-      SegmentUpdateStatusManager updateStatusManager) throws Exception {
+      List<PartitionSpec> partitionSpecs, SegmentUpdateStatusManager updateStatusManager,
+      SegmentStatus segmentStatus, Boolean isPartitionTable, String timeStamp)
+      throws Exception {
     SegmentFileStore fileStore = new SegmentFileStore(tablePath, segment.getSegmentFileName());
     List<String> indexOrMergeFiles = fileStore.readIndexFiles(SegmentStatus.SUCCESS, true,
         FileFactory.getConfiguration());
+    List<String> filesToDelete = new ArrayList<>();
     Map<String, List<String>> indexFilesMap = fileStore.getIndexFilesMap();
     for (Map.Entry<String, List<String>> entry : indexFilesMap.entrySet()) {
-      FileFactory.deleteFile(entry.getKey());
+      // Move the file to the trash folder in case the segment status is insert in progress
+      if (segmentStatus == SegmentStatus.INSERT_IN_PROGRESS) {
+        if (!isPartitionTable) {
+          TrashUtil.moveDataToTrashFolderByFile(tablePath, entry.getKey(), timeStamp +
+              CarbonCommonConstants.FILE_SEPARATOR + CarbonCommonConstants.LOAD_FOLDER + segment
+              .getSegmentNo());
+        } else {
+          TrashUtil.moveDataToTrashFolderByFile(tablePath, entry.getKey(), timeStamp +
+              CarbonCommonConstants.FILE_SEPARATOR + CarbonCommonConstants.LOAD_FOLDER + segment
+              .getSegmentNo() + CarbonCommonConstants.FILE_SEPARATOR + entry.getKey().substring(
+                  tablePath.length() + 1).split(CarbonCommonConstants.FILE_SEPARATOR)[0]);
+        }
+      }
+      // add the file to the filesToDelete map to delete it after the complete segment
+      // has been copied.
+      filesToDelete.add(entry.getKey());
       for (String file : entry.getValue()) {
         String[] deltaFilePaths =
             updateStatusManager.getDeleteDeltaFilePath(file, segment.getSegmentNo());
         for (String deltaFilePath : deltaFilePaths) {
-          FileFactory.deleteFile(deltaFilePath);
+          // Move the file to the trash folder in case the segment status is insert in progress
+          if (segmentStatus == SegmentStatus.INSERT_IN_PROGRESS) {
+            if (!isPartitionTable) {
+              TrashUtil.moveDataToTrashFolderByFile(tablePath, deltaFilePath, timeStamp +
+                  CarbonCommonConstants.FILE_SEPARATOR + CarbonCommonConstants.LOAD_FOLDER + segment
+                  .getSegmentNo());
+            } else {
+              TrashUtil.moveDataToTrashFolderByFile(tablePath, deltaFilePath, timeStamp +
+                  CarbonCommonConstants.FILE_SEPARATOR + CarbonCommonConstants.LOAD_FOLDER + segment
+                  .getSegmentNo() + CarbonCommonConstants.FILE_SEPARATOR + deltaFilePath.substring(
+                      tablePath.length() + 1).split(CarbonCommonConstants.FILE_SEPARATOR)[0]);
+            }
+          }
+          filesToDelete.add(deltaFilePath);
+        }
+        // If the file to be deleted is a carbondata file, copy that file to the trash folder.
+        if (file.endsWith(CarbonCommonConstants.FACT_FILE_EXT) && segmentStatus ==
+            SegmentStatus.INSERT_IN_PROGRESS) {
+          if (!isPartitionTable) {
+            TrashUtil.moveDataToTrashFolderByFile(tablePath, file, timeStamp +
+                CarbonCommonConstants.FILE_SEPARATOR + CarbonCommonConstants.LOAD_FOLDER + segment
+                .getSegmentNo());
+          } else {
+            TrashUtil.moveDataToTrashFolderByFile(tablePath, file, timeStamp +
+                CarbonCommonConstants.FILE_SEPARATOR + CarbonCommonConstants.LOAD_FOLDER + segment
+                .getSegmentNo() + CarbonCommonConstants.FILE_SEPARATOR + file.substring(tablePath
+                    .length() + 1).split(CarbonCommonConstants.FILE_SEPARATOR)[0]);
+          }
         }
-        FileFactory.deleteFile(file);
+        filesToDelete.add(file);
       }
     }
-    deletePhysicalPartition(partitionSpecs, indexFilesMap, indexOrMergeFiles, tablePath);
+    LoadMetadataDetails loadMetaDataDetail = new LoadMetadataDetails();
+    loadMetaDataDetail.setSegmentStatus(segmentStatus);
+    loadMetaDataDetail.setLoadName(segment.getSegmentNo());
+    deletePhysicalPartition(partitionSpecs, indexFilesMap, indexOrMergeFiles, tablePath,
+        loadMetaDataDetail, filesToDelete, timeStamp);
     String segmentFilePath =
         CarbonTablePath.getSegmentFilePath(tablePath, segment.getSegmentFileName());
     // Deletes the physical segment file
     FileFactory.deleteFile(segmentFilePath);

Review comment:
   no, 
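The diff above follows a collect-then-delete pattern: every file of an in-progress segment is first copied into a timestamped trash folder and recorded in `filesToDelete`, and physical deletion happens only after the whole segment has been copied. A minimal standalone sketch of that pattern using `java.nio` is shown below — `TrashDemo`, `moveSegmentToTrash`, and the `.Trash/<timestamp>/Segment_<no>` layout are illustrative assumptions, not the CarbonData `TrashUtil` API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the collect-then-delete trash pattern discussed in
// the review. Names and trash layout are assumptions for illustration only.
public class TrashDemo {
  static List<Path> moveSegmentToTrash(Path tablePath, List<Path> segmentFiles,
      String timeStamp, String segmentNo) throws IOException {
    List<Path> filesToDelete = new ArrayList<>();
    // Assumed trash layout: <table>/.Trash/<timestamp>/Segment_<no>/
    Path trashDir = tablePath.resolve(".Trash").resolve(timeStamp)
        .resolve("Segment_" + segmentNo);
    Files.createDirectories(trashDir);
    for (Path file : segmentFiles) {
      // Copy first; record the file instead of deleting it immediately.
      Files.copy(file, trashDir.resolve(file.getFileName()),
          StandardCopyOption.REPLACE_EXISTING);
      filesToDelete.add(file);
    }
    // Delete only after the complete segment has been copied to trash.
    for (Path file : filesToDelete) {
      Files.delete(file);
    }
    return filesToDelete;
  }

  public static void main(String[] args) throws IOException {
    Path table = Files.createTempDirectory("demo_table");
    Path data = Files.createFile(table.resolve("part-0-0.carbondata"));
    Path index = Files.createFile(table.resolve("0.carbonindex"));
    List<Path> deleted = moveSegmentToTrash(table, List.of(data, index),
        "1603452000000", "0");
    System.out.println("deleted=" + deleted.size());
    System.out.println("inTrash=" + Files.exists(
        table.resolve(".Trash").resolve("1603452000000").resolve("Segment_0")
            .resolve("part-0-0.carbondata")));
    // prints "deleted=2" then "inTrash=true"
  }
}
```

The point of deferring deletion is crash safety: if the process dies mid-copy, the original segment files are still intact and the operation can be retried.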

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3917: [CARBONDATA-3978] Clean Files Refactor and support for trash folder in carbondata

2020-10-23 Thread GitBox


vikramahuja1001 commented on a change in pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#discussion_r510647134



##
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -1427,6 +1427,25 @@ private CarbonCommonConstants() {
 
   public static final String BITSET_PIPE_LINE_DEFAULT = "true";
 
+  public static final String MICROSECONDS_IN_A_DAY = "8640";

Review comment:
   done
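For reference, day-length constants of the kind discussed above (used to compute trash retention windows) can be sanity-checked with plain arithmetic. This is a standalone check, not CarbonData code; the variable names are illustrative:

```java
// Derive the number of seconds, milliseconds, and microseconds in a day.
public class DayUnits {
  public static void main(String[] args) {
    long secondsInADay = 24L * 60 * 60;            // 86,400
    long millisInADay = secondsInADay * 1_000;     // 86,400,000
    long microsInADay = secondsInADay * 1_000_000; // 86,400,000,000
    System.out.println(secondsInADay + " " + millisInADay + " " + microsInADay);
    // prints "86400 86400000 86400000000"
  }
}
```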




