[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3935: [CARBONDATA-3993] Remove auto data deletion in IUD processs

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3935:
URL: https://github.com/apache/carbondata/pull/3935#issuecomment-706875962


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2613/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3935: [CARBONDATA-3993] Remove auto data deletion in IUD processs

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3935:
URL: https://github.com/apache/carbondata/pull/3935#issuecomment-706873519


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4363/
   







[GitHub] [carbondata] ajantha-bhat commented on pull request #3924: [CARBONDATA-3988] Allow SI creation on first dimension column

2020-10-11 Thread GitBox


ajantha-bhat commented on pull request #3924:
URL: https://github.com/apache/carbondata/pull/3924#issuecomment-706872243


   retest this please







[GitHub] [carbondata] Indhumathi27 closed pull request #3971: [WIP] Do not clean stale data

2020-10-11 Thread GitBox


Indhumathi27 closed pull request #3971:
URL: https://github.com/apache/carbondata/pull/3971


   







[GitHub] [carbondata] kunal642 commented on pull request #3914: [CARBONDATA-3979] Added Hive local dictionary support example

2020-10-11 Thread GitBox


kunal642 commented on pull request #3914:
URL: https://github.com/apache/carbondata/pull/3914#issuecomment-706868808


   LGTM







[GitHub] [carbondata] ajantha-bhat commented on pull request #3948: [WIP] Analyze random 11 testcase failure in CI

2020-10-11 Thread GitBox


ajantha-bhat commented on pull request #3948:
URL: https://github.com/apache/carbondata/pull/3948#issuecomment-706862593


   retest this please







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3948: [WIP] Analyze random 11 testcase failure in CI

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3948:
URL: https://github.com/apache/carbondata/pull/3948#issuecomment-706860685


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2612/
   







[GitHub] [carbondata] marchpure opened a new pull request #3978: [CARBONDATA-4028] Fix failed to unlock during update

2020-10-11 Thread GitBox


marchpure opened a new pull request #3978:
URL: https://github.com/apache/carbondata/pull/3978


   
### Why is this PR needed?
   1. In the update flow, we unpersist the dataset before unlocking. The unlock 
will fail if the dataset unpersist is interrupted.
   2. cleanStaleDeltaFiles holds the lock, which degrades concurrency 
performance significantly.

### What changes were proposed in this PR?
   1. Unlock before unpersisting the dataset.
   2. cleanStaleDeltaFiles no longer holds the lock.
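
   The effect of the ordering in item 1 can be sketched in plain Java. This is a minimal illustration, not code from this PR: `ReentrantLock` stands in for the table update lock, and `unpersist` is a hypothetical cleanup step that can fail.

```java
import java.util.concurrent.locks.ReentrantLock;

final class UnlockOrderDemo {
  // Hypothetical cleanup step that may fail, standing in for unpersisting a cached dataset.
  static void unpersist(boolean fail) {
    if (fail) {
      throw new IllegalStateException("unpersist interrupted");
    }
  }

  // Old order: clean up, then unlock. Returns true if the lock is still held afterwards.
  static boolean lockLeakedWhenCleanupFirst(boolean fail) {
    ReentrantLock lock = new ReentrantLock();
    lock.lock();
    try {
      unpersist(fail);
      lock.unlock(); // skipped when unpersist throws
    } catch (IllegalStateException e) {
      // swallowed only so the demo can inspect the lock state
    }
    return lock.isHeldByCurrentThread();
  }

  // New order: unlock, then clean up. Returns true if the lock is still held afterwards.
  static boolean lockLeakedWhenUnlockFirst(boolean fail) {
    ReentrantLock lock = new ReentrantLock();
    lock.lock();
    try {
      lock.unlock(); // released before anything that can fail
      unpersist(fail);
    } catch (IllegalStateException e) {
      // swallowed only so the demo can inspect the lock state
    }
    return lock.isHeldByCurrentThread();
  }

  public static void main(String[] args) {
    System.out.println("cleanup first, failure: lock leaked = " + lockLeakedWhenCleanupFirst(true));
    System.out.println("unlock first, failure:  lock leaked = " + lockLeakedWhenUnlockFirst(true));
  }
}
```

   With a failing cleanup, only the first ordering leaves the lock held; when the cleanup succeeds, both orderings release it.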
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   
   
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3948: [WIP] Analyze random 11 testcase failure in CI

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3948:
URL: https://github.com/apache/carbondata/pull/3948#issuecomment-706858431


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4362/
   







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3935: [CARBONDATA-3993] Remove auto data deletion in IUD processs

2020-10-11 Thread GitBox


akashrn5 commented on a change in pull request #3935:
URL: https://github.com/apache/carbondata/pull/3935#discussion_r503037126



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##
@@ -267,9 +266,8 @@ object CarbonDataRDDFactory {
 throw new Exception("Exception in compaction " + 
exception.getMessage)
   }
 } finally {
-  executor.shutdownNow()
   try {
-compactor.deletePartialLoadsInCompaction()

Review comment:
   @Pickupolddriver We cannot remove the cleaning of stale files in the IUD 
case and wait for the clean files command to clean them. We should clean the 
stale files immediately in the respective command itself; otherwise there is a 
chance of extra data or data inconsistency. @QiangCai maybe we can avoid this 
once we implement writing the update data to a new segment and writing only 
the delete delta files to the updated segment.









[GitHub] [carbondata] akashrn5 commented on a change in pull request #3935: [CARBONDATA-3993] Remove auto data deletion in IUD processs

2020-10-11 Thread GitBox


akashrn5 commented on a change in pull request #3935:
URL: https://github.com/apache/carbondata/pull/3935#discussion_r503037126



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##
@@ -267,9 +266,8 @@ object CarbonDataRDDFactory {
 throw new Exception("Exception in compaction " + 
exception.getMessage)
   }
 } finally {
-  executor.shutdownNow()
   try {
-compactor.deletePartialLoadsInCompaction()

Review comment:
   > We cannot remove the cleaning of stale files in the IUD case and wait 
for the clean files command to clean them. We should clean the stale files 
immediately in the respective command itself; otherwise there is a chance of 
extra data or data inconsistency. 









[jira] [Created] (CARBONDATA-4028) Fail to unlock during update

2020-10-11 Thread Xingjun Hao (Jira)
Xingjun Hao created CARBONDATA-4028:
---

 Summary: Fail to unlock during update
 Key: CARBONDATA-4028
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4028
 Project: CarbonData
  Issue Type: Bug
Reporter: Xingjun Hao


In the update flow, we unpersist the dataset before unlocking. The unlock will 
fail if the dataset unpersist is interrupted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] akashrn5 commented on a change in pull request #3935: [CARBONDATA-3993] Remove auto data deletion in IUD processs

2020-10-11 Thread GitBox


akashrn5 commented on a change in pull request #3935:
URL: https://github.com/apache/carbondata/pull/3935#discussion_r503036710



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##
@@ -267,9 +266,8 @@ object CarbonDataRDDFactory {
 throw new Exception("Exception in compaction " + 
exception.getMessage)
   }
 } finally {
-  executor.shutdownNow()
   try {
-compactor.deletePartialLoadsInCompaction()

Review comment:
   @QiangCai How is it handled now, without listing files? Why can't we list 
files with a timestamp filter (the load timestamp/fact timestamp), which we can 
get from the load model or somewhere, right?









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3950: [CARBONDATA-3889] Enable scalastyle check for all scala test code

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3950:
URL: https://github.com/apache/carbondata/pull/3950#issuecomment-706854702


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4364/
   







[GitHub] [carbondata] Indhumathi27 commented on pull request #3966: [CARBONDATA-4023] Create MV failed on table with geospatial index using carbonsession.

2020-10-11 Thread GitBox


Indhumathi27 commented on pull request #3966:
URL: https://github.com/apache/carbondata/pull/3966#issuecomment-706852014


   retest this please







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3950: [CARBONDATA-3889] Enable scalastyle check for all scala test code

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3950:
URL: https://github.com/apache/carbondata/pull/3950#issuecomment-706851626


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2614/
   







[GitHub] [carbondata] marchpure commented on a change in pull request #3976: [CARBONDATA-4026] Fix Thread leakage while Loading

2020-10-11 Thread GitBox


marchpure commented on a change in pull request #3976:
URL: https://github.com/apache/carbondata/pull/3976#discussion_r503031223



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/StageInputCollector.java
##
@@ -65,8 +65,15 @@
 collectStageFiles(table, hadoopConf, stageInputFiles, successFiles);
 if (stageInputFiles.size() > 0) {
   int numThreads = Math.min(Math.max(stageInputFiles.size(), 1), 10);
-  ExecutorService executorService = 
Executors.newFixedThreadPool(numThreads);
-  return createInputSplits(executorService, stageInputFiles);
+  ExecutorService executorService = null;
+  try {
+executorService = Executors.newFixedThreadPool(numThreads);

Review comment:
   I have modified code according to your suggestion

##
File path: 
integration/spark/src/main/scala/org/apache/spark/rdd/CarbonMergeFilesRDD.scala
##
@@ -158,20 +158,27 @@ object CarbonMergeFilesRDD {
   // remove all tmp folder of index files
   val startDelete = System.currentTimeMillis()
   val numThreads = Math.min(Math.max(partitionInfo.size(), 1), 10)
-  val executorService = Executors.newFixedThreadPool(numThreads)
-  val carbonSessionInfo = ThreadLocalSessionInfo.getCarbonSessionInfo
-  partitionInfo
-.asScala
-.map { partitionPath =>
-  executorService.submit(new Runnable {
-override def run(): Unit = {
-  ThreadLocalSessionInfo.setCarbonSessionInfo(carbonSessionInfo)
-  FileFactory.deleteAllCarbonFilesOfDir(
-FileFactory.getCarbonFile(partitionPath + "/" + 
tempFolderPath))
-}
-  })
+  var executorService: ExecutorService = null
+  try {
+executorService = Executors.newFixedThreadPool(numThreads)

Review comment:
   I have modified code according to your suggestion

##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##
@@ -123,6 +123,9 @@ object IndexServer extends ServerInterface {
 t
   }
 })
+indexServerExecutorService.get.shutdown()

Review comment:
   I have modified code according to your suggestion









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3976: [CARBONDATA-4026] Fix Thread leakage while Loading

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3976:
URL: https://github.com/apache/carbondata/pull/3976#issuecomment-706849697


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2611/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3976: [CARBONDATA-4026] Fix Thread leakage while Loading

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3976:
URL: https://github.com/apache/carbondata/pull/3976#issuecomment-706847834


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4361/
   







[GitHub] [carbondata] Indhumathi27 commented on pull request #3959: [CARBONDATA-4010] Doc changes for long strings.

2020-10-11 Thread GitBox


Indhumathi27 commented on pull request #3959:
URL: https://github.com/apache/carbondata/pull/3959#issuecomment-706845951


   retest this please







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-706845337


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4360/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-706844880


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2610/
   







[GitHub] [carbondata] QiangCai commented on a change in pull request #3935: [CARBONDATA-3993] Remove auto data deletion in IUD processs

2020-10-11 Thread GitBox


QiangCai commented on a change in pull request #3935:
URL: https://github.com/apache/carbondata/pull/3935#discussion_r503018114



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##
@@ -267,9 +266,8 @@ object CarbonDataRDDFactory {
 throw new Exception("Exception in compaction " + 
exception.getMessage)
   }
 } finally {
-  executor.shutdownNow()
   try {
-compactor.deletePartialLoadsInCompaction()

Review comment:
   a) Root cause of stale files: listFiles is used to collect index files 
when writing the segment file, so stale index file names get added to the 
segment file. 
 It is hard to add a unique id, so we changed the plan to avoid calling 
listFiles while writing the segment file.
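
   To illustrate the difference between the two approaches (a minimal sketch on the local file system, not CarbonData code; the directory layout and the file names `stale_attempt.carbonindex` and `current_load.carbonindex` are hypothetical), compare listing a segment directory with recording the names the current load actually wrote:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SegmentIndexNamesDemo {
  // Collects index file names by listing the directory; leftovers from earlier attempts are included.
  static List<String> byListing(Path segmentDir) {
    try (Stream<Path> files = Files.list(segmentDir)) {
      return files.map(p -> p.getFileName().toString())
          .filter(name -> name.endsWith(".carbonindex"))
          .sorted()
          .collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Returns {number of names found by listing, number of names tracked by the current load}.
  static int[] demo() {
    try {
      Path dir = Files.createTempDirectory("segment_0");
      // Left behind by a failed earlier attempt (hypothetical name).
      Path stale = Files.createFile(dir.resolve("stale_attempt.carbonindex"));
      // Written by the current load, which records the name as it writes.
      List<String> writtenByThisLoad = new ArrayList<>();
      Path current = Files.createFile(dir.resolve("current_load.carbonindex"));
      writtenByThisLoad.add(current.getFileName().toString());

      int listed = byListing(dir).size();
      Files.delete(stale);
      Files.delete(current);
      Files.delete(dir);
      return new int[] {listed, writtenByThisLoad.size()};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int[] counts = demo();
    System.out.println("listed = " + counts[0] + ", tracked = " + counts[1]);
  }
}
```

   Listing sees both files, while the tracked list contains only the file written by the current load.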









[GitHub] [carbondata] QiangCai commented on pull request #3972: [WIP]Launch same number of task as select query for insert into select and ctas cases when target table is of no_sort

2020-10-11 Thread GitBox


QiangCai commented on pull request #3972:
URL: https://github.com/apache/carbondata/pull/3972#issuecomment-706830943


   How about load data of no_sort?
   
   







[GitHub] [carbondata] ajantha-bhat commented on pull request #3948: [WIP] Analyze random 11 testcase failure in CI

2020-10-11 Thread GitBox


ajantha-bhat commented on pull request #3948:
URL: https://github.com/apache/carbondata/pull/3948#issuecomment-706829982


   retest this please







[GitHub] [carbondata] marchpure commented on pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

2020-10-11 Thread GitBox


marchpure commented on pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977#issuecomment-706815843


   retest this please







[GitHub] [carbondata] QiangCai commented on a change in pull request #3976: [CARBONDATA-4026] Fix Thread leakage while Loading

2020-10-11 Thread GitBox


QiangCai commented on a change in pull request #3976:
URL: https://github.com/apache/carbondata/pull/3976#discussion_r503000562



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##
@@ -123,6 +123,9 @@ object IndexServer extends ServerInterface {
 t
   }
 })
+indexServerExecutorService.get.shutdown()

Review comment:
   It will not accept more tasks.
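
   For reference, a minimal standalone sketch of this behavior using only `java.util.concurrent` (the class and method names are illustrative, not from the PR): after `shutdown()`, an executor with the default rejection policy refuses new submissions with `RejectedExecutionException`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

final class ShutdownRejectsDemo {
  // Returns true if a task submitted after shutdown() is rejected.
  static boolean rejectsAfterShutdown() {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    pool.shutdown(); // already-submitted tasks still run; new ones are refused
    try {
      pool.submit(() -> { });
      return false;
    } catch (RejectedExecutionException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("rejected after shutdown: " + rejectsAfterShutdown());
  }
}
```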

##
File path: 
integration/spark/src/main/scala/org/apache/spark/rdd/CarbonMergeFilesRDD.scala
##
@@ -158,20 +158,27 @@ object CarbonMergeFilesRDD {
   // remove all tmp folder of index files
   val startDelete = System.currentTimeMillis()
   val numThreads = Math.min(Math.max(partitionInfo.size(), 1), 10)
-  val executorService = Executors.newFixedThreadPool(numThreads)
-  val carbonSessionInfo = ThreadLocalSessionInfo.getCarbonSessionInfo
-  partitionInfo
-.asScala
-.map { partitionPath =>
-  executorService.submit(new Runnable {
-override def run(): Unit = {
-  ThreadLocalSessionInfo.setCarbonSessionInfo(carbonSessionInfo)
-  FileFactory.deleteAllCarbonFilesOfDir(
-FileFactory.getCarbonFile(partitionPath + "/" + 
tempFolderPath))
-}
-  })
+  var executorService: ExecutorService = null
+  try {
+executorService = Executors.newFixedThreadPool(numThreads)

Review comment:
   move line 163 to line 161

##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/StageInputCollector.java
##
@@ -65,8 +65,15 @@
 collectStageFiles(table, hadoopConf, stageInputFiles, successFiles);
 if (stageInputFiles.size() > 0) {
   int numThreads = Math.min(Math.max(stageInputFiles.size(), 1), 10);
-  ExecutorService executorService = 
Executors.newFixedThreadPool(numThreads);
-  return createInputSplits(executorService, stageInputFiles);
+  ExecutorService executorService = null;
+  try {
+executorService = Executors.newFixedThreadPool(numThreads);

Review comment:
   move line 70 to line 68









[GitHub] [carbondata] Kejian-Li commented on a change in pull request #3976: [CARBONDATA-4026] Fix Thread leakage while Loading

2020-10-11 Thread GitBox


Kejian-Li commented on a change in pull request #3976:
URL: https://github.com/apache/carbondata/pull/3976#discussion_r503000913



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/StageInputCollector.java
##
@@ -65,8 +65,15 @@
 collectStageFiles(table, hadoopConf, stageInputFiles, successFiles);
 if (stageInputFiles.size() > 0) {
   int numThreads = Math.min(Math.max(stageInputFiles.size(), 1), 10);
-  ExecutorService executorService = 
Executors.newFixedThreadPool(numThreads);
-  return createInputSplits(executorService, stageInputFiles);
+  ExecutorService executorService = null;
+  try {
+executorService = Executors.newFixedThreadPool(numThreads);
+return createInputSplits(executorService, stageInputFiles);
+  } finally {
+if (executorService != null && !executorService.isShutdown()) {
+  executorService.shutdownNow();
+}
+  }

Review comment:
   executorService has already been shut down and is then passed to 
createInputSplits; is that okay?
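
   On the ordering in question: in Java, the operand of `return` is evaluated before the `finally` block runs, so the call inside `try` receives a live executor. A minimal sketch, where `compute` is a hypothetical stand-in for `createInputSplits` and is assumed to wait for its tasks before returning:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FinallyOrderDemo {
  // Hypothetical stand-in for createInputSplits: submits work and waits for every result.
  static List<Integer> compute(ExecutorService pool, List<Integer> inputs) {
    List<Future<Integer>> futures = new ArrayList<>();
    for (Integer i : inputs) {
      futures.add(pool.submit(() -> i * 2));
    }
    List<Integer> out = new ArrayList<>();
    try {
      for (Future<Integer> f : futures) {
        out.add(f.get());
      }
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return out;
  }

  static List<Integer> run(List<Integer> inputs) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      // compute(...) is fully evaluated here, while the pool is still running.
      return compute(pool, inputs);
    } finally {
      // Runs after compute(...) has returned and before run(...) exits.
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(run(Arrays.asList(1, 2, 3)));
  }
}
```

   If the called method returned before its tasks had finished, `shutdownNow()` would interrupt running tasks and drop queued ones, so this pattern relies on the call waiting for completion.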









[GitHub] [carbondata] marchpure opened a new pull request #3977: [CARBONDATA-4027] Fix the wrong modifiedtime of loading files in inse…

2020-10-11 Thread GitBox


marchpure opened a new pull request #3977:
URL: https://github.com/apache/carbondata/pull/3977


   …rt stage
   
### Why is this PR needed?
In the insert stage flow, an empty file with the suffix '.loading' marks 
the stage as 'in processing'. We update the modified time of the '.loading' 
file to track the insert stage start time, which can be used to calculate 
the TIMEOUT and helps with retry and recovery.
   Previously, we used the setModifiedTime function to update the modified 
time, which has a serious bug.
   For S3 files, the setModifiedTime operation does not take effect, leading 
to an incorrect insert stage start time for the '.loading' file.

### What changes were proposed in this PR?
   Update the modified time of loading files by recreating the files.
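
   The idea can be sketched with `java.nio.file` on the local file system. This is an illustration only: the PR targets S3 through CarbonData's own file abstraction, which is not shown here, and the marker name `stage_1.loading` is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

final class LoadingMarkerDemo {
  // Deletes and recreates an empty marker file so the store assigns a fresh modification time.
  static long touchByRecreate(Path marker) {
    try {
      Files.deleteIfExists(marker);
      Files.createFile(marker);
      return Files.getLastModifiedTime(marker).toMillis();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Returns true if recreating the marker moved its modification time forward.
  static boolean demo() {
    try {
      Path dir = Files.createTempDirectory("stage");
      Path marker = Files.createFile(dir.resolve("stage_1.loading"));
      // Pretend the marker is old by backdating it.
      Files.setLastModifiedTime(marker, FileTime.fromMillis(1_000_000L));
      long refreshed = touchByRecreate(marker);
      Files.delete(marker);
      Files.delete(dir);
      return refreshed > 1_000_000L;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("modification time refreshed: " + demo());
  }
}
```

   Recreating the file does not depend on the store supporting an explicit set-modified-time call, which is the operation reported above as having no effect on S3.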
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   
   
   







[GitHub] [carbondata] Kejian-Li commented on pull request #3976: [CARBONDATA-4026] Fix Thread leakage while Loading

2020-10-11 Thread GitBox


Kejian-Li commented on pull request #3976:
URL: https://github.com/apache/carbondata/pull/3976#issuecomment-706813764


   LGTM







[jira] [Created] (CARBONDATA-4027) Fix the wrong modifiedtime of loading files in insert stage

2020-10-11 Thread Xingjun Hao (Jira)
Xingjun Hao created CARBONDATA-4027:
---

 Summary: Fix the wrong modifiedtime of loading files in insert 
stage
 Key: CARBONDATA-4027
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4027
 Project: CarbonData
  Issue Type: Bug
Reporter: Xingjun Hao


In the insert stage flow, an empty file with the suffix '.loading' marks the 
stage as 'in processing'. We update the modified time of the '.loading' file 
to track the insert stage start time, which can be used to calculate the 
TIMEOUT and helps with retry and recovery.

Previously, we used the setModifiedTime function to update the modified time, 
which has a serious bug.

For S3 files, the setModifiedTime operation does not take effect, leading to 
an incorrect insert stage start time for the '.loading' file.

 





[GitHub] [carbondata] Kejian-Li commented on a change in pull request #3976: [CARBONDATA-4026] Fix Thread leakage while Loading

2020-10-11 Thread GitBox


Kejian-Li commented on a change in pull request #3976:
URL: https://github.com/apache/carbondata/pull/3976#discussion_r503000769



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/StageInputCollector.java
##
@@ -65,8 +65,15 @@
 collectStageFiles(table, hadoopConf, stageInputFiles, successFiles);
 if (stageInputFiles.size() > 0) {
   int numThreads = Math.min(Math.max(stageInputFiles.size(), 1), 10);
-  ExecutorService executorService = 
Executors.newFixedThreadPool(numThreads);
-  return createInputSplits(executorService, stageInputFiles);
+  ExecutorService executorService = null;
+  try {
+executorService = Executors.newFixedThreadPool(numThreads);
+return createInputSplits(executorService, stageInputFiles);
+  } finally {
+if (executorService != null && !executorService.isShutdown()) {
+  executorService.shutdownNow();
+}
+  }

Review comment:
   executorService has already been shut down and is then passed to 
createInputSplits; is that okay?









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3917: [CARBONDATA-3978] Clean files refactor and added support for a trash folder where all the carbondata files will be copied to after

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#issuecomment-706777285











[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3976: [CARBONDATA-4026] Fix Thread leakage while Loading

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3976:
URL: https://github.com/apache/carbondata/pull/3976#issuecomment-706767341


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4356/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3939: [CARBONDATA-3991]Fix the set modified time function on S3 and Alluxio…

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3939:
URL: https://github.com/apache/carbondata/pull/3939#issuecomment-706767318


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2605/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3976: [CARBONDATA-4026] Fix Thread leakage while Loading

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3976:
URL: https://github.com/apache/carbondata/pull/3976#issuecomment-706766379


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2606/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3972: [WIP]Launch same number of task as select query for insert into select and ctas cases when target table is of no_sort

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3972:
URL: https://github.com/apache/carbondata/pull/3972#issuecomment-706765963


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4354/
   




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3972: [WIP]Launch same number of task as select query for insert into select and ctas cases when target table is of no_sort

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3972:
URL: https://github.com/apache/carbondata/pull/3972#issuecomment-706765787


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2604/
   




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3939: [CARBONDATA-3991]Fix the set modified time function on S3 and Alluxio…

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3939:
URL: https://github.com/apache/carbondata/pull/3939#issuecomment-706764153


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4355/
   




[GitHub] [carbondata] marchpure opened a new pull request #3976: [CARBONDATA-4026] Fix Thread leakage while Loading

2020-10-11 Thread GitBox


marchpure opened a new pull request #3976:
URL: https://github.com/apache/carbondata/pull/3976


   
### Why is this PR needed?
Some code paths in Inserting/Loading/InsertStage/IndexServer do not shut down 
their executor services, which leads to thread leakage and degrades the 
performance of the driver and executor.

### What changes were proposed in this PR?
   Shut down executor services as soon as they are no longer in use.
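
The pattern described above can be sketched roughly as follows. This is a minimal illustration using only `java.util.concurrent`, not code from this PR: the class `ExecutorShutdownSketch`, the helper `runAndShutdown`, the pool size, and the 10-second timeout are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ExecutorShutdownSketch {

  // Hypothetical helper (not CarbonData code): runs the tasks on the given pool
  // and always releases the pool's threads before returning, even on failure.
  static int runAndShutdown(ExecutorService pool, List<Callable<Integer>> tasks)
      throws Exception {
    try {
      int sum = 0;
      // invokeAll blocks until every task has completed.
      for (Future<Integer> future : pool.invokeAll(tasks)) {
        sum += future.get();
      }
      return sum;
    } finally {
      // Stop accepting new work and let the worker threads exit.
      pool.shutdown();
      // Assumed timeout: force-cancel anything still running after 10 seconds.
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        pool.shutdownNow();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<Callable<Integer>> tasks = new ArrayList<>();
    for (int i = 1; i <= 5; i++) {
      final int n = i;
      tasks.add(() -> n * n);
    }
    int sum = runAndShutdown(pool, tasks);
    System.out.println(sum + " " + pool.isTerminated());
  }
}
```

Putting the shutdown in a `finally` block means the threads are released even when a task throws, which is the case most likely to leak a pool.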
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   
   
   




[jira] [Created] (CARBONDATA-4026) Thread leakage while Loading

2020-10-11 Thread Xingjun Hao (Jira)
Xingjun Hao created CARBONDATA-4026:
---

 Summary: Thread leakage while Loading
 Key: CARBONDATA-4026
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4026
 Project: CarbonData
  Issue Type: Bug
  Components: spark-integration
Affects Versions: 2.0.1
Reporter: Xingjun Hao
 Fix For: 2.1.0


Some code paths in Inserting/Loading/InsertStage/IndexServer do not shut down 
their executor services, which leads to thread leakage and degrades the 
performance of the driver and executor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] marchpure commented on pull request #3939: [CARBONDATA-3991]Fix the set modified time function on S3 and Alluxio…

2020-10-11 Thread GitBox


marchpure commented on pull request #3939:
URL: https://github.com/apache/carbondata/pull/3939#issuecomment-706748773


   retest this please




[GitHub] [carbondata] VenuReddy2103 commented on pull request #3972: [WIP]Launch same number of task as select query for insert into select and ctas cases when target table is of no_sort

2020-10-11 Thread GitBox


VenuReddy2103 commented on pull request #3972:
URL: https://github.com/apache/carbondata/pull/3972#issuecomment-706748561


   retest this please




[jira] [Updated] (CARBONDATA-3830) Presto read support for complex columns

2020-10-11 Thread Akshay (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay updated CARBONDATA-3830:
---
Attachment: Presto Read Support.pdf

> Presto read support for complex columns
> ---
>
> Key: CARBONDATA-3830
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3830
> Project: CarbonData
>  Issue Type: New Feature
>  Components: core, presto-integration
>Reporter: Akshay
>Assignee: Ajantha Bhat
>Priority: Minor
> Fix For: 2.1.0
>
> Attachments: Presto Read Support.pdf
>
>  Time Spent: 33h 40m
>  Remaining Estimate: 0h
>
> This feature enables Presto to read complex columns from carbon files.
> Complex columns include array, map and struct.
> This design document covers only the array and struct types.
> The map type will be handled later.
>  
> PR - [https://github.com/apache/carbondata/pull/3887]





[jira] [Closed] (CARBONDATA-3850) Presto read support for Array datatype

2020-10-11 Thread Akshay (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay closed CARBONDATA-3850.
--
Resolution: Fixed

> Presto read support for Array datatype
> --
>
> Key: CARBONDATA-3850
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3850
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: core, presto-integration
>Reporter: Akshay
>Priority: Minor
> Attachments: Presto Read Support.pdf
>
>
> Handles both single-level and multi-level reading of the array data type from 
> Presto.
> Attached is the design doc for the same.





[jira] [Updated] (CARBONDATA-3830) Presto read support for complex columns

2020-10-11 Thread Akshay (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay updated CARBONDATA-3830:
---
Attachment: (was: Presto Read Support.pdf)

> Presto read support for complex columns
> ---
>
> Key: CARBONDATA-3830
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3830
> Project: CarbonData
>  Issue Type: New Feature
>  Components: core, presto-integration
>Reporter: Akshay
>Assignee: Ajantha Bhat
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 33h 40m
>  Remaining Estimate: 0h
>
> This feature enables Presto to read complex columns from carbon files.
> Complex columns include array, map and struct.
> This design document covers only the array and struct types.
> The map type will be handled later.
>  
> PR - [https://github.com/apache/carbondata/pull/3887]





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3972: [WIP]Launch same number of task as select query for insert into select and ctas cases when target table is of no_sort

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3972:
URL: https://github.com/apache/carbondata/pull/3972#issuecomment-706705226


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4353/
   




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3972: [WIP]Launch same number of task as select query for insert into select and ctas cases when target table is of no_sort

2020-10-11 Thread GitBox


CarbonDataQA1 commented on pull request #3972:
URL: https://github.com/apache/carbondata/pull/3972#issuecomment-706705055


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2603/
   


