[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation

2020-06-22 Thread GitBox


CarbonDataQA1 commented on pull request #3799:
URL: https://github.com/apache/carbondata/pull/3799#issuecomment-647923116


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3194/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on pull request #3793: [CARBONDATA-3858] Check CDC deltafiles count in the testcase

2020-06-22 Thread GitBox


akashrn5 commented on pull request #3793:
URL: https://github.com/apache/carbondata/pull/3793#issuecomment-647900779


   LGTM







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3795: [CARBONDATA-3860] Fix IndexServer keeps loading some segments index repeatly

2020-06-22 Thread GitBox


CarbonDataQA1 commented on pull request #3795:
URL: https://github.com/apache/carbondata/pull/3795#issuecomment-647875150


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3193/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3795: [CARBONDATA-3860] Fix IndexServer keeps loading some segments index repeatly

2020-06-22 Thread GitBox


CarbonDataQA1 commented on pull request #3795:
URL: https://github.com/apache/carbondata/pull/3795#issuecomment-647874762


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1467/
   







[GitHub] [carbondata] xubo245 commented on a change in pull request #3795: [CARBONDATA-3860] Fix IndexServer keeps loading some segments index repeatly

2020-06-22 Thread GitBox


xubo245 commented on a change in pull request #3795:
URL: https://github.com/apache/carbondata/pull/3795#discussion_r443917288



##
File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletIndexFactory.java
##
@@ -352,9 +352,12 @@ private void modifyColumnSchemaForSortColumn(ColumnSchema columnSchema, boolean
   throws IOException {
 SegmentBlockIndexInfo segmentBlockIndexInfo = segmentMap.get(segment.getSegmentNo());
 Set tableBlockIndexUniqueIdentifiers = null;
-if (null != segmentBlockIndexInfo && null != segmentBlockIndexInfo.getSegmentMetaDataInfo()) {
-  segment.setSegmentMetaDataInfo(
-  segmentMap.get(segment.getSegmentNo()).getSegmentMetaDataInfo());
+if (null != segmentBlockIndexInfo
+&& segmentBlockIndexInfo.getTableBlockIndexUniqueIdentifiers().size() > 0) {

Review comment:
   Suggestion: use CollectionUtils.isNotEmpty() to check
segmentBlockIndexInfo.getTableBlockIndexUniqueIdentifiers(); isNotEmpty also
covers the null check.
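
A minimal sketch of the null-safe check this suggestion refers to. Apache Commons Collections provides `CollectionUtils.isNotEmpty(Collection)`, which returns false for both null and empty collections; the hypothetical `NullSafeChecks` helper below only mimics that behavior so the example stays self-contained, and it is not CarbonData code. The receiver (`segmentBlockIndexInfo` in the diff) would still need its own null check before the getter is called.

```java
import java.util.Collection;

// Hypothetical stand-in for a null-safe emptiness check; not CarbonData code.
final class NullSafeChecks {
    private NullSafeChecks() { }

    // True only when the collection is non-null and holds at least one element,
    // so a single call replaces "c != null && c.size() > 0".
    static boolean isNotEmpty(Collection<?> c) {
        return c != null && !c.isEmpty();
    }
}
```

With such a helper, the condition in the hunk could read `null != segmentBlockIndexInfo && isNotEmpty(segmentBlockIndexInfo.getTableBlockIndexUniqueIdentifiers())`, which no longer throws if the getter returns null.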









[GitHub] [carbondata] xubo245 commented on a change in pull request #3795: [CARBONDATA-3860] Fix IndexServer keeps loading some segments index repeatly

2020-06-22 Thread GitBox


xubo245 commented on a change in pull request #3795:
URL: https://github.com/apache/carbondata/pull/3795#discussion_r443917288



##
File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletIndexFactory.java
##
@@ -352,9 +352,12 @@ private void modifyColumnSchemaForSortColumn(ColumnSchema columnSchema, boolean
   throws IOException {
 SegmentBlockIndexInfo segmentBlockIndexInfo = segmentMap.get(segment.getSegmentNo());
 Set tableBlockIndexUniqueIdentifiers = null;
-if (null != segmentBlockIndexInfo && null != segmentBlockIndexInfo.getSegmentMetaDataInfo()) {
-  segment.setSegmentMetaDataInfo(
-  segmentMap.get(segment.getSegmentNo()).getSegmentMetaDataInfo());
+if (null != segmentBlockIndexInfo
+&& segmentBlockIndexInfo.getTableBlockIndexUniqueIdentifiers().size() > 0) {

Review comment:
   
   Suggestion: use CollectionUtils.isNotEmpty() to check
segmentBlockIndexInfo.getTableBlockIndexUniqueIdentifiers(); it also includes
the null check.









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3800: [WIP] Lock to read tablestatus

2020-06-22 Thread GitBox


CarbonDataQA1 commented on pull request #3800:
URL: https://github.com/apache/carbondata/pull/3800#issuecomment-647689995


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1466/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3800: [WIP] Lock to read tablestatus

2020-06-22 Thread GitBox


CarbonDataQA1 commented on pull request #3800:
URL: https://github.com/apache/carbondata/pull/3800#issuecomment-647689389


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3192/
   







[jira] [Updated] (CARBONDATA-3865) Implement delete and update feature in carbondata SDK.

2020-06-22 Thread Karanpreet Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karanpreet Singh updated CARBONDATA-3865:
-
Attachment: (was: Implement delete and update feature in carbondata 
SDK.pdf)

> Implement delete and update feature in carbondata SDK.
> --
>
> Key: CARBONDATA-3865
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3865
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Karanpreet Singh
>Priority: Major
> Attachments: Implement delete and update feature in carbondata SDK.pdf
>
>
> Please find the design document attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3865) Implement delete and update feature in carbondata SDK.

2020-06-22 Thread Karanpreet Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karanpreet Singh updated CARBONDATA-3865:
-
Attachment: Implement delete and update feature in carbondata SDK.pdf

> Implement delete and update feature in carbondata SDK.
> --
>
> Key: CARBONDATA-3865
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3865
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Karanpreet Singh
>Priority: Major
> Attachments: Implement delete and update feature in carbondata SDK.pdf
>
>
> Please find the design document attached.





[jira] [Created] (CARBONDATA-3865) Implement delete and update feature in carbondata SDK.

2020-06-22 Thread Karanpreet Singh (Jira)
Karanpreet Singh created CARBONDATA-3865:


 Summary: Implement delete and update feature in carbondata SDK.
 Key: CARBONDATA-3865
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3865
 Project: CarbonData
  Issue Type: New Feature
Reporter: Karanpreet Singh
 Attachments: Implement delete and update feature in carbondata SDK.pdf

Please find the design document attached.





[jira] [Closed] (CARBONDATA-3857) Implement delete and update feature in carbondata SDK.

2020-06-22 Thread Karanpreet Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karanpreet Singh closed CARBONDATA-3857.

Resolution: Invalid

> Implement delete and update feature in carbondata SDK.
> --
>
> Key: CARBONDATA-3857
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3857
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Karanpreet Singh
>Priority: Major
> Attachments: Implement delete and update feature in carbondata SDK.pdf
>
>
> Please find the design document attached.





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation

2020-06-22 Thread GitBox


CarbonDataQA1 commented on pull request #3799:
URL: https://github.com/apache/carbondata/pull/3799#issuecomment-647492250


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1465/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation

2020-06-22 Thread GitBox


CarbonDataQA1 commented on pull request #3799:
URL: https://github.com/apache/carbondata/pull/3799#issuecomment-647491662


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3191/
   







[jira] [Created] (CARBONDATA-3864) Store Size Optimization

2020-06-22 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3864:


 Summary: Store Size Optimization
 Key: CARBONDATA-3864
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3864
 Project: CarbonData
  Issue Type: Improvement
Reporter: Indhumathi Muthumurugesh








[GitHub] [carbondata] Indhumathi27 closed pull request #3789: [WIP] Store Size Optimization

2020-06-22 Thread GitBox


Indhumathi27 closed pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789


   







[GitHub] [carbondata] Indhumathi27 commented on pull request #3795: [CARBONDATA-3860] Fix IndexServer keeps loading some segments index repeatly

2020-06-22 Thread GitBox


Indhumathi27 commented on pull request #3795:
URL: https://github.com/apache/carbondata/pull/3795#issuecomment-647458097


   LGTM







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3795: [CARBONDATA-3860] Fix IndexServer keeps loading some segments index repeatly

2020-06-22 Thread GitBox


CarbonDataQA1 commented on pull request #3795:
URL: https://github.com/apache/carbondata/pull/3795#issuecomment-647457682


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3189/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3795: [CARBONDATA-3860] Fix IndexServer keeps loading some segments index repeatly

2020-06-22 Thread GitBox


CarbonDataQA1 commented on pull request #3795:
URL: https://github.com/apache/carbondata/pull/3795#issuecomment-647456554


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1463/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation

2020-06-22 Thread GitBox


CarbonDataQA1 commented on pull request #3799:
URL: https://github.com/apache/carbondata/pull/3799#issuecomment-647418962


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3190/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation

2020-06-22 Thread GitBox


CarbonDataQA1 commented on pull request #3799:
URL: https://github.com/apache/carbondata/pull/3799#issuecomment-647418441


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1464/
   







[GitHub] [carbondata] marchpure commented on a change in pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation

2020-06-22 Thread GitBox


marchpure commented on a change in pull request #3799:
URL: https://github.com/apache/carbondata/pull/3799#discussion_r443452016



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala
##
@@ -477,6 +487,48 @@ case class CarbonInsertFromStageCommand(
 output.asScala
   }
 
+  /**
+   * create '.loading' file to tag the stage in process
+   * Return false means the stage files were creat successfully
+   * While return true means the stage files were failed to create
+   */
+  private def createStageLoadingFiles(
+  executorService: ExecutorService,
+  stageFiles: Array[(CarbonFile, CarbonFile)]): Array[(CarbonFile, CarbonFile)] = {
+stageFiles.map { files =>
+  executorService.submit(new Callable[Boolean] {
+override def call(): Boolean = {
+  val stageLoadingFile =
+FileFactory.getCarbonFile(files._1.getAbsolutePath +
+  CarbonTablePath.LOADING_FILE_SUBFIX);
+  if (!stageLoadingFile.exists()) {
+stageLoadingFile.createNewFile();
+  } else {
+stageLoadingFile.setLastModifiedTime(System.currentTimeMillis());
+  }
+}
+  })
+}.filter { future =>
+  future.get()
+}
+stageFiles
+  }
+
+  /**
+   * create '.loading' file with retry
+   */
+  private def createStageLoadingFilesWithRetry(
+  executorService: ExecutorService,
+  stageFiles: Array[(CarbonFile, CarbonFile)]): Unit = {
+val startTime = System.currentTimeMillis()
+var retry = CarbonInsertFromStageCommand.DELETE_FILES_RETRY_TIMES
+while (createStageLoadingFiles(executorService, stageFiles).length > 0 && retry > 0) {

Review comment:
   Checked: the loop should continue here.
createStageLoadingFiles(executorService, stageFiles).length is the number of
stages that failed to be tagged 'loading'. If the length is > 0, the loop
continues and retries tagging them.
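
The retry rule described above can be sketched as follows. This is a simplified, single-threaded illustration, not the PR's Scala code: `StageTagging`, `attemptAll`, and `tagWithRetry` are hypothetical names, the predicate stands in for whatever creates the '.loading' file, and `maxRetries` plays the role of `DELETE_FILES_RETRY_TIMES` in the diff. Unlike the quoted hunk, the sketch retries only the items that failed in the previous round, which is an assumption about the intent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper illustrating "retry while some items failed and retries remain".
final class StageTagging {
    private StageTagging() { }

    // Runs one tagging attempt per item and returns the items whose attempt failed.
    static <T> List<T> attemptAll(List<T> items, Predicate<T> tagOne) {
        List<T> failed = new ArrayList<>();
        for (T item : items) {
            if (!tagOne.test(item)) {
                failed.add(item);
            }
        }
        return failed;
    }

    // Keeps retrying the failed items until none are left or the retry budget is used up.
    // Returns the items that are still untagged (empty list means full success).
    static <T> List<T> tagWithRetry(List<T> items, Predicate<T> tagOne, int maxRetries) {
        List<T> pending = attemptAll(items, tagOne);
        int retry = maxRetries;
        while (!pending.isEmpty() && retry > 0) {
            retry--;
            pending = attemptAll(pending, tagOne);
        }
        return pending;
    }
}
```

A caller would treat a non-empty result as a failure to claim those stages and decide whether to abort or skip them.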









[GitHub] [carbondata] marchpure commented on a change in pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation

2020-06-22 Thread GitBox


marchpure commented on a change in pull request #3799:
URL: https://github.com/apache/carbondata/pull/3799#discussion_r443451334



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala
##
@@ -148,10 +149,19 @@ case class CarbonInsertFromStageCommand(
 return Seq.empty
   }
 
-  // 2) read all stage files to collect input files for data loading
-  // create a thread pool to read them
+  // We add a tag 'loading' to the stages in process.
+  // different insertstage processes can load different data separately
+  // by choose the stages without 'loading' tag or stages loaded timeout.
+  // which avoid loading the same data between concurrent insertstage processes.
+  // The 'loading' tag is actually an empty file with
+  // '.loading' suffix filename
   val numThreads = Math.min(Math.max(stageFiles.length, 1), 10)
   val executorService = Executors.newFixedThreadPool(numThreads)
+  createStageLoadingFilesWithRetry(executorService, stageFiles)
+  lock.unlock()

Review comment:
   It can't be removed: we aim to release the ingest lock as soon as the
chosen stages have been tagged 'loading'.
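
One way to reconcile an early release with an unlock in a finally block is sketched below. It is an illustration only: it uses `java.util.concurrent.locks.ReentrantLock` as a stand-in for the table-level lock in the PR (whose API and double-unlock behavior are not shown in this thread), and `EarlyUnlock`, `tagStages`, and `loadData` are hypothetical names.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: hold the lock only for the short tagging phase.
final class EarlyUnlock {
    private EarlyUnlock() { }

    static void run(ReentrantLock lock, Runnable tagStages, Runnable loadData) {
        lock.lock();
        try {
            tagStages.run();  // short critical section: claim the stages
            lock.unlock();    // release early so other processes can claim theirs
            loadData.run();   // long-running work happens without the lock
        } finally {
            // Covers the case where tagStages threw before the early unlock.
            // The guard avoids a second unlock, which ReentrantLock rejects
            // with IllegalMonitorStateException.
            if (lock.isHeldByCurrentThread()) {
                lock.unlock();
            }
        }
    }
}
```

The guard in the finally block is the design point: the lock is released exactly once on both the normal and the failure path.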

##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -1521,6 +1521,10 @@ private CarbonCommonConstants() {
 
   public static final String CARBON_QUERY_STAGE_INPUT_DEFAULT = "false";
 
+  public static final String CARBON_INSERT_STAGE_TIMEOUT = "carbon.insert.stage.timeout";
+
+  public static final long CARBON_INSERT_STAGE_TIMEOUT_DEFAULT = 2880;

Review comment:
   modified









[GitHub] [carbondata] marchpure commented on a change in pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation

2020-06-22 Thread GitBox


marchpure commented on a change in pull request #3799:
URL: https://github.com/apache/carbondata/pull/3799#discussion_r443451443



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala
##
@@ -477,6 +487,48 @@ case class CarbonInsertFromStageCommand(
 output.asScala
   }
 
+  /**
+   * create '.loading' file to tag the stage in process
+   * Return false means the stage files were creat successfully
+   * While return true means the stage files were failed to create
+   */
+  private def createStageLoadingFiles(
+  executorService: ExecutorService,
+  stageFiles: Array[(CarbonFile, CarbonFile)]): Array[(CarbonFile, CarbonFile)] = {
+stageFiles.map { files =>
+  executorService.submit(new Callable[Boolean] {
+override def call(): Boolean = {
+  val stageLoadingFile =
+FileFactory.getCarbonFile(files._1.getAbsolutePath +
+  CarbonTablePath.LOADING_FILE_SUBFIX);
+  if (!stageLoadingFile.exists()) {

Review comment:
   modified









[GitHub] [carbondata] marchpure commented on a change in pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation

2020-06-22 Thread GitBox


marchpure commented on a change in pull request #3799:
URL: https://github.com/apache/carbondata/pull/3799#discussion_r443450793



##
File path: 
core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##
@@ -1903,6 +1903,30 @@ public static Long getInputMetricsInterval() {
 }
   }
 
+  /**
+   * Validate and get the input metrics interval
+   *
+   * @return input metrics interval
+   */
+  public static Long getInsertStageTimeout() {
+String timeout = CarbonProperties.getInstance()
+.getProperty(CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT);
+if (timeout == null) {
+  return CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT_DEFAULT;
+} else {
+  try {
+long configuredValue = Long.parseLong(timeout);
+if (configuredValue < 0) {
+  return CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT_DEFAULT;
+} else {
+  return configuredValue;
+}
+  } catch (Exception ex) {

Review comment:
   modified

##
File path: 
core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##
@@ -1903,6 +1903,30 @@ public static Long getInputMetricsInterval() {
 }
   }
 
+  /**
+   * Validate and get the input metrics interval
+   *
+   * @return input metrics interval
+   */
+  public static Long getInsertStageTimeout() {
+String timeout = CarbonProperties.getInstance()
+.getProperty(CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT);
+if (timeout == null) {
+  return CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT_DEFAULT;
+} else {
+  try {
+long configuredValue = Long.parseLong(timeout);
+if (configuredValue < 0) {
+  return CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT_DEFAULT;

Review comment:
   modified









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3793: [CARBONDATA-3858] Check CDC deltafiles count in the testcase

2020-06-22 Thread GitBox


CarbonDataQA1 commented on pull request #3793:
URL: https://github.com/apache/carbondata/pull/3793#issuecomment-647407337


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1462/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3793: [CARBONDATA-3858] Check CDC deltafiles count in the testcase

2020-06-22 Thread GitBox


CarbonDataQA1 commented on pull request #3793:
URL: https://github.com/apache/carbondata/pull/3793#issuecomment-647406601


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3188/
   







[GitHub] [carbondata] Indhumathi27 commented on pull request #3795: [CARBONDATA-3860] Fix IndexServer keeps loading some segments index repeatly

2020-06-22 Thread GitBox


Indhumathi27 commented on pull request #3795:
URL: https://github.com/apache/carbondata/pull/3795#issuecomment-647390472


   Retest this please







[GitHub] [carbondata] niuge01 commented on a change in pull request #3799: [CARBONDATA-3862] Insert stage performance optimazation

2020-06-22 Thread GitBox


niuge01 commented on a change in pull request #3799:
URL: https://github.com/apache/carbondata/pull/3799#discussion_r443412925



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala
##
@@ -477,6 +487,48 @@ case class CarbonInsertFromStageCommand(
 output.asScala
   }
 
+  /**
+   * create '.loading' file to tag the stage in process
+   * Return false means the stage files were creat successfully
+   * While return true means the stage files were failed to create
+   */
+  private def createStageLoadingFiles(
+  executorService: ExecutorService,
+  stageFiles: Array[(CarbonFile, CarbonFile)]): Array[(CarbonFile, CarbonFile)] = {
+stageFiles.map { files =>
+  executorService.submit(new Callable[Boolean] {
+override def call(): Boolean = {
+  val stageLoadingFile =
+FileFactory.getCarbonFile(files._1.getAbsolutePath +
+  CarbonTablePath.LOADING_FILE_SUBFIX);
+  if (!stageLoadingFile.exists()) {
+stageLoadingFile.createNewFile();
+  } else {
+stageLoadingFile.setLastModifiedTime(System.currentTimeMillis());
+  }
+}
+  })
+}.filter { future =>
+  future.get()
+}
+stageFiles
+  }
+
+  /**
+   * create '.loading' file with retry
+   */
+  private def createStageLoadingFilesWithRetry(
+  executorService: ExecutorService,
+  stageFiles: Array[(CarbonFile, CarbonFile)]): Unit = {
+val startTime = System.currentTimeMillis()
+var retry = CarbonInsertFromStageCommand.DELETE_FILES_RETRY_TIMES
+while (createStageLoadingFiles(executorService, stageFiles).length > 0 && retry > 0) {

Review comment:
   Please check this loop condition: if
createStageLoadingFiles(executorService, stageFiles).length > 0, should the
loop continue?

##
File path: 
core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##
@@ -1903,6 +1903,30 @@ public static Long getInputMetricsInterval() {
 }
   }
 
+  /**
+   * Validate and get the input metrics interval
+   *
+   * @return input metrics interval
+   */
+  public static Long getInsertStageTimeout() {
+String timeout = CarbonProperties.getInstance()
+.getProperty(CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT);
+if (timeout == null) {
+  return CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT_DEFAULT;
+} else {
+  try {
+long configuredValue = Long.parseLong(timeout);
+if (configuredValue < 0) {
+  return CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT_DEFAULT;

Review comment:
   Log a warning for an illegal configuration value.

##
File path: 
core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##
@@ -1903,6 +1903,30 @@ public static Long getInputMetricsInterval() {
 }
   }
 
+  /**
+   * Validate and get the input metrics interval
+   *
+   * @return input metrics interval
+   */
+  public static Long getInsertStageTimeout() {
+String timeout = CarbonProperties.getInstance()
+.getProperty(CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT);
+if (timeout == null) {
+  return CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT_DEFAULT;
+} else {
+  try {
+long configuredValue = Long.parseLong(timeout);
+if (configuredValue < 0) {
+  return CarbonCommonConstants.CARBON_INSERT_STAGE_TIMEOUT_DEFAULT;
+} else {
+  return configuredValue;
+}
+  } catch (Exception ex) {

Review comment:
   Catch NumberFormatException.
   Log a warning for the exception.
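
A minimal sketch of what the two review comments on this method ask for: catch `NumberFormatException` specifically and log a warning whenever the configured value is rejected. It is not the PR's final code. `InsertStageTimeout` and `parse` are hypothetical names, `java.util.logging` stands in for whatever logger the project uses, and the default of 2880 is copied from the quoted diff (its unit is not stated in this thread).

```java
import java.util.logging.Logger;

// Hypothetical, self-contained version of the validation logic under review.
final class InsertStageTimeout {
    private static final Logger LOGGER =
        Logger.getLogger(InsertStageTimeout.class.getName());

    // Default taken from the quoted diff; the unit is not stated there.
    static final long DEFAULT_TIMEOUT = 2880L;

    private InsertStageTimeout() { }

    // Returns the configured timeout, or the default when the value is
    // missing, negative, or not a valid long.
    static long parse(String configured) {
        if (configured == null) {
            return DEFAULT_TIMEOUT;
        }
        try {
            long value = Long.parseLong(configured);
            if (value < 0) {
                LOGGER.warning("Negative insert stage timeout '" + configured
                    + "', using default " + DEFAULT_TIMEOUT);
                return DEFAULT_TIMEOUT;
            }
            return value;
        } catch (NumberFormatException ex) {
            LOGGER.warning("Invalid insert stage timeout '" + configured
                + "', using default " + DEFAULT_TIMEOUT);
            return DEFAULT_TIMEOUT;
        }
    }
}
```

Catching only `NumberFormatException` keeps unrelated failures visible instead of silently mapping them to the default.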

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonInsertFromStageCommand.scala
##
@@ -148,10 +149,19 @@ case class CarbonInsertFromStageCommand(
 return Seq.empty
   }
 
-  // 2) read all stage files to collect input files for data loading
-  // create a thread pool to read them
+  // We add a tag 'loading' to the stages in process.
+  // different insertstage processes can load different data separately
+  // by choose the stages without 'loading' tag or stages loaded timeout.
+  // which avoid loading the same data between concurrent insertstage processes.
+  // The 'loading' tag is actually an empty file with
+  // '.loading' suffix filename
   val numThreads = Math.min(Math.max(stageFiles.length, 1), 10)
   val executorService = Executors.newFixedThreadPool(numThreads)
+  createStageLoadingFilesWithRetry(executorService, stageFiles)
+  lock.unlock()

Review comment:
   Remove this line; the lock will be released in the finally block.
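
The '.loading' tag scheme described in the comments of the hunk above (an empty marker file next to each stage file, with stale markers treated as timed out) can be sketched with plain `java.nio.file`. This is an illustration under stated assumptions, not CarbonData code: `LoadingMarker`, `tag`, and `isClaimable` are hypothetical names, and the timeout is taken in milliseconds purely for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical sketch of tagging a stage file with an empty ".loading" marker.
final class LoadingMarker {
    static final String SUFFIX = ".loading";

    private LoadingMarker() { }

    private static Path markerFor(Path stageFile) {
        return stageFile.resolveSibling(stageFile.getFileName().toString() + SUFFIX);
    }

    // Creates the marker if it is absent, otherwise refreshes its modification time.
    // Note: exists-then-create is not atomic; the sketch assumes an outer lock is held.
    static Path tag(Path stageFile) throws IOException {
        Path marker = markerFor(stageFile);
        if (Files.notExists(marker)) {
            Files.createFile(marker);
        } else {
            Files.setLastModifiedTime(marker, FileTime.fromMillis(System.currentTimeMillis()));
        }
        return marker;
    }

    // A stage can be claimed when it has no marker, or its marker is older than the timeout.
    static boolean isClaimable(Path stageFile, long timeoutMillis, long nowMillis)
            throws IOException {
        Path marker = markerFor(stageFile);
        if (Files.notExists(marker)) {
            return true;
        }
        long age = nowMillis - Files.getLastModifiedTime(marker).toMillis();
        return age > timeoutMillis;
    }
}
```

Concurrent processes would call `isClaimable` to pick stages and `tag` to claim them while holding the lock, then release the lock before the actual load.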

##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -1521,6 +1521,10 @@ private CarbonCommonConstants() {
 
   public static final String 

[GitHub] [carbondata] niuge01 closed pull request #3797: [WIP] Support show segment information

2020-06-22 Thread GitBox


niuge01 closed pull request #3797:
URL: https://github.com/apache/carbondata/pull/3797


   







[jira] [Updated] (CARBONDATA-3858) Check CDC deltafiles count in the testcase

2020-06-22 Thread Xingjun Hao (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xingjun Hao updated CARBONDATA-3858:

Description: Currently there is no deltafiles count check in the testcase,
which shall be supplemented.  (was: In the CDC flow, the parallelism of
deltafiles processing is the same as the executor number, which reduces the
parallelism heavily. The insufficient parallelism limits CPU utilization and
hampers CDC's performance.)
Summary: Check CDC deltafiles count in the testcase  (was: Increase the
parallelism of CDC intermediate files processing)

> Check CDC deltafiles count in the testcase
> --
>
> Key: CARBONDATA-3858
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3858
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Xingjun Hao
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently there is no deltafiles count check in the testcase, which shall be
> supplemented.





[GitHub] [carbondata] akashrn5 commented on a change in pull request #3793: [CARBONDATA-3858] Increase the parallelism of CDC intermediate files processing

2020-06-22 Thread GitBox


akashrn5 commented on a change in pull request #3793:
URL: https://github.com/apache/carbondata/pull/3793#discussion_r443361855



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/merge/CarbonMergeDataSetCommand.scala
##
@@ -269,11 +271,10 @@ case class CarbonMergeDataSetCommand(
   new SparkCarbonFileFormat().prepareWrite(sparkSession, job,
 Map(), schema)
 val config = SparkSQLUtil.broadCastHadoopConf(sparkSession.sparkContext, job.getConfiguration)
-(frame.rdd.coalesce(DistributionUtil.getConfiguredExecutors(sparkSession.sparkContext)).
-  mapPartitionsWithIndex { case (index, iter) =>
+(frame.rdd.mapPartitionsWithIndex { case (index, iter) =>
 CarbonProperties.getInstance().addProperty(CarbonLoadOptionConstants
   .ENABLE_CARBON_LOAD_DIRECT_WRITE_TO_STORE_PATH, "true")
-val confB = config.value.value
+val confB = new Configuration(config.value.value)

Review comment:
   I think adding a new conf here is not correct; we need to analyze it
properly. Maybe you can revert these changes and we can handle this during the
other CDC optimizations.




