[jira] [Commented] (HUDI-2366) fix hudi generating too many logs

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405000#comment-17405000
 ] 

ASF GitHub Bot commented on HUDI-2366:
--

hudi-bot edited a comment on pull request #3543:
URL: https://github.com/apache/hudi/pull/3543#issuecomment-906068990


   
   ## CI report:
   
   * 95b6526c86d0428b80c9525b94f3bceb1d883bda Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1901)
 
   * e35de35adcb9a28db97935f938b277511b17accd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1910)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> fix hudi generating too many logs
> -
>
> Key: HUDI-2366
> URL: https://issues.apache.org/jira/browse/HUDI-2366
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: WangZhongze
>Assignee: WangZhongze
>Priority: Major
>  Labels: pull-request-available
>
> In AbstractTableFileSystemView.isFileSliceAfterPendingCompaction, the 
> compactionWithInstantTime of each FileSlice is printed to the log. In 
> general, however, a FileSlice is not in a compaction state, so the logged 
> value is null. If there are many FileSlices, a large number of log lines 
> are written in a short time; they can make up more than 90% of the total 
> log output, which makes other log information hard to find.
> Advice: delete this log statement.
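
The effect described in the issue can be sketched in isolation. This is a minimal illustration, not Hudi code: the class `PendingCompactionLogSketch`, its methods, and the in-memory `LOG` list are all hypothetical stand-ins, and the real method body is not shown in this thread. The sketch only assumes that the check runs once per file slice and that the pending compaction instant is usually absent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical model of the check; none of these names are Hudi's real API. */
final class PendingCompactionLogSketch {
  /** In-memory stand-in for the log, so the number of emitted lines can be counted. */
  static final List<String> LOG = new ArrayList<>();

  /** Noisy variant: writes one line per file slice, usually reporting a null instant. */
  static boolean checkNoisy(String sliceBaseInstant, Optional<String> pendingCompactionInstant) {
    LOG.add("Pending compaction instant for slice " + sliceBaseInstant + " is: "
        + pendingCompactionInstant.orElse(null));
    return pendingCompactionInstant.isPresent()
        && sliceBaseInstant.equals(pendingCompactionInstant.get());
  }

  /** Quiet variant: same result, no per-slice log line. */
  static boolean checkQuiet(String sliceBaseInstant, Optional<String> pendingCompactionInstant) {
    return pendingCompactionInstant.isPresent()
        && sliceBaseInstant.equals(pendingCompactionInstant.get());
  }

  public static void main(String[] args) {
    // Simulate 1000 file slices, none of which has a pending compaction.
    for (int i = 0; i < 1000; i++) {
      checkNoisy("instant-" + i, Optional.empty());
    }
    System.out.println("noisy log lines: " + LOG.size());
    LOG.clear();
    for (int i = 0; i < 1000; i++) {
      checkQuiet("instant-" + i, Optional.empty());
    }
    System.out.println("quiet log lines: " + LOG.size());
  }
}
```

The noisy variant emits one line per slice regardless of whether a compaction is pending, which is why the volume grows with the number of file slices. Dropping the statement, as the issue advises, leaves the return value unchanged.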



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2365) Optimizing overwriteField method with Objects.equals

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405001#comment-17405001
 ] 

ASF GitHub Bot commented on HUDI-2365:
--

hudi-bot edited a comment on pull request #3542:
URL: https://github.com/apache/hudi/pull/3542#issuecomment-906008628


   
   ## CI report:
   
   * c5288975f7f3a2a04cc07bc236a3849342e99520 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1905)
 
   
   


> Optimizing overwriteField method with Objects.equals
> 
>
> Key: HUDI-2365
> URL: https://issues.apache.org/jira/browse/HUDI-2365
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Optimizing overwriteField method with Objects.equals
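
For readers unfamiliar with the idiom named in the issue title, here is a minimal sketch of what replacing a hand-written null check with `java.util.Objects.equals` looks like. The class and method names are hypothetical; the actual body of `overwriteField` is not shown in this thread, so this is not a copy of the Hudi change.

```java
import java.util.Objects;

/** Hypothetical illustration; not the actual Hudi overwriteField implementation. */
final class ObjectsEqualsSketch {
  /** Hand-written null-safe comparison. */
  static boolean sameManual(Object value, Object defaultValue) {
    return defaultValue == null ? value == null : defaultValue.equals(value);
  }

  /** The same comparison expressed with Objects.equals, which handles nulls itself. */
  static boolean sameWithObjectsEquals(Object value, Object defaultValue) {
    return Objects.equals(defaultValue, value);
  }

  public static void main(String[] args) {
    Object[][] pairs = {{null, null}, {null, "a"}, {"a", null}, {"a", "a"}, {"a", "b"}};
    for (Object[] p : pairs) {
      System.out.println(p[0] + " vs " + p[1] + ": manual=" + sameManual(p[0], p[1])
          + ", Objects.equals=" + sameWithObjectsEquals(p[0], p[1]));
    }
  }
}
```

Both variants return the same result for every combination of null and non-null arguments, provided the `equals` method of the compared objects follows the usual contract.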





[GitHub] [hudi] hudi-bot edited a comment on pull request #3542: [HUDI-2365]Optimizing overwriteField method with Objects.equals

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3542:
URL: https://github.com/apache/hudi/pull/3542#issuecomment-906008628


   
   ## CI report:
   
   * c5288975f7f3a2a04cc07bc236a3849342e99520 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1905)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3543: [HUDI-2366] fix too many logs

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3543:
URL: https://github.com/apache/hudi/pull/3543#issuecomment-906068990


   
   ## CI report:
   
   * 95b6526c86d0428b80c9525b94f3bceb1d883bda Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1901)
 
   * e35de35adcb9a28db97935f938b277511b17accd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1910)
 
   
   




[jira] [Commented] (HUDI-2366) fix hudi generating too many logs

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404998#comment-17404998
 ] 

ASF GitHub Bot commented on HUDI-2366:
--

hudi-bot edited a comment on pull request #3543:
URL: https://github.com/apache/hudi/pull/3543#issuecomment-906068990


   
   ## CI report:
   
   * 95b6526c86d0428b80c9525b94f3bceb1d883bda Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1901)
 
   * e35de35adcb9a28db97935f938b277511b17accd UNKNOWN
   
   


[GitHub] [hudi] hudi-bot edited a comment on pull request #3543: [HUDI-2366] fix too many logs

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3543:
URL: https://github.com/apache/hudi/pull/3543#issuecomment-906068990


   
   ## CI report:
   
   * 95b6526c86d0428b80c9525b94f3bceb1d883bda Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1901)
 
   * e35de35adcb9a28db97935f938b277511b17accd UNKNOWN
   
   




[jira] [Commented] (HUDI-2355) after clustering with archive meet data incorrect

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404996#comment-17404996
 ] 

ASF GitHub Bot commented on HUDI-2355:
--

hudi-bot edited a comment on pull request #3545:
URL: https://github.com/apache/hudi/pull/3545#issuecomment-906138986


   
   ## CI report:
   
   * f7c1e9665c61ac30016c6519ce32945cc27b2a79 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1909)
 
   
   


> after clustering with archive  meet data incorrect
> --
>
> Key: HUDI-2355
> URL: https://issues.apache.org/jira/browse/HUDI-2355
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: liwei
>Assignee: liwei
>Priority: Major
>  Labels: pull-request-available
>
> After [https://github.com/apache/hudi/pull/3310], replaced data files are 
> cleaned by the cleaner. But if the replacecommit file has been deleted, the 
> cleaner cannot read the data file. 
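
The failure mode in this description can be modeled with a toy timeline. This is a sketch under stated assumptions, not Hudi's implementation: `ReplacedFileCleanupSketch`, its fields, and the file names are all hypothetical, and the model assumes only what the thread states, namely that the cleaner finds replaced data files through the replacecommit and that archival removes the replacecommit.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of cleaner vs. archival ordering; all names here are hypothetical. */
final class ReplacedFileCleanupSketch {
  /** Active timeline: replacecommit instant -> data files that commit replaced. */
  private final Map<String, List<String>> activeReplaceCommits = new TreeMap<>();
  /** Data files currently present on storage. */
  private final Set<String> storage = new TreeSet<>();

  /** Cleaner: deletes replaced files, but only those reachable via the active timeline. */
  private void clean() {
    for (List<String> replaced : activeReplaceCommits.values()) {
      storage.removeAll(replaced);
    }
  }

  /** Archival: removes instants from the active timeline. */
  private void archive() {
    activeReplaceCommits.clear();
  }

  /** Runs both services in the requested order and returns the files left on storage. */
  static Set<String> run(boolean archiveFirst) {
    ReplacedFileCleanupSketch t = new ReplacedFileCleanupSketch();
    t.storage.addAll(Arrays.asList("old-1.parquet", "old-2.parquet", "new-1.parquet"));
    t.activeReplaceCommits.put("20210825", Arrays.asList("old-1.parquet", "old-2.parquet"));
    if (archiveFirst) {
      t.archive();
      t.clean();
    } else {
      t.clean();
      t.archive();
    }
    return t.storage;
  }

  public static void main(String[] args) {
    System.out.println("archive first: " + run(true));
    System.out.println("clean first:   " + run(false));
  }
}
```

When archival runs first in this model, the cleaner no longer sees which files were replaced, so the old files stay on storage. When the cleaner runs first, only the new file remains.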





[GitHub] [hudi] hudi-bot edited a comment on pull request #3545: [HUDI-2355]Archive service executed after cleaner finished.

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3545:
URL: https://github.com/apache/hudi/pull/3545#issuecomment-906138986


   
   ## CI report:
   
   * f7c1e9665c61ac30016c6519ce32945cc27b2a79 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1909)
 
   
   




[jira] [Commented] (HUDI-2355) after clustering with archive meet data incorrect

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404993#comment-17404993
 ] 

ASF GitHub Bot commented on HUDI-2355:
--

hudi-bot commented on pull request #3545:
URL: https://github.com/apache/hudi/pull/3545#issuecomment-906138986


   
   ## CI report:
   
   * f7c1e9665c61ac30016c6519ce32945cc27b2a79 UNKNOWN
   
   


[GitHub] [hudi] hudi-bot commented on pull request #3545: [HUDI-2355]Archive service executed after cleaner finished.

2021-08-25 Thread GitBox


hudi-bot commented on pull request #3545:
URL: https://github.com/apache/hudi/pull/3545#issuecomment-906138986


   
   ## CI report:
   
   * f7c1e9665c61ac30016c6519ce32945cc27b2a79 UNKNOWN
   
   




[jira] [Comment Edited] (HUDI-2355) after clustering with archive meet data incorrect

2021-08-25 Thread Yue Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404991#comment-17404991
 ] 

Yue Zhang edited comment on HUDI-2355 at 8/26/21, 6:41 AM:
---

Actually, this problem does exist on the current master branch; it concerns the 
order in which the cleaner and archival run. 
{code:java}
protected void postCommit(HoodieTable table, HoodieCommitMetadata metadata,
    String instantTime, Option> extraMetadata) {
  try {
    // Delete the marker directory for the instant.
    WriteMarkersFactory.get(config.getMarkersType(), table, instantTime)
        .quietDeleteMarkerDir(context, config.getMarkersDeleteParallelism());
    // We cannot have unbounded commit files. Archive commits if we have to archive
    HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(config, table);
    archiveLog.archiveIfRequired(context);
    if (operationType != null && operationType != WriteOperationType.CLUSTER
        && operationType != WriteOperationType.COMPACT) {
      syncTableMetadata();
    }
  } catch (IOException ioe) {
    throw new HoodieIOException(ioe.getMessage(), ioe);
  } finally {
    this.heartbeatClient.stop(instantTime);
  }
}
{code}
 

Even in async cleaner mode, archival does not wait for the async cleaner 
service to finish before it starts to archive/delete commits.

 

I have just raised a PR that fixes this by adjusting the execution order: run 
the cleaner first, then archive.




[jira] [Updated] (HUDI-2355) after clustering with archive meet data incorrect

2021-08-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2355:
-
Labels: pull-request-available  (was: )



[jira] [Commented] (HUDI-2355) after clustering with archive meet data incorrect

2021-08-25 Thread Yue Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404991#comment-17404991
 ] 

Yue Zhang commented on HUDI-2355:
-

Actually, this problem does exist on the current master branch; it concerns the 
order in which the cleaner and archival run. 
```

protected void postCommit(HoodieTable table, HoodieCommitMetadata metadata,
    String instantTime, Option> extraMetadata) {
  try {
    // Delete the marker directory for the instant.
    WriteMarkersFactory.get(config.getMarkersType(), table, instantTime)
        .quietDeleteMarkerDir(context, config.getMarkersDeleteParallelism());
    // We cannot have unbounded commit files. Archive commits if we have to archive
    HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(config, table);
    archiveLog.archiveIfRequired(context);
    if (operationType != null && operationType != WriteOperationType.CLUSTER
        && operationType != WriteOperationType.COMPACT) {
      syncTableMetadata();
    }
  } catch (IOException ioe) {
    throw new HoodieIOException(ioe.getMessage(), ioe);
  } finally {
    this.heartbeatClient.stop(instantTime);
  }
}

```

Even in async cleaner mode, archival does not wait for the async cleaner 
service to finish before it starts to archive/delete commits.

 

I have just raised a PR that fixes this by adjusting the execution order: run 
the cleaner first, then archive.



[jira] [Comment Edited] (HUDI-2355) after clustering with archive meet data incorrect

2021-08-25 Thread Yue Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404991#comment-17404991
 ] 

Yue Zhang edited comment on HUDI-2355 at 8/26/21, 6:40 AM:
---

Actually, this problem does exist on the current master branch; it concerns the 
order in which the cleaner and archival run. 

`protected void postCommit(HoodieTable table, HoodieCommitMetadata metadata,
    String instantTime, Option> extraMetadata) {
  try {
    // Delete the marker directory for the instant.
    WriteMarkersFactory.get(config.getMarkersType(), table, instantTime)
        .quietDeleteMarkerDir(context, config.getMarkersDeleteParallelism());
    // We cannot have unbounded commit files. Archive commits if we have to archive
    HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(config, table);
    archiveLog.archiveIfRequired(context);
    if (operationType != null && operationType != WriteOperationType.CLUSTER
        && operationType != WriteOperationType.COMPACT) {
      syncTableMetadata();
    }
  } catch (IOException ioe) {
    throw new HoodieIOException(ioe.getMessage(), ioe);
  } finally {
    this.heartbeatClient.stop(instantTime);
  }
}
`

Even in async cleaner mode, archival does not wait for the async cleaner 
service to finish before it starts to archive/delete commits.

 

I have just raised a PR that fixes this by adjusting the execution order: run 
the cleaner first, then archive.




[jira] [Commented] (HUDI-2355) after clustering with archive meet data incorrect

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404992#comment-17404992
 ] 

ASF GitHub Bot commented on HUDI-2355:
--

zhangyue19921010 opened a new pull request #3545:
URL: https://github.com/apache/hudi/pull/3545


   https://issues.apache.org/jira/browse/HUDI-2355
   ## What is the purpose of the pull request
   
   After https://github.com/apache/hudi/pull/3310 was merged, replaced data files 
are cleaned by the cleaner service. But if the replacecommit file has been deleted 
by archival, the cleaner service can no longer delete the replaced data files.
   
   ## Brief change log
   Change the execution order of the cleaner and archival services:
   
  -> the cleaner service runs first, then the archival service.
   
   For sync clean, we trigger the clean first and trigger the archival service 
after the clean has finished.
   For async clean, we wait in the `postCommit` function for the cleaner service 
to finish and then trigger the archival service.
   
   This change was tested on a dev environment.
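
   The ordering described in the change log can be sketched as follows. This is an illustration only and not the code in the pull request: `CleanThenArchiveSketch` and its methods are hypothetical, and using `CompletableFuture` to represent the async cleaner is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of "clean first, then archive"; not the code from the PR. */
final class CleanThenArchiveSketch {
  /** Records the order in which the two services ran. */
  private final List<String> events = Collections.synchronizedList(new ArrayList<>());

  private void clean() {
    events.add("clean");
  }

  private void archive() {
    events.add("archive");
  }

  /** Sync clean: run the cleaner inline, then archive. */
  static List<String> runSync() {
    CleanThenArchiveSketch s = new CleanThenArchiveSketch();
    s.clean();
    s.archive();
    return new ArrayList<>(s.events);
  }

  /** Async clean: the cleaner runs on another thread; wait for it before archiving. */
  static List<String> runAsync() {
    CleanThenArchiveSketch s = new CleanThenArchiveSketch();
    CompletableFuture<Void> asyncClean = CompletableFuture.runAsync(s::clean);
    asyncClean.join(); // block until the cleaner has finished
    s.archive();
    return new ArrayList<>(s.events);
  }

  public static void main(String[] args) {
    System.out.println("sync:  " + runSync());
    System.out.println("async: " + runAsync());
  }
}
```

   In both modes the archive step starts only after the clean step has completed; in the async case that is enforced by the `join()` call rather than by call order alone.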
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   




[GitHub] [hudi] zhangyue19921010 opened a new pull request #3545: [HUDI-2355]Archive service executed after cleaner finished.

2021-08-25 Thread GitBox


zhangyue19921010 opened a new pull request #3545:
URL: https://github.com/apache/hudi/pull/3545






[jira] [Commented] (HUDI-2264) Refactor Spark datasource functional tests

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404989#comment-17404989
 ] 

ASF GitHub Bot commented on HUDI-2264:
--

hudi-bot edited a comment on pull request #3544:
URL: https://github.com/apache/hudi/pull/3544#issuecomment-906134959


   
   ## CI report:
   
   * 1a1f7c1a38e6181afbe315789b3f6e81584f75f6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1908)
 
   
   


> Refactor Spark datasource functional tests
> --
>
> Key: HUDI-2264
> URL: https://issues.apache.org/jira/browse/HUDI-2264
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available
>
> Changing and running a test like HoodieSparkSQLWriterSuite is a huge pain. 
> Modularize the tests to reuse as much code as possible, with common setup and 
> teardown methods. This applies to HoodieSparkSqlWriter, TestCOWDatasource, 
> and TestMORDataSource.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3544: [HUDI-2264] refactor HoodieSparkSqlWriterSuite

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3544:
URL: https://github.com/apache/hudi/pull/3544#issuecomment-906134959


   
   ## CI report:
   
   * 1a1f7c1a38e6181afbe315789b3f6e81584f75f6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1908)
 
   
   




[jira] [Commented] (HUDI-2264) Refactor Spark datasource functional tests

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404987#comment-17404987
 ] 

ASF GitHub Bot commented on HUDI-2264:
--

hudi-bot commented on pull request #3544:
URL: https://github.com/apache/hudi/pull/3544#issuecomment-906134959


   
   ## CI report:
   
   * 1a1f7c1a38e6181afbe315789b3f6e81584f75f6 UNKNOWN
   
   


[GitHub] [hudi] hudi-bot commented on pull request #3544: [HUDI-2264] refactor HoodieSparkSqlWriterSuite

2021-08-25 Thread GitBox


hudi-bot commented on pull request #3544:
URL: https://github.com/apache/hudi/pull/3544#issuecomment-906134959


   
   ## CI report:
   
   * 1a1f7c1a38e6181afbe315789b3f6e81584f75f6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2264) Refactor Spark datasource functional tests

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404985#comment-17404985
 ] 

ASF GitHub Bot commented on HUDI-2264:
--

data-storyteller closed pull request #3518:
URL: https://github.com/apache/hudi/pull/3518


   




> Refactor Spark datasource functional tests
> --
>
> Key: HUDI-2264
> URL: https://issues.apache.org/jira/browse/HUDI-2264
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Sagar Sumit
> Priority: Major
> Labels: pull-request-available
>
> Changing and running a test like HoodieSparkSQLWriterSuite is a huge pain. 
> Modularize it to reuse code as much as possible, and add common setup and 
> teardown methods. This applies to HoodieSparkSqlWriter, TestCOWDatasource, 
> and TestMORDataSource.





[jira] [Commented] (HUDI-2264) Refactor Spark datasource functional tests

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404986#comment-17404986
 ] 

ASF GitHub Bot commented on HUDI-2264:
--

data-storyteller commented on pull request #3518:
URL: https://github.com/apache/hudi/pull/3518#issuecomment-906133836


   @nsivabalan  Created new PR - https://github.com/apache/hudi/pull/3544/files




> Refactor Spark datasource functional tests
> --
>
> Key: HUDI-2264
> URL: https://issues.apache.org/jira/browse/HUDI-2264
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Sagar Sumit
> Priority: Major
> Labels: pull-request-available
>
> Changing and running a test like HoodieSparkSQLWriterSuite is a huge pain. 
> Modularize it to reuse code as much as possible, and add common setup and 
> teardown methods. This applies to HoodieSparkSqlWriter, TestCOWDatasource, 
> and TestMORDataSource.





[GitHub] [hudi] data-storyteller commented on pull request #3518: [HUDI-2264] refactor HoodieSparkSqlWriterSuite

2021-08-25 Thread GitBox


data-storyteller commented on pull request #3518:
URL: https://github.com/apache/hudi/pull/3518#issuecomment-906133836


   @nsivabalan  Created new PR - https://github.com/apache/hudi/pull/3544/files






[GitHub] [hudi] data-storyteller closed pull request #3518: [HUDI-2264] refactor HoodieSparkSqlWriterSuite

2021-08-25 Thread GitBox


data-storyteller closed pull request #3518:
URL: https://github.com/apache/hudi/pull/3518


   






[jira] [Commented] (HUDI-2264) Refactor Spark datasource functional tests

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404984#comment-17404984
 ] 

ASF GitHub Bot commented on HUDI-2264:
--

data-storyteller opened a new pull request #3544:
URL: https://github.com/apache/hudi/pull/3544


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   - Modularized to reuse code as much as possible. 
   - Added common setup and tear down methods. 
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [x] Has a corresponding JIRA in PR title & commit

- [x] Commit message is descriptive of the change

- [ ] CI is green
   
- [x] Necessary doc changes done or have another open PR
  
- [x] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   




> Refactor Spark datasource functional tests
> --
>
> Key: HUDI-2264
> URL: https://issues.apache.org/jira/browse/HUDI-2264
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Sagar Sumit
> Priority: Major
> Labels: pull-request-available
>
> Changing and running a test like HoodieSparkSQLWriterSuite is a huge pain. 
> Modularize it to reuse code as much as possible, and add common setup and 
> teardown methods. This applies to HoodieSparkSqlWriter, TestCOWDatasource, 
> and TestMORDataSource.





[GitHub] [hudi] data-storyteller opened a new pull request #3544: [HUDI-2264] refactor HoodieSparkSqlWriterSuite

2021-08-25 Thread GitBox


data-storyteller opened a new pull request #3544:
URL: https://github.com/apache/hudi/pull/3544


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   - Modularized to reuse code as much as possible. 
   - Added common setup and tear down methods. 
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [x] Has a corresponding JIRA in PR title & commit

- [x] Commit message is descriptive of the change

- [ ] CI is green
   
- [x] Necessary doc changes done or have another open PR
  
- [x] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Commented] (HUDI-2354) archive delete replacecommit, but stop timeline server meet file not found

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404980#comment-17404980
 ] 

ASF GitHub Bot commented on HUDI-2354:
--

hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * 92cdf3364c6c8f947f1a3bf5c5795a1b987ab83d Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1904)
 
   * 523afa82d084f878d1990d056c5beb1ef3417501 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1907)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> archive delete replacecommit, but stop timeline server meet file not found
> --
>
> Key: HUDI-2354
> URL: https://issues.apache.org/jira/browse/HUDI-2354
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: liwei
> Assignee: liwei
> Priority: Major
> Labels: pull-request-available
>
> 1. In the Spark write client, postCommit archives any replacecommit that 
> meets the archive requirement:
> 21/08/23 14:57:12 INFO HoodieTimelineArchiveLog: Archived and deleted instant file .hoodie/20210823114552.commit
> 21/08/23 14:57:13 INFO HoodieTimelineArchiveLog: Archived and deleted instant file .hoodie/20210823114553.replacecommit.requested
> 21/08/23 14:57:13 INFO HoodieTimelineArchiveLog: Archived and deleted instant file .hoodie/20210823114553.replacecommit.inflight
> 21/08/23 14:57:13 INFO HoodieTimelineArchiveLog: Archived and deleted instant file .hoodie/20210823114553.replacecommit
>  
> 2. If the timeline service is started, it is stopped after the Spark SQL 
> write post commit. HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106) 
> needs to read the replace instant metadata, but the replace instant file has 
> already been deleted while the timeline has not been updated:
>  
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:297)
> at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
> Caused by: org.apache.hudi.exception.HoodieIOException: Could not read commit details from .hoodie/20210823114553.replacecommit
> at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:555)
> at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:219)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$resetFileGroupsReplaced$8(AbstractTableFileSystemView.java:217)
> at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267)
> at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.resetFileGroupsReplaced(AbstractTableFileSystemView.java:228)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
> at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.reset(AbstractTableFileSystemView.java:248)
> at org.apache.hudi.common.table.view.HoodieTableFileSystemView.close(HoodieTableFileSystemView.java:353)
> at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
> at org.apache.hudi.common.table.view.FileSystemViewManager.close(FileSystemViewManager.java:118)
> at org.apache.hudi.timeline.service.TimelineService.close(TimelineService.java:207)
> at org.apache.hudi.client.embedded.EmbeddedTimelineService.stop(EmbeddedTimelineService.java:121)
> at org.apache.hudi.client.AbstractHoodieClient.stopEmbeddedServerView(AbstractHoodieClient.
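
The failure described in HUDI-2354 can be pictured as a cached list of instant files that goes stale once archival deletes one of them. The following is only a minimal sketch of that situation using plain `java.nio.file`; `StaleTimelineSketch`, `readAll`, `refresh` and `demo` are hypothetical names, not Hudi APIs, and this is not claimed to be the fix in pull request #3536.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the failure; none of these names are Hudi APIs.
final class StaleTimelineSketch {

    // Reads every instant file in the cached list; fails if one was deleted.
    static List<String> readAll(List<Path> cachedInstants) {
        List<String> details = new ArrayList<>();
        for (Path instant : cachedInstants) {
            try {
                details.add(new String(Files.readAllBytes(instant), StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(
                        "Could not read commit details from " + instant.getFileName(), e);
            }
        }
        return details;
    }

    // Rebuilds the list from the files that still exist on disk.
    static List<Path> refresh(List<Path> cachedInstants) {
        List<Path> live = new ArrayList<>();
        for (Path instant : cachedInstants) {
            if (Files.exists(instant)) {
                live.add(instant);
            }
        }
        return live;
    }

    // Runs the scenario in a temporary directory and reports what happened.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("timeline-sketch");
            Path commit = Files.write(dir.resolve("20210823114552.commit"),
                    "c".getBytes(StandardCharsets.UTF_8));
            Path replace = Files.write(dir.resolve("20210823114553.replacecommit"),
                    "r".getBytes(StandardCharsets.UTF_8));
            List<Path> cached = new ArrayList<>();
            cached.add(commit);
            cached.add(replace);

            // Stand-in for archival: the file is removed, the cached list is not updated.
            Files.delete(replace);

            String stale;
            try {
                readAll(cached);
                stale = "ok";
            } catch (UncheckedIOException e) {
                stale = "failed";
            }
            int live = readAll(refresh(cached)).size();

            // Clean up the temporary files.
            Files.delete(commit);
            Files.delete(dir);
            return "stale read " + stale + ", refreshed read returned " + live;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Reading through the stale list fails on the deleted file, while reading after `refresh` only touches files that are still present.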

[GitHub] [hudi] hudi-bot edited a comment on pull request #3536: [HUDI-2354] Fix TimelineServer error because of replacecommit archive

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * 92cdf3364c6c8f947f1a3bf5c5795a1b987ab83d Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1904)
 
   * 523afa82d084f878d1990d056c5beb1ef3417501 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1907)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2364) Run compaction without user schema file provided

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404979#comment-17404979
 ] 

ASF GitHub Bot commented on HUDI-2364:
--

hudi-bot edited a comment on pull request #3540:
URL: https://github.com/apache/hudi/pull/3540#issuecomment-905980036


   
   ## CI report:
   
   * 01cf15ad7e1cfdca2b282236756e31e8338b5a09 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1897)
 
   * 0572b237073f4f08e9dd666cbe55608d82e24d82 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1906)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Run compaction without user schema file provided
> 
>
> Key: HUDI-2364
> URL: https://issues.apache.org/jira/browse/HUDI-2364
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: Wenning Ding
> Priority: Major
> Labels: pull-request-available
>
> Currently, to run Hudi compaction manually, customers have to pass the avsc 
> file of the data schema themselves, e.g. in the Hudi CLI:
>  
> {code:java}
> compaction run --compactionInstant 20201203005420 \
>   --parallelism 2 --sparkMemory 2G \
>   --schemaFilePath s3://xxx/hudi/mor_schema.avsc \
>   --propsFilePath file:///home/hadoop/config.properties --retry 1
> {code}
> Requiring customers to provide the avsc file is not a good option. Some 
> customers don’t know how to generate this schema file, and some pass the 
> wrong schema file and get other exceptions. We should handle this logic 
> inside Hudi if possible.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3540: [WIP][HUDI-2364] Run compaction without user schema file provided

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3540:
URL: https://github.com/apache/hudi/pull/3540#issuecomment-905980036


   
   ## CI report:
   
   * 01cf15ad7e1cfdca2b282236756e31e8338b5a09 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1897)
 
   * 0572b237073f4f08e9dd666cbe55608d82e24d82 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1906)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2364) Run compaction without user schema file provided

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404974#comment-17404974
 ] 

ASF GitHub Bot commented on HUDI-2364:
--

hudi-bot edited a comment on pull request #3540:
URL: https://github.com/apache/hudi/pull/3540#issuecomment-905980036


   
   ## CI report:
   
   * 01cf15ad7e1cfdca2b282236756e31e8338b5a09 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1897)
 
   * 0572b237073f4f08e9dd666cbe55608d82e24d82 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Run compaction without user schema file provided
> 
>
> Key: HUDI-2364
> URL: https://issues.apache.org/jira/browse/HUDI-2364
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: Wenning Ding
> Priority: Major
> Labels: pull-request-available
>
> Currently, to run Hudi compaction manually, customers have to pass the avsc 
> file of the data schema themselves, e.g. in the Hudi CLI:
>  
> {code:java}
> compaction run --compactionInstant 20201203005420 \
>   --parallelism 2 --sparkMemory 2G \
>   --schemaFilePath s3://xxx/hudi/mor_schema.avsc \
>   --propsFilePath file:///home/hadoop/config.properties --retry 1
> {code}
> Requiring customers to provide the avsc file is not a good option. Some 
> customers don’t know how to generate this schema file, and some pass the 
> wrong schema file and get other exceptions. We should handle this logic 
> inside Hudi if possible.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3540: [WIP][HUDI-2364] Run compaction without user schema file provided

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3540:
URL: https://github.com/apache/hudi/pull/3540#issuecomment-905980036


   
   ## CI report:
   
   * 01cf15ad7e1cfdca2b282236756e31e8338b5a09 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1897)
 
   * 0572b237073f4f08e9dd666cbe55608d82e24d82 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] xer001 commented on issue #3191: [SUPPORT]client spark-submit cmd error:Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.DataSourceUtils$.PARTITIONING

2021-08-25 Thread GitBox


xer001 commented on issue #3191:
URL: https://github.com/apache/hudi/issues/3191#issuecomment-906125473


   Spark on CDH 6.3.2 is Spark 2.4.0, and Hadoop is 3.0.0.






[jira] [Commented] (HUDI-2365) Optimizing overwriteField method with Objects.equals

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404971#comment-17404971
 ] 

ASF GitHub Bot commented on HUDI-2365:
--

hudi-bot edited a comment on pull request #3542:
URL: https://github.com/apache/hudi/pull/3542#issuecomment-906008628


   
   ## CI report:
   
   * 547b7c48a0866e10b6c0032d2e4948d71e208b42 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1898)
 
   * c5288975f7f3a2a04cc07bc236a3849342e99520 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1905)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Optimizing overwriteField method with Objects.equals
> 
>
> Key: HUDI-2365
> URL: https://issues.apache.org/jira/browse/HUDI-2365
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Common Core
> Reporter: 董可伦
> Assignee: 董可伦
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Optimizing overwriteField method with Objects.equals
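
For readers unfamiliar with the idiom named in HUDI-2365, here is a minimal sketch of what replacing hand-written null checks with `java.util.Objects.equals` looks like. The class and method names (`NullSafeCompareSketch`, `sameVerbose`, `sameConcise`) are hypothetical and are not taken from Hudi's `overwriteField`; the actual change is in pull request #3542.

```java
import java.util.Objects;

// Hypothetical sketch; names and signatures are illustrative and are not
// taken from Hudi's overwriteField.
final class NullSafeCompareSketch {

    // Hand-rolled null handling of the kind Objects.equals can replace.
    static boolean sameVerbose(Object value, Object defaultValue) {
        if (value == null) {
            return defaultValue == null;
        }
        return value.equals(defaultValue);
    }

    // Same result in one call: true if both are null, false if exactly one
    // is null, otherwise value.equals(defaultValue).
    static boolean sameConcise(Object value, Object defaultValue) {
        return Objects.equals(value, defaultValue);
    }

    public static void main(String[] args) {
        Object[][] cases = { {null, null}, {null, "a"}, {"a", null}, {"a", "a"}, {"a", "b"} };
        for (Object[] c : cases) {
            System.out.println(c[0] + " vs " + c[1] + " -> " + sameConcise(c[0], c[1]));
        }
    }
}
```

Both methods agree on every combination of null and non-null arguments; the second simply delegates the null handling to the standard library.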





[GitHub] [hudi] hudi-bot edited a comment on pull request #3542: [HUDI-2365]Optimizing overwriteField method with Objects.equals

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3542:
URL: https://github.com/apache/hudi/pull/3542#issuecomment-906008628


   
   ## CI report:
   
   * 547b7c48a0866e10b6c0032d2e4948d71e208b42 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1898)
 
   * c5288975f7f3a2a04cc07bc236a3849342e99520 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1905)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2365) Optimizing overwriteField method with Objects.equals

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404970#comment-17404970
 ] 

ASF GitHub Bot commented on HUDI-2365:
--

hudi-bot edited a comment on pull request #3542:
URL: https://github.com/apache/hudi/pull/3542#issuecomment-906008628


   
   ## CI report:
   
   * 547b7c48a0866e10b6c0032d2e4948d71e208b42 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1898)
 
   * c5288975f7f3a2a04cc07bc236a3849342e99520 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Optimizing overwriteField method with Objects.equals
> 
>
> Key: HUDI-2365
> URL: https://issues.apache.org/jira/browse/HUDI-2365
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Common Core
> Reporter: 董可伦
> Assignee: 董可伦
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Optimizing overwriteField method with Objects.equals





[GitHub] [hudi] hudi-bot edited a comment on pull request #3542: [HUDI-2365]Optimizing overwriteField method with Objects.equals

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3542:
URL: https://github.com/apache/hudi/pull/3542#issuecomment-906008628


   
   ## CI report:
   
   * 547b7c48a0866e10b6c0032d2e4948d71e208b42 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1898)
 
   * c5288975f7f3a2a04cc07bc236a3849342e99520 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2357) MERGE INTO doesn't work for tables created using CTAS

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404968#comment-17404968
 ] 

ASF GitHub Bot commented on HUDI-2357:
--

pengzhiwei2018 commented on a change in pull request #3534:
URL: https://github.com/apache/hudi/pull/3534#discussion_r696311825



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -193,15 +193,16 @@ object InsertIntoHoodieTableCommand extends Logging {
 s"[${insertPartitions.keys.mkString(" " )}]" +
 s" not equal to the defined partition in table[${table.partitionColumnNames.mkString(",")}]")
 }
-val parameters = withSparkConf(sparkSession, table.storage.properties)() ++ extraOptions
+val options = table.storage.properties ++ extraOptions

Review comment:
   Yes, we have a UT for merging into a CTAS table. But this bug happens 
only when hive meta is enabled, because we missed some hive table properties 
before this PR.
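
The changed line in the hunk above builds `options` as `table.storage.properties ++ extraOptions`. In Scala, `++` on maps lets the right-hand entries win when a key appears on both sides. The following Java sketch only illustrates that precedence; `OptionMergeSketch` and the sample keys and values are made up and do not come from the pull request.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: the class name, keys and values are made up.
final class OptionMergeSketch {

    // Copies the table-level properties, then applies the extra options so
    // that an extra option overrides a table property with the same key.
    static Map<String, String> merge(Map<String, String> tableProps,
                                     Map<String, String> extraOptions) {
        Map<String, String> merged = new LinkedHashMap<>(tableProps);
        merged.putAll(extraOptions);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> tableProps = new LinkedHashMap<>();
        tableProps.put("primaryKey", "id");
        tableProps.put("type", "cow");
        Map<String, String> extra = new LinkedHashMap<>();
        extra.put("type", "mor");
        System.out.println(merge(tableProps, extra));
    }
}
```

A property that exists only in the table properties, such as the primary key here, survives the merge untouched, which is why a property missing from the table properties cannot be recovered by the merge itself.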






> MERGE INTO doesn't work for tables created using CTAS
> -
>
> Key: HUDI-2357
> URL: https://issues.apache.org/jira/browse/HUDI-2357
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Spark Integration
> Reporter: Vinoth Govindarajan
> Assignee: pengzhiwei
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
>
> MERGE INTO command doesn't select the correct primary key for tables created 
> using CTAS, whereas it works for tables created using CREATE TABLE command.
> I guess we are hitting this issue because the key generator class is set to 
> SqlKeyGenerator for tables created using CTAS:
> working use-case:
> {code:java}
> create table h5 (id bigint, name string, ts bigint) using hudi
> options (type = "cow" , primaryKey="id" , preCombineField="ts" );
> merge into h5 as t0
> using (
> select 5 as s_id, 'vinoth' as s_name, current_timestamp() as s_ts
> ) t1
> on t1.s_id = t0.id
> when matched then update set * 
> when not matched then insert *;
> {code}
> hoodie.properties for working use-case:
> {code:java}
> ➜  analytics.db git:(apache_hudi_support) cat h5/.hoodie/hoodie.properties
> #Properties saved on Wed Aug 25 04:10:33 UTC 2021
> #Wed Aug 25 04:10:33 UTC 2021
> hoodie.table.name=h5
> hoodie.table.recordkey.fields=id
> hoodie.table.type=COPY_ON_WRITE
> hoodie.table.precombine.field=ts
> hoodie.table.partition.fields=
> hoodie.archivelog.folder=archived
> hoodie.table.create.schema={"type"\:"record","name"\:"topLevelRecord","fields"\:[{"name"\:"_hoodie_commit_time","type"\:["string","null"]},{"name"\:"_hoodie_commit_seqno","type"\:["string","null"]},{"name"\:"_hoodie_record_key","type"\:["string","null"]},{"name"\:"_hoodie_partition_path","type"\:["string","null"]},{"name"\:"_hoodie_file_name","type"\:["string","null"]},{"name"\:"id","type"\:["long","null"]},{"name"\:"name","type"\:["string","null"]},{"name"\:"ts","type"\:["long","null"]}]}
> hoodie.timeline.layout.version=1
> hoodie.table.version=1{code}
>  
> Whereas this doesn't work:
> {code:java}
> create table h4 using hudi options (type = "cow" , primaryKey="id" , 
> preCombineField="ts" ) as select 5 as id, cast(rand() as string) as name, 
> current_timestamp();
> merge into h3 as t0u sing (select '5' as s_id, 'vinoth' as s_name, 
> current_timestamp() as s_ts) t1 on t1.s_id = t0.id when matched then update 
> set * when not matched then insert *;
> ERROR LOG
> 544702 [main] ERROR org.apache.spark.sql.hive.thriftserver.SparkSQLDriver  - 
> Failed in [merge into analytics.h3 as t0using (    select '5' as s_id, 
> 'vinoth' as s_name, current_timestamp() as s_ts) t1on t1.s_id = t0.idwhen 
> matched then update set *when not matched then insert 
> *]java.lang.IllegalArgumentException: Merge Key[id] is not Equal to the 
> defined primary key[] in table h3 at 
> org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.buildMergeIntoConfig(MergeIntoHoodieTableCommand.scala:425)
>  at 
> org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.run(MergeIntoHoodieTableCommand.scala:147)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scal

[jira] [Commented] (HUDI-2357) MERGE INTO doesn't work for tables created using CTAS

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404967#comment-17404967
 ] 

ASF GitHub Bot commented on HUDI-2357:
--

pengzhiwei2018 commented on a change in pull request #3534:
URL: https://github.com/apache/hudi/pull/3534#discussion_r696311825



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -193,15 +193,16 @@ object InsertIntoHoodieTableCommand extends Logging {
 s"[${insertPartitions.keys.mkString(" " )}]" +
 s" not equal to the defined partition in table[${table.partitionColumnNames.mkString(",")}]")
 }
-val parameters = withSparkConf(sparkSession, table.storage.properties)() ++ extraOptions
+val options = table.storage.properties ++ extraOptions

Review comment:
   Yes, we have a UT for MERGE INTO with a CTAS table, but this bug happens 
only with the Hive meta, because some Hive table properties were missing before 
this PR.
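
The changed line builds the write options as `table.storage.properties ++ extraOptions`. In Scala, `++` on maps keeps the right-hand value when a key appears on both sides, so entries in `extraOptions` take precedence over the stored table properties. A minimal Java sketch of that precedence follows; the class name, method name, and sample values are illustrative assumptions, not Hudi code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OptionMergeSketch {
    // Mirrors Scala's `left ++ right`: entries from `right` win on key conflicts.
    static Map<String, String> merge(Map<String, String> left, Map<String, String> right) {
        Map<String, String> merged = new LinkedHashMap<>(left);
        merged.putAll(right);
        return merged;
    }

    public static void main(String[] args) {
        // Stand-in for table.storage.properties (values are made up).
        Map<String, String> tableProps = new LinkedHashMap<>();
        tableProps.put("primaryKey", "id");
        tableProps.put("type", "cow");

        // Stand-in for extraOptions; this override is hypothetical.
        Map<String, String> extraOptions = new LinkedHashMap<>();
        extraOptions.put("type", "mor");

        System.out.println(merge(tableProps, extraOptions));
    }
}
```

If a property such as the primary key is missing from the left-hand map and not supplied on the right, it is simply absent from the merged options, which is consistent with the empty `primary key[]` in the reported error.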




[jira] [Commented] (HUDI-2357) MERGE INTO doesn't work for tables created using CTAS

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404966#comment-17404966
 ] 

ASF GitHub Bot commented on HUDI-2357:
--

pengzhiwei2018 commented on pull request #3534:
URL: https://github.com/apache/hudi/pull/3534#issuecomment-906117662


   > @pengzhiwei2018 - Thanks for the quick fix. I have tested it and it 
works for tables created using CTAS. Can we include this fix in the 0.9.0 
release by any chance?
   
   Well, I may need to sync with @vinothchandar on whether we can put this fix 
into 0.9.
   
   




> MERGE INTO doesn't work for tables created using CTAS
> -
>
> Key: HUDI-2357
> URL: https://issues.apache.org/jira/browse/HUDI-2357
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Vinoth Govindarajan
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> MERGE INTO command doesn't select the correct primary key for tables created 
> using CTAS, whereas it works for tables created using CREATE TABLE command.
> I guess we are hitting this issue because the key generator class is set to 
> SqlKeyGenerator for tables created using CTAS:
> working use-case:
> {code:java}
> create table h5 (id bigint, name string, ts bigint) using hudi
> options (type = "cow" , primaryKey="id" , preCombineField="ts" );
> merge into h5 as t0
> using (
> select 5 as s_id, 'vinoth' as s_name, current_timestamp() as s_ts
> ) t1
> on t1.s_id = t0.id
> when matched then update set * 
> when not matched then insert *;
> {code}
> hoodie.properties for working use-case:
> {code:java}
> ➜  analytics.db git:(apache_hudi_support) cat h5/.hoodie/hoodie.properties
> #Properties saved on Wed Aug 25 04:10:33 UTC 2021
> #Wed Aug 25 04:10:33 UTC 2021
> hoodie.table.name=h5
> hoodie.table.recordkey.fields=id
> hoodie.table.type=COPY_ON_WRITE
> hoodie.table.precombine.field=ts
> hoodie.table.partition.fields=
> hoodie.archivelog.folder=archived
> hoodie.table.create.schema={"type"\:"record","name"\:"topLevelRecord","fields"\:[{"name"\:"_hoodie_commit_time","type"\:["string","null"]},{"name"\:"_hoodie_commit_seqno","type"\:["string","null"]},{"name"\:"_hoodie_record_key","type"\:["string","null"]},{"name"\:"_hoodie_partition_path","type"\:["string","null"]},{"name"\:"_hoodie_file_name","type"\:["string","null"]},{"name"\:"id","type"\:["long","null"]},{"name"\:"name","type"\:["string","null"]},{"name"\:"ts","type"\:["long","null"]}]}
> hoodie.timeline.layout.version=1
> hoodie.table.version=1{code}
>  
> Whereas this doesn't work:
> {code:java}
> create table h4 using hudi options (type = "cow" , primaryKey="id" , 
> preCombineField="ts" ) as select 5 as id, cast(rand() as string) as name, 
> current_timestamp();
> merge into h3 as t0 using (select '5' as s_id, 'vinoth' as s_name, 
> current_timestamp() as s_ts) t1 on t1.s_id = t0.id when matched then update 
> set * when not matched then insert *;
> ERROR LOG
> 544702 [main] ERROR org.apache.spark.sql.hive.thriftserver.SparkSQLDriver  - 
> Failed in [merge into analytics.h3 as t0
> using (
>     select '5' as s_id, 'vinoth' as s_name, current_timestamp() as s_ts
> ) t1
> on t1.s_id = t0.id
> when matched then update set *
> when not matched then insert *]
> java.lang.IllegalArgumentException: Merge Key[id] is not Equal to the defined primary key[] in table h3
>  at org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.buildMergeIntoConfig(MergeIntoHoodieTableCommand.scala:425)
>  at org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.run(MergeIntoHoodieTableCommand.scala:147)
>  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
>  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3618)
>  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
>  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
>  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
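
The error above reports an empty `primary key[]`, while the working table's `hoodie.properties` contains `hoodie.table.recordkey.fields=id`. One way to check which record key fields a table has stored is to parse that file with `java.util.Properties`. The sketch below is a diagnostic aid only: the class and method names are hypothetical, and the issue does not show the CTAS table's properties file, so whether that key is actually missing there is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

public class RecordKeyCheck {
    // Returns the record key fields listed in the given hoodie.properties text,
    // or an empty list when the key is absent or blank.
    static List<String> recordKeyFields(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String raw = props.getProperty("hoodie.table.recordkey.fields", "");
        return Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(field -> !field.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Lines copied from the working table's hoodie.properties in the issue.
        String working = "hoodie.table.name=h5\n"
                + "hoodie.table.recordkey.fields=id\n"
                + "hoodie.table.precombine.field=ts\n";
        System.out.println(recordKeyFields(working));
    }
}
```

An empty result for a table would match the empty key list in the exception message.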

[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #3534: [HUDI-2357] MERGE INTO doesn't work for tables created using CTAS

2021-08-25 Thread GitBox


pengzhiwei2018 commented on a change in pull request #3534:
URL: https://github.com/apache/hudi/pull/3534#discussion_r696311825



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -193,15 +193,16 @@ object InsertIntoHoodieTableCommand extends Logging {
 s"[${insertPartitions.keys.mkString(" " )}]" +
 s" not equal to the defined partition in table[${table.partitionColumnNames.mkString(",")}]")
 }
-val parameters = withSparkConf(sparkSession, table.storage.properties)() ++ extraOptions
+val options = table.storage.properties ++ extraOptions

Review comment:
   Yes, we have a UT for MERGE INTO with a CTAS table, but this bug happens 
only with the Hive meta enabled, because some Hive table properties were 
missing before this PR.




[jira] [Commented] (HUDI-2365) Optimizing overwriteField method with Objects.equals

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404964#comment-17404964
 ] 

ASF GitHub Bot commented on HUDI-2365:
--

dongkelun removed a comment on pull request #3542:
URL: https://github.com/apache/hudi/pull/3542#issuecomment-906056841








> Optimizing overwriteField method with Objects.equals
> 
>
> Key: HUDI-2365
> URL: https://issues.apache.org/jira/browse/HUDI-2365
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Optimizing overwriteField method with Objects.equals
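
The issue does not show the body of `overwriteField`, so the following is only a sketch of the general pattern the title points at: replacing a hand-written null-safe comparison with `java.util.Objects.equals`. The class and method names here are hypothetical.

```java
import java.util.Objects;

public class NullSafeCompare {
    // Hand-written null-safe comparison: the kind of code Objects.equals can replace.
    static boolean manualEquals(Object a, Object b) {
        return a == null ? b == null : a.equals(b);
    }

    // Same semantics using the standard library helper.
    static boolean sameValue(Object a, Object b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        Object[][] pairs = {
            {null, null}, {null, "x"}, {"x", null}, {"x", "x"}, {"x", "y"}
        };
        for (Object[] pair : pairs) {
            // Both variants agree on every pair, including the null cases.
            System.out.println(manualEquals(pair[0], pair[1]) == sameValue(pair[0], pair[1]));
        }
    }
}
```

`Objects.equals(a, b)` returns true when both arguments are null, false when exactly one is null, and otherwise delegates to `a.equals(b)`.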



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2357) MERGE INTO doesn't work for tables created using CTAS

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404961#comment-17404961
 ] 

ASF GitHub Bot commented on HUDI-2357:
--

pengzhiwei2018 commented on a change in pull request #3534:
URL: https://github.com/apache/hudi/pull/3534#discussion_r696311825



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -193,15 +193,16 @@ object InsertIntoHoodieTableCommand extends Logging {
 s"[${insertPartitions.keys.mkString(" " )}]" +
 s" not equal to the defined partition in table[${table.partitionColumnNames.mkString(",")}]")
 }
-val parameters = withSparkConf(sparkSession, table.storage.properties)() ++ extraOptions
+val options = table.storage.properties ++ extraOptions

Review comment:
   Yes, we have a UT for MERGE INTO with a CTAS table, but this bug happens 
only with the Hive meta client, because some Hive table properties were 
missing before this PR.




[jira] [Commented] (HUDI-2341) Publish a blog on immutable data lakes

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404954#comment-17404954
 ] 

ASF GitHub Bot commented on HUDI-2341:
--

vingov commented on a change in pull request #3515:
URL: https://github.com/apache/hudi/pull/3515#discussion_r696306531



##
File path: website/blog/2021-08-20-immutable-data-lakes.md
##
@@ -0,0 +1,73 @@
+---
+title: "Immutable data lakes using Apache Hudi"
+excerpt: "How to leverage Apache Hudi for your immutable (or) append only data use-case"
+author: shivnarayan
+category: blog
+---
+
+Apache Hudi helps you build and manage data lakes with different table types, config knobs to cater to everyone's need.
+We strive to listen to community and build features based on the need. From our interactions with the community, we got 
+to know there are quite a few use-cases where Hudi is being used for immutable or append only data. This blog will go 
+over details on how to leverage Apache Hudi in building your data lake for such immutable or append only data.
+
+
+# Immutable data
+Often times, users route log entries to data lakes, where data is immutable. (Add some concrete 
+examples here). Data once ingested won't be updated and can only be deleted. Also, most likely, deletes are issued at 
+partition level (delete partitions older than 1 week) granularity.
+
+# Immutable data lakes using Apache Hudi 
+Hudi has an efficient way to ingest data into Hudi for such immutable use-cases. "Bulk_Insert" operation in Hudi is 
+commonly used for initial bootstrapping of data into hudi, but also exactly fits the bill for such immutable or append 
+only data. And it is known to be performant when compared to regular "insert"s or "upsert"s. 
+
+## Bulk_insert vs regular Inserts/Upserts
+With regular inserts and upserts, Hudi executes few steps before data can be written to data files. For example, 
+index lookup, small file handling, etc has to be performed before actual write. But with bulk_insert, such overhead can 
+be avoided since data is known to be immutable. 
+
+Here is an illustration of steps involved in different operations of interest. 
+
+![Inserts/Upserts](/assets/images/blog/immutable_datalakes/immutable_data_lakes1.jpeg)

Review comment:
   @nsivabalan - I noticed that almost every other image embedded in our 
site is a PNG. The build failed again; can you try converting these images to 
PNG and give it another shot?
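
The blog draft quoted in this diff recommends the bulk_insert operation for immutable or append-only data. As a rough sketch of how a writer might select it, the options below use the Hudi config keys `hoodie.table.name` and `hoodie.datasource.write.operation`; the class name, method name, and the table name `app_logs` are hypothetical, and a real job would need further options (record key, partition path, and so on) that the draft does not spell out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BulkInsertOptions {
    // Builds a minimal option map that selects bulk_insert as the write operation.
    static Map<String, String> appendOnlyWriteOptions(String tableName) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.table.name", tableName);
        opts.put("hoodie.datasource.write.operation", "bulk_insert");
        return opts;
    }

    public static void main(String[] args) {
        // "app_logs" is a made-up table name for illustration.
        appendOnlyWriteOptions("app_logs")
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a Spark job, a map like this would typically be passed to the DataFrame writer via `.format("hudi").options(...)` before saving.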






> Publish a blog on immutable data lakes
> --
>
> Key: HUDI-2341
> URL: https://issues.apache.org/jira/browse/HUDI-2341
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Commented] (HUDI-2348) Publish a blog on schema evolution with KafkaAvroCustomDeserializer

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404955#comment-17404955
 ] 

ASF GitHub Bot commented on HUDI-2348:
--

sbernauer commented on pull request #3485:
URL: https://github.com/apache/hudi/pull/3485#issuecomment-906111077


   Thanks a lot @nsivabalan!
   Sorry for not responding; I could not get the local website build to work.
   I applied your patch to the PR.




> Publish a blog on schema evolution with KafkaAvroCustomDeserializer
> ---
>
> Key: HUDI-2348
> URL: https://issues.apache.org/jira/browse/HUDI-2348
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Publish a blog on schema evolution with KafkaAvroCustomDeserializer





[GitHub] [hudi] sbernauer commented on pull request #3485: [HUDI-2348] Added blog on "Schema evolution with DeltaStreamer using KafkaSource"

2021-08-25 Thread GitBox


sbernauer commented on pull request #3485:
URL: https://github.com/apache/hudi/pull/3485#issuecomment-906111077


   Thanks a lot @nsivabalan!
   Sorry for not responding; I could not get the local website build to work.
   I applied your patch to the PR.






[GitHub] [hudi] vingov commented on a change in pull request #3515: [HUDI-2341] Adding blog on immutable data lakes

2021-08-25 Thread GitBox


vingov commented on a change in pull request #3515:
URL: https://github.com/apache/hudi/pull/3515#discussion_r696306531







[jira] [Commented] (HUDI-2354) archive delete replacecommit, but stop timeline server meet file not found

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404939#comment-17404939
 ] 

ASF GitHub Bot commented on HUDI-2354:
--

hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * 92cdf3364c6c8f947f1a3bf5c5795a1b987ab83d Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1904)
 
   * 523afa82d084f878d1990d056c5beb1ef3417501 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> archive delete replacecommit, but stop timeline server meet file not found
> --
>
> Key: HUDI-2354
> URL: https://issues.apache.org/jira/browse/HUDI-2354
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: liwei
>Assignee: liwei
>Priority: Major
>  Labels: pull-request-available
>
> 1. In the Spark write client, postCommit archives any replacecommit that meets the 
> archive requirement:
> 21/08/23 14:57:12 INFO HoodieTimelineArchiveLog: Archived and deleted instant file .hoodie/20210823114552.commit
> 21/08/23 14:57:13 INFO HoodieTimelineArchiveLog: Archived and deleted instant file .hoodie/20210823114553.replacecommit.requested
> 21/08/23 14:57:13 INFO HoodieTimelineArchiveLog: Archived and deleted instant file .hoodie/20210823114553.replacecommit.inflight
> 21/08/23 14:57:13 INFO HoodieTimelineArchiveLog: Archived and deleted instant file .hoodie/20210823114553.replacecommit
>  
> 2. If the timeline service is started, it is stopped after the Spark SQL write 
> post-commit. HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106) then 
> needs to read the replace instant metadata, but the replace instant file has already 
> been deleted and the timeline has not been updated:
>  
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:297)
> at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
> Caused by: org.apache.hudi.exception.HoodieIOException: Could not read commit details from .hoodie/20210823114553.replacecommit
> at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:555)
> at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:219)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$resetFileGroupsReplaced$8(AbstractTableFileSystemView.java:217)
> at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267)
> at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.resetFileGroupsReplaced(AbstractTableFileSystemView.java:228)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
> at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.reset(AbstractTableFileSystemView.java:248)
> at org.apache.hudi.common.table.view.HoodieTableFileSystemView.close(HoodieTableFileSystemView.java:353)
> at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
> at org.apache.hudi.common.table.view.FileSystemViewManager.close(FileSystemViewManager.java:118)
> at org.apache.hudi.timeline.service.TimelineService.close(TimelineService.java:207)
> at org.apache.hudi.client.embedded.EmbeddedTimelineService.stop(EmbeddedTimelineService.java:121)
> at org.apache.hudi.client.AbstractHoodieClient.stopEmbeddedServerView(AbstractHoodieClient.java:94)
> at org.apache.hudi.client.AbstractHoodieClient.close(AbstractHoodieClient.java:86)
> at org.apache.hud
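
The failure mode described in this issue can be reproduced in miniature with a toy model. The sketch below is not Hudi code: `StaleTimelineSketch` and its methods are hypothetical names, and plain files in a temporary directory stand in for instant files under `.hoodie`. It only illustrates why reading through an in-memory instant list fails once a file has been removed behind its back, and why reloading the list avoids the error. It makes no claim about how the pull request actually fixes the problem.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Toy model only: a hypothetical stand-in for an in-memory timeline, not a Hudi class.
final class StaleTimelineSketch {
    private final Path metaDir;
    private List<String> instants; // file names as of the last reload

    StaleTimelineSketch(Path metaDir) {
        this.metaDir = metaDir;
        reload();
    }

    // Re-lists the directory so the in-memory view matches what is on disk.
    public void reload() {
        try (Stream<Path> files = Files.list(metaDir)) {
            instants = files.map(p -> p.getFileName().toString())
                            .sorted()
                            .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads every instant the in-memory list knows about; fails if one was deleted since the last reload.
    public int readAll() {
        int read = 0;
        for (String name : instants) {
            try {
                Files.readAllBytes(metaDir.resolve(name));
                read++;
            } catch (IOException e) {
                throw new UncheckedIOException("Could not read commit details from " + name, e);
            }
        }
        return read;
    }

    // Runs the scenario: true if the stale read fails and the read after reload succeeds.
    public static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("hoodie-sketch");
            Files.write(dir.resolve("20210823114552.commit"), new byte[0]);
            Path replace = dir.resolve("20210823114553.replacecommit");
            Files.write(replace, new byte[0]);

            StaleTimelineSketch timeline = new StaleTimelineSketch(dir);
            Files.delete(replace); // simulates archival removing the instant file

            boolean staleReadFailed = false;
            try {
                timeline.readAll();
            } catch (UncheckedIOException e) {
                staleReadFailed = true;
            }
            timeline.reload();
            return staleReadFailed && timeline.readAll() == 1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model only the replacecommit file is removed, whereas the log above shows the earlier commit being archived as well; the simplification keeps the second read non-empty.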

[GitHub] [hudi] hudi-bot edited a comment on pull request #3536: [HUDI-2354] Fix TimelineServer error because of replacecommit archive

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * 92cdf3364c6c8f947f1a3bf5c5795a1b987ab83d Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1904)
 
   * 523afa82d084f878d1990d056c5beb1ef3417501 UNKNOWN
   
   




[jira] [Commented] (HUDI-2354) archive delete replacecommit, but stop timeline server meet file not found

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404937#comment-17404937
 ] 

ASF GitHub Bot commented on HUDI-2354:
--

hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * f8d71b7e83f8e9bc64ca0dfbe70e67a352fba82e Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1903)
 
   * 92cdf3364c6c8f947f1a3bf5c5795a1b987ab83d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1904)
 
   * 523afa82d084f878d1990d056c5beb1ef3417501 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot edited a comment on pull request #3536: [HUDI-2354] Fix TimelineServer error because of replacecommit archive

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * f8d71b7e83f8e9bc64ca0dfbe70e67a352fba82e Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1903)
 
   * 92cdf3364c6c8f947f1a3bf5c5795a1b987ab83d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1904)
 
   * 523afa82d084f878d1990d056c5beb1ef3417501 UNKNOWN
   
   




[jira] [Commented] (HUDI-2354) archive delete replacecommit, but stop timeline server meet file not found

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404932#comment-17404932
 ] 

ASF GitHub Bot commented on HUDI-2354:
--

hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * f8d71b7e83f8e9bc64ca0dfbe70e67a352fba82e Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1903)
 
   * 92cdf3364c6c8f947f1a3bf5c5795a1b987ab83d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1904)
 
   
   



[GitHub] [hudi] hudi-bot edited a comment on pull request #3536: [HUDI-2354] Fix TimelineServer error because of replacecommit archive

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * f8d71b7e83f8e9bc64ca0dfbe70e67a352fba82e Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1903)
 
   * 92cdf3364c6c8f947f1a3bf5c5795a1b987ab83d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1904)
 
   
   




[jira] [Commented] (HUDI-2321) Use the caller classloader for ReflectionUtils

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404918#comment-17404918
 ] 

ASF GitHub Bot commented on HUDI-2321:
--

hudi-bot edited a comment on pull request #3535:
URL: https://github.com/apache/hudi/pull/3535#issuecomment-905240762


   
   ## CI report:
   
   * 3349f0e888f939f1db9cb2b099a50a3bbd5e7362 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1902)
 
   
   


> Use the caller classloader for ReflectionUtils
> --
>
> Key: HUDI-2321
> URL: https://issues.apache.org/jira/browse/HUDI-2321
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: configs
>Reporter: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Based on the discussion on Stack Overflow: 
> https://stackoverflow.com/questions/1771679/difference-between-threads-context-class-loader-and-normal-classloader
> {{Thread.currentThread().getContextClassLoader()}} should never be used 
> because the context classloader is mutable: users can overwrite it when 
> threads switch, and it can also be null.
> The objection here: https://stackoverflow.com/a/36228195 says that 
> {{Thread.currentThread().getContextClassLoader()}} is a JDK design error and 
> that using the context classloader is never recommended. An API that needs a 
> classloader should ask the caller to supply the right one.
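
As a rough illustration of the approach described above, the following sketch loads a class through a loader passed in by the caller instead of consulting the thread context classloader. `CallerClassLoaderSketch` and `loadClass` are hypothetical names chosen for this example and are not Hudi's `ReflectionUtils` API; only `Class.forName(String, boolean, ClassLoader)` is a real JDK call.

```java
// Hypothetical helper for illustration; this is not Hudi's ReflectionUtils.
final class CallerClassLoaderSketch {

    // Loads a class through the loader supplied by the caller,
    // rather than through Thread.currentThread().getContextClassLoader().
    public static Class<?> loadClass(String className, ClassLoader callerLoader) {
        try {
            return Class.forName(className, true, callerLoader);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unable to load class " + className, e);
        }
    }

    public static void main(String[] args) {
        // The caller decides which loader is the right one, here the loader of its own class.
        ClassLoader caller = CallerClassLoaderSketch.class.getClassLoader();
        System.out.println(loadClass("java.util.ArrayList", caller).getName());
    }
}
```

Because the loader is an explicit argument, the result does not change if some other code replaces or clears the thread's context classloader.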



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3535: [HUDI-2321] Use the caller classloader for ReflectionUtils

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3535:
URL: https://github.com/apache/hudi/pull/3535#issuecomment-905240762


   
   ## CI report:
   
   * 3349f0e888f939f1db9cb2b099a50a3bbd5e7362 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1902)
 
   
   




[jira] [Commented] (HUDI-52) Implement Savepoints for Merge On Read table #88

2021-08-25 Thread liyan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404912#comment-17404912
 ] 

liyan commented on HUDI-52:
---

Why is this not supported? What are the main problems and difficulties?

> Implement Savepoints for Merge On Read table #88
> 
>
> Key: HUDI-52
> URL: https://issues.apache.org/jira/browse/HUDI-52
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Storage Management, Writer Core
>Affects Versions: 0.9.0
>Reporter: Vinoth Chandar
>Assignee: liwei
>Priority: Major
>  Labels: help-requested, starter
> Fix For: 0.10.0
>
>
> https://github.com/uber/hudi/issues/88





[jira] [Created] (HUDI-2367) Handle deletes in S3 deltastreamer source

2021-08-25 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-2367:
-

 Summary: Handle deletes in S3 deltastreamer source
 Key: HUDI-2367
 URL: https://issues.apache.org/jira/browse/HUDI-2367
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: Sagar Sumit


https://github.com/apache/hudi/pull/3433#pullrequestreview-728946157



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2365) Optimizing overwriteField method with Objects.equals

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404911#comment-17404911
 ] 

ASF GitHub Bot commented on HUDI-2365:
--

dongkelun commented on pull request #3542:
URL: https://github.com/apache/hudi/pull/3542#issuecomment-906088765


   @hudi-bot  run azure




> Optimizing overwriteField method with Objects.equals
> 
>
> Key: HUDI-2365
> URL: https://issues.apache.org/jira/browse/HUDI-2365
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Optimizing overwriteField method with Objects.equals
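
The ticket gives no further detail, so the following is only a sketch of the general idea: `java.util.Objects.equals` is a null-safe replacement for a hand-written null check followed by `equals`. The class and method names below are hypothetical and do not reflect the actual `overwriteField` signature.

```java
import java.util.Objects;

// Hypothetical illustration; not the actual overwriteField implementation.
final class ObjectsEqualsSketch {

    // Hand-written null-safe comparison.
    public static boolean sameValueManual(Object value, Object defaultValue) {
        if (value == null) {
            return defaultValue == null;
        }
        return value.equals(defaultValue);
    }

    // Same behavior in one call: Objects.equals handles null on either side.
    public static boolean sameValue(Object value, Object defaultValue) {
        return Objects.equals(value, defaultValue);
    }

    public static void main(String[] args) {
        System.out.println(sameValue(null, null));
        System.out.println(sameValue("a", null));
    }
}
```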





[GitHub] [hudi] dongkelun commented on pull request #3542: [HUDI-2365]Optimizing overwriteField method with Objects.equals

2021-08-25 Thread GitBox


dongkelun commented on pull request #3542:
URL: https://github.com/apache/hudi/pull/3542#issuecomment-906088765


   @hudi-bot  run azure






[jira] [Commented] (HUDI-2366) fix hudi generating too many logs

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404910#comment-17404910
 ] 

ASF GitHub Bot commented on HUDI-2366:
--

hudi-bot edited a comment on pull request #3543:
URL: https://github.com/apache/hudi/pull/3543#issuecomment-906068990


   
   ## CI report:
   
   * 95b6526c86d0428b80c9525b94f3bceb1d883bda Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1901)
 
   
   


> fix hudi generating too many logs
> -
>
> Key: HUDI-2366
> URL: https://issues.apache.org/jira/browse/HUDI-2366
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: WangZhongze
>Assignee: WangZhongze
>Priority: Major
>  Labels: pull-request-available
>
> In AbstractTableFileSystemView.isFileSliceAfterPendingCompaction, the 
> compactionWithInstantTime of each FileSlice is printed to the log. In 
> general, however, a FileSlice is not in a compaction state, so a null value 
> is logged. If there are many FileSlices, a large number of log lines are 
> emitted in a short time, which can exceed 90% of the total log output and 
> seriously hinders viewing other log information.
> Advice: delete this log
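
The ticket's advice is simply to delete the log statement. For comparison, a possible alternative is sketched here: log only when a pending compaction is actually present, so the common case emits nothing. Everything in this sketch is an assumption for illustration. The class, the `String` instant times, `java.util.Optional`, and the in-memory `LOG` list standing in for a logger are not Hudi's types or its real method signature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration; not the AbstractTableFileSystemView implementation.
final class QuietCompactionLogSketch {

    // Stand-in for a real logger so the sketch stays self-contained.
    public static final List<String> LOG = new ArrayList<>();

    // pendingCompactionInstant is empty when the file slice is not under compaction.
    public static boolean isAfterPendingCompaction(String baseInstantTime,
                                                   Optional<String> pendingCompactionInstant) {
        if (!pendingCompactionInstant.isPresent()) {
            // Common case: nothing useful to report, so nothing is logged.
            return false;
        }
        String compactionInstant = pendingCompactionInstant.get();
        LOG.add("Pending compaction instant for slice " + baseInstantTime + " is " + compactionInstant);
        return baseInstantTime.equals(compactionInstant);
    }

    public static void main(String[] args) {
        isAfterPendingCompaction("20210823114552", Optional.empty());
        isAfterPendingCompaction("20210823114553", Optional.of("20210823114553"));
        System.out.println(LOG.size());
    }
}
```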





[GitHub] [hudi] hudi-bot edited a comment on pull request #3543: [HUDI-2366] fix too many logs

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3543:
URL: https://github.com/apache/hudi/pull/3543#issuecomment-906068990


   
   ## CI report:
   
   * 95b6526c86d0428b80c9525b94f3bceb1d883bda Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1901)
 
   
   




[jira] [Commented] (HUDI-2365) Optimizing overwriteField method with Objects.equals

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404909#comment-17404909
 ] 

ASF GitHub Bot commented on HUDI-2365:
--

dongkelun edited a comment on pull request #3542:
URL: https://github.com/apache/hudi/pull/3542#issuecomment-906056841


   @hudi-bot  run azure






[GitHub] [hudi] dongkelun edited a comment on pull request #3542: [HUDI-2365]Optimizing overwriteField method with Objects.equals

2021-08-25 Thread GitBox


dongkelun edited a comment on pull request #3542:
URL: https://github.com/apache/hudi/pull/3542#issuecomment-906056841


   @hudi-bot  run azure






[jira] [Commented] (HUDI-2354) archive delete replacecommit, but stop timeline server meet file not found

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404908#comment-17404908
 ] 

ASF GitHub Bot commented on HUDI-2354:
--

hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * f8d71b7e83f8e9bc64ca0dfbe70e67a352fba82e Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1903)
 
   * 92cdf3364c6c8f947f1a3bf5c5795a1b987ab83d UNKNOWN
   
   



[GitHub] [hudi] hudi-bot edited a comment on pull request #3536: [HUDI-2354] Fix TimelineServer error because of replacecommit archive

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * f8d71b7e83f8e9bc64ca0dfbe70e67a352fba82e Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1903)
 
   * 92cdf3364c6c8f947f1a3bf5c5795a1b987ab83d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2354) archive delete replacecommit, but stop timeline server meet file not found

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404906#comment-17404906
 ] 

ASF GitHub Bot commented on HUDI-2354:
--

hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * 86aeb78e298b7bea6113fd3675f48fdc538bcbf9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1900)
 
   * f8d71b7e83f8e9bc64ca0dfbe70e67a352fba82e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1903)
 
   * 92cdf3364c6c8f947f1a3bf5c5795a1b987ab83d UNKNOWN
   
   


> archive delete replacecommit, but stop timeline server meet file not found
> --
>
> Key: HUDI-2354
> URL: https://issues.apache.org/jira/browse/HUDI-2354
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: liwei
> Assignee: liwei
> Priority: Major
> Labels: pull-request-available
>
> 1. In the Spark write client, post-commit archives any replacecommit that meets 
> the archive requirement:
> 21/08/23 14:57:12 INFO HoodieTimelineArchiveLog: Archived and deleted instant 
> file .hoodie/20210823114552.commit
> 21/08/23 14:57:13 INFO HoodieTimelineArchiveLog: Archived and deleted instant 
> file .hoodie/20210823114553.replacecommit.requested
> 21/08/23 14:57:13 INFO HoodieTimelineArchiveLog: Archived and deleted instant 
> file .hoodie/20210823114553.replacecommit.inflight
> 21/08/23 14:57:13 INFO HoodieTimelineArchiveLog: Archived and deleted instant 
> file .hoodie/20210823114553.replacecommit
>  
> 2. If the timeline service was started, it is stopped after the Spark SQL write's 
> post-commit. HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106) 
> then needs to read the replace instant metadata, but the replace instant file 
> has been deleted and the timeline has not been updated:
>  
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:297)
> at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
> Caused by: org.apache.hudi.exception.HoodieIOException: Could not read commit 
> details from .hoodie/20210823114553.replacecommit
> at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:555)
> at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:219)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$resetFileGroupsReplaced$8(AbstractTableFileSystemView.java:217)
> at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267)
> at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.resetFileGroupsReplaced(AbstractTableFileSystemView.java:228)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
> at 
> org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.reset(AbstractTableFileSystemView.java:248)
> at 
> org.apache.hudi.common.table.view.HoodieTableFileSystemView.close(HoodieTableFileSystemView.java:353)
> at 
> java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
> at 
> org.apache.hudi.common.table.view.FileSystemViewManager.close(FileSystemViewManager.java:118)
> at 
> org.apache.hudi.timeline.service.TimelineService.close(TimelineService.java:207)
> at 
> org.apache.hudi.client.embedded.EmbeddedTimelineService.stop(EmbeddedTimelineService.java:121)
> at 
> org.apache.hudi.client.AbstractHood
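
The sequence in the quoted issue (archiving deletes the instant file while a cached timeline still lists it) can be mimicked with plain files. This is only a minimal sketch, assuming a local temporary directory in place of `.hoodie`; `StaleTimelineDemo` and `unreadable` are hypothetical names, and nothing here calls Hudi.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a stale listing; not Hudi code.
final class StaleTimelineDemo {

  // Tries to read every instant named in the cached listing and
  // returns the names whose files could not be read.
  static List<String> unreadable(Path metaDir, List<String> cachedTimeline) {
    List<String> missing = new ArrayList<>();
    for (String name : cachedTimeline) {
      try {
        Files.readAllBytes(metaDir.resolve(name));
      } catch (IOException e) {
        // The listing is stale: the file was removed after it was captured.
        missing.add(name);
      }
    }
    return missing;
  }

  public static void main(String[] args) throws IOException {
    Path metaDir = Files.createTempDirectory("hoodie-meta");
    Path commit = Files.createFile(metaDir.resolve("20210823114552.commit"));
    Path replace = Files.createFile(metaDir.resolve("20210823114553.replacecommit"));

    // Capture the listing first, as a cached timeline would.
    List<String> cachedTimeline =
        List.of(commit.getFileName().toString(), replace.getFileName().toString());

    // "Archive" the replacecommit without refreshing the cached listing.
    Files.delete(replace);

    System.out.println("Unreadable instants: " + unreadable(metaDir, cachedTimeline));
  }
}
```

In this sketch the read failure is caught and reported; in the quoted stack trace the corresponding failure surfaces as a HoodieIOException while the timeline service is closing.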

[GitHub] [hudi] hudi-bot edited a comment on pull request #3536: [HUDI-2354] Fix TimelineServer error because of replacecommit archive

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * 86aeb78e298b7bea6113fd3675f48fdc538bcbf9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1900)
 
   * f8d71b7e83f8e9bc64ca0dfbe70e67a352fba82e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1903)
 
   * 92cdf3364c6c8f947f1a3bf5c5795a1b987ab83d UNKNOWN
   
   




[jira] [Commented] (HUDI-2354) archive delete replacecommit, but stop timeline server meet file not found

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404904#comment-17404904
 ] 

ASF GitHub Bot commented on HUDI-2354:
--

zhangyue19921010 commented on a change in pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#discussion_r696258610



##
File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java
##
@@ -212,7 +212,15 @@ private void resetFileGroupsReplaced(HoodieTimeline timeline) {
 hoodieTimer.startTimer();
 // for each REPLACE instant, get map of (partitionPath -> deleteFileGroup)
 HoodieTimeline replacedTimeline = timeline.getCompletedReplaceTimeline();
-Stream> resultStream = replacedTimeline.getInstants().flatMap(instant -> {
+Stream> resultStream = replacedTimeline.getInstants().filter(instant -> {
+  try {
+// Replace instant could be deleted by archive in timeline
+// So that we need to check if the replace commit files were existed.
+return metaClient.getFs().exists(new Path(metaClient.getMetaPath(), instant.getFileName()));

Review comment:
   Nice catch ! 
   changed.
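
The check added in the diff above (skip a replace instant whose file has already been archived) can be illustrated outside Hudi. The following is a minimal sketch using only `java.nio`; `ReplaceInstantFilter` and `existingInstants` are hypothetical names rather than Hudi classes, a local temporary directory stands in for `.hoodie`, and the `20210823114554` instant name is made up for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the filter step in the diff; not part of Hudi.
final class ReplaceInstantFilter {

  // Keeps only the instant file names that still exist under metaDir.
  static List<String> existingInstants(Path metaDir, List<String> instantFileNames) {
    return instantFileNames.stream()
        .filter(name -> Files.exists(metaDir.resolve(name)))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) throws IOException {
    Path metaDir = Files.createTempDirectory("hoodie-meta");
    Files.createFile(metaDir.resolve("20210823114554.replacecommit"));

    // The first name is listed by the stale timeline but is absent on disk,
    // as if it had already been archived.
    List<String> staleTimeline =
        List.of("20210823114553.replacecommit", "20210823114554.replacecommit");

    System.out.println(existingInstants(metaDir, staleTimeline));
  }
}
```

An existence check followed by a later read can still race with a concurrent delete, so a sketch like this narrows the window rather than closing it.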
   







[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3536: [HUDI-2354] Fix TimelineServer error because of replacecommit archive

2021-08-25 Thread GitBox


zhangyue19921010 commented on a change in pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#discussion_r696258610



##
File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java
##
@@ -212,7 +212,15 @@ private void resetFileGroupsReplaced(HoodieTimeline timeline) {
 hoodieTimer.startTimer();
 // for each REPLACE instant, get map of (partitionPath -> deleteFileGroup)
 HoodieTimeline replacedTimeline = timeline.getCompletedReplaceTimeline();
-Stream> resultStream = replacedTimeline.getInstants().flatMap(instant -> {
+Stream> resultStream = replacedTimeline.getInstants().filter(instant -> {
+  try {
+// Replace instant could be deleted by archive in timeline
+// So that we need to check if the replace commit files were existed.
+return metaClient.getFs().exists(new Path(metaClient.getMetaPath(), instant.getFileName()));

Review comment:
   Nice catch ! 
   changed.
   








[jira] [Commented] (HUDI-2354) archive delete replacecommit, but stop timeline server meet file not found

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404903#comment-17404903
 ] 

ASF GitHub Bot commented on HUDI-2354:
--

hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * 86aeb78e298b7bea6113fd3675f48fdc538bcbf9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1900)
 
   * f8d71b7e83f8e9bc64ca0dfbe70e67a352fba82e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1903)
 
   
   



[GitHub] [hudi] hudi-bot edited a comment on pull request #3536: [HUDI-2354] Fix TimelineServer error because of replacecommit archive

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * 86aeb78e298b7bea6113fd3675f48fdc538bcbf9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1900)
 
   * f8d71b7e83f8e9bc64ca0dfbe70e67a352fba82e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1903)
 
   
   




[jira] [Commented] (HUDI-2354) archive delete replacecommit, but stop timeline server meet file not found

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404901#comment-17404901
 ] 

ASF GitHub Bot commented on HUDI-2354:
--

hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * 86aeb78e298b7bea6113fd3675f48fdc538bcbf9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1900)
 
   * f8d71b7e83f8e9bc64ca0dfbe70e67a352fba82e UNKNOWN
   
   



[GitHub] [hudi] hudi-bot edited a comment on pull request #3536: [HUDI-2354] Fix TimelineServer error because of replacecommit archive

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * 86aeb78e298b7bea6113fd3675f48fdc538bcbf9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1900)
 
   * f8d71b7e83f8e9bc64ca0dfbe70e67a352fba82e UNKNOWN
   
   




[jira] [Commented] (HUDI-2354) archive delete replacecommit, but stop timeline server meet file not found

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404894#comment-17404894
 ] 

ASF GitHub Bot commented on HUDI-2354:
--

hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * 86aeb78e298b7bea6113fd3675f48fdc538bcbf9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1900)
 
   
   



[GitHub] [hudi] hudi-bot edited a comment on pull request #3536: [HUDI-2354] Fix TimelineServer error because of replacecommit archive

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * 86aeb78e298b7bea6113fd3675f48fdc538bcbf9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1900)
 
   
   




[jira] [Commented] (HUDI-2321) Use the caller classloader for ReflectionUtils

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404893#comment-17404893
 ] 

ASF GitHub Bot commented on HUDI-2321:
--

hudi-bot edited a comment on pull request #3535:
URL: https://github.com/apache/hudi/pull/3535#issuecomment-905240762


   
   ## CI report:
   
   * 2552e618287c46065baa96d9c2972f192fde5266 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1883)
 
   * 3349f0e888f939f1db9cb2b099a50a3bbd5e7362 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1902)
 
   
   


> Use the caller classloader for ReflectionUtils
> --
>
> Key: HUDI-2321
> URL: https://issues.apache.org/jira/browse/HUDI-2321
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: configs
>Reporter: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Based on the discussion on Stack Overflow: 
> https://stackoverflow.com/questions/1771679/difference-between-threads-context-class-loader-and-normal-classloader
> {{Thread.currentThread().getContextClassLoader()}} should never be used, 
> because the context classloader is not immutable: the user can overwrite it 
> when the thread switches, and it is also nullable.
> The objection here: https://stackoverflow.com/a/36228195 says that 
> {{Thread.currentThread().getContextClassLoader()}} is a JDK design error and 
> that using the context classloader is never recommended. An API that needs a 
> classloader should ask the user to supply the right one.
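
The point of the description above can be illustrated with a minimal sketch. It is not the code from the pull request: the class `ClassLoaderSketch` and the helper `loadClass` are hypothetical names, and the only assumption is that the caller hands over the classloader explicitly instead of the helper consulting the thread's context classloader.

```java
import java.util.Objects;

public class ClassLoaderSketch {

    // Hypothetical helper: resolves a class through a loader supplied by the
    // caller, so the result does not depend on whatever context classloader
    // happens to be set on the current thread (which may even be null).
    static Class<?> loadClass(String className, ClassLoader callerLoader) {
        Objects.requireNonNull(callerLoader, "callerLoader must not be null");
        try {
            // initialize=false: resolve the class without running static initializers
            return Class.forName(className, false, callerLoader);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unable to load class " + className, e);
        }
    }

    public static void main(String[] args) {
        // The caller passes its own defining loader explicitly.
        ClassLoader loader = ClassLoaderSketch.class.getClassLoader();
        System.out.println(loadClass("java.util.ArrayList", loader).getName());
    }
}
```

Passing the loader as a parameter makes the dependency visible at the call site, which is the behaviour the issue asks for.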



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3535: [HUDI-2321] Use the caller classloader for ReflectionUtils

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3535:
URL: https://github.com/apache/hudi/pull/3535#issuecomment-905240762


   
   ## CI report:
   
   * 2552e618287c46065baa96d9c2972f192fde5266 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1883)
 
   * 3349f0e888f939f1db9cb2b099a50a3bbd5e7362 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1902)
 
   
   




[jira] [Commented] (HUDI-2321) Use the caller classloader for ReflectionUtils

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404892#comment-17404892
 ] 

ASF GitHub Bot commented on HUDI-2321:
--

hudi-bot edited a comment on pull request #3535:
URL: https://github.com/apache/hudi/pull/3535#issuecomment-905240762


   
   ## CI report:
   
   * 2552e618287c46065baa96d9c2972f192fde5266 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1883)
 
   * 3349f0e888f939f1db9cb2b099a50a3bbd5e7362 UNKNOWN
   
   


> Use the caller classloader for ReflectionUtils
> --
>
> Key: HUDI-2321
> URL: https://issues.apache.org/jira/browse/HUDI-2321
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: configs
>Reporter: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Based on the discussion on Stack Overflow: 
> https://stackoverflow.com/questions/1771679/difference-between-threads-context-class-loader-and-normal-classloader
> {{Thread.currentThread().getContextClassLoader()}} should never be used, 
> because the context classloader is not immutable: the user can overwrite it 
> when the thread switches, and it is also nullable.
> The objection here: https://stackoverflow.com/a/36228195 says that 
> {{Thread.currentThread().getContextClassLoader()}} is a JDK design error and 
> that using the context classloader is never recommended. An API that needs a 
> classloader should ask the user to supply the right one.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3535: [HUDI-2321] Use the caller classloader for ReflectionUtils

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3535:
URL: https://github.com/apache/hudi/pull/3535#issuecomment-905240762


   
   ## CI report:
   
   * 2552e618287c46065baa96d9c2972f192fde5266 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1883)
 
   * 3349f0e888f939f1db9cb2b099a50a3bbd5e7362 UNKNOWN
   
   




[jira] [Commented] (HUDI-2366) fix hudi generating too many logs

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404891#comment-17404891
 ] 

ASF GitHub Bot commented on HUDI-2366:
--

hudi-bot edited a comment on pull request #3543:
URL: https://github.com/apache/hudi/pull/3543#issuecomment-906068990


   
   ## CI report:
   
   * 95b6526c86d0428b80c9525b94f3bceb1d883bda Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1901)
 
   
   


> fix hudi generating too many logs
> -
>
> Key: HUDI-2366
> URL: https://issues.apache.org/jira/browse/HUDI-2366
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: WangZhongze
>Assignee: WangZhongze
>Priority: Major
>  Labels: pull-request-available
>
> In AbstractTableFileSystemView.isFileSliceAfterPendingCompaction, the 
> compactionWithInstantTime of each FileSlice is printed in the log output. In 
> general, however, a FileSlice is not in a compaction state, so the log shows 
> a null value. If there are many FileSlices, a large number of log lines are 
> emitted in a short time; they can make up more than 90% of the total log, 
> which seriously hinders reading other log information.
> Advice: delete this log
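
To make the volume problem concrete, here is a small self-contained sketch. It does not reproduce Hudi's code: `Slice`, `messages`, and the message text are hypothetical stand-ins, and the guarded variant is shown only for comparison, since the issue itself recommends deleting the log statement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SliceLogSketch {

    // Hypothetical stand-in for a file slice: compactionInstant is null when
    // the slice is not part of a pending compaction (the common case).
    static final class Slice {
        final String fileId;
        final String compactionInstant;

        Slice(String fileId, String compactionInstant) {
            this.fileId = fileId;
            this.compactionInstant = compactionInstant;
        }
    }

    // Collects the lines that would be logged. Unguarded, every slice yields a
    // line (mostly "... is null"); guarded, only slices with a pending
    // compaction instant do.
    static List<String> messages(List<Slice> slices, boolean guarded) {
        List<String> out = new ArrayList<>();
        for (Slice s : slices) {
            if (!guarded || s.compactionInstant != null) {
                out.add("Pending compaction instant for " + s.fileId + " is " + s.compactionInstant);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Slice> slices = Arrays.asList(
            new Slice("f1", null),
            new Slice("f2", null),
            new Slice("f3", "20210825101500"));
        System.out.println(messages(slices, false).size()); // one line per slice
        System.out.println(messages(slices, true).size());  // only slices under compaction
    }
}
```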





[GitHub] [hudi] hudi-bot edited a comment on pull request #3543: [HUDI-2366] fix too many logs

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3543:
URL: https://github.com/apache/hudi/pull/3543#issuecomment-906068990


   
   ## CI report:
   
   * 95b6526c86d0428b80c9525b94f3bceb1d883bda Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1901)
 
   
   




[jira] [Commented] (HUDI-2229) Refact hoodie stream creator in flink

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404887#comment-17404887
 ] 

ASF GitHub Bot commented on HUDI-2229:
--

hudi-bot edited a comment on pull request #3495:
URL: https://github.com/apache/hudi/pull/3495#issuecomment-901095902


   
   ## CI report:
   
   * bbe7dd4a418b756da880733278eddd08558a551a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1899)
 
   
   


> Refact hoodie stream creator in flink
> -
>
> Key: HUDI-2229
> URL: https://issues.apache.org/jira/browse/HUDI-2229
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: wu xingbo
>Priority: Trivial
>  Labels: beginner, pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> The hoodie data stream sink procedure is duplicated in the current 
> HoodieFlinkStreamer and HoodieTableSink, for example index bootstrap, 
> compaction, etc.
> Propose to refactor these steps into shared utility functions.





[jira] [Commented] (HUDI-2366) fix hudi generating too many logs

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404888#comment-17404888
 ] 

ASF GitHub Bot commented on HUDI-2366:
--

hudi-bot commented on pull request #3543:
URL: https://github.com/apache/hudi/pull/3543#issuecomment-906068990


   
   ## CI report:
   
   * 95b6526c86d0428b80c9525b94f3bceb1d883bda UNKNOWN
   
   


> fix hudi generating too many logs
> -
>
> Key: HUDI-2366
> URL: https://issues.apache.org/jira/browse/HUDI-2366
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: WangZhongze
>Assignee: WangZhongze
>Priority: Major
>  Labels: pull-request-available
>
> In AbstractTableFileSystemView.isFileSliceAfterPendingCompaction, the 
> compactionWithInstantTime of each FileSlice is printed in the log output. In 
> general, however, a FileSlice is not in a compaction state, so the log shows 
> a null value. If there are many FileSlices, a large number of log lines are 
> emitted in a short time; they can make up more than 90% of the total log, 
> which seriously hinders reading other log information.
> Advice: delete this log





[GitHub] [hudi] hudi-bot commented on pull request #3543: [HUDI-2366] fix too many logs

2021-08-25 Thread GitBox


hudi-bot commented on pull request #3543:
URL: https://github.com/apache/hudi/pull/3543#issuecomment-906068990


   
   ## CI report:
   
   * 95b6526c86d0428b80c9525b94f3bceb1d883bda UNKNOWN
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3495: [HUDI-2229] Refact hoodie stream creator

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3495:
URL: https://github.com/apache/hudi/pull/3495#issuecomment-901095902


   
   ## CI report:
   
   * bbe7dd4a418b756da880733278eddd08558a551a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1899)
 
   
   




[jira] [Commented] (HUDI-2366) fix hudi generating too many logs

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404885#comment-17404885
 ] 

ASF GitHub Bot commented on HUDI-2366:
--

AyachiNene opened a new pull request #3543:
URL: https://github.com/apache/hudi/pull/3543


   ## What is the purpose of the pull request
   
   JIRA : https://issues.apache.org/jira/browse/HUDI-2366
   
   Delete a meaningless log statement that produces too much log output
   
   ## Brief change log
   
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   




> fix hudi generating too many logs
> -
>
> Key: HUDI-2366
> URL: https://issues.apache.org/jira/browse/HUDI-2366
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: WangZhongze
>Assignee: WangZhongze
>Priority: Major
>
> In AbstractTableFileSystemView.isFileSliceAfterPendingCompaction, the 
> compactionWithInstantTime of each FileSlice is printed in the log output. In 
> general, however, a FileSlice is not in a compaction state, so the log shows 
> a null value. If there are many FileSlices, a large number of log lines are 
> emitted in a short time; they can make up more than 90% of the total log, 
> which seriously hinders reading other log information.
> Advice: delete this log





[jira] [Updated] (HUDI-2366) fix hudi generating too many logs

2021-08-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2366:
-
Labels: pull-request-available  (was: )

> fix hudi generating too many logs
> -
>
> Key: HUDI-2366
> URL: https://issues.apache.org/jira/browse/HUDI-2366
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: WangZhongze
>Assignee: WangZhongze
>Priority: Major
>  Labels: pull-request-available
>
> In AbstractTableFileSystemView.isFileSliceAfterPendingCompaction, the 
> compactionWithInstantTime of each FileSlice is printed in the log output. In 
> general, however, a FileSlice is not in a compaction state, so the log shows 
> a null value. If there are many FileSlices, a large number of log lines are 
> emitted in a short time; they can make up more than 90% of the total log, 
> which seriously hinders reading other log information.
> Advice: delete this log





[GitHub] [hudi] AyachiNene opened a new pull request #3543: [HUDI-2366] fix too many logs

2021-08-25 Thread GitBox


AyachiNene opened a new pull request #3543:
URL: https://github.com/apache/hudi/pull/3543


   ## What is the purpose of the pull request
   
   JIRA : https://issues.apache.org/jira/browse/HUDI-2366
   
   Delete a meaningless log statement that produces too much log output
   
   ## Brief change log
   
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Commented] (HUDI-2354) archive delete replacecommit, but stop timeline server meet file not found

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404870#comment-17404870
 ] 

ASF GitHub Bot commented on HUDI-2354:
--

zhangyue19921010 commented on a change in pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#discussion_r696258610



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java
##
@@ -212,7 +212,15 @@ private void resetFileGroupsReplaced(HoodieTimeline 
timeline) {
 hoodieTimer.startTimer();
 // for each REPLACE instant, get map of (partitionPath -> deleteFileGroup)
 HoodieTimeline replacedTimeline = timeline.getCompletedReplaceTimeline();
-Stream> resultStream = 
replacedTimeline.getInstants().flatMap(instant -> {
+Stream> resultStream = 
replacedTimeline.getInstants().filter(instant -> {
+  try {
+// Replace instant could be deleted by archive in timeline
+// So that we need to check if the replace commit files were existed.
+return metaClient.getFs().exists(new Path(metaClient.getMetaPath(), 
instant.getFileName()));

Review comment:
   Nice catch!
   Maybe catching only the FileNotFoundException is not good enough: other 
IOExceptions besides FileNotFoundException may be thrown during 
`getInstantDetails` because archiving deletes commit files. 
   
   Just changed it to catch the HoodieIOException thrown from getInstantDetails. 
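
   The approach described in this comment (read each instant and skip the ones whose read fails) can be sketched as follows. This is only an illustration under stated assumptions, not the code in the pull request: `SkipMissingInstantsSketch`, `readDetails`, `readExisting`, and `demo` are hypothetical names, plain files stand in for instant files, and `UncheckedIOException` stands in for `HoodieIOException`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SkipMissingInstantsSketch {

    // Hypothetical stand-in for getInstantDetails: any I/O failure surfaces
    // as an unchecked exception.
    static String readDetails(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read commit details from " + file, e);
        }
    }

    // Reads every instant file that is still readable and skips the rest,
    // e.g. files deleted by archiving after the timeline was listed.
    static List<String> readExisting(List<Path> instantFiles) {
        return instantFiles.stream()
            .flatMap(file -> {
                try {
                    return Stream.of(readDetails(file));
                } catch (UncheckedIOException e) {
                    return Stream.<String>empty();
                }
            })
            .collect(Collectors.toList());
    }

    // Builds one readable file and one path that was never created.
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("timeline-sketch");
            Path present = Files.write(dir.resolve("1.replacecommit"),
                "meta".getBytes(StandardCharsets.UTF_8));
            Path missing = dir.resolve("2.replacecommit");
            return readExisting(Arrays.asList(present, missing));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [meta]
    }
}
```

   Catching the failure at read time avoids the gap between an `exists` check and the read that follows it, which is why the sketch handles the exception instead of pre-filtering.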
   






> archive delete replacecommit, but stop timeline server meet file not found
> --
>
> Key: HUDI-2354
> URL: https://issues.apache.org/jira/browse/HUDI-2354
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: liwei
>Assignee: liwei
>Priority: Major
>  Labels: pull-request-available
>
> 1. In the Spark write client, postCommit archives any replacecommit that 
> meets the archive requirement:
> 21/08/23 14:57:12 INFO HoodieTimelineArchiveLog: Archived and deleted instant 
> file .hoodie/20210823114552.commit
> 21/08/23 14:57:13 INFO HoodieTimelineArchiveLog: Archived and deleted instant 
> file .hoodie/20210823114553.replacecommit.requested
> 21/08/23 14:57:13 INFO HoodieTimelineArchiveLog: Archived and deleted instant 
> file .hoodie/20210823114553.replacecommit.inflight
> 21/08/23 14:57:13 INFO HoodieTimelineArchiveLog: Archived and deleted instant 
> file .hoodie/20210823114553.replacecommit
>  
> 2. If the timeline service is started, it is stopped after the Spark SQL 
> write post-commit. HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106) 
> needs to read the replace instant metadata, but the replace instant file has 
> been deleted while the timeline has not been updated:
>  
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:297)
> at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
> Caused by: org.apache.hudi.exception.HoodieIOException: Could not read commit 
> details from .hoodie/20210823114553.replacecommit
> at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:555)
> at 
> org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:219)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$resetFileGroupsReplaced$8(AbstractTableFileSystemView.java:217)
> at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267)
> at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.resetFileGroupsReplaced(AbstractTableFileSystemView.java:228)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
> at 
> org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
> at 
> org.apache.hudi.common.table.view.AbstractTableFileSystemView.reset(AbstractTableFileSystem

[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3536: [HUDI-2354] Fix TimelineServer error because of replacecommit archive

2021-08-25 Thread GitBox


zhangyue19921010 commented on a change in pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#discussion_r696258610



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java
##
@@ -212,7 +212,15 @@ private void resetFileGroupsReplaced(HoodieTimeline 
timeline) {
 hoodieTimer.startTimer();
 // for each REPLACE instant, get map of (partitionPath -> deleteFileGroup)
 HoodieTimeline replacedTimeline = timeline.getCompletedReplaceTimeline();
-Stream> resultStream = 
replacedTimeline.getInstants().flatMap(instant -> {
+Stream> resultStream = 
replacedTimeline.getInstants().filter(instant -> {
+  try {
+// Replace instant could be deleted by archive in timeline
+// So that we need to check if the replace commit files were existed.
+return metaClient.getFs().exists(new Path(metaClient.getMetaPath(), 
instant.getFileName()));

Review comment:
   Nice catch!
   Maybe catching only the FileNotFoundException is not good enough: other 
IOExceptions besides FileNotFoundException may be thrown during 
`getInstantDetails` because archiving deletes commit files. 
   
   Just changed it to catch the HoodieIOException thrown from getInstantDetails. 
   








[jira] [Commented] (HUDI-2354) archive delete replacecommit, but stop timeline server meet file not found

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404866#comment-17404866
 ] 

ASF GitHub Bot commented on HUDI-2354:
--

hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * 0e8abd694f93484a3e6accd9c18f5b538274ecd7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1890)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1891)
 
   * 86aeb78e298b7bea6113fd3675f48fdc538bcbf9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1900)
 
   
   


> archive delete replacecommit, but stop timeline server meet file not found
> --
>
> Key: HUDI-2354
> URL: https://issues.apache.org/jira/browse/HUDI-2354
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: liwei
>Assignee: liwei
>Priority: Major
>  Labels: pull-request-available
>
> 1. In the Spark write client, postCommit archives any replacecommit that 
> meets the archive requirement:
> 21/08/23 14:57:12 INFO HoodieTimelineArchiveLog: Archived and deleted instant 
> file .hoodie/20210823114552.commit
> 21/08/23 14:57:13 INFO HoodieTimelineArchiveLog: Archived and deleted instant 
> file .hoodie/20210823114553.replacecommit.requested
> 21/08/23 14:57:13 INFO HoodieTimelineArchiveLog: Archived and deleted instant 
> file .hoodie/20210823114553.replacecommit.inflight
> 21/08/23 14:57:13 INFO HoodieTimelineArchiveLog: Archived and deleted instant 
> file .hoodie/20210823114553.replacecommit
>  
> 2. If the timeline service is started, it is stopped after the Spark SQL 
> write post-commit. HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106) 
> needs to read the replace instant metadata, but the replace instant file has 
> been deleted while the timeline has not been updated:
>  
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:297)
> at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
> Caused by: org.apache.hudi.exception.HoodieIOException: Could not read commit details from .hoodie/20210823114553.replacecommit
> at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:555)
> at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:219)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$resetFileGroupsReplaced$8(AbstractTableFileSystemView.java:217)
> at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267)
> at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.resetFileGroupsReplaced(AbstractTableFileSystemView.java:228)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
> at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.reset(AbstractTableFileSystemView.java:248)
> at org.apache.hudi.common.table.view.HoodieTableFileSystemView.close(HoodieTableFileSystemView.java:353)
> at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
> at org.apache.hudi.common.table.view.FileSystemViewManager.close(FileSystemViewManager.java:118)
> at org.apache.hudi.timeline.service.TimelineService.close(TimelineService.java:207)
> at org.apache.hudi.client.embedded.EmbeddedTimelineService.stop(Embed

[GitHub] [hudi] hudi-bot edited a comment on pull request #3536: [HUDI-2354] Fix TimelineServer error because of replacecommit archive

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * 0e8abd694f93484a3e6accd9c18f5b538274ecd7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1890) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1891)
   * 86aeb78e298b7bea6113fd3675f48fdc538bcbf9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1900)
   
   
   






[jira] [Commented] (HUDI-2365) Optimizing overwriteField method with Objects.equals

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404865#comment-17404865
 ] 

ASF GitHub Bot commented on HUDI-2365:
--

dongkelun commented on pull request #3542:
URL: https://github.com/apache/hudi/pull/3542#issuecomment-906056841


   @hudi-bot  run travis 




> Optimizing overwriteField method with Objects.equals
> 
>
> Key: HUDI-2365
> URL: https://issues.apache.org/jira/browse/HUDI-2365
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Common Core
> Reporter: 董可伦
> Assignee: 董可伦
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Optimizing overwriteField method with Objects.equals
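
The idea in the issue title can be illustrated with a minimal sketch. `java.util.Objects.equals` is a standard JDK method that compares two references null-safely; everything else here (the class name `OverwriteFieldSketch`, the helper names, and the assumption that the check compares a field value against its default) is hypothetical and is not taken from the Hudi source or from pull request #3542.

```java
import java.util.Objects;

public class OverwriteFieldSketch {

    // Hand-written null-safe comparison (hypothetical "before" shape).
    static boolean sameManual(Object value, Object defaultValue) {
        if (value == null) {
            return defaultValue == null;
        }
        return value.equals(defaultValue);
    }

    // Same check via Objects.equals: two nulls are equal, one null is not,
    // and no NullPointerException is thrown for null arguments.
    static boolean sameWithObjects(Object value, Object defaultValue) {
        return Objects.equals(value, defaultValue);
    }

    public static void main(String[] args) {
        Object[][] cases = {{null, null}, {"a", null}, {null, "a"}, {"a", "a"}};
        for (Object[] c : cases) {
            // Both variants agree on every combination of null and non-null.
            System.out.println(sameManual(c[0], c[1]) + " " + sameWithObjects(c[0], c[1]));
        }
    }
}
```

The two helpers return the same result for every input, so the one-line form can replace the explicit null check without changing behavior.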



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] dongkelun commented on pull request #3542: [HUDI-2365]Optimizing overwriteField method with Objects.equals

2021-08-25 Thread GitBox


dongkelun commented on pull request #3542:
URL: https://github.com/apache/hudi/pull/3542#issuecomment-906056841


   @hudi-bot  run travis 






[jira] [Commented] (HUDI-2354) archive delete replacecommit, but stop timeline server meet file not found

2021-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404863#comment-17404863
 ] 

ASF GitHub Bot commented on HUDI-2354:
--

hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * 0e8abd694f93484a3e6accd9c18f5b538274ecd7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1890) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1891)
   * 86aeb78e298b7bea6113fd3675f48fdc538bcbf9 UNKNOWN
   
   
   




> archive delete replacecommit, but stop timeline server meet file not found
> --
>
> Key: HUDI-2354
> URL: https://issues.apache.org/jira/browse/HUDI-2354
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: liwei
> Assignee: liwei
> Priority: Major
> Labels: pull-request-available
>
> 1. In the Spark write client, the post-commit step archives every replacecommit that 
> meets the archive requirement:
> 21/08/23 14:57:12 INFO HoodieTimelineArchiveLog: Archived and deleted instant file .hoodie/20210823114552.commit
> 21/08/23 14:57:13 INFO HoodieTimelineArchiveLog: Archived and deleted instant file .hoodie/20210823114553.replacecommit.requested
> 21/08/23 14:57:13 INFO HoodieTimelineArchiveLog: Archived and deleted instant file .hoodie/20210823114553.replacecommit.inflight
> 21/08/23 14:57:13 INFO HoodieTimelineArchiveLog: Archived and deleted instant file .hoodie/20210823114553.replacecommit
>  
> 2. If the timeline service is started, it is stopped after the Spark SQL write's 
> post-commit step. During shutdown, 
> HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106) needs to read the 
> replace instant metadata, but the replace instant file has already been deleted while 
> the timeline has not been updated:
>  
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:297)
> at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
> Caused by: org.apache.hudi.exception.HoodieIOException: Could not read commit details from .hoodie/20210823114553.replacecommit
> at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.readDataFromPath(HoodieActiveTimeline.java:555)
> at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.getInstantDetails(HoodieActiveTimeline.java:219)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$resetFileGroupsReplaced$8(AbstractTableFileSystemView.java:217)
> at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267)
> at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.resetFileGroupsReplaced(AbstractTableFileSystemView.java:228)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:106)
> at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:106)
> at org.apache.hudi.common.table.view.AbstractTableFileSystemView.reset(AbstractTableFileSystemView.java:248)
> at org.apache.hudi.common.table.view.HoodieTableFileSystemView.close(HoodieTableFileSystemView.java:353)
> at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
> at org.apache.hudi.common.table.view.FileSystemViewManager.close(FileSystemViewManager.java:118)
> at org.apache.hudi.timeline.service.TimelineService.close(TimelineService.java:207)
> at org.apache.hudi.client.embedded.EmbeddedTimelineService.stop(EmbeddedTimelineService.java:121)
> at org.apache.hudi.client.AbstractHoodieClient.stopEmbeddedServerView(AbstractHoodieC
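
The failure described in this issue (an instant file is archived and deleted while a cached timeline still lists it) can be reproduced in miniature with plain `java.nio.file` calls. This is only a sketch under stated assumptions: `StaleTimelineSketch`, `loadInstants`, `canReadAll`, and `demo` are hypothetical names, none of it uses Hudi's API, and reloading the instant list before reading is just one possible mitigation, not necessarily what pull request #3536 does.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StaleTimelineSketch {

    // Hypothetical stand-in for loading a timeline: list instant files on storage.
    static List<Path> loadInstants(Path metaDir) throws IOException {
        try (Stream<Path> files = Files.list(metaDir)) {
            return files.sorted().collect(Collectors.toList());
        }
    }

    // Reads every instant in the snapshot; false if a listed file no longer exists.
    static boolean canReadAll(List<Path> instants) throws IOException {
        for (Path instant : instants) {
            try {
                Files.readAllBytes(instant);
            } catch (NoSuchFileException e) {
                return false; // snapshot references an instant that was archived
            }
        }
        return true;
    }

    // Returns {stale snapshot readable, reloaded snapshot readable}.
    static boolean[] demo() {
        try {
            Path metaDir = Files.createTempDirectory("hoodie-sketch");
            Path commit = Files.write(metaDir.resolve("20210823114552.commit"), new byte[] {1});
            Path replace = Files.write(metaDir.resolve("20210823114553.replacecommit"), new byte[] {2});

            List<Path> cached = loadInstants(metaDir); // snapshot taken before archiving
            Files.delete(replace);                      // archiving deletes the instant file

            boolean staleOk = canReadAll(cached);                // still lists the deleted file
            boolean freshOk = canReadAll(loadInstants(metaDir)); // reloaded after archiving

            Files.delete(commit);  // clean up the temporary files
            Files.delete(metaDir);
            return new boolean[] {staleOk, freshOk};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("stale snapshot readable: " + result[0]);
        System.out.println("reloaded snapshot readable: " + result[1]);
    }
}
```

The stale snapshot fails because it still names the deleted replacecommit file, which mirrors the `HoodieIOException` in the trace above; the reloaded snapshot only lists files that still exist.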

[GitHub] [hudi] hudi-bot edited a comment on pull request #3536: [HUDI-2354] Fix TimelineServer error because of replacecommit archive

2021-08-25 Thread GitBox


hudi-bot edited a comment on pull request #3536:
URL: https://github.com/apache/hudi/pull/3536#issuecomment-905325754


   
   ## CI report:
   
   * 0e8abd694f93484a3e6accd9c18f5b538274ecd7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1890) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1891)
   * 86aeb78e298b7bea6113fd3675f48fdc538bcbf9 UNKNOWN
   
   
   





