[GitHub] [hudi] n3nash commented on issue #2637: [SUPPORT] - Partial Update : update few columns of a table
n3nash commented on issue #2637: URL: https://github.com/apache/hudi/issues/2637#issuecomment-812335639

@Sugamber If your issue is addressed, please close this issue.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [hudi] n3nash closed issue #2637: [SUPPORT] - Partial Update : update few columns of a table
n3nash closed issue #2637: URL: https://github.com/apache/hudi/issues/2637
[GitHub] [hudi] n3nash commented on issue #2623: org.apache.hudi.exception.HoodieDependentSystemUnavailableException:System HBASE unavailable.
n3nash commented on issue #2623: URL: https://github.com/apache/hudi/issues/2623#issuecomment-812334675

@root18039532923 Not sure if you're asking how to package this, but leaving a comment here in case that's what you are referring to.
1. Check out the 0.7.0 tag: `git checkout tags/0.7.0`
2. Apply the change from https://github.com/apache/hudi/commit/657e73f9b157a1ed4d6e6e515ae8f63156061b88 on top of your checked-out tag.
3. Rebuild the packages with `mvn clean package -DskipTests`

Now use these new packages and test out your deployment.
[GitHub] [hudi] n3nash commented on pull request #2751: [HUDI-1748] Read operation may fail on MOR table `_rt` view when a write operation is running concurrently
n3nash commented on pull request #2751: URL: https://github.com/apache/hudi/pull/2751#issuecomment-812329664

@li36909 If I understand this correctly, you are saying that reading a MOR table's `_rt` view can fail while a concurrent write is happening?
[GitHub] [hudi] codecov-io edited a comment on pull request #2758: [HUDI-1696] add apache commons-codec dependency to flink-bundle explicitly
codecov-io edited a comment on pull request #2758: URL: https://github.com/apache/hudi/pull/2758#issuecomment-812325694

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2758?src=pr=h1) Report
> Merging [#2758](https://codecov.io/gh/apache/hudi/pull/2758?src=pr=desc) (ea0d401) into [master](https://codecov.io/gh/apache/hudi/commit/fe16d0de7c76105775c887b700751241bc82624c?el=desc) (fe16d0d) will **increase** coverage by `0.25%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2758/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2758?src=pr=tree)

```diff
@@             Coverage Diff              @@
##             master    #2758     +/-   ##
============================================
+ Coverage     52.04%    52.30%   +0.25%
- Complexity     3625      3688      +63
  Files           479       483       +4
  Lines         22804     23095     +291
  Branches       2415      2460      +45
+ Hits          11868     12079     +211
- Misses         9911      9946      +35
- Partials       1025      1070      +45
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
| hudicommon | `50.78% <ø> (-0.14%)` | `0.00 <ø> (ø)` | |
| hudiflink | `56.71% <ø> (+0.69%)` | `0.00 <ø> (ø)` | |
| hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudisparkdatasource | `71.33% <ø> (+0.45%)` | `0.00 <ø> (ø)` | |
| hudisync | `45.47% <ø> (ø)` | `0.00 <ø> (ø)` | |
| huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudiutilities | `69.69% <ø> (-0.09%)` | `0.00 <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2758?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...ache/hudi/sink/compact/CompactionPlanOperator.java](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2NvbXBhY3QvQ29tcGFjdGlvblBsYW5PcGVyYXRvci5qYXZh) | `58.00% <0.00%> (-21.07%)` | `8.00% <0.00%> (-1.00%)` | | | [.../main/scala/org/apache/hudi/HoodieSparkUtils.scala](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVNwYXJrVXRpbHMuc2NhbGE=) | `83.33% <0.00%> (-5.56%)` | `0.00% <0.00%> (ø%)` | | | [...src/main/scala/org/apache/hudi/DefaultSource.scala](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0RlZmF1bHRTb3VyY2Uuc2NhbGE=) | `78.78% <0.00%> (-5.36%)` | `31.00% <0.00%> (+14.00%)` | :arrow_down: | | [...main/scala/org/apache/hudi/HoodieWriterUtils.scala](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVdyaXRlclV0aWxzLnNjYWxh) | `83.33% <0.00%> (-5.24%)` | `0.00% <0.00%> (ø%)` | | | [...di/common/table/timeline/HoodieActiveTimeline.java](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUFjdGl2ZVRpbWVsaW5lLmphdmE=) | `66.81% <0.00%> (-3.97%)` | `43.00% <0.00%> (ø%)` | | | [...apache/hudi/sink/compact/CompactionCommitSink.java](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2NvbXBhY3QvQ29tcGFjdGlvbkNvbW1pdFNpbmsuamF2YQ==) | `66.66% <0.00%> (-3.55%)` | `10.00% <0.00%> (-1.00%)` | | | 
[...rg/apache/hudi/common/table/HoodieTableConfig.java](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlQ29uZmlnLmphdmE=) | `43.20% <0.00%> (-2.25%)` | `17.00% <0.00%> (ø%)` | | | [...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh) | `66.66% <0.00%> (-1.65%)` | `43.00% <0.00%> (ø%)` | | | [...c/main/java/org/apache/hudi/common/fs/FSUtils.java](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0ZTVXRpbHMuamF2YQ==) | `47.34% <0.00%> (-0.94%)` | `57.00% <0.00%> (ø%)` | | |
[GitHub] [hudi] codecov-io commented on pull request #2758: [HUDI-1696] add apache commons-codec dependency to flink-bundle explicitly
codecov-io commented on pull request #2758: URL: https://github.com/apache/hudi/pull/2758#issuecomment-812325694

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2758?src=pr=h1) Report
> Merging [#2758](https://codecov.io/gh/apache/hudi/pull/2758?src=pr=desc) (ea0d401) into [master](https://codecov.io/gh/apache/hudi/commit/fe16d0de7c76105775c887b700751241bc82624c?el=desc) (fe16d0d) will **increase** coverage by `0.01%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2758/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2758?src=pr=tree)

```diff
@@             Coverage Diff              @@
##             master    #2758     +/-   ##
============================================
+ Coverage     52.04%    52.06%   +0.01%
- Complexity     3625      3626       +1
  Files           479       477       -2
  Lines         22804     22646     -158
  Branches       2415      2436      +21
- Hits          11868     11790      -78
+ Misses         9911      9805     -106
- Partials       1025      1051      +26
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
| hudicommon | `50.78% <ø> (-0.14%)` | `0.00 <ø> (ø)` | |
| hudiflink | `56.71% <ø> (+0.69%)` | `0.00 <ø> (ø)` | |
| hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudisparkdatasource | `71.33% <ø> (+0.45%)` | `0.00 <ø> (ø)` | |
| hudisync | `45.47% <ø> (ø)` | `0.00 <ø> (ø)` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `69.69% <ø> (-0.09%)` | `0.00 <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
[jira] [Comment Edited] (HUDI-1696) artifactSet of maven-shade-plugin has not commons-codec
[ https://issues.apache.org/jira/browse/HUDI-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313551#comment-17313551 ] Harshit Mittal edited comment on HUDI-1696 at 4/2/21, 4:07 AM:
---
I started looking into this and realized that the fix is merged into master as part of HUDI-1540 for spark-bundle. Should I go ahead and patch the 0.7.0 release as well or leave it be? CC: [~shivnarayan] [~vinoth]

was (Author: hmittal83):
I started looking into this and realized that the fix is merged into master for spark-bundle. Should I go ahead and patch the 0.7.0 release as well or leave it be?

> artifactSet of maven-shade-plugin has not commons-codec
> -------------------------------------------------------
>
>                 Key: HUDI-1696
>                 URL: https://issues.apache.org/jira/browse/HUDI-1696
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Spark Integration
>    Affects Versions: 0.7.0
>         Environment: spark2.4.4, scala2.11.8, centos7
>            Reporter: peng-xin
>            Priority: Critical
>              Labels: pull-request-available, sev:high, user-support-issues
>             Fix For: 0.7.0
>
>         Attachments: image-2021-03-16-18-20-16-477.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When I use the HBase index, it causes an error like the one below.
> !image-2021-03-16-18-20-16-477.png!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HUDI-1696) artifactSet of maven-shade-plugin has not commons-codec
[ https://issues.apache.org/jira/browse/HUDI-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313551#comment-17313551 ] Harshit Mittal commented on HUDI-1696:
---
I started looking into this and realized that the fix is merged into master for spark-bundle. Should I go ahead and patch the 0.7.0 release as well or leave it be?
[GitHub] [hudi] hmit opened a new pull request #2758: [HUDI-1696] add apache commons-codec dependency to flink-bundle explicitly
hmit opened a new pull request #2758: URL: https://github.com/apache/hudi/pull/2758

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

## What is the purpose of the pull request

This causes the shaded code from Apache commons-codec to be explicitly pulled into the uber jar.

## Brief change log

## Verify this pull request

Verified that the built uber jar contains the libraries at the expected path, and that the released jar is missing them.

This pull request is a trivial rework / code cleanup without any test coverage.

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
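For context on why an explicit entry is needed: with maven-shade-plugin, a dependency that is relocated but not listed in the bundle's `artifactSet` has its references rewritten while the classes themselves never land in the uber jar. A hypothetical sketch of the kind of entry involved — module layout and relocation pattern are assumptions for illustration, not copied from the actual patch:

```xml
<!-- Sketch of a bundle module's pom.xml: the <include> makes the shade
     plugin pull commons-codec classes into the uber jar, so the relocated
     package org.apache.hudi.org.apache.commons.codec actually resolves. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <artifactSet>
      <includes>
        <include>commons-codec:commons-codec</include>
      </includes>
    </artifactSet>
    <relocations>
      <relocation>
        <pattern>org.apache.commons.codec.</pattern>
        <shadedPattern>org.apache.hudi.org.apache.commons.codec.</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

Without the `<include>`, callers compiled against the shaded package name fail at runtime with `ClassNotFoundException` — the symptom reported for the HBase index in HUDI-1696.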
[jira] [Updated] (HUDI-1737) Extract common method in HoodieCreateHandle & FlinkCreateHandle
[ https://issues.apache.org/jira/browse/HUDI-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated HUDI-1737:
---
Fix Version/s: 0.9.0

> Extract common method in HoodieCreateHandle & FlinkCreateHandle
> ---------------------------------------------------------------
>
>                 Key: HUDI-1737
>                 URL: https://issues.apache.org/jira/browse/HUDI-1737
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Code Cleanup
>            Reporter: Roc Marshal
>            Assignee: Roc Marshal
>            Priority: Major
>              Labels: hudi-client, pull-request-available
>             Fix For: 0.9.0
>
> {code:java}
> // HoodieCreateHandle.java
> // ...
> @Override
> public List<WriteStatus> close() {
>   LOG.info("Closing the file " + writeStatus.getFileId() + " as we are done with all the records " + recordsWritten);
>   try {
>     fileWriter.close();
>     HoodieWriteStat stat = new HoodieWriteStat();
>     stat.setPartitionPath(writeStatus.getPartitionPath());
>     stat.setNumWrites(recordsWritten);
>     stat.setNumDeletes(recordsDeleted);
>     stat.setNumInserts(insertRecordsWritten);
>     stat.setPrevCommit(HoodieWriteStat.NULL_COMMIT);
>     stat.setFileId(writeStatus.getFileId());
>     stat.setPath(new Path(config.getBasePath()), path);
>     long fileSizeInBytes = FSUtils.getFileSize(fs, path);
>     stat.setTotalWriteBytes(fileSizeInBytes);
>     stat.setFileSizeInBytes(fileSizeInBytes);
>     stat.setTotalWriteErrors(writeStatus.getTotalErrorRecords());
>     RuntimeStats runtimeStats = new RuntimeStats();
>     runtimeStats.setTotalCreateTime(timer.endTimer());
>     stat.setRuntimeStats(runtimeStats);
>     writeStatus.setStat(stat);
>     LOG.info(String.format("CreateHandle for partitionPath %s fileID %s, took %d ms.", stat.getPartitionPath(),
>         stat.getFileId(), runtimeStats.getTotalCreateTime()));
>     return Collections.singletonList(writeStatus);
>   } catch (IOException e) {
>     throw new HoodieInsertException("Failed to close the Insert Handle for path " + path, e);
>   }
> }
>
> // FlinkCreateHandle.java
> private void setUpWriteStatus() throws IOException {
>   long fileSizeInBytes = fileWriter.getBytesWritten();
>   long incFileSizeInBytes = fileSizeInBytes - lastFileSize;
>   this.lastFileSize = fileSizeInBytes;
>   HoodieWriteStat stat = new HoodieWriteStat();
>   stat.setPartitionPath(writeStatus.getPartitionPath());
>   stat.setNumWrites(recordsWritten);
>   stat.setNumDeletes(recordsDeleted);
>   stat.setNumInserts(insertRecordsWritten);
>   stat.setPrevCommit(HoodieWriteStat.NULL_COMMIT);
>   stat.setFileId(writeStatus.getFileId());
>   stat.setPath(new Path(config.getBasePath()), path);
>   stat.setTotalWriteBytes(incFileSizeInBytes);
>   stat.setFileSizeInBytes(fileSizeInBytes);
>   stat.setTotalWriteErrors(writeStatus.getTotalErrorRecords());
>   HoodieWriteStat.RuntimeStats runtimeStats = new HoodieWriteStat.RuntimeStats();
>   runtimeStats.setTotalCreateTime(timer.endTimer());
>   stat.setRuntimeStats(runtimeStats);
>   writeStatus.setStat(stat);
> }
> {code}
[jira] [Closed] (HUDI-1737) Extract common method in HoodieCreateHandle & FlinkCreateHandle
[ https://issues.apache.org/jira/browse/HUDI-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-1737.
---
Resolution: Done

94a5e72f16620da86bd2ca7ef27bb9abc266cc47
[GitHub] [hudi] yanghua merged pull request #2745: [HUDI-1737][hudi-client] Code Cleanup: Extract common method in HoodieCreateHandle & FlinkCreateHandle
yanghua merged pull request #2745: URL: https://github.com/apache/hudi/pull/2745
[hudi] branch master updated: [HUDI-1737][hudi-client] Code Cleanup: Extract common method in HoodieCreateHandle & FlinkCreateHandle (#2745)
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 94a5e72  [HUDI-1737][hudi-client] Code Cleanup: Extract common method in HoodieCreateHandle & FlinkCreateHandle (#2745)

94a5e72 is described below

commit 94a5e72f16620da86bd2ca7ef27bb9abc266cc47
Author: Roc Marshal <64569824+rocmars...@users.noreply.github.com>
AuthorDate: Fri Apr 2 11:39:05 2021 +0800

    [HUDI-1737][hudi-client] Code Cleanup: Extract common method in HoodieCreateHandle & FlinkCreateHandle (#2745)
---
 .../org/apache/hudi/io/HoodieCreateHandle.java     | 56 ++
 .../java/org/apache/hudi/io/FlinkCreateHandle.java | 36 +-
 2 files changed, 48 insertions(+), 44 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java
index 357cf1b..6fa9b56 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java
@@ -179,29 +179,47 @@ public class HoodieCreateHandle extends
       fileWriter.close();
-      HoodieWriteStat stat = new HoodieWriteStat();
-      stat.setPartitionPath(writeStatus.getPartitionPath());
-      stat.setNumWrites(recordsWritten);
-      stat.setNumDeletes(recordsDeleted);
-      stat.setNumInserts(insertRecordsWritten);
-      stat.setPrevCommit(HoodieWriteStat.NULL_COMMIT);
-      stat.setFileId(writeStatus.getFileId());
-      stat.setPath(new Path(config.getBasePath()), path);
-      long fileSizeInBytes = FSUtils.getFileSize(fs, path);
-      stat.setTotalWriteBytes(fileSizeInBytes);
-      stat.setFileSizeInBytes(fileSizeInBytes);
-      stat.setTotalWriteErrors(writeStatus.getTotalErrorRecords());
-      RuntimeStats runtimeStats = new RuntimeStats();
-      runtimeStats.setTotalCreateTime(timer.endTimer());
-      stat.setRuntimeStats(runtimeStats);
-      writeStatus.setStat(stat);
-
-      LOG.info(String.format("CreateHandle for partitionPath %s fileID %s, took %d ms.", stat.getPartitionPath(),
-          stat.getFileId(), runtimeStats.getTotalCreateTime()));
+      setupWriteStatus();
+
+      LOG.info(String.format("CreateHandle for partitionPath %s fileID %s, took %d ms.",
+          writeStatus.getStat().getPartitionPath(), writeStatus.getStat().getFileId(),
+          writeStatus.getStat().getRuntimeStats().getTotalCreateTime()));
       return Collections.singletonList(writeStatus);
     } catch (IOException e) {
       throw new HoodieInsertException("Failed to close the Insert Handle for path " + path, e);
     }
   }
+
+  /**
+   * Set up the write status.
+   *
+   * @throws IOException if error occurs
+   */
+  protected void setupWriteStatus() throws IOException {
+    HoodieWriteStat stat = new HoodieWriteStat();
+    stat.setPartitionPath(writeStatus.getPartitionPath());
+    stat.setNumWrites(recordsWritten);
+    stat.setNumDeletes(recordsDeleted);
+    stat.setNumInserts(insertRecordsWritten);
+    stat.setPrevCommit(HoodieWriteStat.NULL_COMMIT);
+    stat.setFileId(writeStatus.getFileId());
+    stat.setPath(new Path(config.getBasePath()), path);
+    stat.setTotalWriteBytes(computeTotalWriteBytes());
+    stat.setFileSizeInBytes(computeFileSizeInBytes());
+    stat.setTotalWriteErrors(writeStatus.getTotalErrorRecords());
+    RuntimeStats runtimeStats = new RuntimeStats();
+    runtimeStats.setTotalCreateTime(timer.endTimer());
+    stat.setRuntimeStats(runtimeStats);
+    writeStatus.setStat(stat);
+  }
+
+  protected long computeTotalWriteBytes() throws IOException {
+    return FSUtils.getFileSize(fs, path);
+  }
+
+  protected long computeFileSizeInBytes() throws IOException {
+    return FSUtils.getFileSize(fs, path);
+  }
+
 }
diff --git a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/FlinkCreateHandle.java b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/FlinkCreateHandle.java
index 6f4638e..2abefa9 100644
--- a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/FlinkCreateHandle.java
+++ b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/FlinkCreateHandle.java
@@ -22,7 +22,6 @@ import org.apache.hudi.client.WriteStatus;
 import org.apache.hudi.common.engine.TaskContextSupplier;
 import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecordPayload;
-import org.apache.hudi.common.model.HoodieWriteStat;
 import org.apache.hudi.common.util.HoodieTimer;
 import org.apache.hudi.common.util.collection.Pair;
 import org.apache.hudi.config.HoodieWriteConfig;
@@ -30,7 +29,6 @@ import org.apache.hudi.exception.HoodieInsertException;
 import org.apache.hudi.table.HoodieTable;
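The refactor in this commit is the classic template-method extraction: the shared write-status assembly moves into the base handle, and subclasses override only the two size hooks. A minimal, self-contained sketch of that shape — the class names echo Hudi's, but the fields and string output are invented for illustration:

```java
// Template-method sketch of the HUDI-1737 refactor (illustrative only):
// the base class owns the stat-building sequence once; the Flink subclass
// overrides just computeTotalWriteBytes(), as FlinkCreateHandle does upstream.
class CreateHandle {
    protected long bytesWritten = 1024;  // invented stand-in for file size on disk

    // Shared logic, extracted once (plays the role of setupWriteStatus()).
    final String buildStat() {
        return "totalWriteBytes=" + computeTotalWriteBytes()
                + ", fileSizeInBytes=" + computeFileSizeInBytes();
    }

    protected long computeTotalWriteBytes() { return bytesWritten; }

    protected long computeFileSizeInBytes() { return bytesWritten; }
}

class FlinkCreateHandle extends CreateHandle {
    private final long lastFileSize = 256;  // invented: bytes already flushed earlier

    // Flink writes incrementally, so total written bytes is a delta.
    @Override
    protected long computeTotalWriteBytes() {
        return bytesWritten - lastFileSize;
    }
}

public class TemplateMethodSketch {
    public static void main(String[] args) {
        System.out.println(new CreateHandle().buildStat());
        System.out.println(new FlinkCreateHandle().buildStat());
    }
}
```

Keeping `buildStat()` final means the assembly order cannot drift between engines; only the measurements vary, which is exactly the duplication the Jira flagged.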
[GitHub] [hudi] RocMarshal commented on pull request #2745: [HUDI-1737][hudi-client] Code Cleanup: Extract common method in HoodieCreateHandle & FlinkCreateHandle
RocMarshal commented on pull request #2745: URL: https://github.com/apache/hudi/pull/2745#issuecomment-812296838

@danny0405 Thank you so much. Now, ping @yanghua
[GitHub] [hudi] codecov-io edited a comment on pull request #2757: [HUDI-1757] Assigns the buckets by record key for Flink writer
codecov-io edited a comment on pull request #2757: URL: https://github.com/apache/hudi/pull/2757#issuecomment-812247500

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=h1) Report
> Merging [#2757](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=desc) (2c8fc00) into [master](https://codecov.io/gh/apache/hudi/commit/9804662bc8e17d6936c20326f17ec7c0360dcaf6?el=desc) (9804662) will **decrease** coverage by `42.74%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2757/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=tree)

```diff
@@             Coverage Diff              @@
##             master    #2757       +/-   ##
============================================
- Coverage     52.12%     9.38%   -42.75%
+ Complexity     3646        48     -3598
  Files           480        54      -426
  Lines         22867      1993    -20874
  Branches       2417       236     -2181
- Hits          11920       187    -11733
+ Misses         9916      1793     -8123
+ Partials       1031        13     -1018
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `?` | `?` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `9.38% <ø> (-60.36%)` | `0.00 <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | | | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | | | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | | | [...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%>
[GitHub] [hudi] liujinhui1994 edited a comment on pull request #2666: [HUDI-1160] Support update partial fields for CoW table
liujinhui1994 edited a comment on pull request #2666: URL: https://github.com/apache/hudi/pull/2666#issuecomment-812290646 Your suggestion is very good, please let me think about it, and then reply to you @nsivabalan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] liujinhui1994 commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table
liujinhui1994 commented on pull request #2666: URL: https://github.com/apache/hudi/pull/2666#issuecomment-812290646 Your suggestion is very good, please let me think about it @nsivabalan
[jira] [Updated] (HUDI-1752) Add HoodieFlinkClient InsertOverwrite
[ https://issues.apache.org/jira/browse/HUDI-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xurunbai updated HUDI-1752: --- Status: Closed (was: Patch Available) > Add HoodieFlinkClient InsertOverwrite > - > > Key: HUDI-1752 > URL: https://issues.apache.org/jira/browse/HUDI-1752 > Project: Apache Hudi > Issue Type: New Feature > Components: CLI, Flink Integration >Reporter: xurunbai >Priority: Minor > Labels: features > Fix For: 0.8.0 > > > Add HoodieFlinkClient InsertOverwrite -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1752) Add HoodieFlinkClient InsertOverwrite
[ https://issues.apache.org/jira/browse/HUDI-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xurunbai updated HUDI-1752: --- Status: Patch Available (was: In Progress) > Add HoodieFlinkClient InsertOverwrite > - > > Key: HUDI-1752 > URL: https://issues.apache.org/jira/browse/HUDI-1752 > Project: Apache Hudi > Issue Type: New Feature > Components: CLI, Flink Integration >Reporter: xurunbai >Priority: Minor > Labels: features > Fix For: 0.8.0 > > > Add HoodieFlinkClient InsertOverwrite
[jira] [Updated] (HUDI-1752) Add HoodieFlinkClient InsertOverwrite
[ https://issues.apache.org/jira/browse/HUDI-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xurunbai updated HUDI-1752: --- Status: In Progress (was: Open) > Add HoodieFlinkClient InsertOverwrite > - > > Key: HUDI-1752 > URL: https://issues.apache.org/jira/browse/HUDI-1752 > Project: Apache Hudi > Issue Type: New Feature > Components: CLI, Flink Integration >Reporter: xurunbai >Priority: Minor > Labels: features > Fix For: 0.8.0 > > > Add HoodieFlinkClient InsertOverwrite
[GitHub] [hudi] li36909 commented on pull request #2753: [HUDI-1750] Fail to load user's class if user moves hudi-spark-bundle jar into spark classpath
li36909 commented on pull request #2753: URL: https://github.com/apache/hudi/pull/2753#issuecomment-812274605 cc @nsivabalan could you help to take a look, thank you.
[GitHub] [hudi] li36909 commented on pull request #2751: [HUDI-1748] Read operation may fail on MOR table RT view when a write operation is running concurrently
li36909 commented on pull request #2751: URL: https://github.com/apache/hudi/pull/2751#issuecomment-812271790 @nsivabalan I ran the test on Hudi 0.7. Yes, you are right: I start a spark-shell for upserting and query the same table with the Spark datasource API, and then the problem arises. The cause of the problem is clear: during the query, Hudi gets the partitions in MergeOnReadSnapshotRelation and builds a new fsview in HoodieRealtimeInputFormatUtils.groupLogsByBaseFile. When a write operation is happening concurrently, HoodieRealtimeInputFormatUtils.groupLogsByBaseFile will find some new base files. We can reproduce this issue by adding a Thread.sleep(6) in HoodieRealtimeInputFormatUtils.groupLogsByBaseFile, then running this test:

Step 1: write the first batch into a Hudi table from a spark-shell:

import org.apache.hudi.QuickstartUtils._
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._

val tableName = "hudi_mor_table"
val basePath = "hdfs://hacluster/tmp/hudi_mor_table"
val dataGen = new DataGenerator
val inserts = convertToStringList(dataGen.generateInserts(10))
val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
df.write.format("org.apache.hudi").
  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
  option("hoodie.datasource.write.operation", "bulk_insert").
  options(getQuickstartWriteConfigs).
  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
  option("hoodie.datasource.write.hive_style_partitioning", "false").
  option(TABLE_NAME, tableName).
  option("hoodie.datasource.hive_sync.enable", "true").
  option("hoodie.datasource.hive_sync.use_jdbc", "false").
  option("hoodie.datasource.hive_sync.table", "hudi_mor_test").
  option("hoodie.datasource.hive_sync.partition_extractor_class", "org.apache.hudi.hive.MultiPartKeysValueExtractor").
  option("hoodie.datasource.hive_sync.partition_fields", "continent,country,city").
  option("hoodie.datasource.hive_sync.assume_date_partitioning", "false").
  option("hoodie.insert.shuffle.parallelism", "2").
  option("hoodie.upsert.shuffle.parallelism", "2").
  option("hoodie.bulkinsert.shuffle.parallelism", "2").
  option("hoodie.delete.shuffle.parallelism", "2").
  mode(Append).
  save(basePath)

Step 2: run a query in a new spark-shell (when the query hangs at the Thread.sleep, start writing a new batch as in step 3):

spark.read.format("hudi").load("hdfs://hacluster/tmp/hudi_mor_table/*/*/*/*").count

Step 3: go back to the spark-shell from step 1 and write a new batch, using the same df.write.format("org.apache.hudi") call with the same options as in step 1.

We can see that step 2 will throw an exception.
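The race described above can be sketched with a toy model (plain Java, not Hudi code): a reader snapshots the table's file listing, a concurrent writer commits a new base file, and the reader's later re-listing surfaces files its query plan never accounted for. All names here are illustrative.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the race: reader captures a view, writer commits a new
// base file, reader's second listing no longer matches its snapshot.
public class StaleViewRace {
    public static void main(String[] args) {
        List<String> tableFiles = new ArrayList<>(List.of("f1_c1.parquet", "f2_c1.parquet"));

        // Reader builds its view (think: MergeOnReadSnapshotRelation resolving partitions).
        Set<String> readerView = new HashSet<>(tableFiles);

        // Writer commits a new base file while the reader is "sleeping".
        tableFiles.add("f3_c2.parquet");

        // Reader re-lists (think: groupLogsByBaseFile building a fresh fsview)
        // and now sees a base file its original plan never expected.
        List<String> unexpected = new ArrayList<>();
        for (String f : tableFiles) {
            if (!readerView.contains(f)) {
                unexpected.add(f);
            }
        }
        System.out.println(unexpected);
    }
}
```

In the toy model the mismatch is just an extra list entry; in Hudi the equivalent mismatch between the two listings is what surfaces as the exception in step 2.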
[GitHub] [hudi] root18039532923 commented on issue #2623: org.apache.hudi.exception.HoodieDependentSystemUnavailableException:System HBASE unavailable.
root18039532923 commented on issue #2623: URL: https://github.com/apache/hudi/issues/2623#issuecomment-812271600 How do I do this? @nsivabalan
[GitHub] [hudi] bvaradar commented on issue #2588: [SUPPORT] Cannot create hive connection
bvaradar commented on issue #2588: URL: https://github.com/apache/hudi/issues/2588#issuecomment-812256475 We use hive-2.3.4 and have more than 100 jobs running
[GitHub] [hudi] bvaradar closed issue #2644: Hudi cow table incremental data error
bvaradar closed issue #2644: URL: https://github.com/apache/hudi/issues/2644
[GitHub] [hudi] codecov-io commented on pull request #2757: [HUDI-1757] Assigns the buckets by record key for Flink writer
codecov-io commented on pull request #2757: URL: https://github.com/apache/hudi/pull/2757#issuecomment-812247500 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=h1) Report > Merging [#2757](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=desc) (b9e79e1) into [master](https://codecov.io/gh/apache/hudi/commit/9804662bc8e17d6936c20326f17ec7c0360dcaf6?el=desc) (9804662) will **increase** coverage by `0.24%`. > The diff coverage is `96.77%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2757/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #2757      +/-  ##
+ Coverage   52.12%   52.37%   +0.24%
- Complexity   3646     3676      +30
  Files         480      480
  Lines       22867    23010     +143
  Branches     2417     2455      +38
+ Hits        11920    12051     +131
- Misses       9916     9921       +5
- Partials     1031     1038       +7
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
| hudicommon | `50.87% <ø> (+0.01%)` | `0.00 <ø> (ø)` | |
| hudiflink | `56.96% <96.77%> (+0.25%)` | `0.00 <5.00> (ø)` | |
| hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudisparkdatasource | `70.87% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudisync | `45.47% <ø> (ø)` | `0.00 <ø> (ø)` | |
| huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudiutilities | `70.81% <ø> (+1.08%)` | `0.00 <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...in/java/org/apache/hudi/table/HoodieTableSink.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZVNpbmsuamF2YQ==) | `12.19% <0.00%> (ø)` | `2.00 <0.00> (ø)` | | | [...va/org/apache/hudi/configuration/FlinkOptions.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9jb25maWd1cmF0aW9uL0ZsaW5rT3B0aW9ucy5qYXZh) | `89.07% <100.00%> (+0.18%)` | `11.00 <0.00> (ø)` | | | [...he/hudi/sink/partitioner/BucketAssignFunction.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbkZ1bmN0aW9uLmphdmE=) | `87.00% <100.00%> (+0.40%)` | `24.00 <1.00> (ø)` | | | [...g/apache/hudi/sink/partitioner/BucketAssigner.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbmVyLmphdmE=) | `88.52% <100.00%> (+0.59%)` | `23.00 <2.00> (+2.00)` | | | [.../apache/hudi/sink/partitioner/BucketAssigners.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbmVycy5qYXZh) | `50.00% <100.00%> (ø)` | `2.00 <0.00> (ø)` | | | [...di/sink/partitioner/delta/DeltaBucketAssigner.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL2RlbHRhL0RlbHRhQnVja2V0QXNzaWduZXIuamF2YQ==) | `69.23% <100.00%> (+0.80%)` | `10.00 <2.00> (+1.00)` | | | 
[...c/main/java/org/apache/hudi/util/StreamerUtil.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS91dGlsL1N0cmVhbWVyVXRpbC5qYXZh) | `54.62% <100.00%> (+2.40%)` | `18.00 <0.00> (ø)` | | | [...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==) | `79.68% <0.00%> (+1.56%)` | `26.00% <0.00%> (ø%)` | | | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `76.45% <0.00%> (+5.07%)` | `82.00% <0.00%> (+27.00%)` | |
[GitHub] [hudi] danny0405 closed pull request #2757: [HUDI-1757] Assigns the buckets by record key for Flink writer
danny0405 closed pull request #2757: URL: https://github.com/apache/hudi/pull/2757
[GitHub] [hudi] bvaradar commented on issue #2633: Empty File Slice causing application to fail in small files optimization code
bvaradar commented on issue #2633: URL: https://github.com/apache/hudi/issues/2633#issuecomment-812212286 @umehrot2 : Looking more closely at the code change to fix this, we need to make more changes: the base file check needs to change in https://github.com/apache/hudi/blob/e599764c2dcfbbc15d6554fa0df55b7375e4a31d/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L96 Otherwise, we need to be careful in determining the current commit time. As a fix, I think it is safer to disregard files in pending compaction when selecting small files.
[GitHub] [hudi] bvaradar edited a comment on issue #2633: Empty File Slice causing application to fail in small files optimization code
bvaradar edited a comment on issue #2633: URL: https://github.com/apache/hudi/issues/2633#issuecomment-810709098 @umehrot2 @n3nash @nsivabalan : My apologies for the delay; I finally got a chance to look into this. Yes, this will only manifest in the case where the index can support log files. I believe this is the problem: we are using the wrong API of FileSystemView here: https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85 We don't normally include file groups that are in pending compaction, but with the HBase index we are including them. With the current state of the code, including files in pending compaction is an issue. The API "getLatestFileSlicesBeforeOrOn" is originally intended to be used by CompactionAdminClient to figure out log files that were added after a pending compaction and rename them so that we can undo the effects of compaction scheduling. There is a different API, "getLatestMergedFileSlicesBeforeOrOn", which gives a consolidated view of the latest file slice and includes all data both before and after compaction. This is what should be used in https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85 The other workaround would be to exclude file slices in pending compaction when we select small files, to avoid the interaction between the compactor and ingestion in this case. But I think we can go with the first option.
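The small-file-selection hazard discussed here can be illustrated with a self-contained toy (this is not Hudi's FileSystemView API; all names and the size cutoff are made up for illustration): a naive selector treats a slice under pending compaction as an ordinary small file, while the safer variant skips such slices so ingestion and the compactor never target the same file group.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of small-file selection with and without respecting
// pending compaction. Not Hudi code; names and limits are illustrative.
public class SmallFileSelection {
    static class FileSlice {
        final String fileId;
        final long baseFileSize;
        final boolean inPendingCompaction;
        FileSlice(String fileId, long baseFileSize, boolean inPendingCompaction) {
            this.fileId = fileId;
            this.baseFileSize = baseFileSize;
            this.inPendingCompaction = inPendingCompaction;
        }
    }

    static final long SMALL_FILE_LIMIT = 100L * 1024 * 1024; // illustrative 100 MB cutoff

    // Buggy variant: picks every small slice, even one a compactor is about to rewrite.
    static List<FileSlice> selectSmallFilesNaive(List<FileSlice> slices) {
        List<FileSlice> out = new ArrayList<>();
        for (FileSlice s : slices) {
            if (s.baseFileSize < SMALL_FILE_LIMIT) out.add(s);
        }
        return out;
    }

    // Safer variant, matching the workaround above: skip slices in pending compaction.
    static List<FileSlice> selectSmallFilesSafe(List<FileSlice> slices) {
        List<FileSlice> out = new ArrayList<>();
        for (FileSlice s : slices) {
            if (!s.inPendingCompaction && s.baseFileSize < SMALL_FILE_LIMIT) out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        List<FileSlice> slices = List.of(
                new FileSlice("fg-1", 10L * 1024 * 1024, false),
                new FileSlice("fg-2", 20L * 1024 * 1024, true));
        System.out.println(selectSmallFilesNaive(slices).size()); // includes the pending one
        System.out.println(selectSmallFilesSafe(slices).size());  // pending one skipped
    }
}
```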
[GitHub] [hudi] satishkotha commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving
satishkotha commented on issue #2707: URL: https://github.com/apache/hudi/issues/2707#issuecomment-812096323 @ssdong thanks for the detailed update. Yes, this one seems a little tricky; we'll need some more time to investigate. In the meantime, as a workaround, I think you can disable the embedded timeline server by setting the option 'hoodie.embed.timeline.server' to false. I can look into why reset needs to be called during close and whether we can refresh the meta client for getting the timeline.
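As a sketch of the suggested workaround, the option can be passed on the Spark write path. This assumes an existing `Dataset<Row> df` and a table path; the table name and path below are placeholders, and only `hoodie.embed.timeline.server` is the actual workaround config.

```java
// Workaround sketch: disable the embedded timeline server for the writer.
// `df`, the table name, and the path are assumed/illustrative.
df.write()
  .format("hudi")
  .option("hoodie.table.name", "my_table")          // placeholder table name
  .option("hoodie.embed.timeline.server", "false")  // the workaround config
  .mode("append")
  .save("/tmp/hudi/my_table");                      // placeholder base path
```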
[hudi] branch master updated: [HUDI-1591] Implement Spark's FileIndex for Hudi to support queries via Hudi DataSource using non-globbed table path and partition pruning (#2651)
This is an automated email from the ASF dual-hosted git repository. uditme pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 684622c [HUDI-1591] Implement Spark's FileIndex for Hudi to support queries via Hudi DataSource using non-globbed table path and partition pruning (#2651) 684622c is described below commit 684622c7c9fa6df9eb177b51cb1e7bd6dd16f78d Author: pengzhiwei AuthorDate: Fri Apr 2 02:12:28 2021 +0800 [HUDI-1591] Implement Spark's FileIndex for Hudi to support queries via Hudi DataSource using non-globbed table path and partition pruning (#2651) --- .../apache/hudi/keygen/CustomAvroKeyGenerator.java | 6 +- .../org/apache/hudi/keygen/CustomKeyGenerator.java | 2 +- .../datasources/SparkParsePartitionUtil.scala | 34 ++ .../java/org/apache/hudi/common/fs/FSUtils.java| 10 + .../hudi/common/table/HoodieTableConfig.java | 10 + .../hudi/common/table/HoodieTableMetaClient.java | 13 + .../scala/org/apache/hudi/DataSourceOptions.scala | 3 + .../main/scala/org/apache/hudi/DefaultSource.scala | 103 +++--- .../org/apache/hudi/HoodieBootstrapRelation.scala | 18 +- .../scala/org/apache/hudi/HoodieFileIndex.scala| 362 + .../org/apache/hudi/HoodieSparkSqlWriter.scala | 8 +- .../scala/org/apache/hudi/HoodieSparkUtils.scala | 12 +- .../scala/org/apache/hudi/HoodieWriterUtils.scala | 31 +- .../hudi/MergeOnReadIncrementalRelation.scala | 3 +- .../apache/hudi/MergeOnReadSnapshotRelation.scala | 70 ++-- .../org/apache/hudi/TestHoodieFileIndex.scala | 252 ++ .../apache/hudi/functional/TestCOWDataSource.scala | 52 ++- .../functional/TestDataSourceForBootstrap.scala| 39 ++- .../apache/hudi/functional/TestMORDataSource.scala | 52 +++ .../datasources/Spark2ParsePartitionUtil.scala | 33 ++ .../datasources/Spark3ParsePartitionUtil.scala | 39 +++ .../hudi/utilities/deltastreamer/DeltaSync.java| 6 + 22 files changed, 1075 insertions(+), 83 deletions(-) diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/CustomAvroKeyGenerator.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/CustomAvroKeyGenerator.java index 724cabd..3b927c9 100644 --- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/CustomAvroKeyGenerator.java +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/CustomAvroKeyGenerator.java @@ -44,7 +44,7 @@ import java.util.stream.Collectors; public class CustomAvroKeyGenerator extends BaseKeyGenerator { private static final String DEFAULT_PARTITION_PATH_SEPARATOR = "/"; - private static final String SPLIT_REGEX = ":"; + public static final String SPLIT_REGEX = ":"; /** * Used as a part of config in CustomKeyGenerator.java. @@ -117,8 +117,4 @@ public class CustomAvroKeyGenerator extends BaseKeyGenerator { public String getDefaultPartitionPathSeparator() { return DEFAULT_PARTITION_PATH_SEPARATOR; } - - public String getSplitRegex() { -return SPLIT_REGEX; - } } diff --git a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/CustomKeyGenerator.java b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/CustomKeyGenerator.java index 77896d2..a2a3012 100644 --- a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/CustomKeyGenerator.java +++ b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/CustomKeyGenerator.java @@ -90,7 +90,7 @@ public class CustomKeyGenerator extends BuiltinKeyGenerator { return ""; } for (String field : getPartitionPathFields()) { - String[] fieldWithType = field.split(customAvroKeyGenerator.getSplitRegex()); + String[] fieldWithType = field.split(customAvroKeyGenerator.SPLIT_REGEX); if (fieldWithType.length != 2) { throw new HoodieKeyGeneratorException("Unable to find field names for partition path in proper format"); } diff --git a/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/execution/datasources/SparkParsePartitionUtil.scala 
b/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/execution/datasources/SparkParsePartitionUtil.scala new file mode 100644 index 000..fc2275b --- /dev/null +++ b/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/execution/datasources/SparkParsePartitionUtil.scala @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at +
[GitHub] [hudi] umehrot2 merged pull request #2651: [HUDI-1591] [RFC-26] Improve Hoodie Table Query Performance And Ease Of Use Fo…
umehrot2 merged pull request #2651: URL: https://github.com/apache/hudi/pull/2651
[GitHub] [hudi] nsivabalan edited a comment on pull request #2751: [HUDI-1748] Read operation may fail on MOR table RT view when a write operation is running concurrently
nsivabalan edited a comment on pull request #2751: URL: https://github.com/apache/hudi/pull/2751#issuecomment-812037473 @li36909 : can you clarify what you mean by concurrent operations? Ignoring any table services, do you mean that just a batch of writes and an RT snapshot view happening concurrently runs into issues? Which Hudi version are you running? @n3nash : you may want to check it out.
[GitHub] [hudi] nsivabalan commented on pull request #2751: [HUDI-1748] Read operation may fail on MOR table RT view when a write operation is running concurrently
nsivabalan commented on pull request #2751: URL: https://github.com/apache/hudi/pull/2751#issuecomment-812037473 @li36909 : can you clarify what you mean by concurrent operations? Is it cleaning/archiving along with a batch of writes, or two batches of writes going concurrently? @n3nash : you may want to check it out.
[GitHub] [hudi] nsivabalan edited a comment on pull request #2334: [HUDI-1453] Throw Exception when input data schema is not equal to th…
nsivabalan edited a comment on pull request #2334: URL: https://github.com/apache/hudi/pull/2334#issuecomment-812021796 oh, I didn't realize the default schema compatibility check is false. I assume that if I enable the schema compatibility check, double-to-int evolution is likely to fail. So I'm not sure what this PR tries to achieve. At least in my local test run, integer to double worked for me: https://gist.github.com/nsivabalan/91f12109e0fe1ca9749ff5290c946778 This is COW, btw; haven't tested with MOR. @pengzhiwei2018 : can you please clarify?
[GitHub] [hudi] nsivabalan commented on pull request #2334: [HUDI-1453] Throw Exception when input data schema is not equal to th…
nsivabalan commented on pull request #2334: URL: https://github.com/apache/hudi/pull/2334#issuecomment-812021796 oh, I didn't realize the default schema compatibility check is false. I assume that if I enable the schema compatibility check, double-to-int evolution is likely to fail. So I'm not sure what this PR tries to achieve. At least in my local test run, integer to double worked for me: https://gist.github.com/nsivabalan/91f12109e0fe1ca9749ff5290c946778
[GitHub] [hudi] nsivabalan commented on pull request #2334: [HUDI-1453] Throw Exception when input data schema is not equal to th…
nsivabalan commented on pull request #2334: URL: https://github.com/apache/hudi/pull/2334#issuecomment-812019464 just to clarify: some schema evolution just for types works fine with Hudi; for example, integer to double works fine. The problem is double to int. I am not sure why the schema compatibility check does not fail this evolution.
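The asymmetry being discussed can be shown with plain Java numerics (this is a language-level illustration, not Avro's compatibility checker): promoting an int to a double is exact widening, while narrowing a double to an int silently drops the fraction, which is why compatibility rules generally allow int-to-double but should reject double-to-int.

```java
// Why int -> double evolution is safe but double -> int is lossy.
public class NumericEvolution {
    public static void main(String[] args) {
        int original = 42;
        double widened = original;      // int -> double: exact widening
        System.out.println(widened == 42.0);

        double price = 42.99;
        int narrowed = (int) price;     // double -> int: fraction silently lost
        System.out.println(narrowed);
        System.out.println((double) narrowed == price); // round trip fails
    }
}
```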
[GitHub] [hudi] nsivabalan commented on pull request #2106: [HUDI-1284] preCombine all HoodieRecords and update all fields according to orderingVal
nsivabalan commented on pull request #2106: URL: https://github.com/apache/hudi/pull/2106#issuecomment-811939606 @Karl-WangSK : we have another PR being reviewed right now for partial-update support: https://github.com/apache/hudi/pull/2666 Please check it out. We don't need to parse the schema for every record; you can get some inspiration from the linked PR.
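The review point about not parsing the schema per record comes down to hoisting the parse out of the record loop. A minimal sketch (the "schema" and parse function here are stand-ins; with Avro the expensive call would be `new Schema.Parser().parse(json)`):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: parse the schema once and reuse it, instead of per record.
public class SchemaParseOnce {
    static final AtomicInteger PARSE_CALLS = new AtomicInteger();

    static String parseSchema(String json) {
        PARSE_CALLS.incrementAndGet(); // stand-in for the expensive parse
        return json.trim();
    }

    public static void main(String[] args) {
        List<String> records = List.of("r1", "r2", "r3");

        // Anti-pattern: one parse per record.
        for (String r : records) {
            String perRecordSchema = parseSchema("{\"type\":\"record\"}");
        }
        System.out.println(PARSE_CALLS.get());

        // Better: hoist the parse out of the loop and reuse the result.
        PARSE_CALLS.set(0);
        String schema = parseSchema("{\"type\":\"record\"}");
        for (String r : records) {
            // use `schema` for each record here
        }
        System.out.println(PARSE_CALLS.get());
    }
}
```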
[jira] [Updated] (HUDI-1757) Assigns the buckets by record key for Flink writer
[ https://issues.apache.org/jira/browse/HUDI-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1757: - Labels: pull-request-available (was: ) > Assigns the buckets by record key for Flink writer > -- > > Key: HUDI-1757 > URL: https://issues.apache.org/jira/browse/HUDI-1757 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Labels: pull-request-available > > Currently we assign the buckets by record partition path, which could cause a > hotspot if the partition field is a datetime type. Change to assign buckets by > grouping the records with their key first; the assignment is valid only if > there is no conflict (two tasks writing to the same bucket).
[jira] [Updated] (HUDI-1757) Assigns the buckets by record key for Flink writer
[ https://issues.apache.org/jira/browse/HUDI-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-1757: - Description: Currently we assign the buckets by record partition path, which could cause a hotspot if the partition field is a datetime type. Change to assign buckets by grouping the records with their key first; the assignment is valid only if there is no conflict (two tasks writing to the same bucket). > Assigns the buckets by record key for Flink writer > -- > > Key: HUDI-1757 > URL: https://issues.apache.org/jira/browse/HUDI-1757 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > > Currently we assign the buckets by record partition path, which could cause a > hotspot if the partition field is a datetime type. Change to assign buckets by > grouping the records with their key first; the assignment is valid only if > there is no conflict (two tasks writing to the same bucket).
[GitHub] [hudi] danny0405 opened a new pull request #2757: [HUDI-1757] Assigns the buckets by record key for Flink writer
danny0405 opened a new pull request #2757: URL: https://github.com/apache/hudi/pull/2757 Currently we assign the buckets by record partition path, which could cause a hotspot if the partition field is a datetime type. Change to assign buckets by grouping the records with their key first; the assignment is valid only if there is no conflict (two tasks writing to the same bucket). ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
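The idea in this PR — deriving the bucket from the record key rather than the partition path, so a hot datetime partition no longer funnels every record into one bucket — can be sketched as follows. This is a toy model, not the Flink writer's actual assignment logic; the hash-mod scheme and names are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of key-based bucket assignment: records spread across
// buckets by key hash instead of piling up under one partition value.
public class KeyBasedBucketing {
    static int bucketFor(String recordKey, int numBuckets) {
        // Mask the sign bit so the bucket index is always non-negative.
        return (recordKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 4;
        List<String> keys = List.of("uuid-1", "uuid-2", "uuid-3", "uuid-4");

        // Count how many records land in each bucket.
        Map<Integer, Integer> load = new HashMap<>();
        for (String k : keys) {
            load.merge(bucketFor(k, numBuckets), 1, Integer::sum);
        }

        // Every record gets a valid bucket in [0, numBuckets).
        boolean allValid = load.keySet().stream()
                .allMatch(b -> b >= 0 && b < numBuckets);
        System.out.println(allValid);
    }
}
```

The "no conflict" condition in the PR description corresponds to ensuring no two writer tasks are assigned the same bucket index.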
[jira] [Created] (HUDI-1757) Assigns the buckets by record key for Flink writer
Danny Chen created HUDI-1757: Summary: Assigns the buckets by record key for Flink writer Key: HUDI-1757 URL: https://issues.apache.org/jira/browse/HUDI-1757 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: Danny Chen Assignee: Danny Chen
[jira] [Created] (HUDI-1755) Assigns the buckets by record key for Flink writer
Danny Chen created HUDI-1755: Summary: Assigns the buckets by record key for Flink writer Key: HUDI-1755 URL: https://issues.apache.org/jira/browse/HUDI-1755 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: Danny Chen Assignee: Danny Chen Fix For: 0.9.0
[jira] [Created] (HUDI-1756) Assigns the buckets by record key for Flink writer
Danny Chen created HUDI-1756: Summary: Assigns the buckets by record key for Flink writer Key: HUDI-1756 URL: https://issues.apache.org/jira/browse/HUDI-1756 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: Danny Chen Assignee: Danny Chen Fix For: 0.9.0
[jira] [Created] (HUDI-1754) Assigns the buckets by record key for Flink writer
Danny Chen created HUDI-1754: Summary: Assigns the buckets by record key for Flink writer Key: HUDI-1754 URL: https://issues.apache.org/jira/browse/HUDI-1754 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: Danny Chen Assignee: Danny Chen Fix For: 0.9.0
[jira] [Created] (HUDI-1753) Assigns the buckets by record key for Flink writer
Danny Chen created HUDI-1753: Summary: Assigns the buckets by record key for Flink writer Key: HUDI-1753 URL: https://issues.apache.org/jira/browse/HUDI-1753 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: Danny Chen Fix For: 0.9.0 Currently we assign the buckets by record partition path, which could cause a hotspot if the partition field is a datetime type. Actually we can decide the buckets by grouping records with their record keys first; the assignment is valid only if there is no conflict (two tasks write to the same bucket).
[GitHub] [hudi] nsivabalan commented on pull request #2091: HUDI-1283 Fill missing columns with default value when spark dataframe save to hudi table
nsivabalan commented on pull request #2091: URL: https://github.com/apache/hudi/pull/2091#issuecomment-811931337 @ivorzhou : if your requirement is to fetch the value from the previous version of the record and inject it into the incoming record, we already have another PR open: https://github.com/apache/hudi/pull/2666. Feel free to clarify, and close this PR out if the requirement is the same.
[GitHub] [hudi] nsivabalan commented on a change in pull request #2666: [HUDI-1160] Support update partial fields for CoW table
nsivabalan commented on a change in pull request #2666: URL: https://github.com/apache/hudi/pull/2666#discussion_r605601097 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -79,6 +80,7 @@ public static final String TIMELINE_LAYOUT_VERSION = "hoodie.timeline.layout.version"; public static final String BASE_PATH_PROP = "hoodie.base.path"; public static final String AVRO_SCHEMA = "hoodie.avro.schema"; + public static final String LAST_AVRO_SCHEMA = "hoodie.last.avro.schema"; Review comment: Maybe we can call this "latest table schema". ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java ## @@ -106,7 +110,7 @@ public HoodieMergeHandle(HoodieWriteConfig config, String instantTime, HoodieTable hoodieTable, Iterator> recordItr, String partitionPath, String fileId, TaskContextSupplier taskContextSupplier) { -super(config, instantTime, partitionPath, fileId, hoodieTable, taskContextSupplier); +super(config, instantTime, partitionPath, fileId, hoodieTable, getWriterSchemaIncludingAndExcludingMetadataPair(config, hoodieTable), taskContextSupplier); Review comment: Can you help me understand something? Here we have HoodieWriteConfig in two places: as a separate entity (1st arg in the constructor) and as HoodieTable.getConfig(). I see that within getWriterSchemaIncludingAndExcludingMetadataPair(...), we update lastSchema in hoodieWriteConfig, which will update the 1st arg. But table.getConfig() may not be updated, right? If the above statement is right, how do we rely on table.getConfig.getLastSchema() in the *MergeHelper classes? Maybe I am missing something; can you shed some light?
## File path: hudi-common/src/main/java/org/apache/hudi/common/util/CommitUtils.java ## @@ -59,14 +61,24 @@ public static HoodieCommitMetadata buildMetadata(List writeStat Option> extraMetadata, WriteOperationType operationType, String schemaToStoreInCommit, - String commitActionType) { + String commitActionType, + Boolean updatePartialFields, + HoodieTableMetaClient metaClient) { HoodieCommitMetadata commitMetadata = buildMetadataFromStats(writeStats, partitionToReplaceFileIds, commitActionType, operationType); // add in extra metadata if (extraMetadata.isPresent()) { extraMetadata.get().forEach(commitMetadata::addMetadata); } +if (updatePartialFields) { + try { Review comment: if updatePartialFields is set, can't we rely on config.getLastSchema() in all places where this method is called similar to how we pass in config.getSchema(). Is it not guaranteed that config.lastSchema() will be set by the time we reach here if updatePartialFields is set to true? Trying to see if we can avoid parsing the schema from table once again since we have already done it once. ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdatePayload.java ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.model; + +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.generic.IndexedRecord; +import org.apache.hudi.common.util.Option; + +import java.io.IOException; +import java.util.List; + +public class PartialUpdatePayload extends OverwriteWithLatestAvroPayload { Review comment: java docs w/ example would be great. ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java ## @@ -123,6 +127,22 @@ public HoodieMergeHandle(HoodieWriteConfig config, String instantTime, HoodieTab init(fileId, this.partitionPath,
[GitHub] [hudi] liujinhui1994 commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table
liujinhui1994 commented on pull request #2666: URL: https://github.com/apache/hudi/pull/2666#issuecomment-811861142 > @liujinhui1994 : Can you please update the description with an example.
1. old data
   id  name       age
   1   liujinhui  26
   2   sam        25
   3   madeline   30
2. new data
   {"id": 1, "name": "abc"}
3. final result
   id  name       age
   1   abc        26
   2   sam        25
   3   madeline   30
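The merge rule in the example above can be sketched without Avro. The map-based toy below is a simplified illustration of a partial-update payload, not the actual PartialUpdatePayload class from this PR: fields the incoming partial record sets win, every other field keeps its stored value.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified illustration of partial-update merge semantics (not Hudi's API):
// non-null fields of the incoming record override the current record.
public class PartialUpdateSketch {
    public static Map<String, Object> merge(Map<String, Object> current,
                                            Map<String, Object> incoming) {
        Map<String, Object> merged = new HashMap<>(current);
        for (Map.Entry<String, Object> e : incoming.entrySet()) {
            if (e.getValue() != null) {
                merged.put(e.getKey(), e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> current = new HashMap<>();
        current.put("id", 1);
        current.put("name", "liujinhui");
        current.put("age", 26);
        Map<String, Object> incoming = new HashMap<>();
        incoming.put("name", "abc");
        Map<String, Object> merged = merge(current, incoming);
        // applying {"id": 1, "name": "abc"} updates name but keeps age
        System.out.println(merged.get("name") + "/" + merged.get("age")); // abc/26
    }
}
```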
[GitHub] [hudi] liujinhui1994 removed a comment on pull request #2666: [HUDI-1160] Support update partial fields for CoW table
liujinhui1994 removed a comment on pull request #2666: URL: https://github.com/apache/hudi/pull/2666#issuecomment-811860789 > @liujinhui1994 : Can you please update the description with an example.
1. old data
   id  name       age
   1   liujinhui  26
   2   sam        25
   3   madeline   30
2. new data
   {"id": 1, "name": "abc"}
3. final result
   id  name       age
   1   abc        26
   2   sam        25
   3   madeline   30
[GitHub] [hudi] liujinhui1994 commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table
liujinhui1994 commented on pull request #2666: URL: https://github.com/apache/hudi/pull/2666#issuecomment-811860789 > @liujinhui1994 : Can you please update the description with an example.
1. old data
   id  name       age
   1   liujinhui  26
   2   sam        25
   3   madeline   30
2. new data
   {"id": 1, "name": "abc"}
3. final result
   id  name       age
   1   abc        26
   2   sam        25
   3   madeline   30
[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp
liujinhui1994 commented on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-811856655 > Myself and Nishith discussed this. Here is our proposal. > Let's rely on Deltastreamer.Config.checkpoint to pass in any type of checkpoint. > We can add another config called "checkpoint.type", which could default to string for all default checkpoints. For the checkpoint of interest in this PR, we could set the value of this new config to "timestamp". > > With this, it's up to each source to parse and interpret the checkpoint value, and DeltaSync does not need to deal w/ diff checkpointing formats. > > Having said this, DeltaSync readFromSource() should not have any changes in this diff. > KafkaOffsetGen should have the logic to parse diff checkpoint values, based on the two values (deltastreamer.config.checkpoint and checkpoint.type). > > With this, we also move source-specific checkpointing logic into the source-specific class and do not leak it into DeltaSync, which should be agnostic to different Sources. > > @liujinhui1994 : Let me know what you think. Happy to chat more on this. Great, I will modify this PR based on this
[GitHub] [hudi] nsivabalan commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table
nsivabalan commented on pull request #2666: URL: https://github.com/apache/hudi/pull/2666#issuecomment-811856279 @liujinhui1994 : Can you please update the description with an example.
[GitHub] [hudi] nsivabalan commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp
nsivabalan commented on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-81184 Myself and Nishith discussed this. Here is our proposal. Let's rely on Deltastreamer.Config.checkpoint to pass in any type of checkpoint. We can add another config called "checkpoint.type", which could default to string for all default checkpoints. For the checkpoint of interest in this PR, we could set the value of this new config to "timestamp". With this, it's up to each source to parse and interpret the checkpoint value, and DeltaSync does not need to deal w/ diff checkpointing formats. Having said this, DeltaSync readFromSource() should not have any changes in this diff. KafkaOffsetGen should have the logic to parse diff checkpoint values, based on the two values (deltastreamer.config.checkpoint and checkpoint.type). With this, we also move source-specific checkpointing logic into the source-specific class and do not leak it into DeltaSync, which should be agnostic to different Sources. @liujinhui1994 : Let me know what you think. Happy to chat more on this.
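Under that proposal, a source such as KafkaOffsetGen would branch on the checkpoint type before interpreting the raw string. The sketch below is hypothetical: the "checkpoint.type" name and "timestamp" value come from the comment above, while the class and method names are invented for illustration.

```java
// Hypothetical sketch of the proposed split: DeltaSync hands the raw
// checkpoint string through untouched; the source interprets it according
// to a "checkpoint.type" config. Not DeltaStreamer's actual API.
public class CheckpointInterpreter {

    /** Parse a "timestamp"-typed checkpoint into epoch millis. */
    public static long toEpochMillis(String checkpoint, String checkpointType) {
        if (!"timestamp".equals(checkpointType)) {
            throw new IllegalArgumentException(
                "checkpoint is not timestamp-typed: type=" + checkpointType);
        }
        return Long.parseLong(checkpoint);
    }

    public static void main(String[] args) {
        // a Kafka source could then seed its offsets from this instant,
        // e.g. via KafkaConsumer#offsetsForTimes
        System.out.println(toEpochMillis("1617235200000", "timestamp")); // 1617235200000
    }
}
```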
[GitHub] [hudi] xurunbai commented on pull request #2755: Add HoodieFlinkClient InsertOverwrite
xurunbai commented on pull request #2755: URL: https://github.com/apache/hudi/pull/2755#issuecomment-811770811 > @xurunbai Thanks for your contribution. Can you file a jira ticket for this feature? OK
[GitHub] [hudi] xurunbai closed pull request #2755: Add HoodieFlinkClient InsertOverwrite
xurunbai closed pull request #2755: URL: https://github.com/apache/hudi/pull/2755
[jira] [Created] (HUDI-1752) Add HoodieFlinkClient InsertOverwrite
xurunbai created HUDI-1752: -- Summary: Add HoodieFlinkClient InsertOverwrite Key: HUDI-1752 URL: https://issues.apache.org/jira/browse/HUDI-1752 Project: Apache Hudi Issue Type: New Feature Components: CLI, Flink Integration Reporter: xurunbai Fix For: 0.8.0 Add HoodieFlinkClient InsertOverwrite
[GitHub] [hudi] aditiwari01 opened a new issue #2756: OrderingVal not being honoured for payloads in log files (for MOR table)
aditiwari01 opened a new issue #2756: URL: https://github.com/apache/hudi/issues/2756 While creating HoodieRecordPayloads from log files for MOR tables, the payloads are created without any orderingVal (even if one was specified while writing the data). Because of this, the precombine function can return any payload, irrespective of its orderingVal. Attaching a sample script to reproduce the issue. In this example, for key "key1", the first insert is with ts=1000. Then we update with ts=2000, then with ts=500. Ideally, after the last update, a snapshot query of the table should return key1 with ts=2000 (since our ordering field is ts). However, it shows the entry with ts=1000, because from the logs it ignores ts=2000 and only picks up ts=500. Also, AFAIU, the same flow is used during compaction, so we might lose data forever. [Hudi_sample_commands.txt](https://github.com/apache/hudi/files/6242206/Hudi_sample_commands.txt) A workaround is to implement a HoodiePayloadClass and hardcode the ordering field in the constructor, but we cannot keep it parameterized since properties are not available in the constructor.
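The expected semantics can be modelled in a few lines. This toy (names are illustrative, not Hudi's payload API) shows "latest wins by orderingVal": with the ordering values intact, ts=2000 survives; if every log-file payload were built with a defaulted orderingVal, the comparison would degenerate and an arbitrary record could win, which is the behavior reported above.

```java
// Toy model of orderingVal-based preCombine; illustrative names only.
public class OrderingDemo {

    static class Payload {
        final String data;
        final long orderingVal;

        Payload(String data, long orderingVal) {
            this.data = data;
            this.orderingVal = orderingVal;
        }
    }

    // keep the payload with the larger ordering value; ties keep the existing one
    static Payload preCombine(Payload existing, Payload incoming) {
        return incoming.orderingVal > existing.orderingVal ? incoming : existing;
    }

    public static void main(String[] args) {
        Payload insert = new Payload("v1", 1000);
        Payload update1 = new Payload("v2", 2000);
        Payload update2 = new Payload("v3", 500);
        Payload winner = preCombine(preCombine(insert, update1), update2);
        // with ordering values preserved, ts=2000 wins as the issue expects
        System.out.println(winner.orderingVal); // 2000
    }
}
```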
[GitHub] [hudi] yanghua commented on pull request #2755: Add HoodieFlinkClient InsertOverwrite
yanghua commented on pull request #2755: URL: https://github.com/apache/hudi/pull/2755#issuecomment-811710440 @xurunbai Thanks for your contribution. Can you file a jira ticket for this feature?
[jira] [Closed] (HUDI-1738) Emit deletes for Flink MOR table streaming read
[ https://issues.apache.org/jira/browse/HUDI-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-1738. -- Resolution: Implemented 9804662bc8e17d6936c20326f17ec7c0360dcaf6 > Emit deletes for Flink MOR table streaming read > --- > > Key: HUDI-1738 > URL: https://issues.apache.org/jira/browse/HUDI-1738 > Project: Apache Hudi > Issue Type: New Feature > Components: Flink Integration > Reporter: Danny Chen > Assignee: Danny Chen > Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Currently we do a soft delete for DELETE row data when writing into a hoodie > table. For streaming read of a MOR table, the Flink reader detects the delete > records and still emits them, as long as the record key semantics are kept. > This is useful, and actually a must, for incremental computation in streaming > ETL pipelines.
[hudi] branch master updated: [HUDI-1738] Emit deletes for flink MOR table streaming read (#2742)
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 9804662 [HUDI-1738] Emit deletes for flink MOR table streaming read (#2742) 9804662 is described below commit 9804662bc8e17d6936c20326f17ec7c0360dcaf6 Author: Danny Chan AuthorDate: Thu Apr 1 15:25:31 2021 +0800 [HUDI-1738] Emit deletes for flink MOR table streaming read (#2742) Currently we do a soft delete for DELETE row data when writing into a hoodie table. For streaming read of a MOR table, the Flink reader detects the delete records and still emits them, as long as the record key semantics are kept. This is useful, and actually a must, for incremental computation in streaming ETL pipelines. --- .../java/org/apache/hudi/keygen/KeyGenUtils.java | 27 + .../apache/hudi/client/HoodieFlinkWriteClient.java | 4 +- .../table/timeline/HoodieActiveTimeline.java | 23 +++- .../org/apache/hudi/sink/StreamWriteFunction.java | 14 +-- .../hudi/sink/StreamWriteOperatorCoordinator.java | 73 +++-- .../apache/hudi/sink/compact/CompactFunction.java | 14 +-- .../hudi/sink/compact/CompactionCommitSink.java| 14 +-- .../hudi/sink/compact/CompactionPlanOperator.java | 45 +--- .../hudi/sink/event/BatchWriteSuccessEvent.java| 5 +- .../org/apache/hudi/table/HoodieTableSource.java | 27 +++-- .../table/format/mor/MergeOnReadInputFormat.java | 119 +++-- .../table/format/mor/MergeOnReadTableState.java| 33 +- .../apache/hudi/util/StringToRowDataConverter.java | 107 ++ .../sink/TestStreamWriteOperatorCoordinator.java | 6 +- .../apache/hudi/source/TestStreamReadOperator.java | 18 ++-- .../apache/hudi/table/HoodieDataSourceITCase.java | 35 ++ .../apache/hudi/table/format/TestInputFormat.java | 25 + .../test/java/org/apache/hudi/utils/TestData.java | 8 +- .../hudi/utils/TestStringToRowDataConverter.java | 107 ++ .../utils/factory/CollectSinkTableFactory.java | 11 +- 20 files
changed, 557 insertions(+), 158 deletions(-) diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java index 1f59bab..4773ef1 100644 --- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java @@ -31,6 +31,7 @@ import java.io.IOException; import java.io.UnsupportedEncodingException; import java.net.URLEncoder; import java.nio.charset.StandardCharsets; +import java.util.Arrays; import java.util.List; public class KeyGenUtils { @@ -41,6 +42,32 @@ public class KeyGenUtils { protected static final String DEFAULT_PARTITION_PATH = "default"; protected static final String DEFAULT_PARTITION_PATH_SEPARATOR = "/"; + /** + * Extracts the record key fields in strings out of the given record key, + * this is the reverse operation of {@link #getRecordKey(GenericRecord, String)}. 
+ * + * @see SimpleAvroKeyGenerator + * @see org.apache.hudi.keygen.ComplexAvroKeyGenerator + */ + public static String[] extractRecordKeys(String recordKey) { +String[] fieldKV = recordKey.split(","); +if (fieldKV.length == 1) { + return fieldKV; +} else { + // a complex key + return Arrays.stream(fieldKV).map(kv -> { +final String[] kvArray = kv.split(":"); +if (kvArray[1].equals(NULL_RECORDKEY_PLACEHOLDER)) { + return null; +} else if (kvArray[1].equals(EMPTY_RECORDKEY_PLACEHOLDER)) { + return ""; +} else { + return kvArray[1]; +} + }).toArray(String[]::new); +} + } + public static String getRecordKey(GenericRecord record, List recordKeyFields) { boolean keyIsNullEmpty = true; StringBuilder recordKey = new StringBuilder(); diff --git a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java index de4d8ca..6a6bced 100644 --- a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java +++ b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java @@ -429,8 +429,8 @@ public class HoodieFlinkWriteClient extends HoodieFlinkTable table = getHoodieTable(); String commitType = CommitUtils.getCommitActionType(HoodieTableType.valueOf(tableType)); HoodieActiveTimeline activeTimeline = table.getMetaClient().getActiveTimeline(); -activeTimeline.deletePending(HoodieInstant.State.INFLIGHT, commitType, instant); -
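The new extractRecordKeys helper in the diff above can be exercised standalone. Below is a self-contained copy of that logic; the values of the two placeholder constants are assumptions here, not shown in the diff.

```java
import java.util.Arrays;

// Standalone copy of the extractRecordKeys logic from the commit above.
// The placeholder constant values are assumed for illustration.
public class ExtractKeysDemo {
    static final String NULL_RECORDKEY_PLACEHOLDER = "__null__";
    static final String EMPTY_RECORDKEY_PLACEHOLDER = "__empty__";

    public static String[] extractRecordKeys(String recordKey) {
        String[] fieldKV = recordKey.split(",");
        if (fieldKV.length == 1) {
            // simple key: returned as-is
            return fieldKV;
        }
        // complex key: "field1:v1,field2:v2" -> ["v1", "v2"]
        return Arrays.stream(fieldKV).map(kv -> {
            String[] kvArray = kv.split(":");
            if (kvArray[1].equals(NULL_RECORDKEY_PLACEHOLDER)) {
                return null;
            } else if (kvArray[1].equals(EMPTY_RECORDKEY_PLACEHOLDER)) {
                return "";
            }
            return kvArray[1];
        }).toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(extractRecordKeys("id:1,name:abc"))); // [1, abc]
        System.out.println(Arrays.toString(extractRecordKeys("key1"))); // [key1]
    }
}
```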
[GitHub] [hudi] yanghua merged pull request #2742: [HUDI-1738] Emit deletes for flink MOR table streaming read
yanghua merged pull request #2742: URL: https://github.com/apache/hudi/pull/2742
[GitHub] [hudi] xiarixiaoyao commented on pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE
xiarixiaoyao commented on pull request #2722: URL: https://github.com/apache/hudi/pull/2722#issuecomment-811705289 Thanks @garyli1019. OK, I will try to add a unit test.
[GitHub] [hudi] yanghua commented on pull request #2747: [HUDI-1743] Added support for SqlFileBasedTransformer
yanghua commented on pull request #2747: URL: https://github.com/apache/hudi/pull/2747#issuecomment-811679455 > @yanghua - I've fixed the build, can you please merge this code? Thanks. IMO, can you add a unit test for the feature?
[GitHub] [hudi] xurunbai commented on pull request #2755: Add HoodieFlinkClient InsertOverwrite
xurunbai commented on pull request #2755: URL: https://github.com/apache/hudi/pull/2755#issuecomment-811678698 Add HoodieFlinkClient InsertOverwrite
[GitHub] [hudi] xurunbai opened a new pull request #2755: Add HoodieFlinkClient InsertOverwrite
xurunbai opened a new pull request #2755: URL: https://github.com/apache/hudi/pull/2755 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.