[GitHub] [hudi] n3nash commented on issue #2637: [SUPPORT] - Partial Update : update few columns of a table

2021-04-01 Thread GitBox


n3nash commented on issue #2637:
URL: https://github.com/apache/hudi/issues/2637#issuecomment-812335639


   @Sugamber If your issue is addressed, please close this issue. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash closed issue #2637: [SUPPORT] - Partial Update : update few columns of a table

2021-04-01 Thread GitBox


n3nash closed issue #2637:
URL: https://github.com/apache/hudi/issues/2637


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on issue #2623: org.apache.hudi.exception.HoodieDependentSystemUnavailableException:System HBASE unavailable.

2021-04-01 Thread GitBox


n3nash commented on issue #2623:
URL: https://github.com/apache/hudi/issues/2623#issuecomment-812334675


   @root18039532923 I'm not sure if you're asking how to package this, but I'm leaving a 
comment here in case that's what you are referring to.
   
   1. Check out the 0.7.0 tag: `git checkout tags/<0.7.0-release-tag>`.
   2. Apply the change from 
https://github.com/apache/hudi/commit/657e73f9b157a1ed4d6e6e515ae8f63156061b88 
onto your checked-out tag (for example, by cherry-picking that commit).
   3. Rebuild the packages using `mvn clean package -DskipTests`.
   
   Now use these new packages and test out your deployment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on pull request #2751: [HUDI-1748] Read operation will possiblity fail on mor table rt view when a write operations is concurrency running

2021-04-01 Thread GitBox


n3nash commented on pull request #2751:
URL: https://github.com/apache/hudi/pull/2751#issuecomment-812329664


   @li36909 If I understand this correctly, you are saying that reading a MOR table's 
`_rt` view can fail if a concurrent write is happening?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2758: [HUDI-1696] add apache commons-codec dependency to flink-bundle explicitly

2021-04-01 Thread GitBox


codecov-io edited a comment on pull request #2758:
URL: https://github.com/apache/hudi/pull/2758#issuecomment-812325694


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2758?src=pr=h1) Report
   > Merging 
[#2758](https://codecov.io/gh/apache/hudi/pull/2758?src=pr=desc) (ea0d401) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/fe16d0de7c76105775c887b700751241bc82624c?el=desc)
 (fe16d0d) will **increase** coverage by `0.25%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2758/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2758?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2758      +/-   ##
   ============================================
   + Coverage     52.04%   52.30%   +0.25%
   - Complexity     3625     3688      +63
   ============================================
     Files           479      483       +4
     Lines         22804    23095     +291
     Branches       2415     2460      +45
   ============================================
   + Hits          11868    12079     +211
   - Misses         9911     9946      +35
   - Partials       1025     1070      +45
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.78% <ø> (-0.14%)` | `0.00 <ø> (ø)` | |
   | hudiflink | `56.71% <ø> (+0.69%)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `71.33% <ø> (+0.45%)` | `0.00 <ø> (ø)` | |
   | hudisync | `45.47% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.69% <ø> (-0.09%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2758?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...ache/hudi/sink/compact/CompactionPlanOperator.java](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2NvbXBhY3QvQ29tcGFjdGlvblBsYW5PcGVyYXRvci5qYXZh)
 | `58.00% <0.00%> (-21.07%)` | `8.00% <0.00%> (-1.00%)` | |
   | 
[.../main/scala/org/apache/hudi/HoodieSparkUtils.scala](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVNwYXJrVXRpbHMuc2NhbGE=)
 | `83.33% <0.00%> (-5.56%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...src/main/scala/org/apache/hudi/DefaultSource.scala](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0RlZmF1bHRTb3VyY2Uuc2NhbGE=)
 | `78.78% <0.00%> (-5.36%)` | `31.00% <0.00%> (+14.00%)` | :arrow_down: |
   | 
[...main/scala/org/apache/hudi/HoodieWriterUtils.scala](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVdyaXRlclV0aWxzLnNjYWxh)
 | `83.33% <0.00%> (-5.24%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...di/common/table/timeline/HoodieActiveTimeline.java](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUFjdGl2ZVRpbWVsaW5lLmphdmE=)
 | `66.81% <0.00%> (-3.97%)` | `43.00% <0.00%> (ø%)` | |
   | 
[...apache/hudi/sink/compact/CompactionCommitSink.java](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2NvbXBhY3QvQ29tcGFjdGlvbkNvbW1pdFNpbmsuamF2YQ==)
 | `66.66% <0.00%> (-3.55%)` | `10.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/common/table/HoodieTableConfig.java](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlQ29uZmlnLmphdmE=)
 | `43.20% <0.00%> (-2.25%)` | `17.00% <0.00%> (ø%)` | |
   | 
[...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh)
 | `66.66% <0.00%> (-1.65%)` | `43.00% <0.00%> (ø%)` | |
   | 
[...c/main/java/org/apache/hudi/common/fs/FSUtils.java](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0ZTVXRpbHMuamF2YQ==)
 | `47.34% <0.00%> (-0.94%)` | `57.00% <0.00%> (ø%)` | |
   | 

[GitHub] [hudi] codecov-io commented on pull request #2758: [HUDI-1696] add apache commons-codec dependency to flink-bundle explicitly

2021-04-01 Thread GitBox


codecov-io commented on pull request #2758:
URL: https://github.com/apache/hudi/pull/2758#issuecomment-812325694


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2758?src=pr=h1) Report
   > Merging 
[#2758](https://codecov.io/gh/apache/hudi/pull/2758?src=pr=desc) (ea0d401) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/fe16d0de7c76105775c887b700751241bc82624c?el=desc)
 (fe16d0d) will **increase** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2758/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2758?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2758      +/-   ##
   ============================================
   + Coverage     52.04%   52.06%   +0.01%
   - Complexity     3625     3626       +1
   ============================================
     Files           479      477       -2
     Lines         22804    22646     -158
     Branches       2415     2436      +21
   ============================================
   - Hits          11868    11790      -78
   + Misses         9911     9805     -106
   - Partials       1025     1051      +26
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.78% <ø> (-0.14%)` | `0.00 <ø> (ø)` | |
   | hudiflink | `56.71% <ø> (+0.69%)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `71.33% <ø> (+0.45%)` | `0.00 <ø> (ø)` | |
   | hudisync | `45.47% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.69% <ø> (-0.09%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2758?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...ache/hudi/sink/compact/CompactionPlanOperator.java](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2NvbXBhY3QvQ29tcGFjdGlvblBsYW5PcGVyYXRvci5qYXZh)
 | `58.00% <0.00%> (-21.07%)` | `8.00% <0.00%> (-1.00%)` | |
   | 
[.../main/scala/org/apache/hudi/HoodieSparkUtils.scala](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVNwYXJrVXRpbHMuc2NhbGE=)
 | `83.33% <0.00%> (-5.56%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...src/main/scala/org/apache/hudi/DefaultSource.scala](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0RlZmF1bHRTb3VyY2Uuc2NhbGE=)
 | `78.78% <0.00%> (-5.36%)` | `31.00% <0.00%> (+14.00%)` | :arrow_down: |
   | 
[...main/scala/org/apache/hudi/HoodieWriterUtils.scala](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVdyaXRlclV0aWxzLnNjYWxh)
 | `83.33% <0.00%> (-5.24%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...di/common/table/timeline/HoodieActiveTimeline.java](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUFjdGl2ZVRpbWVsaW5lLmphdmE=)
 | `66.81% <0.00%> (-3.97%)` | `43.00% <0.00%> (ø%)` | |
   | 
[...apache/hudi/sink/compact/CompactionCommitSink.java](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2NvbXBhY3QvQ29tcGFjdGlvbkNvbW1pdFNpbmsuamF2YQ==)
 | `66.66% <0.00%> (-3.55%)` | `10.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/common/table/HoodieTableConfig.java](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlQ29uZmlnLmphdmE=)
 | `43.20% <0.00%> (-2.25%)` | `17.00% <0.00%> (ø%)` | |
   | 
[...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh)
 | `66.66% <0.00%> (-1.65%)` | `43.00% <0.00%> (ø%)` | |
   | 
[...c/main/java/org/apache/hudi/common/fs/FSUtils.java](https://codecov.io/gh/apache/hudi/pull/2758/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0ZTVXRpbHMuamF2YQ==)
 | `47.34% <0.00%> (-0.94%)` | `57.00% <0.00%> (ø%)` | |
   | 

[jira] [Comment Edited] (HUDI-1696) artifactSet of maven-shade-plugin has not commons-codec

2021-04-01 Thread Harshit Mittal (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313551#comment-17313551
 ] 

Harshit Mittal edited comment on HUDI-1696 at 4/2/21, 4:07 AM:
---

I started looking into this and realized that the fix is merged into master as 
part of HUDI-1540 for spark-bundle. Should I go ahead and patch the 0.7.0 
release as well or leave it be?
CC: [~shivnarayan] [~vinoth]


was (Author: hmittal83):
I started looking into this and realized that the fix is merged into master for 
spark-bundle. Should I go ahead and patch the 0.7.0 release as well or leave it 
be?

> artifactSet of maven-shade-plugin has not commons-codec
> ---
>
> Key: HUDI-1696
> URL: https://issues.apache.org/jira/browse/HUDI-1696
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.7.0
> Environment: spark2.4.4
> scala2.11.8
> centos7
>Reporter: peng-xin
>Priority: Critical
>  Labels: pull-request-available, sev:high, user-support-issues
> Fix For: 0.7.0
>
> Attachments: image-2021-03-16-18-20-16-477.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When I use the HBase index, it causes an error like the one below:
> !image-2021-03-16-18-20-16-477.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1696) artifactSet of maven-shade-plugin has not commons-codec

2021-04-01 Thread Harshit Mittal (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313551#comment-17313551
 ] 

Harshit Mittal commented on HUDI-1696:
--

I started looking into this and realized that the fix is merged into master for 
spark-bundle. Should I go ahead and patch the 0.7.0 release as well or leave it 
be?

> artifactSet of maven-shade-plugin has not commons-codec
> ---
>
> Key: HUDI-1696
> URL: https://issues.apache.org/jira/browse/HUDI-1696
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.7.0
> Environment: spark2.4.4
> scala2.11.8
> centos7
>Reporter: peng-xin
>Priority: Critical
>  Labels: pull-request-available, sev:high, user-support-issues
> Fix For: 0.7.0
>
> Attachments: image-2021-03-16-18-20-16-477.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When I use the HBase index, it causes an error like the one below:
> !image-2021-03-16-18-20-16-477.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hmit opened a new pull request #2758: [HUDI-1696] add apache commons-codec dependency to flink-bundle explicitly

2021-04-01 Thread GitBox


hmit opened a new pull request #2758:
URL: https://github.com/apache/hudi/pull/2758


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   This causes the shaded code from apache commons codec to be explicitly 
pulled into the uber jar
   
   ## Brief change log
   
   ## Verify this pull request
   Verified that the built uber jar contains the libraries at the expected path, and also 
verified that they are missing from the released jar.
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1737) Extract common method in HoodieCreateHandle & FlinkCreateHandle

2021-04-01 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-1737:
---
Fix Version/s: 0.9.0

> Extract common method in HoodieCreateHandle & FlinkCreateHandle
> ---
>
> Key: HUDI-1737
> URL: https://issues.apache.org/jira/browse/HUDI-1737
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Roc Marshal
>Assignee: Roc Marshal
>Priority: Major
>  Labels: hudi-client, pull-request-available
> Fix For: 0.9.0
>
>
> {code:java}
> // HoodieCreateHandle.java
> //...
> @Override
> public List<WriteStatus> close() {
>   LOG.info("Closing the file " + writeStatus.getFileId() + " as we are done 
> with all the records " + recordsWritten);
>   try {
> fileWriter.close();
> HoodieWriteStat stat = new HoodieWriteStat();
> stat.setPartitionPath(writeStatus.getPartitionPath());
> stat.setNumWrites(recordsWritten);
> stat.setNumDeletes(recordsDeleted);
> stat.setNumInserts(insertRecordsWritten);
> stat.setPrevCommit(HoodieWriteStat.NULL_COMMIT);
> stat.setFileId(writeStatus.getFileId());
> stat.setPath(new Path(config.getBasePath()), path);
> long fileSizeInBytes = FSUtils.getFileSize(fs, path);
> stat.setTotalWriteBytes(fileSizeInBytes);
> stat.setFileSizeInBytes(fileSizeInBytes);
> stat.setTotalWriteErrors(writeStatus.getTotalErrorRecords());
> RuntimeStats runtimeStats = new RuntimeStats();
> runtimeStats.setTotalCreateTime(timer.endTimer());
> stat.setRuntimeStats(runtimeStats);
> writeStatus.setStat(stat);
> LOG.info(String.format("CreateHandle for partitionPath %s fileID %s, took 
> %d ms.", stat.getPartitionPath(),
> stat.getFileId(), runtimeStats.getTotalCreateTime()));
> return Collections.singletonList(writeStatus);
>   } catch (IOException e) {
> throw new HoodieInsertException("Failed to close the Insert Handle for 
> path " + path, e);
>   }
> }
> //FlinkCreateHandle.java
> private void setUpWriteStatus() throws IOException {
>   long fileSizeInBytes = fileWriter.getBytesWritten();
>   long incFileSizeInBytes = fileSizeInBytes - lastFileSize;
>   this.lastFileSize = fileSizeInBytes;
>   HoodieWriteStat stat = new HoodieWriteStat();
>   stat.setPartitionPath(writeStatus.getPartitionPath());
>   stat.setNumWrites(recordsWritten);
>   stat.setNumDeletes(recordsDeleted);
>   stat.setNumInserts(insertRecordsWritten);
>   stat.setPrevCommit(HoodieWriteStat.NULL_COMMIT);
>   stat.setFileId(writeStatus.getFileId());
>   stat.setPath(new Path(config.getBasePath()), path);
>   stat.setTotalWriteBytes(incFileSizeInBytes);
>   stat.setFileSizeInBytes(fileSizeInBytes);
>   stat.setTotalWriteErrors(writeStatus.getTotalErrorRecords());
>   HoodieWriteStat.RuntimeStats runtimeStats = new 
> HoodieWriteStat.RuntimeStats();
>   runtimeStats.setTotalCreateTime(timer.endTimer());
>   stat.setRuntimeStats(runtimeStats);
>   writeStatus.setStat(stat);
> }{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1737) Extract common method in HoodieCreateHandle & FlinkCreateHandle

2021-04-01 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-1737.
--
Resolution: Done

94a5e72f16620da86bd2ca7ef27bb9abc266cc47

> Extract common method in HoodieCreateHandle & FlinkCreateHandle
> ---
>
> Key: HUDI-1737
> URL: https://issues.apache.org/jira/browse/HUDI-1737
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: Roc Marshal
>Assignee: Roc Marshal
>Priority: Major
>  Labels: hudi-client, pull-request-available
> Fix For: 0.9.0
>
>
> {code:java}
> // HoodieCreateHandle.java
> //...
> @Override
> public List<WriteStatus> close() {
>   LOG.info("Closing the file " + writeStatus.getFileId() + " as we are done 
> with all the records " + recordsWritten);
>   try {
> fileWriter.close();
> HoodieWriteStat stat = new HoodieWriteStat();
> stat.setPartitionPath(writeStatus.getPartitionPath());
> stat.setNumWrites(recordsWritten);
> stat.setNumDeletes(recordsDeleted);
> stat.setNumInserts(insertRecordsWritten);
> stat.setPrevCommit(HoodieWriteStat.NULL_COMMIT);
> stat.setFileId(writeStatus.getFileId());
> stat.setPath(new Path(config.getBasePath()), path);
> long fileSizeInBytes = FSUtils.getFileSize(fs, path);
> stat.setTotalWriteBytes(fileSizeInBytes);
> stat.setFileSizeInBytes(fileSizeInBytes);
> stat.setTotalWriteErrors(writeStatus.getTotalErrorRecords());
> RuntimeStats runtimeStats = new RuntimeStats();
> runtimeStats.setTotalCreateTime(timer.endTimer());
> stat.setRuntimeStats(runtimeStats);
> writeStatus.setStat(stat);
> LOG.info(String.format("CreateHandle for partitionPath %s fileID %s, took 
> %d ms.", stat.getPartitionPath(),
> stat.getFileId(), runtimeStats.getTotalCreateTime()));
> return Collections.singletonList(writeStatus);
>   } catch (IOException e) {
> throw new HoodieInsertException("Failed to close the Insert Handle for 
> path " + path, e);
>   }
> }
> //FlinkCreateHandle.java
> private void setUpWriteStatus() throws IOException {
>   long fileSizeInBytes = fileWriter.getBytesWritten();
>   long incFileSizeInBytes = fileSizeInBytes - lastFileSize;
>   this.lastFileSize = fileSizeInBytes;
>   HoodieWriteStat stat = new HoodieWriteStat();
>   stat.setPartitionPath(writeStatus.getPartitionPath());
>   stat.setNumWrites(recordsWritten);
>   stat.setNumDeletes(recordsDeleted);
>   stat.setNumInserts(insertRecordsWritten);
>   stat.setPrevCommit(HoodieWriteStat.NULL_COMMIT);
>   stat.setFileId(writeStatus.getFileId());
>   stat.setPath(new Path(config.getBasePath()), path);
>   stat.setTotalWriteBytes(incFileSizeInBytes);
>   stat.setFileSizeInBytes(fileSizeInBytes);
>   stat.setTotalWriteErrors(writeStatus.getTotalErrorRecords());
>   HoodieWriteStat.RuntimeStats runtimeStats = new 
> HoodieWriteStat.RuntimeStats();
>   runtimeStats.setTotalCreateTime(timer.endTimer());
>   stat.setRuntimeStats(runtimeStats);
>   writeStatus.setStat(stat);
> }{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] yanghua merged pull request #2745: [HUDI-1737][hudi-client] Code Cleanup: Extract common method in HoodieCreateHandle & FlinkCreateHandle

2021-04-01 Thread GitBox


yanghua merged pull request #2745:
URL: https://github.com/apache/hudi/pull/2745


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated: [HUDI-1737][hudi-client] Code Cleanup: Extract common method in HoodieCreateHandle & FlinkCreateHandle (#2745)

2021-04-01 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 94a5e72  [HUDI-1737][hudi-client] Code Cleanup: Extract common method 
in HoodieCreateHandle & FlinkCreateHandle (#2745)
94a5e72 is described below

commit 94a5e72f16620da86bd2ca7ef27bb9abc266cc47
Author: Roc Marshal <64569824+rocmars...@users.noreply.github.com>
AuthorDate: Fri Apr 2 11:39:05 2021 +0800

[HUDI-1737][hudi-client] Code Cleanup: Extract common method in 
HoodieCreateHandle & FlinkCreateHandle (#2745)
---
 .../org/apache/hudi/io/HoodieCreateHandle.java | 56 ++
 .../java/org/apache/hudi/io/FlinkCreateHandle.java | 36 +-
 2 files changed, 48 insertions(+), 44 deletions(-)

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java
index 357cf1b..6fa9b56 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java
@@ -179,29 +179,47 @@ public class HoodieCreateHandle extends
 
   fileWriter.close();
 
-  HoodieWriteStat stat = new HoodieWriteStat();
-  stat.setPartitionPath(writeStatus.getPartitionPath());
-  stat.setNumWrites(recordsWritten);
-  stat.setNumDeletes(recordsDeleted);
-  stat.setNumInserts(insertRecordsWritten);
-  stat.setPrevCommit(HoodieWriteStat.NULL_COMMIT);
-  stat.setFileId(writeStatus.getFileId());
-  stat.setPath(new Path(config.getBasePath()), path);
-  long fileSizeInBytes = FSUtils.getFileSize(fs, path);
-  stat.setTotalWriteBytes(fileSizeInBytes);
-  stat.setFileSizeInBytes(fileSizeInBytes);
-  stat.setTotalWriteErrors(writeStatus.getTotalErrorRecords());
-  RuntimeStats runtimeStats = new RuntimeStats();
-  runtimeStats.setTotalCreateTime(timer.endTimer());
-  stat.setRuntimeStats(runtimeStats);
-  writeStatus.setStat(stat);
-
-  LOG.info(String.format("CreateHandle for partitionPath %s fileID %s, 
took %d ms.", stat.getPartitionPath(),
-  stat.getFileId(), runtimeStats.getTotalCreateTime()));
+  setupWriteStatus();
+
+  LOG.info(String.format("CreateHandle for partitionPath %s fileID %s, 
took %d ms.",
+  writeStatus.getStat().getPartitionPath(), 
writeStatus.getStat().getFileId(),
+  writeStatus.getStat().getRuntimeStats().getTotalCreateTime()));
 
   return Collections.singletonList(writeStatus);
 } catch (IOException e) {
   throw new HoodieInsertException("Failed to close the Insert Handle for 
path " + path, e);
 }
   }
+
+  /**
+   * Set up the write status.
+   *
+   * @throws IOException if error occurs
+   */
+  protected void setupWriteStatus() throws IOException {
+HoodieWriteStat stat = new HoodieWriteStat();
+stat.setPartitionPath(writeStatus.getPartitionPath());
+stat.setNumWrites(recordsWritten);
+stat.setNumDeletes(recordsDeleted);
+stat.setNumInserts(insertRecordsWritten);
+stat.setPrevCommit(HoodieWriteStat.NULL_COMMIT);
+stat.setFileId(writeStatus.getFileId());
+stat.setPath(new Path(config.getBasePath()), path);
+stat.setTotalWriteBytes(computeTotalWriteBytes());
+stat.setFileSizeInBytes(computeFileSizeInBytes());
+stat.setTotalWriteErrors(writeStatus.getTotalErrorRecords());
+RuntimeStats runtimeStats = new RuntimeStats();
+runtimeStats.setTotalCreateTime(timer.endTimer());
+stat.setRuntimeStats(runtimeStats);
+writeStatus.setStat(stat);
+  }
+
+  protected long computeTotalWriteBytes() throws IOException {
+return FSUtils.getFileSize(fs, path);
+  }
+
+  protected long computeFileSizeInBytes() throws IOException {
+return FSUtils.getFileSize(fs, path);
+  }
+
 }
diff --git 
a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/FlinkCreateHandle.java
 
b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/FlinkCreateHandle.java
index 6f4638e..2abefa9 100644
--- 
a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/FlinkCreateHandle.java
+++ 
b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/FlinkCreateHandle.java
@@ -22,7 +22,6 @@ import org.apache.hudi.client.WriteStatus;
 import org.apache.hudi.common.engine.TaskContextSupplier;
 import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecordPayload;
-import org.apache.hudi.common.model.HoodieWriteStat;
 import org.apache.hudi.common.util.HoodieTimer;
 import org.apache.hudi.common.util.collection.Pair;
 import org.apache.hudi.config.HoodieWriteConfig;
@@ -30,7 +29,6 @@ import org.apache.hudi.exception.HoodieInsertException;
 import org.apache.hudi.table.HoodieTable;
 
 

[GitHub] [hudi] RocMarshal commented on pull request #2745: [HUDI-1737][hudi-client] Code Cleanup: Extract common method in HoodieCreateHandle & FlinkCreateHandle

2021-04-01 Thread GitBox


RocMarshal commented on pull request #2745:
URL: https://github.com/apache/hudi/pull/2745#issuecomment-812296838


   @danny0405 Thank you so much.
   Now, ping @yanghua 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2757: [HUDI-1757] Assigns the buckets by record key for Flink writer

2021-04-01 Thread GitBox


codecov-io edited a comment on pull request #2757:
URL: https://github.com/apache/hudi/pull/2757#issuecomment-812247500


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=h1) Report
   > Merging 
[#2757](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=desc) (2c8fc00) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/9804662bc8e17d6936c20326f17ec7c0360dcaf6?el=desc)
 (9804662) will **decrease** coverage by `42.74%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2757/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=tree)
   
   ```diff
   @@             Coverage Diff               @@
   ##             master    #2757       +/-   ##
   =============================================
   - Coverage     52.12%    9.38%   -42.75%
   + Complexity     3646       48     -3598
   =============================================
     Files           480       54      -426
     Lines         22867     1993    -20874
     Branches       2417      236     -2181
   =============================================
   - Hits          11920      187    -11733
   + Misses         9916     1793     -8123
   + Partials       1031       13     -1018
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.38% <ø> (-60.36%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | 
[...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | 
[...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=)
 | `0.00% <0.00%> 

[GitHub] [hudi] liujinhui1994 edited a comment on pull request #2666: [HUDI-1160] Support update partial fields for CoW table

2021-04-01 Thread GitBox


liujinhui1994 edited a comment on pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#issuecomment-812290646


   Your suggestion is very good; please let me think about it, and then I will reply to you.
   @nsivabalan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] liujinhui1994 commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table

2021-04-01 Thread GitBox


liujinhui1994 commented on pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#issuecomment-812290646


   Your suggestion is very good; please let me think about it.
   @nsivabalan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1752) Add HoodieFlinkClient InsertOverwrite

2021-04-01 Thread xurunbai (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xurunbai updated HUDI-1752:
---
Status: Closed  (was: Patch Available)

> Add HoodieFlinkClient InsertOverwrite
> -
>
> Key: HUDI-1752
> URL: https://issues.apache.org/jira/browse/HUDI-1752
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: CLI, Flink Integration
>Reporter: xurunbai
>Priority: Minor
>  Labels: features
> Fix For: 0.8.0
>
>
> Add HoodieFlinkClient InsertOverwrite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1752) Add HoodieFlinkClient InsertOverwrite

2021-04-01 Thread xurunbai (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xurunbai updated HUDI-1752:
---
Status: Patch Available  (was: In Progress)

> Add HoodieFlinkClient InsertOverwrite
> -
>
> Key: HUDI-1752
> URL: https://issues.apache.org/jira/browse/HUDI-1752
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: CLI, Flink Integration
>Reporter: xurunbai
>Priority: Minor
>  Labels: features
> Fix For: 0.8.0
>
>
> Add HoodieFlinkClient InsertOverwrite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1752) Add HoodieFlinkClient InsertOverwrite

2021-04-01 Thread xurunbai (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xurunbai updated HUDI-1752:
---
Status: In Progress  (was: Open)

> Add HoodieFlinkClient InsertOverwrite
> -
>
> Key: HUDI-1752
> URL: https://issues.apache.org/jira/browse/HUDI-1752
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: CLI, Flink Integration
>Reporter: xurunbai
>Priority: Minor
>  Labels: features
> Fix For: 0.8.0
>
>
> Add HoodieFlinkClient InsertOverwrite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] li36909 commented on pull request #2753: [HUDI-1750] Fail to load user's class if user move hudi-spark-bundle jar into spark classpath

2021-04-01 Thread GitBox


li36909 commented on pull request #2753:
URL: https://github.com/apache/hudi/pull/2753#issuecomment-812274605


   cc @nsivabalan, could you please take a look? Thank you.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] li36909 commented on pull request #2751: [HUDI-1748] Read operation will possiblity fail on mor table rt view when a write operations is concurrency running

2021-04-01 Thread GitBox


li36909 commented on pull request #2751:
URL: https://github.com/apache/hudi/pull/2751#issuecomment-812271790


   @nsivabalan I ran the test on Hudi 0.7. Yes, you are right: I start a spark-shell for 
upserting and query the same table through the Spark datasource API, and then the 
problem arises. The cause of the problem is clear: during the query, Hudi gets the 
partitions in MergeOnReadSnapshotRelation and builds a new fsview in 
HoodieRealtimeInputFormatUtils.groupLogsByBaseFile; when a write operation is 
happening, HoodieRealtimeInputFormatUtils.groupLogsByBaseFile will find some 
new base files.
   We can reproduce this issue by adding a Thread.sleep(6) in 
HoodieRealtimeInputFormatUtils.groupLogsByBaseFile and then running this test:
   Step 1: write the first batch into a Hudi table. Start spark-shell and run:
   
   import org.apache.hudi.QuickstartUtils._
   import scala.collection.JavaConversions._
   import org.apache.spark.sql.SaveMode._
   import org.apache.hudi.DataSourceReadOptions._
   import org.apache.hudi.DataSourceWriteOptions._
   import org.apache.hudi.config.HoodieWriteConfig._
   
   
   val tableName = "hudi_mor_table"
   val basePath = "hdfs://hacluster/tmp/hudi_mor_table"
   val dataGen = new DataGenerator
   val inserts = convertToStringList(dataGen.generateInserts(10))
   val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
   df.write.format("org.apache.hudi").
   option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
   option("hoodie.datasource.write.operation", "bulk_insert").
   options(getQuickstartWriteConfigs).
   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
   option("hoodie.datasource.write.hive_style_partitioning", "false").
   option(TABLE_NAME, tableName).
   option("hoodie.datasource.hive_sync.enable", "true").
   option("hoodie.datasource.hive_sync.use_jdbc", "false").
   option("hoodie.datasource.hive_sync.table", "hudi_mor_test").
   option("hoodie.datasource.hive_sync.partition_extractor_class", 
"org.apache.hudi.hive.MultiPartKeysValueExtractor").
   option("hoodie.datasource.hive_sync.partition_fields", 
"continent,country,city").
   option("hoodie.datasource.hive_sync.assume_date_partitioning", "false").
   option("hoodie.insert.shuffle.parallelism", "2").
   option("hoodie.upsert.shuffle.parallelism","2").
   option("hoodie.bulkinsert.shuffle.parallelism", "2").
   option("hoodie.delete.shuffle.parallelism","2").
   mode(Append).
   save(basePath);
   
   Step 2: run a query in a new spark-shell (while the query hangs at Thread.sleep, 
start writing a new batch in step 3):
   
spark.read.format("hudi").load("hdfs://hacluster/tmp/hudi_mor_table/*/*/*/*").count
   
   Step 3: go back to the spark-shell from step 1 and write a new batch:
   df.write.format("org.apache.hudi").
   option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
   option("hoodie.datasource.write.operation", "bulk_insert").
   options(getQuickstartWriteConfigs).
   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
   option("hoodie.datasource.write.hive_style_partitioning", "false").
   option(TABLE_NAME, tableName).
   option("hoodie.datasource.hive_sync.enable", "true").
   option("hoodie.datasource.hive_sync.use_jdbc", "false").
   option("hoodie.datasource.hive_sync.table", "hudi_mor_test").
   option("hoodie.datasource.hive_sync.partition_extractor_class", 
"org.apache.hudi.hive.MultiPartKeysValueExtractor").
   option("hoodie.datasource.hive_sync.partition_fields", 
"continent,country,city").
   option("hoodie.datasource.hive_sync.assume_date_partitioning", "false").
   option("hoodie.insert.shuffle.parallelism", "2").
   option("hoodie.upsert.shuffle.parallelism","2").
   option("hoodie.bulkinsert.shuffle.parallelism", "2").
   option("hoodie.delete.shuffle.parallelism","2").
   mode(Append).
   save(basePath);
   
   We can see that step 2 will throw an exception.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] root18039532923 commented on issue #2623: org.apache.hudi.exception.HoodieDependentSystemUnavailableException:System HBASE unavailable.

2021-04-01 Thread GitBox


root18039532923 commented on issue #2623:
URL: https://github.com/apache/hudi/issues/2623#issuecomment-812271600


   How do I carry out these steps? @nsivabalan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bvaradar commented on issue #2588: [SUPPORT] Cannot create hive connection

2021-04-01 Thread GitBox


bvaradar commented on issue #2588:
URL: https://github.com/apache/hudi/issues/2588#issuecomment-812256475


   We use hive-2.3.4 and have more than 100 jobs running  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bvaradar closed issue #2644: Hudi cow table incremental data error

2021-04-01 Thread GitBox


bvaradar closed issue #2644:
URL: https://github.com/apache/hudi/issues/2644


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io commented on pull request #2757: [HUDI-1757] Assigns the buckets by record key for Flink writer

2021-04-01 Thread GitBox


codecov-io commented on pull request #2757:
URL: https://github.com/apache/hudi/pull/2757#issuecomment-812247500


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=h1) Report
   > Merging 
[#2757](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=desc) (b9e79e1) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/9804662bc8e17d6936c20326f17ec7c0360dcaf6?el=desc)
 (9804662) will **increase** coverage by `0.24%`.
   > The diff coverage is `96.77%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2757/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2757      +/-   ##
   ============================================
   + Coverage     52.12%   52.37%   +0.24%
   - Complexity     3646     3676      +30
   ============================================
     Files           480      480
     Lines         22867    23010     +143
     Branches       2417     2455      +38
   ============================================
   + Hits          11920    12051     +131
   - Misses         9916     9921       +5
   - Partials       1031     1038       +7
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `∅ <ø> (∅)` | `0.00 <ø> (ø)` | |
   | hudicommon | `50.87% <ø> (+0.01%)` | `0.00 <ø> (ø)` | |
   | hudiflink | `56.96% <96.77%> (+0.25%)` | `0.00 <5.00> (ø)` | |
   | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `70.87% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisync | `45.47% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `70.81% <ø> (+1.08%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2757?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...in/java/org/apache/hudi/table/HoodieTableSink.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZVNpbmsuamF2YQ==)
 | `12.19% <0.00%> (ø)` | `2.00 <0.00> (ø)` | |
   | 
[...va/org/apache/hudi/configuration/FlinkOptions.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9jb25maWd1cmF0aW9uL0ZsaW5rT3B0aW9ucy5qYXZh)
 | `89.07% <100.00%> (+0.18%)` | `11.00 <0.00> (ø)` | |
   | 
[...he/hudi/sink/partitioner/BucketAssignFunction.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbkZ1bmN0aW9uLmphdmE=)
 | `87.00% <100.00%> (+0.40%)` | `24.00 <1.00> (ø)` | |
   | 
[...g/apache/hudi/sink/partitioner/BucketAssigner.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbmVyLmphdmE=)
 | `88.52% <100.00%> (+0.59%)` | `23.00 <2.00> (+2.00)` | |
   | 
[.../apache/hudi/sink/partitioner/BucketAssigners.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL0J1Y2tldEFzc2lnbmVycy5qYXZh)
 | `50.00% <100.00%> (ø)` | `2.00 <0.00> (ø)` | |
   | 
[...di/sink/partitioner/delta/DeltaBucketAssigner.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL3BhcnRpdGlvbmVyL2RlbHRhL0RlbHRhQnVja2V0QXNzaWduZXIuamF2YQ==)
 | `69.23% <100.00%> (+0.80%)` | `10.00 <2.00> (+1.00)` | |
   | 
[...c/main/java/org/apache/hudi/util/StreamerUtil.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS91dGlsL1N0cmVhbWVyVXRpbC5qYXZh)
 | `54.62% <100.00%> (+2.40%)` | `18.00 <0.00> (ø)` | |
   | 
[...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==)
 | `79.68% <0.00%> (+1.56%)` | `26.00% <0.00%> (ø%)` | |
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2757/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `76.45% <0.00%> (+5.07%)` | `82.00% <0.00%> (+27.00%)` | |
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to 

[GitHub] [hudi] danny0405 closed pull request #2757: [HUDI-1757] Assigns the buckets by record key for Flink writer

2021-04-01 Thread GitBox


danny0405 closed pull request #2757:
URL: https://github.com/apache/hudi/pull/2757


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bvaradar commented on issue #2633: Empty File Slice causing application to fail in small files optimization code

2021-04-01 Thread GitBox


bvaradar commented on issue #2633:
URL: https://github.com/apache/hudi/issues/2633#issuecomment-812212286


   @umehrot2 : Looking more closely at the code change to fix this, we need to 
make more changes.
   The base file check needs to change in 
https://github.com/apache/hudi/blob/e599764c2dcfbbc15d6554fa0df55b7375e4a31d/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L96
   
   Otherwise, we need to be careful in determining the current commit time. I 
think that, as a fix, it is safer to disregard files in pending compaction when 
selecting small files, as sketched below.
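   Purely for illustration, here is a minimal, self-contained sketch of that idea. The 
`FileSliceInfo` type and every name in it are stand-ins made up for this sketch (they are 
not Hudi classes); it only shows the shape of skipping file slices under pending compaction 
when picking small files. The real change would live in UpsertDeltaCommitPartitioner and 
use Hudi's own file-system view types instead of these stand-ins.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SmallFileSelectionSketch {

  /** Stand-in for the slice information the partitioner would look at; not a Hudi class. */
  static final class FileSliceInfo {
    final String fileId;
    final long baseFileSizeBytes;      // 0 if the slice has no base file yet
    final boolean inPendingCompaction; // true if a compaction is already scheduled for this file group

    FileSliceInfo(String fileId, long baseFileSizeBytes, boolean inPendingCompaction) {
      this.fileId = fileId;
      this.baseFileSizeBytes = baseFileSizeBytes;
      this.inPendingCompaction = inPendingCompaction;
    }
  }

  /** Pick "small file" candidates while skipping anything under pending compaction. */
  static List<FileSliceInfo> selectSmallFiles(List<FileSliceInfo> latestSlices, long smallFileLimitBytes) {
    return latestSlices.stream()
        .filter(s -> !s.inPendingCompaction)               // avoid racing with the compactor
        .filter(s -> s.baseFileSizeBytes > 0               // the slice must already have a base file
            && s.baseFileSizeBytes < smallFileLimitBytes)  // and be below the small-file threshold
        .collect(Collectors.toList());
  }
}
```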


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bvaradar edited a comment on issue #2633: Empty File Slice causing application to fail in small files optimization code

2021-04-01 Thread GitBox


bvaradar edited a comment on issue #2633:
URL: https://github.com/apache/hudi/issues/2633#issuecomment-810709098


   @umehrot2 @n3nash @nsivabalan : My apologies for the delay; I finally got a chance 
to look into this.
   
   Yes, this will only manifest in the case where the index can support log files. I 
believe this is the problem: we are using the wrong FileSystemView API here:
   
   
https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85
   
   We normally don't include file groups that are in pending compaction, but with the 
HBase index we are including them. With the current state of the code, including files 
in pending compaction is an issue.
   
   The API "getLatestFileSlicesBeforeOrOn" is originally intended to be used 
by CompactionAdminClient to figure out log files that were added after a pending 
compaction and rename them so that we can undo the effects of compaction 
scheduling. There is a different API, "getLatestMergedFileSlicesBeforeOrOn", 
which gives a consolidated view of the latest file slice and includes all data 
both before and after compaction. This is what should be used in
   
   
https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85
   
   The other workaround would be to exclude file slices in pending compaction 
when we select small files, to avoid the interaction between the compactor and 
ingestion in this case. But I think we can go with the first option.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] satishkotha commented on issue #2707: [SUPPORT] insert_ovewrite_table failed on archiving

2021-04-01 Thread GitBox


satishkotha commented on issue #2707:
URL: https://github.com/apache/hudi/issues/2707#issuecomment-812096323


   @ssdong Thanks for the detailed update. Yes, this one seems a little tricky; we'll 
need some more time to investigate. In the meantime, as a workaround, I think 
you can disable the embedded timeline server by setting the option 
'hoodie.embed.timeline.server' to false, as in the sketch below.
   
   I can look into why reset needs to be called during close and whether we can 
refresh the meta client for getting the timeline.
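   For reference, a minimal sketch of applying that workaround from a Spark write in 
Java. The dataset, table name, and base path here are placeholders, not values from this 
thread, and any other write options your job already uses would stay as they are.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class DisableEmbeddedTimelineServer {

  // Writes the given DataFrame to a Hudi table with the embedded timeline server disabled.
  public static void writeWithoutTimelineServer(Dataset<Row> df, String basePath) {
    df.write()
      .format("hudi")
      .option("hoodie.table.name", "my_hudi_table")       // placeholder table name
      .option("hoodie.embed.timeline.server", "false")    // the workaround discussed above
      .mode(SaveMode.Append)
      .save(basePath);
  }
}
```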


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated: [HUDI-1591] Implement Spark's FileIndex for Hudi to support queries via Hudi DataSource using non-globbed table path and partition pruning (#2651)

2021-04-01 Thread uditme
This is an automated email from the ASF dual-hosted git repository.

uditme pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 684622c  [HUDI-1591] Implement Spark's FileIndex for Hudi to support 
queries via Hudi DataSource using non-globbed table path and partition pruning 
(#2651)
684622c is described below

commit 684622c7c9fa6df9eb177b51cb1e7bd6dd16f78d
Author: pengzhiwei 
AuthorDate: Fri Apr 2 02:12:28 2021 +0800

[HUDI-1591] Implement Spark's FileIndex for Hudi to support queries via 
Hudi DataSource using non-globbed table path and partition pruning (#2651)
---
 .../apache/hudi/keygen/CustomAvroKeyGenerator.java |   6 +-
 .../org/apache/hudi/keygen/CustomKeyGenerator.java |   2 +-
 .../datasources/SparkParsePartitionUtil.scala  |  34 ++
 .../java/org/apache/hudi/common/fs/FSUtils.java|  10 +
 .../hudi/common/table/HoodieTableConfig.java   |  10 +
 .../hudi/common/table/HoodieTableMetaClient.java   |  13 +
 .../scala/org/apache/hudi/DataSourceOptions.scala  |   3 +
 .../main/scala/org/apache/hudi/DefaultSource.scala | 103 +++---
 .../org/apache/hudi/HoodieBootstrapRelation.scala  |  18 +-
 .../scala/org/apache/hudi/HoodieFileIndex.scala| 362 +
 .../org/apache/hudi/HoodieSparkSqlWriter.scala |   8 +-
 .../scala/org/apache/hudi/HoodieSparkUtils.scala   |  12 +-
 .../scala/org/apache/hudi/HoodieWriterUtils.scala  |  31 +-
 .../hudi/MergeOnReadIncrementalRelation.scala  |   3 +-
 .../apache/hudi/MergeOnReadSnapshotRelation.scala  |  70 ++--
 .../org/apache/hudi/TestHoodieFileIndex.scala  | 252 ++
 .../apache/hudi/functional/TestCOWDataSource.scala |  52 ++-
 .../functional/TestDataSourceForBootstrap.scala|  39 ++-
 .../apache/hudi/functional/TestMORDataSource.scala |  52 +++
 .../datasources/Spark2ParsePartitionUtil.scala |  33 ++
 .../datasources/Spark3ParsePartitionUtil.scala |  39 +++
 .../hudi/utilities/deltastreamer/DeltaSync.java|   6 +
 22 files changed, 1075 insertions(+), 83 deletions(-)

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/CustomAvroKeyGenerator.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/CustomAvroKeyGenerator.java
index 724cabd..3b927c9 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/CustomAvroKeyGenerator.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/CustomAvroKeyGenerator.java
@@ -44,7 +44,7 @@ import java.util.stream.Collectors;
 public class CustomAvroKeyGenerator extends BaseKeyGenerator {
 
   private static final String DEFAULT_PARTITION_PATH_SEPARATOR = "/";
-  private static final String SPLIT_REGEX = ":";
+  public static final String SPLIT_REGEX = ":";
 
   /**
* Used as a part of config in CustomKeyGenerator.java.
@@ -117,8 +117,4 @@ public class CustomAvroKeyGenerator extends 
BaseKeyGenerator {
   public String getDefaultPartitionPathSeparator() {
 return DEFAULT_PARTITION_PATH_SEPARATOR;
   }
-
-  public String getSplitRegex() {
-return SPLIT_REGEX;
-  }
 }
diff --git 
a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/CustomKeyGenerator.java
 
b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/CustomKeyGenerator.java
index 77896d2..a2a3012 100644
--- 
a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/CustomKeyGenerator.java
+++ 
b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/CustomKeyGenerator.java
@@ -90,7 +90,7 @@ public class CustomKeyGenerator extends BuiltinKeyGenerator {
   return "";
 }
 for (String field : getPartitionPathFields()) {
-  String[] fieldWithType = 
field.split(customAvroKeyGenerator.getSplitRegex());
+  String[] fieldWithType = field.split(customAvroKeyGenerator.SPLIT_REGEX);
   if (fieldWithType.length != 2) {
 throw new HoodieKeyGeneratorException("Unable to find field names for 
partition path in proper format");
   }
diff --git 
a/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/execution/datasources/SparkParsePartitionUtil.scala
 
b/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/execution/datasources/SparkParsePartitionUtil.scala
new file mode 100644
index 000..fc2275b
--- /dev/null
+++ 
b/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/execution/datasources/SparkParsePartitionUtil.scala
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ 

[GitHub] [hudi] umehrot2 merged pull request #2651: [HUDI-1591] [RFC-26] Improve Hoodie Table Query Performance And Ease Of Use Fo…

2021-04-01 Thread GitBox


umehrot2 merged pull request #2651:
URL: https://github.com/apache/hudi/pull/2651


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan edited a comment on pull request #2751: [HUDI-1748] Read operation will possiblity fail on mor table rt view when a write operations is concurrency running

2021-04-01 Thread GitBox


nsivabalan edited a comment on pull request #2751:
URL: https://github.com/apache/hudi/pull/2751#issuecomment-812037473


   @li36909 : can you clarify what you mean by concurrent operations? Ignoring any 
table services, do you mean that just a batch of writes and an rt snapshot view 
running concurrently is hitting issues? Which Hudi version are you running?   
   @n3nash : you may want to check it out. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #2751: [HUDI-1748] Read operation will possiblity fail on mor table rt view when a write operations is concurrency running

2021-04-01 Thread GitBox


nsivabalan commented on pull request #2751:
URL: https://github.com/apache/hudi/pull/2751#issuecomment-812037473


   @li36909 : can you clarify what you mean by concurrent operations? Is it 
cleaning/archiving running alongside a batch of writes, or two batches of writes 
going concurrently? 
   @n3nash : you may want to check it out. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan edited a comment on pull request #2334: [HUDI-1453] Throw Exception when input data schema is not equal to th…

2021-04-01 Thread GitBox


nsivabalan edited a comment on pull request #2334:
URL: https://github.com/apache/hudi/pull/2334#issuecomment-812021796


   Oh, I didn't realize the default schema compatibility check is false. I assume 
that if I enable the schema compatibility check, double-to-int evolution is likely 
to fail, so I'm not sure what this PR tries to achieve. At least in my local test 
run, integer to double worked for me. 
   https://gist.github.com/nsivabalan/91f12109e0fe1ca9749ff5290c946778
   this is COW btw. haven't tested w/ MOR. 
   @pengzhiwei2018 : can you please clarify. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #2334: [HUDI-1453] Throw Exception when input data schema is not equal to th…

2021-04-01 Thread GitBox


nsivabalan commented on pull request #2334:
URL: https://github.com/apache/hudi/pull/2334#issuecomment-812021796


   Oh, I didn't realize the default schema compatibility check is false. I assume 
that if I enable the schema compatibility check, double-to-int evolution is likely 
to fail, so I'm not sure what this PR tries to achieve. At least in my local test 
run, integer to double worked for me. 
   https://gist.github.com/nsivabalan/91f12109e0fe1ca9749ff5290c946778
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #2334: [HUDI-1453] Throw Exception when input data schema is not equal to th…

2021-04-01 Thread GitBox


nsivabalan commented on pull request #2334:
URL: https://github.com/apache/hudi/pull/2334#issuecomment-812019464


   Just to clarify: some type-only schema evolutions work fine with hoodie. For 
example, integer to double works fine. The problem is double to int. I am not sure 
why the schema compatibility check does not fail this evolution.
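   For illustration, here is a minimal standalone Avro check of both directions. This 
is a hedged sketch assuming Hudi's compatibility check ultimately follows Avro's 
reader/writer promotion rules; it is not Hudi code.
   
   import org.apache.avro.Schema;
   import org.apache.avro.SchemaCompatibility;
   
   public class PromotionCheckSketch {
     public static void main(String[] args) {
       Schema asInt = new Schema.Parser().parse(
           "{\"type\":\"record\",\"name\":\"r\",\"fields\":[{\"name\":\"v\",\"type\":\"int\"}]}");
       Schema asDouble = new Schema.Parser().parse(
           "{\"type\":\"record\",\"name\":\"r\",\"fields\":[{\"name\":\"v\",\"type\":\"double\"}]}");
       // Reading old int data with a new double schema is a legal Avro promotion.
       System.out.println(SchemaCompatibility
           .checkReaderWriterCompatibility(asDouble, asInt).getType());   // COMPATIBLE
       // Reading old double data with a new int schema is not.
       System.out.println(SchemaCompatibility
           .checkReaderWriterCompatibility(asInt, asDouble).getType());   // INCOMPATIBLE
     }
   }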


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #2106: [HUDI-1284] preCombine all HoodieRecords and update all fields according to orderingVal

2021-04-01 Thread GitBox


nsivabalan commented on pull request #2106:
URL: https://github.com/apache/hudi/pull/2106#issuecomment-811939606


   @Karl-WangSK : we have another PR being reviewed right now for partial 
updates support. https://github.com/apache/hudi/pull/2666
   Please check it out. We don't need to parse schema for every record. You can 
get some inspiration from the linked PR. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1757) Assigns the buckets by record key for Flink writer

2021-04-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1757:
-
Labels: pull-request-available  (was: )

> Assigns the buckets by record key for Flink writer
> --
>
> Key: HUDI-1757
> URL: https://issues.apache.org/jira/browse/HUDI-1757
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
>
> Currently we assign the buckets by record partition path, which could cause a 
> hotspot if the partition field is of datetime type. Change this to assign buckets 
> by grouping the records by their keys first; the assignment is valid only if 
> there is no conflict (two tasks writing to the same bucket).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1757) Assigns the buckets by record key for Flink writer

2021-04-01 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-1757:
-
Description:  Currently we assign the buckets by record partition path, which 
could cause a hotspot if the partition field is of datetime type. Change this to 
assign buckets by grouping the records by their keys first; the assignment is 
valid only if there is no conflict (two tasks writing to the same bucket).

> Assigns the buckets by record key for Flink writer
> --
>
> Key: HUDI-1757
> URL: https://issues.apache.org/jira/browse/HUDI-1757
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>
> Currently we assign the buckets by record partition path, which could cause a 
> hotspot if the partition field is of datetime type. Change this to assign buckets 
> by grouping the records by their keys first; the assignment is valid only if 
> there is no conflict (two tasks writing to the same bucket).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] danny0405 opened a new pull request #2757: [HUDI-1757] Assigns the buckets by record key for Flink writer

2021-04-01 Thread GitBox


danny0405 opened a new pull request #2757:
URL: https://github.com/apache/hudi/pull/2757


   Currently we assign the buckets by record partition path, which could
   cause a hotspot if the partition field is of datetime type. Change this to
   assign buckets by grouping the records by their keys first; the assignment
   is valid only if there is no conflict (two tasks writing to the same bucket).
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-1757) Assigns the buckets by record key for Flink writer

2021-04-01 Thread Danny Chen (Jira)
Danny Chen created HUDI-1757:


 Summary: Assigns the buckets by record key for Flink writer
 Key: HUDI-1757
 URL: https://issues.apache.org/jira/browse/HUDI-1757
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1755) Assigns the buckets by record key for Flink writer

2021-04-01 Thread Danny Chen (Jira)
Danny Chen created HUDI-1755:


 Summary: Assigns the buckets by record key for Flink writer
 Key: HUDI-1755
 URL: https://issues.apache.org/jira/browse/HUDI-1755
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1756) Assigns the buckets by record key for Flink writer

2021-04-01 Thread Danny Chen (Jira)
Danny Chen created HUDI-1756:


 Summary: Assigns the buckets by record key for Flink writer
 Key: HUDI-1756
 URL: https://issues.apache.org/jira/browse/HUDI-1756
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1754) Assigns the buckets by record key for Flink writer

2021-04-01 Thread Danny Chen (Jira)
Danny Chen created HUDI-1754:


 Summary: Assigns the buckets by record key for Flink writer
 Key: HUDI-1754
 URL: https://issues.apache.org/jira/browse/HUDI-1754
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1753) Assigns the buckets by record key for Flink writer

2021-04-01 Thread Danny Chen (Jira)
Danny Chen created HUDI-1753:


 Summary: Assigns the buckets by record key for Flink writer
 Key: HUDI-1753
 URL: https://issues.apache.org/jira/browse/HUDI-1753
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
 Fix For: 0.9.0


Currently we assign the buckets by record partition path, which could cause a 
hotspot if the partition field is of datetime type. Instead, we can decide the 
buckets by grouping records by their record keys first; the assignment is valid 
only if there is no conflict (two tasks writing to the same bucket).
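
A minimal sketch of what key-based bucket assignment could look like (the class and 
method names below are illustrative only, not Hudi's actual Flink bucket assigner):

import java.util.Arrays;
import java.util.List;

public class KeyBucketSketch {
  // Deterministically maps a record key to one of numBuckets write tasks.
  static int bucketFor(String recordKey, int numBuckets) {
    return Math.floorMod(recordKey.hashCode(), numBuckets);
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList("uuid-1", "uuid-2", "uuid-3");
    // Keys spread across buckets even when the partition path (e.g. a datetime) is skewed.
    keys.forEach(k -> System.out.println(k + " -> bucket " + bucketFor(k, 4)));
  }
}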



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on pull request #2091: HUDI-1283 Fill missing columns with default value when spark dataframe save to hudi table

2021-04-01 Thread GitBox


nsivabalan commented on pull request #2091:
URL: https://github.com/apache/hudi/pull/2091#issuecomment-811931337


   @ivorzhou : if your requirement is to fetch values from the previous version of 
the record and inject them into the incoming record, we already have another PR 
open: https://github.com/apache/hudi/pull/2666. Feel free to clarify and close this 
PR out if the requirement is the same. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #2666: [HUDI-1160] Support update partial fields for CoW table

2021-04-01 Thread GitBox


nsivabalan commented on a change in pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#discussion_r605601097



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##
@@ -79,6 +80,7 @@
   public static final String TIMELINE_LAYOUT_VERSION = 
"hoodie.timeline.layout.version";
   public static final String BASE_PATH_PROP = "hoodie.base.path";
   public static final String AVRO_SCHEMA = "hoodie.avro.schema";
+  public static final String LAST_AVRO_SCHEMA = "hoodie.last.avro.schema";

Review comment:
   Maybe we can call this "latest table schema". 

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##
@@ -106,7 +110,7 @@
   public HoodieMergeHandle(HoodieWriteConfig config, String instantTime, 
HoodieTable hoodieTable,
Iterator> recordItr, String 
partitionPath, String fileId,
TaskContextSupplier taskContextSupplier) {
-super(config, instantTime, partitionPath, fileId, hoodieTable, 
taskContextSupplier);
+super(config, instantTime, partitionPath, fileId, hoodieTable, 
getWriterSchemaIncludingAndExcludingMetadataPair(config, hoodieTable), 
taskContextSupplier);

Review comment:
   Can you help me understand something? Here we have HoodieWriteConfig in two 
places: as a separate entity (1st arg in the constructor) and via 
HoodieTable.getConfig(). I see that within 
getWriterSchemaIncludingAndExcludingMetadataPair(...) we update lastSchema in 
hoodieWriteConfig, which updates the 1st arg. But table.getConfig() may not be 
updated, right? 
   If the above is right, how do we rely on table.getConfig.getLastSchema() in the 
*MergeHelper classes? 
   Maybe I am missing something; can you shed some light? 

##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/CommitUtils.java
##
@@ -59,14 +61,24 @@ public static HoodieCommitMetadata 
buildMetadata(List writeStat
Option> 
extraMetadata,
WriteOperationType 
operationType,
String 
schemaToStoreInCommit,
-   String commitActionType) {
+   String commitActionType,
+   Boolean updatePartialFields,
+   HoodieTableMetaClient 
metaClient) {
 
 HoodieCommitMetadata commitMetadata = buildMetadataFromStats(writeStats, 
partitionToReplaceFileIds, commitActionType, operationType);
 
 // add in extra metadata
 if (extraMetadata.isPresent()) {
   extraMetadata.get().forEach(commitMetadata::addMetadata);
 }
+if (updatePartialFields) {
+  try {

Review comment:
   if updatePartialFields is set, can't we rely on config.getLastSchema() 
in all places where this method is called similar to how we pass in 
config.getSchema(). Is it not guaranteed that config.lastSchema() will be set 
by the time we reach here if updatePartialFields is set to true?
   Trying to see if we can avoid parsing the schema from table once again since 
we have already done it once. 

##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdatePayload.java
##
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+import org.apache.hudi.common.util.Option;
+
+import java.io.IOException;
+import java.util.List;
+
+public class PartialUpdatePayload extends OverwriteWithLatestAvroPayload {

Review comment:
   java docs w/ example would be great. 

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##
@@ -123,6 +127,22 @@ public HoodieMergeHandle(HoodieWriteConfig config, String 
instantTime, HoodieTab
 init(fileId, this.partitionPath, 

[GitHub] [hudi] liujinhui1994 commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table

2021-04-01 Thread GitBox


liujinhui1994 commented on pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#issuecomment-811861142


   > @liujinhui1994 : Can you please update the description with an example.
   
   1. old data
      id  name       age
      1   liujinhui  26
      2   sam        25
      3   madeline   30
   
   2. new data
      {"id": 1, "name": "abc"}
   
   3. final result
      id  name       age
      1   abc        26
      2   sam        25
      3   madeline   30
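   
   A rough sketch of merge logic that would produce this result (hedged: the 
null-means-unchanged assumption is illustrative, and the actual implementation in 
the PR may differ):
   
   import java.io.IOException;
   import org.apache.avro.Schema;
   import org.apache.avro.generic.GenericRecord;
   import org.apache.avro.generic.IndexedRecord;
   import org.apache.hudi.avro.HoodieAvroUtils;
   import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload;
   import org.apache.hudi.common.util.Option;
   
   public class PartialMergeSketch extends OverwriteWithLatestAvroPayload {
     public PartialMergeSketch(GenericRecord record, Comparable orderingVal) {
       super(record, orderingVal);
     }
   
     @Override
     public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema)
         throws IOException {
       GenericRecord incoming = HoodieAvroUtils.bytesToAvro(recordBytes, schema);
       GenericRecord existing = (GenericRecord) currentValue;
       for (Schema.Field field : schema.getFields()) {
         // Fields the incoming record did not set keep their stored value (so id=1 keeps age=26).
         if (incoming.get(field.name()) == null) {
           incoming.put(field.name(), existing.get(field.name()));
         }
       }
       return Option.of(incoming);
     }
   }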


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] liujinhui1994 removed a comment on pull request #2666: [HUDI-1160] Support update partial fields for CoW table

2021-04-01 Thread GitBox


liujinhui1994 removed a comment on pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#issuecomment-811860789


   > @liujinhui1994 : Can you please update the description with an example.
   
   1. old data
      id  name       age
      1   liujinhui  26
      2   sam        25
      3   madeline   30
   
   2. new data
      {"id": 1, "name": "abc"}
   
   3. final result
      id  name       age
      1   abc        26
      2   sam        25
      3   madeline   30
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] liujinhui1994 commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table

2021-04-01 Thread GitBox


liujinhui1994 commented on pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#issuecomment-811860789


   > @liujinhui1994 : Can you please update the description with an example.
   
   1. old data
      id  name       age
      1   liujinhui  26
      2   sam        25
      3   madeline   30
   
   2. new data
      {"id": 1, "name": "abc"}
   
   3. final result
      id  name       age
      1   abc        26
      2   sam        25
      3   madeline   30
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-04-01 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-811856655


   > Nishith and I discussed this. Here is our proposal.
   > Let's rely on Deltastreamer.Config.checkpoint to pass in any type of checkpoint.
   > We can add another config called "checkpoint.type", which could default to
   > string for all default checkpoints. For the checkpoint of interest in this PR, we
   > could set the value of this new config to "timestamp".
   > 
   > With this, it's up to each source to parse and interpret the checkpoint
   > value, and DeltaSync does not need to deal with different checkpointing formats.
   > 
   > Having said this, DeltaSync readFromSource() should not have any changes
   > in this diff.
   > KafkaOffsetGen should have the logic to parse the different checkpoint values,
   > based on the two values (deltastreamer.config.checkpoint and checkpoint.type).
   > 
   > With this, we also keep source-specific checkpointing logic within the
   > source-specific class and do not leak it into DeltaSync, which should stay
   > agnostic to different Sources.
   > 
   > @liujinhui1994 : Let me know what you think. Happy to chat more on this.
   
   Great, I will modify this PR based on this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table

2021-04-01 Thread GitBox


nsivabalan commented on pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#issuecomment-811856279


   @liujinhui1994 : Can you please update the description with an example. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-04-01 Thread GitBox


nsivabalan commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-81184


   Nishith and I discussed this. Here is our proposal. 
   Let's rely on Deltastreamer.Config.checkpoint to pass in any type of checkpoint. 
   We can add another config called "checkpoint.type", which could default to 
string for all default checkpoints. For the checkpoint of interest in this PR, we 
could set the value of this new config to "timestamp". 
   
   With this, it's up to each source to parse and interpret the checkpoint value, 
and DeltaSync does not need to deal with different checkpointing formats. 
   
   Having said this, DeltaSync readFromSource() should not have any changes in 
this diff. 
   KafkaOffsetGen should have the logic to parse the different checkpoint values, 
based on the two values (deltastreamer.config.checkpoint and checkpoint.type). 
   
   With this, we also keep source-specific checkpointing logic within the 
source-specific class and do not leak it into DeltaSync, which should stay 
agnostic to different Sources. 
   
   @liujinhui1994 : Let me know what you think. Happy to chat more on this. 
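   
   To make the "timestamp" case concrete, here is a hedged standalone sketch of 
resolving an epoch-millis checkpoint into Kafka offsets. The class and method names 
are made up for illustration, and the proposed checkpoint.type config does not exist 
in Hudi yet; only consumer.offsetsForTimes is an existing Kafka API.
   
   import java.util.List;
   import java.util.Map;
   import java.util.function.Function;
   import java.util.stream.Collectors;
   import org.apache.kafka.clients.consumer.KafkaConsumer;
   import org.apache.kafka.common.TopicPartition;
   
   public class TimestampCheckpointSketch {
     // Would be called only when the proposed checkpoint.type resolves to "timestamp";
     // the existing string-checkpoint path stays untouched.
     static Map<TopicPartition, Long> offsetsFromTimestamp(
         KafkaConsumer<?, ?> consumer, List<TopicPartition> partitions, long timestampMs) {
       Map<TopicPartition, Long> query = partitions.stream()
           .collect(Collectors.toMap(Function.identity(), tp -> timestampMs));
       return consumer.offsetsForTimes(query).entrySet().stream()
           // Partitions with no message at/after the timestamp come back null; skip them here.
           .filter(e -> e.getValue() != null)
           .collect(Collectors.toMap(Map.Entry::getKey, e -> e.getValue().offset()));
     }
   }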
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xurunbai commented on pull request #2755: Add HoodieFlinkClient InsertOverwrite

2021-04-01 Thread GitBox


xurunbai commented on pull request #2755:
URL: https://github.com/apache/hudi/pull/2755#issuecomment-811770811


   > @xurunbai Thanks for your contribution. Can you file a jira ticket for 
this feature?
   
   OK


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xurunbai closed pull request #2755: Add HoodieFlinkClient InsertOverwrite

2021-04-01 Thread GitBox


xurunbai closed pull request #2755:
URL: https://github.com/apache/hudi/pull/2755


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-1752) Add HoodieFlinkClient InsertOverwrite

2021-04-01 Thread xurunbai (Jira)
xurunbai created HUDI-1752:
--

 Summary: Add HoodieFlinkClient InsertOverwrite
 Key: HUDI-1752
 URL: https://issues.apache.org/jira/browse/HUDI-1752
 Project: Apache Hudi
  Issue Type: New Feature
  Components: CLI, Flink Integration
Reporter: xurunbai
 Fix For: 0.8.0


Add HoodieFlinkClient InsertOverwrite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] aditiwari01 opened a new issue #2756: OrderingVal not being honoured for payloads in log files (for MOR table)

2021-04-01 Thread GitBox


aditiwari01 opened a new issue #2756:
URL: https://github.com/apache/hudi/issues/2756


   While creating HoodieRecordPayloads from log files for MOR tables, the payloads 
are created without any orderingVal (even if one was specified while writing the 
data). Because of this, the precombine function can return any payload irrespective 
of its orderingVal.
   
   Attaching a sample script to reproduce the issue.
   
   In this example, for key "key1", the first insert is with ts=1000. Then we update 
with ts=2000, and then we update again with ts=500. Ideally, if we snapshot-query 
the table after the last update, we must get key1 with ts=2000 (since our ordering 
field is ts). However, it shows the entry with ts=1000, because from the logs it 
ignores ts=2000 and only picks up ts=500.
   
   Also, AFAIU, the same flow will be used during compaction, and then we might 
lose data forever.
   
   
[Hudi_sample_commands.txt](https://github.com/apache/hudi/files/6242206/Hudi_sample_commands.txt)
   
   A workaround is to implement a HoodieRecordPayload class and hardcode the 
ordering field in the constructor. But we cannot keep it parameterised, since 
properties are not available in the constructor.
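   
   A minimal sketch of that workaround, assuming "ts" is the ordering field (the 
class name is illustrative; it simply ignores the orderingVal passed in and 
re-derives it from the record itself):
   
   import org.apache.avro.generic.GenericRecord;
   import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload;
   import org.apache.hudi.common.util.Option;
   
   public class HardcodedTsPayload extends OverwriteWithLatestAvroPayload {
     private static final String ORDERING_FIELD = "ts"; // hardcoded: props are not available here
   
     public HardcodedTsPayload(GenericRecord record, Comparable orderingVal) {
       // Derive the ordering value from the record instead of trusting the passed-in value.
       super(record, record == null ? orderingVal : (Comparable) record.get(ORDERING_FIELD));
     }
   
     public HardcodedTsPayload(Option<GenericRecord> record) {
       this(record.isPresent() ? record.get() : null, 0);
     }
   }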


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yanghua commented on pull request #2755: Add HoodieFlinkClient InsertOverwrite

2021-04-01 Thread GitBox


yanghua commented on pull request #2755:
URL: https://github.com/apache/hudi/pull/2755#issuecomment-811710440


   @xurunbai Thanks for your contribution. Can you file a jira ticket for this 
feature?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (HUDI-1738) Emit deletes for Flink MOR table streaming read

2021-04-01 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-1738.
--
Resolution: Implemented

9804662bc8e17d6936c20326f17ec7c0360dcaf6

> Emit deletes for Flink MOR table streaming read
> ---
>
> Key: HUDI-1738
> URL: https://issues.apache.org/jira/browse/HUDI-1738
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we do a soft delete for DELETE row data when writing into a hoodie
> table. For streaming read of a MOR table, the Flink reader detects the delete
> records and still emits them if the record key semantics are kept.
> This is useful and actually a must for incremental computation in streaming ETL
> pipelines. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated: [HUDI-1738] Emit deletes for flink MOR table streaming read (#2742)

2021-04-01 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 9804662  [HUDI-1738] Emit deletes for flink MOR table streaming read 
(#2742)
9804662 is described below

commit 9804662bc8e17d6936c20326f17ec7c0360dcaf6
Author: Danny Chan 
AuthorDate: Thu Apr 1 15:25:31 2021 +0800

[HUDI-1738] Emit deletes for flink MOR table streaming read (#2742)

Currently we do a soft delete for DELETE row data when writing into a hoodie
table. For streaming read of a MOR table, the Flink reader detects the
delete records and still emits them if the record key semantics are kept.

This is useful and actually a must for incremental computation in streaming
ETL pipelines.
---
 .../java/org/apache/hudi/keygen/KeyGenUtils.java   |  27 +
 .../apache/hudi/client/HoodieFlinkWriteClient.java |   4 +-
 .../table/timeline/HoodieActiveTimeline.java   |  23 +++-
 .../org/apache/hudi/sink/StreamWriteFunction.java  |  14 +--
 .../hudi/sink/StreamWriteOperatorCoordinator.java  |  73 +++--
 .../apache/hudi/sink/compact/CompactFunction.java  |  14 +--
 .../hudi/sink/compact/CompactionCommitSink.java|  14 +--
 .../hudi/sink/compact/CompactionPlanOperator.java  |  45 +---
 .../hudi/sink/event/BatchWriteSuccessEvent.java|   5 +-
 .../org/apache/hudi/table/HoodieTableSource.java   |  27 +++--
 .../table/format/mor/MergeOnReadInputFormat.java   | 119 +++--
 .../table/format/mor/MergeOnReadTableState.java|  33 +-
 .../apache/hudi/util/StringToRowDataConverter.java | 107 ++
 .../sink/TestStreamWriteOperatorCoordinator.java   |   6 +-
 .../apache/hudi/source/TestStreamReadOperator.java |  18 ++--
 .../apache/hudi/table/HoodieDataSourceITCase.java  |  35 ++
 .../apache/hudi/table/format/TestInputFormat.java  |  25 +
 .../test/java/org/apache/hudi/utils/TestData.java  |   8 +-
 .../hudi/utils/TestStringToRowDataConverter.java   | 107 ++
 .../utils/factory/CollectSinkTableFactory.java |  11 +-
 20 files changed, 557 insertions(+), 158 deletions(-)

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java
index 1f59bab..4773ef1 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java
@@ -31,6 +31,7 @@ import java.io.IOException;
 import java.io.UnsupportedEncodingException;
 import java.net.URLEncoder;
 import java.nio.charset.StandardCharsets;
+import java.util.Arrays;
 import java.util.List;
 
 public class KeyGenUtils {
@@ -41,6 +42,32 @@ public class KeyGenUtils {
   protected static final String DEFAULT_PARTITION_PATH = "default";
   protected static final String DEFAULT_PARTITION_PATH_SEPARATOR = "/";
 
+  /**
+   * Extracts the record key fields in strings out of the given record key,
+   * this is the reverse operation of {@link #getRecordKey(GenericRecord, 
String)}.
+   *
+   * @see SimpleAvroKeyGenerator
+   * @see org.apache.hudi.keygen.ComplexAvroKeyGenerator
+   */
+  public static String[] extractRecordKeys(String recordKey) {
+String[] fieldKV = recordKey.split(",");
+if (fieldKV.length == 1) {
+  return fieldKV;
+} else {
+  // a complex key
+  return Arrays.stream(fieldKV).map(kv -> {
+final String[] kvArray = kv.split(":");
+if (kvArray[1].equals(NULL_RECORDKEY_PLACEHOLDER)) {
+  return null;
+} else if (kvArray[1].equals(EMPTY_RECORDKEY_PLACEHOLDER)) {
+  return "";
+} else {
+  return kvArray[1];
+}
+  }).toArray(String[]::new);
+}
+  }
+
   public static String getRecordKey(GenericRecord record, List 
recordKeyFields) {
 boolean keyIsNullEmpty = true;
 StringBuilder recordKey = new StringBuilder();
diff --git 
a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java
 
b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java
index de4d8ca..6a6bced 100644
--- 
a/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java
+++ 
b/hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java
@@ -429,8 +429,8 @@ public class HoodieFlinkWriteClient extends
 HoodieFlinkTable table = getHoodieTable();
 String commitType = 
CommitUtils.getCommitActionType(HoodieTableType.valueOf(tableType));
 HoodieActiveTimeline activeTimeline = 
table.getMetaClient().getActiveTimeline();
-activeTimeline.deletePending(HoodieInstant.State.INFLIGHT, commitType, 
instant);
-

[GitHub] [hudi] yanghua merged pull request #2742: [HUDI-1738] Emit deletes for flink MOR table streaming read

2021-04-01 Thread GitBox


yanghua merged pull request #2742:
URL: https://github.com/apache/hudi/pull/2742


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xiarixiaoyao commented on pull request #2722: [HUDI-1722]hive beeline/spark-sql query specified field on mor table occur NPE

2021-04-01 Thread GitBox


xiarixiaoyao commented on pull request #2722:
URL: https://github.com/apache/hudi/pull/2722#issuecomment-811705289


   Thanks @garyli1019. OK, I will try to add a unit test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yanghua commented on pull request #2747: [HUDI-1743] Added support for SqlFileBasedTransformer

2021-04-01 Thread GitBox


yanghua commented on pull request #2747:
URL: https://github.com/apache/hudi/pull/2747#issuecomment-811679455


   > @yanghua - I've fixed the build, can you please merge this code?
   
   Thanks. Could you add a unit test for the feature?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xurunbai commented on pull request #2755: Add HoodieFlinkClient InsertOverwrite

2021-04-01 Thread GitBox


xurunbai commented on pull request #2755:
URL: https://github.com/apache/hudi/pull/2755#issuecomment-811678698


   Add HoodieFlinkClient InsertOverwrite


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xurunbai opened a new pull request #2755: Add HoodieFlinkClient InsertOverwrite

2021-04-01 Thread GitBox


xurunbai opened a new pull request #2755:
URL: https://github.com/apache/hudi/pull/2755


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org